public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: Iain Sandoe <iain@sandoe.co.uk>
To: "Michal Jankovič" <michal.jankovic59@gmail.com>
Cc: GCC Patches <gcc-patches@gcc.gnu.org>
Subject: Re: [PATCH] c++: coroutines - Overlap variables in frame [PR105989]
Date: Sun, 14 May 2023 19:51:18 +0100	[thread overview]
Message-ID: <0C0E1653-A64F-4F9F-8540-C53671DDB175@sandoe.co.uk> (raw)
In-Reply-To: <D8279CD3-952A-44BE-A730-2464822BE102@getmailspring.com>

Hi Michal,

> On 14 May 2023, at 17:31, Michal Jankovič via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:

> I do not currently have metrics for this, but I can look into generating
> them,

To be clear, this is not in any way a precondition for patch acceptance
but I am curious as to some idea of the improvements seen.

>  however
> I currently do not know of any large open-source projects
> using coroutines that I could use for this; I was thinking about using
> cppcoro unit tests, but they mostly contain very simple coroutines.

Even cppcoro would be some idea.

Larger OSS projects  I know of are folly (which I have out of tree patches
to enable the coroutines tests) and seastar (which I have no idea how to
build at present).

If you are interested to try building folly (note: it has a lot of prereqs) then
I can push my WIP branch somewhere.

> I
> have source for ~20k LoC proprietary project relying heavily on
> coroutines (using boost::asio::awaitable), but here I cannot show the
> source along with the numbers - would this be enough or should I look
> for more open source projects?

The internal project figures are also useful (IMO) even tho we cannot
obviously repeat them.

thanks
Iain

> 
> thanks,
> Michal
> 
> On May 14 2023, at 6:07 pm, Iain Sandoe <iain@sandoe.co.uk> wrote:
> 
>> Hi Michal,
>> 
>>> On 14 May 2023, at 16:36, Michal Jankovič
>>> <michal.jankovic59@gmail.com> wrote:
>>> 
>>> Rebased the patch to GCC 14 trunk. Bootstrapped and regression tested
>>> again on x86_64-pc-linux-gnu, only difference is the new test failing
>>> without the patch.
>> 
>> (as previously noted, I am much in favour of this optimisation)
>> 
>> Do you have any metrics on the reductions in frame size for realistic coroutines?
>> 
>> thanks
>> Iain
>> 
>>> 
>>> On Jul 13 2022, at 2:54 pm, Michal Jankovic
>>> <michal.jankovic59@gmail.com> wrote:
>>> 
>>>> Hi Iain,
>>>> 
>>>> thanks for the info. I have some follow-up questions.
>>>> 
>>>> On Jul 12 2022, at 7:11 pm, Iain Sandoe <iain@sandoe.co.uk> wrote:
>>>> 
>>>>> Hi Michal,
>>>>> 
>>>>>> On 12 Jul 2022, at 16:14, Michal Jankovič
>>>>>> <michal.jankovic59@gmail.com> wrote:
>>>>> 
>>>>>> One other related thing I would like to investigate is reducing the
>>>>>> number of compiler generated variables in the frame, particularly
>>>>>> _Coro_destroy_fn and _Coro_self_handle.     
>>>>>> 
>>>>>> As I understand it, _Coro_destroy_fn just sets a flag in
>>>>>> _Coro_resume_index and calls _Coro_resume_fn; it should be
>>>>>> possible to
>>>>>> move this logic to __builtin_coro_destroy, so that only _Coro_resume_fn
>>>>>> is stored in the frame;
>>>>> 
>>>>> That is a particular point about GCC’s implementation … (it is not
>>>>> neccesarily, or even
>>>>> likely to be the same for other implementations) - see below.
>>>>> 
>>>>> I was intending to do experiment with making the ramp/resume/destroy
>>>>> value a parameter
>>>>> to the actor function so that we would have something like -
>>>>> 
>>>>> ramp calls      actor(frame, 0)
>>>>> resume calls  actor(frame, 1)
>>>>> destroy calls  actor(frame, 2)    
>>>>> - the token values are illustrative, not intended to be a final version.
>>>>> 
>>>>> I think that should allow for more inlining opportunites and possibly
>>>>> a way forward to
>>>>> frame elision (a.k.a halo).
>>>>> 
>>>>>> this would however change the coroutine ABI - I don't know if that's
>>>>>> a problem.
>>>>> 
>>>>> The external ABI for the coroutine is the    
>>>>> resume,
>>>>> destroy pointers    
>>>>> and the promise    
>>>>> and that one can find each of these from the frame pointer.
>>>>> 
>>>>> This was agreed between the interested “vendors” so that one compiler
>>>>> could invoke
>>>>> coroutines built by another.  So I do not think this is so much a
>>>>> useful area to explore.
>>>>> 
>>>> 
>>>> I understand. I still want to try to implement a more light-weight frame
>>>> layout with just one function pointer; would it be possible to merge
>>>> such a change if it was made opt-in via a compiler flag, eg
>>>> `-fsmall-coroutine-frame`? My use-case for this is embedded environments
>>>> with very limited memory, and I do not care about interoperability with
>>>> other compilers there.    
>>>> 
>>>>> Also the intent is that an indirect call through the frame pointer is
>>>>> the most frequent
>>>>> operation so should be the most efficient.     
>>>>> resume() might be called many times,    
>>>>> destroy() just once thus it is a cold code path    
>>>>> - space can be important too - but interoperability was the goal here.
>>>>> 
>>>>>> The _Coro_self_handle should be constructible on-demand from the
>>>>>> frame address.
>>>>> 
>>>>> Yes, and in the header the relevant items are all constexpr - so that
>>>>> should happen in the
>>>>> user’s code.  I elected to have that value in the frame to avoid
>>>>> recreating it each time - I
>>>>> suppose that is a trade-off of one oiptimisation c.f. another …    
>>>> 
>>>> If the handle construction cannot be optimized out, and its thus    
>>>> a tradeoff between frame size and number of instructions, then this
>>>> could also be enabled by a hypothetical `-fsmall-coroutine-frame`.
>>>> 
>>>> Coming back to this:
>>>> 
>>>>>>> (the other related optimisation is to eliminate frame entries for
>>>>>>> scopes without any suspend
>>>>>>> points - which has the potential to save even more space for code with
>>>>>>> sparse use of co_xxxx)
>>>> 
>>>> This would be nice; although it could encompassed by a more general  
>>>> 
>>>> optimization - eliminate frame entries for all variables which are not
>>>> 
>>>> accessed (directly or via pointer / reference) beyond a suspend point.
>>>> To be fair, I do not know how to get started on such an optimization,
>>>> or if it is even possible to do on the frontend. This would however be
>>>> immensely useful for reducing the frame size taken-up by complicated
>>>> co_await expressions (among other things), for example, if I have a
>>>> composed operation:
>>>> 
>>>> co_await when_either(get_leaf_awaitable_1(), get_leaf_awaitable_2());
>>>> 
>>>> Right now, this creates space in the frame for the temporary 'leaf'  
>>>> 
>>>> awaitables, which were already moved into the composed awaitable.
>>>> If the awaitable has an operator co_await that returns the real awaiter,
>>>> the original awaitable is also stored in the frame, even if it    
>>>> is not referenced by the awaiter; another unused object gets stored if
>>>> 
>>>> the .await_transform() customization point was used.
>>>> 
>>>> What are your thoughts on the feasibility / difficulty of implementing
>>>> such an optimization?
>>>> 
>>>> Michal
>>>> 
>>>>>> 
>>>>>> Do you have any advice / opinions on this before I try to
>>>>>> implement it?
>>>>> 
>>>>> Hopefully, the notes above help.
>>>>> 
>>>>> I will rebase my latest code changes as soon as I have a chance and
>>>>> put them somewhere
>>>>> for you to look at - basically, these are to try and address the
>>>>> correctness issues we face,
>>>>> 
>>>>> Iain
>>>>> 
>>>>> 
>>>>>> 
>>>>>> Michal
>>>>>> 
>>>>>> On Jul 12 2022, at 4:08 pm, Iain Sandoe <iain@sandoe.co.uk> wrote:
>>>>>> 
>>>>>>> Hi Michal,
>>>>>>> 
>>>>>>>> On 12 Jul 2022, at 14:35, Michal Jankovič via Gcc-patches
>>>>>>>> <gcc-patches@gcc.gnu.org> wrote:
>>>>>>>> 
>>>>>>>> Currently, coroutine frames store all variables of a coroutine separately,
>>>>>>>> even if their lifetime does not overlap (they are in distinct
>>>>>>>> scopes). This
>>>>>>>> patch implements overlapping distinct variable scopes in the
>>>>>>>> coroutine frame,
>>>>>>>> by storing the frame fields in nested unions of structs. This lowers
>>>>>>>> the size
>>>>>>>> of the frame for larger coroutines significantly, and makes them
>>>>>>>> more usable
>>>>>>>> on systems with limited memory.
>>>>>>> 
>>>>>>> not a review (I will try to take a look at the weekend).
>>>>>>> 
>>>>>>> but … this is one of the two main optimisations on my TODO - so cool
>>>>>>> for doing it.
>>>>>>> 
>>>>>>> (the other related optimisation is to eliminate frame entries for
>>>>>>> scopes without any suspend
>>>>>>> points - which has the potential to save even more space for code with
>>>>>>> sparse use of co_xxxx)
>>>>>>> 
>>>>>>> Iain
>>>>>>> 
>>>>>>>> Bootstrapped and regression tested on x86_64-pc-linux-gnu; new
>>>>>>>> test fails
>>>>>>>> before the patch and succeeds after with no regressions.
>>>>>>>> 
>>>>>>>> 	PR c++/105989
>>>>>>>> 
>>>>>>>> gcc/cp/ChangeLog:
>>>>>>>> 
>>>>>>>> 	* coroutines.cc (struct local_var_info): Add field_access_path.
>>>>>>>> 	(build_local_var_frame_access_expr): New.
>>>>>>>> 	(transform_local_var_uses): Use build_local_var_frame_access_expr.
>>>>>>>> 	(coro_make_frame_entry_id): New.
>>>>>>>> 	(coro_make_frame_entry): Delegate to coro_make_frame_entry_id.
>>>>>>>> 	(struct local_vars_frame_data): Add orig, field_access_path.
>>>>>>>> 	(register_local_var_uses): Generate new frame layout. Create access
>>>>>>>> 	paths to vars.
>>>>>>>> 	(morph_fn_to_coro): Set new fields in local_vars_frame_data.     
>>>>>>>> 
>>>>>>>> gcc/testsuite/ChangeLog:
>>>>>>>> 
>>>>>>>> 	* g++.dg/coroutines/pr105989.C: New test.
>>>>>>>> 
>>>>>>>> <pr105989.patch>
>>>>>>> 
>>>>>>> 
>>>>> 
>>>>> 
>>> <pr105989.patch>
>> 
>> 


      reply	other threads:[~2023-05-14 18:51 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-07-12 13:35 Michal Jankovič
2022-07-12 14:08 ` Iain Sandoe
2022-07-12 15:14   ` Michal Jankovič
2022-07-12 17:11     ` Iain Sandoe
2022-07-13 12:54       ` Michal Jankovic
2023-05-14 15:36         ` Michal Jankovič
2023-05-14 16:07           ` Iain Sandoe
2023-05-14 16:31             ` Michal Jankovič
2023-05-14 18:51               ` Iain Sandoe [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=0C0E1653-A64F-4F9F-8540-C53671DDB175@sandoe.co.uk \
    --to=iain@sandoe.co.uk \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=michal.jankovic59@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).