From: "Michal Jankovič" <michal.jankovic59@gmail.com>
To: Iain Sandoe <iain@sandoe.co.uk>
Cc: GCC Patches <gcc-patches@gcc.gnu.org>
Subject: Re: [PATCH] c++: coroutines - Overlap variables in frame [PR105989]
Date: Sun, 14 May 2023 18:31:59 +0200 [thread overview]
Message-ID: <D8279CD3-952A-44BE-A730-2464822BE102@getmailspring.com> (raw)
In-Reply-To: <4229A144-BA40-4653-A37D-171E28EAD6FF@sandoe.co.uk>
Hi Iain,
I do not currently have metrics for this, but I can look into generating
them, however I currently do not know of any large open-source projects
using coroutines that I could use for this; I was thinking about using
cppcoro unit tests, but they mostly contain very simple coroutines. I
have source for ~20k LoC proprietary project relying heavily on
coroutines (using boost::asio::awaitable), but here I cannot show the
source along with the numbers - would this be enough or should I look
for more open source projects?
thanks,
Michal
On May 14 2023, at 6:07 pm, Iain Sandoe <iain@sandoe.co.uk> wrote:
> Hi Michal,
>
>> On 14 May 2023, at 16:36, Michal Jankovič
>> <michal.jankovic59@gmail.com> wrote:
>>
>> Rebased the patch to GCC 14 trunk. Bootstrapped and regression tested
>> again on x86_64-pc-linux-gnu, only difference is the new test failing
>> without the patch.
>
> (as previously noted, I am much in favour of this optimisation)
>
> Do you have any metrics on the reductions in frame size for realistic coroutines?
>
> thanks
> Iain
>
>>
>> On Jul 13 2022, at 2:54 pm, Michal Jankovic
>> <michal.jankovic59@gmail.com> wrote:
>>
>>> Hi Iain,
>>>
>>> thanks for the info. I have some follow-up questions.
>>>
>>> On Jul 12 2022, at 7:11 pm, Iain Sandoe <iain@sandoe.co.uk> wrote:
>>>
>>>> Hi Michal,
>>>>
>>>>> On 12 Jul 2022, at 16:14, Michal Jankovič
>>>>> <michal.jankovic59@gmail.com> wrote:
>>>>
>>>>> One other related thing I would like to investigate is reducing the
>>>>> number of compiler generated variables in the frame, particularly
>>>>> _Coro_destroy_fn and _Coro_self_handle.
>>>>>
>>>>> As I understand it, _Coro_destroy_fn just sets a flag in
>>>>> _Coro_resume_index and calls _Coro_resume_fn; it should be
>>>>> possible to
>>>>> move this logic to __builtin_coro_destroy, so that only _Coro_resume_fn
>>>>> is stored in the frame;
>>>>
>>>> That is a particular point about GCC’s implementation … (it is not
>>>> neccesarily, or even
>>>> likely to be the same for other implementations) - see below.
>>>>
>>>> I was intending to do experiment with making the ramp/resume/destroy
>>>> value a parameter
>>>> to the actor function so that we would have something like -
>>>>
>>>> ramp calls actor(frame, 0)
>>>> resume calls actor(frame, 1)
>>>> destroy calls actor(frame, 2)
>>>> - the token values are illustrative, not intended to be a final version.
>>>>
>>>> I think that should allow for more inlining opportunites and possibly
>>>> a way forward to
>>>> frame elision (a.k.a halo).
>>>>
>>>>> this would however change the coroutine ABI - I don't know if that's
>>>>> a problem.
>>>>
>>>> The external ABI for the coroutine is the
>>>> resume,
>>>> destroy pointers
>>>> and the promise
>>>> and that one can find each of these from the frame pointer.
>>>>
>>>> This was agreed between the interested “vendors” so that one compiler
>>>> could invoke
>>>> coroutines built by another. So I do not think this is so much a
>>>> useful area to explore.
>>>>
>>>
>>> I understand. I still want to try to implement a more light-weight frame
>>> layout with just one function pointer; would it be possible to merge
>>> such a change if it was made opt-in via a compiler flag, eg
>>> `-fsmall-coroutine-frame`? My use-case for this is embedded environments
>>> with very limited memory, and I do not care about interoperability with
>>> other compilers there.
>>>
>>>> Also the intent is that an indirect call through the frame pointer is
>>>> the most frequent
>>>> operation so should be the most efficient.
>>>> resume() might be called many times,
>>>> destroy() just once thus it is a cold code path
>>>> - space can be important too - but interoperability was the goal here.
>>>>
>>>>> The _Coro_self_handle should be constructible on-demand from the
>>>>> frame address.
>>>>
>>>> Yes, and in the header the relevant items are all constexpr - so that
>>>> should happen in the
>>>> user’s code. I elected to have that value in the frame to avoid
>>>> recreating it each time - I
>>>> suppose that is a trade-off of one oiptimisation c.f. another …
>>>
>>> If the handle construction cannot be optimized out, and its thus
>>> a tradeoff between frame size and number of instructions, then this
>>> could also be enabled by a hypothetical `-fsmall-coroutine-frame`.
>>>
>>> Coming back to this:
>>>
>>>>>> (the other related optimisation is to eliminate frame entries for
>>>>>> scopes without any suspend
>>>>>> points - which has the potential to save even more space for code with
>>>>>> sparse use of co_xxxx)
>>>
>>> This would be nice; although it could encompassed by a more general
>>>
>>> optimization - eliminate frame entries for all variables which are not
>>>
>>> accessed (directly or via pointer / reference) beyond a suspend point.
>>> To be fair, I do not know how to get started on such an optimization,
>>> or if it is even possible to do on the frontend. This would however be
>>> immensely useful for reducing the frame size taken-up by complicated
>>> co_await expressions (among other things), for example, if I have a
>>> composed operation:
>>>
>>> co_await when_either(get_leaf_awaitable_1(), get_leaf_awaitable_2());
>>>
>>> Right now, this creates space in the frame for the temporary 'leaf'
>>>
>>> awaitables, which were already moved into the composed awaitable.
>>> If the awaitable has an operator co_await that returns the real awaiter,
>>> the original awaitable is also stored in the frame, even if it
>>> is not referenced by the awaiter; another unused object gets stored if
>>>
>>> the .await_transform() customization point was used.
>>>
>>> What are your thoughts on the feasibility / difficulty of implementing
>>> such an optimization?
>>>
>>> Michal
>>>
>>>>>
>>>>> Do you have any advice / opinions on this before I try to
>>>>> implement it?
>>>>
>>>> Hopefully, the notes above help.
>>>>
>>>> I will rebase my latest code changes as soon as I have a chance and
>>>> put them somewhere
>>>> for you to look at - basically, these are to try and address the
>>>> correctness issues we face,
>>>>
>>>> Iain
>>>>
>>>>
>>>>>
>>>>> Michal
>>>>>
>>>>> On Jul 12 2022, at 4:08 pm, Iain Sandoe <iain@sandoe.co.uk> wrote:
>>>>>
>>>>>> Hi Michal,
>>>>>>
>>>>>>> On 12 Jul 2022, at 14:35, Michal Jankovič via Gcc-patches
>>>>>>> <gcc-patches@gcc.gnu.org> wrote:
>>>>>>>
>>>>>>> Currently, coroutine frames store all variables of a coroutine separately,
>>>>>>> even if their lifetime does not overlap (they are in distinct
>>>>>>> scopes). This
>>>>>>> patch implements overlapping distinct variable scopes in the
>>>>>>> coroutine frame,
>>>>>>> by storing the frame fields in nested unions of structs. This lowers
>>>>>>> the size
>>>>>>> of the frame for larger coroutines significantly, and makes them
>>>>>>> more usable
>>>>>>> on systems with limited memory.
>>>>>>
>>>>>> not a review (I will try to take a look at the weekend).
>>>>>>
>>>>>> but … this is one of the two main optimisations on my TODO - so cool
>>>>>> for doing it.
>>>>>>
>>>>>> (the other related optimisation is to eliminate frame entries for
>>>>>> scopes without any suspend
>>>>>> points - which has the potential to save even more space for code with
>>>>>> sparse use of co_xxxx)
>>>>>>
>>>>>> Iain
>>>>>>
>>>>>>> Bootstrapped and regression tested on x86_64-pc-linux-gnu; new
>>>>>>> test fails
>>>>>>> before the patch and succeeds after with no regressions.
>>>>>>>
>>>>>>> PR c++/105989
>>>>>>>
>>>>>>> gcc/cp/ChangeLog:
>>>>>>>
>>>>>>> * coroutines.cc (struct local_var_info): Add field_access_path.
>>>>>>> (build_local_var_frame_access_expr): New.
>>>>>>> (transform_local_var_uses): Use build_local_var_frame_access_expr.
>>>>>>> (coro_make_frame_entry_id): New.
>>>>>>> (coro_make_frame_entry): Delegate to coro_make_frame_entry_id.
>>>>>>> (struct local_vars_frame_data): Add orig, field_access_path.
>>>>>>> (register_local_var_uses): Generate new frame layout. Create access
>>>>>>> paths to vars.
>>>>>>> (morph_fn_to_coro): Set new fields in local_vars_frame_data.
>>>>>>>
>>>>>>> gcc/testsuite/ChangeLog:
>>>>>>>
>>>>>>> * g++.dg/coroutines/pr105989.C: New test.
>>>>>>>
>>>>>>> <pr105989.patch>
>>>>>>
>>>>>>
>>>>
>>>>
>> <pr105989.patch>
>
>
next prev parent reply other threads:[~2023-05-14 16:32 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-07-12 13:35 Michal Jankovič
2022-07-12 14:08 ` Iain Sandoe
2022-07-12 15:14 ` Michal Jankovič
2022-07-12 17:11 ` Iain Sandoe
2022-07-13 12:54 ` Michal Jankovic
2023-05-14 15:36 ` Michal Jankovič
2023-05-14 16:07 ` Iain Sandoe
2023-05-14 16:31 ` Michal Jankovič [this message]
2023-05-14 18:51 ` Iain Sandoe
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=D8279CD3-952A-44BE-A730-2464822BE102@getmailspring.com \
--to=michal.jankovic59@gmail.com \
--cc=gcc-patches@gcc.gnu.org \
--cc=iain@sandoe.co.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).