Rebased the patch to GCC 14 trunk. Bootstrapped and regression tested again on x86_64-pc-linux-gnu, only difference is the new test failing without the patch. On Jul 13 2022, at 2:54 pm, Michal Jankovic wrote: > Hi Iain, > > thanks for the info. I have some follow-up questions. > > On Jul 12 2022, at 7:11 pm, Iain Sandoe wrote: > >> Hi Michal, >> >>> On 12 Jul 2022, at 16:14, Michal Jankovič >>> wrote: >> >>> One other related thing I would like to investigate is reducing the >>> number of compiler generated variables in the frame, particularly >>> _Coro_destroy_fn and _Coro_self_handle. >>> >>> As I understand it, _Coro_destroy_fn just sets a flag in >>> _Coro_resume_index and calls _Coro_resume_fn; it should be possible to >>> move this logic to __builtin_coro_destroy, so that only _Coro_resume_fn >>> is stored in the frame; >> >> That is a particular point about GCC’s implementation … (it is not >> neccesarily, or even >> likely to be the same for other implementations) - see below. >> >> I was intending to do experiment with making the ramp/resume/destroy >> value a parameter >> to the actor function so that we would have something like - >> >> ramp calls actor(frame, 0) >> resume calls actor(frame, 1) >> destroy calls actor(frame, 2) >> - the token values are illustrative, not intended to be a final version. >> >> I think that should allow for more inlining opportunites and possibly >> a way forward to >> frame elision (a.k.a halo). >> >>> this would however change the coroutine ABI - I don't know if that's >>> a problem. >> >> The external ABI for the coroutine is the >> resume, >> destroy pointers >> and the promise >> and that one can find each of these from the frame pointer. >> >> This was agreed between the interested “vendors” so that one compiler >> could invoke >> coroutines built by another. So I do not think this is so much a >> useful area to explore. >> > > I understand. I still want to try to implement a more light-weight frame > layout with just one function pointer; would it be possible to merge > such a change if it was made opt-in via a compiler flag, eg > `-fsmall-coroutine-frame`? My use-case for this is embedded environments > with very limited memory, and I do not care about interoperability with > other compilers there. > >> Also the intent is that an indirect call through the frame pointer is >> the most frequent >> operation so should be the most efficient. >> resume() might be called many times, >> destroy() just once thus it is a cold code path >> - space can be important too - but interoperability was the goal here. >> >>> The _Coro_self_handle should be constructible on-demand from the >>> frame address. >> >> Yes, and in the header the relevant items are all constexpr - so that >> should happen in the >> user’s code. I elected to have that value in the frame to avoid >> recreating it each time - I >> suppose that is a trade-off of one oiptimisation c.f. another … > > If the handle construction cannot be optimized out, and its thus > a tradeoff between frame size and number of instructions, then this > could also be enabled by a hypothetical `-fsmall-coroutine-frame`. > > Coming back to this: > >>>> (the other related optimisation is to eliminate frame entries for >>>> scopes without any suspend >>>> points - which has the potential to save even more space for code with >>>> sparse use of co_xxxx) > > This would be nice; although it could encompassed by a more general > optimization - eliminate frame entries for all variables which are not > > accessed (directly or via pointer / reference) beyond a suspend point. > To be fair, I do not know how to get started on such an optimization, > or if it is even possible to do on the frontend. This would however be > immensely useful for reducing the frame size taken-up by complicated > co_await expressions (among other things), for example, if I have a > composed operation: > > co_await when_either(get_leaf_awaitable_1(), get_leaf_awaitable_2()); > > Right now, this creates space in the frame for the temporary 'leaf' > awaitables, which were already moved into the composed awaitable. > If the awaitable has an operator co_await that returns the real awaiter, > the original awaitable is also stored in the frame, even if it > is not referenced by the awaiter; another unused object gets stored if > > the .await_transform() customization point was used. > > What are your thoughts on the feasibility / difficulty of implementing > such an optimization? > > Michal > >>> >>> Do you have any advice / opinions on this before I try to implement it? >> >> Hopefully, the notes above help. >> >> I will rebase my latest code changes as soon as I have a chance and >> put them somewhere >> for you to look at - basically, these are to try and address the >> correctness issues we face, >> >> Iain >> >> >>> >>> Michal >>> >>> On Jul 12 2022, at 4:08 pm, Iain Sandoe wrote: >>> >>>> Hi Michal, >>>> >>>>> On 12 Jul 2022, at 14:35, Michal Jankovič via Gcc-patches >>>>> wrote: >>>>> >>>>> Currently, coroutine frames store all variables of a coroutine separately, >>>>> even if their lifetime does not overlap (they are in distinct >>>>> scopes). This >>>>> patch implements overlapping distinct variable scopes in the >>>>> coroutine frame, >>>>> by storing the frame fields in nested unions of structs. This lowers >>>>> the size >>>>> of the frame for larger coroutines significantly, and makes them >>>>> more usable >>>>> on systems with limited memory. >>>> >>>> not a review (I will try to take a look at the weekend). >>>> >>>> but … this is one of the two main optimisations on my TODO - so cool >>>> for doing it. >>>> >>>> (the other related optimisation is to eliminate frame entries for >>>> scopes without any suspend >>>> points - which has the potential to save even more space for code with >>>> sparse use of co_xxxx) >>>> >>>> Iain >>>> >>>>> Bootstrapped and regression tested on x86_64-pc-linux-gnu; new >>>>> test fails >>>>> before the patch and succeeds after with no regressions. >>>>> >>>>> PR c++/105989 >>>>> >>>>> gcc/cp/ChangeLog: >>>>> >>>>> * coroutines.cc (struct local_var_info): Add field_access_path. >>>>> (build_local_var_frame_access_expr): New. >>>>> (transform_local_var_uses): Use build_local_var_frame_access_expr. >>>>> (coro_make_frame_entry_id): New. >>>>> (coro_make_frame_entry): Delegate to coro_make_frame_entry_id. >>>>> (struct local_vars_frame_data): Add orig, field_access_path. >>>>> (register_local_var_uses): Generate new frame layout. Create access >>>>> paths to vars. >>>>> (morph_fn_to_coro): Set new fields in local_vars_frame_data. >>>>> >>>>> gcc/testsuite/ChangeLog: >>>>> >>>>> * g++.dg/coroutines/pr105989.C: New test. >>>>> >>>>> >>>> >>>> >> >> >