From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=E9Mq=FA=sandoe.co.uk=iain@sourceware.org>
Received: from smtp002.apm-internet.net (smtp002.apm-internet.net [85.119.248.221])
	by sourceware.org (Postfix) with ESMTPS id 502E23858D28
	for <gcc-patches@gcc.gnu.org>; Sat, 16 Sep 2023 19:11:00 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 502E23858D28
Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=sandoe.co.uk
Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=sandoe.co.uk
Received: (qmail 40262 invoked from network); 16 Sep 2023 19:10:58 -0000
X-APM-Out-ID: 16948914584026
X-APM-Authkey: 257869/1(257869/1) 6
Received: from unknown (HELO smtpclient.apple) (81.138.1.83)
  by smtp002.apm-internet.net with SMTP; 16 Sep 2023 19:10:58 -0000
Content-Type: text/plain;
	charset=utf-8
Mime-Version: 1.0 (Mac OS X Mail 16.0 \(3696.120.41.1.4\))
Subject: Re: [PATCH] core: Support heap-based trampolines
From: Iain Sandoe <iain@sandoe.co.uk>
In-Reply-To: <CAFiYyc0LwAd2qOoAUL2O8nFg3g5sw-9Yfb59TA7J7EoGX32Rjw@mail.gmail.com>
Date: Sat, 16 Sep 2023 20:10:57 +0100
Cc: FX Coudert <fxcoudert@gmail.com>,
 Jeff Law <jeffreyalaw@gmail.com>,
 GCC Patches <gcc-patches@gcc.gnu.org>,
 Maxim Blinov <maxim.blinov@embecosm.com>,
 Eric Botcazou <ebotcazou@adacore.com>
Content-Transfer-Encoding: quoted-printable
Message-Id: <91E7404E-E536-4230-A8E1-F20D419E74FE@sandoe.co.uk>
References: <BEFDC64A-7127-4F12-B650-73516FB2AFC7@gmail.com>
 <CAFiYyc1-N4yoEAUxoz+KvgWGPs_S_G8Cvme__8g2Lig8EfZcEA@mail.gmail.com>
 <1E30AA03-E0C1-447C-8A06-1B516B86992D@gmail.com>
 <20E41E5A-C9FE-4B9B-AEE6-CA8D18EC6635@gmail.com>
 <CAFiYyc0LwAd2qOoAUL2O8nFg3g5sw-9Yfb59TA7J7EoGX32Rjw@mail.gmail.com>
To: Richard Biener <richard.guenther@gmail.com>
X-Mailer: Apple Mail (2.3696.120.41.1.4)
X-Spam-Status: No, score=-3.1 required=5.0 tests=BAYES_00,KAM_COUK,KAM_DMARC_STATUS,RCVD_IN_DNSWL_LOW,SPF_HELO_NONE,SPF_PASS,TXREP autolearn=no autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org
List-Id: <gcc-patches.gcc.gnu.org>

Hi Richard,

> On 14 Sep 2023, at 11:18, Richard Biener <richard.guenther@gmail.com> =
wrote:
>=20
> On Wed, Sep 6, 2023 at 5:44=E2=80=AFPM FX Coudert =
<fxcoudert@gmail.com> wrote:
>>=20

>> ping**2 on the revised patch, for Richard or another global reviewer. =
So far all review feedback is that it=E2=80=99s a step forward, and =
it=E2=80=99s been widely used for both aarch64-darwin and x86_64-darwin =
distributions for almost three years now.
>>=20
>> OK to commit?
>=20
> I just noticed that ftrampoline-impl isn't Optimize, thus it's not
> streamed with LTO.

I think this is fine, the nested pass runs before LTO streaming and =
lowers to the relevant built-ins for the chosen impl.
The builtins are distinct and can co-exist in the linked exe,

>  How does mixing different -ftrampoline-impl for different LTO TUs =
behave?

Assuming that a target can support multiple implementations, then each =
is applied local to a single TU.  The nested functions are scoped within =
their parent and thus should not be candidates for merging by LTO.

For a target that cannot support both, then one or more of the TUs =
should be rejected before we even get to LTO.

>  How does mis-specifying -ftrampoline-impl at LTO link time compared =
to compile-time behave?

The flag should be a  NOP at LTO link time (but I do not think we want =
to reject it, that would probably create other issues?)

>  Is the state fully reflected during pre-IPA compilation and the flag =
not needed after that? =20

yes, that is my understanding, nested runs very early.

> It appears so, but did you check?

I actually checked on x86_64-darwin (which does support both) and we =
see=E2=80=A6
here with two tus with nested fns and a third with the main().

$ nm -mapv ./nn.ltrans0.ltrans.o

as expected, two instances of the nested =E2=80=9Cbar=E2=80=9D.

00000000000001a8 (__TEXT,__cstring) non-external lC0
000000000000001f (__TEXT,__text) non-external _bar.0.lto_priv.0=20
00000000000001d0 (__TEXT,__cstring) non-external lC1
00000000000000ec (__TEXT,__text) non-external _bar.0.lto_priv.1
000000000000007c (__TEXT,__text) external _foo_1
0000000000000149 (__TEXT,__text) external _foo_2
0000000000000000 (__TEXT,__text) external _main

>>> these for heap-based:
                 (undefined) external ___builtin_nested_func_ptr_created=20=

                 (undefined) external ___builtin_nested_func_ptr_deleted

>>> this for stack-based.
                 (undefined) external ___enable_execute_stack

(and the code executes as expected).

> OK if that's a non-issue.

thanks, we'll wait a day or two in case of any follow-on comments,
Iain

P.S. I was investigating some unrelated unwinder issues a couple of =
weeks ago, but that did highlight that we have a possibility to avoid =
the leaks from longjump if we hang on the forced_unwind() machinery =
[TODO, tho, not part of this initial patch]


>=20
> Thanks,
> Richard.
>=20
>> FX
>>=20
>>=20
>>=20
>>> Le 5 ao=C3=BBt 2023 =C3=A0 16:20, FX Coudert <fxcoudert@gmail.com> a =
=C3=A9crit :
>>>=20
>>> Hi Richard,
>>>=20
>>> Thanks for your feedback. Here is an amended version of the patch, =
taking into consideration your requests and the following discussion. =
There is no configure option for the libgcc part, and the documentation =
is amended. The patch is split into three commits for core, target and =
libgcc.
>>>=20
>>> Currently regtesting on x86_64 linux and darwin (it was fine before =
I split up into three commits, so I=E2=80=99m re-testing to make sure I =
didn=E2=80=99t screw anything up).
>>>=20
>>> OK to commit?
>>> FX
>>=20