From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from smtp002.apm-internet.net (smtp002.apm-internet.net [85.119.248.221]) by sourceware.org (Postfix) with ESMTPS id 502E23858D28 for ; Sat, 16 Sep 2023 19:11:00 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org 502E23858D28 Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=sandoe.co.uk Authentication-Results: sourceware.org; spf=pass smtp.mailfrom=sandoe.co.uk Received: (qmail 40262 invoked from network); 16 Sep 2023 19:10:58 -0000 X-APM-Out-ID: 16948914584026 X-APM-Authkey: 257869/1(257869/1) 6 Received: from unknown (HELO smtpclient.apple) (81.138.1.83) by smtp002.apm-internet.net with SMTP; 16 Sep 2023 19:10:58 -0000 Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (Mac OS X Mail 16.0 \(3696.120.41.1.4\)) Subject: Re: [PATCH] core: Support heap-based trampolines From: Iain Sandoe In-Reply-To: Date: Sat, 16 Sep 2023 20:10:57 +0100 Cc: FX Coudert , Jeff Law , GCC Patches , Maxim Blinov , Eric Botcazou Content-Transfer-Encoding: quoted-printable Message-Id: <91E7404E-E536-4230-A8E1-F20D419E74FE@sandoe.co.uk> References: <1E30AA03-E0C1-447C-8A06-1B516B86992D@gmail.com> <20E41E5A-C9FE-4B9B-AEE6-CA8D18EC6635@gmail.com> To: Richard Biener X-Mailer: Apple Mail (2.3696.120.41.1.4) X-Spam-Status: No, score=-3.1 required=5.0 tests=BAYES_00,KAM_COUK,KAM_DMARC_STATUS,RCVD_IN_DNSWL_LOW,SPF_HELO_NONE,SPF_PASS,TXREP autolearn=no autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: Hi Richard, > On 14 Sep 2023, at 11:18, Richard Biener = wrote: >=20 > On Wed, Sep 6, 2023 at 5:44=E2=80=AFPM FX Coudert = wrote: >>=20 >> ping**2 on the revised patch, for Richard or another global reviewer. = So far all review feedback is that it=E2=80=99s a step forward, and = it=E2=80=99s been widely used for both aarch64-darwin and x86_64-darwin = distributions for almost three years now. >>=20 >> OK to commit? >=20 > I just noticed that ftrampoline-impl isn't Optimize, thus it's not > streamed with LTO. I think this is fine, the nested pass runs before LTO streaming and = lowers to the relevant built-ins for the chosen impl. The builtins are distinct and can co-exist in the linked exe, > How does mixing different -ftrampoline-impl for different LTO TUs = behave? Assuming that a target can support multiple implementations, then each = is applied local to a single TU. The nested functions are scoped within = their parent and thus should not be candidates for merging by LTO. For a target that cannot support both, then one or more of the TUs = should be rejected before we even get to LTO. > How does mis-specifying -ftrampoline-impl at LTO link time compared = to compile-time behave? The flag should be a NOP at LTO link time (but I do not think we want = to reject it, that would probably create other issues?) > Is the state fully reflected during pre-IPA compilation and the flag = not needed after that? =20 yes, that is my understanding, nested runs very early. > It appears so, but did you check? I actually checked on x86_64-darwin (which does support both) and we = see=E2=80=A6 here with two tus with nested fns and a third with the main(). $ nm -mapv ./nn.ltrans0.ltrans.o as expected, two instances of the nested =E2=80=9Cbar=E2=80=9D. 00000000000001a8 (__TEXT,__cstring) non-external lC0 000000000000001f (__TEXT,__text) non-external _bar.0.lto_priv.0=20 00000000000001d0 (__TEXT,__cstring) non-external lC1 00000000000000ec (__TEXT,__text) non-external _bar.0.lto_priv.1 000000000000007c (__TEXT,__text) external _foo_1 0000000000000149 (__TEXT,__text) external _foo_2 0000000000000000 (__TEXT,__text) external _main >>> these for heap-based: (undefined) external ___builtin_nested_func_ptr_created=20= (undefined) external ___builtin_nested_func_ptr_deleted >>> this for stack-based. (undefined) external ___enable_execute_stack (and the code executes as expected). > OK if that's a non-issue. thanks, we'll wait a day or two in case of any follow-on comments, Iain P.S. I was investigating some unrelated unwinder issues a couple of = weeks ago, but that did highlight that we have a possibility to avoid = the leaks from longjump if we hang on the forced_unwind() machinery = [TODO, tho, not part of this initial patch] >=20 > Thanks, > Richard. >=20 >> FX >>=20 >>=20 >>=20 >>> Le 5 ao=C3=BBt 2023 =C3=A0 16:20, FX Coudert a = =C3=A9crit : >>>=20 >>> Hi Richard, >>>=20 >>> Thanks for your feedback. Here is an amended version of the patch, = taking into consideration your requests and the following discussion. = There is no configure option for the libgcc part, and the documentation = is amended. The patch is split into three commits for core, target and = libgcc. >>>=20 >>> Currently regtesting on x86_64 linux and darwin (it was fine before = I split up into three commits, so I=E2=80=99m re-testing to make sure I = didn=E2=80=99t screw anything up). >>>=20 >>> OK to commit? >>> FX >>=20