From: Segher Boessenkool
To: Alan Modra
Cc: gcc-patches@gcc.gnu.org
Subject: Re: [RS6000] PR94145, make PLT loads volatile
Date: Thu, 26 Mar 2020 12:12:16 -0500
Message-ID: <20200326171216.GW22482@gate.crashing.org>
In-Reply-To: <20200323015103.GS4583@bubble.grove.modra.org>
List-Id: Gcc-patches mailing list

Hi!

On Mon, Mar 23, 2020 at 12:21:03PM +1030, Alan Modra wrote:
> On Wed, Mar 18, 2020 at 04:53:59PM -0500, Segher Boessenkool wrote:
> > Could you please send a new patch (could be the same patch even) that
> > is easier to review for me?
>
> The PLT is volatile. On PowerPC it is a bss style section which the
> dynamic loader initialises to point at resolver stubs (called glink on
> PowerPC64) to support lazy resolution of function addresses. The
> first call to a given function goes via the dynamic loader symbol
> resolver, which updates the PLT entry for that function and calls the
> function. The second call, if there is one and we don't have a
> multi-threaded race, will use the updated PLT entry and thus avoid
> the relatively slow symbol resolver path.

Okay, so it isn't volatile, we have the guarantee that it will stay the
same after we have called the function once (on this same execution
thread)?

> Calls via the PLT are like calls via a function pointer, except that
> no initialised function pointer is volatile like the PLT. All
> initialised function pointers are resolved at program startup to point
> at the function or are left as NULL. There is no support for lazy
> resolution of any user visible function pointer.
>
> So why does any of this matter to gcc? Well, normally the PLT call
> mechanism happens entirely behind gcc's back, but since we implemented
> inline PLT calls (effectively putting the PLT code stub that loads the
> PLT entry inline and making that code sequence scheduled), the load of
> the PLT entry is visible to gcc. That load then is subject to gcc
> optimization, for example in
>
> /* -S -mcpu=future -mpcrel -mlongcall -O2. */
> int foo (int);
> void bar (void)
> {
>   while (foo(0))
>     foo (99);
> }
>
> we see the PLT load for foo being hoisted out of the loop and stashed
> in a call-saved register.
> If that happens to be the first call to
> foo, then the stashed value is that for the resolver stub, and every
> call to foo in the loop will then go via the slow resolver path. Not
> a good idea. Also, if foo turns out to be a local function and the
> linker replaces the PLT calls with direct calls to foo then gcc has
> just wasted a call-saved register.

So you are saying that calling the PLT directly is always faster than
calling via a function pointer, even if that is the correct resolved
address? That is the part I am worried about. I think that is right,
but I don't quite see it, and I don't know what is done at runtime
nearly well enough :-/

> This patch teaches gcc that the PLT loads are volatile. The change
> doesn't affect other loads of function pointers and thus has no effect
> on normal indirect function calls. Note that because the
> "optimization" this patch prevents can only occur over function calls,
> the only place gcc can stash PLT loads is in call-saved registers or
> in other memory. I'm reasonably confident that this change will be
> neutral or positive for the "ld -z now" case where the PLT is not
> volatile, in code where there is any register pressure.

That is good enough for me :-)

> Even if gcc
> could be taught to recognise cases where the PLT is resolved, you'd
> need to discount use of registers to cache PLT loads by some factor
> involving the chance that those calls would be converted to direct
> calls.
>
> 	PR target/94145
> 	* config/rs6000/rs6000.c (rs6000_longcall_ref): Use unspec_volatile
> 	for PLT16_LO and PLT_PCREL.
> 	* config/rs6000/rs6000.md (UNSPEC_PLT16_LO, UNSPEC_PLT_PCREL): Remove.
> 	(UNSPECV_PLT16_LO, UNSPECV_PLT_PCREL): Define.
> 	(pltseq_plt16_lo_, pltseq_plt_pcrel): Use unspec_volatile.

Okay for trunk. Thank you!


Segher