Date: Sat, 14 Mar 2020 09:30:02 +1030
From: Alan Modra
To: Segher Boessenkool
Cc: gcc-patches@gcc.gnu.org
Subject: Re: [RS6000] make PLT loads volatile
Message-ID: <20200313230002.GB23597@bubble.grove.modra.org>
References: <20200312024850.GE5384@bubble.grove.modra.org>
 <20200312165717.GG22482@gate.crashing.org>
 <20200312233601.GH5384@bubble.grove.modra.org>
 <20200313154038.GR22482@gate.crashing.org>
In-Reply-To: <20200313154038.GR22482@gate.crashing.org>

On Fri, Mar 13, 2020 at 10:40:38AM -0500, Segher Boessenkool wrote:
> Hi!
>
> On Fri, Mar 13, 2020 at 10:06:01AM +1030, Alan Modra wrote:
> > On Thu, Mar 12, 2020 at 11:57:17AM -0500, Segher Boessenkool wrote:
> > > On Thu, Mar 12, 2020 at 01:18:50PM +1030, Alan Modra wrote:
> > > > With lazy PLT resolution the first load of a PLT entry may be a value
> > > > pointing at a resolver stub. gcc's loop processing can result in the
> > > > PLT load in inline PLT calls being hoisted out of a loop in the
> > > > mistaken idea that this is an optimisation. It isn't. If the value
> > > > hoisted was that for a resolver stub then every call to that function
> > > > in the loop will go via the resolver, slowing things down quite
> > > > dramatically.
> > > >
> > > > The PLT really is volatile, so teach gcc about that.
> > >
> > > It would be nice if we could keep it cached after it has been resolved
> > > once, this has potential for regressing performance if we don't? And
> > > LD_BIND_NOW should keep working just as fast as it is now, too?
> >
> > Using a call-saved register to cache a load out of the PLT looks
> > really silly
>
> Who said anything about using call-saved registers? GCC will usually
> make a stack slot for this, and only use a non-volatile register when
> that is profitable. (I know it is a bit too aggressive with it, but
> that is a generic problem).

Using a stack slot comes about due to hoisting then running out of
call-saved registers in the loop. Score another reason not to hoist
PLT loads.

> > when the inline PLT call is turned back into a direct
> > call by the linker.
>
> Ah, so yeah, for direct calls we do not want this. I was thinking this
> was about indirect calls (via a bctrl that is), dunno how I got that
> misperception. Sorry.
>
> What is this like for indirect calls (at C level)? Does your patch do
> anything to those?

No effect at all. To put your mind at rest on this point you can
verify quite easily by noticing that UNSPECV_PLT* is only generated in
rs6000_longcall_ref, and calls to that function are conditional on
GET_CODE (func_desc) == SYMBOL_REF.

-- 
Alan Modra
Australia Development Lab, IBM
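
For anyone following the thread who wants to see the hoisting problem
in isolation, here is a rough user-level model of it in plain C. It is
only a sketch and not what the rs6000 backend emits: plt_slot,
resolver_stub, real_func, reset_plt, hoisted_loop and reloading_loop
are all hypothetical names made up for illustration, and the "PLT" is
just a function pointer that a fake resolver patches on first use.

```c
#include <stdio.h>

typedef int (*fn_t)(int);

/* Counts how many calls went through the (simulated) resolver.  */
static int resolver_calls;

static int real_func(int x) { return x + 1; }
static int resolver_stub(int x);

/* Hypothetical stand-in for a lazily bound PLT entry.  It starts out
   pointing at the resolver and is declared volatile so that every
   read in the source is a real load.  */
static fn_t volatile plt_slot = resolver_stub;

/* Simulated lazy binding: patch the slot, then call the real target.  */
static int resolver_stub(int x)
{
    resolver_calls++;
    plt_slot = real_func;
    return real_func(x);
}

/* Put the model back into its unresolved state.  */
static void reset_plt(void)
{
    resolver_calls = 0;
    plt_slot = resolver_stub;
}

/* Models the hoisted load: the slot is read once before the loop, so
   if it held the resolver address every iteration goes via the
   resolver.  */
static int hoisted_loop(int n)
{
    fn_t f = plt_slot;
    int sum = 0;
    for (int i = 0; i < n; i++)
        sum += f(i);
    return sum;
}

/* Models the volatile load: the slot is re-read on each iteration, so
   only the first call goes via the resolver.  */
static int reloading_loop(int n)
{
    int sum = 0;
    for (int i = 0; i < n; i++)
        sum += plt_slot(i);
    return sum;
}
```

Calling reset_plt() and then hoisted_loop(n) leaves resolver_calls
equal to n, whereas reset_plt() followed by reloading_loop(n) leaves
it at 1. That is the difference the volatile PLT load is meant to
preserve.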