Date: Thu, 26 Aug 2021 07:15:09 -0500
From: Segher Boessenkool <segher@kernel.crashing.org>
To: David Edelsohn
Cc: GCC Patches <gcc-patches@gcc.gnu.org>
Subject: Re: [PATCH] Inline IBM long double __gcc_qsub
Message-ID: <20210826121509.GX1583@gate.crashing.org>

Hi!

On Wed, Aug 25, 2021 at 08:23:32PM -0400, David Edelsohn wrote:
> rs6000: inline ldouble __gcc_qsub
>
> While performing some tests of IEEE 128 float for PPC64LE, Michael
> Meissner noticed that __gcc_qsub is substantially slower than
> __gcc_qadd.  __gcc_qsub valls __gcc_add with the second operand

("calls", "__gcc_qadd")

> negated.  Because the functions normally are invoked through
> libgcc shared object, the extra PLT overhead has a large impact
> on the overall time of the function.  Instead of trying to be
> fancy with function decorations to prevent interposition, this
> patch inlines the definition of __gcc_qadd into __gcc_qsub with
> the negation propagated through the function.

Looks good to me, and it is a good way to resolve this.

This code is too old (and unimportant) to do serious engineering on.
If we want any serious optimisation on it, we should do that at tree
level (why does that not happen yet anyway?), and inline all of this.
This patch is really just to make benchmark results saner ;-)

Thanks David!


Segher
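
For illustration, here is a rough standalone sketch of the transformation
the patch describes.  The dd_* names and the simplified double-double
arithmetic are made up for this sketch; this is not the actual libgcc
ibm-ldouble.c source, which also handles non-finite inputs:

#include <stdio.h>

/* Rough model of an IBM long double ("double-double"): the value is the
   unevaluated sum hi + lo of two doubles.  Simplified: no handling of
   infinities or NaNs, unlike the real libgcc routines.  */

/* Simplified double-double addition, in the spirit of __gcc_qadd.  */
static void
dd_add (double a, double aa, double c, double cc, double *hi, double *lo)
{
  double z = a + c;
  double q = a - z;
  double zz = q + c + (a - (q + z)) + aa + cc;

  *hi = z + zz;
  *lo = (z - *hi) + zz;
}

/* Before the patch: subtraction just forwards to addition with the second
   operand negated; in libgcc this forwarding call goes through the PLT.  */
static void
dd_sub_call (double a, double aa, double c, double cc, double *hi, double *lo)
{
  dd_add (a, aa, -c, -cc, hi, lo);
}

/* After the patch: the addition body is duplicated into the subtraction
   routine with the negation of c and cc folded into each use, so no call
   (and no PLT round trip) remains.  */
static void
dd_sub_inline (double a, double aa, double c, double cc, double *hi, double *lo)
{
  double z = a - c;                               /* a + (-c) */
  double q = a - z;
  double zz = q - c + (a - (q + z)) + aa - cc;    /* -c and -cc propagated */

  *hi = z + zz;
  *lo = (z - *hi) + zz;
}

int
main (void)
{
  double h1, l1, h2, l2;

  dd_sub_call (1.0, 1e-20, 0.25, 3e-21, &h1, &l1);
  dd_sub_inline (1.0, 1e-20, 0.25, 3e-21, &h2, &l2);

  /* The two variants should produce identical results.  */
  printf ("call:   hi=%.17g lo=%.17g\n", h1, l1);
  printf ("inline: hi=%.17g lo=%.17g\n", h2, l2);
  return 0;
}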