From: Segher Boessenkool
To: Michael Meissner, Alan Lawrence, gcc-patches@gcc.gnu.org, David Edelsohn
Date: Tue, 11 Nov 2014 07:10:00 -0000
Subject: Re: [PATCH 10/11][RS6000] Migrate reduction optabs to reduc_..._scal

On Mon, Nov 10, 2014 at 05:36:24PM -0500, Michael Meissner wrote:
> However, the double pattern is completely broken. This cannot go in.

[snip]

> It is unacceptable to have to do the inner loop doing a load, vector add, and
> store in the loop.
Before the patch, the final reduction used *vsx_reduc_splus_v2df; after the patch, it uses *vsx_reduc_plus_v2df_scalar. The former does a vector add, the latter a scalar float add. The new code also uses the same pseudoregister for the accumulator throughout.

IRA decides that a register is more expensive than memory for this pseudo, I suppose because it is wanted in both V2DF and DF mode? It doesn't seem to like the subreg very much.

The new code does look nicer otherwise :-)


Segher
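The thread shows no code, so what follows is only a plain-C model of the difference described above. It is a sketch under stated assumptions, not the rs6000 machine-description patterns themselves. Both final-reduction styles compute the same sum from a two-lane accumulator. The old style keeps the final step in vector (V2DF) form, roughly a lane swap plus a vector add. The new style reads the two lanes out as scalar (DF) values and does one scalar add. The type `v2df_model` and the functions `reduce_vector_add`, `reduce_scalar_add` and `sum_pairs` are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical plain-C model of a two-lane V2DF value; this is not
   GCC's internal representation. */
typedef struct { double lane[2]; } v2df_model;

/* Approximate model of the old final step (*vsx_reduc_splus_v2df):
   a vector add of the accumulator with a lane-swapped copy of itself,
   which leaves the sum in both lanes; lane 0 is then read out. */
static double reduce_vector_add(v2df_model acc)
{
    v2df_model swapped = { { acc.lane[1], acc.lane[0] } };
    v2df_model r = { { acc.lane[0] + swapped.lane[0],
                       acc.lane[1] + swapped.lane[1] } };
    return r.lane[0];
}

/* Approximate model of the new final step (*vsx_reduc_plus_v2df_scalar):
   read both lanes as DF values and do a single scalar add. */
static double reduce_scalar_add(v2df_model acc)
{
    return acc.lane[0] + acc.lane[1];
}

/* Hypothetical vectorized-loop shape: one two-lane accumulator updated
   with a lane-wise add per iteration, then a final reduction.
   Assumes n is even to keep the sketch short. */
static double sum_pairs(const double *a, size_t n, int use_scalar_final)
{
    assert(n % 2 == 0);
    v2df_model acc = { { 0.0, 0.0 } };
    for (size_t i = 0; i < n; i += 2) {
        acc.lane[0] += a[i];
        acc.lane[1] += a[i + 1];
    }
    return use_scalar_final ? reduce_scalar_add(acc)
                            : reduce_vector_add(acc);
}
```

In this model the accumulator is used as a pair inside the loop and as two separate scalars only in the scalar-add version of the final step. In compiled code, that second kind of access corresponds to the DF subreg of the V2DF pseudo mentioned above.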