From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Received: (qmail 32402 invoked by alias); 10 Nov 2014 22:37:35 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id:
List-Archive:
List-Post:
List-Help:
Sender: gcc-patches-owner@gcc.gnu.org
Received: (qmail 32392 invoked by uid 89); 10 Nov 2014 22:37:35 -0000
Authentication-Results: sourceware.org; auth=none
X-Virus-Found: No
X-Spam-SWARE-Status: No, score=-1.6 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.2
X-HELO: e8.ny.us.ibm.com
Received: from e8.ny.us.ibm.com (HELO e8.ny.us.ibm.com) (32.97.182.138)
 by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES256-SHA encrypted) ESMTPS;
 Mon, 10 Nov 2014 22:37:33 +0000
Received: from /spool/local by e8.ny.us.ibm.com with IBM ESMTP SMTP Gateway:
 Authorized Use Only! Violators will be prosecuted for from ;
 Mon, 10 Nov 2014 17:37:31 -0500
Received: from d01dlp03.pok.ibm.com (9.56.250.168) by e8.ny.us.ibm.com (192.168.1.108)
 with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted;
 Mon, 10 Nov 2014 17:37:29 -0500
Received: from b01cxnp22033.gho.pok.ibm.com (b01cxnp22033.gho.pok.ibm.com [9.57.198.23])
 by d01dlp03.pok.ibm.com (Postfix) with ESMTP id 39D90C90026 for ;
 Mon, 10 Nov 2014 17:29:33 -0500 (EST)
Received: from d01av02.pok.ibm.com (d01av02.pok.ibm.com [9.56.224.216])
 by b01cxnp22033.gho.pok.ibm.com (8.14.9/8.14.9/NCO v10.0) with ESMTP id sAAMbSgb27066620 for ;
 Mon, 10 Nov 2014 22:37:28 GMT
Received: from d01av02.pok.ibm.com (localhost [127.0.0.1])
 by d01av02.pok.ibm.com (8.14.4/8.14.4/NCO v10.0 AVout) with ESMTP id sAAMaP3r015189 for ;
 Mon, 10 Nov 2014 17:36:25 -0500
Received: from ibm-tiger.the-meissners.org (dhcp-9-32-77-206.usma.ibm.com [9.32.77.206])
 by d01av02.pok.ibm.com (8.14.4/8.14.4/NCO v10.0 AVin) with ESMTP id sAAMaPi4015166;
 Mon, 10 Nov 2014 17:36:25 -0500
Received: by ibm-tiger.the-meissners.org (Postfix, from userid 500) id 8FFB34205E;
 Mon, 10 Nov 2014 17:36:24 -0500 (EST)
Date: Mon, 10 Nov 2014 22:39:00 -0000
From: Michael Meissner
To: Alan Lawrence
Cc: "gcc-patches@gcc.gnu.org", David Edelsohn, Segher Boessenkool
Subject: Re: [PATCH 10/11][RS6000] Migrate reduction optabs to reduc_..._scal
Message-ID: <20141110223624.GA19330@ibm-tiger.the-meissners.org>
Mail-Followup-To: Michael Meissner, Alan Lawrence, "gcc-patches@gcc.gnu.org",
 David Edelsohn, Segher Boessenkool
References: <544A3E0B.2000803@arm.com> <544A40D1.1040605@arm.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <544A40D1.1040605@arm.com>
User-Agent: Mutt/1.5.20 (2009-12-10)
X-TM-AS-MML: disable
X-Content-Scanned: Fidelis XPS MAILER
x-cbid: 14111022-0029-0000-0000-000001136AAE
X-IsSubscribed: yes
X-SW-Source: 2014-11/txt/msg00892.txt.bz2

On Fri, Oct 24, 2014 at 01:06:41PM +0100, Alan Lawrence wrote:
> This migrates the reduction patterns in altivec.md and vector.md to
> the new names.
> I've not touched paired.md as I wasn't really sure
> how to fix that (how do I vec_extractv2sf ?), moreover the testing I
> did didn't seem to exercise any of those patterns (iow: I'm not sure
> what would be an appropriate target machine?).
>
> I note the reduc_uplus_v16qi (which I've removed, as unsigned and
> signed addition should be equivalent) differed from
> reduc_splus_v16qi in using gen_altivec_vsum4ubs rather than
> gen_altivec_vsum4sbs. Testcases gcc.dg/vect/{slp-24-big-array.c,slp-24.c,vect-reduc-1char-big-array.c,vect-reduc-1char.c}
> thus produce assembly which differs from previously (only) in that
> "vsum4ubs" becomes "vsum4sbs". These tests are still passing so I
> assume this is OK.
>
> The combining of signed and unsigned addition also improves gcc.dg/vect/{vect-outer-4i.c,vect-reduc-1short.c,vect-reduc-dot-u8b.c,vect-reduc-pattern-1c-big-array.c,vect-reduc-pattern-1c.c}
> : these are now reduced using direct vector reduction, rather than
> with shifts as previously (because there was only a reduc_splus
> rather than the reduc_uplus these tests looked for).

I checked the integer vector add reductions, and they seem to generate the
same values with the old and new code, and I like eliminating the vector shift.

> ((Side note: the RTL changes to vector.md are to match the combine
> patterns in vsx.md; now that we no longer depend upon combine to
> generate those patterns (as the optab outputs them directly), one
> might wish to remove the smaller pattern from vsx.md, and/or
> simplify the RTL.
> I theorize that a reduction of a two-element
> vector is just adding the first element to the second, so maybe to
> something like
>
>   [(parallel [(set (match_operand:DF 0 "vfloat_operand" "")
>                    (VEC_reduc:V2DF
>                     (vec_select:DF
>                      (match_operand:V2DF 1 "vfloat_operand" "")
>                      (parallel [(const_int 1)]))
>                     (vec_select:DF
>                      (match_dup 1)
>                      (parallel [(const_int 0)]))))
>               (clobber (match_scratch:V2DF 2 ""))])]
>
> but I think it's best for me to leave that to the port maintainers.))
>
> Bootstrapped and check-gcc on powerpc64-none-linux-gnu
> (gcc110.fsffrance.org, with thanks to the GCC Compile Farm).

However, the double pattern is completely broken.  This cannot go in.  Consider
this source:

#include <stddef.h>  /* size_t; the other three header names are missing from the archived copy */

#ifndef TYPE
#define TYPE double
#endif

#ifndef OTYPE
#define OTYPE TYPE
#endif

#ifndef SIZE
#define SIZE 1024
#endif

#ifndef ALIGN
#define ALIGN 32
#endif

TYPE a[SIZE] __attribute__((__aligned__(ALIGN)));

OTYPE sum (void) __attribute__((__noinline__));

OTYPE
sum (void)
{
  size_t i;
  OTYPE s = (OTYPE) 0;

  for (i = 0; i < SIZE; i++)
    s += a[i];

  return s;
}

If I compile with today's trunk, and -mcpu=power8 -ffast-math -O3, I get code
that I expect (though it could use xxpermdi instead of xxsldwi):

sum:
        .quad   .L.sum,.TOC.@tocbase,0
        .previous
        .type   sum, @function
.L.sum:
        li 10,512
        addis 9,2,.LC1@toc@ha           # gpr load fusion, type long
        ld 9,.LC1@toc@l(9)
        xxlxor 0,0,0
        mtctr 10
        .p2align 4,,15
.L2:
        lxvd2x 12,0,9
        addi 9,9,16
        xvadddp 0,0,12
        bdnz .L2
        xxsldwi 12,0,0,2
        xvadddp 1,12,0
        xxpermdi 1,1,1,2
        blr
        .long 0

However, the code produced by the patches gives:

sum:
        .quad   .L.sum,.TOC.@tocbase,0
        .previous
        .type   sum, @function
.L.sum:
        xxlxor 0,0,0
        addi 10,1,-16
        li 8,512
        addis 9,2,.LC1@toc@ha           # gpr load fusion, type long
        ld 9,.LC1@toc@l(9)
        mtctr 8
        stxvd2x 0,0,10
        .p2align 5,,31
.L2:
        addi 10,1,-16
        lxvd2x 0,0,9
        addi 9,9,16
        lxvd2x 12,0,10
        xvadddp 12,12,0
        stxvd2x 12,0,10
        bdnz .L2
        lfd 0,-16(1)
        xxpermdi 1,12,12,2
        fadd 1,0,1
        blr
        .long 0

It is unacceptable for the inner loop to load the accumulator from the stack,
do the vector add, and store it back on every iteration.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797
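
For readers following the thread, the loop shape that the first assembly listing corresponds to can be sketched in C with the GCC generic vector extension. This is only an illustration under stated assumptions, not code from the patch or from the rs6000 port: the names v2df and sum_v2df are hypothetical, and a GCC- or Clang-compatible compiler is assumed. The accumulator is a local two-element vector that the compiler is expected to keep in a register across the loop, and the two lanes are added together exactly once, after the loop, which is the two-element reduction discussed above.

```c
#include <assert.h>
#include <stddef.h>

/* Two doubles packed in one 16-byte vector (GCC/Clang vector extension).
   The type name is hypothetical, chosen to echo the V2DF machine mode.  */
typedef double v2df __attribute__((vector_size(16)));

/* Hypothetical helper: sum n doubles starting at p; n must be even.  */
double
sum_v2df (const double *p, size_t n)
{
  v2df acc = { 0.0, 0.0 };      /* vector accumulator, lives across the loop */
  size_t i;

  assert (n % 2 == 0);

  for (i = 0; i < n; i += 2)
    {
      v2df x = { p[i], p[i + 1] };  /* one vector's worth of input */
      acc += x;                     /* element-wise vector add */
    }

  /* Final reduction of the two-element vector: lane 0 plus lane 1.  */
  return acc[0] + acc[1];
}
```

Because the lanes are summed separately, the additions happen in a different order than in a scalar loop, which is why the vectorizer only performs this transformation on doubles under an option such as -ffast-math.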