From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-patches-return-383513-listarch-gcc-patches=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 32402 invoked by alias); 10 Nov 2014 22:37:35 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Date: Mon, 10 Nov 2014 22:39:00 -0000
From: Michael Meissner <meissner@linux.vnet.ibm.com>
To: Alan Lawrence <alan.lawrence@arm.com>
Cc: "gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>,        David Edelsohn <dje.gcc@gmail.com>,        Segher Boessenkool <segher@kernel.crashing.org>
Subject: Re: [PATCH 10/11][RS6000] Migrate reduction optabs to reduc_..._scal
Message-ID: <20141110223624.GA19330@ibm-tiger.the-meissners.org>
Mail-Followup-To: Michael Meissner <meissner@linux.vnet.ibm.com>,	Alan Lawrence <alan.lawrence@arm.com>,	"gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>,	David Edelsohn <dje.gcc@gmail.com>,	Segher Boessenkool <segher@kernel.crashing.org>
References: <544A3E0B.2000803@arm.com> <544A40D1.1040605@arm.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <544A40D1.1040605@arm.com>
User-Agent: Mutt/1.5.20 (2009-12-10)
X-SW-Source: 2014-11/txt/msg00892.txt.bz2

On Fri, Oct 24, 2014 at 01:06:41PM +0100, Alan Lawrence wrote:
> This migrates the reduction patterns in altivec.md and vector.md to
> the new names. I've not touched paired.md as I wasn't really sure
> how to fix that (how do I vec_extractv2sf ?), moreover the testing I
> did didn't seem to exercise any of those patterns (iow: I'm not sure
> what would be an appropriate target machine?).
> 
> I note the reduc_uplus_v16qi (which I've removed, as unsigned and
> signed addition should be equivalent) differed from
> reduc_splus_v16qi in using gen_altivec_vsum4ubs rather than
> gen_altivec_vsum4sbs.  Testcases gcc.dg/vect/{slp-24-big-array.c,slp-24.c,vect-reduc-1char-big-array.c,vect-reduc-1char.c}
> thus produce assembly which differs from previously (only) in that
> "vsum4ubs" becomes "vsum4sbs". These tests are still passing so I
> assume this is OK.
> 
> The combining of signed and unsigned addition also improves gcc.dg/vect/{vect-outer-4i.c,vect-reduc-1short.c,vect-reduc-dot-u8b.c,vect-reduc-pattern-1c-big-array.c,vect-reduc-pattern-1c.c}
> : these are now reduced using direct vector reduction, rather than
> with shifts as previously (because there was only a reduc_splus
> rather than the reduc_uplus these tests looked for).

I checked the integer vector add reductions: the old and new code appear to
generate the same value, and I like that the vector shift is eliminated.

> ((Side note: the RTL changes to vector.md are to match the combine
> patterns in vsx.md; now that we no longer depend upon combine to
> generate those patterns (as the optab outputs them directly), one
> might wish to remove the smaller pattern from vsx.md, and/or
> simplify the RTL. I theorize that a reduction of a two-element
> vector is just adding the first element to the second, so maybe to
> something like
> 
>   [(parallel [(set (match_operand:DF 0 "vfloat_operand" "")
> 		   (VEC_reduc:V2DF
> 		    (vec_select:DF
> 		     (match_operand:V2DF 1 "vfloat_operand" "")
> 		     (parallel [(const_int 1)]))
> 		    (vec_select:DF
> 		     (match_dup 1)
> 		     (parallel [(const_int 0)]))))
> 	      (clobber (match_scratch:V2DF 2 ""))])]
> 
> but I think it's best for me to leave that to the port maintainers.))
> 
> Bootstrapped and check-gcc on powerpc64-none-linux-gnu
> (gcc110.fsffrance.org, with thanks to the GCC Compile Farm).

However, the double (V2DF) reduction pattern is completely broken.  This cannot
go in.

Consider this source:

#include <stdio.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#ifndef TYPE
#define TYPE double
#endif

#ifndef OTYPE
#define OTYPE TYPE
#endif

#ifndef SIZE
#define SIZE 1024
#endif

#ifndef ALIGN
#define ALIGN 32
#endif

TYPE a[SIZE] __attribute__((__aligned__(ALIGN)));

OTYPE sum (void) __attribute__((__noinline__));

OTYPE
sum (void)
{
  size_t i;
  OTYPE s = (OTYPE) 0;

  for (i = 0; i < SIZE; i++)
    s += a[i];

  return s;
}
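
To check the value as well as the generated code, a driver along these lines
could be added.  This is only a sketch: check_sum and the fill values are
hypothetical and not part of the test above, and TYPE, SIZE and ALIGN are fixed
at their defaults to keep it self-contained.  The values are chosen so that
every partial sum is an exactly representable integer, so the result should not
depend on the order in which -ffast-math lets the compiler add the elements.

```c
#include <assert.h>
#include <stddef.h>

#define TYPE double
#define SIZE 1024
#define ALIGN 32

TYPE a[SIZE] __attribute__((__aligned__(ALIGN)));

TYPE sum (void) __attribute__((__noinline__));

/* Same reduction loop as in the test case above.  */
TYPE
sum (void)
{
  size_t i;
  TYPE s = (TYPE) 0;

  for (i = 0; i < SIZE; i++)
    s += a[i];

  return s;
}

/* Hypothetical driver (an assumption, not part of the original test):
   fill a[] with 1, 2, ..., SIZE and compare against the closed form
   SIZE * (SIZE + 1) / 2.  All intermediate sums are integers well below
   2^53, so they are exact in double whatever the association order.  */
void
check_sum (void)
{
  size_t i;

  for (i = 0; i < SIZE; i++)
    a[i] = (TYPE) (i + 1);

  assert (sum () == (TYPE) SIZE * (SIZE + 1) / 2);
}
```

Calling check_sum () from main with both compilers would confirm that the two
versions agree on the value, which is a separate question from the quality of
the loop.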

If I compile with today's trunk and -mcpu=power8 -ffast-math -O3, I get the
code that I expect (though it could use xxpermdi instead of xxsldwi):

sum:
	.quad	.L.sum,.TOC.@tocbase,0
	.previous
	.type	sum, @function
.L.sum:
	li 10,512
	addis 9,2,.LC1@toc@ha		# gpr load fusion, type long
	ld 9,.LC1@toc@l(9)
	xxlxor 0,0,0
	mtctr 10
	.p2align 4,,15
.L2:
	lxvd2x 12,0,9
	addi 9,9,16
	xvadddp 0,0,12
	bdnz .L2
	xxsldwi 12,0,0,2
	xvadddp 1,12,0
	xxpermdi 1,1,1,2
	blr
	.long 0

However, the code produced by the patches gives:

sum:
	.quad	.L.sum,.TOC.@tocbase,0
	.previous
	.type	sum, @function
.L.sum:
	xxlxor 0,0,0
	addi 10,1,-16
	li 8,512
	addis 9,2,.LC1@toc@ha		# gpr load fusion, type long
	ld 9,.LC1@toc@l(9)
	mtctr 8
	stxvd2x 0,0,10
	.p2align 5,,31
.L2:
	addi 10,1,-16
	lxvd2x 0,0,9
	addi 9,9,16
	lxvd2x 12,0,10
	xvadddp 12,12,0
	stxvd2x 12,0,10
	bdnz .L2
	lfd 0,-16(1)
	xxpermdi 1,12,12,2
	fadd 1,0,1
	blr
	.long 0

It is unacceptable for the inner loop to reload the accumulator from the stack,
do the vector add, and store it back on every iteration.
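
In C terms, the difference between the two loops is roughly the following.
This is an illustrative model only: the function names are made up, the
two-element arrays stand in for one V2DF register, and volatile is used here
just to force the memory round trip that the stack slot at -16(1) causes in the
second listing.  It is not a description of what the compiler does internally.

```c
#include <assert.h>
#include <stddef.h>

/* Expected shape: the two-lane accumulator stays in a local (think: one
   vector register) for the whole loop, and the two lanes are added
   together once, after the loop.  */
double
sum_acc_in_register (const double *p, size_t n)
{
  double acc[2] = { 0.0, 0.0 };
  size_t i;

  assert (n % 2 == 0);		/* model handles whole vectors only */

  for (i = 0; i < n; i += 2)
    {
      acc[0] += p[i];
      acc[1] += p[i + 1];
    }

  return acc[0] + acc[1];
}

/* Shape of the second listing: the accumulator is reloaded from memory,
   updated, and stored back on every iteration.  The volatile qualifier
   models the stack slot.  */
double
sum_acc_in_memory (const double *p, size_t n)
{
  volatile double slot[2] = { 0.0, 0.0 };
  size_t i;

  assert (n % 2 == 0);

  for (i = 0; i < n; i += 2)
    {
      double t0 = slot[0];	/* reload accumulator */
      double t1 = slot[1];

      t0 += p[i];		/* vector add */
      t1 += p[i + 1];

      slot[0] = t0;		/* store accumulator back */
      slot[1] = t1;
    }

  return slot[0] + slot[1];
}
```

Both functions return the same value; the second just adds a load and a store
of the accumulator to the loop-carried dependency on every iteration.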

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meissner@linux.vnet.ibm.com, phone: +1 (978) 899-4797