From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-patches-return-407579-listarch-gcc-patches=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 19031 invoked by alias); 16 Sep 2015 16:11:10 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Message-ID: <1442419857.10907.0.camel@gnopaine>
Subject: Re: [PATCH, rs6000] Add expansions for min/max vector reductions
From: Bill Schmidt <wschmidt@linux.vnet.ibm.com>
To: Alan Lawrence <alan.lawrence@arm.com>
Cc: "gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>,        "dje.gcc@gmail.com"	 <dje.gcc@gmail.com>, rguenther@suse.de,        alan.hayward@arm.com, ramana.gcc@googlemail.com
Date: Wed, 16 Sep 2015 16:14:00 -0000
In-Reply-To: <55F98AD2.4080408@arm.com>
References: <1442413689.2896.45.camel@gnopaine> <55F98AD2.4080408@arm.com>
Content-Type: text/plain; charset="UTF-8"
Mime-Version: 1.0
Content-Transfer-Encoding: 7bit


On Wed, 2015-09-16 at 16:29 +0100, Alan Lawrence wrote:
> On 16/09/15 15:28, Bill Schmidt wrote:
> > 2015-09-16  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
> >
> >          * config/rs6000/altivec.md (UNSPEC_REDUC_SMAX, UNSPEC_REDUC_SMIN,
> >          UNSPEC_REDUC_UMAX, UNSPEC_REDUC_UMIN, UNSPEC_REDUC_SMAX_SCAL,
> >          UNSPEC_REDUC_SMIN_SCAL, UNSPEC_REDUC_UMAX_SCAL,
> >          UNSPEC_REDUC_UMIN_SCAL): New enumerated constants.
> >          (reduc_smax_v2di): New define_expand.
> >          (reduc_smax_scal_v2di): Likewise.
> >          (reduc_smin_v2di): Likewise.
> >          (reduc_smin_scal_v2di): Likewise.
> >          (reduc_umax_v2di): Likewise.
> >          (reduc_umax_scal_v2di): Likewise.
> >          (reduc_umin_v2di): Likewise.
> >          (reduc_umin_scal_v2di): Likewise.
> >          (reduc_smax_v4si): Likewise.
> >          (reduc_smin_v4si): Likewise.
> >          (reduc_umax_v4si): Likewise.
> >          (reduc_umin_v4si): Likewise.
> >          (reduc_smax_v8hi): Likewise.
> >          (reduc_smin_v8hi): Likewise.
> >          (reduc_umax_v8hi): Likewise.
> >          (reduc_umin_v8hi): Likewise.
> >          (reduc_smax_v16qi): Likewise.
> >          (reduc_smin_v16qi): Likewise.
> >          (reduc_umax_v16qi): Likewise.
> >          (reduc_umin_v16qi): Likewise.
> >          (reduc_smax_scal_<mode>): Likewise.
> >          (reduc_smin_scal_<mode>): Likewise.
> >          (reduc_umax_scal_<mode>): Likewise.
> >          (reduc_umin_scal_<mode>): Likewise.
> 
> You shouldn't need the non-_scal reductions. Indeed, they shouldn't be used if 
> the _scal are present. The non-_scal's were previously defined as producing a 
> vector with one element holding the result and the other elements all zero, and 
> this was only ever used with a vec_extract immediately after; the _scal pattern 
> now includes the vec_extract as well. Hence the non-_scal patterns are 
> deprecated / considered legacy, as per md.texi.

Thanks -- I had misread the description of the non-scalar versions,
missing the part where the other elements are zero.  What I really
want/need is an optab defined as computing the maximum value in all
elements of the vector.  This seems like a strange thing to want, but
Alan Hayward's proposed patch will cause us to generate the scalar
version, followed by a broadcast of the scalar result back into a
vector.  Since our patterns already generate the maximum value in all
positions, this creates an unnecessary extract followed by an
unnecessary broadcast.
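
To make the redundancy concrete, here is a minimal Python sketch.  It
assumes a rotate-and-max "butterfly" reduction, which is only an
illustration and not the actual rs6000 expansion; the function names
are made up for this example.  When the reduction already leaves the
maximum in every lane, extracting lane 0 and broadcasting it changes
nothing.

```python
def reduce_max_all_lanes(vec):
    """Butterfly max reduction: every lane ends up holding the maximum."""
    n = len(vec)
    assert n > 0 and n & (n - 1) == 0, "lane count must be a power of two"
    lanes = list(vec)
    shift = n // 2
    while shift >= 1:
        # Rotate the lanes by `shift` and take the element-wise maximum.
        rotated = lanes[shift:] + lanes[:shift]
        lanes = [max(a, b) for a, b in zip(lanes, rotated)]
        shift //= 2
    return lanes


def extract_then_broadcast(vec):
    """Model of the scalar route: take lane 0, then splat it to all lanes."""
    scalar = vec[0]
    return [scalar] * len(vec)


v = [3, 9, -2, 7]
all_lanes = reduce_max_all_lanes(v)
round_trip = extract_then_broadcast(all_lanes)
```

Here `round_trip` is identical to `all_lanes`, so the extract and the
broadcast are pure overhead on a target of this kind.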

As discussed elsewhere, we *could* remove the unnecessary code by
recognizing this in simplify-rtx, etc., but the vectorization cost
modeling would still be wrong.  The vectorizer would still tell us to
model this as a vec_to_scalar for the reduc_max_scal and a vec_stmt for
the broadcast, which overcounts the cost of the reduction compared to
what we would actually generate.
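
A small worked example of the overcount, as a Python sketch.  The
numeric costs are invented for illustration (real values come from
each target's cost model), and "reduc_to_vec" is a hypothetical label
for the single all-lanes reduction that would actually be emitted.

```python
# Hypothetical per-statement costs, chosen only to show the arithmetic.
EXAMPLE_COSTS = {"vec_to_scalar": 2, "vec_stmt": 1, "reduc_to_vec": 2}


def modeled_cost(stmt_kinds, costs=EXAMPLE_COSTS):
    """Sum the cost of each statement kind in the sequence."""
    return sum(costs[kind] for kind in stmt_kinds)


# What the vectorizer would count: scalar reduction plus broadcast.
modeled = modeled_cost(["vec_to_scalar", "vec_stmt"])
# What would remain after the extract/broadcast pair is simplified away.
emitted = modeled_cost(["reduc_to_vec"])
overcount = modeled - emitted
```

Under these assumed numbers the model charges for one vector statement
that never appears in the generated code.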

To get this right for all targets, one could envision having a new optab
for a reduction-to-vector, which most targets wouldn't implement, but
PowerPC and AArch32, at least, would.  If a target has a
reduction-to-vector, the vectorizer would have to generate a different
GIMPLE code that maps to it; otherwise it would generate the
REDUC_MAX_EXPR and the broadcast.  This obviously starts to get
complicated, since adding a GIMPLE code certainly has a nontrivial
cost. :/
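
The choice described above could be sketched like this in Python.
REDUC_MAX_TO_VEC_EXPR is a hypothetical name for the new GIMPLE code
(no such code exists today), and "BROADCAST" is just a placeholder
label for whatever statement splats the scalar back into a vector.

```python
def plan_max_reduction(target_has_reduc_to_vec):
    """Pick the statement sequence for a max needed in every lane."""
    if target_has_reduc_to_vec:
        # Hypothetical code that would map to a reduction-to-vector optab.
        return ["REDUC_MAX_TO_VEC_EXPR"]
    # Fallback for all other targets: scalar reduction, then broadcast.
    return ["REDUC_MAX_EXPR", "BROADCAST"]


with_optab = plan_max_reduction(True)
without_optab = plan_max_reduction(False)
```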

Perhaps the practical thing is to have the vectorizer also do an
add_stmt_cost with some new token that indicates the cost model should
make an adjustment if the back end doesn't need the extract/broadcast.
Targets like PowerPC and AArch32 could then subtract the unnecessary
cost, and remove the unnecessary code in simplify-rtx.
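
Roughly what I have in mind, as a toy Python model.  This does not
mirror the real add_stmt_cost hook or its signature; the class, the
"reduc_bcast_adjust" token, and the numeric costs are all assumptions
made up for this sketch, and it credits back only the broadcast.

```python
class ToyCostModel:
    """Toy stand-in for a target cost model; not the real GCC hook."""

    def __init__(self, costs, reduction_fills_all_lanes):
        self.costs = costs
        self.reduction_fills_all_lanes = reduction_fills_all_lanes
        self.total = 0

    def add_stmt_cost(self, kind, count=1):
        if kind == "reduc_bcast_adjust":
            # Hypothetical token: only targets whose reduction already
            # fills every lane credit back the broadcast they never emit.
            if self.reduction_fills_all_lanes:
                self.total -= count * self.costs["vec_stmt"]
            return
        self.total += count * self.costs[kind]


def cost_of_reduction(model):
    """Charge the generic sequence, then let the target adjust."""
    for kind in ("vec_to_scalar", "vec_stmt", "reduc_bcast_adjust"):
        model.add_stmt_cost(kind)
    return model.total


toy_costs = {"vec_to_scalar": 2, "vec_stmt": 1}
generic_target = cost_of_reduction(ToyCostModel(toy_costs, False))
all_lanes_target = cost_of_reduction(ToyCostModel(toy_costs, True))
```

Targets that ignore the token see no change in their totals, which is
the point: the adjustment stays invisible to other architectures.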

Copying Richi and ARM folks for opinions on the best design.  I want to
be able to model this stuff as accurately as possible, but obviously we
need to avoid unnecessary effects on other architectures.

In any case, I will drop my implementations of the deprecated optabs,
and I'll also try to look at Alan L's patch shortly.

Thanks,
Bill


> 
> I proposed a patch to migrate PPC off the old patterns, but have forgotten to 
> ping it recently - last at 
> https://gcc.gnu.org/ml/gcc-patches/2014-12/msg01024.html ... (ping?!)
> 
> --Alan
>