From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-patches-return-407183-listarch-gcc-patches=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 24810 invoked by alias); 11 Sep 2015 14:48:58 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Message-ID: <1441982930.2795.24.camel@gnopaine>
Subject: Re: [PATCH] vectorizing conditional expressions (PR tree-optimization/65947)
From: Bill Schmidt <wschmidt@linux.vnet.ibm.com>
To: Richard Sandiford <richard.sandiford@arm.com>
Cc: Ramana Radhakrishnan <ramana.gcc@googlemail.com>,        Alan Hayward <alan.hayward@arm.com>,        "gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>
Date: Fri, 11 Sep 2015 15:14:00 -0000
In-Reply-To: <87d1xp2fng.fsf@e105548-lin.cambridge.arm.com>
References: <D217578B.7FE4%alan.hayward@arm.com>	 <1441923254.4772.37.camel@oc8801110288.ibm.com>	 <D2184E16.8003%alan.hayward@arm.com>	<1441977591.2795.11.camel@gnopaine>	 <CAJA7tRbTHE7Wo2CBKPNC1FG5RigHT5MmBeB1HX_ur-5pKdMqWA@mail.gmail.com>	 <87d1xp2fng.fsf@e105548-lin.cambridge.arm.com>
Content-Type: text/plain; charset="UTF-8"
Mime-Version: 1.0
Content-Transfer-Encoding: 7bit
X-TM-AS-MML: disable
X-Content-Scanned: Fidelis XPS MAILER
x-cbid: 15091114-0013-0000-0000-000018236135
X-IsSubscribed: yes
X-SW-Source: 2015-09/txt/msg00808.txt.bz2

On Fri, 2015-09-11 at 15:29 +0100, Richard Sandiford wrote:
> Ramana Radhakrishnan <ramana.gcc@googlemail.com> writes:
> > On Fri, Sep 11, 2015 at 2:19 PM, Bill Schmidt
> > <wschmidt@linux.vnet.ibm.com> wrote:
> >> Hi Alan,
> >>
> >> I probably wasn't clear enough.  The implementation in the vectorizer is
> >> fine and I'm not asking that to change per target.  What I'm objecting
> >> to is the equivalence between a REDUC_MAX_EXPR and a cost associated
> >> with vec_to_scalar.  This assumes that the back end will implement a
> >> REDUC_MAX_EXPR in a specific way that at least some back ends cannot.
> >> But those back ends should be free to model the cost of the
> >> REDUC_MAX_EXPR appropriately.  Therefore I am asking for a new
> >> vect_cost_for_stmt type to represent the cost of a REDUC_MAX_EXPR.  For
> >> ARM, this cost will be the same as a vec_to_scalar.  For others, it may
> >> not be; for powerpc, it certainly will not be.
> >>
> >> We can produce a perfectly fine sequence for a REDUC_MAX_EXPR during RTL
> >> expansion, and therefore it is not correct for us to explode this in
> >> tree-vect-generic.  This would expand the code size without providing
> >> any significant optimization opportunity, and could reduce the ability
> >> to, for instance, common REDUC_MAX_EXPRs.  It would also slow down the
> >> gimple vectorizers.
> >>
> >> I apologize if my loose use of language confused the issue.  It isn't
> >> the whole COND_REDUCTION I'm concerned with, but the REDUC_MAX_EXPRs
> >> that are used by it.
> >>
> >> (The costs in powerpc won't be enormous, but they are definitely
> >> mode-dependent in a way that vec_to_scalar is not.  We'll need 2*log(n)
> >> instructions, where n is the number of elements in the mode being
> >> vectorized.)
> >
> > IIUC, on AArch64 a reduc_max_expr matches with a single reduction
> > operation but on AArch32 Neon a reduc_smax gets implemented as a
> > sequence of vpmax instructions which sounds similar to the PowerPC
> > example as well. Thus mapping a reduc_smax expression to the cost of a
> > vec_to_scalar is probably not right in this particular situation.
> 
> But AIUI vec_to_scalar exists to represent reduction operations.
> (I see it was also used for strided stores.)  So for better or worse,
> I think the interface that Alan's patch uses is the defined interface
> for measuring the cost of a reduction.
>
> If a backend implemented reduc_umax_scal_optab in current sources,
> without Alan's patch, then that optab would be used for a "natural"
> unsigned max reduction (i.e. a reduction of a MAX_EXPR with unsigned
> inputs).  vec_to_scalar would be used to weigh the cost of the epilogue
> reduction statement in that case.
> 
> So if defining a new Power pattern might cause Alan's patch to trigger
> in cases where the transformation is actually too expensive, I would
> expect the same to be true for a natural umax without Alan's patch.
> The two cases ought to underestimate the true cost by the same degree.
> 
> In other words, whether the cost interface is flexible enough is
> definitely interesting but seems orthogonal to this patch.

That's a reasonable argument, but is this not a good opportunity to fix
an incorrect assumption in the vectorizer cost model?  I would prefer
for this issue not to get lost on a technicality.

The vectorizer cost model has many small flaws, and we all need to be
mindful of trying to improve it at every opportunity, rather than
allowing it to continue to degrade.  We just had a big discussion about
improving cost models at the last Cauldron, and my request is consistent
with that direction.

The assumption that all reductions perform equivalently is unlikely to
hold on many platforms.  On PowerPC, for example, a PLUS reduction has a
very different cost from a MAX reduction.  If the model isn't
fine-grained enough, let's please be aggressive about fixing it.  I'm
fine if it's a separate patch, but in my mind this shouldn't be allowed
to languish.
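
To make the request concrete, here is a minimal sketch in C of what a
separate cost kind could look like.  It is illustrative only, not a
proposed patch: every name with a sketch_ prefix is hypothetical, the
enumerator sketch_vec_reduc_max does not exist in vect_cost_for_stmt
today, the unit cost of 1 is a placeholder, and I am assuming the log
in the 2*log(n) figure quoted above is base 2.

```c
/* Hypothetical, trimmed-down stand-in for GCC's vect_cost_for_stmt.
   sketch_vec_reduc_max is NOT an existing enumerator; it only
   illustrates the kind of addition being requested.  */
enum sketch_vect_cost
{
  sketch_vec_to_scalar,
  sketch_vec_reduc_max
};

/* Integer log2 of a power-of-two element count.  */
int
sketch_log2 (unsigned int n)
{
  int r = 0;
  while (n > 1)
    {
      n >>= 1;
      r++;
    }
  return r;
}

/* A target with a single-instruction max reduction: the new kind
   costs the same as vec_to_scalar (placeholder cost of 1).  */
int
sketch_cost_single_insn (enum sketch_vect_cost kind, unsigned int nelts)
{
  (void) kind;
  (void) nelts;
  return 1;
}

/* A target that needs a sequence of 2*log2(n) instructions for a max
   reduction, where n is the number of elements in the vector mode.
   The cost of vec_to_scalar itself stays mode-independent.  */
int
sketch_cost_log_sequence (enum sketch_vect_cost kind, unsigned int nelts)
{
  if (kind == sketch_vec_reduc_max)
    return 2 * sketch_log2 (nelts);
  return 1;
}
```

With these placeholder numbers, a 4-element mode gives a max-reduction
cost of 4 on the second target and a 16-element mode gives 8, while the
first target reports 1 in both cases.  That mode dependence is what a
single vec_to_scalar cost cannot express.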

Thanks,
Bill

> 
> Thanks,
> Richard
>