From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Fri, 17 Oct 2014 08:00:00 -0000
From: Richard Biener <rguenther@suse.de>
To: Sebastian Pop <sebpop@gmail.com>
cc: gcc-patches@gcc.gnu.org
Subject: Re: [PATCH][0/n] Merge from match-and-simplify
In-Reply-To: <20141016203852.GB29134@f1.c.bardezibar.internal>
Message-ID: <alpine.LSU.2.11.1410170951450.9891@zhemvz.fhfr.qr>
References: <alpine.LSU.2.11.1410151450430.20733@zhemvz.fhfr.qr> <20141016203852.GB29134@f1.c.bardezibar.internal>
User-Agent: Alpine 2.11 (LSU 23 2013-08-11)
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
X-SW-Source: 2014-10/txt/msg01665.txt.bz2

On Thu, 16 Oct 2014, Sebastian Pop wrote:

> Richard Biener wrote:
> > 
> > I have posted 5 patches as part of a larger series to merge
> > (parts) from the match-and-simplify branch.  While I think
> > there was overall consensus that the idea behind the project
> > is sound there are technical questions left for how the
> > thing should look in the end.  I've raised them in 3/n
> > which is the only patch of the series that contains any
> > patterns so far.
> > 
> > To re-iterate here (as I expect most people will only look
> > at [0/n] patches ;)), the question is whether we are fine
> > with making fold-const (thus fold_{unary,binary,ternary})
> > not handle some cases it handles currently.
> 
> I have tested on aarch64 all the code in the match-and-simplify against trunk as
> of the last merge at r216315:
> 
> 2014-10-16  Richard Biener  <rguenther@suse.de>
> 
>         Merge from trunk r216235 through r216315.
> 
> Overall, I see a lot more perf regressions (about 2/3 of the tests) than
> improvements (1/3 of the tests).  I will try to reduce tests.

Note that the branch goes much further in exercising the machinery
than I want to merge at this point (that applies mostly to passes
using the SSA propagator, such as CCP and VRP, and to passes
exercising value-numbering, i.e. FRE and PRE).

It may also simply show the effect of now folding all statements
from tree-ssa-forwprop.c.  I have yet to investigate the testsuite
fallout of [1/n] to [5/n]; test results have been very noisy lately
due to the C11 change and now ICF.

> For instance, saxpy regresses at -O3 on aarch64:
> 
> void saxpy(double* x, double* y, double* z) {
>     int i=0;
>     for (i = 0 ; i < ARRAY_SIZE; i++) {
>         z[i] = x[i] + scalar*y[i];
>     }
> }
> 
> $ diff -u base.s mas.s
> --- base.s      2014-10-16 15:30:15.351430000 -0500
> +++ mas.s       2014-10-16 15:30:16.183035000 -0500
> @@ -2,12 +2,14 @@
>         add     x1, x2, 800
>         ldr     q0, [x0, x2]
>         add     x3, x2, 1600
> +       cmp     x0, 784
>         ldr     q1, [x0, x1]
> +       add     x1, x0, 16
>         fmla    v0.2d, v1.2d, v2.2d
>         str     q0, [x0, x3]
> -       add     x0, x0, 16
> -       cmp     x0, 800
> +       mov     x0, x1
>         bne     .L140
>  .LBE179:
> -       subs    w4, w4, #1
> +       cmp     w4, 1
> +       sub     w4, w4, #1
>         bne     .L139

I don't understand AArch64 assembly very well, but the above looks like
an RTL and/or IVOPTs issue?
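
To make the difference concrete, here is a rough C rendering of the
two inner-loop shapes as I read the diff.  This is only a sketch:
N = 100 is an assumption inferred from the 800-byte bound and 8-byte
doubles, the function names are made up, and the two-element step
stands in for the 128-bit vector operations.  The base shape bumps
the counter and compares the bumped value; the other shape compares
the old value (784 = 800 - 16) and keeps both the old and the new
counter live, which costs the extra mov.

```c
#include <stddef.h>

#define N 100  /* assumption: 800-byte loop bound / 8-byte doubles */

/* base.s shape: bump the counter, then compare the bumped value. */
static void saxpy_base_shape(const double *x, const double *y,
                             double *z, double scalar)
{
    size_t i = 0;
    do {
        z[i]     = x[i]     + scalar * y[i];
        z[i + 1] = x[i + 1] + scalar * y[i + 1];
        i += 2;                      /* add x0, x0, 16 */
    } while (i != N);                /* cmp x0, 800 ; bne */
}

/* mas.s shape: compare the old counter, keep old and new values live. */
static void saxpy_mas_shape(const double *x, const double *y,
                            double *z, double scalar)
{
    size_t i = 0;
    int more;
    do {
        size_t next = i + 2;         /* add x1, x0, 16 */
        more = (i != N - 2);         /* cmp x0, 784 */
        z[i]     = x[i]     + scalar * y[i];
        z[i + 1] = x[i + 1] + scalar * y[i + 1];
        i = next;                    /* mov x0, x1 */
    } while (more);                  /* bne */
}

/* Returns 1 when both shapes produce identical output. */
int shapes_agree(void)
{
    double x[N], y[N], za[N], zb[N];
    size_t i;

    /* Integer-valued inputs keep every result exactly representable. */
    for (i = 0; i < N; i++) {
        x[i] = (double)i;
        y[i] = 2.0 * (double)i;
        za[i] = zb[i] = 0.0;
    }
    saxpy_base_shape(x, y, za, 3.0);
    saxpy_mas_shape(x, y, zb, 3.0);
    for (i = 0; i < N; i++)
        if (za[i] != zb[i])
            return 0;
    return 1;
}
```

Both shapes compute the same values; the second just needs one more
instruction per iteration.  The outer loop shows the same pattern,
with subs split into a separate cmp and sub.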

Thanks for doing performance measurements.

Richard.