From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Wed, 12 Aug 2015 08:32:00 -0000
From: Segher Boessenkool <segher@kernel.crashing.org>
To: Richard Henderson <rth@redhat.com>
Cc: gcc-patches@gcc.gnu.org, David Edelsohn <dje.gcc@gmail.com>, Marcus Shawcroft <marcus.shawcroft@arm.com>, Richard Earnshaw <richard.earnshaw@arm.com>
Subject: Re: [PATCH ppc64,aarch64,alpha 00/15] Improve backend constant generation
Message-ID: <20150812083148.GE4711@gate.crashing.org>
References: <1439341904-9345-1-git-send-email-rth@redhat.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <1439341904-9345-1-git-send-email-rth@redhat.com>

Hi!

This looks really nice.  I'll try it out soon :-)

Some comments now...


On Tue, Aug 11, 2015 at 06:11:29PM -0700, Richard Henderson wrote:
> However, the way that aarch64 and alpha have done it hasn't
> been ideal, in that there's a fairly costly search that must
> be done every time.  I've thought before about changing this
> so that we would be able to cache results, akin to how we do
> it in expmed.c for multiplication.

Is there something that makes the cache not get too big?  Do we
care, anyway?
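
One way a cache like this can stay bounded is to make it a fixed-size,
direct-mapped table in which a new result simply overwrites whatever
occupied its slot.  The following is only a sketch of that idea: every
name in it (const_cache, cached_const_insns, expensive_search, the table
size) is hypothetical, and the "search" is a trivial stand-in, not the
algorithm from the patch set or from expmed.c.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sketch: a fixed-size, direct-mapped cache of "how many
   insns does this constant need" results.  None of these names come
   from GCC or from the patch set.  */

#define CONST_CACHE_ENTRIES 256   /* fixed size, so memory use is bounded */

struct const_cache_entry
{
  uint64_t value;   /* the constant that was analyzed */
  int n_insns;      /* cached result of the search */
  bool valid;       /* false until the slot is first filled */
};

static struct const_cache_entry const_cache[CONST_CACHE_ENTRIES];
static unsigned long search_calls;   /* counts real searches */

/* Stand-in for the costly search: one insn per nonzero 16-bit chunk,
   with a minimum of one insn.  */
static int
expensive_search (uint64_t value)
{
  int n = 0;
  search_calls++;
  for (int shift = 0; shift < 64; shift += 16)
    if ((value >> shift) & 0xffff)
      n++;
  return n ? n : 1;
}

/* Map a constant to a slot with a multiplicative hash.  */
static unsigned
const_cache_slot (uint64_t value)
{
  uint64_t h = value * UINT64_C (0x9e3779b97f4a7c15);
  return (unsigned) (h >> 56) % CONST_CACHE_ENTRIES;
}

/* Return the insn count for VALUE, searching only on a cache miss.  */
int
cached_const_insns (uint64_t value)
{
  struct const_cache_entry *e = &const_cache[const_cache_slot (value)];
  if (!e->valid || e->value != value)
    {
      /* Empty slot, or it holds a different constant: overwrite it.  */
      e->value = value;
      e->n_insns = expensive_search (value);
      e->valid = true;
    }
  return e->n_insns;
}
```

With this shape the table never grows; the price is that two constants
hashing to the same slot evict each other and get searched again.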

> Some notes about ppc64 in particular:
> 
>   * Constants aren't split until quite late, preventing all hope of
>     CSE'ing portions of the generated code.  My gut feeling is that
>     this is in general a mistake, but...

Constant arguments to IOR/XOR/AND that can be done with two machine
insns are split at expand.  Then combine comes along and just loves
to recombine them, but then they are split again at split1 (before
RA).

For AND this was optimal in my experiments; for IOR/XOR it has been
this way since the dawn of time.

Simple SETs aren't split at expand; maybe they should be.  They are
split at split1, though.

>     I did attempt to fix it, and got nothing for my troubles except
>     poorer code generation for AND/IOR/XOR with non-trivial constants.

Could you give an example of code that isn't split early enough?

>     I'm somewhat surprised that the operands to the logicals aren't
>     visible at rtl generation time, given all the work done in gimple.

So am I, because that is not what I'm seeing.  E.g.

int f(int x) { return x | 0x12345678; }

is expanded as two IORs already.  There must be something in your
testcases that prevents this?
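
To spell out what "two IORs" means for that constant: the 32-bit value
is cut into its upper and lower 16-bit halves, and each half is ORed in
separately (on PowerPC presumably an oris followed by an ori, though the
insns are not named above).  A small sketch, with made-up helper names:

```c
#include <stdint.h>

/* Upper 16 bits of C, left in place (the part an ORIS-style insn
   would supply).  Hypothetical helper, for illustration only.  */
static uint32_t
ior_high_part (uint32_t c)
{
  return c & 0xffff0000u;
}

/* Lower 16 bits of C (the part an ORI-style insn would supply).  */
static uint32_t
ior_low_part (uint32_t c)
{
  return c & 0x0000ffffu;
}

/* Compute x | c as two IORs, each with a 16-bit immediate.  */
uint32_t
ior_two_steps (uint32_t x, uint32_t c)
{
  uint32_t t = x | ior_high_part (c);
  return t | ior_low_part (c);
}
```

For 0x12345678 the two pieces are 0x12340000 and 0x5678, and ORing both
in gives the same result as a single OR with the full constant.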

>     And failing that, combine has enough REG_EQUAL notes that it ought
>     to be able to put things back together and see the simpler pattern.
> 
>     Perhaps there's some other predication or costing error that's
>     getting in the way, and it simply wasn't obvious to me.   In any
>     case, nothing in this patch set addresses this at all.

The instruction (set (reg) (const_int 0x12345678)) is costed as 4
(i.e. one insn), although loading that constant takes two insns.
That cannot be good.  This is alternative #5 in
*movsi_internal1_single (there are many more variants of that
pattern).
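
For reference, the "4" is one insn on the COSTS_N_INSNS scale, where N
insns cost N * 4.  Below is a rough sketch of what a cost that tracks
the insn count could look like for 32-bit constants.  The helper names
are hypothetical, and the rules are the simple ones I'd assume for
PowerPC: one insn for a sign-extended 16-bit value (li), one for a value
whose low 16 bits are zero (lis), two otherwise (lis + ori).

```c
#include <stdint.h>

/* Same scaling as GCC's COSTS_N_INSNS: N insns cost N * 4.  Redefined
   under a different name so that this sketch stands alone.  */
#define SKETCH_COSTS_N_INSNS(n) ((n) * 4)

/* Hypothetical helper: insns needed to load the 32-bit constant C
   into a register under the assumptions stated above.  */
int
si_const_insns (int32_t c)
{
  if (c >= -0x8000 && c <= 0x7fff)
    return 1;                       /* fits a signed 16-bit immediate */
  if (((uint32_t) c & 0xffff) == 0)
    return 1;                       /* only the upper half is set */
  return 2;                         /* needs both halves */
}

/* Cost of the load on the COSTS_N_INSNS scale.  */
int
si_const_cost (int32_t c)
{
  return SKETCH_COSTS_N_INSNS (si_const_insns (c));
}
```

Under these assumptions 0x12345678 would come out at 8 rather than 4.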

>   * I go on to add 4 new methods of generating a constant, each of
>     which typically saves 2 insns over the current algorithm.  There
>     are a couple more that might be useful but...

New methods look to be really simple to add with your framework,
very nice :-)

>   * Constants are split *really* late.  In particular, after reload.

Yeah, that is bad.  But I'm still not seeing it.  Hrm, maybe it
happens only for DImode ones?

>     It would be awesome if we could at least have them all split before
>     register allocation

And before sched1, yeah.

>     so that we arrange to use ADDI and ADDIS when
>     that could save a few instructions.  But that does of course mean
>     avoiding r0 for the input.

That is no problem at all before RA.
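
(The r0 restriction exists because ADDI and ADDIS treat a base register
field of 0 as the literal value zero instead of the contents of r0.)
As an illustration of the ADDIS + ADDI idea, here is a sketch of how a
32-bit addend can be cut into two signed 16-bit pieces.  The names are
hypothetical and the arithmetic is modulo 2^32 only; this is not code
from the patch set.

```c
#include <stdint.h>

/* The two signed 16-bit immediates for an ADDIS + ADDI pair.  */
struct addend_parts
{
  int32_t high;   /* ADDIS immediate; it gets shifted left by 16 */
  int32_t low;    /* ADDI immediate */
};

/* Sign-extend the low 16 bits of V.  */
static int32_t
sign_extend_16 (uint32_t v)
{
  return (int32_t) ((v & 0xffff) ^ 0x8000) - 0x8000;
}

/* Split C so that (high << 16) + low == C modulo 2^32.  */
struct addend_parts
split_addend (int32_t c)
{
  uint32_t u = (uint32_t) c;
  struct addend_parts p;
  p.low = sign_extend_16 (u);
  /* Take the low part out first, so that a negative low part is
     compensated in the high part.  Unsigned arithmetic avoids signed
     overflow.  */
  p.high = sign_extend_16 ((u - (uint32_t) p.low) >> 16);
  return p;
}

/* Add the split constant to X in two steps.  */
uint32_t
add_in_two_steps (uint32_t x, struct addend_parts p)
{
  uint32_t t = x + ((uint32_t) p.high << 16);   /* ADDIS-style step */
  return t + (uint32_t) p.low;                  /* ADDI-style step */
}
```

Because the low piece is sign-extended, the high piece is not always
the plain upper half: for 0x12348765 the pieces are 0x1235 and -0x789b.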

>     Again, nothing here attempts to change
>     when constants are split.
> 
>   * This is the only platform for which I bothered collecting any sort
>     of performance data:
> 
>     As best I can tell, there is a 9% improvement in bootstrap speed
>     for ppc64.  That is, 10 minutes off the original 109 minute build.

That is, wow.  Wow :-)

Have you looked at generated code quality?


Segher