From: Segher Boessenkool
To: Richard Henderson
Cc: gcc-patches@gcc.gnu.org, David Edelsohn, Marcus Shawcroft, Richard Earnshaw
Subject: Re: [PATCH ppc64,aarch64,alpha 00/15] Improve backend constant generation
Date: Wed, 12 Aug 2015 08:32:00 -0000
Message-ID: <20150812083148.GE4711@gate.crashing.org>
In-Reply-To: <1439341904-9345-1-git-send-email-rth@redhat.com>
References: <1439341904-9345-1-git-send-email-rth@redhat.com>

Hi!

This looks really nice.  I'll try it out soon :-)

Some comments now...


On Tue, Aug 11, 2015 at 06:11:29PM -0700, Richard Henderson wrote:
> However, the way that aarch64 and alpha have done it hasn't
> been ideal, in that there's a fairly costly search that must
> be done every time.  I've thought before about changing this
> so that we would be able to cache results, akin to how we do
> it in expmed.c for multiplication.

Is there something that makes the cache not get too big?  Do we
care, anyway?

> Some notes about ppc64 in particular:
>
>   * Constants aren't split until quite late, preventing all hope of
>     CSE'ing portions of the generated code.  My gut feeling is that
>     this is in general a mistake, but...

Constant arguments to IOR/XOR/AND that can be done with two machine
insns are split at expand.  Then combine comes along and just loves
to recombine them, but then they are split again at split1 (before RA).

For AND this was optimal in my experiments; for IOR/XOR it has been
this way since the dawn of time.

Simple SETs aren't split at expand, maybe they should be.  But they
are split at split1.

>     I did attempt to fix it, and got nothing for my troubles except
>     poorer code generation for AND/IOR/XOR with non-trivial constants.

Could you give an example of code that isn't split early enough?

>     I'm somewhat surprised that the operands to the logicals aren't
>     visible at rtl generation time, given all the work done in gimple.

So am I, because that is not what I'm seeing?  E.g.

  int f(int x) { return x | 0x12345678; }

is expanded as two IORs already.  There must be something in your
testcases that prevents this?

>     And failing that, combine has enough REG_EQUAL notes that it ought
>     to be able to put things back together and see the simpler pattern.
>
>     Perhaps there's some other predication or costing error that's
>     getting in the way, and it simply wasn't obvious to me.  In any
>     case, nothing in this patch set addresses this at all.

The instruction (set (reg) (const_int 0x12345678)) is costed as 4
(i.e. one insn).  That cannot be good.  This is alternative #5 in
*movsi_internal1_single (there are many more variants of that
pattern).

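To make the IOR case above concrete, here is a minimal sketch in C.
It assumes the two IORs correspond to the high and low 16-bit halves
of the constant, which are the immediates that oris and ori accept on
PowerPC.  The helper names are hypothetical and are not GCC functions;
the sketch only checks the arithmetic identity and says nothing about
what expand actually emits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers for illustration only; none of these exist in GCC. */

/* High 16 bits of a 32-bit constant, left in place.  Assumption: this is
   the part an ORIS-style shifted 16-bit immediate would cover.  */
uint32_t high_part(uint32_t c)
{
    return c & 0xffff0000u;
}

/* Low 16 bits of a 32-bit constant.  Assumption: this is the part an
   ORI-style unsigned 16-bit immediate would cover.  */
uint32_t low_part(uint32_t c)
{
    return c & 0x0000ffffu;
}

/* A single IOR with the full 32-bit constant.  */
uint32_t ior_full(uint32_t x, uint32_t c)
{
    return x | c;
}

/* The same operation as two IORs, one per 16-bit half.  */
uint32_t ior_split(uint32_t x, uint32_t c)
{
    uint32_t t = x | high_part(c);
    return t | low_part(c);
}

/* Abort if the split form ever disagrees with the single IOR.  */
void check_split(uint32_t x, uint32_t c)
{
    assert(ior_full(x, c) == ior_split(x, c));
}
```

Calling check_split(x, 0x12345678u) for any x should pass, since the
two halves are disjoint and together make up the whole constant.  The
same decomposition works for XOR; AND is different, which fits with
it being handled separately above.
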
>   * I go on to add 4 new methods of generating a constant, each of
>     which typically saves 2 insns over the current algorithm.  There
>     are a couple more that might be useful but...

New methods look to be really simple to add with your framework,
very nice :-)

>   * Constants are split *really* late.  In particular, after reload.

Yeah that is bad.  But I'm still not seeing it.  Hrm, maybe only
DImode ones?

>     It would be awesome if we could at least have them all split before
>     register allocation

And before sched1, yeah.

>     so that we arrange to use ADDI and ADDIS when
>     that could save a few instructions.  But that does of course mean
>     avoiding r0 for the input.

That is no problem at all before RA.

>     Again, nothing here attempts to change
>     when constants are split.
>
>   * This is the only platform for which I bothered collecting any sort
>     of performance data:
>
>     As best I can tell, there is a 9% improvement in bootstrap speed
>     for ppc64.  That is, 10 minutes off the original 109 minute build.

That is, wow.  Wow :-)

Have you looked at generated code quality?


Segher