From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 51186 invoked by alias); 2 Feb 2016 10:03:00 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 51167 invoked by uid 89); 2 Feb 2016 10:02:59 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-0.7 required=5.0 tests=AWL,BAYES_00,RP_MATCHES_RCVD,SPF_PASS autolearn=ham version=3.3.2 spammy=SPEC2006, prefers, general_regs, GENERAL_REGS X-HELO: cam-smtp0.cambridge.arm.com Received: from fw-tnat.cambridge.arm.com (HELO cam-smtp0.cambridge.arm.com) (217.140.96.140) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with (AES256-SHA encrypted) ESMTPS; Tue, 02 Feb 2016 10:02:55 +0000 Received: from arm.com (e107456-lin.cambridge.arm.com [10.2.206.78]) by cam-smtp0.cambridge.arm.com (8.13.8/8.13.8) with ESMTP id u12A2qlT021503; Tue, 2 Feb 2016 10:02:52 GMT Date: Tue, 02 Feb 2016 10:03:00 -0000 From: James Greenhalgh To: Wilco Dijkstra Cc: "gcc-patches@gcc.gnu.org" , nd Subject: Re: [PATCH][AArch64] Add TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CLASS Message-ID: <20160202100251.GA4912@arm.com> References: <20151216095418.GA39374@arm.com> <20151216142708.GA10510@arm.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.5.21 (2010-09-15) X-IsSubscribed: yes X-SW-Source: 2016-02/txt/msg00095.txt.bz2 On Tue, Jan 26, 2016 at 05:39:24PM +0000, Wilco Dijkstra wrote: > ping (note the regressions discussed below are addressed by https://gcc.gnu.org/ml/gcc-patches/2016-01/msg01761.html) OK, but please be extra vigilant for any fallout on AArch64 after this and the follow-up linked above is applied. Thanks, James > James Greenhalgh wrote: > > On Wed, Dec 16, 2015 at 01:05:21PM +0000, Wilco Dijkstra wrote: > > > James Greenhalgh wrote: > > > > On Tue, Dec 15, 2015 at 10:54:49AM +0000, Wilco Dijkstra wrote: > > > > > ping > > > > > > > > > > > -----Original Message----- > > > > > > From: Wilco Dijkstra [mailto:Wilco.Dijkstra@arm.com] > > > > > > Sent: 06 November 2015 20:06 > > > > > > To: 'gcc-patches@gcc.gnu.org' > > > > > > Subject: [PATCH][AArch64] Add TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CLASS > > > > > > > > > > > > This patch adds support for the TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CLASS > > > > > > hook. When the cost of GENERAL_REGS and FP_REGS is identical, the register > > > > > > allocator always uses ALL_REGS even when it has a much higher cost. The > > > > > > hook changes the class to either FP_REGS or GENERAL_REGS depending on the > > > > > > mode of the register. This results in better register allocation overall, > > > > > > fewer spills and reduced codesize - particularly in SPEC2006 gamess. > > > > > > > > > > > > GCC regression passes with several minor fixes. > > > > > > > > > > > > OK for commit? > > > > > > > > > > > > ChangeLog: > > > > > > 2015-11-06 Wilco Dijkstra > > > > > > > > > > > > * gcc/config/aarch64/aarch64.c > > > > > > (TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CLASS): New define. > > > > > > (aarch64_ira_change_pseudo_allocno_class): New function. > > > > > > * gcc/testsuite/gcc.target/aarch64/cvtf_1.c: Build with -O2. > > > > > > * gcc/testsuite/gcc.target/aarch64/scalar_shift_1.c > > > > > > (test_corners_sisd_di): Improve force to SIMD register. > > > > > > (test_corners_sisd_si): Likewise. > > > > > > * gcc/testsuite/gcc.target/aarch64/vdup_lane_2.c: Build with -O2. > > > > > > * gcc/testsuite/gcc.target/aarch64/vect-ld1r-compile-fp.c: > > > > > > Remove scan-assembler check for ldr. > > > > > > > > Drop the gcc/ from the ChangeLog. > > > > > > > > > > -- > > > > > > gcc/config/aarch64/aarch64.c | 22 ++++++++++++++++++++++ > > > > > > gcc/testsuite/gcc.target/aarch64/cvtf_1.c | 2 +- > > > > > > gcc/testsuite/gcc.target/aarch64/scalar_shift_1.c | 4 ++-- > > > > > > gcc/testsuite/gcc.target/aarch64/vdup_lane_2.c | 2 +- > > > > > > .../gcc.target/aarch64/vect-ld1r-compile-fp.c | 1 - > > > > > > > > These testsuite changes concern me a bit, and you don't mention them beyond > > > > saying they are minor fixes... > > > > > > Well any changes to register allocator preferencing would cause fallout in > > > tests that are assuming which register is allocated, especially if they use > > > nasty inline assembler hacks to do so... > > > > Sure, but the testcases here each operate on data that should live in > > FP_REGS given the initial conditions that the nasty hacks try to mimic - > > that's what makes the regressions notable. > > > > > > > > > > > #define FCVTDEF(ftype,itype) \ > > > > > > void \ > > > > > > diff --git a/gcc/testsuite/gcc.target/aarch64/scalar_shift_1.c b/gcc/testsuite/gcc.target/aarch64/scalar_shift_1.c > > > > > > index 363f554..8465c89 100644 > > > > > > --- a/gcc/testsuite/gcc.target/aarch64/scalar_shift_1.c > > > > > > +++ b/gcc/testsuite/gcc.target/aarch64/scalar_shift_1.c > > > > > > @@ -186,9 +186,9 @@ test_corners_sisd_di (Int64x1 b) > > > > > > { > > > > > > force_simd_di (b); > > > > > > b = b >> 63; > > > > > > + force_simd_di (b); > > > > > > b = b >> 0; > > > > > > b += b >> 65; /* { dg-warning "right shift count >= width of type" } */ > > > > > > - force_simd_di (b); > > > > > > > > This one I don't understand, but seems to say that we've decided to move > > > > b out of FP_REGS after getting it in there for b = b << 63; ? So this is > > > > another register allocator regression? > > > > > > No, basically the register allocator is now making better decisions as to > > > where to allocate integer variables. It will only allocate them to FP > > > registers if they are primarily used by other FP operations. The > > > force_simd_di inline assembler tries to mimic FP uses, and if there are > > > enough of them at the right places then everything works as expected. If > > > however you do 3 consecutive integer operations then the allocator will now > > > correctly prefer to allocate them to the integer registers (while previously > > > it wouldn't, which is inefficient). > > > > I'm not sure I understand this argument in the abstract (though I believe > > it for some of the supported cores for the AArch64 target). At an abstract > > level, given a set of operations which can execute in either FP_REGS or > > GENERAL_REGS and initial and post conditions that allocate all input and > > output registers from those operations to FP_REGS, I would expect those > > operations to take place using FP_REGS? Your patch seems to break this > > expectation? > > No my patch doesn't break that expectation. The goal is that if the cost of > allocating to either integer or FP registers is the same, we prefer the most > natural register file based on the type. We'll continue to allocate integer > operations to FP_REGS if that has the lowest cost. > > Like I mentioned in the explanation, the issue is that the register allocator simply > ignores the the much higher cost of ALL_REGS and uses it eventhough it results in > very suboptimal allocations and a large number of redundant int<->fp moves. > This patch fixes this by forcing the preference to FP_REGS or GENERAL_REGS if it > Is ALL_REGS. > > > > > > > diff --git a/gcc/testsuite/gcc.target/aarch64/vdup_lane_2.c b/gcc/testsuite/gcc.target/aarch64/vdup_lane_2.c > > > > > > index a49db3e..c5a9c52 100644 > > > > > > --- a/gcc/testsuite/gcc.target/aarch64/vdup_lane_2.c > > > > > > +++ b/gcc/testsuite/gcc.target/aarch64/vdup_lane_2.c > > > > > > @@ -1,6 +1,6 @@ > > > > > > /* Test vdup_lane intrinsics work correctly. */ > > > > > > /* { dg-do run } */ > > > > > > -/* { dg-options "-O1 --save-temps" } */ > > > > > > +/* { dg-options "-O2 --save-temps" } */ > > > > > > > > Another -O1 regression ? > > > > > > No, it's triggering a bug in the -O1 register preferencing that causes incorrect preferences to be > > > selected despite the costs being right. The cost calculation with -O1 for eg. > > > wrap_vdupb_lane_s8_0() in vdup_lane_2.c: > > > > > > Pass 0 for finding pseudo/allocno costs > > > > > > r79: preferred FP_REGS, alternative GENERAL_REGS, allocno GENERAL_REGS > > > a1 (r79,l0) best GENERAL_REGS, allocno GENERAL_REGS > > > r78: preferred GENERAL_REGS, alternative NO_REGS, allocno GENERAL_REGS > > > a0 (r78,l0) best GENERAL_REGS, allocno GENERAL_REGS > > > > > > a0(r78,l0) costs: CALLER_SAVE_REGS:5000,5000 GENERAL_REGS:5000,5000 FP_LO_REGS:5000,5000 FP_REGS:5000,5000 > > ALL_REGS:10000,10000 MEM:9000,9000 > > > a1(r79,l0) costs: CALLER_SAVE_REGS:5000,5000 GENERAL_REGS:5000,5000 FP_LO_REGS:0,0 FP_REGS:0,0 ALL_REGS:10000,10000 > > MEM:9000,9000 > > > > > > So it correctly prefers FP_REGS for r79 as it has the lowest cost, but then > > > forces the allocno and best register to GENERAL_REGS... We could work around > > > it by not having the "r" variant first in the aarch64_get_lane patterns and > > > further discouraging its use via "?r", but that's a different patch. > > > > Well, that patch (moving "r" alternative away from first) does seem to > > better fit with what we've done elsewhere in aarch64-simd.md (e.g. > > aarch64_combinez below). Does making this change obviate the need to > > change these testcases to -O1? If so, I'd rather break them with your patch > > and fix it in a follow-up than paper over the cracks. > > Yes, using "?r" works. I can easily add this to my combinez patch - the issue is that there are a > lot more patterns that have the same problem, so we also need a fix in the register allocator > (we need to do both as reload also has bugs where it completely ignores all the costs and > preferences, so the order really matters a lot...). > > So I looked a bit further, and the bug is that the preferencing also forces ALL_REGS if the > GENERAL_REGS and FP_REGS costs are not equal but both are lower than the memory cost > (again even if ALL_REGS cost is higher than the memory cost!). > > In that case TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CLASS will force the preference > irrespectively of the best preference. To fix this we need to extend it with the best register > class (and possibly alternate class) so we can avoid forcing the wrong preference if there already > is a good preference (ie. not ALL_REGS). I'll write a patch for that - it's trivial but presumably too > late for GCC6 as it affects a target callback... > > Wilco