From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 109655 invoked by alias); 26 Jan 2016 17:39:35 -0000 Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm Precedence: bulk List-Id: List-Archive: List-Post: List-Help: Sender: gcc-patches-owner@gcc.gnu.org Received: (qmail 109641 invoked by uid 89); 26 Jan 2016 17:39:34 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-1.8 required=5.0 tests=AWL,BAYES_00,SPF_PASS autolearn=ham version=3.3.2 spammy=prefers, 21PM, 21pm, sk:test_co X-HELO: eu-smtp-delivery-143.mimecast.com Received: from eu-smtp-delivery-143.mimecast.com (HELO eu-smtp-delivery-143.mimecast.com) (207.82.80.143) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Tue, 26 Jan 2016 17:39:31 +0000 Received: from emea01-db3-obe.outbound.protection.outlook.com (mail-db3lrp0077.outbound.protection.outlook.com [213.199.154.77]) (Using TLS) by eu-smtp-1.mimecast.com with ESMTP id uk-mta-20-rY7dL_StQC6NJf1ygfp7Wg-1; Tue, 26 Jan 2016 17:39:26 +0000 Received: from AM3PR08MB0088.eurprd08.prod.outlook.com (2a01:111:e400:8847::18) by AM3PR08MB0595.eurprd08.prod.outlook.com (2a01:111:e400:c408::17) with Microsoft SMTP Server (TLS) id 15.1.390.13; Tue, 26 Jan 2016 17:39:24 +0000 Received: from AM3PR08MB0088.eurprd08.prod.outlook.com ([fe80::9820:2c1d:325d:6c0f]) by AM3PR08MB0088.eurprd08.prod.outlook.com ([fe80::9820:2c1d:325d:6c0f%18]) with mapi id 15.01.0390.013; Tue, 26 Jan 2016 17:39:24 +0000 From: Wilco Dijkstra To: James Greenhalgh CC: "gcc-patches@gcc.gnu.org" , nd Subject: Re: [PATCH][AArch64] Add TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CLASS Date: Tue, 26 Jan 2016 17:39:00 -0000 Message-ID: References: <20151216095418.GA39374@arm.com> <20151216142708.GA10510@arm.com>, In-Reply-To: x-microsoft-exchange-diagnostics: 1;AM3PR08MB0595;5:UVWek+A5VVu7YpfrMmTxZsUJfNYzVQREYw12jbhKHnkCrwjaZtS0rEssolTWQDTmqG1nrq/CoOvbdl4TkHvXiHq64Bo5jLFN5eQKW31IBKLPovuLHFFkbpJ3/jPYzos2C3IUSQ5b1AhBy6f0lTTI+A==;24:EqMhYCHCePwku0TZwHgKoxgH3dZkVEwRR8gxgLN6L8WhVFoKHBB4ZqvkNRNrrF8j7qelIo2zZPyCY6Zo1SfIS9PsETrOR1+XiynT0G06Pt8=;20:CJ+P6wLvIvAkpFQUdWcNYMvUxLXwlSFF6yzzirQrNVErphc8DydezjDwGMGYn/PAs6qGkJQTHDPIZig9VkR3dKzpepzpb3nbG9pgjNA175ajR5z6blNDfM6mKvPTAybR+5fmRyd/aFfkFyLZpjwPDhQQwegKN1njQ1CN0zgqJhE= x-microsoft-antispam: UriScan:;BCL:0;PCL:0;RULEID:;SRVR:AM3PR08MB0595; x-ms-office365-filtering-correlation-id: 0799e28b-8563-414b-5698-08d32677a1c8 nodisclaimer: True x-microsoft-antispam-prvs: x-exchange-antispam-report-test: UriScan:(180628864354917); x-exchange-antispam-report-cfa-test: BCL:0;PCL:0;RULEID:(123027)(601004)(2401047)(8121501046)(520078)(5005006)(10201501046)(3002001);SRVR:AM3PR08MB0595;BCL:0;PCL:0;RULEID:;SRVR:AM3PR08MB0595; x-forefront-prvs: 08331F819E x-forefront-antispam-report: SFV:NSPM;SFS:(10009020)(6009001)(189002)(377424004)(24454002)(54534003)(13464003)(199003)(4326007)(2906002)(87936001)(19580405001)(86362001)(40100003)(92566002)(93886004)(3280700002)(5250100002)(5004730100002)(450100001)(102836003)(1096002)(110136002)(76576001)(1220700001)(11100500001)(2900100001)(74316001)(15975445007)(54356999)(5001960100002)(3846002)(2950100001)(81156007)(33656002)(189998001)(97736004)(6116002)(66066001)(19580395003)(5002640100001)(5008740100001)(50986999)(106356001)(105586002)(586003)(76176999)(101416001)(5003600100002);DIR:OUT;SFP:1101;SCL:1;SRVR:AM3PR08MB0595;H:AM3PR08MB0088.eurprd08.prod.outlook.com;FPR:;SPF:None;PTR:InfoNoRecords;MX:1;A:1;LANG:en; spamdiagnosticoutput: 1:23 spamdiagnosticmetadata: NSPM MIME-Version: 1.0 X-OriginatorOrg: arm.com X-MS-Exchange-CrossTenant-originalarrivaltime: 26 Jan 2016 17:39:24.4877 (UTC) X-MS-Exchange-CrossTenant-fromentityheader: Hosted X-MS-Exchange-CrossTenant-id: f34e5979-57d9-4aaa-ad4d-b122a662184d X-MS-Exchange-Transport-CrossTenantHeadersStamped: AM3PR08MB0595 X-MC-Unique: rY7dL_StQC6NJf1ygfp7Wg-1 Content-Type: text/plain; charset=WINDOWS-1252 Content-Transfer-Encoding: quoted-printable X-SW-Source: 2016-01/txt/msg02016.txt.bz2 ping (note the regressions discussed below are addressed by https://gcc.gnu= .org/ml/gcc-patches/2016-01/msg01761.html) ________________________________________ From: Wilco Dijkstra Sent: 17 December 2015 13:37 To: James Greenhalgh Cc: gcc-patches@gcc.gnu.org; nd Subject: RE: [PATCH][AArch64] Add TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CLASS James Greenhalgh wrote: > On Wed, Dec 16, 2015 at 01:05:21PM +0000, Wilco Dijkstra wrote: > > James Greenhalgh wrote: > > > On Tue, Dec 15, 2015 at 10:54:49AM +0000, Wilco Dijkstra wrote: > > > > ping > > > > > > > > > -----Original Message----- > > > > > From: Wilco Dijkstra [mailto:Wilco.Dijkstra@arm.com] > > > > > Sent: 06 November 2015 20:06 > > > > > To: 'gcc-patches@gcc.gnu.org' > > > > > Subject: [PATCH][AArch64] Add TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CL= ASS > > > > > > > > > > This patch adds support for the TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_= CLASS > > > > > hook. When the cost of GENERAL_REGS and FP_REGS is identical, the= register > > > > > allocator always uses ALL_REGS even when it has a much higher cos= t. The > > > > > hook changes the class to either FP_REGS or GENERAL_REGS dependin= g on the > > > > > mode of the register. This results in better register allocation = overall, > > > > > fewer spills and reduced codesize - particularly in SPEC2006 game= ss. > > > > > > > > > > GCC regression passes with several minor fixes. > > > > > > > > > > OK for commit? > > > > > > > > > > ChangeLog: > > > > > 2015-11-06 Wilco Dijkstra > > > > > > > > > > * gcc/config/aarch64/aarch64.c > > > > > (TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CLASS): New define. > > > > > (aarch64_ira_change_pseudo_allocno_class): New function. > > > > > * gcc/testsuite/gcc.target/aarch64/cvtf_1.c: Build with -O2. > > > > > * gcc/testsuite/gcc.target/aarch64/scalar_shift_1.c > > > > > (test_corners_sisd_di): Improve force to SIMD register. > > > > > (test_corners_sisd_si): Likewise. > > > > > * gcc/testsuite/gcc.target/aarch64/vdup_lane_2.c: Build wit= h -O2. > > > > > * gcc/testsuite/gcc.target/aarch64/vect-ld1r-compile-fp.c: > > > > > Remove scan-assembler check for ldr. > > > > > > Drop the gcc/ from the ChangeLog. > > > > > > > > -- > > > > > gcc/config/aarch64/aarch64.c | 22 ++++++++= ++++++++++++++ > > > > > gcc/testsuite/gcc.target/aarch64/cvtf_1.c | 2 +- > > > > > gcc/testsuite/gcc.target/aarch64/scalar_shift_1.c | 4 ++-- > > > > > gcc/testsuite/gcc.target/aarch64/vdup_lane_2.c | 2 +- > > > > > .../gcc.target/aarch64/vect-ld1r-compile-fp.c | 1 - > > > > > > These testsuite changes concern me a bit, and you don't mention them = beyond > > > saying they are minor fixes... > > > > Well any changes to register allocator preferencing would cause fallout= in > > tests that are assuming which register is allocated, especially if they= use > > nasty inline assembler hacks to do so... > > Sure, but the testcases here each operate on data that should live in > FP_REGS given the initial conditions that the nasty hacks try to mimic - > that's what makes the regressions notable. > > > > > > > > #define FCVTDEF(ftype,itype) \ > > > > > void \ > > > > > diff --git a/gcc/testsuite/gcc.target/aarch64/scalar_shift_1.c b/= gcc/testsuite/gcc.target/aarch64/scalar_shift_1.c > > > > > index 363f554..8465c89 100644 > > > > > --- a/gcc/testsuite/gcc.target/aarch64/scalar_shift_1.c > > > > > +++ b/gcc/testsuite/gcc.target/aarch64/scalar_shift_1.c > > > > > @@ -186,9 +186,9 @@ test_corners_sisd_di (Int64x1 b) > > > > > { > > > > > force_simd_di (b); > > > > > b =3D b >> 63; > > > > > + force_simd_di (b); > > > > > b =3D b >> 0; > > > > > b +=3D b >> 65; /* { dg-warning "right shift count >=3D width = of type" } */ > > > > > - force_simd_di (b); > > > > > > This one I don't understand, but seems to say that we've decided to m= ove > > > b out of FP_REGS after getting it in there for b =3D b << 63; ? So th= is is > > > another register allocator regression? > > > > No, basically the register allocator is now making better decisions as = to > > where to allocate integer variables. It will only allocate them to FP > > registers if they are primarily used by other FP operations. The > > force_simd_di inline assembler tries to mimic FP uses, and if there are > > enough of them at the right places then everything works as expected. = If > > however you do 3 consecutive integer operations then the allocator will= now > > correctly prefer to allocate them to the integer registers (while previ= ously > > it wouldn't, which is inefficient). > > I'm not sure I understand this argument in the abstract (though I believe > it for some of the supported cores for the AArch64 target). At an abstract > level, given a set of operations which can execute in either FP_REGS or > GENERAL_REGS and initial and post conditions that allocate all input and > output registers from those operations to FP_REGS, I would expect those > operations to take place using FP_REGS? Your patch seems to break this > expectation? No my patch doesn't break that expectation. The goal is that if the cost of allocating to either integer or FP registers is the same, we prefer the most natural register file based on the type. We'll continue to allocate integer operations to FP_REGS if that has the lowest cost. Like I mentioned in the explanation, the issue is that the register allocat= or simply ignores the the much higher cost of ALL_REGS and uses it eventhough it resu= lts in very suboptimal allocations and a large number of redundant int<->fp moves. This patch fixes this by forcing the preference to FP_REGS or GENERAL_REGS = if it Is ALL_REGS. > > > > > diff --git a/gcc/testsuite/gcc.target/aarch64/vdup_lane_2.c b/gcc= /testsuite/gcc.target/aarch64/vdup_lane_2.c > > > > > index a49db3e..c5a9c52 100644 > > > > > --- a/gcc/testsuite/gcc.target/aarch64/vdup_lane_2.c > > > > > +++ b/gcc/testsuite/gcc.target/aarch64/vdup_lane_2.c > > > > > @@ -1,6 +1,6 @@ > > > > > /* Test vdup_lane intrinsics work correctly. */ > > > > > /* { dg-do run } */ > > > > > -/* { dg-options "-O1 --save-temps" } */ > > > > > +/* { dg-options "-O2 --save-temps" } */ > > > > > > Another -O1 regression ? > > > > No, it's triggering a bug in the -O1 register preferencing that causes = incorrect preferences to be > > selected despite the costs being right. The cost calculation with -O1 f= or eg. > > wrap_vdupb_lane_s8_0() in vdup_lane_2.c: > > > > Pass 0 for finding pseudo/allocno costs > > > > r79: preferred FP_REGS, alternative GENERAL_REGS, allocno GENERAL_R= EGS > > a1 (r79,l0) best GENERAL_REGS, allocno GENERAL_REGS > > r78: preferred GENERAL_REGS, alternative NO_REGS, allocno GENERAL_R= EGS > > a0 (r78,l0) best GENERAL_REGS, allocno GENERAL_REGS > > > > a0(r78,l0) costs: CALLER_SAVE_REGS:5000,5000 GENERAL_REGS:5000,5000 F= P_LO_REGS:5000,5000 FP_REGS:5000,5000 > ALL_REGS:10000,10000 MEM:9000,9000 > > a1(r79,l0) costs: CALLER_SAVE_REGS:5000,5000 GENERAL_REGS:5000,5000 F= P_LO_REGS:0,0 FP_REGS:0,0 ALL_REGS:10000,10000 > MEM:9000,9000 > > > > So it correctly prefers FP_REGS for r79 as it has the lowest cost, but = then > > forces the allocno and best register to GENERAL_REGS... We could work a= round > > it by not having the "r" variant first in the aarch64_get_lane patterns= and > > further discouraging its use via "?r", but that's a different patch. > > Well, that patch (moving "r" alternative away from first) does seem to > better fit with what we've done elsewhere in aarch64-simd.md (e.g. > aarch64_combinez below). Does making this change obviate the need to > change these testcases to -O1? If so, I'd rather break them with your pat= ch > and fix it in a follow-up than paper over the cracks. Yes, using "?r" works. I can easily add this to my combinez patch - the iss= ue is that there are a lot more patterns that have the same problem, so we also need a fix in the = register allocator (we need to do both as reload also has bugs where it completely ignores all= the costs and preferences, so the order really matters a lot...). So I looked a bit further, and the bug is that the preferencing also forces= ALL_REGS if the GENERAL_REGS and FP_REGS costs are not equal but both are lower than the me= mory cost (again even if ALL_REGS cost is higher than the memory cost!). In that case TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CLASS will force the preferen= ce irrespectively of the best preference. To fix this we need to extend it wit= h the best register class (and possibly alternate class) so we can avoid forcing the wrong pref= erence if there already is a good preference (ie. not ALL_REGS). I'll write a patch for that - it's= trivial but presumably too late for GCC6 as it affects a target callback... Wilco