From: Wilco Dijkstra <Wilco.Dijkstra@arm.com>
To: Bernd Edlinger, Ramana Radhakrishnan
CC: GCC Patches, Kyrill Tkachov, Richard Earnshaw, nd
Subject: Re: [PATCH, ARM] Further improve stack usage on sha512 (PR 77308)
Date: Wed, 30 Nov 2016 12:01:00 -0000
Bernd Edlinger wrote:
> On 11/29/16 16:06, Wilco Dijkstra wrote:
> > Bernd Edlinger wrote:
> >
> > -  "TARGET_32BIT && reload_completed
> > +  "TARGET_32BIT && ((!TARGET_NEON && !TARGET_IWMMXT) || reload_completed)
> >     && ! (TARGET_NEON && IS_VFP_REGNUM (REGNO (operands[0])))"
> >
> > This is equivalent to "&& (!TARGET_IWMMXT || reload_completed)" since we're
> > already excluding NEON.
>
> Aehm, no.  This would split the addi_neon insn before it is clear
> whether the reload pass will assign a VFP register.

Hmm, that's strange... This instruction shouldn't also be used to split some
random Neon pattern - arm_subdi3, for example, doesn't do the same. To
understand and reason about any of these complex patterns, they should all
work in the same way...

> But when I make *arm_cmpdi_insn split early, it ICEs:

(insn 4870 4869 1636 87 (set (scratch:SI)
        (minus:SI (minus:SI (subreg:SI (reg:DI 2261) 4)
                (subreg:SI (reg:DI 473 [ X$14 ]) 4))
            (ltu:SI (reg:CC_C 100 cc)
                (const_int 0 [0])))) "pr77308-2.c":140 -1
     (nil))

That's easy: we don't have a "sbcs , r1, r2" pattern. A quick workaround is
to create a temporary for operands[2] (if before reload) so it will match the
standard sbcs pattern, and then the split works fine.

> So it is certainly possible, but not really simple to improve the
> stack size even further.  But I would prefer to do that in a
> separate patch.

Yes, separate patches would be fine. However, there is a lot of scope to
improve this further.
For example, after your patch, shifts and logical operations are expanded at
expand time, add/sub are split in split1 after combine runs, and everything
else is split after reload. It doesn't make sense to split different
operations at different times - it means you still get the bad DImode subregs
and miss lots of optimization opportunities due to the mix of partly split
and partly not-yet-split operations.

> BTW: there are also negd2_compare, *negdi_extendsidi,
> *negdi_zero_extendsidi, *thumb2_negdi2.

I have a patch to merge thumb2_negdi2 into arm_negdi2. For the extends, if we
split them at expand time, then none of the combined alu+extend patterns will
be needed, and that will be a huge simplification.

> I think it would be a precondition to have test cases that exercise
> each of these patterns before we try to split these instructions.

Agreed.

Wilco