From: Bernd Edlinger
To: Wilco Dijkstra, Ramana Radhakrishnan
CC: GCC Patches, Kyrill Tkachov, Richard Earnshaw
Subject: Re: [PATCH, ARM] Further improve stack usage on sha512 (PR 77308)
Date: Tue, 29 Nov 2016 21:37:00 -0000

On 11/29/16 16:06, Wilco Dijkstra wrote:
> Bernd Edlinger wrote:
>
> -    "TARGET_32BIT && reload_completed
> +    "TARGET_32BIT && ((!TARGET_NEON && !TARGET_IWMMXT) || reload_completed)
>      && ! (TARGET_NEON && IS_VFP_REGNUM (REGNO (operands[0])))"
>
> This is equivalent to "&& (!TARGET_IWMMXT || reload_completed)" since we're
> already excluding NEON.
>

Aehm, no.  This would split the adddi3_neon insn before it is clear
whether the reload pass will assign a VFP register.

With this change, the stack usage with -mfpu=neon increases
from 2300 to around 2600 bytes.

> This patch expands ADD and SUB earlier, so shouldn't we do the same obvious
> change for the similar instructions CMP and NEG?
>

Good question.  I think the cmp and neg patterns are more complicated
and typically have a more complicated data flow than the other
patterns.

I tried to create a test case that expands the cmpdi and negdi
patterns, as follows:

--- pr77308-1.c 2016-11-25 17:53:20.379141465 +0100
+++ pr77308-2.c 2016-11-29 20:46:51.266948631 +0100
@@ -68,10 +68,10 @@
 #define B(x,j) (((SHA_LONG64)(*(((const unsigned char *)(&x))+j)))<<((7-j)*8))
 #define PULL64(x) (B(x,0)|B(x,1)|B(x,2)|B(x,3)|B(x,4)|B(x,5)|B(x,6)|B(x,7))
 #define ROTR(x,s) (((x)>>s) | (x)<<(64-s))
-#define Sigma0(x) ~(ROTR((x),28) ^ ROTR((x),34) ^ ROTR((x),39))
-#define Sigma1(x) ~(ROTR((x),14) ^ ROTR((x),18) ^ ROTR((x),41))
-#define sigma0(x) ~(ROTR((x),1) ^ ROTR((x),8) ^ ((x)>>7))
-#define sigma1(x) ~(ROTR((x),19) ^ ROTR((x),61) ^ ((x)>>6))
+#define Sigma0(x) (ROTR((x),28) ^ ROTR((x),34) ^ ROTR((x),39) == (x) ? -(x) : (x))
+#define Sigma1(x) (ROTR((x),14) ^ ROTR(-(x),18) ^ ROTR((x),41) < (x) ? -(x) : (x))
+#define sigma0(x) (ROTR((x),1) ^ ROTR((x),8) ^ ((x)>>7) <= (x) ? ~(x) : (x))
+#define sigma1(x) ((long long)(ROTR((x),19) ^ ROTR((x),61) ^ ((x)>>6)) < (long long)(x) ? -(x) : (x))
 #define Ch(x,y,z) (((x) & (y)) ^ ((~(x)) & (z)))
 #define Maj(x,y,z) (((x) & (y)) ^ ((x) & (z)) ^ ((y) & (z)))

This expands *arm_negdi2, *arm_cmpdi_unsigned and *arm_cmpdi_insn.
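To make the disagreement about the split condition concrete, here is a small truth-table model in plain C (not GCC code). The function names are hypothetical, and the boolean parameters stand in for TARGET_NEON, TARGET_IWMMXT, reload_completed and IS_VFP_REGNUM (REGNO (operands[0])). The model assumes, as argued above, that operand 0 cannot be known to live in a VFP register until reload has assigned hard registers.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the split condition from the patch:
     ((!TARGET_NEON && !TARGET_IWMMXT) || reload_completed)
     && ! (TARGET_NEON && IS_VFP_REGNUM (REGNO (operands[0])))  */
static bool split_cond_patch(bool neon, bool iwmmxt, bool reload_done,
                             bool op_in_vfp)
{
    /* Before reload operand 0 is a pseudo, so it cannot be a VFP reg. */
    assert(reload_done || !op_in_vfp);
    return ((!neon && !iwmmxt) || reload_done) && !(neon && op_in_vfp);
}

/* Hypothetical model of the suggested simplification:
     (!TARGET_IWMMXT || reload_completed)
     && ! (TARGET_NEON && IS_VFP_REGNUM (REGNO (operands[0])))  */
static bool split_cond_simplified(bool neon, bool iwmmxt, bool reload_done,
                                  bool op_in_vfp)
{
    assert(reload_done || !op_in_vfp);
    return (!iwmmxt || reload_done) && !(neon && op_in_vfp);
}
```

Enumerating the reachable inputs, the two conditions differ only for NEON enabled, IWMMXT disabled, before reload: split_cond_patch(true, false, false, false) is false while split_cond_simplified(true, false, false, false) is true. In that case the patch keeps the DImode insn intact until the allocator has chosen between core and VFP registers, whereas the simplified condition would split it into SImode operations up front.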
The stack usage is around 1900 bytes with the previous patch,
and 2300 bytes without.

I tried to split *arm_negdi2 and *arm_cmpdi_unsigned early, and it
does indeed give smaller stack sizes in the test case above (~400 bytes).

But when I make *arm_cmpdi_insn split early, it ICEs:

--- arm.md.orig 2016-11-27 09:22:41.794790123 +0100
+++ arm.md 2016-11-29 21:51:51.438163078 +0100
@@ -7432,7 +7432,7 @@
    (clobber (match_scratch:SI 2 "=r"))]
   "TARGET_32BIT"
   "#"   ; "cmp\\t%Q0, %Q1\;sbcs\\t%2, %R0, %R1"
-  "&& reload_completed"
+  "&& ((!TARGET_NEON && !TARGET_IWMMXT) || reload_completed)"
   [(set (reg:CC CC_REGNUM)
         (compare:CC (match_dup 0) (match_dup 1)))
    (parallel [(set (reg:CC CC_REGNUM)

With this on top of the latest patch, I got:

gcc -S -Os pr77308-2.c -fdump-rtl-all-verbose
pr77308-2.c: In function 'sha512_block_data_order':
pr77308-2.c:169:1: error: unrecognizable insn:
 }
 ^
(insn 4870 4869 1636 87 (set (scratch:SI)
        (minus:SI (minus:SI (subreg:SI (reg:DI 2261) 4)
                (subreg:SI (reg:DI 473 [ X$14 ]) 4))
            (ltu:SI (reg:CC_C 100 cc)
                (const_int 0 [0])))) "pr77308-2.c":140 -1
     (nil))
pr77308-2.c:169:1: internal compiler error: in extract_insn, at recog.c:2311
0xaf4cd8 _fatal_insn(char const*, rtx_def const*, char const*, int, char const*)
	../../gcc-trunk/gcc/rtl-error.c:108
0xaf4d09 _fatal_insn_not_found(rtx_def const*, char const*, int, char const*)
	../../gcc-trunk/gcc/rtl-error.c:116
0xac74ef extract_insn(rtx_insn*)
	../../gcc-trunk/gcc/recog.c:2311
0x122427a decompose_multiword_subregs
	../../gcc-trunk/gcc/lower-subreg.c:1467
0x122550d execute
	../../gcc-trunk/gcc/lower-subreg.c:1734

So it is certainly possible, but not really simple, to improve the
stack size even further.  But I would prefer to do that in a
separate patch.

BTW: there are also negdi2_compare, *negdi_extendsidi,
*negdi_zero_extendsidi and *thumb2_negdi2.

I think it would be a precondition to have test cases that exercise
each of these patterns before we try to split these instructions.


Bernd.
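The unrecognizable insn in the ICE above is the high-word half of the split compare from the *arm_cmpdi_insn template ("cmp %Q0, %Q1; sbcs %2, %R0, %R1"): a scratch register receives high(a) - high(b) - borrow, where the (ltu:SI (reg:CC_C 100 cc) (const_int 0)) term is the borrow left by the low-word cmp. The plain-C sketch below models that two-instruction sequence, assuming the resulting flags are consumed by a signed less-than (N != V) test. The function names are hypothetical and this is not GCC code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Interpret a 32-bit word as a two's-complement signed value
   without relying on implementation-defined conversions.  */
static int64_t as_signed32(uint32_t x)
{
    return (x & 0x80000000u) ? (int64_t)x - 4294967296LL : (int64_t)x;
}

/* Hypothetical model of "cmp lo_a, lo_b ; sbcs scratch, hi_a, hi_b"
   followed by a signed less-than (LT) test of the resulting flags.  */
static bool lt64_via_halves(int64_t a, int64_t b)
{
    uint32_t a_lo = (uint32_t)a;
    uint32_t b_lo = (uint32_t)b;
    uint32_t a_hi = (uint32_t)((uint64_t)a >> 32);
    uint32_t b_hi = (uint32_t)((uint64_t)b >> 32);

    /* cmp: the low-word subtraction borrows when a_lo < b_lo
       (on ARM the C flag is then clear).  */
    uint32_t borrow = (a_lo < b_lo) ? 1u : 0u;

    /* sbcs: high words minus the borrow; only the flags matter,
       so the result goes to a scratch register.  */
    uint32_t scratch = a_hi - b_hi - borrow;

    /* N is the sign bit of the 32-bit result; V is set when the
       exact signed difference does not fit in 32 bits.  */
    bool n = (scratch >> 31) != 0;
    int64_t exact = as_signed32(a_hi) - as_signed32(b_hi) - (int64_t)borrow;
    bool v = exact != as_signed32(scratch);

    /* The LT condition is N != V.  */
    return n != v;
}

/* Cross-check the model against the native 64-bit comparison.  */
static void check_sample(int64_t a, int64_t b)
{
    assert(lt64_via_halves(a, b) == (a < b));
}
```

For example, lt64_via_halves(INT64_MIN, INT64_MAX) comes out true only because V is set by the high-word subtraction, which illustrates the data dependency through the flags between the two halves of the split sequence.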