From: Bernd Edlinger <bernd.edlinger@hotmail.de>
To: Wilco Dijkstra <Wilco.Dijkstra@arm.com>,
Ramana Radhakrishnan <ramana.gcc@googlemail.com>
Cc: GCC Patches <gcc-patches@gcc.gnu.org>,
Kyrill Tkachov <kyrylo.tkachov@foss.arm.com>,
Richard Earnshaw <Richard.Earnshaw@arm.com>
Subject: Re: [PATCH, ARM] Further improve stack usage on sha512 (PR 77308)
Date: Tue, 29 Nov 2016 21:37:00 -0000
Message-ID: <AM4PR0701MB21628562505A31E0C0630660E48D0@AM4PR0701MB2162.eurprd07.prod.outlook.com>
In-Reply-To: <VI1PR0802MB2621FFBFA3252B40E5978C9F838D0@VI1PR0802MB2621.eurprd08.prod.outlook.com>
On 11/29/16 16:06, Wilco Dijkstra wrote:
> Bernd Edlinger wrote:
>
> - "TARGET_32BIT && reload_completed
> + "TARGET_32BIT && ((!TARGET_NEON && !TARGET_IWMMXT) || reload_completed)
> && ! (TARGET_NEON && IS_VFP_REGNUM (REGNO (operands[0])))"
>
> This is equivalent to "&& (!TARGET_IWMMXT || reload_completed)" since we're
> already excluding NEON.
>
Um, no. This would split the addi_neon insn before it is clear
whether the reload pass will assign a VFP register.
With your suggested change, the stack usage with -mfpu=neon increases
from 2300 to around 2600 bytes.
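To make the difference between the two split conditions concrete, here is a minimal sketch that models them as plain boolean functions. The function and parameter names are hypothetical, TARGET_32BIT is assumed true and left out, and `dest_is_vfp` stands in for `IS_VFP_REGNUM (REGNO (operands[0]))`, which I assume is false before reload because the destination is still a pseudo register.

```c
#include <stdbool.h>

/* Condition from the patch, with TARGET_32BIT assumed true:
   ((!TARGET_NEON && !TARGET_IWMMXT) || reload_completed)
   && !(TARGET_NEON && IS_VFP_REGNUM (REGNO (operands[0])))  */
bool split_cond_patch(bool neon, bool iwmmxt, bool reload_completed,
                      bool dest_is_vfp)
{
    return ((!neon && !iwmmxt) || reload_completed)
           && !(neon && dest_is_vfp);
}

/* Suggested simplification, same assumptions:
   (!TARGET_IWMMXT || reload_completed)
   && !(TARGET_NEON && IS_VFP_REGNUM (REGNO (operands[0])))  */
bool split_cond_suggested(bool neon, bool iwmmxt, bool reload_completed,
                          bool dest_is_vfp)
{
    return (!iwmmxt || reload_completed)
           && !(neon && dest_is_vfp);
}
```

Under these assumptions the two functions disagree only for NEON before reload: with `neon = true`, `reload_completed = false` and `dest_is_vfp = false`, the patch condition is false (no early split) while the suggested one is true (early split, before register assignment is known).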
> This patch expands ADD and SUB earlier, so shouldn't we do the same obvious
> change for the similar instructions CMP and NEG?
>
Good question. I think the cmp and neg patterns are more complicated
and typically have a more complex data flow than the other
patterns.
I tried to create a test case which expands the cmpdi and negdi patterns
as follows:
--- pr77308-1.c 2016-11-25 17:53:20.379141465 +0100
+++ pr77308-2.c 2016-11-29 20:46:51.266948631 +0100
@@ -68,10 +68,10 @@
#define B(x,j) (((SHA_LONG64)(*(((const unsigned char *)(&x))+j)))<<((7-j)*8))
#define PULL64(x) (B(x,0)|B(x,1)|B(x,2)|B(x,3)|B(x,4)|B(x,5)|B(x,6)|B(x,7))
#define ROTR(x,s) (((x)>>s) | (x)<<(64-s))
-#define Sigma0(x) ~(ROTR((x),28) ^ ROTR((x),34) ^ ROTR((x),39))
-#define Sigma1(x) ~(ROTR((x),14) ^ ROTR((x),18) ^ ROTR((x),41))
-#define sigma0(x) ~(ROTR((x),1) ^ ROTR((x),8) ^ ((x)>>7))
-#define sigma1(x) ~(ROTR((x),19) ^ ROTR((x),61) ^ ((x)>>6))
+#define Sigma0(x) (ROTR((x),28) ^ ROTR((x),34) ^ ROTR((x),39) == (x) ? -(x) : (x))
+#define Sigma1(x) (ROTR((x),14) ^ ROTR(-(x),18) ^ ROTR((x),41) < (x) ? -(x) : (x))
+#define sigma0(x) (ROTR((x),1) ^ ROTR((x),8) ^ ((x)>>7) <= (x) ? ~(x) : (x))
+#define sigma1(x) ((long long)(ROTR((x),19) ^ ROTR((x),61) ^ ((x)>>6)) < (long long)(x) ? -(x) : (x))
#define Ch(x,y,z) (((x) & (y)) ^ ((~(x)) & (z)))
#define Maj(x,y,z) (((x) & (y)) ^ ((x) & (z)) ^ ((y) & (z)))
This expands *arm_negdi2, *arm_cmpdi_unsigned, and *arm_cmpdi_insn.
The stack usage is around 1900 bytes with the previous patch,
and 2300 bytes without.
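For readers who want something smaller than the full sha512 test case, here is a hypothetical reduction (not taken from pr77308-2.c) that isolates the two kinds of source construct the modified macros rely on: a 64-bit comparison feeding a 64-bit negation. The function names are made up, and which pattern each function actually reaches on a given GCC revision is my assumption; it would need to be confirmed from the RTL dumps.

```c
#include <stdint.h>

/* Unsigned 64-bit compare followed by a 64-bit negate.
   Assumption: on 32-bit ARM this is the kind of code that can reach
   the *arm_cmpdi_unsigned and *arm_negdi2 patterns.  */
uint64_t neg_if_below(uint64_t x, uint64_t y)
{
    return x < y ? -x : x;
}

/* Signed 64-bit compare followed by a 64-bit negate.
   Assumption: this is the kind of code that can reach the
   *arm_cmpdi_insn pattern, as the (long long) casts in sigma1 do.  */
uint64_t neg_if_less_signed(uint64_t x, uint64_t y)
{
    return (int64_t)x < (int64_t)y ? -x : x;
}
```

Compiling such a file for a 32-bit ARM target with `-S -Os -fdump-rtl-all-verbose`, as in the command line further down, should show which patterns were matched. Adding `-fstack-usage` is one way to get per-function stack sizes; the mail does not say how its figures were measured.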
I tried to split *arm_negdi2 and *arm_cmpdi_unsigned early, and it
does indeed give smaller stack sizes in the test case above (~400 bytes).
But when I make *arm_cmpdi_insn split early, it ICEs:
--- arm.md.orig 2016-11-27 09:22:41.794790123 +0100
+++ arm.md 2016-11-29 21:51:51.438163078 +0100
@@ -7432,7 +7432,7 @@
(clobber (match_scratch:SI 2 "=r"))]
"TARGET_32BIT"
"#" ; "cmp\\t%Q0, %Q1\;sbcs\\t%2, %R0, %R1"
- "&& reload_completed"
+ "&& ((!TARGET_NEON && !TARGET_IWMMXT) || reload_completed)"
[(set (reg:CC CC_REGNUM)
(compare:CC (match_dup 0) (match_dup 1)))
(parallel [(set (reg:CC CC_REGNUM)
on top of the latest patch, I got:
gcc -S -Os pr77308-2.c -fdump-rtl-all-verbose
pr77308-2.c: In function 'sha512_block_data_order':
pr77308-2.c:169:1: error: unrecognizable insn:
}
^
(insn 4870 4869 1636 87 (set (scratch:SI)
(minus:SI (minus:SI (subreg:SI (reg:DI 2261) 4)
(subreg:SI (reg:DI 473 [ X$14 ]) 4))
(ltu:SI (reg:CC_C 100 cc)
(const_int 0 [0])))) "pr77308-2.c":140 -1
(nil))
pr77308-2.c:169:1: internal compiler error: in extract_insn, at recog.c:2311
0xaf4cd8 _fatal_insn(char const*, rtx_def const*, char const*, int, char const*)
../../gcc-trunk/gcc/rtl-error.c:108
0xaf4d09 _fatal_insn_not_found(rtx_def const*, char const*, int, char const*)
../../gcc-trunk/gcc/rtl-error.c:116
0xac74ef extract_insn(rtx_insn*)
../../gcc-trunk/gcc/recog.c:2311
0x122427a decompose_multiword_subregs
../../gcc-trunk/gcc/lower-subreg.c:1467
0x122550d execute
../../gcc-trunk/gcc/lower-subreg.c:1734
So it is certainly possible, but not really simple to improve the
stack size even further. But I would prefer to do that in a
separate patch.
BTW: there are also negdi2_compare, *negdi_extendsidi,
*negdi_zero_extendsidi, and *thumb2_negdi2.
I think we need test cases that exercise each of these patterns
before we try to split these instructions.
Bernd.