* [PATCH][RFA/RFC] Stack clash mitigation patch 05/08
@ 2017-07-11 21:21 Jeff Law
0 siblings, 0 replies; 6+ messages in thread
From: Jeff Law @ 2017-07-11 21:21 UTC (permalink / raw)
To: gcc-patches
[-- Attachment #1: Type: text/plain, Size: 668 bytes --]
The prior patch introduced -fstack-check=clash prologues for the x86.
And yet we still saw large allocations in our testing.
It turns out combine-stack-adjustments would take
allocate PROBE_INTERVAL
probe
allocate PROBE_INTERVAL
probe
allocate PROBE_INTERVAL
probe
allocate RESIDUAL
And turn that into
allocate (3 * PROBE_INTERVAL) + residual
probe
probe
probe
Adjusting the address of the probes appropriately. Ugh.
This patch introduces a new note that the backend can attach to a stack
adjustment which essentially tells c-s-a to not merge it into other
adjustments. THere's an x86 specific test to verify behavior.
Comments/Questions? Ok for the trunk?
[-- Attachment #2: P5 --]
[-- Type: text/plain, Size: 3337 bytes --]
* combine-stack-adj.c (combine_stack_adjustments_for_block): Do
nothing for stack adjustments with REG_STACK_CHECK.
* config/i386/i386.c (pro_epilogue_adjust_stack): Return insn.
(ix86_adjust_satck_and_probe_stack_clash): Add REG_STACK_NOTEs.
* reg-notes.def (STACK_CHECK): New note.
testsuite/
* gcc.target/i386/stack-check-11.c: New test.
commit f363b876ccbbc584db85510cd24b80349fcd8260
Author: Jeff Law <law@torsion.usersys.redhat.com>
Date: Wed Jun 28 12:36:49 2017 -0400
Don't combine adjustments for stck probing
diff --git a/gcc/combine-stack-adj.c b/gcc/combine-stack-adj.c
index 9ec14a3..82d6dba 100644
--- a/gcc/combine-stack-adj.c
+++ b/gcc/combine-stack-adj.c
@@ -508,6 +508,8 @@ combine_stack_adjustments_for_block (basic_block bb)
continue;
set = single_set_for_csa (insn);
+ if (set && find_reg_note (insn, REG_STACK_CHECK, NULL_RTX))
+ set = NULL_RTX;
if (set)
{
rtx dest = SET_DEST (set);
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 7098f74..a737300 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -13405,7 +13405,7 @@ ix86_add_queued_cfa_restore_notes (rtx insn)
zero if %r11 register is live and cannot be freely used and positive
otherwise. */
-static void
+static rtx
pro_epilogue_adjust_stack (rtx dest, rtx src, rtx offset,
int style, bool set_cfa)
{
@@ -13496,6 +13496,7 @@ pro_epilogue_adjust_stack (rtx dest, rtx src, rtx offset,
m->fs.sp_valid = valid;
m->fs.sp_realigned = realigned;
}
+ return insn;
}
/* Find an available register to be used as dynamic realign argument
@@ -13837,9 +13838,11 @@ ix86_adjust_stack_and_probe_stack_clash (const HOST_WIDE_INT size)
for (i = PROBE_INTERVAL; i <= size; i += PROBE_INTERVAL)
{
/* Allocate PROBE_INTERVAL bytes. */
- pro_epilogue_adjust_stack (stack_pointer_rtx, stack_pointer_rtx,
+ rtx insn
+ = pro_epilogue_adjust_stack (stack_pointer_rtx, stack_pointer_rtx,
GEN_INT (-PROBE_INTERVAL), -1,
m->fs.cfa_reg == stack_pointer_rtx);
+ add_reg_note (insn, REG_STACK_CHECK, const0_rtx);
/* And probe at *sp. */
emit_stack_probe (stack_pointer_rtx);
diff --git a/gcc/reg-notes.def b/gcc/reg-notes.def
index 8734d26..18cf7e3 100644
--- a/gcc/reg-notes.def
+++ b/gcc/reg-notes.def
@@ -223,6 +223,10 @@ REG_NOTE (ARGS_SIZE)
pseudo reg. */
REG_NOTE (RETURNED)
+/* Indicates the instruction is a stack check probe that should not
+ be combined with other stack adjustments. */
+REG_NOTE (STACK_CHECK)
+
/* Used to mark a call with the function decl called by the call.
The decl might not be available in the call due to splitting of the call
insn. This note is a SYMBOL_REF. */
diff --git a/gcc/testsuite/gcc.target/i386/stack-check-11.c b/gcc/testsuite/gcc.target/i386/stack-check-11.c
new file mode 100644
index 0000000..c17b8c6
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/stack-check-11.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fstack-check=clash" } */
+
+extern void arf (unsigned long int *, unsigned long int *);
+void
+frob ()
+{
+ unsigned long int num[859];
+ unsigned long int den[859];
+ arf (den, num);
+}
+
+/* { dg-final { scan-assembler-times "subq" 4 } } */
+/* { dg-final { scan-assembler-times "orq" 3 } } */
+
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH][RFA/RFC] Stack clash mitigation patch 05/08
2017-07-21 20:25 ` Segher Boessenkool
@ 2017-07-31 15:53 ` Jeff Law
0 siblings, 0 replies; 6+ messages in thread
From: Jeff Law @ 2017-07-31 15:53 UTC (permalink / raw)
To: Segher Boessenkool; +Cc: gcc-patches
On 07/21/2017 02:25 PM, Segher Boessenkool wrote:
> On Thu, Jul 20, 2017 at 08:20:52AM -0600, Jeff Law wrote:
>>> Can only combine-stack-adjustments do this? It seems like something
>>> many passes could do, and then your new note doesn't help.
>> SO far it's only been observed with c-s-a, but further auditing is
>> certainly desirable here, particularly with the upcoming changes to the
>> generic dynamic alloca handling.
>>
>> In the V2 patch only backends would emit unrolled inline alloca/probe
>> sequences like what you see above and only for prologues. Thus there
>> were a very limited number of passes to be concerned about.
>>
>> In the V3 patch we have unrolled inline probing for the dynamic space as
>> well, so this kind of sequence is exposed to everything after
>> gimple->rtl expansion.
>>
>> Unfortunately, the most thorough checker we have is x86 and on that
>> target, because of stack alignment issues, we'll never see a constant
>> size in the dynamic space and thus no unrolled inlined alloca/probe
>> sequences.
>>
>> In reality I suspect that with teh hard register references, most passes
>> are going to leave those insns alone, but some auditing is necessary.
>
> This is similar to what rs6000 uses stack_tie for. You want the
> prevent a store to the stack (the probe) from being moved after a
> later stack pointer update. By pretending (in the insn pattern)
> there is a store to stack with that stack pointer update, nothing
> can move stores after it.
FWIW, we found a case where the scheduler would muck things up.
Essentially it has code rewrites address computations and memory
references in the hopes of breaking dependency chains. In many ways
it's a lot like c-s-a.
Anyway, it broke the dependency chain and as a result we ended up with
unsafe sequences on aarch64. THe V3 patch posted last night addresses
that in two ways. First the scheduler knows about the magic note in the
same manner as c-s-a. And we emit scheduling barriers in the probing
code to prevent undesirable movement.
Jeff
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH][RFA/RFC] Stack clash mitigation patch 05/08
2017-07-20 14:20 ` Jeff Law
@ 2017-07-21 20:25 ` Segher Boessenkool
2017-07-31 15:53 ` Jeff Law
0 siblings, 1 reply; 6+ messages in thread
From: Segher Boessenkool @ 2017-07-21 20:25 UTC (permalink / raw)
To: Jeff Law; +Cc: gcc-patches
On Thu, Jul 20, 2017 at 08:20:52AM -0600, Jeff Law wrote:
> > Can only combine-stack-adjustments do this? It seems like something
> > many passes could do, and then your new note doesn't help.
> SO far it's only been observed with c-s-a, but further auditing is
> certainly desirable here, particularly with the upcoming changes to the
> generic dynamic alloca handling.
>
> In the V2 patch only backends would emit unrolled inline alloca/probe
> sequences like what you see above and only for prologues. Thus there
> were a very limited number of passes to be concerned about.
>
> In the V3 patch we have unrolled inline probing for the dynamic space as
> well, so this kind of sequence is exposed to everything after
> gimple->rtl expansion.
>
> Unfortunately, the most thorough checker we have is x86 and on that
> target, because of stack alignment issues, we'll never see a constant
> size in the dynamic space and thus no unrolled inlined alloca/probe
> sequences.
>
> In reality I suspect that with teh hard register references, most passes
> are going to leave those insns alone, but some auditing is necessary.
This is similar to what rs6000 uses stack_tie for. You want the
prevent a store to the stack (the probe) from being moved after a
later stack pointer update. By pretending (in the insn pattern)
there is a store to stack with that stack pointer update, nothing
can move stores after it.
Segher
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH][RFA/RFC] Stack clash mitigation patch 05/08
2017-07-20 13:14 ` Segher Boessenkool
@ 2017-07-20 14:20 ` Jeff Law
2017-07-21 20:25 ` Segher Boessenkool
0 siblings, 1 reply; 6+ messages in thread
From: Jeff Law @ 2017-07-20 14:20 UTC (permalink / raw)
To: Segher Boessenkool; +Cc: gcc-patches
On 07/20/2017 07:09 AM, Segher Boessenkool wrote:
> Hi Jeff,
>
> On Tue, Jul 18, 2017 at 11:17:48PM -0600, Jeff Law wrote:
>> It turns out combine-stack-adjustments would take
>>
>> allocate PROBE_INTERVAL
>> probe
>> allocate PROBE_INTERVAL
>> probe
>> allocate PROBE_INTERVAL
>> probe
>> allocate RESIDUAL
>>
>> And turn that into
>>
>> allocate (3 * PROBE_INTERVAL) + residual
>> probe
>> probe
>> probe
>>
>> Adjusting the address of the probes appropriately. Ugh.
>>
>> This patch introduces a new note that the backend can attach to a stack
>> adjustment which essentially tells c-s-a to not merge it into other
>> adjustments. THere's an x86 specific test to verify behavior.
>
> Can only combine-stack-adjustments do this? It seems like something
> many passes could do, and then your new note doesn't help.
SO far it's only been observed with c-s-a, but further auditing is
certainly desirable here, particularly with the upcoming changes to the
generic dynamic alloca handling.
In the V2 patch only backends would emit unrolled inline alloca/probe
sequences like what you see above and only for prologues. Thus there
were a very limited number of passes to be concerned about.
In the V3 patch we have unrolled inline probing for the dynamic space as
well, so this kind of sequence is exposed to everything after
gimple->rtl expansion.
Unfortunately, the most thorough checker we have is x86 and on that
target, because of stack alignment issues, we'll never see a constant
size in the dynamic space and thus no unrolled inlined alloca/probe
sequences.
In reality I suspect that with teh hard register references, most passes
are going to leave those insns alone, but some auditing is necessary.
Jeff
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH][RFA/RFC] Stack clash mitigation patch 05/08
2017-07-19 5:17 Jeff Law
@ 2017-07-20 13:14 ` Segher Boessenkool
2017-07-20 14:20 ` Jeff Law
0 siblings, 1 reply; 6+ messages in thread
From: Segher Boessenkool @ 2017-07-20 13:14 UTC (permalink / raw)
To: Jeff Law; +Cc: gcc-patches
Hi Jeff,
On Tue, Jul 18, 2017 at 11:17:48PM -0600, Jeff Law wrote:
> It turns out combine-stack-adjustments would take
>
> allocate PROBE_INTERVAL
> probe
> allocate PROBE_INTERVAL
> probe
> allocate PROBE_INTERVAL
> probe
> allocate RESIDUAL
>
> And turn that into
>
> allocate (3 * PROBE_INTERVAL) + residual
> probe
> probe
> probe
>
> Adjusting the address of the probes appropriately. Ugh.
>
> This patch introduces a new note that the backend can attach to a stack
> adjustment which essentially tells c-s-a to not merge it into other
> adjustments. THere's an x86 specific test to verify behavior.
Can only combine-stack-adjustments do this? It seems like something
many passes could do, and then your new note doesn't help.
Segher
^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH][RFA/RFC] Stack clash mitigation patch 05/08
@ 2017-07-19 5:17 Jeff Law
2017-07-20 13:14 ` Segher Boessenkool
0 siblings, 1 reply; 6+ messages in thread
From: Jeff Law @ 2017-07-19 5:17 UTC (permalink / raw)
To: gcc-patches
[-- Attachment #1: Type: text/plain, Size: 723 bytes --]
I don't think this has changed in any significant way since V1.
--
The prior patch introduced -fstack-clash-protection prologues for the
x86. And yet we still saw large allocations in our testing.
It turns out combine-stack-adjustments would take
allocate PROBE_INTERVAL
probe
allocate PROBE_INTERVAL
probe
allocate PROBE_INTERVAL
probe
allocate RESIDUAL
And turn that into
allocate (3 * PROBE_INTERVAL) + residual
probe
probe
probe
Adjusting the address of the probes appropriately. Ugh.
This patch introduces a new note that the backend can attach to a stack
adjustment which essentially tells c-s-a to not merge it into other
adjustments. THere's an x86 specific test to verify behavior.
OK for the trunk?
[-- Attachment #2: P5 --]
[-- Type: text/plain, Size: 3228 bytes --]
* combine-stack-adj.c (combine_stack_adjustments_for_block): Do
nothing for stack adjustments with REG_STACK_CHECK.
* config/i386/i386.c (pro_epilogue_adjust_stack): Return insn.
(ix86_adjust_satck_and_probe_stack_clash): Add REG_STACK_NOTEs.
* reg-notes.def (STACK_CHECK): New note.
testsuite/
* gcc.target/i386/stack-check-11.c: New test.
diff --git a/gcc/combine-stack-adj.c b/gcc/combine-stack-adj.c
index 9ec14a3..82d6dba 100644
--- a/gcc/combine-stack-adj.c
+++ b/gcc/combine-stack-adj.c
@@ -508,6 +508,8 @@ combine_stack_adjustments_for_block (basic_block bb)
continue;
set = single_set_for_csa (insn);
+ if (set && find_reg_note (insn, REG_STACK_CHECK, NULL_RTX))
+ set = NULL_RTX;
if (set)
{
rtx dest = SET_DEST (set);
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index fa10c28..d297e8a 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -13405,7 +13405,7 @@ ix86_add_queued_cfa_restore_notes (rtx insn)
zero if %r11 register is live and cannot be freely used and positive
otherwise. */
-static void
+static rtx
pro_epilogue_adjust_stack (rtx dest, rtx src, rtx offset,
int style, bool set_cfa)
{
@@ -13496,6 +13496,7 @@ pro_epilogue_adjust_stack (rtx dest, rtx src, rtx offset,
m->fs.sp_valid = valid;
m->fs.sp_realigned = realigned;
}
+ return insn;
}
/* Find an available register to be used as dynamic realign argument
@@ -13837,9 +13838,11 @@ ix86_adjust_stack_and_probe_stack_clash (const HOST_WIDE_INT size)
for (i = PROBE_INTERVAL; i <= size; i += PROBE_INTERVAL)
{
/* Allocate PROBE_INTERVAL bytes. */
- pro_epilogue_adjust_stack (stack_pointer_rtx, stack_pointer_rtx,
+ rtx insn
+ = pro_epilogue_adjust_stack (stack_pointer_rtx, stack_pointer_rtx,
GEN_INT (-PROBE_INTERVAL), -1,
m->fs.cfa_reg == stack_pointer_rtx);
+ add_reg_note (insn, REG_STACK_CHECK, const0_rtx);
/* And probe at *sp. */
emit_stack_probe (stack_pointer_rtx);
diff --git a/gcc/reg-notes.def b/gcc/reg-notes.def
index 8734d26..18cf7e3 100644
--- a/gcc/reg-notes.def
+++ b/gcc/reg-notes.def
@@ -223,6 +223,10 @@ REG_NOTE (ARGS_SIZE)
pseudo reg. */
REG_NOTE (RETURNED)
+/* Indicates the instruction is a stack check probe that should not
+ be combined with other stack adjustments. */
+REG_NOTE (STACK_CHECK)
+
/* Used to mark a call with the function decl called by the call.
The decl might not be available in the call due to splitting of the call
insn. This note is a SYMBOL_REF. */
diff --git a/gcc/testsuite/gcc.target/i386/stack-check-11.c b/gcc/testsuite/gcc.target/i386/stack-check-11.c
new file mode 100644
index 0000000..183103f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/stack-check-11.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fstack-clash-protection" } */
+/* { dg-require-effective-target supports_stack_clash_protection } */
+
+extern void arf (unsigned long int *, unsigned long int *);
+void
+frob ()
+{
+ unsigned long int num[859];
+ unsigned long int den[859];
+ arf (den, num);
+}
+
+/* { dg-final { scan-assembler-times "subq" 4 } } */
+/* { dg-final { scan-assembler-times "orq" 3 } } */
+
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2017-07-31 15:53 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-07-11 21:21 [PATCH][RFA/RFC] Stack clash mitigation patch 05/08 Jeff Law
2017-07-19 5:17 Jeff Law
2017-07-20 13:14 ` Segher Boessenkool
2017-07-20 14:20 ` Jeff Law
2017-07-21 20:25 ` Segher Boessenkool
2017-07-31 15:53 ` Jeff Law
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).