public inbox for gcc-patches@gcc.gnu.org
* Re: [PATCH, ARM] PR85434: Prevent spilling of stack protector guard's address on ARM
       [not found] <CAKnkMGvQoj2Wuz_r-PGX8aJkusA=hzVLW-AHaVjDK78ioHUMxQ@mail.gmail.com>
@ 2018-08-29  9:51 ` Thomas Preudhomme
  2018-08-29 10:07   ` Thomas Preudhomme
                     ` (2 more replies)
  0 siblings, 3 replies; 20+ messages in thread
From: Thomas Preudhomme @ 2018-08-29  9:51 UTC
  To: Jeff Law, kyrylo.tkachov, Ramana Radhakrishnan, Richard Earnshaw
  Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 5249 bytes --]

Resend hopefully without HTML this time.

On Wed, 29 Aug 2018 at 10:49, Thomas Preudhomme
<thomas.preudhomme@linaro.org> wrote:
>
> Hi,
>
> I've reworked the patch fixing PR85434 (spilling of stack protector guard's address on ARM) to address the testsuite regressions on powerpc and x86 as well as the glibc testsuite regression on ARM. The issues were due to unconditionally attempting to generate the new patterns. The code now checks whether the target provides these patterns before generating them. On the ARM side of the patch, I've also added a more specific predicate for the new patterns. The new patch is found below.
>
>
> In case of high register pressure in PIC mode, the address of the stack
> protector's guard can be spilled on ARM targets as shown in PR85434,
> thus allowing an attacker to control what the canary would be compared
> against. ARM does lack stack_protect_set and stack_protect_test insn
> patterns, but defining them would not help because the address is
> expanded regularly and the patterns only deal with the copy and test
> of the guard with the canary.
>
> This problem does not occur for x86 targets because the PIC access and
> the test can be done in the same instruction. AArch64 is exempt too
> because its PIC access insn patterns are moves of an UNSPEC, which
> prevents the second access in the epilogue from being CSEd in the
> cse_local pass with the first access in the prologue.
>
> The approach followed here is to create new "combined" set and test
> standard pattern names that take the unexpanded guard and do the set or
> test. This allows the target to use an opaque pattern (e.g. using UNSPEC)
> to hide the individual instructions being generated from the compiler
> and to split the pattern into generic load, compare and branch
> instructions after register allocation, therefore avoiding any spilling.
> This is implemented here for the ARM targets. For targets not
> implementing these new standard pattern names, the existing
> stack_protect_set and stack_protect_test pattern names are used.
>
> To be able to split the PIC access after register allocation, the
> require_pic_register and legitimize_pic_address functions had to be
> augmented to force a new PIC register load and to control which
> register it loads into. This is because sharing the PIC register
> between prologue and epilogue could again lead to spilling due to CSE,
> which an attacker could use to control what the canary gets compared
> against.
>
> ChangeLog entries are as follows:
>
> *** gcc/ChangeLog ***
>
> 2018-08-09  Thomas Preud'homme  <thomas.preudhomme@linaro.org>
>
>     * target-insns.def (stack_protect_combined_set): Define new standard
>     pattern name.
>     (stack_protect_combined_test): Likewise.
>     * cfgexpand.c (stack_protect_prologue): Try new
>     stack_protect_combined_set pattern first.
>     * function.c (stack_protect_epilogue): Try new
>     stack_protect_combined_test pattern first.
>     * config/arm/arm.c (require_pic_register): Add pic_reg and compute_now
>     parameters to control which register to use as PIC register and force
>     reloading PIC register respectively.  Insert in the stream of insns if
>     possible.
>     (legitimize_pic_address): Expose above new parameters in prototype and
>     adapt recursive calls accordingly.
>     (arm_legitimize_address): Adapt to new legitimize_pic_address
>     prototype.
>     (thumb_legitimize_address): Likewise.
>     (arm_emit_call_insn): Adapt to new require_pic_register prototype.
>     * config/arm/arm-protos.h (legitimize_pic_address): Adapt to prototype
>     change.
>     * config/arm/predicates.md (guard_operand): New predicate.
>     * config/arm/arm.md (movsi expander): Adapt to legitimize_pic_address
>     prototype change.
>     (stack_protect_combined_set): New insn_and_split pattern.
>     (stack_protect_set): New insn pattern.
>     (stack_protect_combined_test): New insn_and_split pattern.
>     (stack_protect_test): New insn pattern.
>     * config/arm/unspecs.md (UNSPEC_SP_SET): New unspec.
>     (UNSPEC_SP_TEST): Likewise.
>     * doc/md.texi (stack_protect_combined_set): Document new standard
>     pattern name.
>     (stack_protect_set): Clarify that the operand for guard's address is
>     legal.
>     (stack_protect_combined_test): Document new standard pattern name.
>     (stack_protect_test): Clarify that the operand for guard's address is
>     legal.
>
> *** gcc/testsuite/ChangeLog ***
>
> 2018-07-05  Thomas Preud'homme  <thomas.preudhomme@linaro.org>
>
>     * gcc.target/arm/pr85434.c: New test.
>
>
> Testing:
>
> native x86_64: bootstrap + testsuite -> no regression, can see failures with previous version of patch but not with new version
> native powerpc64: bootstrap + testsuite -> no regression, can see failures from pr86834 with previous version of patch but not with new version
> cross ARM Linux: build + testsuite -> no regression
> native ARM Thumb-2: bootstrap + testsuite + glibc build + glibc test -> no regression
> native ARM Arm: bootstrap + testsuite + glibc build + glibc test -> no regression
> AArch64: bootstrap + testsuite + glibc build + glibc test -> no regression
>
> Is this ok for trunk?
>
> Best regards,
>
> Thomas
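
For readers skimming the thread, the "test for the pattern before generating it" fallback described above can be sketched roughly as below. This is only a toy model under stated assumptions: struct toy_target, toy_emit_guard_set and the toy_gen_* helpers are hypothetical names invented for illustration, strings stand in for insns, and nothing here is GCC's real targetm interface or the code in the attached patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical generator type: returns a name standing in for an insn,
   or NULL when the target declines to generate the pattern.  */
typedef const char *(*toy_gen_fn) (void);

/* Hypothetical stand-in for the target's table of named patterns.  A NULL
   member means the target does not define that pattern at all.  */
struct toy_target
{
  toy_gen_fn gen_combined_set;
  toy_gen_fn gen_set;
};

const char *toy_gen_combined_set (void) { return "stack_protect_combined_set"; }
const char *toy_gen_set (void) { return "stack_protect_set"; }
const char *toy_gen_declines (void) { return NULL; }

/* Try the combined pattern first, but only if the target defines it and a
   guard declaration exists.  Otherwise fall back to the existing pattern,
   and finally to a plain move.  */
const char *
toy_emit_guard_set (const struct toy_target *t, bool have_guard_decl)
{
  if (t->gen_combined_set != NULL && have_guard_decl)
    {
      const char *insn = t->gen_combined_set ();
      if (insn != NULL)
	return insn;
    }

  if (t->gen_set != NULL)
    {
      const char *insn = t->gen_set ();
      if (insn != NULL)
	return insn;
    }

  return "move";
}
```

In this model a target that leaves gen_combined_set NULL behaves exactly as it did before the combined pattern existed, which is the property the reworked patch relies on to avoid the powerpc and x86 regressions.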

[-- Attachment #2: fix_pr85434_prevent_spilling_stack_protector_guard_address.patch --]
[-- Type: text/x-patch, Size: 28718 bytes --]

From 922cc5d7054bc598732e4ad6d408c7e4297c519a Mon Sep 17 00:00:00 2001
From: Thomas Preud'homme <thomas.preudhomme@linaro.org>
Date: Tue, 8 May 2018 15:47:05 +0100
Subject: [PATCH] PR85434: Prevent spilling of stack protector guard's address
 on ARM

In case of high register pressure in PIC mode, the address of the stack
protector's guard can be spilled on ARM targets as shown in PR85434,
thus allowing an attacker to control what the canary would be compared
against. ARM does lack stack_protect_set and stack_protect_test insn
patterns, but defining them would not help because the address is
expanded regularly and the patterns only deal with the copy and test
of the guard with the canary.

This problem does not occur for x86 targets because the PIC access and
the test can be done in the same instruction. AArch64 is exempt too
because its PIC access insn patterns are moves of an UNSPEC, which
prevents the second access in the epilogue from being CSEd in the
cse_local pass with the first access in the prologue.

The approach followed here is to create new "combined" set and test
standard pattern names that take the unexpanded guard and do the set or
test. This allows the target to use an opaque pattern (e.g. using UNSPEC)
to hide the individual instructions being generated from the compiler
and to split the pattern into generic load, compare and branch
instructions after register allocation, therefore avoiding any spilling.
This is implemented here for the ARM targets. For targets not
implementing these new standard pattern names, the existing
stack_protect_set and stack_protect_test pattern names are used.

To be able to split the PIC access after register allocation, the
require_pic_register and legitimize_pic_address functions had to be
augmented to force a new PIC register load and to control which
register it loads into. This is because sharing the PIC register
between prologue and epilogue could again lead to spilling due to CSE,
which an attacker could use to control what the canary gets compared
against.

ChangeLog entries are as follows:

*** gcc/ChangeLog ***

2018-08-09  Thomas Preud'homme  <thomas.preudhomme@linaro.org>

	* target-insns.def (stack_protect_combined_set): Define new standard
	pattern name.
	(stack_protect_combined_test): Likewise.
	* cfgexpand.c (stack_protect_prologue): Try new
	stack_protect_combined_set pattern first.
	* function.c (stack_protect_epilogue): Try new
	stack_protect_combined_test pattern first.
	* config/arm/arm.c (require_pic_register): Add pic_reg and compute_now
	parameters to control which register to use as PIC register and force
	reloading PIC register respectively.  Insert in the stream of insns if
	possible.
	(legitimize_pic_address): Expose above new parameters in prototype and
	adapt recursive calls accordingly.
	(arm_legitimize_address): Adapt to new legitimize_pic_address
	prototype.
	(thumb_legitimize_address): Likewise.
	(arm_emit_call_insn): Adapt to new require_pic_register prototype.
	* config/arm/arm-protos.h (legitimize_pic_address): Adapt to prototype
	change.
	* config/arm/predicates.md (guard_operand): New predicate.
	* config/arm/arm.md (movsi expander): Adapt to legitimize_pic_address
	prototype change.
	(stack_protect_combined_set): New insn_and_split pattern.
	(stack_protect_set): New insn pattern.
	(stack_protect_combined_test): New insn_and_split pattern.
	(stack_protect_test): New insn pattern.
	* config/arm/unspecs.md (UNSPEC_SP_SET): New unspec.
	(UNSPEC_SP_TEST): Likewise.
	* doc/md.texi (stack_protect_combined_set): Document new standard
	pattern name.
	(stack_protect_set): Clarify that the operand for guard's address is
	legal.
	(stack_protect_combined_test): Document new standard pattern name.
	(stack_protect_test): Clarify that the operand for guard's address is
	legal.

*** gcc/testsuite/ChangeLog ***

2018-07-05  Thomas Preud'homme  <thomas.preudhomme@linaro.org>

	* gcc.target/arm/pr85434.c: New test.

Testing: Bootstrapped on ARM in both Arm and Thumb-2 mode as well as on
AArch64. The testsuite shows no regression on these three variants
either, both with default flags and with -fstack-protector-all.

Is this ok for trunk? If yes, would this be acceptable as a backport to
GCC 6, 7 and 8 provided that no regression is found?

Best regards,

Thomas

Change-Id: I993343e3063fb570af706624e08b475732a5ec57
---
 gcc/cfgexpand.c                        |  17 +++
 gcc/config/arm/arm-protos.h            |   2 +-
 gcc/config/arm/arm.c                   |  56 +++++--
 gcc/config/arm/arm.md                  |  94 +++++++++++-
 gcc/config/arm/predicates.md           |  10 ++
 gcc/config/arm/unspecs.md              |   3 +
 gcc/doc/md.texi                        |  55 ++++++-
 gcc/function.c                         |  32 +++-
 gcc/target-insns.def                   |   2 +
 gcc/testsuite/gcc.target/arm/pr85434.c | 200 +++++++++++++++++++++++++
 10 files changed, 438 insertions(+), 33 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/arm/pr85434.c

diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
index 3c5b30b79f8..e5320836919 100644
--- a/gcc/cfgexpand.c
+++ b/gcc/cfgexpand.c
@@ -6108,6 +6108,23 @@ stack_protect_prologue (void)
   rtx x, y;
 
   x = expand_normal (crtl->stack_protect_guard);
+
+  if (targetm.have_stack_protect_combined_set () && guard_decl)
+    {
+      gcc_assert (DECL_P (guard_decl));
+      y = DECL_RTL (guard_decl);
+
+      /* Allow the target to compute address of Y and copy it to X without
+	 leaking Y into a register.  This combined address + copy pattern
+	 allows the target to prevent spilling of any intermediate results by
+	 splitting it after register allocator.  */
+      if (rtx_insn *insn = targetm.gen_stack_protect_combined_set (x, y))
+	{
+	  emit_insn (insn);
+	  return;
+	}
+    }
+
   if (guard_decl)
     y = expand_normal (guard_decl);
   else
diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 8537262ce64..100844e659c 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -67,7 +67,7 @@ extern int const_ok_for_dimode_op (HOST_WIDE_INT, enum rtx_code);
 extern int arm_split_constant (RTX_CODE, machine_mode, rtx,
 			       HOST_WIDE_INT, rtx, rtx, int);
 extern int legitimate_pic_operand_p (rtx);
-extern rtx legitimize_pic_address (rtx, machine_mode, rtx);
+extern rtx legitimize_pic_address (rtx, machine_mode, rtx, rtx, bool);
 extern rtx legitimize_tls_address (rtx, rtx);
 extern bool arm_legitimate_address_p (machine_mode, rtx, bool);
 extern int arm_legitimate_address_outer_p (machine_mode, rtx, RTX_CODE, int);
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index f5eece4f152..87c728e0eea 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -7369,20 +7369,26 @@ legitimate_pic_operand_p (rtx x)
 }
 
 /* Record that the current function needs a PIC register.  Initialize
-   cfun->machine->pic_reg if we have not already done so.  */
+   cfun->machine->pic_reg if we have not already done so.
+
+   A new pseudo register is used for the PIC register if possible, otherwise
+   PIC_REG must be non NULL and is used instead.  COMPUTE_NOW forces the PIC
+   register to be loaded, regardless of whether it was loaded previously.  */
 
 static void
-require_pic_register (void)
+require_pic_register (rtx pic_reg, bool compute_now)
 {
   /* A lot of the logic here is made obscure by the fact that this
      routine gets called as part of the rtx cost estimation process.
      We don't want those calls to affect any assumptions about the real
      function; and further, we can't call entry_of_function() until we
      start the real expansion process.  */
-  if (!crtl->uses_pic_offset_table)
+  if (!crtl->uses_pic_offset_table || compute_now)
     {
-      gcc_assert (can_create_pseudo_p ());
+      gcc_assert (can_create_pseudo_p ()
+		  || (pic_reg != NULL_RTX && GET_MODE (pic_reg) == Pmode));
       if (arm_pic_register != INVALID_REGNUM
+	  && can_create_pseudo_p ()
 	  && !(TARGET_THUMB1 && arm_pic_register > LAST_LO_REGNUM))
 	{
 	  if (!cfun->machine->pic_reg)
@@ -7399,7 +7405,8 @@ require_pic_register (void)
 	  rtx_insn *seq, *insn;
 
 	  if (!cfun->machine->pic_reg)
-	    cfun->machine->pic_reg = gen_reg_rtx (Pmode);
+	    cfun->machine->pic_reg =
+	      can_create_pseudo_p () ? gen_reg_rtx (Pmode) : pic_reg;
 
 	  /* Play games to avoid marking the function as needing pic
 	     if we are being called as part of the cost-estimation
@@ -7410,7 +7417,8 @@ require_pic_register (void)
 	      start_sequence ();
 
 	      if (TARGET_THUMB1 && arm_pic_register != INVALID_REGNUM
-		  && arm_pic_register > LAST_LO_REGNUM)
+		  && arm_pic_register > LAST_LO_REGNUM
+		  && can_create_pseudo_p ())
 		emit_move_insn (cfun->machine->pic_reg,
 				gen_rtx_REG (Pmode, arm_pic_register));
 	      else
@@ -7427,15 +7435,29 @@ require_pic_register (void)
 	         we can't yet emit instructions directly in the final
 		 insn stream.  Queue the insns on the entry edge, they will
 		 be committed after everything else is expanded.  */
-	      insert_insn_on_edge (seq,
-				   single_succ_edge (ENTRY_BLOCK_PTR_FOR_FN (cfun)));
+	      if (currently_expanding_to_rtl)
+		insert_insn_on_edge (seq,
+				     single_succ_edge
+				     (ENTRY_BLOCK_PTR_FOR_FN (cfun)));
+	      else
+		emit_insn (seq);
 	    }
 	}
     }
 }
 
+/* Legitimize PIC load to ORIG into REG.  If REG is NULL, a new pseudo is
+   created to hold the result of the load.  If not NULL, PIC_REG indicates
+   which register to use as PIC register, otherwise it is decided by register
+   allocator.  COMPUTE_NOW forces the PIC register to be loaded at the current
+   location in the instruction stream, regardless of whether it was loaded
+   previously.
+
+   Returns the register REG into which the PIC load is performed.  */
+
 rtx
-legitimize_pic_address (rtx orig, machine_mode mode, rtx reg)
+legitimize_pic_address (rtx orig, machine_mode mode, rtx reg, rtx pic_reg,
+			bool compute_now)
 {
   if (GET_CODE (orig) == SYMBOL_REF
       || GET_CODE (orig) == LABEL_REF)
@@ -7469,7 +7491,7 @@ legitimize_pic_address (rtx orig, machine_mode mode, rtx reg)
 	  rtx mem;
 
 	  /* If this function doesn't have a pic register, create one now.  */
-	  require_pic_register ();
+	  require_pic_register (pic_reg, compute_now);
 
 	  pat = gen_calculate_pic_address (reg, cfun->machine->pic_reg, orig);
 
@@ -7520,9 +7542,11 @@ legitimize_pic_address (rtx orig, machine_mode mode, rtx reg)
 
       gcc_assert (GET_CODE (XEXP (orig, 0)) == PLUS);
 
-      base = legitimize_pic_address (XEXP (XEXP (orig, 0), 0), Pmode, reg);
+      base = legitimize_pic_address (XEXP (XEXP (orig, 0), 0), Pmode, reg,
+				     pic_reg, compute_now);
       offset = legitimize_pic_address (XEXP (XEXP (orig, 0), 1), Pmode,
-				       base == reg ? 0 : reg);
+				       base == reg ? 0 : reg, pic_reg,
+				       compute_now);
 
       if (CONST_INT_P (offset))
 	{
@@ -8707,7 +8731,8 @@ arm_legitimize_address (rtx x, rtx orig_x, machine_mode mode)
     {
       /* We need to find and carefully transform any SYMBOL and LABEL
 	 references; so go back to the original address expression.  */
-      rtx new_x = legitimize_pic_address (orig_x, mode, NULL_RTX);
+      rtx new_x = legitimize_pic_address (orig_x, mode, NULL_RTX, NULL_RTX,
+					  false /*compute_now*/);
 
       if (new_x != orig_x)
 	x = new_x;
@@ -8775,7 +8800,8 @@ thumb_legitimize_address (rtx x, rtx orig_x, machine_mode mode)
     {
       /* We need to find and carefully transform any SYMBOL and LABEL
 	 references; so go back to the original address expression.  */
-      rtx new_x = legitimize_pic_address (orig_x, mode, NULL_RTX);
+      rtx new_x = legitimize_pic_address (orig_x, mode, NULL_RTX, NULL_RTX,
+					  false /*compute_now*/);
 
       if (new_x != orig_x)
 	x = new_x;
@@ -18059,7 +18085,7 @@ arm_emit_call_insn (rtx pat, rtx addr, bool sibcall)
 	  ? !targetm.binds_local_p (SYMBOL_REF_DECL (addr))
 	  : !SYMBOL_REF_LOCAL_P (addr)))
     {
-      require_pic_register ();
+      require_pic_register (NULL_RTX, false /*compute_now*/);
       use_reg (&CALL_INSN_FUNCTION_USAGE (insn), cfun->machine->pic_reg);
     }
 
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index ca2a2f5469f..e7abf83494a 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -6021,7 +6021,8 @@
       operands[1] = legitimize_pic_address (operands[1], SImode,
 					    (!can_create_pseudo_p ()
 					     ? operands[0]
-					     : 0));
+					     : NULL_RTX), NULL_RTX,
+					    false /*compute_now*/);
   }
   "
 )
@@ -8634,6 +8635,97 @@
    (set_attr "conds" "clob")]
 )
 
+;; Named patterns for stack smashing protection.
+(define_insn_and_split "stack_protect_combined_set"
+  [(set (match_operand:SI 0 "memory_operand" "=m")
+	(unspec:SI [(match_operand:SI 1 "guard_operand" "X")]
+		   UNSPEC_SP_SET))
+   (match_scratch:SI 2 "=r")
+   (match_scratch:SI 3 "=r")]
+  ""
+  "#"
+  "reload_completed"
+  [(parallel [(set (match_dup 0) (unspec:SI [(mem:SI (match_dup 2))]
+					    UNSPEC_SP_SET))
+	      (clobber (match_dup 2))])]
+  "
+{
+  rtx addr = XEXP (operands[1], 0);
+  if (flag_pic)
+    {
+      /* Forces recomputing of GOT base now.  */
+      operands[1] = legitimize_pic_address (addr, SImode, operands[2],
+					    operands[3], true /*compute_now*/);
+    }
+  else
+    {
+      if (address_operand (addr, SImode))
+	operands[2] = addr;
+      else
+	operands[2] = XEXP (force_const_mem (SImode, addr), 0);
+    }
+}"
+)
+
+(define_insn "stack_protect_set"
+  [(set (match_operand:SI 0 "memory_operand" "=m")
+	(unspec:SI [(mem:SI (match_operand:SI 1 "register_operand" "r"))]
+	 UNSPEC_SP_SET))
+   (clobber (match_dup 1))]
+  ""
+  "ldr\\t%1, [%1]\;str\\t%1, %0\;mov\t%1,0"
+  [(set_attr "length" "12")
+   (set_attr "type" "multiple")])
+
+(define_insn_and_split "stack_protect_combined_test"
+  [(set (pc)
+	(if_then_else
+		(eq (match_operand:SI 0 "memory_operand" "m")
+		    (unspec:SI [(match_operand:SI 1 "guard_operand" "X")]
+			       UNSPEC_SP_TEST))
+		(label_ref (match_operand 2))
+		(pc)))
+   (match_scratch:SI 3 "=r")
+   (match_scratch:SI 4 "=r")]
+  ""
+  "#"
+  "reload_completed"
+  [(const_int 0)]
+{
+  rtx eq, addr;
+
+  addr = XEXP (operands[1], 0);
+  if (flag_pic)
+    {
+      /* Forces recomputing of GOT base now.  */
+      operands[1] = legitimize_pic_address (addr, SImode, operands[3],
+					    operands[4],
+					    true /*compute_now*/);
+    }
+  else
+    {
+      if (address_operand (addr, SImode))
+	operands[3] = addr;
+      else
+	operands[3] = XEXP (force_const_mem (SImode, addr), 0);
+    }
+  emit_insn (gen_stack_protect_test (operands[4], operands[0], operands[3]));
+  eq = gen_rtx_EQ (VOIDmode, operands[4], const0_rtx);
+  emit_jump_insn (gen_cbranchsi4 (eq, operands[4], const0_rtx, operands[2]));
+  DONE;
+})
+
+(define_insn "stack_protect_test"
+  [(set (match_operand:SI 0 "register_operand" "=r")
+	(unspec:SI [(match_operand:SI 1 "memory_operand" "m")
+		    (mem:SI (match_operand:SI 2 "register_operand" "r"))]
+	 UNSPEC_SP_TEST))
+   (clobber (match_dup 2))]
+  ""
+  "ldr\t%0, [%2]\;ldr\t%2, %1\;eor\t%0, %2, %0"
+  [(set_attr "length" "12")
+   (set_attr "type" "multiple")])
+
 (define_expand "casesi"
   [(match_operand:SI 0 "s_register_operand" "")	; index to jump on
    (match_operand:SI 1 "const_int_operand" "")	; lower bound
diff --git a/gcc/config/arm/predicates.md b/gcc/config/arm/predicates.md
index 7e198f9bce4..9c54b0d02bb 100644
--- a/gcc/config/arm/predicates.md
+++ b/gcc/config/arm/predicates.md
@@ -31,6 +31,16 @@
 	      || REGNO_REG_CLASS (REGNO (op)) != NO_REGS));
 })
 
+; Predicate for stack protector guard in stack_protect_combined_set and
+; stack_protect_combined_test patterns
+(define_predicate "guard_operand"
+  (match_code "mem")
+{
+  rtx addr = XEXP (op, 0);
+  return (CONSTANT_ADDRESS_P (addr)
+	  || !targetm.cannot_force_const_mem (mode, addr));
+})
+
 (define_predicate "imm_for_neon_inv_logic_operand"
   (match_code "const_vector")
 {
diff --git a/gcc/config/arm/unspecs.md b/gcc/config/arm/unspecs.md
index 19416736ef9..8f9dbcb08dc 100644
--- a/gcc/config/arm/unspecs.md
+++ b/gcc/config/arm/unspecs.md
@@ -86,6 +86,9 @@
   UNSPEC_PROBE_STACK    ; Probe stack memory reference
   UNSPEC_NONSECURE_MEM	; Represent non-secure memory in ARMv8-M with
 			; security extension
+  UNSPEC_SP_SET		; Represent the setting of stack protector's canary
+  UNSPEC_SP_TEST	; Represent the testing of stack protector's canary
+			; against the guard.
 ])
 
 (define_c_enum "unspec" [
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 02f9e1e4320..e5851a6711d 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -7388,22 +7388,61 @@ builtins.
 The get/set patterns have a single output/input operand respectively,
 with @var{mode} intended to be @code{Pmode}.
 
+@cindex @code{stack_protect_combined_set} instruction pattern
+@item @samp{stack_protect_combined_set}
+This pattern, if defined, moves a @code{ptr_mode} value from an address
+whose declaration RTX is given in operand 1 to the memory in operand 0
+without leaving the value in a register afterward.  If several
+instructions are needed by the target to perform the operation (eg. to
+load the address from a GOT entry then load the @code{ptr_mode} value
+and finally store it), it is the backend's responsibility to ensure no
+intermediate result gets spilled.  This is to avoid leaking the value
+some place that an attacker might use to rewrite the stack guard slot
+after having clobbered it.
+
+If this pattern is not defined, then the address declaration is
+expanded first in the standard way and a @code{stack_protect_set}
+pattern is then generated to move the value from that address to the
+address in operand 0.
+
 @cindex @code{stack_protect_set} instruction pattern
 @item @samp{stack_protect_set}
-This pattern, if defined, moves a @code{ptr_mode} value from the memory
-in operand 1 to the memory in operand 0 without leaving the value in
-a register afterward.  This is to avoid leaking the value some place
-that an attacker might use to rewrite the stack guard slot after
-having clobbered it.
+This pattern, if defined, moves a @code{ptr_mode} value from the valid
+memory location in operand 1 to the memory in operand 0 without leaving
+the value in a register afterward.  This is to avoid leaking the value
+some place that an attacker might use to rewrite the stack guard slot
+after having clobbered it.
+
+Note: on targets where the addressing modes do not allow to load
+directly from stack guard address, the address is expanded in a standard
+way first which could cause some spills.
 
 If this pattern is not defined, then a plain move pattern is generated.
 
+@cindex @code{stack_protect_combined_test} instruction pattern
+@item @samp{stack_protect_combined_test}
+This pattern, if defined, compares a @code{ptr_mode} value from an
+address whose declaration RTX is given in operand 1 with the memory in
+operand 0 without leaving the value in a register afterward and
+branches to operand 2 if the values were equal.  If several
+instructions are needed by the target to perform the operation (eg. to
+load the address from a GOT entry then load the @code{ptr_mode} value
+and finally compare it), it is the backend's responsibility to ensure no
+intermediate result gets spilled.  This is to avoid leaking the value
+some place that an attacker might use to rewrite the stack guard slot
+after having clobbered it.
+
+If this pattern is not defined, then the address declaration is
+expanded first in the standard way and a @code{stack_protect_test}
+pattern is then generated to compare the value from that address to the
+value at the memory in operand 0.
+
 @cindex @code{stack_protect_test} instruction pattern
 @item @samp{stack_protect_test}
 This pattern, if defined, compares a @code{ptr_mode} value from the
-memory in operand 1 with the memory in operand 0 without leaving the
-value in a register afterward and branches to operand 2 if the values
-were equal.
+valid memory location in operand 1 with the memory in operand 0 without
+leaving the value in a register afterward and branches to operand 2 if
+the values were equal.
 
 If this pattern is not defined, then a plain compare pattern and
 conditional branch pattern is used.
diff --git a/gcc/function.c b/gcc/function.c
index dee303cdbdd..811849ce31b 100644
--- a/gcc/function.c
+++ b/gcc/function.c
@@ -4892,18 +4892,34 @@ stack_protect_epilogue (void)
   tree guard_decl = targetm.stack_protect_guard ();
   rtx_code_label *label = gen_label_rtx ();
   rtx x, y;
-  rtx_insn *seq;
+  rtx_insn *seq = 0;
 
   x = expand_normal (crtl->stack_protect_guard);
-  if (guard_decl)
-    y = expand_normal (guard_decl);
+
+  if (targetm.have_stack_protect_combined_test () && guard_decl)
+    {
+      gcc_assert (DECL_P (guard_decl));
+      y = DECL_RTL (guard_decl);
+      /* Allow the target to compute address of Y and compare it with X without
+	 leaking Y into a register.  This combined address + compare pattern
+	 allows the target to prevent spilling of any intermediate results by
+	 splitting it after register allocator.  */
+      seq = targetm.gen_stack_protect_combined_test (x, y, label);
+    }
   else
-    y = const0_rtx;
+    {
+      if (guard_decl)
+	y = expand_normal (guard_decl);
+      else
+	y = const0_rtx;
+
+      /* Allow the target to compare Y with X without leaking either into
+	 a register.  */
+      if (targetm.have_stack_protect_test ())
+	seq = targetm.gen_stack_protect_test (x, y, label);
+    }
 
-  /* Allow the target to compare Y with X without leaking either into
-     a register.  */
-  if (targetm.have_stack_protect_test ()
-      && ((seq = targetm.gen_stack_protect_test (x, y, label)) != NULL_RTX))
+  if (seq)
     emit_insn (seq);
   else
     emit_cmp_and_jump_insns (x, y, EQ, NULL_RTX, ptr_mode, 1, label);
diff --git a/gcc/target-insns.def b/gcc/target-insns.def
index 9a552c3d11c..d39889b3522 100644
--- a/gcc/target-insns.def
+++ b/gcc/target-insns.def
@@ -96,7 +96,9 @@ DEF_TARGET_INSN (sibcall_value, (rtx x0, rtx x1, rtx opt2, rtx opt3,
 DEF_TARGET_INSN (simple_return, (void))
 DEF_TARGET_INSN (split_stack_prologue, (void))
 DEF_TARGET_INSN (split_stack_space_check, (rtx x0, rtx x1))
+DEF_TARGET_INSN (stack_protect_combined_set, (rtx x0, rtx x1))
 DEF_TARGET_INSN (stack_protect_set, (rtx x0, rtx x1))
+DEF_TARGET_INSN (stack_protect_combined_test, (rtx x0, rtx x1, rtx x2))
 DEF_TARGET_INSN (stack_protect_test, (rtx x0, rtx x1, rtx x2))
 DEF_TARGET_INSN (store_multiple, (rtx x0, rtx x1, rtx x2))
 DEF_TARGET_INSN (tablejump, (rtx x0, rtx x1))
diff --git a/gcc/testsuite/gcc.target/arm/pr85434.c b/gcc/testsuite/gcc.target/arm/pr85434.c
new file mode 100644
index 00000000000..4143a861f7c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/pr85434.c
@@ -0,0 +1,200 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target fstack_protector }*/
+/* { dg-require-effective-target fpic }*/
+/* { dg-additional-options "-Os -fpic -fstack-protector-strong" } */
+
+#include <stddef.h>
+#include <stdint.h>
+
+
+static const unsigned char base64_enc_map[64] =
+{
+    'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J',
+    'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T',
+    'U', 'V', 'W', 'X', 'Y', 'Z', 'a', 'b', 'c', 'd',
+    'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n',
+    'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x',
+    'y', 'z', '0', '1', '2', '3', '4', '5', '6', '7',
+    '8', '9', '+', '/'
+};
+
+#define BASE64_SIZE_T_MAX   ( (size_t) -1 ) /* SIZE_T_MAX is not standard */
+
+
+void doSmth(void *x);
+
+#include <string.h>
+
+
+void check(int n) {
+  
+    if (!(n % 2 && n % 3 && n % 5)) {
+ __asm__  (   "add    r8, r8, #1;" );
+    }
+}
+
+uint32_t test(
+  uint32_t a1,
+  uint32_t a2,
+  size_t a3,
+  size_t a4,
+  size_t a5,
+  size_t a6)
+{
+  uint32_t nResult = 0;
+  uint8_t* h = 0L;
+  uint8_t X[128];
+  uint8_t mac[64];
+  size_t len;
+
+  doSmth(&a1);
+  doSmth(&a2);
+  doSmth(&a3);
+  doSmth(&a4);
+  doSmth(&a5);
+  doSmth(&a6);
+
+  if (a1 && a2 && a3 && a4 && a5 && a6) {
+    nResult = 1;
+    h = (void*)X;
+    len = sizeof(X);
+    memset(X, a2, len);
+    len -= 64;
+    memcpy(mac ,X, len);
+    *(h + len) = a6;
+
+    {
+
+
+        unsigned char *dst = X;
+        size_t dlen = a3;
+        size_t *olen = &a6;
+        const unsigned char *src = mac;
+        size_t slen = a4;
+    size_t i, n;
+    int C1, C2, C3;
+    unsigned char *p;
+
+    if( slen == 0 )
+    {
+        *olen = 0;
+        return( 0 );
+    }
+
+    n = slen / 3 + ( slen % 3 != 0 );
+
+    if( n > ( BASE64_SIZE_T_MAX - 1 ) / 4 )
+    {
+        *olen = BASE64_SIZE_T_MAX;
+        return( 0 );
+    }
+
+    n *= 4;
+
+    if( ( dlen < n + 1 ) || ( NULL == dst ) )
+    {
+        *olen = n + 1;
+        return( 0 );
+    }
+
+    n = ( slen / 3 ) * 3;
+
+    for( i = 0, p = dst; i < n; i += 3 )
+    {
+        C1 = *src++;
+        C2 = *src++;
+        C3 = *src++;
+
+        check(i);
+
+        *p++ = base64_enc_map[(C1 >> 2) & 0x3F];
+        *p++ = base64_enc_map[(((C1 &  3) << 4) + (C2 >> 4)) & 0x3F];
+        *p++ = base64_enc_map[(((C2 & 15) << 2) + (C3 >> 6)) & 0x3F];
+        *p++ = base64_enc_map[C3 & 0x3F];
+    }
+
+    if( i < slen )
+    {
+        C1 = *src++;
+        C2 = ( ( i + 1 ) < slen ) ? *src++ : 0;
+
+        *p++ = base64_enc_map[(C1 >> 2) & 0x3F];
+        *p++ = base64_enc_map[(((C1 & 3) << 4) + (C2 >> 4)) & 0x3F];
+
+        if( ( i + 1 ) < slen )
+             *p++ = base64_enc_map[((C2 & 15) << 2) & 0x3F];
+        else *p++ = '=';
+
+        *p++ = '=';
+    }
+
+    *olen = p - dst;
+    *p = 0;
+
+}
+
+  __asm__ ("mov r8, %0;" : "=r" ( nResult ));
+  }
+  else
+  {
+    nResult = 2;
+  }
+
+  doSmth(X);
+  doSmth(mac);
+
+
+  return nResult;
+}
+
+/* The pattern below catches sequences of instructions that were generated
+   for ARM and Thumb-2 before the fix for this PR. They are of the form:
+
+   ldr     rX, <offset from sp or fp>
+   <optional non ldr instructions>
+   ldr     rY, <offset from sp or fp>
+   ldr     rZ, [rX]
+   <optional non ldr instructions>
+   cmp     rY, rZ
+   <optional non cmp instructions>
+   bl      __stack_chk_fail
+
+   Ideally the optional blocks would check that the rX, rY and rZ registers
+   are not set, but this is not possible because Tcl does not allow back
+   references in lookahead expressions, and negative lookahead is the only
+   construct that can negate a regexp.  Instead we use the heuristic of
+   allowing non ldr/cmp instructions, on the assumptions that (i) those are
+   not part of the stack protector sequences and (ii) they would only be
+   scheduled here if they don't conflict with registers used by the stack
+   protector.
+
+   Note on the regexp logic:
+   Allowing non X instructions (where X is ldr or cmp) is done by looking for
+   a newline, some non newline spaces, something which is not X, an
+   alphanumeric character and then anything but a newline, the whole thing
+   an undetermined number of times.  The alphanumeric character is there to
+   force the negative lookahead for X to be checked only after all the
+   initial spaces, and thus against the mnemonic.  This prevents it from
+   being checked against one of the initial spaces.  */
+/* { dg-final { scan-assembler-not {ldr[ \t]+([^,]+), \[(?:sp|fp)[^]]*\](?:\n[ \t]+(?!ldr)\w[^\n]*)*\n[ \t]+ldr[ \t]+([^,]+), \[(?:sp|fp)[^]]*\]\n[ \t]+ldr[ \t]+([^,]+), \[\1\](?:\n[ \t]+(?!ldr)\w[^\n]*)*\n[ \t]+cmp[ \t]+\2, \3(?:\n[ \t]+(?!cmp)\w[^\n]*)*\n[ \t]+bl[ \t]+__stack_chk_fail} } } */
+
+/* Likewise for Thumb-1 sequences of instructions prior to the fix for this PR
+   which had the form:
+
+   ldr     rS, <offset from sp or fp>
+   <optional non ldr instructions>
+   ldr     rT, <PC relative offset>
+   <optional non ldr instructions>
+   ldr     rX, [rS, rT]
+   <optional non ldr instructions>
+   ldr     rY, <offset from sp or fp>
+   ldr     rZ, [rX]
+   <optional non ldr instructions>
+   cmp     rY, rZ
+   <optional non cmp instructions>
+   bl      __stack_chk_fail
+
+  Note on the regexp logic:
+  PC relative offset is checked by looking for a source operand that does not
+  contain [ or ].  */
+/* { dg-final { scan-assembler-not {ldr[ \t]+([^,]+), \[(?:sp|fp)[^]]*\](?:\n[ \t]+(?!ldr)\w[^\n]*)*\n[ \t]+ldr[ \t]+([^,]+), [^][\n]*(?:\n[ \t]+(?!ldr)\w[^\n]*)*\n[ \t]+ldr[ \t]+([^,]+), \[\1, \2\](?:\n[ \t]+(?!ldr)\w[^\n]*)*\n[ \t]+ldr[ \t]+([^,]+), \[(?:sp|fp)[^]]*\]\n[ \t]+ldr[ \t]+([^,]+), \[\3\](?:\n[ \t]+(?!ldr)\w[^\n]*)*\n[ \t]+cmp[ \t]+\4, \5(?:\n[ \t]+(?!cmp)\w[^\n]*)*\n[ \t]+bl[ \t]+__stack_chk_fail} } } */
-- 
2.18.0
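
The negative-lookahead construction described in the test comment above can be tried outside DejaGnu. The following is a minimal sketch, assuming Python's `re` module as a stand-in for Tcl's regexp engine; the regexp is the ARM/Thumb-2 one from the test, while the two assembly snippets and the helper name `has_spilled_guard_sequence` are hypothetical and written by hand for illustration, not taken from compiler output.

```python
import re

# The ARM/Thumb-2 scan-assembler-not regexp from the test, split into
# pieces for readability.  Python's re module is only a stand-in for Tcl.
SPILLED_GUARD_RE = re.compile(
    r'ldr[ \t]+([^,]+), \[(?:sp|fp)[^]]*\]'            # ldr rX, <sp/fp offset>
    r'(?:\n[ \t]+(?!ldr)\w[^\n]*)*'                    # optional non-ldr insns
    r'\n[ \t]+ldr[ \t]+([^,]+), \[(?:sp|fp)[^]]*\]'    # ldr rY, <sp/fp offset>
    r'\n[ \t]+ldr[ \t]+([^,]+), \[\1\]'                # ldr rZ, [rX]
    r'(?:\n[ \t]+(?!ldr)\w[^\n]*)*'                    # optional non-ldr insns
    r'\n[ \t]+cmp[ \t]+\2, \3'                         # cmp rY, rZ
    r'(?:\n[ \t]+(?!cmp)\w[^\n]*)*'                    # optional non-cmp insns
    r'\n[ \t]+bl[ \t]+__stack_chk_fail'
)


def has_spilled_guard_sequence(asm: str) -> bool:
    """Return True if asm contains the pre-fix sequence the test rejects."""
    return SPILLED_GUARD_RE.search(asm) is not None


# Hypothetical pre-fix shape: the guard's address is reloaded from the stack.
SPILLED = (
    "\tldr\tr3, [sp, #4]\n"
    "\tadd\tr0, r0, #1\n"
    "\tldr\tr2, [sp, #28]\n"
    "\tldr\tr1, [r3]\n"
    "\tcmp\tr2, r1\n"
    "\tbeq\t.L2\n"
    "\tbl\t__stack_chk_fail\n"
)

# Hypothetical post-fix shape: the guard's address does not come from the stack.
NOT_SPILLED = (
    "\tldr\tr3, .L5\n"
    "\tldr\tr2, [sp, #28]\n"
    "\tldr\tr1, [r3]\n"
    "\tcmp\tr2, r1\n"
    "\tbeq\t.L2\n"
    "\tbl\t__stack_chk_fail\n"
)

if __name__ == "__main__":
    print(has_spilled_guard_sequence(SPILLED))
    print(has_spilled_guard_sequence(NOT_SPILLED))
```

In the first snippet the `add` and `beq` lines are skipped by the `(?!ldr)` and `(?!cmp)` groups, so the back references still tie the `ldr` and `cmp` operands together. The second snippet is not matched because its first load has no `[sp...]` or `[fp...]` operand.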


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH, ARM] PR85434: Prevent spilling of stack protector guard's address on ARM
  2018-08-29  9:51 ` [PATCH, ARM] PR85434: Prevent spilling of stack protector guard's address on ARM Thomas Preudhomme
@ 2018-08-29 10:07   ` Thomas Preudhomme
  2018-09-13 12:02     ` Thomas Preudhomme
  2018-09-18  0:57   ` Jeff Law
  2018-09-25 16:13   ` Kyrill Tkachov
  2 siblings, 1 reply; 20+ messages in thread
From: Thomas Preudhomme @ 2018-08-29 10:07 UTC (permalink / raw)
  To: Jeff Law, kyrylo.tkachov, Ramana Radhakrishnan, Richard Earnshaw
  Cc: gcc-patches

[-- Attachment #1: Type: text/plain, Size: 6279 bytes --]

I forgot to mention another important change in the ARM backend:

The expanders were causing one indirection too many, which is what
caused the test failure in glibc. The new expander code skips the
creation of a move from the memory reference of the guard's address to
a register, since this is done in the insns themselves. I think that
during the initial implementation of the first version of the patch I
had issues with loading the address and used that move to load it. As
can be seen from the absence of regressions in the runtime stack
protector test in glibc, this is now working properly, which I also
confirmed by manual inspection of the code.

I've attached the interdiff from previous version for reference.

Best regards,

Thomas
On Wed, 29 Aug 2018 at 10:51, Thomas Preudhomme
<thomas.preudhomme@linaro.org> wrote:
>
> Resend hopefully without HTML this time.
>
> On Wed, 29 Aug 2018 at 10:49, Thomas Preudhomme
> <thomas.preudhomme@linaro.org> wrote:
> >
> > Hi,
> >
> > I've reworked the patch fixing PR85434 (spilling of stack protector guard's address on ARM) to address the testsuite regression on powerpc and x86 as well as glibc testsuite regression on ARM. Issues were due to unconditionally attempting to generate the new patterns. The code now tests if there is a pattern for them for the target before generating them. In the ARM side of the patch, I've also added a more specific predicate for the new patterns. The new patch is found below.
> >
> >
> > In case of high register pressure in PIC mode, the address of the stack
> > protector's guard can be spilled on ARM targets as shown in PR85434,
> > thus allowing an attacker to control what the canary would be compared
> > against. ARM does lack stack_protect_set and stack_protect_test insn
> > patterns, but defining them does not help, as the address is expanded
> > regularly and the patterns only deal with the copy and test of the
> > guard with the canary.
> >
> > This problem does not occur for x86 targets because the PIC access and
> > the test can be done in the same instruction. Aarch64 is exempt too
> > because its PIC access insn patterns are mov of UNSPEC, which prevents
> > the second access in the epilogue from being CSEd in the cse_local pass
> > with the first access in the prologue.
> >
> > The approach followed here is to create new "combined" set and test
> > standard pattern names that take the unexpanded guard and do the set or
> > test. This allows the target to use an opaque pattern (e.g. using UNSPEC)
> > to hide the individual instructions being generated from the compiler and
> > to split the pattern into generic load, compare and branch instructions
> > after register allocation, therefore avoiding any spilling. This is
> > implemented here for the ARM targets. For targets not implementing these
> > new standard pattern names, the existing stack_protect_set and
> > stack_protect_test pattern names are used.
> >
> > To be able to split PIC access after register allocation, the PIC
> > handling functions (require_pic_register and legitimize_pic_address)
> > had to be augmented to force a new PIC register load and to control
> > which register it loads into. This is because sharing the PIC register
> > between prologue and epilogue could lead to spilling due to CSE again,
> > which an attacker could use to control what the canary gets compared
> > against.
> >
> > ChangeLog entries are as follows:
> >
> > *** gcc/ChangeLog ***
> >
> > 2018-08-09  Thomas Preud'homme  <thomas.preudhomme@linaro.org>
> >
> >     * target-insns.def (stack_protect_combined_set): Define new standard
> >     pattern name.
> >     (stack_protect_combined_test): Likewise.
> >     * cfgexpand.c (stack_protect_prologue): Try new
> >     stack_protect_combined_set pattern first.
> >     * function.c (stack_protect_epilogue): Try new
> >     stack_protect_combined_test pattern first.
> >     * config/arm/arm.c (require_pic_register): Add pic_reg and compute_now
> >     parameters to control which register to use as PIC register and force
> >     reloading PIC register respectively.  Insert in the stream of insns if
> >     possible.
> >     (legitimize_pic_address): Expose above new parameters in prototype and
> >     adapt recursive calls accordingly.
> >     (arm_legitimize_address): Adapt to new legitimize_pic_address
> >     prototype.
> >     (thumb_legitimize_address): Likewise.
> >     (arm_emit_call_insn): Adapt to new require_pic_register prototype.
> >     * config/arm/arm-protos.h (legitimize_pic_address): Adapt to prototype
> >     change.
> >     * config/arm/predicates.md (guard_operand): New predicate.
> >     * config/arm/arm.md (movsi expander): Adapt to legitimize_pic_address
> >     prototype change.
> >     (stack_protect_combined_set): New insn_and_split pattern.
> >     (stack_protect_set): New insn pattern.
> >     (stack_protect_combined_test): New insn_and_split pattern.
> >     (stack_protect_test): New insn pattern.
> >     * config/arm/unspecs.md (UNSPEC_SP_SET): New unspec.
> >     (UNSPEC_SP_TEST): Likewise.
> >     * doc/md.texi (stack_protect_combined_set): Document new standard
> >     pattern name.
> >     (stack_protect_set): Clarify that the operand for guard's address is
> >     legal.
> >     (stack_protect_combined_test): Document new standard pattern name.
> >     (stack_protect_test): Clarify that the operand for guard's address is
> >     legal.
> >
> > *** gcc/testsuite/ChangeLog ***
> >
> > 2018-07-05  Thomas Preud'homme  <thomas.preudhomme@linaro.org>
> >
> >     * gcc.target/arm/pr85434.c: New test.
> >
> >
> > Testing:
> >
> > native x86_64: bootstrap + testsuite -> no regression, can see failures with previous version of patch but not with new version
> > native powerpc64: bootstrap + testsuite -> no regression, can see failures from pr86834 with previous version of patch but not with new version
> > cross ARM Linux: build + testsuite -> no regression
> > native ARM Thumb-2: bootstrap + testsuite + glibc build + glibc test -> no regression
> > native ARM Arm: bootstrap + testsuite + glibc build + glibc test -> no regression
> > Aarch64: bootstrap + testsuite + glibc build + glibc test -> no regression
> >
> > Is this ok for trunk?
> >
> > Best regards,
> >
> > Thomas

[-- Attachment #2: fix_pr85434_prevent_spilling_stack_protector_guard_address.interdiff --]
[-- Type: application/octet-stream, Size: 5613 bytes --]

diff -u b/gcc/cfgexpand.c b/gcc/cfgexpand.c
--- b/gcc/cfgexpand.c
+++ b/gcc/cfgexpand.c
@@ -6106,17 +6106,24 @@
   tree guard_decl = targetm.stack_protect_guard ();
   rtx x, y;
-  struct expand_operand ops[2];
 
   x = expand_normal (crtl->stack_protect_guard);
-  create_fixed_operand (&ops[0], x);
-  create_fixed_operand (&ops[1], DECL_RTL (guard_decl));
-  /* Allow the target to compute address of Y and copy it to X without
-     leaking Y into a register.  This combined address + copy pattern allows
-     the target to prevent spilling of any intermediate results by splitting
-     it after register allocator.  */
-  if (maybe_expand_insn (targetm.code_for_stack_protect_combined_set, 2, ops))
-    return;
+
+  if (targetm.have_stack_protect_combined_set () && guard_decl)
+    {
+      gcc_assert (DECL_P (guard_decl));
+      y = DECL_RTL (guard_decl);
+
+      /* Allow the target to compute address of Y and copy it to X without
+	 leaking Y into a register.  This combined address + copy pattern
+	 allows the target to prevent spilling of any intermediate results by
+	 splitting it after register allocator.  */
+      if (rtx_insn *insn = targetm.gen_stack_protect_combined_set (x, y))
+	{
+	  emit_insn (insn);
+	  return;
+	}
+    }
 
   if (guard_decl)
     y = expand_normal (guard_decl);
diff -u b/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
--- b/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -8638,7 +8638,7 @@
 ;; Named patterns for stack smashing protection.
 (define_insn_and_split "stack_protect_combined_set"
   [(set (match_operand:SI 0 "memory_operand" "=m")
-	(unspec:SI [(match_operand:SI 1 "memory_operand" "X")]
+	(unspec:SI [(match_operand:SI 1 "guard_operand" "X")]
 		   UNSPEC_SP_SET))
    (match_scratch:SI 2 "=r")
    (match_scratch:SI 3 "=r")]
@@ -8659,9 +8659,10 @@
     }
   else
     {
-      if (!address_operand (addr, SImode))
-	operands[1] = force_const_mem (SImode, addr);
-      emit_move_insn (operands[2], operands[1]);
+      if (address_operand (addr, SImode))
+	operands[2] = addr;
+      else
+	operands[2] = XEXP (force_const_mem (SImode, addr), 0);
     }
 }"
 )
@@ -8680,7 +8681,7 @@
   [(set (pc)
 	(if_then_else
 		(eq (match_operand:SI 0 "memory_operand" "m")
-		    (unspec:SI [(match_operand:SI 1 "memory_operand" "X")]
+		    (unspec:SI [(match_operand:SI 1 "guard_operand" "X")]
 			       UNSPEC_SP_TEST))
 		(label_ref (match_operand 2))
 		(pc)))
@@ -8703,9 +8704,10 @@
     }
   else
     {
-      if (!address_operand (addr, SImode))
-	operands[1] = force_const_mem (SImode, addr);
-      emit_move_insn (operands[3], operands[1]);
+      if (address_operand (addr, SImode))
+	operands[3] = addr;
+      else
+	operands[3] = XEXP (force_const_mem (SImode, addr), 0);
     }
   emit_insn (gen_stack_protect_test (operands[4], operands[0], operands[3]));
   eq = gen_rtx_EQ (VOIDmode, operands[4], const0_rtx);
diff -u b/gcc/function.c b/gcc/function.c
--- b/gcc/function.c
+++ b/gcc/function.c
@@ -4892,19 +4892,21 @@
   tree guard_decl = targetm.stack_protect_guard ();
   rtx_code_label *label = gen_label_rtx ();
   rtx x, y;
-  rtx_insn *seq;
-  struct expand_operand ops[3];
+  rtx_insn *seq = 0;
 
   x = expand_normal (crtl->stack_protect_guard);
-  create_fixed_operand (&ops[0], x);
-  create_fixed_operand (&ops[1], DECL_RTL (guard_decl));
-  create_fixed_operand (&ops[2], label);
-  /* Allow the target to compute address of Y and compare it with X without
-     leaking Y into a register.  This combined address + compare pattern allows
-     the target to prevent spilling of any intermediate results by splitting
-     it after register allocator.  */
-  if (!maybe_expand_jump_insn (targetm.code_for_stack_protect_combined_test,
-			       3, ops))
+
+  if (targetm.have_stack_protect_combined_test () && guard_decl)
+    {
+      gcc_assert (DECL_P (guard_decl));
+      y = DECL_RTL (guard_decl);
+      /* Allow the target to compute address of Y and compare it with X without
+	 leaking Y into a register.  This combined address + compare pattern
+	 allows the target to prevent spilling of any intermediate results by
+	 splitting it after register allocator.  */
+      seq = targetm.gen_stack_protect_combined_test (x, y, label);
+    }
+  else
     {
       if (guard_decl)
 	y = expand_normal (guard_decl);
@@ -4914,12 +4916,13 @@
       /* Allow the target to compare Y with X without leaking either into
 	 a register.  */
-      if (targetm.have_stack_protect_test ()
-	  && ((seq = targetm.gen_stack_protect_test (x, y, label))
-	      != NULL_RTX))
-	emit_insn (seq);
-      else
-	emit_cmp_and_jump_insns (x, y, EQ, NULL_RTX, ptr_mode, 1, label);
+      if (targetm.have_stack_protect_test ())
+	seq = targetm.gen_stack_protect_test (x, y, label);
     }
 
+  if (seq)
+    emit_insn (seq);
+  else
+    emit_cmp_and_jump_insns (x, y, EQ, NULL_RTX, ptr_mode, 1, label);
+
   /* The noreturn predictor has been moved to the tree level.  The rtl-level
      predictors estimate this branch about 20%, which isn't enough to get
only in patch2:
unchanged:
--- a/gcc/config/arm/predicates.md
+++ b/gcc/config/arm/predicates.md
@@ -31,6 +31,16 @@
 	      || REGNO_REG_CLASS (REGNO (op)) != NO_REGS));
 })
 
+; Predicate for stack protector guard in stack_protect_combined_set and
+; stack_protect_combined_test patterns
+(define_predicate "guard_operand"
+  (match_code "mem")
+{
+  rtx addr = XEXP (op, 0);
+  return (CONSTANT_ADDRESS_P (addr)
+	  || !targetm.cannot_force_const_mem (mode, addr));
+})
+
 (define_predicate "imm_for_neon_inv_logic_operand"
   (match_code "const_vector")
 {
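
The order in which the reworked stack_protect_epilogue tries the patterns can be summarised with a small model. This is only a sketch under stated assumptions: the function name `select_epilogue_check` and the string return values are hypothetical, and the optional callables stand in for the `targetm.have_*` / `targetm.gen_*` pairs. It mirrors the control flow of the function.c hunk above and does not reproduce any GCC API.

```python
from typing import Callable, Optional

# A generator returns a description of the emitted sequence, or None when
# the target's pattern generator fails (models a NULL rtx_insn *).
Generator = Callable[[], Optional[str]]

GENERIC_FALLBACK = "generic compare-and-jump"


def select_epilogue_check(guard_decl_present: bool,
                          combined: Optional[Generator],
                          plain: Optional[Generator]) -> str:
    """Model of the pattern selection in the function.c hunk.

    combined / plain are None when the target does not provide the
    stack_protect_combined_test / stack_protect_test pattern.
    """
    seq = None
    if combined is not None and guard_decl_present:
        # Combined pattern: the target receives the unexpanded guard.
        seq = combined()
    elif plain is not None:
        # Existing pattern: the guard has already been expanded.
        seq = plain()
    # A missing pattern or a failed generator ends in the generic path.
    return seq if seq is not None else GENERIC_FALLBACK


if __name__ == "__main__":
    print(select_epilogue_check(True, lambda: "combined", lambda: "plain"))
    print(select_epilogue_check(False, lambda: "combined", lambda: "plain"))
    print(select_epilogue_check(True, None, None))
```

One point the model makes visible: when the combined generator is available but returns nothing, the hunk does not retry the plain stack_protect_test pattern and goes straight to the generic compare and jump.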

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH, ARM] PR85434: Prevent spilling of stack protector guard's address on ARM
  2018-08-29 10:07   ` Thomas Preudhomme
@ 2018-09-13 12:02     ` Thomas Preudhomme
  0 siblings, 0 replies; 20+ messages in thread
From: Thomas Preudhomme @ 2018-09-13 12:02 UTC (permalink / raw)
  To: Jeff Law, kyrylo.tkachov, Ramana Radhakrishnan, Richard Earnshaw
  Cc: gcc-patches

Hi all,

Ping? This new version changes both the middle-end and back-end parts,
so it will need a review for both of those.

Best regards,

Thomas

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH, ARM] PR85434: Prevent spilling of stack protector guard's address on ARM
  2018-08-29  9:51 ` [PATCH, ARM] PR85434: Prevent spilling of stack protector guard's address on ARM Thomas Preudhomme
  2018-08-29 10:07   ` Thomas Preudhomme
@ 2018-09-18  0:57   ` Jeff Law
  2018-09-25 16:13   ` Kyrill Tkachov
  2 siblings, 0 replies; 20+ messages in thread
From: Jeff Law @ 2018-09-18  0:57 UTC (permalink / raw)
  To: Thomas Preudhomme, kyrylo.tkachov, Ramana Radhakrishnan,
	Richard Earnshaw
  Cc: gcc-patches

On 8/29/18 3:51 AM, Thomas Preudhomme wrote:
> fix_pr85434_prevent_spilling_stack_protector_guard_address.patch
> 
> From 922cc5d7054bc598732e4ad6d408c7e4297c519a Mon Sep 17 00:00:00 2001
> From: Thomas Preud'homme <thomas.preudhomme@linaro.org>
> Date: Tue, 8 May 2018 15:47:05 +0100
> Subject: [PATCH] PR85434: Prevent spilling of stack protector guard's address
>  on ARM
> 
> In case of high register pressure in PIC mode, the address of the stack
> protector's guard can be spilled on ARM targets as shown in PR85434,
> thus allowing an attacker to control what the canary would be compared
> against. ARM does lack stack_protect_set and stack_protect_test insn
> patterns, but defining them does not help, as the address is expanded
> regularly and the patterns only deal with the copy and test of the
> guard with the canary.
> 
> This problem does not occur for x86 targets because the PIC access and
> the test can be done in the same instruction. Aarch64 is exempt too
> because its PIC access insn patterns are mov of UNSPEC, which prevents
> the second access in the epilogue from being CSEd in the cse_local pass
> with the first access in the prologue.
> 
> The approach followed here is to create new "combined" set and test
> standard pattern names that take the unexpanded guard and do the set or
> test. This allows the target to use an opaque pattern (e.g. using UNSPEC)
> to hide the individual instructions being generated from the compiler and
> to split the pattern into generic load, compare and branch instructions
> after register allocation, therefore avoiding any spilling. This is
> implemented here for the ARM targets. For targets not implementing these
> new standard pattern names, the existing stack_protect_set and
> stack_protect_test pattern names are used.
> 
> To be able to split PIC access after register allocation, the PIC
> handling functions (require_pic_register and legitimize_pic_address)
> had to be augmented to force a new PIC register load and to control
> which register it loads into. This is because sharing the PIC register
> between prologue and epilogue could lead to spilling due to CSE again,
> which an attacker could use to control what the canary gets compared
> against.
> 
> ChangeLog entries are as follows:
> 
> *** gcc/ChangeLog ***
> 
> 2018-08-09  Thomas Preud'homme  <thomas.preudhomme@linaro.org>
> 
> 	* target-insns.def (stack_protect_combined_set): Define new standard
> 	pattern name.
> 	(stack_protect_combined_test): Likewise.
> 	* cfgexpand.c (stack_protect_prologue): Try new
> 	stack_protect_combined_set pattern first.
> 	* function.c (stack_protect_epilogue): Try new
> 	stack_protect_combined_test pattern first.
> 	* config/arm/arm.c (require_pic_register): Add pic_reg and compute_now
> 	parameters to control which register to use as PIC register and force
> 	reloading PIC register respectively.  Insert in the stream of insns if
> 	possible.
> 	(legitimize_pic_address): Expose above new parameters in prototype and
> 	adapt recursive calls accordingly.
> 	(arm_legitimize_address): Adapt to new legitimize_pic_address
> 	prototype.
> 	(thumb_legitimize_address): Likewise.
> 	(arm_emit_call_insn): Adapt to new require_pic_register prototype.
> 	* config/arm/arm-protos.h (legitimize_pic_address): Adapt to prototype
> 	change.
> 	* config/arm/predicates.md (guard_operand): New predicate.
> 	* config/arm/arm.md (movsi expander): Adapt to legitimize_pic_address
> 	prototype change.
> 	(stack_protect_combined_set): New insn_and_split pattern.
> 	(stack_protect_set): New insn pattern.
> 	(stack_protect_combined_test): New insn_and_split pattern.
> 	(stack_protect_test): New insn pattern.
> 	* config/arm/unspecs.md (UNSPEC_SP_SET): New unspec.
> 	(UNSPEC_SP_TEST): Likewise.
> 	* doc/md.texi (stack_protect_combined_set): Document new standard
> 	pattern name.
> 	(stack_protect_set): Clarify that the operand for guard's address is
> 	legal.
> 	(stack_protect_combined_test): Document new standard pattern name.
> 	(stack_protect_test): Clarify that the operand for guard's address is
> 	legal.
> 
> *** gcc/testsuite/ChangeLog ***
> 
> 2018-07-05  Thomas Preud'homme  <thomas.preudhomme@linaro.org>
> 
> 	* gcc.target/arm/pr85434.c: New test.
> 
> Testing: Bootstrapped on ARM in both Arm and Thumb-2 mode as well as on
> Aarch64. Testsuite shows no regression on these 3 variants either both
> with default flags and with -fstack-protector-all.
> 
> Is this ok for trunk? If yes, would this be acceptable as a backport to
> GCC 6, 7 and 8 provided that no regression is found?
> 
> Best regards,
> 
> Thomas
> 
> Change-Id: I993343e3063fb570af706624e08b475732a5ec57
> ---
>  gcc/cfgexpand.c                        |  17 +++
>  gcc/config/arm/arm-protos.h            |   2 +-
>  gcc/config/arm/arm.c                   |  56 +++++--
>  gcc/config/arm/arm.md                  |  94 +++++++++++-
>  gcc/config/arm/predicates.md           |  10 ++
>  gcc/config/arm/unspecs.md              |   3 +
>  gcc/doc/md.texi                        |  55 ++++++-
>  gcc/function.c                         |  32 +++-
>  gcc/target-insns.def                   |   2 +
>  gcc/testsuite/gcc.target/arm/pr85434.c | 200 +++++++++++++++++++++++++
>  10 files changed, 438 insertions(+), 33 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/arm/pr85434.c
I think the target independent bits here are still fine.


Jeff


* Re: [PATCH, ARM] PR85434: Prevent spilling of stack protector guard's address on ARM
  2018-08-29  9:51 ` [PATCH, ARM] PR85434: Prevent spilling of stack protector guard's address on ARM Thomas Preudhomme
  2018-08-29 10:07   ` Thomas Preudhomme
  2018-09-18  0:57   ` Jeff Law
@ 2018-09-25 16:13   ` Kyrill Tkachov
  2018-10-23 13:17     ` Thomas Preudhomme
  2 siblings, 1 reply; 20+ messages in thread
From: Kyrill Tkachov @ 2018-09-25 16:13 UTC (permalink / raw)
  To: Thomas Preudhomme, Jeff Law, Ramana Radhakrishnan, Richard Earnshaw
  Cc: gcc-patches

Hi Thomas,

On 29/08/18 10:51, Thomas Preudhomme wrote:
> Resend hopefully without HTML this time.
>
> On Wed, 29 Aug 2018 at 10:49, Thomas Preudhomme
> <thomas.preudhomme@linaro.org> wrote:
>> Hi,
>>
>> I've reworked the patch fixing PR85434 (spilling of stack protector guard's address on ARM) to address the testsuite regression on powerpc and x86 as well as glibc testsuite regression on ARM. Issues were due to unconditionally attempting to generate the new patterns. The code now tests if there is a pattern for them for the target before generating them. In the ARM side of the patch, I've also added a more specific predicate for the new patterns. The new patch is found below.
>>
>>
>> In case of high register pressure in PIC mode, address of the stack
>> protector's guard can be spilled on ARM targets as shown in PR85434,
>> thus allowing an attacker to control what the canary would be compared
>> against. ARM does lack stack_protect_set and stack_protect_test insn
>> patterns, defining them does not help as the address is expanded
>> regularly and the patterns only deal with the copy and test of the
>> guard with the canary.
>>
>> This problem does not occur for x86 targets because the PIC access and
>> the test can be done in the same instruction. Aarch64 is exempt too
>> because PIC access insn pattern are mov of UNSPEC which prevents it from
>> the second access in the epilogue being CSEd in cse_local pass with the
>> first access in the prologue.
>>
>> The approach followed here is to create new "combined" set and test
>> standard pattern names that take the unexpanded guard and do the set or
>> test. This allows the target to use an opaque pattern (eg. using UNSPEC)
>> to hide the individual instructions being generated to the compiler and
>> split the pattern into generic load, compare and branch instruction
>> after register allocator, therefore avoiding any spilling. This is here
>> implemented for the ARM targets. For targets not implementing these new
>> standard pattern names, the existing stack_protect_set and
>> stack_protect_test pattern names are used.
>>
>> To be able to split PIC access after register allocation, the functions
>> had to be augmented to force a new PIC register load and to control
>> which register it loads into. This is because sharing the PIC register
>> between prologue and epilogue could lead to spilling due to CSE again
>> which an attacker could use to control what the canary gets compared
>> against.
>>
>> ChangeLog entries are as follows:
>>
>> *** gcc/ChangeLog ***
>>
>> 2018-08-09  Thomas Preud'homme  <thomas.preudhomme@linaro.org>
>>
>>      * target-insns.def (stack_protect_combined_set): Define new standard
>>      pattern name.
>>      (stack_protect_combined_test): Likewise.
>>      * cfgexpand.c (stack_protect_prologue): Try new
>>      stack_protect_combined_set pattern first.
>>      * function.c (stack_protect_epilogue): Try new
>>      stack_protect_combined_test pattern first.
>>      * config/arm/arm.c (require_pic_register): Add pic_reg and compute_now
>>      parameters to control which register to use as PIC register and force
>>      reloading PIC register respectively.  Insert in the stream of insns if
>>      possible.
>>      (legitimize_pic_address): Expose above new parameters in prototype and
>>      adapt recursive calls accordingly.
>>      (arm_legitimize_address): Adapt to new legitimize_pic_address
>>      prototype.
>>      (thumb_legitimize_address): Likewise.
>>      (arm_emit_call_insn): Adapt to new require_pic_register prototype.
>>      * config/arm/arm-protos.h (legitimize_pic_address): Adapt to prototype
>>      change.
>>      * config/arm/predicated.md (guard_operand): New predicate.

Typo, predicates.md is the filename.

Looks ok to me otherwise.
Thank you for your patience.

Kyrill

>>      * config/arm/arm.md (movsi expander): Adapt to legitimize_pic_address
>>      prototype change.
>>      (stack_protect_combined_set): New insn_and_split pattern.
>>      (stack_protect_set): New insn pattern.
>>      (stack_protect_combined_test): New insn_and_split pattern.
>>      (stack_protect_test): New insn pattern.
>>      * config/arm/unspecs.md (UNSPEC_SP_SET): New unspec.
>>      (UNSPEC_SP_TEST): Likewise.
>>      * doc/md.texi (stack_protect_combined_set): Document new standard
>>      pattern name.
>>      (stack_protect_set): Clarify that the operand for guard's address is
>>      legal.
>>      (stack_protect_combined_test): Document new standard pattern name.
>>      (stack_protect_test): Clarify that the operand for guard's address is
>>      legal.
>>
>> *** gcc/testsuite/ChangeLog ***
>>
>> 2018-07-05  Thomas Preud'homme  <thomas.preudhomme@linaro.org>
>>
>>      * gcc.target/arm/pr85434.c: New test.

>>
>> Testing:
>>
>> native x86_64: bootstrap + testsuite -> no regression, can see failures with previous version of patch but not with new version
>> native powerpc64: bootstrap + testsuite -> no regression, can see failures from pr86834 with previous version of patch but not with new version
>> cross ARM Linux: build + testsuite -> no regression
>> native ARM Thumb-2: bootstrap + testsuite + glibc build + glibc test -> no regression
>> native ARM Arm: bootstrap + testsuite + glibc build + glibc test -> no regression
>> Aarch64: bootstrap + testsuite + glibc build + glibc test-> no regression
>>
>> Is this ok for trunk?
>>
>> Best regards,
>>
>> Thomas


* Re: [PATCH, ARM] PR85434: Prevent spilling of stack protector guard's address on ARM
  2018-09-25 16:13   ` Kyrill Tkachov
@ 2018-10-23 13:17     ` Thomas Preudhomme
  2018-10-24 10:38       ` Thomas Preudhomme
  0 siblings, 1 reply; 20+ messages in thread
From: Thomas Preudhomme @ 2018-10-23 13:17 UTC (permalink / raw)
  To: kyrylo.tkachov; +Cc: Ramana Radhakrishnan, Richard Earnshaw, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 10029 bytes --]

[Removing Jeff Law since middle end code hasn't changed]

Hi,

Given how memory operands are reloaded even with an X constraint, I've
reworked the patch for the combined set and combined test instructions
to keep the mem out of the match_operand and used an expander to
generate the right instruction pattern. I've also fixed some
longstanding issues with the patch when flag_pic is true and with
constraints for Thumb-1 that I hadn't noticed before due to using
dg-cmp-results in conjunction with test_summary which does not show
NA->FAIL (see [1]).
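
To illustrate the kind of transition that went unnoticed, here is a
minimal sketch in C of a result comparison that reports a test which is
absent from the baseline (NA) but fails in the new run. The enum, the
function name and the policy are hypothetical; this is not taken from
dg-cmp-results or test_summary.

```c
#include <stdbool.h>

/* Hypothetical test outcomes; OUTCOME_NA means the test has no entry in
   that run.  */
enum outcome { OUTCOME_NA, OUTCOME_PASS, OUTCOME_FAIL };

/* Return true if going from BEFORE to AFTER should be reported.  A test
   that newly shows up as FAIL is reported, as is PASS->FAIL.  */
static bool
is_regression (enum outcome before, enum outcome after)
{
  if (after != OUTCOME_FAIL)
    return false;
  return before == OUTCOME_PASS || before == OUTCOME_NA;
}
```

With such a policy, a test that only exists once the patch is applied and
fails there is flagged instead of being silently skipped.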

All in all, I think the Arm code would do with a fresh review rather
than looking at the changes since last posted version. (unchanged)
ChangeLog entries are as follows:

*** gcc/ChangeLog ***

2018-08-09  Thomas Preud'homme  <thomas.preudhomme@linaro.org>

    * target-insns.def (stack_protect_combined_set): Define new standard
    pattern name.
    (stack_protect_combined_test): Likewise.
    * cfgexpand.c (stack_protect_prologue): Try new
    stack_protect_combined_set pattern first.
    * function.c (stack_protect_epilogue): Try new
    stack_protect_combined_test pattern first.
    * config/arm/arm.c (require_pic_register): Add pic_reg and compute_now
    parameters to control which register to use as PIC register and force
    reloading PIC register respectively.  Insert in the stream of insns if
    possible.
    (legitimize_pic_address): Expose above new parameters in prototype and
    adapt recursive calls accordingly.  Use pic_reg if non null instead of
    cached one.
    (arm_load_pic_register): Add pic_reg parameter and use it if non null.
    (arm_legitimize_address): Adapt to new legitimize_pic_address
    prototype.
    (thumb_legitimize_address): Likewise.
    (arm_emit_call_insn): Adapt to require_pic_register prototype change.
    (arm_expand_prologue): Adapt to arm_load_pic_register prototype change.
    (thumb1_expand_prologue): Likewise.
    * config/arm/arm-protos.h (legitimize_pic_address): Adapt to prototype
    change.
    (arm_load_pic_register): Likewise.
    * config/arm/predicates.md (guard_addr_operand): New predicate.
    (guard_operand): New predicate.
    * config/arm/arm.md (movsi expander): Adapt to legitimize_pic_address
    prototype change.
    (builtin_setjmp_receiver expander): Adapt to thumb1_expand_prologue
    prototype change.
    (stack_protect_combined_set): New expander.
    (stack_protect_combined_set_insn): New insn_and_split pattern.
    (stack_protect_set_insn): New insn pattern.
    (stack_protect_combined_test): New expander.
    (stack_protect_combined_test_insn): New insn_and_split pattern.
    (stack_protect_test_insn): New insn pattern.
    * config/arm/unspecs.md (UNSPEC_SP_SET): New unspec.
    (UNSPEC_SP_TEST): Likewise.
    * doc/md.texi (stack_protect_combined_set): Document new standard
    pattern name.
    (stack_protect_set): Clarify that the operand for guard's address is
    legal.
    (stack_protect_combined_test): Document new standard pattern name.
    (stack_protect_test): Clarify that the operand for guard's address is
    legal.
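
The pic_reg/compute_now contract from the arm.c entries above can be
sketched as follows. This is a standalone toy model in C and not the
arm.c code: the struct, its fields and the function name are made up for
illustration, and a register is reduced to a pointer to an int.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for per-function state (hypothetical, not GCC's).  */
struct func_state
{
  bool uses_pic;      /* A PIC register load was already emitted.  */
  const int *cached;  /* Cached PIC register, or NULL if none yet.  */
  int loads;          /* Number of PIC loads emitted so far.  */
};

/* Return the register holding the PIC base.  A caller-chosen PIC_REG is
   only valid together with COMPUTE_NOW and forces a fresh load into
   PIC_REG.  Otherwise the cached register is used, and FRESH_PSEUDO is
   loaded and cached on first use.  */
static const int *
need_pic_register (struct func_state *f, const int *pic_reg,
                   bool compute_now, const int *fresh_pseudo)
{
  assert (compute_now == (pic_reg != NULL));

  if (compute_now)
    {
      /* Load again into the caller's register, even if a load was
         already emitted earlier in the function.  */
      if (f->cached == NULL)
        f->cached = pic_reg;
      f->uses_pic = true;
      f->loads++;
      return pic_reg;
    }

  if (!f->uses_pic)
    {
      if (f->cached == NULL)
        f->cached = fresh_pseudo;
      f->uses_pic = true;
      f->loads++;
    }
  return f->cached;
}
```

In this model a regular caller passes NULL and false and shares the cached
register, while a caller that needs its own PIC base passes a dedicated
register with compute_now set and always gets a new load.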

*** gcc/testsuite/ChangeLog ***

2018-07-05  Thomas Preud'homme  <thomas.preudhomme@linaro.org>

    * gcc.target/arm/pr85434.c: New test.

Testing: Bootstrap and regression testing for Arm, Thumb-1 and Thumb-2
with (i) default flags, (ii) an extra -fstack-protector-all and (iii)
-fPIC -fstack-protector-all. A glibc build and testsuite run was also
performed for Arm and Thumb-2. Default flags show no regression. The
other runs show some expected scan-assembler failures (due to the stack
protector or -fPIC code sequence), guality failures (due to less
optimized code with the new stack protector code) and some execution
failures in sibcall-9 and sibcall-10 under -fPIC -fstack-protector-all
due to the PIC sequence for the global variable making the frame
layout different for the 2 functions (these become PASS when the
global variable is made static).

Is this ok for trunk?

Best regards,

Thomas

[1] https://gcc.gnu.org/ml/gcc-patches/2018-10/msg01412.html



[-- Attachment #2: fix_pr85434_prevent_spilling_stack_protector_guard_address.patch --]
[-- Type: text/x-patch, Size: 34775 bytes --]

From e7b6450555b96e7e22ffc7e82badfdffbf96d921 Mon Sep 17 00:00:00 2001
From: Thomas Preud'homme <thomas.preudhomme@linaro.org>
Date: Tue, 8 May 2018 15:47:05 +0100
Subject: [PATCH] PR85434: Prevent spilling of stack protector guard's address
 on ARM

In case of high register pressure in PIC mode, address of the stack
protector's guard can be spilled on ARM targets as shown in PR85434,
thus allowing an attacker to control what the canary would be compared
against. ARM does lack stack_protect_set and stack_protect_test insn
patterns, defining them does not help as the address is expanded
regularly and the patterns only deal with the copy and test of the
guard with the canary.

This problem does not occur for x86 targets because the PIC access and
the test can be done in the same instruction. Aarch64 is exempt too
because PIC access insn pattern are mov of UNSPEC which prevents it from
the second access in the epilogue being CSEd in cse_local pass with the
first access in the prologue.

The approach followed here is to create new "combined" set and test
standard pattern names that take the unexpanded guard and do the set or
test. This allows the target to use an opaque pattern (eg. using UNSPEC)
to hide the individual instructions being generated to the compiler and
split the pattern into generic load, compare and branch instruction
after register allocator, therefore avoiding any spilling. This is here
implemented for the ARM targets. For targets not implementing these new
standard pattern names, the existing stack_protect_set and
stack_protect_test pattern names are used.

To be able to split PIC access after register allocation, the functions
had to be augmented to force a new PIC register load and to control
which register it loads into. This is because sharing the PIC register
between prologue and epilogue could lead to spilling due to CSE again
which an attacker could use to control what the canary gets compared
against.

ChangeLog entries are as follows:

*** gcc/ChangeLog ***

2018-08-09  Thomas Preud'homme  <thomas.preudhomme@linaro.org>

	* target-insns.def (stack_protect_combined_set): Define new standard
	pattern name.
	(stack_protect_combined_test): Likewise.
	* cfgexpand.c (stack_protect_prologue): Try new
	stack_protect_combined_set pattern first.
	* function.c (stack_protect_epilogue): Try new
	stack_protect_combined_test pattern first.
	* config/arm/arm.c (require_pic_register): Add pic_reg and compute_now
	parameters to control which register to use as PIC register and force
	reloading PIC register respectively.  Insert in the stream of insns if
	possible.
	(legitimize_pic_address): Expose above new parameters in prototype and
	adapt recursive calls accordingly.  Use pic_reg if non null instead of
	cached one.
	(arm_load_pic_register): Add pic_reg parameter and use it if non null.
	(arm_legitimize_address): Adapt to new legitimize_pic_address
	prototype.
	(thumb_legitimize_address): Likewise.
	(arm_emit_call_insn): Adapt to require_pic_register prototype change.
	(arm_expand_prologue): Adapt to arm_load_pic_register prototype change.
	(thumb1_expand_prologue): Likewise.
	* config/arm/arm-protos.h (legitimize_pic_address): Adapt to prototype
	change.
	(arm_load_pic_register): Likewise.
	* config/arm/predicates.md (guard_addr_operand): New predicate.
	(guard_operand): New predicate.
	* config/arm/arm.md (movsi expander): Adapt to legitimize_pic_address
	prototype change.
	(builtin_setjmp_receiver expander): Adapt to thumb1_expand_prologue
	prototype change.
	(stack_protect_combined_set): New expander.
	(stack_protect_combined_set_insn): New insn_and_split pattern.
	(stack_protect_set_insn): New insn pattern.
	(stack_protect_combined_test): New expander.
	(stack_protect_combined_test_insn): New insn_and_split pattern.
	(stack_protect_test_insn): New insn pattern.
	* config/arm/unspecs.md (UNSPEC_SP_SET): New unspec.
	(UNSPEC_SP_TEST): Likewise.
	* doc/md.texi (stack_protect_combined_set): Document new standard
	pattern name.
	(stack_protect_set): Clarify that the operand for guard's address is
	legal.
	(stack_protect_combined_test): Document new standard pattern name.
	(stack_protect_test): Clarify that the operand for guard's address is
	legal.

*** gcc/testsuite/ChangeLog ***

2018-07-05  Thomas Preud'homme  <thomas.preudhomme@linaro.org>

	* gcc.target/arm/pr85434.c: New test.

Testing: Bootstrapped on ARM in both Arm and Thumb-2 mode as well as on
Aarch64. Testsuite shows no regression on these 3 variants, both
with default flags and with -fstack-protector-all.

Is this ok for trunk? If yes, would this be acceptable as a backport to
GCC 6, 7 and 8 provided that no regression is found?

Best regards,

Thomas

Change-Id: I993343e3063fb570af706624e08b475732a5ec57
---
 gcc/cfgexpand.c                        |  17 +++
 gcc/config/arm/arm-protos.h            |   4 +-
 gcc/config/arm/arm.c                   |  87 ++++++++---
 gcc/config/arm/arm.md                  | 158 ++++++++++++++++++-
 gcc/config/arm/predicates.md           |  17 +++
 gcc/config/arm/unspecs.md              |   3 +
 gcc/doc/md.texi                        |  55 ++++++-
 gcc/function.c                         |  32 +++-
 gcc/target-insns.def                   |   2 +
 gcc/testsuite/gcc.target/arm/pr85434.c | 200 +++++++++++++++++++++++++
 10 files changed, 532 insertions(+), 43 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/arm/pr85434.c

diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
index 35ca276e4ad..c8d0374f8ae 100644
--- a/gcc/cfgexpand.c
+++ b/gcc/cfgexpand.c
@@ -6131,6 +6131,23 @@ stack_protect_prologue (void)
   rtx x, y;
 
   x = expand_normal (crtl->stack_protect_guard);
+
+  if (targetm.have_stack_protect_combined_set () && guard_decl)
+    {
+      gcc_assert (DECL_P (guard_decl));
+      y = DECL_RTL (guard_decl);
+
+      /* Allow the target to compute address of Y and copy it to X without
+	 leaking Y into a register.  This combined address + copy pattern
+	 allows the target to prevent spilling of any intermediate results by
+	 splitting it after register allocator.  */
+      if (rtx_insn *insn = targetm.gen_stack_protect_combined_set (x, y))
+	{
+	  emit_insn (insn);
+	  return;
+	}
+    }
+
   if (guard_decl)
     y = expand_normal (guard_decl);
   else
diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 0dfb3ac59a6..f508bc5a455 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -28,7 +28,7 @@ extern enum unwind_info_type arm_except_unwind_info (struct gcc_options *);
 extern int use_return_insn (int, rtx);
 extern bool use_simple_return_p (void);
 extern enum reg_class arm_regno_class (int);
-extern void arm_load_pic_register (unsigned long);
+extern void arm_load_pic_register (unsigned long, rtx);
 extern int arm_volatile_func (void);
 extern void arm_expand_prologue (void);
 extern void arm_expand_epilogue (bool);
@@ -69,7 +69,7 @@ extern int const_ok_for_dimode_op (HOST_WIDE_INT, enum rtx_code);
 extern int arm_split_constant (RTX_CODE, machine_mode, rtx,
 			       HOST_WIDE_INT, rtx, rtx, int);
 extern int legitimate_pic_operand_p (rtx);
-extern rtx legitimize_pic_address (rtx, machine_mode, rtx);
+extern rtx legitimize_pic_address (rtx, machine_mode, rtx, rtx, bool);
 extern rtx legitimize_tls_address (rtx, rtx);
 extern bool arm_legitimate_address_p (machine_mode, rtx, bool);
 extern int arm_legitimate_address_outer_p (machine_mode, rtx, RTX_CODE, int);
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 8810df53aa3..763941868d2 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -7371,21 +7371,34 @@ legitimate_pic_operand_p (rtx x)
   return 1;
 }
 
-/* Record that the current function needs a PIC register.  Initialize
-   cfun->machine->pic_reg if we have not already done so.  */
+/* Record that the current function needs a PIC register.  If PIC_REG is null,
+   a new pseudo is allocated as PIC register, otherwise PIC_REG is used.  In
+   both cases cfun->machine->pic_reg is initialized if we have not already done
+   so.  COMPUTE_NOW decides whether and where to set the PIC register.  If true,
+   PIC register is reloaded in the current position of the instruction stream
+   regardless of whether it was loaded before.  Otherwise, it is only loaded
+   if not already done so (crtl->uses_pic_offset_table is null).  Note that
+   nonnull PIC_REG is only supported iff COMPUTE_NOW is true and null PIC_REG
+   is only supported iff COMPUTE_NOW is false.  */
 
 static void
-require_pic_register (void)
+require_pic_register (rtx pic_reg, bool compute_now)
 {
+  gcc_assert (compute_now == (pic_reg != NULL_RTX));
+
   /* A lot of the logic here is made obscure by the fact that this
      routine gets called as part of the rtx cost estimation process.
      We don't want those calls to affect any assumptions about the real
      function; and further, we can't call entry_of_function() until we
      start the real expansion process.  */
-  if (!crtl->uses_pic_offset_table)
+  if (!crtl->uses_pic_offset_table || compute_now)
     {
-      gcc_assert (can_create_pseudo_p ());
+      gcc_assert (can_create_pseudo_p ()
+		  || (pic_reg != NULL_RTX
+		      && REG_P (pic_reg)
+		      && GET_MODE (pic_reg) == Pmode));
       if (arm_pic_register != INVALID_REGNUM
+	  && !compute_now
 	  && !(TARGET_THUMB1 && arm_pic_register > LAST_LO_REGNUM))
 	{
 	  if (!cfun->machine->pic_reg)
@@ -7401,8 +7414,10 @@ require_pic_register (void)
 	{
 	  rtx_insn *seq, *insn;
 
+	  if (pic_reg == NULL_RTX)
+	    pic_reg = gen_reg_rtx (Pmode);
 	  if (!cfun->machine->pic_reg)
-	    cfun->machine->pic_reg = gen_reg_rtx (Pmode);
+	    cfun->machine->pic_reg = pic_reg;
 
 	  /* Play games to avoid marking the function as needing pic
 	     if we are being called as part of the cost-estimation
@@ -7413,11 +7428,12 @@ require_pic_register (void)
 	      start_sequence ();
 
 	      if (TARGET_THUMB1 && arm_pic_register != INVALID_REGNUM
-		  && arm_pic_register > LAST_LO_REGNUM)
+		  && arm_pic_register > LAST_LO_REGNUM
+		  && !compute_now)
 		emit_move_insn (cfun->machine->pic_reg,
 				gen_rtx_REG (Pmode, arm_pic_register));
 	      else
-		arm_load_pic_register (0UL);
+		arm_load_pic_register (0UL, pic_reg);
 
 	      seq = get_insns ();
 	      end_sequence ();
@@ -7430,16 +7446,33 @@ require_pic_register (void)
 	         we can't yet emit instructions directly in the final
 		 insn stream.  Queue the insns on the entry edge, they will
 		 be committed after everything else is expanded.  */
-	      insert_insn_on_edge (seq,
-				   single_succ_edge (ENTRY_BLOCK_PTR_FOR_FN (cfun)));
+	      if (currently_expanding_to_rtl)
+		insert_insn_on_edge (seq,
+				     single_succ_edge
+				     (ENTRY_BLOCK_PTR_FOR_FN (cfun)));
+	      else
+		emit_insn (seq);
 	    }
 	}
     }
 }
 
+/* Legitimize PIC load to ORIG into REG.  If REG is NULL, a new pseudo is
+   created to hold the result of the load.  If not NULL, PIC_REG indicates
+   which register to use as PIC register, otherwise it is decided by register
+   allocator.  COMPUTE_NOW forces the PIC register to be loaded at the current
+   location in the instruction stream, regardless of whether it was loaded
+   previously.  Note that nonnull PIC_REG is only supported iff COMPUTE_NOW is
+   true and null PIC_REG is only supported iff COMPUTE_NOW is false.
+
+   Returns the register REG into which the PIC load is performed.  */
+
 rtx
-legitimize_pic_address (rtx orig, machine_mode mode, rtx reg)
+legitimize_pic_address (rtx orig, machine_mode mode, rtx reg, rtx pic_reg,
+			bool compute_now)
 {
+  gcc_assert (compute_now == (pic_reg != NULL_RTX));
+
   if (GET_CODE (orig) == SYMBOL_REF
       || GET_CODE (orig) == LABEL_REF)
     {
@@ -7472,9 +7505,12 @@ legitimize_pic_address (rtx orig, machine_mode mode, rtx reg)
 	  rtx mem;
 
 	  /* If this function doesn't have a pic register, create one now.  */
-	  require_pic_register ();
+	  require_pic_register (pic_reg, compute_now);
+
+	  if (pic_reg == NULL_RTX)
+	    pic_reg = cfun->machine->pic_reg;
 
-	  pat = gen_calculate_pic_address (reg, cfun->machine->pic_reg, orig);
+	  pat = gen_calculate_pic_address (reg, pic_reg, orig);
 
 	  /* Make the MEM as close to a constant as possible.  */
 	  mem = SET_SRC (pat);
@@ -7523,9 +7559,11 @@ legitimize_pic_address (rtx orig, machine_mode mode, rtx reg)
 
       gcc_assert (GET_CODE (XEXP (orig, 0)) == PLUS);
 
-      base = legitimize_pic_address (XEXP (XEXP (orig, 0), 0), Pmode, reg);
+      base = legitimize_pic_address (XEXP (XEXP (orig, 0), 0), Pmode, reg,
+				     pic_reg, compute_now);
       offset = legitimize_pic_address (XEXP (XEXP (orig, 0), 1), Pmode,
-				       base == reg ? 0 : reg);
+				       base == reg ? 0 : reg, pic_reg,
+				       compute_now);
 
       if (CONST_INT_P (offset))
 	{
@@ -7625,16 +7663,17 @@ static GTY(()) int pic_labelno;
    low register.  */
 
 void
-arm_load_pic_register (unsigned long saved_regs ATTRIBUTE_UNUSED)
+arm_load_pic_register (unsigned long saved_regs ATTRIBUTE_UNUSED, rtx pic_reg)
 {
-  rtx l1, labelno, pic_tmp, pic_rtx, pic_reg;
+  rtx l1, labelno, pic_tmp, pic_rtx;
 
   if (crtl->uses_pic_offset_table == 0 || TARGET_SINGLE_PIC_BASE)
     return;
 
   gcc_assert (flag_pic);
 
-  pic_reg = cfun->machine->pic_reg;
+  if (pic_reg == NULL_RTX)
+    pic_reg = cfun->machine->pic_reg;
   if (TARGET_VXWORKS_RTP)
     {
       pic_rtx = gen_rtx_SYMBOL_REF (Pmode, VXWORKS_GOTT_BASE);
@@ -8710,7 +8749,8 @@ arm_legitimize_address (rtx x, rtx orig_x, machine_mode mode)
     {
       /* We need to find and carefully transform any SYMBOL and LABEL
 	 references; so go back to the original address expression.  */
-      rtx new_x = legitimize_pic_address (orig_x, mode, NULL_RTX);
+      rtx new_x = legitimize_pic_address (orig_x, mode, NULL_RTX, NULL_RTX,
+					  false /*compute_now*/);
 
       if (new_x != orig_x)
 	x = new_x;
@@ -8778,7 +8818,8 @@ thumb_legitimize_address (rtx x, rtx orig_x, machine_mode mode)
     {
       /* We need to find and carefully transform any SYMBOL and LABEL
 	 references; so go back to the original address expression.  */
-      rtx new_x = legitimize_pic_address (orig_x, mode, NULL_RTX);
+      rtx new_x = legitimize_pic_address (orig_x, mode, NULL_RTX, NULL_RTX,
+					  false /*compute_now*/);
 
       if (new_x != orig_x)
 	x = new_x;
@@ -18066,7 +18107,7 @@ arm_emit_call_insn (rtx pat, rtx addr, bool sibcall)
 	  ? !targetm.binds_local_p (SYMBOL_REF_DECL (addr))
 	  : !SYMBOL_REF_LOCAL_P (addr)))
     {
-      require_pic_register ();
+      require_pic_register (NULL_RTX, false /*compute_now*/);
       use_reg (&CALL_INSN_FUNCTION_USAGE (insn), cfun->machine->pic_reg);
     }
 
@@ -21998,7 +22039,7 @@ arm_expand_prologue (void)
       mask &= THUMB2_WORK_REGS;
       if (!IS_NESTED (func_type))
 	mask |= (1 << IP_REGNUM);
-      arm_load_pic_register (mask);
+      arm_load_pic_register (mask, 0);
     }
 
   /* If we are profiling, make sure no instructions are scheduled before
@@ -25229,7 +25270,7 @@ thumb1_expand_prologue (void)
   /* Load the pic register before setting the frame pointer,
      so we can use r7 as a temporary work register.  */
   if (flag_pic && arm_pic_register != INVALID_REGNUM)
-    arm_load_pic_register (live_regs_mask);
+    arm_load_pic_register (live_regs_mask, 0);
 
   if (!frame_pointer_needed && CALLER_INTERWORKING_SLOT_SIZE > 0)
     emit_move_insn (gen_rtx_REG (Pmode, ARM_HARD_FRAME_POINTER_REGNUM),
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 270b8e454b3..9a5fe570c62 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -6021,7 +6021,8 @@
       operands[1] = legitimize_pic_address (operands[1], SImode,
 					    (!can_create_pseudo_p ()
 					     ? operands[0]
-					     : 0));
+					     : NULL_RTX), NULL_RTX,
+					    false /*compute_now*/);
   }
   "
 )
@@ -6309,7 +6310,7 @@
   /* r3 is clobbered by set/longjmp, so we can use it as a scratch
      register.  */
   if (arm_pic_register != INVALID_REGNUM)
-    arm_load_pic_register (1UL << 3);
+    arm_load_pic_register (1UL << 3, 0);
   DONE;
 }")
 
@@ -8634,6 +8635,159 @@
    (set_attr "conds" "clob")]
 )
 
+;; Named patterns for stack smashing protection.
+(define_expand "stack_protect_combined_set"
+  [(parallel
+     [(set (match_operand:SI 0 "memory_operand" "")
+	   (unspec:SI [(match_operand:SI 1 "guard_operand" "")]
+		      UNSPEC_SP_SET))
+      (clobber (match_scratch:SI 2 ""))
+      (clobber (match_scratch:SI 3 ""))
+      (clobber (reg:CC CC_REGNUM))])]
+  ""
+  ""
+)
+
+;; Use a separate insn from the above expand to be able to have the mem outside
+;; the operand #1 when register allocation comes. This is needed to prevent LRA
+;; from reloading the guard since we need to control how PIC access is done in
+;; the -fpic/-fPIC case (see COMPUTE_NOW parameter when calling
+;; legitimize_pic_address ()).
+(define_insn_and_split "*stack_protect_combined_set_insn"
+  [(set (match_operand:SI 0 "memory_operand" "=m,m")
+	(unspec:SI [(mem:SI (match_operand:SI 1 "guard_addr_operand" "X,X"))]
+		   UNSPEC_SP_SET))
+   (clobber (match_scratch:SI 2 "=&l,&r"))
+   (clobber (match_scratch:SI 3 "=&l,&r"))
+   (clobber (reg:CC CC_REGNUM))]
+  ""
+  "#"
+  "reload_completed"
+  [(parallel [(set (match_dup 0) (unspec:SI [(mem:SI (match_dup 2))]
+					    UNSPEC_SP_SET))
+	      (clobber (match_dup 2))
+	      (clobber (reg:CC CC_REGNUM))])]
+  "
+{
+  if (flag_pic)
+    {
+      /* Forces recomputing of GOT base now.  */
+      legitimize_pic_address (operands[1], SImode, operands[2], operands[3],
+			      true /*compute_now*/);
+    }
+  else
+    {
+      if (address_operand (operands[1], SImode))
+	operands[2] = operands[1];
+      else
+	{
+	  rtx mem = XEXP (force_const_mem (SImode, operands[1]), 0);
+	  emit_move_insn (operands[2], mem);
+	}
+    }
+}"
+  [(set_attr "arch" "t1,32")]
+)
+
+(define_insn "*stack_protect_set_insn"
+  [(set (match_operand:SI 0 "memory_operand" "=m,m,m")
+	(unspec:SI [(mem:SI (match_operand:SI 1 "register_operand" "+&l,&r,&r"))]
+	 UNSPEC_SP_SET))
+   (clobber (match_dup 1))
+   (clobber (reg:CC CC_REGNUM))]
+  ""
+  "@
+   ldr\\t%1, [%1]\;str\\t%1, %0\;movs\t%1,#0
+   ldr\\t%1, [%1]\;str\\t%1, %0\;movs\t%1,#0
+   ldr\\t%1, [%1]\;str\\t%1, %0\;mov\t%1,#0"
+  [(set_attr "length" "8,12,12")
+   (set_attr "conds" "clob")
+   (set_attr "type" "multiple")
+   (set_attr "arch" "t,t2,a")]
+)
+
+(define_expand "stack_protect_combined_test"
+  [(parallel
+     [(set (pc)
+	   (if_then_else
+		(eq (match_operand:SI 0 "memory_operand" "")
+		    (unspec:SI [(match_operand:SI 1 "guard_operand" "")]
+			       UNSPEC_SP_TEST))
+		(label_ref (match_operand 2))
+		(pc)))
+      (clobber (match_scratch:SI 3 ""))
+      (clobber (match_scratch:SI 4 ""))
+      (clobber (reg:CC CC_REGNUM))])]
+  ""
+  ""
+)
+
+;; Use a separate insn from the above expand to be able to have the mem outside
+;; the operand #1 when register allocation comes. This is needed to prevent LRA
+;; from reloading the guard since we need to control how PIC access is done in
+;; the -fpic/-fPIC case (see COMPUTE_NOW parameter when calling
+;; legitimize_pic_address ()).
+(define_insn_and_split "*stack_protect_combined_test_insn"
+  [(set (pc)
+	(if_then_else
+		(eq (match_operand:SI 0 "memory_operand" "m,m")
+		    (unspec:SI [(mem:SI (match_operand:SI 1 "guard_addr_operand" "X,X"))]
+			       UNSPEC_SP_TEST))
+		(label_ref (match_operand 2))
+		(pc)))
+   (clobber (match_scratch:SI 3 "=&l,&r"))
+   (clobber (match_scratch:SI 4 "=&l,&r"))
+   (clobber (reg:CC CC_REGNUM))]
+  ""
+  "#"
+  "reload_completed"
+  [(const_int 0)]
+{
+  rtx eq;
+
+  if (flag_pic)
+    {
+      /* Forces recomputing of GOT base now.  */
+      legitimize_pic_address (operands[1], SImode, operands[3], operands[4],
+			      true /*compute_now*/);
+    }
+  else
+    {
+      if (address_operand (operands[1], SImode))
+	operands[3] = operands[1];
+      else
+	{
+	  rtx mem = XEXP (force_const_mem (SImode, operands[1]), 0);
+	  emit_move_insn (operands[3], mem);
+	}
+    }
+  emit_insn (gen_stack_protect_test_insn (operands[4], operands[0],
+					  operands[3]));
+  eq = gen_rtx_EQ (VOIDmode, operands[4], const0_rtx);
+  emit_jump_insn (gen_cbranchsi4 (eq, operands[4], const0_rtx, operands[2]));
+  DONE;
+}
+  [(set_attr "arch" "t1,32")]
+)
+
+(define_insn "stack_protect_test_insn"
+  [(set (match_operand:SI 0 "register_operand" "=l,r,r")
+	(unspec:SI [(match_operand:SI 1 "memory_operand" "m,m,m")
+		    (mem:SI (match_operand:SI 2 "register_operand" "+l,r,r"))]
+	 UNSPEC_SP_TEST))
+   (clobber (match_dup 2))
+   (clobber (reg:CC CC_REGNUM))]
+  ""
+  "@
+   ldr\t%0, [%2]\;ldr\t%2, %1\;eors\t%0, %2, %0
+   ldr\t%0, [%2]\;ldr\t%2, %1\;eors\t%0, %2, %0
+   ldr\t%0, [%2]\;ldr\t%2, %1\;eor\t%0, %2, %0"
+  [(set_attr "length" "8,12,12")
+   (set_attr "conds" "clob")
+   (set_attr "type" "multiple")
+   (set_attr "arch" "t,t2,a")]
+)
+
 (define_expand "casesi"
   [(match_operand:SI 0 "s_register_operand" "")	; index to jump on
    (match_operand:SI 1 "const_int_operand" "")	; lower bound
diff --git a/gcc/config/arm/predicates.md b/gcc/config/arm/predicates.md
index 7e198f9bce4..69718ee9c7a 100644
--- a/gcc/config/arm/predicates.md
+++ b/gcc/config/arm/predicates.md
@@ -31,6 +31,23 @@
 	      || REGNO_REG_CLASS (REGNO (op)) != NO_REGS));
 })
 
+; Predicate for stack protector guard's address in
+; stack_protect_combined_set_insn and stack_protect_combined_test_insn patterns
+(define_predicate "guard_addr_operand"
+  (match_test "true")
+{
+  return (CONSTANT_ADDRESS_P (op)
+	  || !targetm.cannot_force_const_mem (mode, op));
+})
+
+; Predicate for stack protector guard in stack_protect_combined_set and
+; stack_protect_combined_test patterns
+(define_predicate "guard_operand"
+  (match_code "mem")
+{
+  return guard_addr_operand (XEXP (op, 0), mode);
+})
+
 (define_predicate "imm_for_neon_inv_logic_operand"
   (match_code "const_vector")
 {
diff --git a/gcc/config/arm/unspecs.md b/gcc/config/arm/unspecs.md
index 19416736ef9..8f9dbcb08dc 100644
--- a/gcc/config/arm/unspecs.md
+++ b/gcc/config/arm/unspecs.md
@@ -86,6 +86,9 @@
   UNSPEC_PROBE_STACK    ; Probe stack memory reference
   UNSPEC_NONSECURE_MEM	; Represent non-secure memory in ARMv8-M with
 			; security extension
+  UNSPEC_SP_SET		; Represent the setting of stack protector's canary
+  UNSPEC_SP_TEST	; Represent the testing of stack protector's canary
+			; against the guard.
 ])
 
 (define_c_enum "unspec" [
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 4801d68a207..0667a242ef3 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -7424,22 +7424,61 @@ builtins.
 The get/set patterns have a single output/input operand respectively,
 with @var{mode} intended to be @code{Pmode}.
 
+@cindex @code{stack_protect_combined_set} instruction pattern
+@item @samp{stack_protect_combined_set}
+This pattern, if defined, moves a @code{ptr_mode} value from an address
+whose declaration RTX is given in operand 1 to the memory in operand 0
+without leaving the value in a register afterward.  If several
+instructions are needed by the target to perform the operation (eg. to
+load the address from a GOT entry then load the @code{ptr_mode} value
+and finally store it), it is the backend's responsibility to ensure no
+intermediate result gets spilled.  This is to avoid leaking the value
+some place that an attacker might use to rewrite the stack guard slot
+after having clobbered it.
+
+If this pattern is not defined, then the address declaration is
+expanded first in the standard way and a @code{stack_protect_set}
+pattern is then generated to move the value from that address to the
+address in operand 0.
+
 @cindex @code{stack_protect_set} instruction pattern
 @item @samp{stack_protect_set}
-This pattern, if defined, moves a @code{ptr_mode} value from the memory
-in operand 1 to the memory in operand 0 without leaving the value in
-a register afterward.  This is to avoid leaking the value some place
-that an attacker might use to rewrite the stack guard slot after
-having clobbered it.
+This pattern, if defined, moves a @code{ptr_mode} value from the valid
+memory location in operand 1 to the memory in operand 0 without leaving
+the value in a register afterward.  This is to avoid leaking the value
+some place that an attacker might use to rewrite the stack guard slot
+after having clobbered it.
+
+Note: on targets where the addressing modes do not allow loading
+directly from the stack guard address, the address is expanded in the
+standard way first, which could cause some spills.
 
 If this pattern is not defined, then a plain move pattern is generated.
 
+@cindex @code{stack_protect_combined_test} instruction pattern
+@item @samp{stack_protect_combined_test}
+This pattern, if defined, compares a @code{ptr_mode} value from an
+address whose declaration RTX is given in operand 1 with the memory in
+operand 0 without leaving the value in a register afterward and
+branches to operand 2 if the values were equal.  If several
+instructions are needed by the target to perform the operation (eg. to
+load the address from a GOT entry then load the @code{ptr_mode} value
+and finally compare it), it is the backend's responsibility to ensure no
+intermediate result gets spilled.  This is to avoid leaking the value
+some place that an attacker might use to rewrite the stack guard slot
+after having clobbered it.
+
+If this pattern is not defined, then the address declaration is
+expanded first in the standard way and a @code{stack_protect_test}
+pattern is then generated to compare the value from that address to the
+value at the memory in operand 0.
+
 @cindex @code{stack_protect_test} instruction pattern
 @item @samp{stack_protect_test}
 This pattern, if defined, compares a @code{ptr_mode} value from the
-memory in operand 1 with the memory in operand 0 without leaving the
-value in a register afterward and branches to operand 2 if the values
-were equal.
+valid memory location in operand 1 with the memory in operand 0 without
+leaving the value in a register afterward and branches to operand 2 if
+the values were equal.
 
 If this pattern is not defined, then a plain compare pattern and
 conditional branch pattern is used.
diff --git a/gcc/function.c b/gcc/function.c
index 302438323c8..65f34db0651 100644
--- a/gcc/function.c
+++ b/gcc/function.c
@@ -4892,18 +4892,34 @@ stack_protect_epilogue (void)
   tree guard_decl = targetm.stack_protect_guard ();
   rtx_code_label *label = gen_label_rtx ();
   rtx x, y;
-  rtx_insn *seq;
+  rtx_insn *seq = 0;
 
   x = expand_normal (crtl->stack_protect_guard);
-  if (guard_decl)
-    y = expand_normal (guard_decl);
+
+  if (targetm.have_stack_protect_combined_test () && guard_decl)
+    {
+      gcc_assert (DECL_P (guard_decl));
+      y = DECL_RTL (guard_decl);
+      /* Allow the target to compute address of Y and compare it with X without
+	 leaking Y into a register.  This combined address + compare pattern
+	 allows the target to prevent spilling of any intermediate results by
+	 splitting it after register allocator.  */
+      seq = targetm.gen_stack_protect_combined_test (x, y, label);
+    }
   else
-    y = const0_rtx;
+    {
+      if (guard_decl)
+	y = expand_normal (guard_decl);
+      else
+	y = const0_rtx;
+
+      /* Allow the target to compare Y with X without leaking either into
+	 a register.  */
+      if (targetm.have_stack_protect_test ())
+	seq = targetm.gen_stack_protect_test (x, y, label);
+    }
 
-  /* Allow the target to compare Y with X without leaking either into
-     a register.  */
-  if (targetm.have_stack_protect_test ()
-      && ((seq = targetm.gen_stack_protect_test (x, y, label)) != NULL_RTX))
+  if (seq)
     emit_insn (seq);
   else
     emit_cmp_and_jump_insns (x, y, EQ, NULL_RTX, ptr_mode, 1, label);
diff --git a/gcc/target-insns.def b/gcc/target-insns.def
index 9a552c3d11c..d39889b3522 100644
--- a/gcc/target-insns.def
+++ b/gcc/target-insns.def
@@ -96,7 +96,9 @@ DEF_TARGET_INSN (sibcall_value, (rtx x0, rtx x1, rtx opt2, rtx opt3,
 DEF_TARGET_INSN (simple_return, (void))
 DEF_TARGET_INSN (split_stack_prologue, (void))
 DEF_TARGET_INSN (split_stack_space_check, (rtx x0, rtx x1))
+DEF_TARGET_INSN (stack_protect_combined_set, (rtx x0, rtx x1))
 DEF_TARGET_INSN (stack_protect_set, (rtx x0, rtx x1))
+DEF_TARGET_INSN (stack_protect_combined_test, (rtx x0, rtx x1, rtx x2))
 DEF_TARGET_INSN (stack_protect_test, (rtx x0, rtx x1, rtx x2))
 DEF_TARGET_INSN (store_multiple, (rtx x0, rtx x1, rtx x2))
 DEF_TARGET_INSN (tablejump, (rtx x0, rtx x1))
diff --git a/gcc/testsuite/gcc.target/arm/pr85434.c b/gcc/testsuite/gcc.target/arm/pr85434.c
new file mode 100644
index 00000000000..4143a861f7c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/pr85434.c
@@ -0,0 +1,200 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target fstack_protector }*/
+/* { dg-require-effective-target fpic }*/
+/* { dg-additional-options "-Os -fpic -fstack-protector-strong" } */
+
+#include <stddef.h>
+#include <stdint.h>
+
+
+static const unsigned char base64_enc_map[64] =
+{
+    'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J',
+    'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T',
+    'U', 'V', 'W', 'X', 'Y', 'Z', 'a', 'b', 'c', 'd',
+    'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n',
+    'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x',
+    'y', 'z', '0', '1', '2', '3', '4', '5', '6', '7',
+    '8', '9', '+', '/'
+};
+
+#define BASE64_SIZE_T_MAX   ( (size_t) -1 ) /* SIZE_T_MAX is not standard */
+
+
+void doSmth(void *x);
+
+#include <string.h>
+
+
+void check(int n) {
+  
+    if (!(n % 2 && n % 3 && n % 5)) {
+ __asm__  (   "add    r8, r8, #1;" );
+    }
+}
+
+uint32_t test(
+  uint32_t a1,
+  uint32_t a2,
+  size_t a3,
+  size_t a4,
+  size_t a5,
+  size_t a6)
+{
+  uint32_t nResult = 0;
+  uint8_t* h = 0L;
+  uint8_t X[128];
+  uint8_t mac[64];
+  size_t len;
+
+  doSmth(&a1);
+  doSmth(&a2);
+  doSmth(&a3);
+  doSmth(&a4);
+  doSmth(&a5);
+  doSmth(&a6);
+
+  if (a1 && a2 && a3 && a4 && a5 && a6) {
+    nResult = 1;
+    h = (void*)X;
+    len = sizeof(X);
+    memset(X, a2, len);
+    len -= 64;
+    memcpy(mac ,X, len);
+    *(h + len) = a6;
+
+    {
+
+
+        unsigned char *dst = X;
+        size_t dlen = a3;
+        size_t *olen = &a6;
+        const unsigned char *src = mac;
+        size_t slen = a4;
+    size_t i, n;
+    int C1, C2, C3;
+    unsigned char *p;
+
+    if( slen == 0 )
+    {
+        *olen = 0;
+        return( 0 );
+    }
+
+    n = slen / 3 + ( slen % 3 != 0 );
+
+    if( n > ( BASE64_SIZE_T_MAX - 1 ) / 4 )
+    {
+        *olen = BASE64_SIZE_T_MAX;
+        return( 0 );
+    }
+
+    n *= 4;
+
+    if( ( dlen < n + 1 ) || ( NULL == dst ) )
+    {
+        *olen = n + 1;
+        return( 0 );
+    }
+
+    n = ( slen / 3 ) * 3;
+
+    for( i = 0, p = dst; i < n; i += 3 )
+    {
+        C1 = *src++;
+        C2 = *src++;
+        C3 = *src++;
+
+        check(i);
+
+        *p++ = base64_enc_map[(C1 >> 2) & 0x3F];
+        *p++ = base64_enc_map[(((C1 &  3) << 4) + (C2 >> 4)) & 0x3F];
+        *p++ = base64_enc_map[(((C2 & 15) << 2) + (C3 >> 6)) & 0x3F];
+        *p++ = base64_enc_map[C3 & 0x3F];
+    }
+
+    if( i < slen )
+    {
+        C1 = *src++;
+        C2 = ( ( i + 1 ) < slen ) ? *src++ : 0;
+
+        *p++ = base64_enc_map[(C1 >> 2) & 0x3F];
+        *p++ = base64_enc_map[(((C1 & 3) << 4) + (C2 >> 4)) & 0x3F];
+
+        if( ( i + 1 ) < slen )
+             *p++ = base64_enc_map[((C2 & 15) << 2) & 0x3F];
+        else *p++ = '=';
+
+        *p++ = '=';
+    }
+
+    *olen = p - dst;
+    *p = 0;
+
+}
+
+  __asm__ ("mov r8, %0;" : "=r" ( nResult ));
+  }
+  else
+  {
+    nResult = 2;
+  }
+
+  doSmth(X);
+  doSmth(mac);
+
+
+  return nResult;
+}
+
+/* The pattern below catches sequences of instructions that were generated
+   for ARM and Thumb-2 before the fix for this PR. They are of the form:
+
+   ldr     rX, <offset from sp or fp>
+   <optional non ldr instructions>
+   ldr     rY, <offset from sp or fp>
+   ldr     rZ, [rX]
+   <optional non ldr instructions>
+   cmp     rY, rZ
+   <optional non cmp instructions>
+   bl      __stack_chk_fail
+
+   Ideally the optional block would check that the various rX, rY and rZ
+   registers are not set, but this is not possible because back references
+   are illegal in lookahead expressions in Tcl, so the only construct that
+   allows negating a regexp cannot use the back references to those
+   registers.  Instead we go for the heuristic of allowing non ldr/cmp
+   instructions with the assumptions that (i) those are not part of the stack
+   protector sequences and (ii) they would only be scheduled here if they don't
+   conflict with registers used by the stack protector.
+
+   Note on the regexp logic:
+   Allowing non X instructions (where X is ldr or cmp) is done by looking for
+   some non newline spaces, followed by something which is not X, followed by
+   an alphanumeric character followed by anything but a newline and ended by a
+   newline, the whole thing repeated any number of times.  The alphanumeric
+   character is there to force the match of the negative lookahead for X to
+   only happen after all the initial spaces and thus to check the mnemonic.
+   This prevents it from matching one of the initial spaces.  */
+/* { dg-final { scan-assembler-not {ldr[ \t]+([^,]+), \[(?:sp|fp)[^]]*\](?:\n[ \t]+(?!ldr)\w[^\n]*)*\n[ \t]+ldr[ \t]+([^,]+), \[(?:sp|fp)[^]]*\]\n[ \t]+ldr[ \t]+([^,]+), \[\1\](?:\n[ \t]+(?!ldr)\w[^\n]*)*\n[ \t]+cmp[ \t]+\2, \3(?:\n[ \t]+(?!cmp)\w[^\n]*)*\n[ \t]+bl[ \t]+__stack_chk_fail} } } */
+
+/* Likewise for Thumb-1 sequences of instructions prior to the fix for this PR
+   which had the form:
+
+   ldr     rS, <offset from sp or fp>
+   <optional non ldr instructions>
+   ldr     rT, <PC relative offset>
+   <optional non ldr instructions>
+   ldr     rX, [rS, rT]
+   <optional non ldr instructions>
+   ldr     rY, <offset from sp or fp>
+   ldr     rZ, [rX]
+   <optional non ldr instructions>
+   cmp     rY, rZ
+   <optional non cmp instructions>
+   bl      __stack_chk_fail
+
+  Note on the regexp logic:
+  PC relative offset is checked by looking for a source operand that does not
+  contain [ or ].  */
+/* { dg-final { scan-assembler-not {ldr[ \t]+([^,]+), \[(?:sp|fp)[^]]*\](?:\n[ \t]+(?!ldr)\w[^\n]*)*\n[ \t]+ldr[ \t]+([^,]+), [^][\n]*(?:\n[ \t]+(?!ldr)\w[^\n]*)*\n[ \t]+ldr[ \t]+([^,]+), \[\1, \2\](?:\n[ \t]+(?!ldr)\w[^\n]*)*\n[ \t]+ldr[ \t]+([^,]+), \[(?:sp|fp)[^]]*\]\n[ \t]+ldr[ \t]+([^,]+), \[\3\](?:\n[ \t]+(?!ldr)\w[^\n]*)*\n[ \t]+cmp[ \t]+\4, \5(?:\n[ \t]+(?!cmp)\w[^\n]*)*\n[ \t]+bl[ \t]+__stack_chk_fail} } } */
-- 
2.19.1
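
For readers following the function.c hunk above, the order in which
stack_protect_epilogue picks a way to emit the canary check can be
modelled as a small decision function.  This is only a sketch: the enum
and the function name select_guard_check are hypothetical and do not
exist in GCC, and the model ignores the case where a pattern generator
returns no sequence (the real code then falls back to the plain compare
and jump).

```c
#include <stdbool.h>

/* Hypothetical labels for the three ways the epilogue check can be
   emitted; these names are illustrative only.  */
enum guard_check_kind
{
  CHECK_COMBINED_TEST,  /* stack_protect_combined_test pattern */
  CHECK_TEST,           /* stack_protect_test pattern */
  CHECK_PLAIN_COMPARE   /* plain compare and conditional branch */
};

/* Model of the selection order in the patched epilogue: the combined
   pattern is used only if the target provides it and a guard
   declaration exists; otherwise the guard is expanded normally and
   stack_protect_test is used if available; otherwise a plain compare
   and jump is emitted.  */
enum guard_check_kind
select_guard_check (bool have_combined_test, bool have_test,
                    bool have_guard_decl)
{
  if (have_combined_test && have_guard_decl)
    return CHECK_COMBINED_TEST;
  if (have_test)
    return CHECK_TEST;
  return CHECK_PLAIN_COMPARE;
}
```

Under this model a target that defines neither pattern keeps the
pre-patch behaviour, which matches the intent of only generating the
new patterns when the target has them.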


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH, ARM] PR85434: Prevent spilling of stack protector guard's address on ARM
  2018-10-23 13:17     ` Thomas Preudhomme
@ 2018-10-24 10:38       ` Thomas Preudhomme
  2018-10-25 16:10         ` Thomas Preudhomme
  0 siblings, 1 reply; 20+ messages in thread
From: Thomas Preudhomme @ 2018-10-24 10:38 UTC (permalink / raw)
  To: kyrylo.tkachov; +Cc: Ramana Radhakrishnan, Richard Earnshaw, gcc-patches

Please hold on for the reviews, found a small improvement that could
be done. Am testing it right now, should have something by tonight or
tomorrow.

Best regards,

Thomas
On Tue, 23 Oct 2018 at 13:35, Thomas Preudhomme
<thomas.preudhomme@linaro.org> wrote:
>
> [Removing Jeff Law since middle end code hasn't changed]
>
> Hi,
>
> Given how memory operands are reloaded even with an X constraint, I've
> reworked the patch for the combined set and combined test instructions
> to keep the mem out of the match_operand and used an expander to
> generate the right instruction pattern. I've also fixed some
> longstanding issues with the patch when flag_pic is true and with
> constraints for Thumb-1 that I hadn't noticed before due to using
> dg-cmp-results in conjunction with test_summary which does not show
> NA->FAIL (see [1]).
>
> All in all, I think the Arm code would do with a fresh review rather
> than looking at the changes since last posted version. (unchanged)
> ChangeLog entries are as follows:
>
> *** gcc/ChangeLog ***
>
> 2018-08-09  Thomas Preud'homme  <thomas.preudhomme@linaro.org>
>
>     * target-insns.def (stack_protect_combined_set): Define new standard
>     pattern name.
>     (stack_protect_combined_test): Likewise.
>     * cfgexpand.c (stack_protect_prologue): Try new
>     stack_protect_combined_set pattern first.
>     * function.c (stack_protect_epilogue): Try new
>     stack_protect_combined_test pattern first.
>     * config/arm/arm.c (require_pic_register): Add pic_reg and compute_now
>     parameters to control which register to use as PIC register and force
>     reloading PIC register respectively.  Insert in the stream of insns if
>     possible.
>     (legitimize_pic_address): Expose above new parameters in prototype and
>     adapt recursive calls accordingly.  Use pic_reg if non null instead of
>     cached one.
>     (arm_load_pic_register): Add pic_reg parameter and use it if non null.
>     (arm_legitimize_address): Adapt to new legitimize_pic_address
>     prototype.
>     (thumb_legitimize_address): Likewise.
>     (arm_emit_call_insn): Adapt to require_pic_register prototype change.
>     (arm_expand_prologue): Adapt to arm_load_pic_register prototype change.
>     (thumb1_expand_prologue): Likewise.
>     * config/arm/arm-protos.h (legitimize_pic_address): Adapt to prototype
>     change.
>     (arm_load_pic_register): Likewise.
>     * config/arm/predicates.md (guard_addr_operand): New predicate.
>     (guard_operand): New predicate.
>     * config/arm/arm.md (movsi expander): Adapt to legitimize_pic_address
>     prototype change.
>     (builtin_setjmp_receiver expander): Adapt to arm_load_pic_register
>     prototype change.
>     (stack_protect_combined_set): New expander.
>     (stack_protect_combined_set_insn): New insn_and_split pattern.
>     (stack_protect_set_insn): New insn pattern.
>     (stack_protect_combined_test): New expander.
>     (stack_protect_combined_test_insn): New insn_and_split pattern.
>     (stack_protect_test_insn): New insn pattern.
>     * config/arm/unspecs.md (UNSPEC_SP_SET): New unspec.
>     (UNSPEC_SP_TEST): Likewise.
>     * doc/md.texi (stack_protect_combined_set): Document new standard
>     pattern name.
>     (stack_protect_set): Clarify that the operand for guard's address is
>     legal.
>     (stack_protect_combined_test): Document new standard pattern name.
>     (stack_protect_test): Clarify that the operand for guard's address is
>     legal.
>
> *** gcc/testsuite/ChangeLog ***
>
> 2018-07-05  Thomas Preud'homme  <thomas.preudhomme@linaro.org>
>
>     * gcc.target/arm/pr85434.c: New test.
>
> Testing: Bootstrap and regression testing for Arm, Thumb-1 and Thumb-2
> with (i) default flags, (ii) an extra -fstack-protector-all and (iii)
> -fPIC -fstack-protector-all. A glibc build and testsuite run were also
> performed for Arm and Thumb-2. Default flags show no regression and
> the other runs have some expected scan-assembler failures (due to stack
> protector or -fPIC code sequences), as well as guality failures (due to less
> optimized code with the new stack protector code) and some execution
> failures in sibcall-9 and sibcall-10 under -fPIC -fstack-protector-all
> due to the PIC sequence for the global variable making the frame
> layout different for the 2 functions (these become PASS if making the
> global variable static).
>
> Is this ok for trunk?
>
> Best regards,
>
> Thomas
>
> [1] https://gcc.gnu.org/ml/gcc-patches/2018-10/msg01412.html
>
>
> On Tue, 25 Sep 2018 at 17:10, Kyrill Tkachov
> <kyrylo.tkachov@foss.arm.com> wrote:
> >
> > Hi Thomas,
> >
> > On 29/08/18 10:51, Thomas Preudhomme wrote:
> > > Resend hopefully without HTML this time.
> > >
> > > On Wed, 29 Aug 2018 at 10:49, Thomas Preudhomme
> > > <thomas.preudhomme@linaro.org> wrote:
> > >> Hi,
> > >>
> > >> I've reworked the patch fixing PR85434 (spilling of stack protector guard's address on ARM) to address the testsuite regression on powerpc and x86 as well as glibc testsuite regression on ARM. Issues were due to unconditionally attempting to generate the new patterns. The code now tests if there is a pattern for them for the target before generating them. In the ARM side of the patch, I've also added a more specific predicate for the new patterns. The new patch is found below.
> > >>
> > >>
> > >> In case of high register pressure in PIC mode, address of the stack
> > >> protector's guard can be spilled on ARM targets as shown in PR85434,
> > >> thus allowing an attacker to control what the canary would be compared
> > >> against. ARM does lack stack_protect_set and stack_protect_test insn
> > >> patterns, defining them does not help as the address is expanded
> > >> regularly and the patterns only deal with the copy and test of the
> > >> guard with the canary.
> > >>
> > >> This problem does not occur for x86 targets because the PIC access and
> > >> the test can be done in the same instruction. Aarch64 is exempt too
> > >> because PIC access insn pattern are mov of UNSPEC which prevents it from
> > >> the second access in the epilogue being CSEd in cse_local pass with the
> > >> first access in the prologue.
> > >>
> > >> The approach followed here is to create new "combined" set and test
> > >> standard pattern names that take the unexpanded guard and do the set or
> > >> test. This allows the target to use an opaque pattern (eg. using UNSPEC)
> > >> to hide the individual instructions being generated to the compiler and
> > >> split the pattern into generic load, compare and branch instruction
> > >> after register allocator, therefore avoiding any spilling. This is here
> > >> implemented for the ARM targets. For targets not implementing these new
> > >> standard pattern names, the existing stack_protect_set and
> > >> stack_protect_test pattern names are used.
> > >>
> > >> To be able to split PIC access after register allocation, the functions
> > >> had to be augmented to force a new PIC register load and to control
> > >> which register it loads into. This is because sharing the PIC register
> > >> between prologue and epilogue could lead to spilling due to CSE again
> > >> which an attacker could use to control what the canary gets compared
> > >> against.
> > >>
> > >> ChangeLog entries are as follows:
> > >>
> > >> *** gcc/ChangeLog ***
> > >>
> > >> 2018-08-09  Thomas Preud'homme  <thomas.preudhomme@linaro.org>
> > >>
> > >>      * target-insns.def (stack_protect_combined_set): Define new standard
> > >>      pattern name.
> > >>      (stack_protect_combined_test): Likewise.
> > >>      * cfgexpand.c (stack_protect_prologue): Try new
> > >>      stack_protect_combined_set pattern first.
> > >>      * function.c (stack_protect_epilogue): Try new
> > >>      stack_protect_combined_test pattern first.
> > >>      * config/arm/arm.c (require_pic_register): Add pic_reg and compute_now
> > >>      parameters to control which register to use as PIC register and force
> > >>      reloading PIC register respectively.  Insert in the stream of insns if
> > >>      possible.
> > >>      (legitimize_pic_address): Expose above new parameters in prototype and
> > >>      adapt recursive calls accordingly.
> > >>      (arm_legitimize_address): Adapt to new legitimize_pic_address
> > >>      prototype.
> > >>      (thumb_legitimize_address): Likewise.
> > >>      (arm_emit_call_insn): Adapt to new require_pic_register prototype.
> > >>      * config/arm/arm-protos.h (legitimize_pic_address): Adapt to prototype
> > >>      change.
> > >>      * config/arm/predicated.md (guard_operand): New predicate.
> >
> > Typo, predicates.md is the filename.
> >
> > Looks ok to me otherwise.
> > Thank you for your patience.
> >
> > Kyrill
> >
> > >>      * config/arm/arm.md (movsi expander): Adapt to legitimize_pic_address
> > >>      prototype change.
> > >>      (stack_protect_combined_set): New insn_and_split pattern.
> > >>      (stack_protect_set): New insn pattern.
> > >>      (stack_protect_combined_test): New insn_and_split pattern.
> > >>      (stack_protect_test): New insn pattern.
> > >>      * config/arm/unspecs.md (UNSPEC_SP_SET): New unspec.
> > >>      (UNSPEC_SP_TEST): Likewise.
> > >>      * doc/md.texi (stack_protect_combined_set): Document new standard
> > >>      pattern name.
> > >>      (stack_protect_set): Clarify that the operand for guard's address is
> > >>      legal.
> > >>      (stack_protect_combined_test): Document new standard pattern name.
> > >>      (stack_protect_test): Clarify that the operand for guard's address is
> > >>      legal.
> > >>
> > >> *** gcc/testsuite/ChangeLog ***
> > >>
> > >> 2018-07-05  Thomas Preud'homme  <thomas.preudhomme@linaro.org>
> > >>
> > >>      * gcc.target/arm/pr85434.c: New test.
> >
> > >>
> > >> Testing:
> > >>
> > >> native x86_64: bootstrap + testsuite -> no regression, can see failures with previous version of patch but not with new version
> > >> native powerpc64: bootstrap + testsuite -> no regression, can see failures from pr86834 with previous version of patch but not with new version
> > >> cross ARM Linux: build + testsuite -> no regression
> > >> native ARM Thumb-2: bootstrap + testsuite + glibc build + glibc test -> no regression
> > >> native ARM Arm: bootstrap + testsuite + glibc build + glibc test -> no regression
> > >> Aarch64: bootstrap + testsuite + glibc build + glibc test-> no regression
> > >>
> > >> Is this ok for trunk?
> > >>
> > >> Best regards,
> > >>
> > >> Thomas
> >
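
The pic_reg and compute_now parameters described in the ChangeLog above
can be illustrated with a toy model.  This is a sketch under stated
assumptions, not the code in arm.c: struct pic_state, NO_REG and
model_require_pic_register are hypothetical names, registers are plain
integers, and "emitting a load" is reduced to incrementing a counter.
It only shows the two behaviours the ChangeLog names: a non-null
pic_reg overrides the cached register, and compute_now forces a fresh
load instead of reusing an earlier one.

```c
#include <stdbool.h>

#define NO_REG (-1)  /* hypothetical stand-in for NULL_RTX */

/* Toy per-function state; none of these fields exist in GCC.  */
struct pic_state
{
  int cached_reg;   /* register cached for the whole function */
  bool loaded;      /* whether the cached PIC base was set up */
  int load_count;   /* number of PIC base loads emitted so far */
};

/* Return the register holding the PIC base.  PIC_REG selects the
   register to use, or NO_REG for the cached one.  COMPUTE_NOW forces a
   new load every time, so the result never depends on a value that was
   computed earlier and might have been spilled in between.  */
int
model_require_pic_register (struct pic_state *st, int pic_reg,
                            bool compute_now)
{
  int reg = (pic_reg == NO_REG) ? st->cached_reg : pic_reg;

  if (compute_now)
    st->load_count++;           /* forced reload, never shared */
  else if (!st->loaded)
    {
      st->load_count++;         /* first use: load once and cache */
      st->loaded = true;
    }
  return reg;
}
```

In this model ordinary callers pass NO_REG and false and share a single
load, while the stack protector splitters pass a scratch register and
true, paying for one extra load on each use.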

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH, ARM] PR85434: Prevent spilling of stack protector guard's address on ARM
  2018-10-24 10:38       ` Thomas Preudhomme
@ 2018-10-25 16:10         ` Thomas Preudhomme
  2018-10-27  4:37           ` Thomas Preudhomme
  0 siblings, 1 reply; 20+ messages in thread
From: Thomas Preudhomme @ 2018-10-25 16:10 UTC (permalink / raw)
  To: kyrylo.tkachov; +Cc: Ramana Radhakrishnan, Richard Earnshaw, gcc-patches

Good thing I did, found a missing earlyclobber in the process.
Rerunning all tests again.

Best regards,

Thomas

On Wed, 24 Oct 2018 at 10:13, Thomas Preudhomme
<thomas.preudhomme@linaro.org> wrote:
>
> Please hold on for the reviews, found a small improvement that could
> be done. Am testing it right now, should have something by tonight or
> tomorrow.
>
> Best regards,
>
> Thomas
> On Tue, 23 Oct 2018 at 13:35, Thomas Preudhomme
> <thomas.preudhomme@linaro.org> wrote:
> >
> > [Removing Jeff Law since middle end code hasn't changed]
> >
> > Hi,
> >
> > Given how memory operands are reloaded even with an X constraint, I've
> > reworked the patch for the combined set and combined test instructions
> > to keep the mem out of the match_operand and used an expander to
> > generate the right instruction pattern. I've also fixed some
> > longstanding issues with the patch when flag_pic is true and with
> > constraints for Thumb-1 that I hadn't noticed before due to using
> > dg-cmp-results in conjunction with test_summary which does not show
> > NA->FAIL (see [1]).
> >
> > All in all, I think the Arm code would do with a fresh review rather
> > than looking at the changes since last posted version. (unchanged)
> > ChangeLog entries are as follows:
> >
> > *** gcc/ChangeLog ***
> >
> > 2018-08-09  Thomas Preud'homme  <thomas.preudhomme@linaro.org>
> >
> >     * target-insns.def (stack_protect_combined_set): Define new standard
> >     pattern name.
> >     (stack_protect_combined_test): Likewise.
> >     * cfgexpand.c (stack_protect_prologue): Try new
> >     stack_protect_combined_set pattern first.
> >     * function.c (stack_protect_epilogue): Try new
> >     stack_protect_combined_test pattern first.
> >     * config/arm/arm.c (require_pic_register): Add pic_reg and compute_now
> >     parameters to control which register to use as PIC register and force
> >     reloading PIC register respectively.  Insert in the stream of insns if
> >     possible.
> >     (legitimize_pic_address): Expose above new parameters in prototype and
> >     adapt recursive calls accordingly.  Use pic_reg if non null instead of
> >     cached one.
> >     (arm_load_pic_register): Add pic_reg parameter and use it if non null.
> >     (arm_legitimize_address): Adapt to new legitimize_pic_address
> >     prototype.
> >     (thumb_legitimize_address): Likewise.
> >     (arm_emit_call_insn): Adapt to require_pic_register prototype change.
> >     (arm_expand_prologue): Adapt to arm_load_pic_register prototype change.
> >     (thumb1_expand_prologue): Likewise.
> >     * config/arm/arm-protos.h (legitimize_pic_address): Adapt to prototype
> >     change.
> >     (arm_load_pic_register): Likewise.
> >     * config/arm/predicates.md (guard_addr_operand): New predicate.
> >     (guard_operand): New predicate.
> >     * config/arm/arm.md (movsi expander): Adapt to legitimize_pic_address
> >     prototype change.
> >     (builtin_setjmp_receiver expander): Adapt to thumb1_expand_prologue
> >     prototype change.
> >     (stack_protect_combined_set): New expander.
> >     (stack_protect_combined_set_insn): New insn_and_split pattern.
> >     (stack_protect_set_insn): New insn pattern.
> >     (stack_protect_combined_test): New expander.
> >     (stack_protect_combined_test_insn): New insn_and_split pattern.
> >     (stack_protect_test_insn): New insn pattern.
> >     * config/arm/unspecs.md (UNSPEC_SP_SET): New unspec.
> >     (UNSPEC_SP_TEST): Likewise.
> >     * doc/md.texi (stack_protect_combined_set): Document new standard
> >     pattern name.
> >     (stack_protect_set): Clarify that the operand for guard's address is
> >     legal.
> >     (stack_protect_combined_test): Document new standard pattern name.
> >     (stack_protect_test): Clarify that the operand for guard's address is
> >     legal.
> >
> > *** gcc/testsuite/ChangeLog ***
> >
> > 2018-07-05  Thomas Preud'homme  <thomas.preudhomme@linaro.org>
> >
> >     * gcc.target/arm/pr85434.c: New test.
> >
> > Testing: Bootstrap and regression testing for Arm, Thumb-1 and Thumb-2
> > with (i) default flags, (ii) an extra -fstack-protector-all and (iii)
> > -fPIC -fstack-protector-all. A glibc build and testsuite run was also
> > performed for Arm and Thumb-2. Default flags show no regression and
> > the other runs have some expected scan-assembler failures (due to the
> > stack protector or -fPIC code sequence), as well as guality failures
> > (due to less optimized code with the new stack protector code) and some
> > execution failures in sibcall-9 and sibcall-10 under -fPIC
> > -fstack-protector-all due to the PIC sequence for the global variable
> > making the frame layout different for the 2 functions (these become
> > PASS if the global variable is made static).
> >
> > Is this ok for trunk?
> >
> > Best regards,
> >
> > Thomas
> >
> > [1] https://gcc.gnu.org/ml/gcc-patches/2018-10/msg01412.html
> >
> >
> > On Tue, 25 Sep 2018 at 17:10, Kyrill Tkachov
> > <kyrylo.tkachov@foss.arm.com> wrote:
> > >
> > > Hi Thomas,
> > >
> > > On 29/08/18 10:51, Thomas Preudhomme wrote:
> > > > Resend hopefully without HTML this time.
> > > >
> > > > On Wed, 29 Aug 2018 at 10:49, Thomas Preudhomme
> > > > <thomas.preudhomme@linaro.org> wrote:
> > > >> Hi,
> > > >>
> > > >> I've reworked the patch fixing PR85434 (spilling of stack protector guard's address on ARM) to address the testsuite regression on powerpc and x86 as well as glibc testsuite regression on ARM. Issues were due to unconditionally attempting to generate the new patterns. The code now tests if there is a pattern for them for the target before generating them. In the ARM side of the patch, I've also added a more specific predicate for the new patterns. The new patch is found below.
> > > >>
> > > >>
> > > >> In case of high register pressure in PIC mode, address of the stack
> > > >> protector's guard can be spilled on ARM targets as shown in PR85434,
> > > >> thus allowing an attacker to control what the canary would be compared
> > > >> against. ARM does lack stack_protect_set and stack_protect_test insn
> > > >> patterns, defining them does not help as the address is expanded
> > > >> regularly and the patterns only deal with the copy and test of the
> > > >> guard with the canary.
> > > >>
> > > >> This problem does not occur for x86 targets because the PIC access and
> > > >> the test can be done in the same instruction. Aarch64 is exempt too
> > > >> because PIC access insn pattern are mov of UNSPEC which prevents it from
> > > >> the second access in the epilogue being CSEd in cse_local pass with the
> > > >> first access in the prologue.
> > > >>
> > > >> The approach followed here is to create new "combined" set and test
> > > >> standard pattern names that take the unexpanded guard and do the set or
> > > >> test. This allows the target to use an opaque pattern (eg. using UNSPEC)
> > > >> to hide the individual instructions being generated to the compiler and
> > > >> split the pattern into generic load, compare and branch instruction
> > > >> after register allocator, therefore avoiding any spilling. This is here
> > > >> implemented for the ARM targets. For targets not implementing these new
> > > >> standard pattern names, the existing stack_protect_set and
> > > >> stack_protect_test pattern names are used.
> > > >>
> > > >> To be able to split PIC access after register allocation, the functions
> > > >> had to be augmented to force a new PIC register load and to control
> > > >> which register it loads into. This is because sharing the PIC register
> > > >> between prologue and epilogue could lead to spilling due to CSE again
> > > >> which an attacker could use to control what the canary gets compared
> > > >> against.
> > > >>
> > > >> ChangeLog entries are as follows:
> > > >>
> > > >> *** gcc/ChangeLog ***
> > > >>
> > > >> 2018-08-09  Thomas Preud'homme  <thomas.preudhomme@linaro.org>
> > > >>
> > > >>      * target-insns.def (stack_protect_combined_set): Define new standard
> > > >>      pattern name.
> > > >>      (stack_protect_combined_test): Likewise.
> > > >>      * cfgexpand.c (stack_protect_prologue): Try new
> > > >>      stack_protect_combined_set pattern first.
> > > >>      * function.c (stack_protect_epilogue): Try new
> > > >>      stack_protect_combined_test pattern first.
> > > >>      * config/arm/arm.c (require_pic_register): Add pic_reg and compute_now
> > > >>      parameters to control which register to use as PIC register and force
> > > >>      reloading PIC register respectively.  Insert in the stream of insns if
> > > >>      possible.
> > > >>      (legitimize_pic_address): Expose above new parameters in prototype and
> > > >>      adapt recursive calls accordingly.
> > > >>      (arm_legitimize_address): Adapt to new legitimize_pic_address
> > > >>      prototype.
> > > >>      (thumb_legitimize_address): Likewise.
> > > >>      (arm_emit_call_insn): Adapt to new require_pic_register prototype.
> > > >>      * config/arm/arm-protos.h (legitimize_pic_address): Adapt to prototype
> > > >>      change.
> > > >>      * config/arm/predicated.md (guard_operand): New predicate.
> > >
> > > Typo, predicates.md is the filename.
> > >
> > > Looks ok to me otherwise.
> > > Thank you for your patience.
> > >
> > > Kyrill
> > >
> > > >>      * config/arm/arm.md (movsi expander): Adapt to legitimize_pic_address
> > > >>      prototype change.
> > > >>      (stack_protect_combined_set): New insn_and_split pattern.
> > > >>      (stack_protect_set): New insn pattern.
> > > >>      (stack_protect_combined_test): New insn_and_split pattern.
> > > >>      (stack_protect_test): New insn pattern.
> > > >>      * config/arm/unspecs.md (UNSPEC_SP_SET): New unspec.
> > > >>      (UNSPEC_SP_TEST): Likewise.
> > > >>      * doc/md.texi (stack_protect_combined_set): Document new standard
> > > >>      pattern name.
> > > >>      (stack_protect_set): Clarify that the operand for guard's address is
> > > >>      legal.
> > > >>      (stack_protect_combined_test): Document new standard pattern name.
> > > >>      (stack_protect_test): Clarify that the operand for guard's address is
> > > >>      legal.
> > > >>
> > > >> *** gcc/testsuite/ChangeLog ***
> > > >>
> > > >> 2018-07-05  Thomas Preud'homme  <thomas.preudhomme@linaro.org>
> > > >>
> > > >>      * gcc.target/arm/pr85434.c: New test.
> > >
> > > >>
> > > >> Testing:
> > > >>
> > > >> native x86_64: bootstrap + testsuite -> no regression, can see failures with previous version of patch but not with new version
> > > >> native powerpc64: bootstrap + testsuite -> no regression, can see failures from pr86834 with previous version of patch but not with new version
> > > >> cross ARM Linux: build + testsuite -> no regression
> > > >> native ARM Thumb-2: bootstrap + testsuite + glibc build + glibc test -> no regression
> > > >> native ARM Arm: bootstrap + testsuite + glibc build + glibc test -> no regression
> > > >> Aarch64: bootstrap + testsuite + glibc build + glibc test-> no regression
> > > >>
> > > >> Is this ok for trunk?
> > > >>
> > > >> Best regards,
> > > >>
> > > >> Thomas
> > >


* Re: [PATCH, ARM] PR85434: Prevent spilling of stack protector guard's address on ARM
  2018-10-25 16:10         ` Thomas Preudhomme
@ 2018-10-27  4:37           ` Thomas Preudhomme
  2018-11-01 16:03             ` [PATCH, ARM, ping] " Thomas Preudhomme
  0 siblings, 1 reply; 20+ messages in thread
From: Thomas Preudhomme @ 2018-10-27  4:37 UTC (permalink / raw)
  To: kyrylo.tkachov; +Cc: Ramana Radhakrishnan, Richard Earnshaw, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 17122 bytes --]

Hi,

Please find attached an updated patch to fix PR85434: spilling of the
stack protector guard's address on ARM. Quite a few changes have been
made to the ARM part since the last round of review, so I think it makes
more sense to review it anew. I ran bootstrap + regression testsuite +
glibc build + glibc regression testsuite for Arm and Thumb-2, and
bootstrap + regression testsuite for Thumb-1. GCC's regression testsuite
was run in 3 configurations in all those cases:

- default configuration (no RUNTESTFLAGS)
- with -fstack-protector-all
- with -fPIC -fstack-protector-all (to exercise both code paths in the
stack protector's split code)

None of these shows any regression beyond some new scan failures with
-fstack-protector-all or -fPIC, due to unexpected code sequences for the
testcases concerned, and some guality swings due to less optimization
with the new stack protector on.

Patch description and ChangeLog below.

In case of high register pressure in PIC mode, the address of the stack
protector's guard can be spilled on ARM targets as shown in PR85434,
thus allowing an attacker to control what the canary would be compared
against. ARM does lack stack_protect_set and stack_protect_test insn
patterns, but defining them would not help because the address is
expanded regularly and the patterns only deal with the copy and test of
the guard with the canary.

This problem does not occur for x86 targets because the PIC access and
the test can be done in the same instruction. Aarch64 is exempt too
because its PIC access insn patterns are movs of UNSPEC, which prevents
the second access in the epilogue from being CSEd in the cse_local pass
with the first access in the prologue.

The approach followed here is to create new "combined" set and test
standard pattern names that take the unexpanded guard and do the set or
test. This allows the target to use an opaque pattern (e.g. using UNSPEC)
to hide the individual instructions being generated from the compiler
and to split the pattern into generic load, compare and branch
instructions after register allocation, therefore avoiding any spilling.
This is implemented here for the ARM targets. For targets not
implementing these new standard pattern names, the existing
stack_protect_set and stack_protect_test pattern names are used.
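
The selection order described above can be sketched as follows. This is
only a minimal, self-contained C model and not the code from the patch:
struct target_patterns, enum guard_set_strategy and
choose_guard_set_strategy are hypothetical names introduced for
illustration, and the final plain-move fallback is an assumption about
what happens when a target defines neither pattern.

```c
#include <stdbool.h>

/* Hypothetical description of which stack protector "set" patterns a
   target provides.  Not a GCC data structure.  */
struct target_patterns {
    bool has_combined_set;   /* target defines stack_protect_combined_set */
    bool has_plain_set;      /* target defines stack_protect_set */
};

/* Hypothetical outcome of the selection.  */
enum guard_set_strategy {
    GUARD_SET_COMBINED,      /* opaque pattern taking the unexpanded guard */
    GUARD_SET_PLAIN,         /* existing pattern, guard address expanded normally */
    GUARD_SET_GENERIC_MOVE   /* assumed fallback: ordinary move of the guard */
};

/* Try the new combined pattern first, then the existing pattern, then
   fall back to a generic move (the last step is an assumption).  */
static enum guard_set_strategy
choose_guard_set_strategy(const struct target_patterns *t)
{
    if (t->has_combined_set)
        return GUARD_SET_COMBINED;
    if (t->has_plain_set)
        return GUARD_SET_PLAIN;
    return GUARD_SET_GENERIC_MOVE;
}
```

In this model a target with the combined pattern always takes the
combined path, while a target that only defines stack_protect_set keeps
its current behaviour, which is the intent of testing for the pattern
before generating it.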

To be able to split PIC access after register allocation, the PIC
functions (require_pic_register, legitimize_pic_address and
arm_load_pic_register) had to be augmented to force a new PIC register
load and to control which register it loads into. This is because
sharing the PIC register between prologue and epilogue could lead to
spilling due to CSE again, which an attacker could use to control what
the canary gets compared against.
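
The effect of the two new parameters can be illustrated with a small
model. This is a sketch under stated assumptions, not the arm.c
implementation: struct pic_state and pic_base_register are hypothetical,
registers are plain integers, and "emitting a load" is reduced to
incrementing a counter. Only the roles of pic_reg (use this register
instead of the cached one if non null) and compute_now (force a reload)
come from the ChangeLog below.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-function PIC state.  */
struct pic_state {
    int cached_reg;      /* register number holding the cached PIC base */
    bool cached_valid;   /* true once the cached register has been loaded */
    int loads_emitted;   /* number of PIC base loads emitted so far */
};

/* Return the register that holds the PIC base for this access.
   pic_reg: if non-NULL, the register to use instead of the cached one.
   compute_now: if true, emit a fresh load even if a cached value exists.
   In this model a caller-supplied register is never cached, so it
   always gets a fresh load.  */
static int
pic_base_register(struct pic_state *st, const int *pic_reg, bool compute_now)
{
    int reg = pic_reg != NULL ? *pic_reg : st->cached_reg;
    bool fresh_load = compute_now || pic_reg != NULL || !st->cached_valid;

    if (fresh_load) {
        st->loads_emitted++;
        if (pic_reg == NULL)
            st->cached_valid = true;
    }
    return reg;
}
```

With this shape the stack protector split code can ask for a dedicated
register and a fresh load, so the guard's address never depends on a
value computed in the prologue that might have been spilled in between.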

ChangeLog entries are as follows:

*** gcc/ChangeLog ***

2018-10-26  Thomas Preud'homme  <thomas.preudhomme@linaro.org>

* target-insns.def (stack_protect_combined_set): Define new standard
pattern name.
(stack_protect_combined_test): Likewise.
* cfgexpand.c (stack_protect_prologue): Try new
stack_protect_combined_set pattern first.
* function.c (stack_protect_epilogue): Try new
stack_protect_combined_test pattern first.
* config/arm/arm.c (require_pic_register): Add pic_reg and compute_now
parameters to control which register to use as PIC register and force
reloading PIC register respectively.  Insert in the stream of insns if
possible.
(legitimize_pic_address): Expose above new parameters in prototype and
adapt recursive calls accordingly.  Use pic_reg if non null instead of
cached one.
(arm_load_pic_register): Add pic_reg parameter and use it if non null.
(arm_legitimize_address): Adapt to new legitimize_pic_address
prototype.
(thumb_legitimize_address): Likewise.
(arm_emit_call_insn): Adapt to require_pic_register prototype change.
(arm_expand_prologue): Adapt to arm_load_pic_register prototype change.
(thumb1_expand_prologue): Likewise.
* config/arm/arm-protos.h (legitimize_pic_address): Adapt to prototype
change.
(arm_load_pic_register): Likewise.
* config/arm/predicates.md (guard_addr_operand): New predicate.
(guard_operand): New predicate.
* config/arm/arm.md (movsi expander): Adapt to legitimize_pic_address
prototype change.
(builtin_setjmp_receiver expander): Adapt to thumb1_expand_prologue
prototype change.
(stack_protect_combined_set): New expander.
(stack_protect_combined_set_insn): New insn_and_split pattern.
(stack_protect_set_insn): New insn pattern.
(stack_protect_combined_test): New expander.
(stack_protect_combined_test_insn): New insn_and_split pattern.
(arm_stack_protect_test_insn): New insn pattern.
* config/arm/thumb1.md (thumb1_stack_protect_test_insn): New insn pattern.
* config/arm/unspecs.md (UNSPEC_SP_SET): New unspec.
(UNSPEC_SP_TEST): Likewise.
* doc/md.texi (stack_protect_combined_set): Document new standard
pattern name.
(stack_protect_set): Clarify that the operand for guard's address is
legal.
(stack_protect_combined_test): Document new standard pattern name.
(stack_protect_test): Clarify that the operand for guard's address is
legal.

*** gcc/testsuite/ChangeLog ***

2018-07-05  Thomas Preud'homme  <thomas.preudhomme@linaro.org>

* gcc.target/arm/pr85434.c: New test.

Is this ok for trunk?

Best regards,

Thomas

On Thu, 25 Oct 2018 at 15:54, Thomas Preudhomme
<thomas.preudhomme@linaro.org> wrote:
>
> Good thing I did, found a missing earlyclobber in the process.
> Rerunning all tests again.
>
> Best regards,
>
> Thomas
> On Wed, 24 Oct 2018 at 10:13, Thomas Preudhomme
> <thomas.preudhomme@linaro.org> wrote:
> >
> > Please hold on for the reviews, found a small improvement that could
> > be done. Am testing it right now, should have something by tonight or
> > tomorrow.
> >
> > Best regards,
> >
> > Thomas
> > On Tue, 23 Oct 2018 at 13:35, Thomas Preudhomme
> > <thomas.preudhomme@linaro.org> wrote:
> > >
> > > [Removing Jeff Law since middle end code hasn't changed]
> > >
> > > Hi,
> > >
> > > Given how memory operands are reloaded even with an X constraint, I've
> > > reworked the patch for the combined set and combined test instructions
> > > to keep the mem out of the match_operand and used an expander to
> > > generate the right instruction pattern. I've also fixed some
> > > longstanding issues with the patch when flag_pic is true and with
> > > constraints for Thumb-1 that I hadn't noticed before due to using
> > > dg-cmp-results in conjunction with test_summary which does not show
> > > NA->FAIL (see [1]).
> > >
> > > All in all, I think the Arm code would do with a fresh review rather
> > > than looking at the changes since last posted version. (unchanged)
> > > ChangeLog entries are as follows:
> > >
> > > *** gcc/ChangeLog ***
> > >
> > > 2018-08-09  Thomas Preud'homme  <thomas.preudhomme@linaro.org>
> > >
> > >     * target-insns.def (stack_protect_combined_set): Define new standard
> > >     pattern name.
> > >     (stack_protect_combined_test): Likewise.
> > >     * cfgexpand.c (stack_protect_prologue): Try new
> > >     stack_protect_combined_set pattern first.
> > >     * function.c (stack_protect_epilogue): Try new
> > >     stack_protect_combined_test pattern first.
> > >     * config/arm/arm.c (require_pic_register): Add pic_reg and compute_now
> > >     parameters to control which register to use as PIC register and force
> > >     reloading PIC register respectively.  Insert in the stream of insns if
> > >     possible.
> > >     (legitimize_pic_address): Expose above new parameters in prototype and
> > >     adapt recursive calls accordingly.  Use pic_reg if non null instead of
> > >     cached one.
> > >     (arm_load_pic_register): Add pic_reg parameter and use it if non null.
> > >     (arm_legitimize_address): Adapt to new legitimize_pic_address
> > >     prototype.
> > >     (thumb_legitimize_address): Likewise.
> > >     (arm_emit_call_insn): Adapt to require_pic_register prototype change.
> > >     (arm_expand_prologue): Adapt to arm_load_pic_register prototype change.
> > >     (thumb1_expand_prologue): Likewise.
> > >     * config/arm/arm-protos.h (legitimize_pic_address): Adapt to prototype
> > >     change.
> > >     (arm_load_pic_register): Likewise.
> > >     * config/arm/predicates.md (guard_addr_operand): New predicate.
> > >     (guard_operand): New predicate.
> > >     * config/arm/arm.md (movsi expander): Adapt to legitimize_pic_address
> > >     prototype change.
> > >     (builtin_setjmp_receiver expander): Adapt to thumb1_expand_prologue
> > >     prototype change.
> > >     (stack_protect_combined_set): New expander.
> > >     (stack_protect_combined_set_insn): New insn_and_split pattern.
> > >     (stack_protect_set_insn): New insn pattern.
> > >     (stack_protect_combined_test): New expander.
> > >     (stack_protect_combined_test_insn): New insn_and_split pattern.
> > >     (stack_protect_test_insn): New insn pattern.
> > >     * config/arm/unspecs.md (UNSPEC_SP_SET): New unspec.
> > >     (UNSPEC_SP_TEST): Likewise.
> > >     * doc/md.texi (stack_protect_combined_set): Document new standard
> > >     pattern name.
> > >     (stack_protect_set): Clarify that the operand for guard's address is
> > >     legal.
> > >     (stack_protect_combined_test): Document new standard pattern name.
> > >     (stack_protect_test): Clarify that the operand for guard's address is
> > >     legal.
> > >
> > > *** gcc/testsuite/ChangeLog ***
> > >
> > > 2018-07-05  Thomas Preud'homme  <thomas.preudhomme@linaro.org>
> > >
> > >     * gcc.target/arm/pr85434.c: New test.
> > >
> > > Testing: Bootstrap and regression testing for Arm, Thumb-1 and Thumb-2
> > > with (i) default flags, (ii) an extra -fstack-protector-all and (iii)
> > > -fPIC -fstack-protector-all. A glibc build and testsuite run was also
> > > performed for Arm and Thumb-2. Default flags show no regression and
> > > the other runs have some expected scan-assembler failures (due to the
> > > stack protector or -fPIC code sequence), as well as guality failures
> > > (due to less optimized code with the new stack protector code) and some
> > > execution failures in sibcall-9 and sibcall-10 under -fPIC
> > > -fstack-protector-all due to the PIC sequence for the global variable
> > > making the frame layout different for the 2 functions (these become
> > > PASS if the global variable is made static).
> > >
> > > Is this ok for trunk?
> > >
> > > Best regards,
> > >
> > > Thomas
> > >
> > > [1] https://gcc.gnu.org/ml/gcc-patches/2018-10/msg01412.html
> > >
> > >
> > > On Tue, 25 Sep 2018 at 17:10, Kyrill Tkachov
> > > <kyrylo.tkachov@foss.arm.com> wrote:
> > > >
> > > > Hi Thomas,
> > > >
> > > > On 29/08/18 10:51, Thomas Preudhomme wrote:
> > > > > Resend hopefully without HTML this time.
> > > > >
> > > > > On Wed, 29 Aug 2018 at 10:49, Thomas Preudhomme
> > > > > <thomas.preudhomme@linaro.org> wrote:
> > > > >> Hi,
> > > > >>
> > > > >> I've reworked the patch fixing PR85434 (spilling of stack protector guard's address on ARM) to address the testsuite regression on powerpc and x86 as well as glibc testsuite regression on ARM. Issues were due to unconditionally attempting to generate the new patterns. The code now tests if there is a pattern for them for the target before generating them. In the ARM side of the patch, I've also added a more specific predicate for the new patterns. The new patch is found below.
> > > > >>
> > > > >>
> > > > >> In case of high register pressure in PIC mode, address of the stack
> > > > >> protector's guard can be spilled on ARM targets as shown in PR85434,
> > > > >> thus allowing an attacker to control what the canary would be compared
> > > > >> against. ARM does lack stack_protect_set and stack_protect_test insn
> > > > >> patterns, defining them does not help as the address is expanded
> > > > >> regularly and the patterns only deal with the copy and test of the
> > > > >> guard with the canary.
> > > > >>
> > > > >> This problem does not occur for x86 targets because the PIC access and
> > > > >> the test can be done in the same instruction. Aarch64 is exempt too
> > > > >> because PIC access insn pattern are mov of UNSPEC which prevents it from
> > > > >> the second access in the epilogue being CSEd in cse_local pass with the
> > > > >> first access in the prologue.
> > > > >>
> > > > >> The approach followed here is to create new "combined" set and test
> > > > >> standard pattern names that take the unexpanded guard and do the set or
> > > > >> test. This allows the target to use an opaque pattern (eg. using UNSPEC)
> > > > >> to hide the individual instructions being generated to the compiler and
> > > > >> split the pattern into generic load, compare and branch instruction
> > > > >> after register allocator, therefore avoiding any spilling. This is here
> > > > >> implemented for the ARM targets. For targets not implementing these new
> > > > >> standard pattern names, the existing stack_protect_set and
> > > > >> stack_protect_test pattern names are used.
> > > > >>
> > > > >> To be able to split PIC access after register allocation, the functions
> > > > >> had to be augmented to force a new PIC register load and to control
> > > > >> which register it loads into. This is because sharing the PIC register
> > > > >> between prologue and epilogue could lead to spilling due to CSE again
> > > > >> which an attacker could use to control what the canary gets compared
> > > > >> against.
> > > > >>
> > > > >> ChangeLog entries are as follows:
> > > > >>
> > > > >> *** gcc/ChangeLog ***
> > > > >>
> > > > >> 2018-08-09  Thomas Preud'homme  <thomas.preudhomme@linaro.org>
> > > > >>
> > > > >>      * target-insns.def (stack_protect_combined_set): Define new standard
> > > > >>      pattern name.
> > > > >>      (stack_protect_combined_test): Likewise.
> > > > >>      * cfgexpand.c (stack_protect_prologue): Try new
> > > > >>      stack_protect_combined_set pattern first.
> > > > >>      * function.c (stack_protect_epilogue): Try new
> > > > >>      stack_protect_combined_test pattern first.
> > > > >>      * config/arm/arm.c (require_pic_register): Add pic_reg and compute_now
> > > > >>      parameters to control which register to use as PIC register and force
> > > > >>      reloading PIC register respectively.  Insert in the stream of insns if
> > > > >>      possible.
> > > > >>      (legitimize_pic_address): Expose above new parameters in prototype and
> > > > >>      adapt recursive calls accordingly.
> > > > >>      (arm_legitimize_address): Adapt to new legitimize_pic_address
> > > > >>      prototype.
> > > > >>      (thumb_legitimize_address): Likewise.
> > > > >>      (arm_emit_call_insn): Adapt to new require_pic_register prototype.
> > > > >>      * config/arm/arm-protos.h (legitimize_pic_address): Adapt to prototype
> > > > >>      change.
> > > > >>      * config/arm/predicated.md (guard_operand): New predicate.
> > > >
> > > > Typo, predicates.md is the filename.
> > > >
> > > > Looks ok to me otherwise.
> > > > Thank you for your patience.
> > > >
> > > > Kyrill
> > > >
> > > > >>      * config/arm/arm.md (movsi expander): Adapt to legitimize_pic_address
> > > > >>      prototype change.
> > > > >>      (stack_protect_combined_set): New insn_and_split pattern.
> > > > >>      (stack_protect_set): New insn pattern.
> > > > >>      (stack_protect_combined_test): New insn_and_split pattern.
> > > > >>      (stack_protect_test): New insn pattern.
> > > > >>      * config/arm/unspecs.md (UNSPEC_SP_SET): New unspec.
> > > > >>      (UNSPEC_SP_TEST): Likewise.
> > > > >>      * doc/md.texi (stack_protect_combined_set): Document new standard
> > > > >>      pattern name.
> > > > >>      (stack_protect_set): Clarify that the operand for guard's address is
> > > > >>      legal.
> > > > >>      (stack_protect_combined_test): Document new standard pattern name.
> > > > >>      (stack_protect_test): Clarify that the operand for guard's address is
> > > > >>      legal.
> > > > >>
> > > > >> *** gcc/testsuite/ChangeLog ***
> > > > >>
> > > > >> 2018-07-05  Thomas Preud'homme  <thomas.preudhomme@linaro.org>
> > > > >>
> > > > >>      * gcc.target/arm/pr85434.c: New test.
> > > >
> > > > >>
> > > > >> Testing:
> > > > >>
> > > > >> native x86_64: bootstrap + testsuite -> no regression, can see failures with previous version of patch but not with new version
> > > > >> native powerpc64: bootstrap + testsuite -> no regression, can see failures from pr86834 with previous version of patch but not with new version
> > > > >> cross ARM Linux: build + testsuite -> no regression
> > > > >> native ARM Thumb-2: bootstrap + testsuite + glibc build + glibc test -> no regression
> > > > >> native ARM Arm: bootstrap + testsuite + glibc build + glibc test -> no regression
> > > > >> Aarch64: bootstrap + testsuite + glibc build + glibc test-> no regression
> > > > >>
> > > > >> Is this ok for trunk?
> > > > >>
> > > > >> Best regards,
> > > > >>
> > > > >> Thomas
> > > >

[-- Attachment #2: fix_pr85434_prevent_spilling_stack_protector_guard_address.patch --]
[-- Type: text/x-patch, Size: 35572 bytes --]

From a2dba2bf283c3a7f5a11cf28a2b16b789c66a592 Mon Sep 17 00:00:00 2001
From: Thomas Preud'homme <thomas.preudhomme@linaro.org>
Date: Tue, 8 May 2018 15:47:05 +0100
Subject: [PATCH] PR85434: Prevent spilling of stack protector guard's address
 on ARM

In case of high register pressure in PIC mode, the address of the stack
protector's guard can be spilled on ARM targets as shown in PR85434,
thus allowing an attacker to control what the canary would be compared
against. ARM does lack stack_protect_set and stack_protect_test insn
patterns, but defining them would not help because the address is
expanded regularly and the patterns only deal with the copy and test of
the guard with the canary.

This problem does not occur for x86 targets because the PIC access and
the test can be done in the same instruction. Aarch64 is exempt too
because its PIC access insn patterns are movs of UNSPEC, which prevents
the second access in the epilogue from being CSEd in the cse_local pass
with the first access in the prologue.

The approach followed here is to create new "combined" set and test
standard pattern names that take the unexpanded guard and do the set or
test. This allows the target to use an opaque pattern (e.g. using UNSPEC)
to hide the individual instructions being generated from the compiler
and to split the pattern into generic load, compare and branch
instructions after register allocation, therefore avoiding any spilling.
This is implemented here for the ARM targets. For targets not
implementing these new standard pattern names, the existing
stack_protect_set and stack_protect_test pattern names are used.

To be able to split PIC access after register allocation, the functions
had to be augmented to force a new PIC register load and to control
which register it loads into. This is because sharing the PIC register
between prologue and epilogue could lead to spilling due to CSE again
which an attacker could use to control what the canary gets compared
against.

ChangeLog entries are as follows:

*** gcc/ChangeLog ***

2018-10-26  Thomas Preud'homme  <thomas.preudhomme@linaro.org>

	* target-insns.def (stack_protect_combined_set): Define new standard
	pattern name.
	(stack_protect_combined_test): Likewise.
	* cfgexpand.c (stack_protect_prologue): Try new
	stack_protect_combined_set pattern first.
	* function.c (stack_protect_epilogue): Try new
	stack_protect_combined_test pattern first.
	* config/arm/arm.c (require_pic_register): Add pic_reg and compute_now
	parameters to control which register to use as PIC register and force
	reloading PIC register respectively.  Insert in the stream of insns if
	possible.
	(legitimize_pic_address): Expose above new parameters in prototype and
	adapt recursive calls accordingly.  Use pic_reg if non null instead of
	cached one.
	(arm_load_pic_register): Add pic_reg parameter and use it if non null.
	(arm_legitimize_address): Adapt to new legitimize_pic_address
	prototype.
	(thumb_legitimize_address): Likewise.
	(arm_emit_call_insn): Adapt to require_pic_register prototype change.
	(arm_expand_prologue): Adapt to arm_load_pic_register prototype change.
	(thumb1_expand_prologue): Likewise.
	* config/arm/arm-protos.h (legitimize_pic_address): Adapt to prototype
	change.
	(arm_load_pic_register): Likewise.
	* config/arm/predicates.md (guard_addr_operand): New predicate.
	(guard_operand): New predicate.
	* config/arm/arm.md (movsi expander): Adapt to legitimize_pic_address
	prototype change.
	(builtin_setjmp_receiver expander): Adapt to thumb1_expand_prologue
	prototype change.
	(stack_protect_combined_set): New expander.
	(stack_protect_combined_set_insn): New insn_and_split pattern.
	(stack_protect_set_insn): New insn pattern.
	(stack_protect_combined_test): New expander.
	(stack_protect_combined_test_insn): New insn_and_split pattern.
	(arm_stack_protect_test_insn): New insn pattern.
	* config/arm/thumb1.md (thumb1_stack_protect_test_insn): New insn pattern.
	* config/arm/unspecs.md (UNSPEC_SP_SET): New unspec.
	(UNSPEC_SP_TEST): Likewise.
	* doc/md.texi (stack_protect_combined_set): Document new standard
	pattern name.
	(stack_protect_set): Clarify that the operand for guard's address is
	legal.
	(stack_protect_combined_test): Document new standard pattern name.
	(stack_protect_test): Clarify that the operand for guard's address is
	legal.

*** gcc/testsuite/ChangeLog ***

2018-07-05  Thomas Preud'homme  <thomas.preudhomme@linaro.org>

	* gcc.target/arm/pr85434.c: New test.

Testing: Bootstrapped on ARM in both Arm and Thumb-2 mode as well as on
AArch64. Testsuite shows no regression on these 3 variants, both with
default flags and with -fstack-protector-all.

Is this ok for trunk?

Best regards,

Thomas

Change-Id: I993343e3063fb570af706624e08b475732a5ec57
---
 gcc/cfgexpand.c                        |  17 +++
 gcc/config/arm/arm-protos.h            |   4 +-
 gcc/config/arm/arm.c                   |  87 ++++++++---
 gcc/config/arm/arm.md                  | 162 +++++++++++++++++++-
 gcc/config/arm/predicates.md           |  17 +++
 gcc/config/arm/thumb1.md               |  13 ++
 gcc/config/arm/unspecs.md              |   3 +
 gcc/doc/md.texi                        |  55 ++++++-
 gcc/function.c                         |  32 +++-
 gcc/target-insns.def                   |   2 +
 gcc/testsuite/gcc.target/arm/pr85434.c | 200 +++++++++++++++++++++++++
 11 files changed, 549 insertions(+), 43 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/arm/pr85434.c

diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
index 35ca276e4ad..c8d0374f8ae 100644
--- a/gcc/cfgexpand.c
+++ b/gcc/cfgexpand.c
@@ -6131,6 +6131,23 @@ stack_protect_prologue (void)
   rtx x, y;
 
   x = expand_normal (crtl->stack_protect_guard);
+
+  if (targetm.have_stack_protect_combined_set () && guard_decl)
+    {
+      gcc_assert (DECL_P (guard_decl));
+      y = DECL_RTL (guard_decl);
+
+      /* Allow the target to compute address of Y and copy it to X without
+	 leaking Y into a register.  This combined address + copy pattern
+	 allows the target to prevent spilling of any intermediate results by
+	 splitting it after register allocator.  */
+      if (rtx_insn *insn = targetm.gen_stack_protect_combined_set (x, y))
+	{
+	  emit_insn (insn);
+	  return;
+	}
+    }
+
   if (guard_decl)
     y = expand_normal (guard_decl);
   else
diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 0dfb3ac59a6..f508bc5a455 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -28,7 +28,7 @@ extern enum unwind_info_type arm_except_unwind_info (struct gcc_options *);
 extern int use_return_insn (int, rtx);
 extern bool use_simple_return_p (void);
 extern enum reg_class arm_regno_class (int);
-extern void arm_load_pic_register (unsigned long);
+extern void arm_load_pic_register (unsigned long, rtx);
 extern int arm_volatile_func (void);
 extern void arm_expand_prologue (void);
 extern void arm_expand_epilogue (bool);
@@ -69,7 +69,7 @@ extern int const_ok_for_dimode_op (HOST_WIDE_INT, enum rtx_code);
 extern int arm_split_constant (RTX_CODE, machine_mode, rtx,
 			       HOST_WIDE_INT, rtx, rtx, int);
 extern int legitimate_pic_operand_p (rtx);
-extern rtx legitimize_pic_address (rtx, machine_mode, rtx);
+extern rtx legitimize_pic_address (rtx, machine_mode, rtx, rtx, bool);
 extern rtx legitimize_tls_address (rtx, rtx);
 extern bool arm_legitimate_address_p (machine_mode, rtx, bool);
 extern int arm_legitimate_address_outer_p (machine_mode, rtx, RTX_CODE, int);
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 8810df53aa3..763941868d2 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -7371,21 +7371,34 @@ legitimate_pic_operand_p (rtx x)
   return 1;
 }
 
-/* Record that the current function needs a PIC register.  Initialize
-   cfun->machine->pic_reg if we have not already done so.  */
+/* Record that the current function needs a PIC register.  If PIC_REG is null,
+   a new pseudo is allocated as PIC register, otherwise PIC_REG is used.  In
+   both cases cfun->machine->pic_reg is initialized if we have not already done
+   so.  COMPUTE_NOW decides whether and where to set the PIC register.  If true,
+   PIC register is reloaded in the current position of the instruction stream
+   regardless of whether it was loaded before.  Otherwise, it is only loaded
+   if not already done so (crtl->uses_pic_offset_table is null).  Note that
+   nonnull PIC_REG is only supported iff COMPUTE_NOW is true and null PIC_REG
+   is only supported iff COMPUTE_NOW is false.  */
 
 static void
-require_pic_register (void)
+require_pic_register (rtx pic_reg, bool compute_now)
 {
+  gcc_assert (compute_now == (pic_reg != NULL_RTX));
+
   /* A lot of the logic here is made obscure by the fact that this
      routine gets called as part of the rtx cost estimation process.
      We don't want those calls to affect any assumptions about the real
      function; and further, we can't call entry_of_function() until we
      start the real expansion process.  */
-  if (!crtl->uses_pic_offset_table)
+  if (!crtl->uses_pic_offset_table || compute_now)
     {
-      gcc_assert (can_create_pseudo_p ());
+      gcc_assert (can_create_pseudo_p ()
+		  || (pic_reg != NULL_RTX
+		      && REG_P (pic_reg)
+		      && GET_MODE (pic_reg) == Pmode));
       if (arm_pic_register != INVALID_REGNUM
+	  && !compute_now
 	  && !(TARGET_THUMB1 && arm_pic_register > LAST_LO_REGNUM))
 	{
 	  if (!cfun->machine->pic_reg)
@@ -7401,8 +7414,10 @@ require_pic_register (void)
 	{
 	  rtx_insn *seq, *insn;
 
+	  if (pic_reg == NULL_RTX)
+	    pic_reg = gen_reg_rtx (Pmode);
 	  if (!cfun->machine->pic_reg)
-	    cfun->machine->pic_reg = gen_reg_rtx (Pmode);
+	    cfun->machine->pic_reg = pic_reg;
 
 	  /* Play games to avoid marking the function as needing pic
 	     if we are being called as part of the cost-estimation
@@ -7413,11 +7428,12 @@ require_pic_register (void)
 	      start_sequence ();
 
 	      if (TARGET_THUMB1 && arm_pic_register != INVALID_REGNUM
-		  && arm_pic_register > LAST_LO_REGNUM)
+		  && arm_pic_register > LAST_LO_REGNUM
+		  && !compute_now)
 		emit_move_insn (cfun->machine->pic_reg,
 				gen_rtx_REG (Pmode, arm_pic_register));
 	      else
-		arm_load_pic_register (0UL);
+		arm_load_pic_register (0UL, pic_reg);
 
 	      seq = get_insns ();
 	      end_sequence ();
@@ -7430,16 +7446,33 @@ require_pic_register (void)
 	         we can't yet emit instructions directly in the final
 		 insn stream.  Queue the insns on the entry edge, they will
 		 be committed after everything else is expanded.  */
-	      insert_insn_on_edge (seq,
-				   single_succ_edge (ENTRY_BLOCK_PTR_FOR_FN (cfun)));
+	      if (currently_expanding_to_rtl)
+		insert_insn_on_edge (seq,
+				     single_succ_edge
+				     (ENTRY_BLOCK_PTR_FOR_FN (cfun)));
+	      else
+		emit_insn (seq);
 	    }
 	}
     }
 }
 
+/* Legitimize PIC load to ORIG into REG.  If REG is NULL, a new pseudo is
+   created to hold the result of the load.  If not NULL, PIC_REG indicates
+   which register to use as PIC register, otherwise it is decided by register
+   allocator.  COMPUTE_NOW forces the PIC register to be loaded at the current
+   location in the instruction stream, regardless of whether it was loaded
+   previously.  Note that nonnull PIC_REG is only supported iff COMPUTE_NOW is
+   true and null PIC_REG is only supported iff COMPUTE_NOW is false.
+
+   Returns the register REG into which the PIC load is performed.  */
+
 rtx
-legitimize_pic_address (rtx orig, machine_mode mode, rtx reg)
+legitimize_pic_address (rtx orig, machine_mode mode, rtx reg, rtx pic_reg,
+			bool compute_now)
 {
+  gcc_assert (compute_now == (pic_reg != NULL_RTX));
+
   if (GET_CODE (orig) == SYMBOL_REF
       || GET_CODE (orig) == LABEL_REF)
     {
@@ -7472,9 +7505,12 @@ legitimize_pic_address (rtx orig, machine_mode mode, rtx reg)
 	  rtx mem;
 
 	  /* If this function doesn't have a pic register, create one now.  */
-	  require_pic_register ();
+	  require_pic_register (pic_reg, compute_now);
+
+	  if (pic_reg == NULL_RTX)
+	    pic_reg = cfun->machine->pic_reg;
 
-	  pat = gen_calculate_pic_address (reg, cfun->machine->pic_reg, orig);
+	  pat = gen_calculate_pic_address (reg, pic_reg, orig);
 
 	  /* Make the MEM as close to a constant as possible.  */
 	  mem = SET_SRC (pat);
@@ -7523,9 +7559,11 @@ legitimize_pic_address (rtx orig, machine_mode mode, rtx reg)
 
       gcc_assert (GET_CODE (XEXP (orig, 0)) == PLUS);
 
-      base = legitimize_pic_address (XEXP (XEXP (orig, 0), 0), Pmode, reg);
+      base = legitimize_pic_address (XEXP (XEXP (orig, 0), 0), Pmode, reg,
+				     pic_reg, compute_now);
       offset = legitimize_pic_address (XEXP (XEXP (orig, 0), 1), Pmode,
-				       base == reg ? 0 : reg);
+				       base == reg ? 0 : reg, pic_reg,
+				       compute_now);
 
       if (CONST_INT_P (offset))
 	{
@@ -7625,16 +7663,17 @@ static GTY(()) int pic_labelno;
    low register.  */
 
 void
-arm_load_pic_register (unsigned long saved_regs ATTRIBUTE_UNUSED)
+arm_load_pic_register (unsigned long saved_regs ATTRIBUTE_UNUSED, rtx pic_reg)
 {
-  rtx l1, labelno, pic_tmp, pic_rtx, pic_reg;
+  rtx l1, labelno, pic_tmp, pic_rtx;
 
   if (crtl->uses_pic_offset_table == 0 || TARGET_SINGLE_PIC_BASE)
     return;
 
   gcc_assert (flag_pic);
 
-  pic_reg = cfun->machine->pic_reg;
+  if (pic_reg == NULL_RTX)
+    pic_reg = cfun->machine->pic_reg;
   if (TARGET_VXWORKS_RTP)
     {
       pic_rtx = gen_rtx_SYMBOL_REF (Pmode, VXWORKS_GOTT_BASE);
@@ -8710,7 +8749,8 @@ arm_legitimize_address (rtx x, rtx orig_x, machine_mode mode)
     {
       /* We need to find and carefully transform any SYMBOL and LABEL
 	 references; so go back to the original address expression.  */
-      rtx new_x = legitimize_pic_address (orig_x, mode, NULL_RTX);
+      rtx new_x = legitimize_pic_address (orig_x, mode, NULL_RTX, NULL_RTX,
+					  false /*compute_now*/);
 
       if (new_x != orig_x)
 	x = new_x;
@@ -8778,7 +8818,8 @@ thumb_legitimize_address (rtx x, rtx orig_x, machine_mode mode)
     {
       /* We need to find and carefully transform any SYMBOL and LABEL
 	 references; so go back to the original address expression.  */
-      rtx new_x = legitimize_pic_address (orig_x, mode, NULL_RTX);
+      rtx new_x = legitimize_pic_address (orig_x, mode, NULL_RTX, NULL_RTX,
+					  false /*compute_now*/);
 
       if (new_x != orig_x)
 	x = new_x;
@@ -18066,7 +18107,7 @@ arm_emit_call_insn (rtx pat, rtx addr, bool sibcall)
 	  ? !targetm.binds_local_p (SYMBOL_REF_DECL (addr))
 	  : !SYMBOL_REF_LOCAL_P (addr)))
     {
-      require_pic_register ();
+      require_pic_register (NULL_RTX, false /*compute_now*/);
       use_reg (&CALL_INSN_FUNCTION_USAGE (insn), cfun->machine->pic_reg);
     }
 
@@ -21998,7 +22039,7 @@ arm_expand_prologue (void)
       mask &= THUMB2_WORK_REGS;
       if (!IS_NESTED (func_type))
 	mask |= (1 << IP_REGNUM);
-      arm_load_pic_register (mask);
+      arm_load_pic_register (mask, 0);
     }
 
   /* If we are profiling, make sure no instructions are scheduled before
@@ -25229,7 +25270,7 @@ thumb1_expand_prologue (void)
   /* Load the pic register before setting the frame pointer,
      so we can use r7 as a temporary work register.  */
   if (flag_pic && arm_pic_register != INVALID_REGNUM)
-    arm_load_pic_register (live_regs_mask);
+    arm_load_pic_register (live_regs_mask, 0);
 
   if (!frame_pointer_needed && CALLER_INTERWORKING_SLOT_SIZE > 0)
     emit_move_insn (gen_rtx_REG (Pmode, ARM_HARD_FRAME_POINTER_REGNUM),
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 270b8e454b3..af2414cb0ff 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -6021,7 +6021,8 @@
       operands[1] = legitimize_pic_address (operands[1], SImode,
 					    (!can_create_pseudo_p ()
 					     ? operands[0]
-					     : 0));
+					     : NULL_RTX), NULL_RTX,
+					    false /*compute_now*/);
   }
   "
 )
@@ -6309,7 +6310,7 @@
   /* r3 is clobbered by set/longjmp, so we can use it as a scratch
      register.  */
   if (arm_pic_register != INVALID_REGNUM)
-    arm_load_pic_register (1UL << 3);
+    arm_load_pic_register (1UL << 3, 0);
   DONE;
 }")
 
@@ -8634,6 +8635,163 @@
    (set_attr "conds" "clob")]
 )
 
+;; Named patterns for stack smashing protection.
+(define_expand "stack_protect_combined_set"
+  [(parallel
+     [(set (match_operand:SI 0 "memory_operand" "")
+	   (unspec:SI [(match_operand:SI 1 "guard_operand" "")]
+		      UNSPEC_SP_SET))
+      (clobber (match_scratch:SI 2 ""))
+      (clobber (match_scratch:SI 3 ""))])]
+  ""
+  ""
+)
+
+;; Use a separate insn from the above expand to be able to have the mem outside
+;; the operand #1 when register allocation comes. This is needed to prevent LRA
+;; from trying to reload the guard, since we need to control how PIC access is
+;; done in the -fpic/-fPIC case (see COMPUTE_NOW parameter when calling
+;; legitimize_pic_address ()).
+(define_insn_and_split "*stack_protect_combined_set_insn"
+  [(set (match_operand:SI 0 "memory_operand" "=m,m")
+	(unspec:SI [(mem:SI (match_operand:SI 1 "guard_addr_operand" "X,X"))]
+		   UNSPEC_SP_SET))
+   (clobber (match_scratch:SI 2 "=&l,&r"))
+   (clobber (match_scratch:SI 3 "=&l,&r"))]
+  ""
+  "#"
+  "reload_completed"
+  [(parallel [(set (match_dup 0) (unspec:SI [(mem:SI (match_dup 2))]
+					    UNSPEC_SP_SET))
+	      (clobber (match_dup 2))])]
+  "
+{
+  if (flag_pic)
+    {
+      /* Forces recomputing of GOT base now.  */
+      legitimize_pic_address (operands[1], SImode, operands[2], operands[3],
+			      true /*compute_now*/);
+    }
+  else
+    {
+      if (address_operand (operands[1], SImode))
+	operands[2] = operands[1];
+      else
+	{
+	  rtx mem = XEXP (force_const_mem (SImode, operands[1]), 0);
+	  emit_move_insn (operands[2], mem);
+	}
+    }
+}"
+  [(set_attr "arch" "t1,32")]
+)
+
+(define_insn "*stack_protect_set_insn"
+  [(set (match_operand:SI 0 "memory_operand" "=m,m")
+	(unspec:SI [(mem:SI (match_operand:SI 1 "register_operand" "+&l,&r"))]
+	 UNSPEC_SP_SET))
+   (clobber (match_dup 1))]
+  ""
+  "@
+   ldr\\t%1, [%1]\;str\\t%1, %0\;movs\t%1,#0
+   ldr\\t%1, [%1]\;str\\t%1, %0\;mov\t%1,#0"
+  [(set_attr "length" "8,12")
+   (set_attr "conds" "clob,nocond")
+   (set_attr "type" "multiple")
+   (set_attr "arch" "t1,32")]
+)
+
+(define_expand "stack_protect_combined_test"
+  [(parallel
+     [(set (pc)
+	   (if_then_else
+		(eq (match_operand:SI 0 "memory_operand" "")
+		    (unspec:SI [(match_operand:SI 1 "guard_operand" "")]
+			       UNSPEC_SP_TEST))
+		(label_ref (match_operand 2))
+		(pc)))
+      (clobber (match_scratch:SI 3 ""))
+      (clobber (match_scratch:SI 4 ""))
+      (clobber (reg:CC CC_REGNUM))])]
+  ""
+  ""
+)
+
+;; Use a separate insn from the above expand to be able to have the mem outside
+;; the operand #1 when register allocation comes. This is needed to prevent LRA
+;; from trying to reload the guard, since we need to control how PIC access is
+;; done in the -fpic/-fPIC case (see COMPUTE_NOW parameter when calling
+;; legitimize_pic_address ()).
+(define_insn_and_split "*stack_protect_combined_test_insn"
+  [(set (pc)
+	(if_then_else
+		(eq (match_operand:SI 0 "memory_operand" "m,m")
+		    (unspec:SI [(mem:SI (match_operand:SI 1 "guard_addr_operand" "X,X"))]
+			       UNSPEC_SP_TEST))
+		(label_ref (match_operand 2))
+		(pc)))
+   (clobber (match_scratch:SI 3 "=&l,&r"))
+   (clobber (match_scratch:SI 4 "=&l,&r"))
+   (clobber (reg:CC CC_REGNUM))]
+  ""
+  "#"
+  "reload_completed"
+  [(const_int 0)]
+{
+  rtx eq;
+
+  if (flag_pic)
+    {
+      /* Forces recomputing of GOT base now.  */
+      legitimize_pic_address (operands[1], SImode, operands[3], operands[4],
+			      true /*compute_now*/);
+    }
+  else
+    {
+      if (address_operand (operands[1], SImode))
+	operands[3] = operands[1];
+      else
+	{
+	  rtx mem = XEXP (force_const_mem (SImode, operands[1]), 0);
+	  emit_move_insn (operands[3], mem);
+	}
+    }
+  if (TARGET_32BIT)
+    {
+      emit_insn (gen_arm_stack_protect_test_insn (operands[4], operands[0],
+						  operands[3]));
+      rtx cc_reg = gen_rtx_REG (CC_Zmode, CC_REGNUM);
+      eq = gen_rtx_EQ (CC_Zmode, cc_reg, const0_rtx);
+      emit_jump_insn (gen_arm_cond_branch (operands[2], eq, cc_reg));
+    }
+  else
+    {
+      emit_insn (gen_thumb1_stack_protect_test_insn (operands[4], operands[0],
+						     operands[3]));
+      eq = gen_rtx_EQ (VOIDmode, operands[4], const0_rtx);
+      emit_jump_insn (gen_cbranchsi4 (eq, operands[4], const0_rtx,
+				      operands[2]));
+    }
+  DONE;
+}
+  [(set_attr "arch" "t1,32")]
+)
+
+(define_insn "arm_stack_protect_test_insn"
+  [(set (reg:CC_Z CC_REGNUM)
+	(compare:CC_Z (unspec:SI [(match_operand:SI 1 "memory_operand" "m,m")
+				  (mem:SI (match_operand:SI 2 "register_operand" "+l,r"))]
+				 UNSPEC_SP_TEST)
+		      (const_int 0)))
+   (clobber (match_operand:SI 0 "register_operand" "=&l,&r"))
+   (clobber (match_dup 2))]
+  "TARGET_32BIT"
+  "ldr\t%0, [%2]\;ldr\t%2, %1\;eors\t%0, %2, %0"
+  [(set_attr "length" "8,12")
+   (set_attr "type" "multiple")
+   (set_attr "arch" "t,32")]
+)
+
 (define_expand "casesi"
   [(match_operand:SI 0 "s_register_operand" "")	; index to jump on
    (match_operand:SI 1 "const_int_operand" "")	; lower bound
diff --git a/gcc/config/arm/predicates.md b/gcc/config/arm/predicates.md
index 7e198f9bce4..69718ee9c7a 100644
--- a/gcc/config/arm/predicates.md
+++ b/gcc/config/arm/predicates.md
@@ -31,6 +31,23 @@
 	      || REGNO_REG_CLASS (REGNO (op)) != NO_REGS));
 })
 
+; Predicate for stack protector guard's address in
+; stack_protect_combined_set_insn and stack_protect_combined_test_insn patterns
+(define_predicate "guard_addr_operand"
+  (match_test "true")
+{
+  return (CONSTANT_ADDRESS_P (op)
+	  || !targetm.cannot_force_const_mem (mode, op));
+})
+
+; Predicate for stack protector guard in stack_protect_combined_set and
+; stack_protect_combined_test patterns
+(define_predicate "guard_operand"
+  (match_code "mem")
+{
+  return guard_addr_operand (XEXP (op, 0), mode);
+})
+
 (define_predicate "imm_for_neon_inv_logic_operand"
   (match_code "const_vector")
 {
diff --git a/gcc/config/arm/thumb1.md b/gcc/config/arm/thumb1.md
index 19dcdbcdd73..cd199c9c529 100644
--- a/gcc/config/arm/thumb1.md
+++ b/gcc/config/arm/thumb1.md
@@ -1962,4 +1962,17 @@
   }"
   [(set_attr "type" "mov_reg")]
 )
+
+(define_insn "thumb1_stack_protect_test_insn"
+  [(set (match_operand:SI 0 "register_operand" "=&l")
+	(unspec:SI [(match_operand:SI 1 "memory_operand" "m")
+		    (mem:SI (match_operand:SI 2 "register_operand" "+l"))]
+	 UNSPEC_SP_TEST))
+   (clobber (match_dup 2))]
+  "TARGET_THUMB1"
+  "ldr\t%0, [%2]\;ldr\t%2, %1\;eors\t%0, %2, %0"
+  [(set_attr "length" "8")
+   (set_attr "conds" "set")
+   (set_attr "type" "multiple")]
+)
 \f
diff --git a/gcc/config/arm/unspecs.md b/gcc/config/arm/unspecs.md
index 19416736ef9..8f9dbcb08dc 100644
--- a/gcc/config/arm/unspecs.md
+++ b/gcc/config/arm/unspecs.md
@@ -86,6 +86,9 @@
   UNSPEC_PROBE_STACK    ; Probe stack memory reference
   UNSPEC_NONSECURE_MEM	; Represent non-secure memory in ARMv8-M with
 			; security extension
+  UNSPEC_SP_SET		; Represent the setting of stack protector's canary
+  UNSPEC_SP_TEST	; Represent the testing of stack protector's canary
+			; against the guard.
 ])
 
 (define_c_enum "unspec" [
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 4801d68a207..0667a242ef3 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -7424,22 +7424,61 @@ builtins.
 The get/set patterns have a single output/input operand respectively,
 with @var{mode} intended to be @code{Pmode}.
 
+@cindex @code{stack_protect_combined_set} instruction pattern
+@item @samp{stack_protect_combined_set}
+This pattern, if defined, moves a @code{ptr_mode} value from an address
+whose declaration RTX is given in operand 1 to the memory in operand 0
+without leaving the value in a register afterward.  If several
+instructions are needed by the target to perform the operation (e.g.@: to
+load the address from a GOT entry then load the @code{ptr_mode} value
+and finally store it), it is the backend's responsibility to ensure no
+intermediate result gets spilled.  This is to avoid leaking the value
+some place that an attacker might use to rewrite the stack guard slot
+after having clobbered it.
+
+If this pattern is not defined, then the address declaration is
+expanded first in the standard way and a @code{stack_protect_set}
+pattern is then generated to move the value from that address to the
+address in operand 0.
+
 @cindex @code{stack_protect_set} instruction pattern
 @item @samp{stack_protect_set}
-This pattern, if defined, moves a @code{ptr_mode} value from the memory
-in operand 1 to the memory in operand 0 without leaving the value in
-a register afterward.  This is to avoid leaking the value some place
-that an attacker might use to rewrite the stack guard slot after
-having clobbered it.
+This pattern, if defined, moves a @code{ptr_mode} value from the valid
+memory location in operand 1 to the memory in operand 0 without leaving
+the value in a register afterward.  This is to avoid leaking the value
+some place that an attacker might use to rewrite the stack guard slot
+after having clobbered it.
+
+Note: on targets where the addressing modes do not allow loading
+directly from the stack guard address, the address is expanded in the
+standard way first, which could cause some spills.
 
 If this pattern is not defined, then a plain move pattern is generated.
 
+@cindex @code{stack_protect_combined_test} instruction pattern
+@item @samp{stack_protect_combined_test}
+This pattern, if defined, compares a @code{ptr_mode} value from an
+address whose declaration RTX is given in operand 1 with the memory in
+operand 0 without leaving the value in a register afterward and
+branches to operand 2 if the values were equal.  If several
+instructions are needed by the target to perform the operation (e.g.@: to
+load the address from a GOT entry then load the @code{ptr_mode} value
+and finally compare it), it is the backend's responsibility to ensure no
+intermediate result gets spilled.  This is to avoid leaking the value
+some place that an attacker might use to rewrite the stack guard slot
+after having clobbered it.
+
+If this pattern is not defined, then the address declaration is
+expanded first in the standard way and a @code{stack_protect_test}
+pattern is then generated to compare the value from that address to the
+value at the memory in operand 0.
+
 @cindex @code{stack_protect_test} instruction pattern
 @item @samp{stack_protect_test}
 This pattern, if defined, compares a @code{ptr_mode} value from the
-memory in operand 1 with the memory in operand 0 without leaving the
-value in a register afterward and branches to operand 2 if the values
-were equal.
+valid memory location in operand 1 with the memory in operand 0 without
+leaving the value in a register afterward and branches to operand 2 if
+the values were equal.
 
 If this pattern is not defined, then a plain compare pattern and
 conditional branch pattern is used.
diff --git a/gcc/function.c b/gcc/function.c
index 302438323c8..65f34db0651 100644
--- a/gcc/function.c
+++ b/gcc/function.c
@@ -4892,18 +4892,34 @@ stack_protect_epilogue (void)
   tree guard_decl = targetm.stack_protect_guard ();
   rtx_code_label *label = gen_label_rtx ();
   rtx x, y;
-  rtx_insn *seq;
+  rtx_insn *seq = 0;
 
   x = expand_normal (crtl->stack_protect_guard);
-  if (guard_decl)
-    y = expand_normal (guard_decl);
+
+  if (targetm.have_stack_protect_combined_test () && guard_decl)
+    {
+      gcc_assert (DECL_P (guard_decl));
+      y = DECL_RTL (guard_decl);
+      /* Allow the target to compute address of Y and compare it with X without
+	 leaking Y into a register.  This combined address + compare pattern
+	 allows the target to prevent spilling of any intermediate results by
+	 splitting it after register allocator.  */
+      seq = targetm.gen_stack_protect_combined_test (x, y, label);
+    }
   else
-    y = const0_rtx;
+    {
+      if (guard_decl)
+	y = expand_normal (guard_decl);
+      else
+	y = const0_rtx;
+
+      /* Allow the target to compare Y with X without leaking either into
+	 a register.  */
+      if (targetm.have_stack_protect_test ())
+	seq = targetm.gen_stack_protect_test (x, y, label);
+    }
 
-  /* Allow the target to compare Y with X without leaking either into
-     a register.  */
-  if (targetm.have_stack_protect_test ()
-      && ((seq = targetm.gen_stack_protect_test (x, y, label)) != NULL_RTX))
+  if (seq)
     emit_insn (seq);
   else
     emit_cmp_and_jump_insns (x, y, EQ, NULL_RTX, ptr_mode, 1, label);
diff --git a/gcc/target-insns.def b/gcc/target-insns.def
index 9a552c3d11c..d39889b3522 100644
--- a/gcc/target-insns.def
+++ b/gcc/target-insns.def
@@ -96,7 +96,9 @@ DEF_TARGET_INSN (sibcall_value, (rtx x0, rtx x1, rtx opt2, rtx opt3,
 DEF_TARGET_INSN (simple_return, (void))
 DEF_TARGET_INSN (split_stack_prologue, (void))
 DEF_TARGET_INSN (split_stack_space_check, (rtx x0, rtx x1))
+DEF_TARGET_INSN (stack_protect_combined_set, (rtx x0, rtx x1))
 DEF_TARGET_INSN (stack_protect_set, (rtx x0, rtx x1))
+DEF_TARGET_INSN (stack_protect_combined_test, (rtx x0, rtx x1, rtx x2))
 DEF_TARGET_INSN (stack_protect_test, (rtx x0, rtx x1, rtx x2))
 DEF_TARGET_INSN (store_multiple, (rtx x0, rtx x1, rtx x2))
 DEF_TARGET_INSN (tablejump, (rtx x0, rtx x1))
diff --git a/gcc/testsuite/gcc.target/arm/pr85434.c b/gcc/testsuite/gcc.target/arm/pr85434.c
new file mode 100644
index 00000000000..4143a861f7c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/pr85434.c
@@ -0,0 +1,200 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target fstack_protector }*/
+/* { dg-require-effective-target fpic }*/
+/* { dg-additional-options "-Os -fpic -fstack-protector-strong" } */
+
+#include <stddef.h>
+#include <stdint.h>
+
+
+static const unsigned char base64_enc_map[64] =
+{
+    'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J',
+    'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T',
+    'U', 'V', 'W', 'X', 'Y', 'Z', 'a', 'b', 'c', 'd',
+    'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n',
+    'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x',
+    'y', 'z', '0', '1', '2', '3', '4', '5', '6', '7',
+    '8', '9', '+', '/'
+};
+
+#define BASE64_SIZE_T_MAX   ( (size_t) -1 ) /* SIZE_T_MAX is not standard */
+
+
+void doSmth(void *x);
+
+#include <string.h>
+
+
+void check(int n) {
+  
+    if (!(n % 2 && n % 3 && n % 5)) {
+ __asm__  (   "add    r8, r8, #1;" );
+    }
+}
+
+uint32_t test(
+  uint32_t a1,
+  uint32_t a2,
+  size_t a3,
+  size_t a4,
+  size_t a5,
+  size_t a6)
+{
+  uint32_t nResult = 0;
+  uint8_t* h = 0L;
+  uint8_t X[128];
+  uint8_t mac[64];
+  size_t len;
+
+  doSmth(&a1);
+  doSmth(&a2);
+  doSmth(&a3);
+  doSmth(&a4);
+  doSmth(&a5);
+  doSmth(&a6);
+
+  if (a1 && a2 && a3 && a4 && a5 && a6) {
+    nResult = 1;
+    h = (void*)X;
+    len = sizeof(X);
+    memset(X, a2, len);
+    len -= 64;
+    memcpy(mac ,X, len);
+    *(h + len) = a6;
+
+    {
+
+
+        unsigned char *dst = X;
+        size_t dlen = a3;
+        size_t *olen = &a6;
+        const unsigned char *src = mac;
+        size_t slen = a4;
+    size_t i, n;
+    int C1, C2, C3;
+    unsigned char *p;
+
+    if( slen == 0 )
+    {
+        *olen = 0;
+        return( 0 );
+    }
+
+    n = slen / 3 + ( slen % 3 != 0 );
+
+    if( n > ( BASE64_SIZE_T_MAX - 1 ) / 4 )
+    {
+        *olen = BASE64_SIZE_T_MAX;
+        return( 0 );
+    }
+
+    n *= 4;
+
+    if( ( dlen < n + 1 ) || ( NULL == dst ) )
+    {
+        *olen = n + 1;
+        return( 0 );
+    }
+
+    n = ( slen / 3 ) * 3;
+
+    for( i = 0, p = dst; i < n; i += 3 )
+    {
+        C1 = *src++;
+        C2 = *src++;
+        C3 = *src++;
+
+        check(i);
+
+        *p++ = base64_enc_map[(C1 >> 2) & 0x3F];
+        *p++ = base64_enc_map[(((C1 &  3) << 4) + (C2 >> 4)) & 0x3F];
+        *p++ = base64_enc_map[(((C2 & 15) << 2) + (C3 >> 6)) & 0x3F];
+        *p++ = base64_enc_map[C3 & 0x3F];
+    }
+
+    if( i < slen )
+    {
+        C1 = *src++;
+        C2 = ( ( i + 1 ) < slen ) ? *src++ : 0;
+
+        *p++ = base64_enc_map[(C1 >> 2) & 0x3F];
+        *p++ = base64_enc_map[(((C1 & 3) << 4) + (C2 >> 4)) & 0x3F];
+
+        if( ( i + 1 ) < slen )
+             *p++ = base64_enc_map[((C2 & 15) << 2) & 0x3F];
+        else *p++ = '=';
+
+        *p++ = '=';
+    }
+
+    *olen = p - dst;
+    *p = 0;
+
+}
+
+  __asm__ ("mov r8, %0;" : "=r" ( nResult ));
+  }
+  else
+  {
+    nResult = 2;
+  }
+
+  doSmth(X);
+  doSmth(mac);
+
+
+  return nResult;
+}
+
+/* The pattern below catches sequences of instructions that were generated
+   for ARM and Thumb-2 before the fix for this PR. They are of the form:
+
+   ldr     rX, <offset from sp or fp>
+   <optional non ldr instructions>
+   ldr     rY, <offset from sp or fp>
+   ldr     rZ, [rX]
+   <optional non ldr instructions>
+   cmp     rY, rZ
+   <optional non cmp instructions>
+   bl      __stack_chk_fail
+
+   Ideally the optional block would check that the various rX, rY and rZ
+   registers are not set, but this is not possible because Tcl does not allow
+   back references in lookahead expressions, and negative lookahead is the
+   only construct that can negate a regexp.  Instead we go for the heuristic
+   of allowing non ldr/cmp instructions with the assumptions that (i) those
+   are not part of the stack protector sequences and (ii) they would only be
+   scheduled here if they don't conflict with registers used by the stack
+   protector.
+
+   Note on the regexp logic:
+   Allowing non X instructions (where X is ldr or cmp) is done by looking for
+   some non newline spaces, followed by something which is not X, followed by
+   an alphanumeric character, followed by anything but a newline and ended by
+   a newline, the whole thing repeated any number of times.  The alphanumeric
+   character is there to force the negative lookahead for X to be matched only
+   after all the initial spaces and thus to check the mnemonic.  This prevents
+   it from matching at one of the initial spaces.  */
+/* { dg-final { scan-assembler-not {ldr[ \t]+([^,]+), \[(?:sp|fp)[^]]*\](?:\n[ \t]+(?!ldr)\w[^\n]*)*\n[ \t]+ldr[ \t]+([^,]+), \[(?:sp|fp)[^]]*\]\n[ \t]+ldr[ \t]+([^,]+), \[\1\](?:\n[ \t]+(?!ldr)\w[^\n]*)*\n[ \t]+cmp[ \t]+\2, \3(?:\n[ \t]+(?!cmp)\w[^\n]*)*\n[ \t]+bl[ \t]+__stack_chk_fail} } } */
+
+/* Likewise for Thumb-1 sequences of instructions prior to the fix for this PR
+   which had the form:
+
+   ldr     rS, <offset from sp or fp>
+   <optional non ldr instructions>
+   ldr     rT, <PC relative offset>
+   <optional non ldr instructions>
+   ldr     rX, [rS, rT]
+   <optional non ldr instructions>
+   ldr     rY, <offset from sp or fp>
+   ldr     rZ, [rX]
+   <optional non ldr instructions>
+   cmp     rY, rZ
+   <optional non cmp instructions>
+   bl      __stack_chk_fail
+
+  Note on the regexp logic:
+  PC relative offset is checked by looking for a source operand that does not
+  contain [ or ].  */
+/* { dg-final { scan-assembler-not {ldr[ \t]+([^,]+), \[(?:sp|fp)[^]]*\](?:\n[ \t]+(?!ldr)\w[^\n]*)*\n[ \t]+ldr[ \t]+([^,]+), [^][\n]*(?:\n[ \t]+(?!ldr)\w[^\n]*)*\n[ \t]+ldr[ \t]+([^,]+), \[\1, \2\](?:\n[ \t]+(?!ldr)\w[^\n]*)*\n[ \t]+ldr[ \t]+([^,]+), \[(?:sp|fp)[^]]*\]\n[ \t]+ldr[ \t]+([^,]+), \[\3\](?:\n[ \t]+(?!ldr)\w[^\n]*)*\n[ \t]+cmp[ \t]+\4, \5(?:\n[ \t]+(?!cmp)\w[^\n]*)*\n[ \t]+bl[ \t]+__stack_chk_fail} } } */
-- 
2.19.1
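
[Illustrative aside on the "Note on the regexp logic" comment in the test
above; not part of the patch.]  A minimal sketch of why the alphanumeric
character after the negative lookahead matters.  It uses Python's re module
rather than the Tcl regexps that scan-assembler-not runs, on the assumption
that both behave the same for this simplified case.  The shortened pattern,
the helper names and the assembly snippets are all made up for illustration.

```python
import re

# Building block from the test's regexp: skip any number of instruction lines
# whose mnemonic is not "ldr".  The \w right after the lookahead pins the
# lookahead to the first character of the mnemonic.
SKIP_NON_LDR = r"(?:\n[ \t]+(?!ldr)\w[^\n]*)*"

# Same block without \w (hypothetical, for comparison only): [ \t]+ can give
# back one space so that the lookahead is tested on a space and succeeds.
SKIP_NAIVE = r"(?:\n[ \t]+(?!ldr)[^\n]*)*"


def make_pattern(skip):
    """Load rY from the stack, skip some lines, then compare rY."""
    return re.compile(
        r"ldr[ \t]+(\w+), \[sp\]" + skip + r"\n[ \t]+cmp[ \t]+\1, "
    )


def found(skip, asm):
    """Return True if the sequence is found in the assembly text."""
    return make_pattern(skip).search(asm) is not None


# Made-up assembly; two spaces of indentation on purpose.
ONLY_NON_LDR_BETWEEN = "  ldr r3, [sp]\n  add r0, r0, #1\n  cmp r3, r4\n"
LDR_BETWEEN = "  ldr r3, [sp]\n  ldr r0, [r1]\n  cmp r3, r4\n"

if __name__ == "__main__":
    print(found(SKIP_NON_LDR, ONLY_NON_LDR_BETWEEN))  # non-ldr line is skipped
    print(found(SKIP_NON_LDR, LDR_BETWEEN))           # ldr line blocks the match
    print(found(SKIP_NAIVE, LDR_BETWEEN))             # naive form skips the ldr
```

With the \w in place, an intervening ldr stops the match.  Without it, the
regexp engine backtracks by one space and the lookahead is checked against
that space instead of the mnemonic, so the ldr line gets skipped as well.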



* Re: [PATCH, ARM, ping] PR85434: Prevent spilling of stack protector guard's address on ARM
  2018-10-27  4:37           ` Thomas Preudhomme
@ 2018-11-01 16:03             ` Thomas Preudhomme
  2018-11-08  9:53               ` [PATCH, ARM, ping2] " Thomas Preudhomme
  0 siblings, 1 reply; 20+ messages in thread
From: Thomas Preudhomme @ 2018-11-01 16:03 UTC (permalink / raw)
  To: kyrylo.tkachov; +Cc: Ramana Radhakrishnan, Richard Earnshaw, gcc-patches

Ping?

Best regards,

Thomas
On Fri, 26 Oct 2018 at 22:41, Thomas Preudhomme
<thomas.preudhomme@linaro.org> wrote:
>
> Hi,
>
> Please find updated patch to fix PR85434: spilling of stack protector
> guard's address on ARM. Quite a few changes have been made to the ARM
> part since last round of review so I think it makes more sense to
> review it anew. Ran bootstrap + regression testsuite + glibc build +
> glibc regression testsuite for Arm and Thumb-2 and bootstrap +
> regression testsuite for Thumb-1. GCC's regression testsuite was run
> in 3 configurations in all those cases:
>
> - default configuration (no RUNTESTFLAGS)
> - with -fstack-protector-all
> - with -fPIC -fstack-protector-all (to exercise both code paths in stack
> protector's split code)
>
> None of these show any regression beyond some new scan failures with
> -fstack-protector-all or -fPIC due to unexpected code sequences for the
> testcases concerned and some guality swings due to less optimization
> with the new stack protector on.
>
> Patch description and ChangeLog below.
>
> In case of high register pressure in PIC mode, address of the stack
> protector's guard can be spilled on ARM targets as shown in PR85434,
> thus allowing an attacker to control what the canary would be compared
> against. ARM does lack stack_protect_set and stack_protect_test insn
> patterns, but defining them does not help as the address is expanded
> regularly and the patterns only deal with the copy and test of the
> guard with the canary.
>
> This problem does not occur for x86 targets because the PIC access and
> the test can be done in the same instruction. Aarch64 is exempt too
> because its PIC access insn patterns are mov of UNSPEC, which prevents
> the second access in the epilogue from being CSEd in the cse_local pass
> with the first access in the prologue.
>
> The approach followed here is to create new "combined" set and test
> standard pattern names that take the unexpanded guard and do the set or
> test. This allows the target to use an opaque pattern (e.g. using UNSPEC)
> to hide the individual instructions being generated from the compiler and
> split the pattern into generic load, compare and branch instructions
> after register allocation, therefore avoiding any spilling. This is here
> implemented for the ARM targets. For targets not implementing these new
> standard pattern names, the existing stack_protect_set and
> stack_protect_test pattern names are used.
>
> To be able to split PIC access after register allocation, the functions
> had to be augmented to force a new PIC register load and to control
> which register it loads into. This is because sharing the PIC register
> between prologue and epilogue could lead to spilling due to CSE again
> which an attacker could use to control what the canary gets compared
> against.
>
> ChangeLog entries are as follows:
>
> *** gcc/ChangeLog ***
>
> 2018-10-26  Thomas Preud'homme  <thomas.preudhomme@linaro.org>
>
> * target-insns.def (stack_protect_combined_set): Define new standard
> pattern name.
> (stack_protect_combined_test): Likewise.
> * cfgexpand.c (stack_protect_prologue): Try new
> stack_protect_combined_set pattern first.
> * function.c (stack_protect_epilogue): Try new
> stack_protect_combined_test pattern first.
> * config/arm/arm.c (require_pic_register): Add pic_reg and compute_now
> parameters to control which register to use as PIC register and force
> reloading PIC register respectively.  Insert in the stream of insns if
> possible.
> (legitimize_pic_address): Expose above new parameters in prototype and
> adapt recursive calls accordingly.  Use pic_reg if non null instead of
> cached one.
> (arm_load_pic_register): Add pic_reg parameter and use it if non null.
> (arm_legitimize_address): Adapt to new legitimize_pic_address
> prototype.
> (thumb_legitimize_address): Likewise.
> (arm_emit_call_insn): Adapt to require_pic_register prototype change.
> (arm_expand_prologue): Adapt to arm_load_pic_register prototype change.
> (thumb1_expand_prologue): Likewise.
> * config/arm/arm-protos.h (legitimize_pic_address): Adapt to prototype
> change.
> (arm_load_pic_register): Likewise.
> * config/arm/predicates.md (guard_addr_operand): New predicate.
> (guard_operand): New predicate.
> * config/arm/arm.md (movsi expander): Adapt to legitimize_pic_address
> prototype change.
> (builtin_setjmp_receiver expander): Adapt to thumb1_expand_prologue
> prototype change.
> (stack_protect_combined_set): New expander.
> (stack_protect_combined_set_insn): New insn_and_split pattern.
> (stack_protect_set_insn): New insn pattern.
> (stack_protect_combined_test): New expander.
> (stack_protect_combined_test_insn): New insn_and_split pattern.
> (arm_stack_protect_test_insn): New insn pattern.
> * config/arm/thumb1.md (thumb1_stack_protect_test_insn): New insn pattern.
> * config/arm/unspecs.md (UNSPEC_SP_SET): New unspec.
> (UNSPEC_SP_TEST): Likewise.
> * doc/md.texi (stack_protect_combined_set): Document new standard
> pattern name.
> (stack_protect_set): Clarify that the operand for guard's address is
> legal.
> (stack_protect_combined_test): Document new standard pattern name.
> (stack_protect_test): Clarify that the operand for guard's address is
> legal.
>
> *** gcc/testsuite/ChangeLog ***
>
> 2018-07-05  Thomas Preud'homme  <thomas.preudhomme@linaro.org>
>
> * gcc.target/arm/pr85434.c: New test.
>
> Is this ok for trunk?
>
> Best regards,
>
> Thomas
> On Thu, 25 Oct 2018 at 15:54, Thomas Preudhomme
> <thomas.preudhomme@linaro.org> wrote:
> >
> > Good thing I did, found a missing earlyclobber in the process.
> > Rerunning all tests again.
> >
> > Best regards,
> >
> > Thomas
> > On Wed, 24 Oct 2018 at 10:13, Thomas Preudhomme
> > <thomas.preudhomme@linaro.org> wrote:
> > >
> > > Please hold on for the reviews, found a small improvement that could
> > > be done. Am testing it right now, should have something by tonight or
> > > tomorrow.
> > >
> > > Best regards,
> > >
> > > Thomas
> > > On Tue, 23 Oct 2018 at 13:35, Thomas Preudhomme
> > > <thomas.preudhomme@linaro.org> wrote:
> > > >
> > > > [Removing Jeff Law since middle end code hasn't changed]
> > > >
> > > > Hi,
> > > >
> > > > > Given how memory operands are reloaded even with an X constraint, I've
> > > > > reworked the patch for the combined set and combined test instructions
> > > > > to keep the mem out of the match_operand and used an expander to
> > > > generate the right instruction pattern. I've also fixed some
> > > > longstanding issues with the patch when flag_pic is true and with
> > > > constraints for Thumb-1 that I hadn't noticed before due to using
> > > > dg-cmp-results in conjunction with test_summary which does not show
> > > > NA->FAIL (see [1]).
> > > >
> > > > All in all, I think the Arm code would do with a fresh review rather
> > > > than looking at the changes since last posted version. (unchanged)
> > > > ChangeLog entries are as follows:
> > > >
> > > > *** gcc/ChangeLog ***
> > > >
> > > > 2018-08-09  Thomas Preud'homme  <thomas.preudhomme@linaro.org>
> > > >
> > > >     * target-insns.def (stack_protect_combined_set): Define new standard
> > > >     pattern name.
> > > >     (stack_protect_combined_test): Likewise.
> > > >     * cfgexpand.c (stack_protect_prologue): Try new
> > > >     stack_protect_combined_set pattern first.
> > > >     * function.c (stack_protect_epilogue): Try new
> > > >     stack_protect_combined_test pattern first.
> > > >     * config/arm/arm.c (require_pic_register): Add pic_reg and compute_now
> > > >     parameters to control which register to use as PIC register and force
> > > >     reloading PIC register respectively.  Insert in the stream of insns if
> > > >     possible.
> > > >     (legitimize_pic_address): Expose above new parameters in prototype and
> > > >     adapt recursive calls accordingly.  Use pic_reg if non null instead of
> > > >     cached one.
> > > >     (arm_load_pic_register): Add pic_reg parameter and use it if non null.
> > > >     (arm_legitimize_address): Adapt to new legitimize_pic_address
> > > >     prototype.
> > > >     (thumb_legitimize_address): Likewise.
> > > >     (arm_emit_call_insn): Adapt to require_pic_register prototype change.
> > > >     (arm_expand_prologue): Adapt to arm_load_pic_register prototype change.
> > > >     (thumb1_expand_prologue): Likewise.
> > > >     * config/arm/arm-protos.h (legitimize_pic_address): Adapt to prototype
> > > >     change.
> > > >     (arm_load_pic_register): Likewise.
> > > > >     * config/arm/predicates.md (guard_addr_operand): New predicate.
> > > >     (guard_operand): New predicate.
> > > >     * config/arm/arm.md (movsi expander): Adapt to legitimize_pic_address
> > > >     prototype change.
> > > >     (builtin_setjmp_receiver expander): Adapt to thumb1_expand_prologue
> > > >     prototype change.
> > > > >     (stack_protect_combined_set): New expander.
> > > >     (stack_protect_combined_set_insn): New insn_and_split pattern.
> > > >     (stack_protect_set_insn): New insn pattern.
> > > >     (stack_protect_combined_test): New expander.
> > > >     (stack_protect_combined_test_insn): New insn_and_split pattern.
> > > >     (stack_protect_test_insn): New insn pattern.
> > > >     * config/arm/unspecs.md (UNSPEC_SP_SET): New unspec.
> > > >     (UNSPEC_SP_TEST): Likewise.
> > > >     * doc/md.texi (stack_protect_combined_set): Document new standard
> > > >     pattern name.
> > > >     (stack_protect_set): Clarify that the operand for guard's address is
> > > >     legal.
> > > >     (stack_protect_combined_test): Document new standard pattern name.
> > > >     (stack_protect_test): Clarify that the operand for guard's address is
> > > >     legal.
> > > >
> > > > *** gcc/testsuite/ChangeLog ***
> > > >
> > > > 2018-07-05  Thomas Preud'homme  <thomas.preudhomme@linaro.org>
> > > >
> > > >     * gcc.target/arm/pr85434.c: New test.
> > > >
> > > > Testing: Bootstrap and regression testing for Arm, Thumb-1 and Thumb-2
> > > > > with (i) default flags, (ii) an extra -fstack-protector-all and (iii)
> > > > > -fPIC -fstack-protector-all. A glibc build and testsuite run was also
> > > > performed for Arm and Thumb-2. Default flags show no regression and
> > > > the other runs have some expected scan-assembler failing (due to stack
> > > > protector or fPIC code sequence), as well as guality fail (due to less
> > > > optimized code with the new stack protector code) and some execution
> > > > failures in sibcall-9 and sibcall-10 under -fPIC -fstack-protector-all
> > > > due to the PIC sequence for the global variable making the frame
> > > > layout different for the 2 functions (these become PASS if making the
> > > > global variable static).
> > > >
> > > > Is this ok for trunk?
> > > >
> > > > Best regards,
> > > >
> > > > Thomas
> > > >
> > > > [1] https://gcc.gnu.org/ml/gcc-patches/2018-10/msg01412.html
> > > >
> > > >
> > > > On Tue, 25 Sep 2018 at 17:10, Kyrill Tkachov
> > > > <kyrylo.tkachov@foss.arm.com> wrote:
> > > > >
> > > > > Hi Thomas,
> > > > >
> > > > > On 29/08/18 10:51, Thomas Preudhomme wrote:
> > > > > > Resend hopefully without HTML this time.
> > > > > >
> > > > > > On Wed, 29 Aug 2018 at 10:49, Thomas Preudhomme
> > > > > > <thomas.preudhomme@linaro.org> wrote:
> > > > > >> Hi,
> > > > > >>
> > > > > >> I've reworked the patch fixing PR85434 (spilling of stack protector guard's address on ARM) to address the testsuite regression on powerpc and x86 as well as glibc testsuite regression on ARM. Issues were due to unconditionally attempting to generate the new patterns. The code now tests if there is a pattern for them for the target before generating them. In the ARM side of the patch, I've also added a more specific predicate for the new patterns. The new patch is found below.
> > > > > >>
> > > > > >>
> > > > > >> In case of high register pressure in PIC mode, address of the stack
> > > > > >> protector's guard can be spilled on ARM targets as shown in PR85434,
> > > > > >> thus allowing an attacker to control what the canary would be compared
> > > > > >> against. ARM does lack stack_protect_set and stack_protect_test insn
> > > > > >> patterns, defining them does not help as the address is expanded
> > > > > >> regularly and the patterns only deal with the copy and test of the
> > > > > >> guard with the canary.
> > > > > >>
> > > > > >> This problem does not occur for x86 targets because the PIC access and
> > > > > >> the test can be done in the same instruction. Aarch64 is exempt too
> > > > > >> because PIC access insn pattern are mov of UNSPEC which prevents it from
> > > > > >> the second access in the epilogue being CSEd in cse_local pass with the
> > > > > >> first access in the prologue.
> > > > > >>
> > > > > >> The approach followed here is to create new "combined" set and test
> > > > > >> standard pattern names that take the unexpanded guard and do the set or
> > > > > >> test. This allows the target to use an opaque pattern (eg. using UNSPEC)
> > > > > >> to hide the individual instructions being generated to the compiler and
> > > > > >> split the pattern into generic load, compare and branch instruction
> > > > > >> after register allocator, therefore avoiding any spilling. This is here
> > > > > >> implemented for the ARM targets. For targets not implementing these new
> > > > > >> standard pattern names, the existing stack_protect_set and
> > > > > >> stack_protect_test pattern names are used.
> > > > > >>
> > > > > >> To be able to split PIC access after register allocation, the functions
> > > > > >> had to be augmented to force a new PIC register load and to control
> > > > > >> which register it loads into. This is because sharing the PIC register
> > > > > >> between prologue and epilogue could lead to spilling due to CSE again
> > > > > >> which an attacker could use to control what the canary gets compared
> > > > > >> against.
> > > > > >>
> > > > > >> ChangeLog entries are as follows:
> > > > > >>
> > > > > >> *** gcc/ChangeLog ***
> > > > > >>
> > > > > >> 2018-08-09  Thomas Preud'homme  <thomas.preudhomme@linaro.org>
> > > > > >>
> > > > > >>      * target-insns.def (stack_protect_combined_set): Define new standard
> > > > > >>      pattern name.
> > > > > >>      (stack_protect_combined_test): Likewise.
> > > > > >>      * cfgexpand.c (stack_protect_prologue): Try new
> > > > > >>      stack_protect_combined_set pattern first.
> > > > > >>      * function.c (stack_protect_epilogue): Try new
> > > > > >>      stack_protect_combined_test pattern first.
> > > > > >>      * config/arm/arm.c (require_pic_register): Add pic_reg and compute_now
> > > > > >>      parameters to control which register to use as PIC register and force
> > > > > >>      reloading PIC register respectively.  Insert in the stream of insns if
> > > > > >>      possible.
> > > > > >>      (legitimize_pic_address): Expose above new parameters in prototype and
> > > > > >>      adapt recursive calls accordingly.
> > > > > >>      (arm_legitimize_address): Adapt to new legitimize_pic_address
> > > > > >>      prototype.
> > > > > >>      (thumb_legitimize_address): Likewise.
> > > > > >>      (arm_emit_call_insn): Adapt to new require_pic_register prototype.
> > > > > >>      * config/arm/arm-protos.h (legitimize_pic_address): Adapt to prototype
> > > > > >>      change.
> > > > > >>      * config/arm/predicated.md (guard_operand): New predicate.
> > > > >
> > > > > Typo, predicates.md is the filename.
> > > > >
> > > > > Looks ok to me otherwise.
> > > > > Thank you for your patience.
> > > > >
> > > > > Kyrill
> > > > >
> > > > > >>      * config/arm/arm.md (movsi expander): Adapt to legitimize_pic_address
> > > > > >>      prototype change.
> > > > > >>      (stack_protect_combined_set): New insn_and_split pattern.
> > > > > >>      (stack_protect_set): New insn pattern.
> > > > > >>      (stack_protect_combined_test): New insn_and_split pattern.
> > > > > >>      (stack_protect_test): New insn pattern.
> > > > > >>      * config/arm/unspecs.md (UNSPEC_SP_SET): New unspec.
> > > > > >>      (UNSPEC_SP_TEST): Likewise.
> > > > > >>      * doc/md.texi (stack_protect_combined_set): Document new standard
> > > > > >>      pattern name.
> > > > > >>      (stack_protect_set): Clarify that the operand for guard's address is
> > > > > >>      legal.
> > > > > >>      (stack_protect_combined_test): Document new standard pattern name.
> > > > > >>      (stack_protect_test): Clarify that the operand for guard's address is
> > > > > >>      legal.
> > > > > >>
> > > > > >> *** gcc/testsuite/ChangeLog ***
> > > > > >>
> > > > > >> 2018-07-05  Thomas Preud'homme  <thomas.preudhomme@linaro.org>
> > > > > >>
> > > > > >>      * gcc.target/arm/pr85434.c: New test.
> > > > >
> > > > > >>
> > > > > >> Testing:
> > > > > >>
> > > > > >> native x86_64: bootstrap + testsuite -> no regression, can see failures with previous version of patch but not with new version
> > > > > >> native powerpc64: bootstrap + testsuite -> no regression, can see failures from pr86834 with previous version of patch but not with new version
> > > > > >> cross ARM Linux: build + testsuite -> no regression
> > > > > >> native ARM Thumb-2: bootstrap + testsuite + glibc build + glibc test -> no regression
> > > > > >> native ARM Arm: bootstrap + testsuite + glibc build + glibc test -> no regression
> > > > > >> Aarch64: bootstrap + testsuite + glibc build + glibc test-> no regression
> > > > > >>
> > > > > >> Is this ok for trunk?
> > > > > >>
> > > > > >> Best regards,
> > > > > >>
> > > > > >> Thomas
> > > > >


* Re: [PATCH, ARM, ping2] PR85434: Prevent spilling of stack protector guard's address on ARM
  2018-11-01 16:03             ` [PATCH, ARM, ping] " Thomas Preudhomme
@ 2018-11-08  9:53               ` Thomas Preudhomme
  2018-11-08 15:53                 ` Kyrill Tkachov
  0 siblings, 1 reply; 20+ messages in thread
From: Thomas Preudhomme @ 2018-11-08  9:53 UTC (permalink / raw)
  To: kyrylo.tkachov; +Cc: Ramana Radhakrishnan, Richard Earnshaw, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 18773 bytes --]

Ping?

Best regards,

Thomas

On Thu, 1 Nov 2018 at 16:03, Thomas Preudhomme
<thomas.preudhomme@linaro.org> wrote:
>
> Ping?
>
> Best regards,
>
> Thomas
> On Fri, 26 Oct 2018 at 22:41, Thomas Preudhomme
> <thomas.preudhomme@linaro.org> wrote:
> >
> > Hi,
> >
> > Please find updated patch to fix PR85434: spilling of stack protector
> > guard's address on ARM. Quite a few changes have been made to the ARM
> > part since last round of review so I think it makes more sense to
> > review it anew. Ran bootstrap + regression testsuite + glibc build +
> > glibc regression testsuite for Arm and Thumb-2 and bootstrap +
> > regression testsuite for Thumb-1. GCC's regression testsuite was run
> > in 3 configurations in all those cases:
> >
> > - default configuration (no RUNTESTFLAGS)
> > - with -fstack-protector-all
> > - with -fPIC -fstack-protector-all (to exercise both code paths in stack
> > protector's split code)
> >
> > None of these show any regression beyond some new scan failures with
> > -fstack-protector-all or -fPIC due to unexpected code sequences for the
> > testcases concerned and some guality swings due to less optimization
> > with the new stack protector on.
> >
> > Patch description and ChangeLog below.
> >
> > In case of high register pressure in PIC mode, address of the stack
> > protector's guard can be spilled on ARM targets as shown in PR85434,
> > thus allowing an attacker to control what the canary would be compared
> > against. ARM does lack stack_protect_set and stack_protect_test insn
> > patterns, but defining them does not help as the address is expanded
> > regularly and the patterns only deal with the copy and test of the
> > guard with the canary.
> >
> > This problem does not occur for x86 targets because the PIC access and
> > the test can be done in the same instruction. Aarch64 is exempt too
> > because its PIC access insn patterns are mov of UNSPEC, which prevents
> > the second access in the epilogue from being CSEd in the cse_local pass
> > with the first access in the prologue.
> >
> > The approach followed here is to create new "combined" set and test
> > standard pattern names that take the unexpanded guard and do the set or
> > test. This allows the target to use an opaque pattern (e.g. using UNSPEC)
> > to hide the individual instructions being generated from the compiler and
> > split the pattern into generic load, compare and branch instructions
> > after register allocation, therefore avoiding any spilling. This is here
> > implemented for the ARM targets. For targets not implementing these new
> > standard pattern names, the existing stack_protect_set and
> > stack_protect_test pattern names are used.
> >
> > To be able to split PIC access after register allocation, the functions
> > had to be augmented to force a new PIC register load and to control
> > which register it loads into. This is because sharing the PIC register
> > between prologue and epilogue could lead to spilling due to CSE again
> > which an attacker could use to control what the canary gets compared
> > against.
> >
> > ChangeLog entries are as follows:
> >
> > *** gcc/ChangeLog ***
> >
> > 2018-10-26  Thomas Preud'homme  <thomas.preudhomme@linaro.org>
> >
> > * target-insns.def (stack_protect_combined_set): Define new standard
> > pattern name.
> > (stack_protect_combined_test): Likewise.
> > * cfgexpand.c (stack_protect_prologue): Try new
> > stack_protect_combined_set pattern first.
> > * function.c (stack_protect_epilogue): Try new
> > stack_protect_combined_test pattern first.
> > * config/arm/arm.c (require_pic_register): Add pic_reg and compute_now
> > parameters to control which register to use as PIC register and force
> > reloading PIC register respectively.  Insert in the stream of insns if
> > possible.
> > (legitimize_pic_address): Expose above new parameters in prototype and
> > adapt recursive calls accordingly.  Use pic_reg if non null instead of
> > cached one.
> > (arm_load_pic_register): Add pic_reg parameter and use it if non null.
> > (arm_legitimize_address): Adapt to new legitimize_pic_address
> > prototype.
> > (thumb_legitimize_address): Likewise.
> > (arm_emit_call_insn): Adapt to require_pic_register prototype change.
> > (arm_expand_prologue): Adapt to arm_load_pic_register prototype change.
> > (thumb1_expand_prologue): Likewise.
> > * config/arm/arm-protos.h (legitimize_pic_address): Adapt to prototype
> > change.
> > (arm_load_pic_register): Likewise.
> > * config/arm/predicates.md (guard_addr_operand): New predicate.
> > (guard_operand): New predicate.
> > * config/arm/arm.md (movsi expander): Adapt to legitimize_pic_address
> > prototype change.
> > (builtin_setjmp_receiver expander): Adapt to thumb1_expand_prologue
> > prototype change.
> > (stack_protect_combined_set): New expander.
> > (stack_protect_combined_set_insn): New insn_and_split pattern.
> > (stack_protect_set_insn): New insn pattern.
> > (stack_protect_combined_test): New expander.
> > (stack_protect_combined_test_insn): New insn_and_split pattern.
> > (arm_stack_protect_test_insn): New insn pattern.
> > * config/arm/thumb1.md (thumb1_stack_protect_test_insn): New insn pattern.
> > * config/arm/unspecs.md (UNSPEC_SP_SET): New unspec.
> > (UNSPEC_SP_TEST): Likewise.
> > * doc/md.texi (stack_protect_combined_set): Document new standard
> > pattern name.
> > (stack_protect_set): Clarify that the operand for guard's address is
> > legal.
> > (stack_protect_combined_test): Document new standard pattern name.
> > (stack_protect_test): Clarify that the operand for guard's address is
> > legal.
> >
> > *** gcc/testsuite/ChangeLog ***
> >
> > 2018-07-05  Thomas Preud'homme  <thomas.preudhomme@linaro.org>
> >
> > * gcc.target/arm/pr85434.c: New test.
> >
> > Is this ok for trunk?
> >
> > Best regards,
> >
> > Thomas
> > On Thu, 25 Oct 2018 at 15:54, Thomas Preudhomme
> > <thomas.preudhomme@linaro.org> wrote:
> > >
> > > Good thing I did, found a missing earlyclobber in the process.
> > > Rerunning all tests again.
> > >
> > > Best regards,
> > >
> > > Thomas
> > > On Wed, 24 Oct 2018 at 10:13, Thomas Preudhomme
> > > <thomas.preudhomme@linaro.org> wrote:
> > > >
> > > > Please hold on for the reviews, found a small improvement that could
> > > > be done. Am testing it right now, should have something by tonight or
> > > > tomorrow.
> > > >
> > > > Best regards,
> > > >
> > > > Thomas
> > > > On Tue, 23 Oct 2018 at 13:35, Thomas Preudhomme
> > > > <thomas.preudhomme@linaro.org> wrote:
> > > > >
> > > > > [Removing Jeff Law since middle end code hasn't changed]
> > > > >
> > > > > Hi,
> > > > >
> > > > > > Given how memory operands are reloaded even with an X constraint, I've
> > > > > > reworked the patch for the combined set and combined test instructions
> > > > > > to keep the mem out of the match_operand and used an expander to
> > > > > generate the right instruction pattern. I've also fixed some
> > > > > longstanding issues with the patch when flag_pic is true and with
> > > > > constraints for Thumb-1 that I hadn't noticed before due to using
> > > > > dg-cmp-results in conjunction with test_summary which does not show
> > > > > NA->FAIL (see [1]).
> > > > >
> > > > > All in all, I think the Arm code would do with a fresh review rather
> > > > > than looking at the changes since last posted version. (unchanged)
> > > > > ChangeLog entries are as follows:
> > > > >
> > > > > *** gcc/ChangeLog ***
> > > > >
> > > > > 2018-08-09  Thomas Preud'homme  <thomas.preudhomme@linaro.org>
> > > > >
> > > > >     * target-insns.def (stack_protect_combined_set): Define new standard
> > > > >     pattern name.
> > > > >     (stack_protect_combined_test): Likewise.
> > > > >     * cfgexpand.c (stack_protect_prologue): Try new
> > > > >     stack_protect_combined_set pattern first.
> > > > >     * function.c (stack_protect_epilogue): Try new
> > > > >     stack_protect_combined_test pattern first.
> > > > >     * config/arm/arm.c (require_pic_register): Add pic_reg and compute_now
> > > > >     parameters to control which register to use as PIC register and force
> > > > >     reloading PIC register respectively.  Insert in the stream of insns if
> > > > >     possible.
> > > > >     (legitimize_pic_address): Expose above new parameters in prototype and
> > > > >     adapt recursive calls accordingly.  Use pic_reg if non null instead of
> > > > >     cached one.
> > > > >     (arm_load_pic_register): Add pic_reg parameter and use it if non null.
> > > > >     (arm_legitimize_address): Adapt to new legitimize_pic_address
> > > > >     prototype.
> > > > >     (thumb_legitimize_address): Likewise.
> > > > >     (arm_emit_call_insn): Adapt to require_pic_register prototype change.
> > > > >     (arm_expand_prologue): Adapt to arm_load_pic_register prototype change.
> > > > >     (thumb1_expand_prologue): Likewise.
> > > > >     * config/arm/arm-protos.h (legitimize_pic_address): Adapt to prototype
> > > > >     change.
> > > > >     (arm_load_pic_register): Likewise.
> > > > > >     * config/arm/predicates.md (guard_addr_operand): New predicate.
> > > > >     (guard_operand): New predicate.
> > > > >     * config/arm/arm.md (movsi expander): Adapt to legitimize_pic_address
> > > > >     prototype change.
> > > > >     (builtin_setjmp_receiver expander): Adapt to thumb1_expand_prologue
> > > > >     prototype change.
> > > > > >     (stack_protect_combined_set): New expander.
> > > > >     (stack_protect_combined_set_insn): New insn_and_split pattern.
> > > > >     (stack_protect_set_insn): New insn pattern.
> > > > >     (stack_protect_combined_test): New expander.
> > > > >     (stack_protect_combined_test_insn): New insn_and_split pattern.
> > > > >     (stack_protect_test_insn): New insn pattern.
> > > > >     * config/arm/unspecs.md (UNSPEC_SP_SET): New unspec.
> > > > >     (UNSPEC_SP_TEST): Likewise.
> > > > >     * doc/md.texi (stack_protect_combined_set): Document new standard
> > > > >     pattern name.
> > > > >     (stack_protect_set): Clarify that the operand for guard's address is
> > > > >     legal.
> > > > >     (stack_protect_combined_test): Document new standard pattern name.
> > > > >     (stack_protect_test): Clarify that the operand for guard's address is
> > > > >     legal.
> > > > >
> > > > > *** gcc/testsuite/ChangeLog ***
> > > > >
> > > > > 2018-07-05  Thomas Preud'homme  <thomas.preudhomme@linaro.org>
> > > > >
> > > > >     * gcc.target/arm/pr85434.c: New test.
> > > > >
> > > > > Testing: Bootstrap and regression testing for Arm, Thumb-1 and Thumb-2
> > > > > > with (i) default flags, (ii) an extra -fstack-protector-all and (iii)
> > > > > > -fPIC -fstack-protector-all. A glibc build and testsuite run was also
> > > > > performed for Arm and Thumb-2. Default flags show no regression and
> > > > > the other runs have some expected scan-assembler failing (due to stack
> > > > > protector or fPIC code sequence), as well as guality fail (due to less
> > > > > optimized code with the new stack protector code) and some execution
> > > > > failures in sibcall-9 and sibcall-10 under -fPIC -fstack-protector-all
> > > > > due to the PIC sequence for the global variable making the frame
> > > > > layout different for the 2 functions (these become PASS if making the
> > > > > global variable static).
> > > > >
> > > > > Is this ok for trunk?
> > > > >
> > > > > Best regards,
> > > > >
> > > > > Thomas
> > > > >
> > > > > [1] https://gcc.gnu.org/ml/gcc-patches/2018-10/msg01412.html
> > > > >
> > > > >
> > > > > On Tue, 25 Sep 2018 at 17:10, Kyrill Tkachov
> > > > > <kyrylo.tkachov@foss.arm.com> wrote:
> > > > > >
> > > > > > Hi Thomas,
> > > > > >
> > > > > > On 29/08/18 10:51, Thomas Preudhomme wrote:
> > > > > > > Resend hopefully without HTML this time.
> > > > > > >
> > > > > > > On Wed, 29 Aug 2018 at 10:49, Thomas Preudhomme
> > > > > > > <thomas.preudhomme@linaro.org> wrote:
> > > > > > >> Hi,
> > > > > > >>
> > > > > > >> I've reworked the patch fixing PR85434 (spilling of stack protector guard's address on ARM) to address the testsuite regression on powerpc and x86 as well as glibc testsuite regression on ARM. Issues were due to unconditionally attempting to generate the new patterns. The code now tests if there is a pattern for them for the target before generating them. In the ARM side of the patch, I've also added a more specific predicate for the new patterns. The new patch is found below.
> > > > > > >>
> > > > > > >>
> > > > > > >> In case of high register pressure in PIC mode, address of the stack
> > > > > > >> protector's guard can be spilled on ARM targets as shown in PR85434,
> > > > > > >> thus allowing an attacker to control what the canary would be compared
> > > > > > >> against. ARM does lack stack_protect_set and stack_protect_test insn
> > > > > > >> patterns, defining them does not help as the address is expanded
> > > > > > >> regularly and the patterns only deal with the copy and test of the
> > > > > > >> guard with the canary.
> > > > > > >>
> > > > > > >> This problem does not occur for x86 targets because the PIC access and
> > > > > > >> the test can be done in the same instruction. Aarch64 is exempt too
> > > > > > >> because PIC access insn pattern are mov of UNSPEC which prevents it from
> > > > > > >> the second access in the epilogue being CSEd in cse_local pass with the
> > > > > > >> first access in the prologue.
> > > > > > >>
> > > > > > >> The approach followed here is to create new "combined" set and test
> > > > > > >> standard pattern names that take the unexpanded guard and do the set or
> > > > > > >> test. This allows the target to use an opaque pattern (eg. using UNSPEC)
> > > > > > >> to hide the individual instructions being generated to the compiler and
> > > > > > >> split the pattern into generic load, compare and branch instruction
> > > > > > >> after register allocator, therefore avoiding any spilling. This is here
> > > > > > >> implemented for the ARM targets. For targets not implementing these new
> > > > > > >> standard pattern names, the existing stack_protect_set and
> > > > > > >> stack_protect_test pattern names are used.
> > > > > > >>
> > > > > > >> To be able to split PIC access after register allocation, the functions
> > > > > > >> had to be augmented to force a new PIC register load and to control
> > > > > > >> which register it loads into. This is because sharing the PIC register
> > > > > > >> between prologue and epilogue could lead to spilling due to CSE again
> > > > > > >> which an attacker could use to control what the canary gets compared
> > > > > > >> against.
> > > > > > >>
> > > > > > >> ChangeLog entries are as follows:
> > > > > > >>
> > > > > > >> *** gcc/ChangeLog ***
> > > > > > >>
> > > > > > >> 2018-08-09  Thomas Preud'homme  <thomas.preudhomme@linaro.org>
> > > > > > >>
> > > > > > >>      * target-insns.def (stack_protect_combined_set): Define new standard
> > > > > > >>      pattern name.
> > > > > > >>      (stack_protect_combined_test): Likewise.
> > > > > > >>      * cfgexpand.c (stack_protect_prologue): Try new
> > > > > > >>      stack_protect_combined_set pattern first.
> > > > > > >>      * function.c (stack_protect_epilogue): Try new
> > > > > > >>      stack_protect_combined_test pattern first.
> > > > > > >>      * config/arm/arm.c (require_pic_register): Add pic_reg and compute_now
> > > > > > >>      parameters to control which register to use as PIC register and force
> > > > > > >>      reloading PIC register respectively.  Insert in the stream of insns if
> > > > > > >>      possible.
> > > > > > >>      (legitimize_pic_address): Expose above new parameters in prototype and
> > > > > > >>      adapt recursive calls accordingly.
> > > > > > >>      (arm_legitimize_address): Adapt to new legitimize_pic_address
> > > > > > >>      prototype.
> > > > > > >>      (thumb_legitimize_address): Likewise.
> > > > > > >>      (arm_emit_call_insn): Adapt to new require_pic_register prototype.
> > > > > > >>      * config/arm/arm-protos.h (legitimize_pic_address): Adapt to prototype
> > > > > > >>      change.
> > > > > > >>      * config/arm/predicated.md (guard_operand): New predicate.
> > > > > >
> > > > > > Typo, predicates.md is the filename.
> > > > > >
> > > > > > Looks ok to me otherwise.
> > > > > > Thank you for your patience.
> > > > > >
> > > > > > Kyrill
> > > > > >
> > > > > > >>      * config/arm/arm.md (movsi expander): Adapt to legitimize_pic_address
> > > > > > >>      prototype change.
> > > > > > >>      (stack_protect_combined_set): New insn_and_split pattern.
> > > > > > >>      (stack_protect_set): New insn pattern.
> > > > > > >>      (stack_protect_combined_test): New insn_and_split pattern.
> > > > > > >>      (stack_protect_test): New insn pattern.
> > > > > > >>      * config/arm/unspecs.md (UNSPEC_SP_SET): New unspec.
> > > > > > >>      (UNSPEC_SP_TEST): Likewise.
> > > > > > >>      * doc/md.texi (stack_protect_combined_set): Document new standard
> > > > > > >>      pattern name.
> > > > > > >>      (stack_protect_set): Clarify that the operand for guard's address is
> > > > > > >>      legal.
> > > > > > >>      (stack_protect_combined_test): Document new standard pattern name.
> > > > > > >>      (stack_protect_test): Clarify that the operand for guard's address is
> > > > > > >>      legal.
> > > > > > >>
> > > > > > >> *** gcc/testsuite/ChangeLog ***
> > > > > > >>
> > > > > > >> 2018-07-05  Thomas Preud'homme  <thomas.preudhomme@linaro.org>
> > > > > > >>
> > > > > > >>      * gcc.target/arm/pr85434.c: New test.
> > > > > >
> > > > > > >>
> > > > > > >> Testing:
> > > > > > >>
> > > > > > >> native x86_64: bootstrap + testsuite -> no regression, can see failures with previous version of patch but not with new version
> > > > > > >> native powerpc64: bootstrap + testsuite -> no regression, can see failures from pr86834 with previous version of patch but not with new version
> > > > > > >> cross ARM Linux: build + testsuite -> no regression
> > > > > > >> native ARM Thumb-2: bootstrap + testsuite + glibc build + glibc test -> no regression
> > > > > > >> native ARM Arm: bootstrap + testsuite + glibc build + glibc test -> no regression
> > > > > > >> Aarch64: bootstrap + testsuite + glibc build + glibc test-> no regression
> > > > > > >>
> > > > > > >> Is this ok for trunk?
> > > > > > >>
> > > > > > >> Best regards,
> > > > > > >>
> > > > > > >> Thomas
> > > > > >

[-- Attachment #2: fix_pr85434_prevent_spilling_stack_protector_guard_address.patch --]
[-- Type: text/x-patch, Size: 35675 bytes --]

From a2dba2bf283c3a7f5a11cf28a2b16b789c66a592 Mon Sep 17 00:00:00 2001
From: Thomas Preud'homme <thomas.preudhomme@linaro.org>
Date: Tue, 8 May 2018 15:47:05 +0100
Subject: [PATCH] PR85434: Prevent spilling of stack protector guard's address
 on ARM

In case of high register pressure in PIC mode, the address of the stack
protector's guard can be spilled on ARM targets as shown in PR85434,
thus allowing an attacker to control what the canary would be compared
against. ARM lacks stack_protect_set and stack_protect_test insn
patterns, but defining them does not help because the address is
expanded in the regular way and the patterns only deal with the copy and
test of the guard with the canary.

This problem does not occur for x86 targets because the PIC access and
the test can be done in the same instruction. Aarch64 is exempt too
because its PIC access insn patterns are movs of an UNSPEC, which
prevents the second access in the epilogue from being CSEd in the
cse_local pass with the first access in the prologue.

The approach followed here is to create new "combined" set and test
standard pattern names that take the unexpanded guard and do the set or
test. This allows the target to use an opaque pattern (e.g. using UNSPEC)
to hide the individual instructions being generated from the compiler
and split the pattern into generic load, compare and branch instructions
after register allocation, therefore avoiding any spilling. This is
implemented here for the ARM targets. For targets not implementing these
new standard pattern names, the existing stack_protect_set and
stack_protect_test pattern names are used.
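
As a rough illustration of the selection order described above, here is
a toy model in plain C. It is not GCC code: struct target_caps, enum
strategy and pick_test_strategy are names made up for this sketch, and
it assumes that generating a pattern never fails once the target
advertises it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of which patterns a target provides.  */
struct target_caps
{
  bool has_combined_test;	/* stack_protect_combined_test */
  bool has_plain_test;		/* stack_protect_test */
};

enum strategy { USE_COMBINED, USE_PLAIN, USE_GENERIC_COMPARE };

/* Model of the order tried for the epilogue check: the combined pattern
   first (only when a guard declaration exists), then the existing
   pattern, then a generic compare and branch.  */
static enum strategy
pick_test_strategy (const struct target_caps *t, bool have_guard_decl)
{
  assert (t != NULL);

  if (t->has_combined_test && have_guard_decl)
    return USE_COMBINED;
  if (t->has_plain_test)
    return USE_PLAIN;
  return USE_GENERIC_COMPARE;
}
```

In this model a target that only defines the combined pattern uses it
whenever a guard declaration is available, while a target that only
defines the existing pattern keeps its current behaviour.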

To be able to split PIC access after register allocation, the PIC
helper functions (require_pic_register, legitimize_pic_address and
arm_load_pic_register) had to be augmented to force a new PIC register
load and to control which register it loads into. This is because
sharing the PIC register between prologue and epilogue could again lead
to spilling due to CSE, which an attacker could use to control what the
canary gets compared against.
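
The calling contract this introduces can be sketched as follows. This
is again a toy model and not the ARM backend code: struct func_state,
require_pic_base and the register numbers are invented for
illustration. The only part taken from the patch is the rule that a
caller either passes no register and does not force a reload, or passes
a register and forces one.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-function state; not a GCC type.  */
struct func_state
{
  int cached_pic_reg;	/* 0 means "not chosen yet" */
  int loads_emitted;	/* number of PIC base loads emitted so far */
};

/* Return the register holding the PIC base.  PIC_REG == 0 with
   COMPUTE_NOW false reuses (or lazily creates) the cached register.
   A nonzero PIC_REG with COMPUTE_NOW true always emits a fresh load
   into that register.  Any other combination is rejected.  */
static int
require_pic_base (struct func_state *f, int pic_reg, bool compute_now)
{
  assert (compute_now == (pic_reg != 0));

  if (compute_now)
    {
      f->loads_emitted++;
      if (f->cached_pic_reg == 0)
	f->cached_pic_reg = pic_reg;
      return pic_reg;
    }

  if (f->cached_pic_reg == 0)
    {
      f->cached_pic_reg = 100;	/* arbitrary "new pseudo" number */
      f->loads_emitted++;
    }
  return f->cached_pic_reg;
}
```

Ordinary address legitimization corresponds to the second path, so
repeated calls share one load. The stack protector splitters correspond
to the first path and get a load of their own that does not depend on
anything computed earlier in the function.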

ChangeLog entries are as follows:

*** gcc/ChangeLog ***

2018-10-26  Thomas Preud'homme  <thomas.preudhomme@linaro.org>

	* target-insns.def (stack_protect_combined_set): Define new standard
	pattern name.
	(stack_protect_combined_test): Likewise.
	* cfgexpand.c (stack_protect_prologue): Try new
	stack_protect_combined_set pattern first.
	* function.c (stack_protect_epilogue): Try new
	stack_protect_combined_test pattern first.
	* config/arm/arm.c (require_pic_register): Add pic_reg and compute_now
	parameters to control which register to use as PIC register and force
	reloading PIC register respectively.  Insert in the stream of insns if
	possible.
	(legitimize_pic_address): Expose above new parameters in prototype and
	adapt recursive calls accordingly.  Use pic_reg if non null instead of
	cached one.
	(arm_load_pic_register): Add pic_reg parameter and use it if non null.
	(arm_legitimize_address): Adapt to new legitimize_pic_address
	prototype.
	(thumb_legitimize_address): Likewise.
	(arm_emit_call_insn): Adapt to require_pic_register prototype change.
	(arm_expand_prologue): Adapt to arm_load_pic_register prototype change.
	(thumb1_expand_prologue): Likewise.
	* config/arm/arm-protos.h (legitimize_pic_address): Adapt to prototype
	change.
	(arm_load_pic_register): Likewise.
	* config/arm/predicates.md (guard_addr_operand): New predicate.
	(guard_operand): New predicate.
	* config/arm/arm.md (movsi expander): Adapt to legitimize_pic_address
	prototype change.
	(builtin_setjmp_receiver expander): Adapt to thumb1_expand_prologue
	prototype change.
	(stack_protect_combined_set): New expander.
	(stack_protect_combined_set_insn): New insn_and_split pattern.
	(stack_protect_set_insn): New insn pattern.
	(stack_protect_combined_test): New expander.
	(stack_protect_combined_test_insn): New insn_and_split pattern.
	(arm_stack_protect_test_insn): New insn pattern.
	* config/arm/thumb1.md (thumb1_stack_protect_test_insn): New insn pattern.
	* config/arm/unspecs.md (UNSPEC_SP_SET): New unspec.
	(UNSPEC_SP_TEST): Likewise.
	* doc/md.texi (stack_protect_combined_set): Document new standard
	pattern name.
	(stack_protect_set): Clarify that the operand for guard's address is
	legal.
	(stack_protect_combined_test): Document new standard pattern name.
	(stack_protect_test): Clarify that the operand for guard's address is
	legal.

*** gcc/testsuite/ChangeLog ***

2018-07-05  Thomas Preud'homme  <thomas.preudhomme@linaro.org>

	* gcc.target/arm/pr85434.c: New test.

Testing: Bootstrapped on ARM in both Arm and Thumb-2 mode as well as on
Aarch64. The testsuite shows no regression on these 3 variants, both
with default flags and with -fstack-protector-all.

Is this ok for trunk? If yes, would this be acceptable as a backport to
GCC 6, 7 and 8 provided that no regression is found?

Best regards,

Thomas

Change-Id: I993343e3063fb570af706624e08b475732a5ec57
---
 gcc/cfgexpand.c                        |  17 +++
 gcc/config/arm/arm-protos.h            |   4 +-
 gcc/config/arm/arm.c                   |  87 ++++++++---
 gcc/config/arm/arm.md                  | 162 +++++++++++++++++++-
 gcc/config/arm/predicates.md           |  17 +++
 gcc/config/arm/thumb1.md               |  13 ++
 gcc/config/arm/unspecs.md              |   3 +
 gcc/doc/md.texi                        |  55 ++++++-
 gcc/function.c                         |  32 +++-
 gcc/target-insns.def                   |   2 +
 gcc/testsuite/gcc.target/arm/pr85434.c | 200 +++++++++++++++++++++++++
 11 files changed, 549 insertions(+), 43 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/arm/pr85434.c

diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
index 35ca276e4ad..c8d0374f8ae 100644
--- a/gcc/cfgexpand.c
+++ b/gcc/cfgexpand.c
@@ -6131,6 +6131,23 @@ stack_protect_prologue (void)
   rtx x, y;
 
   x = expand_normal (crtl->stack_protect_guard);
+
+  if (targetm.have_stack_protect_combined_set () && guard_decl)
+    {
+      gcc_assert (DECL_P (guard_decl));
+      y = DECL_RTL (guard_decl);
+
+      /* Allow the target to compute address of Y and copy it to X without
+	 leaking Y into a register.  This combined address + copy pattern
+	 allows the target to prevent spilling of any intermediate results by
+	 splitting it after register allocation.  */
+      if (rtx_insn *insn = targetm.gen_stack_protect_combined_set (x, y))
+	{
+	  emit_insn (insn);
+	  return;
+	}
+    }
+
   if (guard_decl)
     y = expand_normal (guard_decl);
   else
diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 0dfb3ac59a6..f508bc5a455 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -28,7 +28,7 @@ extern enum unwind_info_type arm_except_unwind_info (struct gcc_options *);
 extern int use_return_insn (int, rtx);
 extern bool use_simple_return_p (void);
 extern enum reg_class arm_regno_class (int);
-extern void arm_load_pic_register (unsigned long);
+extern void arm_load_pic_register (unsigned long, rtx);
 extern int arm_volatile_func (void);
 extern void arm_expand_prologue (void);
 extern void arm_expand_epilogue (bool);
@@ -69,7 +69,7 @@ extern int const_ok_for_dimode_op (HOST_WIDE_INT, enum rtx_code);
 extern int arm_split_constant (RTX_CODE, machine_mode, rtx,
 			       HOST_WIDE_INT, rtx, rtx, int);
 extern int legitimate_pic_operand_p (rtx);
-extern rtx legitimize_pic_address (rtx, machine_mode, rtx);
+extern rtx legitimize_pic_address (rtx, machine_mode, rtx, rtx, bool);
 extern rtx legitimize_tls_address (rtx, rtx);
 extern bool arm_legitimate_address_p (machine_mode, rtx, bool);
 extern int arm_legitimate_address_outer_p (machine_mode, rtx, RTX_CODE, int);
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 8810df53aa3..763941868d2 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -7371,21 +7371,34 @@ legitimate_pic_operand_p (rtx x)
   return 1;
 }
 
-/* Record that the current function needs a PIC register.  Initialize
-   cfun->machine->pic_reg if we have not already done so.  */
+/* Record that the current function needs a PIC register.  If PIC_REG is null,
+   a new pseudo is allocated as PIC register, otherwise PIC_REG is used.  In
+   both cases cfun->machine->pic_reg is initialized if we have not already done
+   so.  COMPUTE_NOW decides whether and where to set the PIC register.  If true,
+   PIC register is reloaded in the current position of the instruction stream
+   regardless of whether it was loaded before.  Otherwise, it is only loaded
+   if not already done so (crtl->uses_pic_offset_table is null).  Note that
+   nonnull PIC_REG is only supported iff COMPUTE_NOW is true and null PIC_REG
+   is only supported iff COMPUTE_NOW is false.  */
 
 static void
-require_pic_register (void)
+require_pic_register (rtx pic_reg, bool compute_now)
 {
+  gcc_assert (compute_now == (pic_reg != NULL_RTX));
+
   /* A lot of the logic here is made obscure by the fact that this
      routine gets called as part of the rtx cost estimation process.
      We don't want those calls to affect any assumptions about the real
      function; and further, we can't call entry_of_function() until we
      start the real expansion process.  */
-  if (!crtl->uses_pic_offset_table)
+  if (!crtl->uses_pic_offset_table || compute_now)
     {
-      gcc_assert (can_create_pseudo_p ());
+      gcc_assert (can_create_pseudo_p ()
+		  || (pic_reg != NULL_RTX
+		      && REG_P (pic_reg)
+		      && GET_MODE (pic_reg) == Pmode));
       if (arm_pic_register != INVALID_REGNUM
+	  && !compute_now
 	  && !(TARGET_THUMB1 && arm_pic_register > LAST_LO_REGNUM))
 	{
 	  if (!cfun->machine->pic_reg)
@@ -7401,8 +7414,10 @@ require_pic_register (void)
 	{
 	  rtx_insn *seq, *insn;
 
+	  if (pic_reg == NULL_RTX)
+	    pic_reg = gen_reg_rtx (Pmode);
 	  if (!cfun->machine->pic_reg)
-	    cfun->machine->pic_reg = gen_reg_rtx (Pmode);
+	    cfun->machine->pic_reg = pic_reg;
 
 	  /* Play games to avoid marking the function as needing pic
 	     if we are being called as part of the cost-estimation
@@ -7413,11 +7428,12 @@ require_pic_register (void)
 	      start_sequence ();
 
 	      if (TARGET_THUMB1 && arm_pic_register != INVALID_REGNUM
-		  && arm_pic_register > LAST_LO_REGNUM)
+		  && arm_pic_register > LAST_LO_REGNUM
+		  && !compute_now)
 		emit_move_insn (cfun->machine->pic_reg,
 				gen_rtx_REG (Pmode, arm_pic_register));
 	      else
-		arm_load_pic_register (0UL);
+		arm_load_pic_register (0UL, pic_reg);
 
 	      seq = get_insns ();
 	      end_sequence ();
@@ -7430,16 +7446,33 @@ require_pic_register (void)
 	         we can't yet emit instructions directly in the final
 		 insn stream.  Queue the insns on the entry edge, they will
 		 be committed after everything else is expanded.  */
-	      insert_insn_on_edge (seq,
-				   single_succ_edge (ENTRY_BLOCK_PTR_FOR_FN (cfun)));
+	      if (currently_expanding_to_rtl)
+		insert_insn_on_edge (seq,
+				     single_succ_edge
+				     (ENTRY_BLOCK_PTR_FOR_FN (cfun)));
+	      else
+		emit_insn (seq);
 	    }
 	}
     }
 }
 
+/* Legitimize PIC load to ORIG into REG.  If REG is NULL, a new pseudo is
+   created to hold the result of the load.  If not NULL, PIC_REG indicates
+   which register to use as PIC register, otherwise it is decided by register
+   allocator.  COMPUTE_NOW forces the PIC register to be loaded at the current
+   location in the instruction stream, regardless of whether it was loaded
+   previously.  Note that nonnull PIC_REG is only supported iff COMPUTE_NOW is
+   true and null PIC_REG is only supported iff COMPUTE_NOW is false.
+
+   Returns the register REG into which the PIC load is performed.  */
+
 rtx
-legitimize_pic_address (rtx orig, machine_mode mode, rtx reg)
+legitimize_pic_address (rtx orig, machine_mode mode, rtx reg, rtx pic_reg,
+			bool compute_now)
 {
+  gcc_assert (compute_now == (pic_reg != NULL_RTX));
+
   if (GET_CODE (orig) == SYMBOL_REF
       || GET_CODE (orig) == LABEL_REF)
     {
@@ -7472,9 +7505,12 @@ legitimize_pic_address (rtx orig, machine_mode mode, rtx reg)
 	  rtx mem;
 
 	  /* If this function doesn't have a pic register, create one now.  */
-	  require_pic_register ();
+	  require_pic_register (pic_reg, compute_now);
+
+	  if (pic_reg == NULL_RTX)
+	    pic_reg = cfun->machine->pic_reg;
 
-	  pat = gen_calculate_pic_address (reg, cfun->machine->pic_reg, orig);
+	  pat = gen_calculate_pic_address (reg, pic_reg, orig);
 
 	  /* Make the MEM as close to a constant as possible.  */
 	  mem = SET_SRC (pat);
@@ -7523,9 +7559,11 @@ legitimize_pic_address (rtx orig, machine_mode mode, rtx reg)
 
       gcc_assert (GET_CODE (XEXP (orig, 0)) == PLUS);
 
-      base = legitimize_pic_address (XEXP (XEXP (orig, 0), 0), Pmode, reg);
+      base = legitimize_pic_address (XEXP (XEXP (orig, 0), 0), Pmode, reg,
+				     pic_reg, compute_now);
       offset = legitimize_pic_address (XEXP (XEXP (orig, 0), 1), Pmode,
-				       base == reg ? 0 : reg);
+				       base == reg ? 0 : reg, pic_reg,
+				       compute_now);
 
       if (CONST_INT_P (offset))
 	{
@@ -7625,16 +7663,17 @@ static GTY(()) int pic_labelno;
    low register.  */
 
 void
-arm_load_pic_register (unsigned long saved_regs ATTRIBUTE_UNUSED)
+arm_load_pic_register (unsigned long saved_regs ATTRIBUTE_UNUSED, rtx pic_reg)
 {
-  rtx l1, labelno, pic_tmp, pic_rtx, pic_reg;
+  rtx l1, labelno, pic_tmp, pic_rtx;
 
   if (crtl->uses_pic_offset_table == 0 || TARGET_SINGLE_PIC_BASE)
     return;
 
   gcc_assert (flag_pic);
 
-  pic_reg = cfun->machine->pic_reg;
+  if (pic_reg == NULL_RTX)
+    pic_reg = cfun->machine->pic_reg;
   if (TARGET_VXWORKS_RTP)
     {
       pic_rtx = gen_rtx_SYMBOL_REF (Pmode, VXWORKS_GOTT_BASE);
@@ -8710,7 +8749,8 @@ arm_legitimize_address (rtx x, rtx orig_x, machine_mode mode)
     {
       /* We need to find and carefully transform any SYMBOL and LABEL
 	 references; so go back to the original address expression.  */
-      rtx new_x = legitimize_pic_address (orig_x, mode, NULL_RTX);
+      rtx new_x = legitimize_pic_address (orig_x, mode, NULL_RTX, NULL_RTX,
+					  false /*compute_now*/);
 
       if (new_x != orig_x)
 	x = new_x;
@@ -8778,7 +8818,8 @@ thumb_legitimize_address (rtx x, rtx orig_x, machine_mode mode)
     {
       /* We need to find and carefully transform any SYMBOL and LABEL
 	 references; so go back to the original address expression.  */
-      rtx new_x = legitimize_pic_address (orig_x, mode, NULL_RTX);
+      rtx new_x = legitimize_pic_address (orig_x, mode, NULL_RTX, NULL_RTX,
+					  false /*compute_now*/);
 
       if (new_x != orig_x)
 	x = new_x;
@@ -18066,7 +18107,7 @@ arm_emit_call_insn (rtx pat, rtx addr, bool sibcall)
 	  ? !targetm.binds_local_p (SYMBOL_REF_DECL (addr))
 	  : !SYMBOL_REF_LOCAL_P (addr)))
     {
-      require_pic_register ();
+      require_pic_register (NULL_RTX, false /*compute_now*/);
       use_reg (&CALL_INSN_FUNCTION_USAGE (insn), cfun->machine->pic_reg);
     }
 
@@ -21998,7 +22039,7 @@ arm_expand_prologue (void)
       mask &= THUMB2_WORK_REGS;
       if (!IS_NESTED (func_type))
 	mask |= (1 << IP_REGNUM);
-      arm_load_pic_register (mask);
+      arm_load_pic_register (mask, 0);
     }
 
   /* If we are profiling, make sure no instructions are scheduled before
@@ -25229,7 +25270,7 @@ thumb1_expand_prologue (void)
   /* Load the pic register before setting the frame pointer,
      so we can use r7 as a temporary work register.  */
   if (flag_pic && arm_pic_register != INVALID_REGNUM)
-    arm_load_pic_register (live_regs_mask);
+    arm_load_pic_register (live_regs_mask, 0);
 
   if (!frame_pointer_needed && CALLER_INTERWORKING_SLOT_SIZE > 0)
     emit_move_insn (gen_rtx_REG (Pmode, ARM_HARD_FRAME_POINTER_REGNUM),
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 270b8e454b3..af2414cb0ff 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -6021,7 +6021,8 @@
       operands[1] = legitimize_pic_address (operands[1], SImode,
 					    (!can_create_pseudo_p ()
 					     ? operands[0]
-					     : 0));
+					     : NULL_RTX), NULL_RTX,
+					    false /*compute_now*/);
   }
   "
 )
@@ -6309,7 +6310,7 @@
   /* r3 is clobbered by set/longjmp, so we can use it as a scratch
      register.  */
   if (arm_pic_register != INVALID_REGNUM)
-    arm_load_pic_register (1UL << 3);
+    arm_load_pic_register (1UL << 3, 0);
   DONE;
 }")
 
@@ -8634,6 +8635,163 @@
    (set_attr "conds" "clob")]
 )
 
+;; Named patterns for stack smashing protection.
+(define_expand "stack_protect_combined_set"
+  [(parallel
+     [(set (match_operand:SI 0 "memory_operand" "")
+	   (unspec:SI [(match_operand:SI 1 "guard_operand" "")]
+		      UNSPEC_SP_SET))
+      (clobber (match_scratch:SI 2 ""))
+      (clobber (match_scratch:SI 3 ""))])]
+  ""
+  ""
+)
+
+;; Use a separate insn from the above expand to be able to have the mem outside
+;; the operand #1 when register allocation comes. This is needed to keep LRA
+;; from trying to reload the guard since we need to control how PIC access is
+;; done in the -fpic/-fPIC case (see COMPUTE_NOW parameter when calling
+;; legitimize_pic_address ()).
+(define_insn_and_split "*stack_protect_combined_set_insn"
+  [(set (match_operand:SI 0 "memory_operand" "=m,m")
+	(unspec:SI [(mem:SI (match_operand:SI 1 "guard_addr_operand" "X,X"))]
+		   UNSPEC_SP_SET))
+   (clobber (match_scratch:SI 2 "=&l,&r"))
+   (clobber (match_scratch:SI 3 "=&l,&r"))]
+  ""
+  "#"
+  "reload_completed"
+  [(parallel [(set (match_dup 0) (unspec:SI [(mem:SI (match_dup 2))]
+					    UNSPEC_SP_SET))
+	      (clobber (match_dup 2))])]
+  "
+{
+  if (flag_pic)
+    {
+      /* Forces recomputing of GOT base now.  */
+      legitimize_pic_address (operands[1], SImode, operands[2], operands[3],
+			      true /*compute_now*/);
+    }
+  else
+    {
+      if (address_operand (operands[1], SImode))
+	operands[2] = operands[1];
+      else
+	{
+	  rtx mem = XEXP (force_const_mem (SImode, operands[1]), 0);
+	  emit_move_insn (operands[2], mem);
+	}
+    }
+}"
+  [(set_attr "arch" "t1,32")]
+)
+
+(define_insn "*stack_protect_set_insn"
+  [(set (match_operand:SI 0 "memory_operand" "=m,m")
+	(unspec:SI [(mem:SI (match_operand:SI 1 "register_operand" "+&l,&r"))]
+	 UNSPEC_SP_SET))
+   (clobber (match_dup 1))]
+  ""
+  "@
+   ldr\\t%1, [%1]\;str\\t%1, %0\;movs\t%1,#0
+   ldr\\t%1, [%1]\;str\\t%1, %0\;mov\t%1,#0"
+  [(set_attr "length" "8,12")
+   (set_attr "conds" "clob,nocond")
+   (set_attr "type" "multiple")
+   (set_attr "arch" "t1,32")]
+)
+
+(define_expand "stack_protect_combined_test"
+  [(parallel
+     [(set (pc)
+	   (if_then_else
+		(eq (match_operand:SI 0 "memory_operand" "")
+		    (unspec:SI [(match_operand:SI 1 "guard_operand" "")]
+			       UNSPEC_SP_TEST))
+		(label_ref (match_operand 2))
+		(pc)))
+      (clobber (match_scratch:SI 3 ""))
+      (clobber (match_scratch:SI 4 ""))
+      (clobber (reg:CC CC_REGNUM))])]
+  ""
+  ""
+)
+
+;; Use a separate insn from the above expand to be able to have the mem outside
+;; the operand #1 when register allocation comes. This is needed to keep LRA
+;; from trying to reload the guard since we need to control how PIC access is
+;; done in the -fpic/-fPIC case (see COMPUTE_NOW parameter when calling
+;; legitimize_pic_address ()).
+(define_insn_and_split "*stack_protect_combined_test_insn"
+  [(set (pc)
+	(if_then_else
+		(eq (match_operand:SI 0 "memory_operand" "m,m")
+		    (unspec:SI [(mem:SI (match_operand:SI 1 "guard_addr_operand" "X,X"))]
+			       UNSPEC_SP_TEST))
+		(label_ref (match_operand 2))
+		(pc)))
+   (clobber (match_scratch:SI 3 "=&l,&r"))
+   (clobber (match_scratch:SI 4 "=&l,&r"))
+   (clobber (reg:CC CC_REGNUM))]
+  ""
+  "#"
+  "reload_completed"
+  [(const_int 0)]
+{
+  rtx eq;
+
+  if (flag_pic)
+    {
+      /* Forces recomputing of GOT base now.  */
+      legitimize_pic_address (operands[1], SImode, operands[3], operands[4],
+			      true /*compute_now*/);
+    }
+  else
+    {
+      if (address_operand (operands[1], SImode))
+	operands[3] = operands[1];
+      else
+	{
+	  rtx mem = XEXP (force_const_mem (SImode, operands[1]), 0);
+	  emit_move_insn (operands[3], mem);
+	}
+    }
+  if (TARGET_32BIT)
+    {
+      emit_insn (gen_arm_stack_protect_test_insn (operands[4], operands[0],
+						  operands[3]));
+      rtx cc_reg = gen_rtx_REG (CC_Zmode, CC_REGNUM);
+      eq = gen_rtx_EQ (CC_Zmode, cc_reg, const0_rtx);
+      emit_jump_insn (gen_arm_cond_branch (operands[2], eq, cc_reg));
+    }
+  else
+    {
+      emit_insn (gen_thumb1_stack_protect_test_insn (operands[4], operands[0],
+						     operands[3]));
+      eq = gen_rtx_EQ (VOIDmode, operands[4], const0_rtx);
+      emit_jump_insn (gen_cbranchsi4 (eq, operands[4], const0_rtx,
+				      operands[2]));
+    }
+  DONE;
+}
+  [(set_attr "arch" "t1,32")]
+)
+
+(define_insn "arm_stack_protect_test_insn"
+  [(set (reg:CC_Z CC_REGNUM)
+	(compare:CC_Z (unspec:SI [(match_operand:SI 1 "memory_operand" "m,m")
+				  (mem:SI (match_operand:SI 2 "register_operand" "+l,r"))]
+				 UNSPEC_SP_TEST)
+		      (const_int 0)))
+   (clobber (match_operand:SI 0 "register_operand" "=&l,&r"))
+   (clobber (match_dup 2))]
+  "TARGET_32BIT"
+  "ldr\t%0, [%2]\;ldr\t%2, %1\;eors\t%0, %2, %0"
+  [(set_attr "length" "8,12")
+   (set_attr "type" "multiple")
+   (set_attr "arch" "t,32")]
+)
+
 (define_expand "casesi"
   [(match_operand:SI 0 "s_register_operand" "")	; index to jump on
    (match_operand:SI 1 "const_int_operand" "")	; lower bound
diff --git a/gcc/config/arm/predicates.md b/gcc/config/arm/predicates.md
index 7e198f9bce4..69718ee9c7a 100644
--- a/gcc/config/arm/predicates.md
+++ b/gcc/config/arm/predicates.md
@@ -31,6 +31,23 @@
 	      || REGNO_REG_CLASS (REGNO (op)) != NO_REGS));
 })
 
+; Predicate for stack protector guard's address in
+; stack_protect_combined_set_insn and stack_protect_combined_test_insn patterns
+(define_predicate "guard_addr_operand"
+  (match_test "true")
+{
+  return (CONSTANT_ADDRESS_P (op)
+	  || !targetm.cannot_force_const_mem (mode, op));
+})
+
+; Predicate for stack protector guard in stack_protect_combined_set and
+; stack_protect_combined_test patterns
+(define_predicate "guard_operand"
+  (match_code "mem")
+{
+  return guard_addr_operand (XEXP (op, 0), mode);
+})
+
 (define_predicate "imm_for_neon_inv_logic_operand"
   (match_code "const_vector")
 {
diff --git a/gcc/config/arm/thumb1.md b/gcc/config/arm/thumb1.md
index 19dcdbcdd73..cd199c9c529 100644
--- a/gcc/config/arm/thumb1.md
+++ b/gcc/config/arm/thumb1.md
@@ -1962,4 +1962,17 @@
   }"
   [(set_attr "type" "mov_reg")]
 )
+
+(define_insn "thumb1_stack_protect_test_insn"
+  [(set (match_operand:SI 0 "register_operand" "=&l")
+	(unspec:SI [(match_operand:SI 1 "memory_operand" "m")
+		    (mem:SI (match_operand:SI 2 "register_operand" "+l"))]
+	 UNSPEC_SP_TEST))
+   (clobber (match_dup 2))]
+  "TARGET_THUMB1"
+  "ldr\t%0, [%2]\;ldr\t%2, %1\;eors\t%0, %2, %0"
+  [(set_attr "length" "8")
+   (set_attr "conds" "set")
+   (set_attr "type" "multiple")]
+)
 \f
diff --git a/gcc/config/arm/unspecs.md b/gcc/config/arm/unspecs.md
index 19416736ef9..8f9dbcb08dc 100644
--- a/gcc/config/arm/unspecs.md
+++ b/gcc/config/arm/unspecs.md
@@ -86,6 +86,9 @@
   UNSPEC_PROBE_STACK    ; Probe stack memory reference
   UNSPEC_NONSECURE_MEM	; Represent non-secure memory in ARMv8-M with
 			; security extension
+  UNSPEC_SP_SET		; Represent the setting of stack protector's canary
+  UNSPEC_SP_TEST	; Represent the testing of stack protector's canary
+			; against the guard.
 ])
 
 (define_c_enum "unspec" [
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 4801d68a207..0667a242ef3 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -7424,22 +7424,61 @@ builtins.
 The get/set patterns have a single output/input operand respectively,
 with @var{mode} intended to be @code{Pmode}.
 
+@cindex @code{stack_protect_combined_set} instruction pattern
+@item @samp{stack_protect_combined_set}
+This pattern, if defined, moves a @code{ptr_mode} value from an address
+whose declaration RTX is given in operand 1 to the memory in operand 0
+without leaving the value in a register afterward.  If several
+instructions are needed by the target to perform the operation (eg. to
+load the address from a GOT entry then load the @code{ptr_mode} value
+and finally store it), it is the backend's responsibility to ensure no
+intermediate result gets spilled.  This is to avoid leaking the value
+some place that an attacker might use to rewrite the stack guard slot
+after having clobbered it.
+
+If this pattern is not defined, then the address declaration is
+expanded first in the standard way and a @code{stack_protect_set}
+pattern is then generated to move the value from that address to the
+address in operand 0.
+
 @cindex @code{stack_protect_set} instruction pattern
 @item @samp{stack_protect_set}
-This pattern, if defined, moves a @code{ptr_mode} value from the memory
-in operand 1 to the memory in operand 0 without leaving the value in
-a register afterward.  This is to avoid leaking the value some place
-that an attacker might use to rewrite the stack guard slot after
-having clobbered it.
+This pattern, if defined, moves a @code{ptr_mode} value from the valid
+memory location in operand 1 to the memory in operand 0 without leaving
+the value in a register afterward.  This is to avoid leaking the value
+some place that an attacker might use to rewrite the stack guard slot
+after having clobbered it.
+
+Note: on targets where the addressing modes do not allow loading
+directly from the stack guard address, the address is expanded in the
+standard way first, which could cause some spills.
 
 If this pattern is not defined, then a plain move pattern is generated.
 
+@cindex @code{stack_protect_combined_test} instruction pattern
+@item @samp{stack_protect_combined_test}
+This pattern, if defined, compares a @code{ptr_mode} value from an
+address whose declaration RTX is given in operand 1 with the memory in
+operand 0 without leaving the value in a register afterward and
+branches to operand 2 if the values were equal.  If several
+instructions are needed by the target to perform the operation (eg. to
+load the address from a GOT entry then load the @code{ptr_mode} value
+and finally compare it), it is the backend's responsibility to ensure no
+intermediate result gets spilled.  This is to avoid leaking the value
+some place that an attacker might use to rewrite the stack guard slot
+after having clobbered it.
+
+If this pattern is not defined, then the address declaration is
+expanded first in the standard way and a @code{stack_protect_test}
+pattern is then generated to compare the value from that address to the
+value at the memory in operand 0.
+
 @cindex @code{stack_protect_test} instruction pattern
 @item @samp{stack_protect_test}
 This pattern, if defined, compares a @code{ptr_mode} value from the
-memory in operand 1 with the memory in operand 0 without leaving the
-value in a register afterward and branches to operand 2 if the values
-were equal.
+valid memory location in operand 1 with the memory in operand 0 without
+leaving the value in a register afterward and branches to operand 2 if
+the values were equal.
 
 If this pattern is not defined, then a plain compare pattern and
 conditional branch pattern is used.
diff --git a/gcc/function.c b/gcc/function.c
index 302438323c8..65f34db0651 100644
--- a/gcc/function.c
+++ b/gcc/function.c
@@ -4892,18 +4892,34 @@ stack_protect_epilogue (void)
   tree guard_decl = targetm.stack_protect_guard ();
   rtx_code_label *label = gen_label_rtx ();
   rtx x, y;
-  rtx_insn *seq;
+  rtx_insn *seq = 0;
 
   x = expand_normal (crtl->stack_protect_guard);
-  if (guard_decl)
-    y = expand_normal (guard_decl);
+
+  if (targetm.have_stack_protect_combined_test () && guard_decl)
+    {
+      gcc_assert (DECL_P (guard_decl));
+      y = DECL_RTL (guard_decl);
+      /* Allow the target to compute address of Y and compare it with X without
+	 leaking Y into a register.  This combined address + compare pattern
+	 allows the target to prevent spilling of any intermediate results by
+	 splitting it after register allocator.  */
+      seq = targetm.gen_stack_protect_combined_test (x, y, label);
+    }
   else
-    y = const0_rtx;
+    {
+      if (guard_decl)
+	y = expand_normal (guard_decl);
+      else
+	y = const0_rtx;
+
+      /* Allow the target to compare Y with X without leaking either into
+	 a register.  */
+      if (targetm.have_stack_protect_test ())
+	seq = targetm.gen_stack_protect_test (x, y, label);
+    }
 
-  /* Allow the target to compare Y with X without leaking either into
-     a register.  */
-  if (targetm.have_stack_protect_test ()
-      && ((seq = targetm.gen_stack_protect_test (x, y, label)) != NULL_RTX))
+  if (seq)
     emit_insn (seq);
   else
     emit_cmp_and_jump_insns (x, y, EQ, NULL_RTX, ptr_mode, 1, label);
diff --git a/gcc/target-insns.def b/gcc/target-insns.def
index 9a552c3d11c..d39889b3522 100644
--- a/gcc/target-insns.def
+++ b/gcc/target-insns.def
@@ -96,7 +96,9 @@ DEF_TARGET_INSN (sibcall_value, (rtx x0, rtx x1, rtx opt2, rtx opt3,
 DEF_TARGET_INSN (simple_return, (void))
 DEF_TARGET_INSN (split_stack_prologue, (void))
 DEF_TARGET_INSN (split_stack_space_check, (rtx x0, rtx x1))
+DEF_TARGET_INSN (stack_protect_combined_set, (rtx x0, rtx x1))
 DEF_TARGET_INSN (stack_protect_set, (rtx x0, rtx x1))
+DEF_TARGET_INSN (stack_protect_combined_test, (rtx x0, rtx x1, rtx x2))
 DEF_TARGET_INSN (stack_protect_test, (rtx x0, rtx x1, rtx x2))
 DEF_TARGET_INSN (store_multiple, (rtx x0, rtx x1, rtx x2))
 DEF_TARGET_INSN (tablejump, (rtx x0, rtx x1))
diff --git a/gcc/testsuite/gcc.target/arm/pr85434.c b/gcc/testsuite/gcc.target/arm/pr85434.c
new file mode 100644
index 00000000000..4143a861f7c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/pr85434.c
@@ -0,0 +1,200 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target fstack_protector }*/
+/* { dg-require-effective-target fpic }*/
+/* { dg-additional-options "-Os -fpic -fstack-protector-strong" } */
+
+#include <stddef.h>
+#include <stdint.h>
+
+
+static const unsigned char base64_enc_map[64] =
+{
+    'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J',
+    'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T',
+    'U', 'V', 'W', 'X', 'Y', 'Z', 'a', 'b', 'c', 'd',
+    'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n',
+    'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x',
+    'y', 'z', '0', '1', '2', '3', '4', '5', '6', '7',
+    '8', '9', '+', '/'
+};
+
+#define BASE64_SIZE_T_MAX   ( (size_t) -1 ) /* SIZE_T_MAX is not standard */
+
+
+void doSmth(void *x);
+
+#include <string.h>
+
+
+void check(int n) {
+  
+    if (!(n % 2 && n % 3 && n % 5)) {
+ __asm__  (   "add    r8, r8, #1;" );
+    }
+}
+
+uint32_t test(
+  uint32_t a1,
+  uint32_t a2,
+  size_t a3,
+  size_t a4,
+  size_t a5,
+  size_t a6)
+{
+  uint32_t nResult = 0;
+  uint8_t* h = 0L;
+  uint8_t X[128];
+  uint8_t mac[64];
+  size_t len;
+
+  doSmth(&a1);
+  doSmth(&a2);
+  doSmth(&a3);
+  doSmth(&a4);
+  doSmth(&a5);
+  doSmth(&a6);
+
+  if (a1 && a2 && a3 && a4 && a5 && a6) {
+    nResult = 1;
+    h = (void*)X;
+    len = sizeof(X);
+    memset(X, a2, len);
+    len -= 64;
+    memcpy(mac ,X, len);
+    *(h + len) = a6;
+
+    {
+
+
+        unsigned char *dst = X;
+        size_t dlen = a3;
+        size_t *olen = &a6;
+        const unsigned char *src = mac;
+        size_t slen = a4;
+    size_t i, n;
+    int C1, C2, C3;
+    unsigned char *p;
+
+    if( slen == 0 )
+    {
+        *olen = 0;
+        return( 0 );
+    }
+
+    n = slen / 3 + ( slen % 3 != 0 );
+
+    if( n > ( BASE64_SIZE_T_MAX - 1 ) / 4 )
+    {
+        *olen = BASE64_SIZE_T_MAX;
+        return( 0 );
+    }
+
+    n *= 4;
+
+    if( ( dlen < n + 1 ) || ( NULL == dst ) )
+    {
+        *olen = n + 1;
+        return( 0 );
+    }
+
+    n = ( slen / 3 ) * 3;
+
+    for( i = 0, p = dst; i < n; i += 3 )
+    {
+        C1 = *src++;
+        C2 = *src++;
+        C3 = *src++;
+
+        check(i);
+
+        *p++ = base64_enc_map[(C1 >> 2) & 0x3F];
+        *p++ = base64_enc_map[(((C1 &  3) << 4) + (C2 >> 4)) & 0x3F];
+        *p++ = base64_enc_map[(((C2 & 15) << 2) + (C3 >> 6)) & 0x3F];
+        *p++ = base64_enc_map[C3 & 0x3F];
+    }
+
+    if( i < slen )
+    {
+        C1 = *src++;
+        C2 = ( ( i + 1 ) < slen ) ? *src++ : 0;
+
+        *p++ = base64_enc_map[(C1 >> 2) & 0x3F];
+        *p++ = base64_enc_map[(((C1 & 3) << 4) + (C2 >> 4)) & 0x3F];
+
+        if( ( i + 1 ) < slen )
+             *p++ = base64_enc_map[((C2 & 15) << 2) & 0x3F];
+        else *p++ = '=';
+
+        *p++ = '=';
+    }
+
+    *olen = p - dst;
+    *p = 0;
+
+}
+
+  __asm__ ("mov r8, %0;" : "=r" ( nResult ));
+  }
+  else
+  {
+    nResult = 2;
+  }
+
+  doSmth(X);
+  doSmth(mac);
+
+
+  return nResult;
+}
+
+/* The pattern below catches sequences of instructions that were generated
+   for ARM and Thumb-2 before the fix for this PR. They are of the form:
+
+   ldr     rX, <offset from sp or fp>
+   <optional non ldr instructions>
+   ldr     rY, <offset from sp or fp>
+   ldr     rZ, [rX]
+   <optional non ldr instructions>
+   cmp     rY, rZ
+   <optional non cmp instructions>
+   bl      __stack_chk_fail
+
+   Ideally the optional block would check that the various rX, rY and rZ
+   registers are not set, but this is not possible because back references
+   are illegal in lookahead expressions in Tcl, and lookahead is the only
+   construct that allows negating a regexp, so the negation cannot refer to
+   those registers.  Instead we go for the heuristic of allowing non ldr/cmp
+   instructions with the assumptions that (i) those are not part of the stack
+   protector sequences and (ii) they would only be scheduled here if they don't
+   conflict with registers used by the stack protector.
+
+   Note on the regexp logic:
+   Allowing non X instructions (where X is ldr or cmp) is done by looking for
+   some non newline spaces, followed by something which is not X, followed by
+   an alphanumeric character, followed by anything but a newline, and ended by
+   a newline, the whole thing repeated an undetermined number of times.  The
+   alphanumeric character is there to force the negative lookahead for X to
+   only be matched after all the initial spaces and thus to check the mnemonic.
+   This prevents it from matching at one of the initial spaces.  */
+/* { dg-final { scan-assembler-not {ldr[ \t]+([^,]+), \[(?:sp|fp)[^]]*\](?:\n[ \t]+(?!ldr)\w[^\n]*)*\n[ \t]+ldr[ \t]+([^,]+), \[(?:sp|fp)[^]]*\]\n[ \t]+ldr[ \t]+([^,]+), \[\1\](?:\n[ \t]+(?!ldr)\w[^\n]*)*\n[ \t]+cmp[ \t]+\2, \3(?:\n[ \t]+(?!cmp)\w[^\n]*)*\n[ \t]+bl[ \t]+__stack_chk_fail} } } */
+
+/* Likewise for Thumb-1 sequences of instructions prior to the fix for this PR
+   which had the form:
+
+   ldr     rS, <offset from sp or fp>
+   <optional non ldr instructions>
+   ldr     rT, <PC relative offset>
+   <optional non ldr instructions>
+   ldr     rX, [rS, rT]
+   <optional non ldr instructions>
+   ldr     rY, <offset from sp or fp>
+   ldr     rZ, [rX]
+   <optional non ldr instructions>
+   cmp     rY, rZ
+   <optional non cmp instructions>
+   bl      __stack_chk_fail
+
+  Note on the regexp logic:
+  PC relative offset is checked by looking for a source operand that does not
+  contain [ or ].  */
+/* { dg-final { scan-assembler-not {ldr[ \t]+([^,]+), \[(?:sp|fp)[^]]*\](?:\n[ \t]+(?!ldr)\w[^\n]*)*\n[ \t]+ldr[ \t]+([^,]+), [^][\n]*(?:\n[ \t]+(?!ldr)\w[^\n]*)*\n[ \t]+ldr[ \t]+([^,]+), \[\1, \2\](?:\n[ \t]+(?!ldr)\w[^\n]*)*\n[ \t]+ldr[ \t]+([^,]+), \[(?:sp|fp)[^]]*\]\n[ \t]+ldr[ \t]+([^,]+), \[\3\](?:\n[ \t]+(?!ldr)\w[^\n]*)*\n[ \t]+cmp[ \t]+\4, \5(?:\n[ \t]+(?!cmp)\w[^\n]*)*\n[ \t]+bl[ \t]+__stack_chk_fail} } } */
-- 
2.19.1


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH, ARM, ping2] PR85434: Prevent spilling of stack protector guard's address on ARM
  2018-11-08  9:53               ` [PATCH, ARM, ping2] " Thomas Preudhomme
@ 2018-11-08 15:53                 ` Kyrill Tkachov
  2018-11-10 15:07                   ` Thomas Preudhomme
  0 siblings, 1 reply; 20+ messages in thread
From: Kyrill Tkachov @ 2018-11-08 15:53 UTC (permalink / raw)
  To: Thomas Preudhomme; +Cc: Ramana Radhakrishnan, Richard Earnshaw, gcc-patches

Hi Thomas,

On 08/11/18 09:52, Thomas Preudhomme wrote:
> Ping?
>
> Best regards,
>
> Thomas
>
> On Thu, 1 Nov 2018 at 16:03, Thomas Preudhomme
> <thomas.preudhomme@linaro.org> wrote:
>> Ping?
>>
>> Best regards,
>>
>> Thomas
>> On Fri, 26 Oct 2018 at 22:41, Thomas Preudhomme
>> <thomas.preudhomme@linaro.org> wrote:
>>> Hi,
>>>
>>> Please find updated patch to fix PR85434: spilling of stack protector
>>> guard's address on ARM. Quite a few changes have been made to the ARM
>>> part since last round of review so I think it makes more sense to
>>> review it anew. Ran bootstrap + regression testsuite + glibc build +
>>> glibc regression testsuite for Arm and Thumb-2 and bootstrap +
>>> regression testsuite for Thumb-1. GCC's regression testsuite was run
>>> in 3 configurations in all those cases:
>>>
>>> - default configuration (no RUNTESTFLAGS)
>>> - with -fstack-protector-all
>>> - with -fPIC -fstack-protector-all (to exercise both code paths in stack
>>> protector's split code)
>>>
>>> None of these show any regression beyond some new scan failures with
>>> -fstack-protector-all or -fPIC due to unexpected code sequences for the
>>> testcases concerned and some guality swings due to less optimization
>>> with the new stack protector on.
>>>
>>> Patch description and ChangeLog below.
>>>
>>> In case of high register pressure in PIC mode, address of the stack
>>> protector's guard can be spilled on ARM targets as shown in PR85434,
>>> thus allowing an attacker to control what the canary would be compared
>>> against. ARM does lack stack_protect_set and stack_protect_test insn
>>> patterns, defining them does not help as the address is expanded
>>> regularly and the patterns only deal with the copy and test of the
>>> guard with the canary.
>>>
>>> This problem does not occur for x86 targets because the PIC access and
>>> the test can be done in the same instruction. Aarch64 is exempt too
>>> because PIC access insn pattern are mov of UNSPEC which prevents it from
>>> the second access in the epilogue being CSEd in cse_local pass with the
>>> first access in the prologue.
>>>
>>> The approach followed here is to create new "combined" set and test
>>> standard pattern names that take the unexpanded guard and do the set or
>>> test. This allows the target to use an opaque pattern (eg. using UNSPEC)
>>> to hide the individual instructions being generated to the compiler and
>>> split the pattern into generic load, compare and branch instruction
>>> after register allocator, therefore avoiding any spilling. This is here
>>> implemented for the ARM targets. For targets not implementing these new
>>> standard pattern names, the existing stack_protect_set and
>>> stack_protect_test pattern names are used.
>>>
>>> To be able to split PIC access after register allocation, the functions
>>> had to be augmented to force a new PIC register load and to control
>>> which register it loads into. This is because sharing the PIC register
>>> between prologue and epilogue could lead to spilling due to CSE again
>>> which an attacker could use to control what the canary gets compared
>>> against.
>>>
>>> ChangeLog entries are as follows:
>>>
>>> *** gcc/ChangeLog ***
>>>
>>> 2018-10-26  Thomas Preud'homme  <thomas.preudhomme@linaro.org>
>>>
>>> * target-insns.def (stack_protect_combined_set): Define new standard
>>> pattern name.
>>> (stack_protect_combined_test): Likewise.
>>> * cfgexpand.c (stack_protect_prologue): Try new
>>> stack_protect_combined_set pattern first.
>>> * function.c (stack_protect_epilogue): Try new
>>> stack_protect_combined_test pattern first.
>>> * config/arm/arm.c (require_pic_register): Add pic_reg and compute_now
>>> parameters to control which register to use as PIC register and force
>>> reloading PIC register respectively.  Insert in the stream of insns if
>>> possible.
>>> (legitimize_pic_address): Expose above new parameters in prototype and
>>> adapt recursive calls accordingly.  Use pic_reg if non null instead of
>>> cached one.
>>> (arm_load_pic_register): Add pic_reg parameter and use it if non null.
>>> (arm_legitimize_address): Adapt to new legitimize_pic_address
>>> prototype.
>>> (thumb_legitimize_address): Likewise.
>>> (arm_emit_call_insn): Adapt to require_pic_register prototype change.
>>> (arm_expand_prologue): Adapt to arm_load_pic_register prototype change.
>>> (thumb1_expand_prologue): Likewise.
>>> * config/arm/arm-protos.h (legitimize_pic_address): Adapt to prototype
>>> change.
>>> (arm_load_pic_register): Likewise.
>>> * config/arm/predicated.md (guard_addr_operand): New predicate.
>>> (guard_operand): New predicate.
>>> * config/arm/arm.md (movsi expander): Adapt to legitimize_pic_address
>>> prototype change.
>>> (builtin_setjmp_receiver expander): Adapt to thumb1_expand_prologue
>>> prototype change.
>>> (stack_protect_combined_set): New expander.
>>> (stack_protect_combined_set_insn): New insn_and_split pattern.
>>> (stack_protect_set_insn): New insn pattern.
>>> (stack_protect_combined_test): New expander.
>>> (stack_protect_combined_test_insn): New insn_and_split pattern.
>>> (arm_stack_protect_test_insn): New insn pattern.
>>> * config/arm/thumb1.md (thumb1_stack_protect_test_insn): New insn pattern.
>>> * config/arm/unspecs.md (UNSPEC_SP_SET): New unspec.
>>> (UNSPEC_SP_TEST): Likewise.
>>> * doc/md.texi (stack_protect_combined_set): Document new standard
>>> pattern name.
>>> (stack_protect_set): Clarify that the operand for guard's address is
>>> legal.
>>> (stack_protect_combined_test): Document new standard pattern name.
>>> (stack_protect_test): Clarify that the operand for guard's address is
>>> legal.
>>>
>>> *** gcc/testsuite/ChangeLog ***
>>>
>>> 2018-07-05  Thomas Preud'homme  <thomas.preudhomme@linaro.org>
>>>
>>> * gcc.target/arm/pr85434.c: New test.
>>>
>>> Is this ok for trunk?
>>>
>>> Best regards,
>>>
>>> Thomas
>>> On Thu, 25 Oct 2018 at 15:54, Thomas Preudhomme
>>> <thomas.preudhomme@linaro.org> wrote:
>>>> Good thing I did, found a missing earlyclobber in the process.
>>>> Rerunning all tests again.
>>>>
>>>> Best regards,
>>>>
>>>> Thomas
>>>> On Wed, 24 Oct 2018 at 10:13, Thomas Preudhomme
>>>> <thomas.preudhomme@linaro.org> wrote:
>>>>> Please hold on for the reviews, found a small improvement that could
>>>>> be done. Am testing it right now, should have something by tonight or
>>>>> tomorrow.
>>>>>
>>>>> Best regards,
>>>>>
>>>>> Thomas
>>>>> On Tue, 23 Oct 2018 at 13:35, Thomas Preudhomme
>>>>> <thomas.preudhomme@linaro.org> wrote:
>>>>>> [Removing Jeff Law since middle end code hasn't changed]
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> Given how memory operands are reloaded even with an X constraint, I've
>>>>>> reworked the patch for the combined set and combined test instructions
>>>>>> to keep the mem out of the match_operand and used an expander to
>>>>>> generate the right instruction pattern. I've also fixed some
>>>>>> longstanding issues with the patch when flag_pic is true and with
>>>>>> constraints for Thumb-1 that I hadn't noticed before due to using
>>>>>> dg-cmp-results in conjunction with test_summary which does not show
>>>>>> NA->FAIL (see [1]).
>>>>>>
>>>>>> All in all, I think the Arm code would do with a fresh review rather
>>>>>> than looking at the changes since last posted version. (unchanged)
>>>>>> ChangeLog entries are as follows:
>>>>>>
>>>>>> *** gcc/ChangeLog ***
>>>>>>
>>>>>> 2018-08-09  Thomas Preud'homme  <thomas.preudhomme@linaro.org>
>>>>>>
>>>>>>      * target-insns.def (stack_protect_combined_set): Define new standard
>>>>>>      pattern name.
>>>>>>      (stack_protect_combined_test): Likewise.
>>>>>>      * cfgexpand.c (stack_protect_prologue): Try new
>>>>>>      stack_protect_combined_set pattern first.
>>>>>>      * function.c (stack_protect_epilogue): Try new
>>>>>>      stack_protect_combined_test pattern first.
>>>>>>      * config/arm/arm.c (require_pic_register): Add pic_reg and compute_now
>>>>>>      parameters to control which register to use as PIC register and force
>>>>>>      reloading PIC register respectively.  Insert in the stream of insns if
>>>>>>      possible.
>>>>>>      (legitimize_pic_address): Expose above new parameters in prototype and
>>>>>>      adapt recursive calls accordingly.  Use pic_reg if non null instead of
>>>>>>      cached one.
>>>>>>      (arm_load_pic_register): Add pic_reg parameter and use it if non null.
>>>>>>      (arm_legitimize_address): Adapt to new legitimize_pic_address
>>>>>>      prototype.
>>>>>>      (thumb_legitimize_address): Likewise.
>>>>>>      (arm_emit_call_insn): Adapt to require_pic_register prototype change.
>>>>>>      (arm_expand_prologue): Adapt to arm_load_pic_register prototype change.
>>>>>>      (thumb1_expand_prologue): Likewise.
>>>>>>      * config/arm/arm-protos.h (legitimize_pic_address): Adapt to prototype
>>>>>>      change.
>>>>>>      (arm_load_pic_register): Likewise.
>>>>>>      * config/arm/predicated.md (guard_addr_operand): New predicate.
>>>>>>      (guard_operand): New predicate.
>>>>>>      * config/arm/arm.md (movsi expander): Adapt to legitimize_pic_address
>>>>>>      prototype change.
>>>>>>      (builtin_setjmp_receiver expander): Adapt to thumb1_expand_prologue
>>>>>>      prototype change.
>>>>>>      (stack_protect_combined_set): New expander.
>>>>>>      (stack_protect_combined_set_insn): New insn_and_split pattern.
>>>>>>      (stack_protect_set_insn): New insn pattern.
>>>>>>      (stack_protect_combined_test): New expander.
>>>>>>      (stack_protect_combined_test_insn): New insn_and_split pattern.
>>>>>>      (stack_protect_test_insn): New insn pattern.
>>>>>>      * config/arm/unspecs.md (UNSPEC_SP_SET): New unspec.
>>>>>>      (UNSPEC_SP_TEST): Likewise.
>>>>>>      * doc/md.texi (stack_protect_combined_set): Document new standard
>>>>>>      pattern name.
>>>>>>      (stack_protect_set): Clarify that the operand for guard's address is
>>>>>>      legal.
>>>>>>      (stack_protect_combined_test): Document new standard pattern name.
>>>>>>      (stack_protect_test): Clarify that the operand for guard's address is
>>>>>>      legal.
>>>>>>
>>>>>> *** gcc/testsuite/ChangeLog ***
>>>>>>
>>>>>> 2018-07-05  Thomas Preud'homme  <thomas.preudhomme@linaro.org>
>>>>>>
>>>>>>      * gcc.target/arm/pr85434.c: New test.
>>>>>>
>>>>>> Testing: Bootstrap and regression testing for Arm, Thumb-1 and Thumb-2
>>>>>> with (i) default flags, (ii) an extra -fstack-protector-all and (iii)
>>>>>> -fPIC -fstack-protector-all. A glibc build and testsuite run were also
>>>>>> performed for Arm and Thumb-2. Default flags show no regression and
>>>>>> the other runs have some expected scan-assembler failing (due to stack
>>>>>> protector or fPIC code sequence), as well as guality fail (due to less
>>>>>> optimized code with the new stack protector code) and some execution
>>>>>> failures in sibcall-9 and sibcall-10 under -fPIC -fstack-protector-all
>>>>>> due to the PIC sequence for the global variable making the frame
>>>>>> layout different for the 2 functions (these become PASS if making the
>>>>>> global variable static).
>>>>>>
>>>>>> Is this ok for trunk?
>>>>>>
>>>>>> Best regards,
>>>>>>
>>>>>> Thomas
>>>>>>
>>>>>> [1] https://gcc.gnu.org/ml/gcc-patches/2018-10/msg01412.html
>>>>>>
>>>>>>
>>>>>> On Tue, 25 Sep 2018 at 17:10, Kyrill Tkachov
>>>>>> <kyrylo.tkachov@foss.arm.com> wrote:
>>>>>>> Hi Thomas,
>>>>>>>
>>>>>>> On 29/08/18 10:51, Thomas Preudhomme wrote:
>>>>>>>> Resend hopefully without HTML this time.
>>>>>>>>
>>>>>>>> On Wed, 29 Aug 2018 at 10:49, Thomas Preudhomme
>>>>>>>> <thomas.preudhomme@linaro.org> wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I've reworked the patch fixing PR85434 (spilling of stack protector guard's address on ARM) to address the testsuite regression on powerpc and x86 as well as glibc testsuite regression on ARM. Issues were due to unconditionally attempting to generate the new patterns. The code now tests if there is a pattern for them for the target before generating them. In the ARM side of the patch, I've also added a more specific predicate for the new patterns. The new patch is found below.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> In case of high register pressure in PIC mode, address of the stack
>>>>>>>>> protector's guard can be spilled on ARM targets as shown in PR85434,
>>>>>>>>> thus allowing an attacker to control what the canary would be compared
>>>>>>>>> against. ARM does lack stack_protect_set and stack_protect_test insn
>>>>>>>>> patterns, defining them does not help as the address is expanded
>>>>>>>>> regularly and the patterns only deal with the copy and test of the
>>>>>>>>> guard with the canary.
>>>>>>>>>
>>>>>>>>> This problem does not occur for x86 targets because the PIC access and
>>>>>>>>> the test can be done in the same instruction. Aarch64 is exempt too
>>>>>>>>> because PIC access insn pattern are mov of UNSPEC which prevents it from
>>>>>>>>> the second access in the epilogue being CSEd in cse_local pass with the
>>>>>>>>> first access in the prologue.
>>>>>>>>>
>>>>>>>>> The approach followed here is to create new "combined" set and test
>>>>>>>>> standard pattern names that take the unexpanded guard and do the set or
>>>>>>>>> test. This allows the target to use an opaque pattern (eg. using UNSPEC)
>>>>>>>>> to hide the individual instructions being generated to the compiler and
>>>>>>>>> split the pattern into generic load, compare and branch instruction
>>>>>>>>> after register allocator, therefore avoiding any spilling. This is here
>>>>>>>>> implemented for the ARM targets. For targets not implementing these new
>>>>>>>>> standard pattern names, the existing stack_protect_set and
>>>>>>>>> stack_protect_test pattern names are used.
>>>>>>>>>
>>>>>>>>> To be able to split PIC access after register allocation, the functions
>>>>>>>>> had to be augmented to force a new PIC register load and to control
>>>>>>>>> which register it loads into. This is because sharing the PIC register
>>>>>>>>> between prologue and epilogue could lead to spilling due to CSE again
>>>>>>>>> which an attacker could use to control what the canary gets compared
>>>>>>>>> against.
>>>>>>>>>
>>>>>>>>> ChangeLog entries are as follows:
>>>>>>>>>
>>>>>>>>> *** gcc/ChangeLog ***
>>>>>>>>>
>>>>>>>>> 2018-08-09  Thomas Preud'homme  <thomas.preudhomme@linaro.org>
>>>>>>>>>
>>>>>>>>>       * target-insns.def (stack_protect_combined_set): Define new standard
>>>>>>>>>       pattern name.
>>>>>>>>>       (stack_protect_combined_test): Likewise.
>>>>>>>>>       * cfgexpand.c (stack_protect_prologue): Try new
>>>>>>>>>       stack_protect_combined_set pattern first.
>>>>>>>>>       * function.c (stack_protect_epilogue): Try new
>>>>>>>>>       stack_protect_combined_test pattern first.
>>>>>>>>>       * config/arm/arm.c (require_pic_register): Add pic_reg and compute_now
>>>>>>>>>       parameters to control which register to use as PIC register and force
>>>>>>>>>       reloading PIC register respectively.  Insert in the stream of insns if
>>>>>>>>>       possible.
>>>>>>>>>       (legitimize_pic_address): Expose above new parameters in prototype and
>>>>>>>>>       adapt recursive calls accordingly.
>>>>>>>>>       (arm_legitimize_address): Adapt to new legitimize_pic_address
>>>>>>>>>       prototype.
>>>>>>>>>       (thumb_legitimize_address): Likewise.
>>>>>>>>>       (arm_emit_call_insn): Adapt to new require_pic_register prototype.
>>>>>>>>>       * config/arm/arm-protos.h (legitimize_pic_address): Adapt to prototype
>>>>>>>>>       change.
>>>>>>>>>       * config/arm/predicated.md (guard_operand): New predicate.
>>>>>>> Typo, predicates.md is the filename.
>>>>>>>
>>>>>>> Looks ok to me otherwise.
>>>>>>> Thank you for your patience.
>>>>>>>
>>>>>>> Kyrill
>>>>>>>
>>>>>>>>>       * config/arm/arm.md (movsi expander): Adapt to legitimize_pic_address
>>>>>>>>>       prototype change.
>>>>>>>>>       (stack_protect_combined_set): New insn_and_split pattern.
>>>>>>>>>       (stack_protect_set): New insn pattern.
>>>>>>>>>       (stack_protect_combined_test): New insn_and_split pattern.
>>>>>>>>>       (stack_protect_test): New insn pattern.
>>>>>>>>>       * config/arm/unspecs.md (UNSPEC_SP_SET): New unspec.
>>>>>>>>>       (UNSPEC_SP_TEST): Likewise.
>>>>>>>>>       * doc/md.texi (stack_protect_combined_set): Document new standard
>>>>>>>>>       pattern name.
>>>>>>>>>       (stack_protect_set): Clarify that the operand for guard's address is
>>>>>>>>>       legal.
>>>>>>>>>       (stack_protect_combined_test): Document new standard pattern name.
>>>>>>>>>       (stack_protect_test): Clarify that the operand for guard's address is
>>>>>>>>>       legal.
>>>>>>>>>
>>>>>>>>> *** gcc/testsuite/ChangeLog ***
>>>>>>>>>
>>>>>>>>> 2018-07-05  Thomas Preud'homme  <thomas.preudhomme@linaro.org>
>>>>>>>>>
>>>>>>>>>       * gcc.target/arm/pr85434.c: New test.
>>>>>>>>> Testing:
>>>>>>>>>
>>>>>>>>> native x86_64: bootstrap + testsuite -> no regression, can see failures with previous version of patch but not with new version
>>>>>>>>> native powerpc64: bootstrap + testsuite -> no regression, can see failures from pr86834 with previous version of patch but not with new version
>>>>>>>>> cross ARM Linux: build + testsuite -> no regression
>>>>>>>>> native ARM Thumb-2: bootstrap + testsuite + glibc build + glibc test -> no regression
>>>>>>>>> native ARM Arm: bootstrap + testsuite + glibc build + glibc test -> no regression
>>>>>>>>> Aarch64: bootstrap + testsuite + glibc build + glibc test-> no regression
>>>>>>>>>
>>>>>>>>> Is this ok for trunk?
>>>>>>>>>
>>>>>>>>> Best regards,
>>>>>>>>>
>>>>>>>>> Thomas


@@ -21998,7 +22039,7 @@ arm_expand_prologue (void)
        mask &= THUMB2_WORK_REGS;
        if (!IS_NESTED (func_type))
      mask |= (1 << IP_REGNUM);
-      arm_load_pic_register (mask);
+      arm_load_pic_register (mask, 0);



Please use NULL_RTX rather than 0 here and in the other occurrences in the patch.
At a glance the changes look ok, but I'll have a deeper look later.

Thanks,
Kyrill

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH, ARM, ping2] PR85434: Prevent spilling of stack protector guard's address on ARM
  2018-11-08 15:53                 ` Kyrill Tkachov
@ 2018-11-10 15:07                   ` Thomas Preudhomme
  2018-11-16 14:57                     ` [PATCH, ARM, ping3] " Thomas Preudhomme
  0 siblings, 1 reply; 20+ messages in thread
From: Thomas Preudhomme @ 2018-11-10 15:07 UTC (permalink / raw)
  To: kyrylo.tkachov; +Cc: Ramana Radhakrishnan, Richard Earnshaw, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 19293 bytes --]

Thanks Kyrill.

Updated patch in attachment. Best regards,

Thomas
On Thu, 8 Nov 2018 at 15:53, Kyrill Tkachov <kyrylo.tkachov@foss.arm.com> wrote:
>
> Hi Thomas,
>
> On 08/11/18 09:52, Thomas Preudhomme wrote:
> > Ping?
> >
> > Best regards,
> >
> > Thomas
> >
> > On Thu, 1 Nov 2018 at 16:03, Thomas Preudhomme
> > <thomas.preudhomme@linaro.org> wrote:
> >> Ping?
> >>
> >> Best regards,
> >>
> >> Thomas
> >> On Fri, 26 Oct 2018 at 22:41, Thomas Preudhomme
> >> <thomas.preudhomme@linaro.org> wrote:
> >>> Hi,
> >>>
> >>> Please find updated patch to fix PR85434: spilling of stack protector
> >>> guard's address on ARM. Quite a few changes have been made to the ARM
> >>> part since last round of review so I think it makes more sense to
> >>> review it anew. Ran bootstrap + regression testsuite + glibc build +
> >>> glibc regression testsuite for Arm and Thumb-2 and bootstrap +
> >>> regression testsuite for Thumb-1. GCC's regression testsuite was run
> >>> in 3 configurations in all those cases:
> >>>
> >>> - default configuration (no RUNTESTFLAGS)
> >>> - with -fstack-protector-all
> > >>> - with -fPIC -fstack-protector-all (to exercise both code paths in stack
> > >>> protector's split code)
> > >>>
> > >>> None of these show any regression beyond some new scan failures with
> > >>> -fstack-protector-all or -fPIC due to unexpected code sequences for the
> > >>> testcases concerned and some guality swings due to less optimization
> > >>> with the new stack protector on.
> >>>
> >>> Patch description and ChangeLog below.
> >>>
> >>> In case of high register pressure in PIC mode, address of the stack
> >>> protector's guard can be spilled on ARM targets as shown in PR85434,
> >>> thus allowing an attacker to control what the canary would be compared
> >>> against. ARM does lack stack_protect_set and stack_protect_test insn
> >>> patterns, defining them does not help as the address is expanded
> >>> regularly and the patterns only deal with the copy and test of the
> >>> guard with the canary.
> >>>
> >>> This problem does not occur for x86 targets because the PIC access and
> >>> the test can be done in the same instruction. Aarch64 is exempt too
> >>> because PIC access insn pattern are mov of UNSPEC which prevents it from
> >>> the second access in the epilogue being CSEd in cse_local pass with the
> >>> first access in the prologue.
> >>>
> >>> The approach followed here is to create new "combined" set and test
> >>> standard pattern names that take the unexpanded guard and do the set or
> >>> test. This allows the target to use an opaque pattern (eg. using UNSPEC)
> >>> to hide the individual instructions being generated to the compiler and
> >>> split the pattern into generic load, compare and branch instruction
> >>> after register allocator, therefore avoiding any spilling. This is here
> >>> implemented for the ARM targets. For targets not implementing these new
> >>> standard pattern names, the existing stack_protect_set and
> >>> stack_protect_test pattern names are used.
> >>>
> >>> To be able to split PIC access after register allocation, the functions
> >>> had to be augmented to force a new PIC register load and to control
> >>> which register it loads into. This is because sharing the PIC register
> >>> between prologue and epilogue could lead to spilling due to CSE again
> >>> which an attacker could use to control what the canary gets compared
> >>> against.
> >>>
> >>> ChangeLog entries are as follows:
> >>>
> >>> *** gcc/ChangeLog ***
> >>>
> >>> 2018-10-26  Thomas Preud'homme  <thomas.preudhomme@linaro.org>
> >>>
> >>> * target-insns.def (stack_protect_combined_set): Define new standard
> >>> pattern name.
> >>> (stack_protect_combined_test): Likewise.
> >>> * cfgexpand.c (stack_protect_prologue): Try new
> >>> stack_protect_combined_set pattern first.
> >>> * function.c (stack_protect_epilogue): Try new
> >>> stack_protect_combined_test pattern first.
> >>> * config/arm/arm.c (require_pic_register): Add pic_reg and compute_now
> >>> parameters to control which register to use as PIC register and force
> >>> reloading PIC register respectively.  Insert in the stream of insns if
> >>> possible.
> >>> (legitimize_pic_address): Expose above new parameters in prototype and
> >>> adapt recursive calls accordingly.  Use pic_reg if non null instead of
> >>> cached one.
> >>> (arm_load_pic_register): Add pic_reg parameter and use it if non null.
> >>> (arm_legitimize_address): Adapt to new legitimize_pic_address
> >>> prototype.
> >>> (thumb_legitimize_address): Likewise.
> >>> (arm_emit_call_insn): Adapt to require_pic_register prototype change.
> >>> (arm_expand_prologue): Adapt to arm_load_pic_register prototype change.
> >>> (thumb1_expand_prologue): Likewise.
> >>> * config/arm/arm-protos.h (legitimize_pic_address): Adapt to prototype
> >>> change.
> >>> (arm_load_pic_register): Likewise.
> >>> * config/arm/predicates.md (guard_addr_operand): New predicate.
> >>> (guard_operand): New predicate.
> >>> * config/arm/arm.md (movsi expander): Adapt to legitimize_pic_address
> >>> prototype change.
> >>> (builtin_setjmp_receiver expander): Adapt to thumb1_expand_prologue
> >>> prototype change.
> >>> (stack_protect_combined_set): New expander.
> >>> (stack_protect_combined_set_insn): New insn_and_split pattern.
> >>> (stack_protect_set_insn): New insn pattern.
> >>> (stack_protect_combined_test): New expander.
> >>> (stack_protect_combined_test_insn): New insn_and_split pattern.
> >>> (arm_stack_protect_test_insn): New insn pattern.
> >>> * config/arm/thumb1.md (thumb1_stack_protect_test_insn): New insn pattern.
> >>> * config/arm/unspecs.md (UNSPEC_SP_SET): New unspec.
> >>> (UNSPEC_SP_TEST): Likewise.
> >>> * doc/md.texi (stack_protect_combined_set): Document new standard
> >>> pattern name.
> >>> (stack_protect_set): Clarify that the operand for guard's address is
> >>> legal.
> >>> (stack_protect_combined_test): Document new standard pattern name.
> >>> (stack_protect_test): Clarify that the operand for guard's address is
> >>> legal.
> >>>
> >>> *** gcc/testsuite/ChangeLog ***
> >>>
> >>> 2018-07-05  Thomas Preud'homme  <thomas.preudhomme@linaro.org>
> >>>
> >>> * gcc.target/arm/pr85434.c: New test.
> >>>
> >>> Is this ok for trunk?
> >>>
> >>> Best regards,
> >>>
> >>> Thomas
> >>> On Thu, 25 Oct 2018 at 15:54, Thomas Preudhomme
> >>> <thomas.preudhomme@linaro.org> wrote:
> >>>> Good thing I did, found a missing earlyclobber in the process.
> >>>> Rerunning all tests again.
> >>>>
> >>>> Best regards,
> >>>>
> >>>> Thomas
> >>>> On Wed, 24 Oct 2018 at 10:13, Thomas Preudhomme
> >>>> <thomas.preudhomme@linaro.org> wrote:
> >>>>> Please hold on for the reviews, found a small improvement that could
> >>>>> be done. Am testing it right now, should have something by tonight or
> >>>>> tomorrow.
> >>>>>
> >>>>> Best regards,
> >>>>>
> >>>>> Thomas
> >>>>> On Tue, 23 Oct 2018 at 13:35, Thomas Preudhomme
> >>>>> <thomas.preudhomme@linaro.org> wrote:
> >>>>>> [Removing Jeff Law since middle end code hasn't changed]
> >>>>>>
> >>>>>> Hi,
> >>>>>>
> >>>>>> Given how memory operands are reloaded even with an X constraint, I've
> >>>>>> reworked the patch for the combined set and combined test instructions
> >>>>>> to keep the mem out of the match_operand and used an expander to
> >>>>>> generate the right instruction pattern. I've also fixed some
> >>>>>> longstanding issues with the patch when flag_pic is true and with
> >>>>>> constraints for Thumb-1 that I hadn't noticed before due to using
> >>>>>> dg-cmp-results in conjunction with test_summary which does not show
> >>>>>> NA->FAIL (see [1]).
> >>>>>>
> >>>>>> All in all, I think the Arm code would do with a fresh review rather
> >>>>>> than looking at the changes since last posted version. (unchanged)
> >>>>>> ChangeLog entries are as follows:
> >>>>>>
> >>>>>> *** gcc/ChangeLog ***
> >>>>>>
> >>>>>> 2018-08-09  Thomas Preud'homme  <thomas.preudhomme@linaro.org>
> >>>>>>
> >>>>>>      * target-insns.def (stack_protect_combined_set): Define new standard
> >>>>>>      pattern name.
> >>>>>>      (stack_protect_combined_test): Likewise.
> >>>>>>      * cfgexpand.c (stack_protect_prologue): Try new
> >>>>>>      stack_protect_combined_set pattern first.
> >>>>>>      * function.c (stack_protect_epilogue): Try new
> >>>>>>      stack_protect_combined_test pattern first.
> >>>>>>      * config/arm/arm.c (require_pic_register): Add pic_reg and compute_now
> >>>>>>      parameters to control which register to use as PIC register and force
> >>>>>>      reloading PIC register respectively.  Insert in the stream of insns if
> >>>>>>      possible.
> >>>>>>      (legitimize_pic_address): Expose above new parameters in prototype and
> >>>>>>      adapt recursive calls accordingly.  Use pic_reg if non null instead of
> >>>>>>      cached one.
> >>>>>>      (arm_load_pic_register): Add pic_reg parameter and use it if non null.
> >>>>>>      (arm_legitimize_address): Adapt to new legitimize_pic_address
> >>>>>>      prototype.
> >>>>>>      (thumb_legitimize_address): Likewise.
> >>>>>>      (arm_emit_call_insn): Adapt to require_pic_register prototype change.
> >>>>>>      (arm_expand_prologue): Adapt to arm_load_pic_register prototype change.
> >>>>>>      (thumb1_expand_prologue): Likewise.
> >>>>>>      * config/arm/arm-protos.h (legitimize_pic_address): Adapt to prototype
> >>>>>>      change.
> >>>>>>      (arm_load_pic_register): Likewise.
> >>>>>>      * config/arm/predicates.md (guard_addr_operand): New predicate.
> >>>>>>      (guard_operand): New predicate.
> >>>>>>      * config/arm/arm.md (movsi expander): Adapt to legitimize_pic_address
> >>>>>>      prototype change.
> >>>>>>      (builtin_setjmp_receiver expander): Adapt to thumb1_expand_prologue
> >>>>>>      prototype change.
> >>>>>>      (stack_protect_combined_set): New expander.
> >>>>>>      (stack_protect_combined_set_insn): New insn_and_split pattern.
> >>>>>>      (stack_protect_set_insn): New insn pattern.
> >>>>>>      (stack_protect_combined_test): New expander.
> >>>>>>      (stack_protect_combined_test_insn): New insn_and_split pattern.
> >>>>>>      (stack_protect_test_insn): New insn pattern.
> >>>>>>      * config/arm/unspecs.md (UNSPEC_SP_SET): New unspec.
> >>>>>>      (UNSPEC_SP_TEST): Likewise.
> >>>>>>      * doc/md.texi (stack_protect_combined_set): Document new standard
> >>>>>>      pattern name.
> >>>>>>      (stack_protect_set): Clarify that the operand for guard's address is
> >>>>>>      legal.
> >>>>>>      (stack_protect_combined_test): Document new standard pattern name.
> >>>>>>      (stack_protect_test): Clarify that the operand for guard's address is
> >>>>>>      legal.
> >>>>>>
> >>>>>> *** gcc/testsuite/ChangeLog ***
> >>>>>>
> >>>>>> 2018-07-05  Thomas Preud'homme  <thomas.preudhomme@linaro.org>
> >>>>>>
> >>>>>>      * gcc.target/arm/pr85434.c: New test.
> >>>>>>
> >>>>>> Testing: Bootstrap and regression testing for Arm, Thumb-1 and Thumb-2
> >>>>>> with (i) default flags, (ii) an extra -fstack-protector-all and (iii)
> >>>>>> -fPIC -fstack-protector-all. A glibc build and testsuite run was also
> >>>>>> performed for Arm and Thumb-2. Default flags show no regression and
> >>>>>> the other runs have some expected scan-assembler failures (due to the
> >>>>>> stack protector or -fPIC code sequence), as well as guality failures
> >>>>>> (due to less optimized code with the new stack protector code) and some
> >>>>>> execution failures in sibcall-9 and sibcall-10 under -fPIC
> >>>>>> -fstack-protector-all due to the PIC sequence for the global variable
> >>>>>> making the frame layout different for the 2 functions (these become
> >>>>>> PASS if making the global variable static).
> >>>>>>
> >>>>>> Is this ok for trunk?
> >>>>>>
> >>>>>> Best regards,
> >>>>>>
> >>>>>> Thomas
> >>>>>>
> >>>>>> [1] https://gcc.gnu.org/ml/gcc-patches/2018-10/msg01412.html
> >>>>>>
> >>>>>>
> >>>>>> On Tue, 25 Sep 2018 at 17:10, Kyrill Tkachov
> >>>>>> <kyrylo.tkachov@foss.arm.com> wrote:
> >>>>>>> Hi Thomas,
> >>>>>>>
> >>>>>>> On 29/08/18 10:51, Thomas Preudhomme wrote:
> >>>>>>>> Resend hopefully without HTML this time.
> >>>>>>>>
> >>>>>>>> On Wed, 29 Aug 2018 at 10:49, Thomas Preudhomme
> >>>>>>>> <thomas.preudhomme@linaro.org> wrote:
> >>>>>>>>> Hi,
> >>>>>>>>>
> >>>>>>>>> I've reworked the patch fixing PR85434 (spilling of stack protector guard's address on ARM) to address the testsuite regression on powerpc and x86 as well as glibc testsuite regression on ARM. Issues were due to unconditionally attempting to generate the new patterns. The code now tests if there is a pattern for them for the target before generating them. In the ARM side of the patch, I've also added a more specific predicate for the new patterns. The new patch is found below.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> In case of high register pressure in PIC mode, the address of the stack
> >>>>>>>>> protector's guard can be spilled on ARM targets as shown in PR85434,
> >>>>>>>>> thus allowing an attacker to control what the canary would be compared
> >>>>>>>>> against. ARM lacks stack_protect_set and stack_protect_test insn
> >>>>>>>>> patterns, but defining them does not help as the address is expanded
> >>>>>>>>> regularly and the patterns only deal with the copy and test of the
> >>>>>>>>> guard with the canary.
> >>>>>>>>>
> >>>>>>>>> This problem does not occur for x86 targets because the PIC access and
> >>>>>>>>> the test can be done in the same instruction. Aarch64 is exempt too
> >>>>>>>>> because its PIC access insn patterns are mov of UNSPEC, which prevents
> >>>>>>>>> the second access in the epilogue from being CSEd in the cse_local pass
> >>>>>>>>> with the first access in the prologue.
> >>>>>>>>>
> >>>>>>>>> The approach followed here is to create new "combined" set and test
> >>>>>>>>> standard pattern names that take the unexpanded guard and do the set or
> >>>>>>>>> test. This allows the target to use an opaque pattern (e.g. using UNSPEC)
> >>>>>>>>> to hide the individual instructions being generated from the compiler and
> >>>>>>>>> split the pattern into generic load, compare and branch instructions
> >>>>>>>>> after register allocation, therefore avoiding any spilling. This is
> >>>>>>>>> implemented here for the ARM targets. For targets not implementing these
> >>>>>>>>> new standard pattern names, the existing stack_protect_set and
> >>>>>>>>> stack_protect_test pattern names are used.
> >>>>>>>>>
> >>>>>>>>> To be able to split PIC access after register allocation, the functions
> >>>>>>>>> had to be augmented to force a new PIC register load and to control
> >>>>>>>>> which register it loads into. This is because sharing the PIC register
> >>>>>>>>> between prologue and epilogue could again lead to spilling due to CSE,
> >>>>>>>>> which an attacker could use to control what the canary gets compared
> >>>>>>>>> against.
> >>>>>>>>>
> >>>>>>>>> ChangeLog entries are as follows:
> >>>>>>>>>
> >>>>>>>>> *** gcc/ChangeLog ***
> >>>>>>>>>
> >>>>>>>>> 2018-08-09  Thomas Preud'homme  <thomas.preudhomme@linaro.org>
> >>>>>>>>>
> >>>>>>>>>       * target-insns.def (stack_protect_combined_set): Define new standard
> >>>>>>>>>       pattern name.
> >>>>>>>>>       (stack_protect_combined_test): Likewise.
> >>>>>>>>>       * cfgexpand.c (stack_protect_prologue): Try new
> >>>>>>>>>       stack_protect_combined_set pattern first.
> >>>>>>>>>       * function.c (stack_protect_epilogue): Try new
> >>>>>>>>>       stack_protect_combined_test pattern first.
> >>>>>>>>>       * config/arm/arm.c (require_pic_register): Add pic_reg and compute_now
> >>>>>>>>>       parameters to control which register to use as PIC register and force
> >>>>>>>>>       reloading PIC register respectively.  Insert in the stream of insns if
> >>>>>>>>>       possible.
> >>>>>>>>>       (legitimize_pic_address): Expose above new parameters in prototype and
> >>>>>>>>>       adapt recursive calls accordingly.
> >>>>>>>>>       (arm_legitimize_address): Adapt to new legitimize_pic_address
> >>>>>>>>>       prototype.
> >>>>>>>>>       (thumb_legitimize_address): Likewise.
> >>>>>>>>>       (arm_emit_call_insn): Adapt to new require_pic_register prototype.
> >>>>>>>>>       * config/arm/arm-protos.h (legitimize_pic_address): Adapt to prototype
> >>>>>>>>>       change.
> >>>>>>>>>       * config/arm/predicated.md (guard_operand): New predicate.
> >>>>>>> Typo, predicates.md is the filename.
> >>>>>>>
> >>>>>>> Looks ok to me otherwise.
> >>>>>>> Thank you for your patience.
> >>>>>>>
> >>>>>>> Kyrill
> >>>>>>>
> >>>>>>>>>       * config/arm/arm.md (movsi expander): Adapt to legitimize_pic_address
> >>>>>>>>>       prototype change.
> >>>>>>>>>       (stack_protect_combined_set): New insn_and_split pattern.
> >>>>>>>>>       (stack_protect_set): New insn pattern.
> >>>>>>>>>       (stack_protect_combined_test): New insn_and_split pattern.
> >>>>>>>>>       (stack_protect_test): New insn pattern.
> >>>>>>>>>       * config/arm/unspecs.md (UNSPEC_SP_SET): New unspec.
> >>>>>>>>>       (UNSPEC_SP_TEST): Likewise.
> >>>>>>>>>       * doc/md.texi (stack_protect_combined_set): Document new standard
> >>>>>>>>>       pattern name.
> >>>>>>>>>       (stack_protect_set): Clarify that the operand for guard's address is
> >>>>>>>>>       legal.
> >>>>>>>>>       (stack_protect_combined_test): Document new standard pattern name.
> >>>>>>>>>       (stack_protect_test): Clarify that the operand for guard's address is
> >>>>>>>>>       legal.
> >>>>>>>>>
> >>>>>>>>> *** gcc/testsuite/ChangeLog ***
> >>>>>>>>>
> >>>>>>>>> 2018-07-05  Thomas Preud'homme  <thomas.preudhomme@linaro.org>
> >>>>>>>>>
> >>>>>>>>>       * gcc.target/arm/pr85434.c: New test.
> >>>>>>>>> Testing:
> >>>>>>>>>
> >>>>>>>>> native x86_64: bootstrap + testsuite -> no regression, can see failures with previous version of patch but not with new version
> >>>>>>>>> native powerpc64: bootstrap + testsuite -> no regression, can see failures from pr86834 with previous version of patch but not with new version
> >>>>>>>>> cross ARM Linux: build + testsuite -> no regression
> >>>>>>>>> native ARM Thumb-2: bootstrap + testsuite + glibc build + glibc test -> no regression
> >>>>>>>>> native ARM Arm: bootstrap + testsuite + glibc build + glibc test -> no regression
> >>>>>>>>> Aarch64: bootstrap + testsuite + glibc build + glibc test-> no regression
> >>>>>>>>>
> >>>>>>>>> Is this ok for trunk?
> >>>>>>>>>
> >>>>>>>>> Best regards,
> >>>>>>>>>
> >>>>>>>>> Thomas
>
>
> @@ -21998,7 +22039,7 @@ arm_expand_prologue (void)
>         mask &= THUMB2_WORK_REGS;
>         if (!IS_NESTED (func_type))
>       mask |= (1 << IP_REGNUM);
> -      arm_load_pic_register (mask);
> +      arm_load_pic_register (mask, 0);
>
>
>
> Please use NULL_RTX rather than 0 here and in the other occurrences in the patch.
> At a glance the changes look ok, but I'll have a deeper look later.
>
> Thanks,
> Kyrill

[-- Attachment #2: fix_pr85434_prevent_spilling_stack_protector_guard_address.patch --]
[-- Type: text/x-patch, Size: 35699 bytes --]

From ab3b83022b775e1e05a03f20c743f4dedd9e536c Mon Sep 17 00:00:00 2001
From: Thomas Preud'homme <thomas.preudhomme@linaro.org>
Date: Tue, 8 May 2018 15:47:05 +0100
Subject: [PATCH] PR85434: Prevent spilling of stack protector guard's address
 on ARM

In case of high register pressure in PIC mode, the address of the stack
protector's guard can be spilled on ARM targets as shown in PR85434,
thus allowing an attacker to control what the canary would be compared
against. ARM lacks stack_protect_set and stack_protect_test insn
patterns, but defining them does not help as the address is expanded
regularly and the patterns only deal with the copy and test of the
guard with the canary.

This problem does not occur for x86 targets because the PIC access and
the test can be done in the same instruction. Aarch64 is exempt too
because its PIC access insn patterns are mov of UNSPEC, which prevents
the second access in the epilogue from being CSEd in the cse_local pass
with the first access in the prologue.

The approach followed here is to create new "combined" set and test
standard pattern names that take the unexpanded guard and do the set or
test. This allows the target to use an opaque pattern (e.g. using UNSPEC)
to hide the individual instructions being generated from the compiler and
split the pattern into generic load, compare and branch instructions
after register allocation, therefore avoiding any spilling. This is
implemented here for the ARM targets. For targets not implementing these
new standard pattern names, the existing stack_protect_set and
stack_protect_test pattern names are used.
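
The selection order described above can be pictured with a small
standalone C model. This is only an illustrative sketch, not GCC code:
struct target_model, choose_guard_set and the PATH_* names are
hypothetical, and the final ordinary-move fallback is an assumption
about surrounding code that is not shown in this message.

```c
#include <stdbool.h>

/* Which expansion strategy ends up being used for the guard copy.  */
enum guard_path { PATH_COMBINED_SET, PATH_PLAIN_SET, PATH_GENERIC_MOVE };

/* Hypothetical model of a target: each flag says whether the target
   defines the corresponding standard pattern name.  */
struct target_model
{
  bool has_combined_set;
  bool has_plain_set;
};

/* Prefer the combined pattern when the guard is a declaration, then the
   existing stack_protect_set pattern, else an ordinary move (assumed).  */
enum guard_path
choose_guard_set (const struct target_model *t, bool guard_is_decl)
{
  if (t->has_combined_set && guard_is_decl)
    return PATH_COMBINED_SET;
  if (t->has_plain_set)
    return PATH_PLAIN_SET;
  return PATH_GENERIC_MOVE;
}
```

In this model a target that only sets has_plain_set behaves exactly as
before, which is the property the fallback is meant to preserve.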

To be able to split PIC access after register allocation, the functions
had to be augmented to force a new PIC register load and to control
which register it loads into. This is because sharing the PIC register
between prologue and epilogue could again lead to spilling due to CSE,
which an attacker could use to control what the canary gets compared
against.
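
As a rough illustration of the contract between the two new parameters,
here is a minimal standalone C model. It is a sketch under stated
assumptions, not the patch itself: struct func_state,
pic_base_for_access and the string register names are hypothetical, and
only the assertion mirrors the one added to require_pic_register.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: registers are plain strings, NULL means "none".  */
struct func_state
{
  const char *cached_pic_reg;  /* models cfun->machine->pic_reg */
  int pic_loads;               /* number of PIC base loads emitted */
};

/* Return the register holding the PIC base.  With COMPUTE_NOW a fresh
   load into PIC_REG is always emitted; otherwise the cached register is
   reused and only loaded the first time.  */
const char *
pic_base_for_access (struct func_state *f, const char *pic_reg,
                     bool compute_now)
{
  /* Same contract as in the patch: an explicit register is given
     exactly when a reload is forced.  */
  assert (compute_now == (pic_reg != NULL));

  if (compute_now)
    {
      f->pic_loads++;
      if (f->cached_pic_reg == NULL)
        f->cached_pic_reg = pic_reg;
      return pic_reg;
    }

  if (f->cached_pic_reg == NULL)
    {
      f->cached_pic_reg = "pseudo";
      f->pic_loads++;
    }
  return f->cached_pic_reg;
}
```

The forced path never reuses an earlier load, so in this model nothing
computed in the prologue is live across the function body for the
guard check.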

ChangeLog entries are as follows:

*** gcc/ChangeLog ***

2018-10-26  Thomas Preud'homme  <thomas.preudhomme@linaro.org>

	* target-insns.def (stack_protect_combined_set): Define new standard
	pattern name.
	(stack_protect_combined_test): Likewise.
	* cfgexpand.c (stack_protect_prologue): Try new
	stack_protect_combined_set pattern first.
	* function.c (stack_protect_epilogue): Try new
	stack_protect_combined_test pattern first.
	* config/arm/arm.c (require_pic_register): Add pic_reg and compute_now
	parameters to control which register to use as PIC register and force
	reloading PIC register respectively.  Insert in the stream of insns if
	possible.
	(legitimize_pic_address): Expose above new parameters in prototype and
	adapt recursive calls accordingly.  Use pic_reg if non null instead of
	cached one.
	(arm_load_pic_register): Add pic_reg parameter and use it if non null.
	(arm_legitimize_address): Adapt to new legitimize_pic_address
	prototype.
	(thumb_legitimize_address): Likewise.
	(arm_emit_call_insn): Adapt to require_pic_register prototype change.
	(arm_expand_prologue): Adapt to arm_load_pic_register prototype change.
	(thumb1_expand_prologue): Likewise.
	* config/arm/arm-protos.h (legitimize_pic_address): Adapt to prototype
	change.
	(arm_load_pic_register): Likewise.
	* config/arm/predicates.md (guard_addr_operand): New predicate.
	(guard_operand): New predicate.
	* config/arm/arm.md (movsi expander): Adapt to legitimize_pic_address
	prototype change.
	(builtin_setjmp_receiver expander): Adapt to thumb1_expand_prologue
	prototype change.
	(stack_protect_combined_set): New expander.
	(stack_protect_combined_set_insn): New insn_and_split pattern.
	(stack_protect_set_insn): New insn pattern.
	(stack_protect_combined_test): New expander.
	(stack_protect_combined_test_insn): New insn_and_split pattern.
	(arm_stack_protect_test_insn): New insn pattern.
	* config/arm/thumb1.md (thumb1_stack_protect_test_insn): New insn pattern.
	* config/arm/unspecs.md (UNSPEC_SP_SET): New unspec.
	(UNSPEC_SP_TEST): Likewise.
	* doc/md.texi (stack_protect_combined_set): Document new standard
	pattern name.
	(stack_protect_set): Clarify that the operand for guard's address is
	legal.
	(stack_protect_combined_test): Document new standard pattern name.
	(stack_protect_test): Clarify that the operand for guard's address is
	legal.

*** gcc/testsuite/ChangeLog ***

2018-07-05  Thomas Preud'homme  <thomas.preudhomme@linaro.org>

	* gcc.target/arm/pr85434.c: New test.

Testing: Bootstrapped on ARM in both Arm and Thumb-2 mode as well as on
Aarch64. The testsuite shows no regression on these 3 variants either,
both with default flags and with -fstack-protector-all.

Is this ok for trunk? If yes, would this be acceptable as a backport to
GCC 6, 7 and 8 provided that no regression is found?

Best regards,

Thomas

Change-Id: I993343e3063fb570af706624e08b475732a5ec57
---
 gcc/cfgexpand.c                        |  17 +++
 gcc/config/arm/arm-protos.h            |   4 +-
 gcc/config/arm/arm.c                   |  87 ++++++++---
 gcc/config/arm/arm.md                  | 162 +++++++++++++++++++-
 gcc/config/arm/predicates.md           |  17 +++
 gcc/config/arm/thumb1.md               |  13 ++
 gcc/config/arm/unspecs.md              |   3 +
 gcc/doc/md.texi                        |  55 ++++++-
 gcc/function.c                         |  32 +++-
 gcc/target-insns.def                   |   2 +
 gcc/testsuite/gcc.target/arm/pr85434.c | 200 +++++++++++++++++++++++++
 11 files changed, 549 insertions(+), 43 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/arm/pr85434.c

diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
index 35ca276e4ad..c8d0374f8ae 100644
--- a/gcc/cfgexpand.c
+++ b/gcc/cfgexpand.c
@@ -6131,6 +6131,23 @@ stack_protect_prologue (void)
   rtx x, y;
 
   x = expand_normal (crtl->stack_protect_guard);
+
+  if (targetm.have_stack_protect_combined_set () && guard_decl)
+    {
+      gcc_assert (DECL_P (guard_decl));
+      y = DECL_RTL (guard_decl);
+
+      /* Allow the target to compute address of Y and copy it to X without
+	 leaking Y into a register.  This combined address + copy pattern
+	 allows the target to prevent spilling of any intermediate results by
+	 splitting it after register allocation.  */
+      if (rtx_insn *insn = targetm.gen_stack_protect_combined_set (x, y))
+	{
+	  emit_insn (insn);
+	  return;
+	}
+    }
+
   if (guard_decl)
     y = expand_normal (guard_decl);
   else
diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 0dfb3ac59a6..f508bc5a455 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -28,7 +28,7 @@ extern enum unwind_info_type arm_except_unwind_info (struct gcc_options *);
 extern int use_return_insn (int, rtx);
 extern bool use_simple_return_p (void);
 extern enum reg_class arm_regno_class (int);
-extern void arm_load_pic_register (unsigned long);
+extern void arm_load_pic_register (unsigned long, rtx);
 extern int arm_volatile_func (void);
 extern void arm_expand_prologue (void);
 extern void arm_expand_epilogue (bool);
@@ -69,7 +69,7 @@ extern int const_ok_for_dimode_op (HOST_WIDE_INT, enum rtx_code);
 extern int arm_split_constant (RTX_CODE, machine_mode, rtx,
 			       HOST_WIDE_INT, rtx, rtx, int);
 extern int legitimate_pic_operand_p (rtx);
-extern rtx legitimize_pic_address (rtx, machine_mode, rtx);
+extern rtx legitimize_pic_address (rtx, machine_mode, rtx, rtx, bool);
 extern rtx legitimize_tls_address (rtx, rtx);
 extern bool arm_legitimate_address_p (machine_mode, rtx, bool);
 extern int arm_legitimate_address_outer_p (machine_mode, rtx, RTX_CODE, int);
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 8810df53aa3..96b8150d34c 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -7371,21 +7371,34 @@ legitimate_pic_operand_p (rtx x)
   return 1;
 }
 
-/* Record that the current function needs a PIC register.  Initialize
-   cfun->machine->pic_reg if we have not already done so.  */
+/* Record that the current function needs a PIC register.  If PIC_REG is null,
+   a new pseudo is allocated as PIC register, otherwise PIC_REG is used.  In
+   both cases cfun->machine->pic_reg is initialized if we have not already done
+   so.  COMPUTE_NOW decides whether and where to set the PIC register.  If true,
+   PIC register is reloaded in the current position of the instruction stream
+   regardless of whether it was loaded before.  Otherwise, it is only loaded
+   if not already done so (crtl->uses_pic_offset_table is null).  Note that
+   nonnull PIC_REG is only supported iff COMPUTE_NOW is true and null PIC_REG
+   is only supported iff COMPUTE_NOW is false.  */
 
 static void
-require_pic_register (void)
+require_pic_register (rtx pic_reg, bool compute_now)
 {
+  gcc_assert (compute_now == (pic_reg != NULL_RTX));
+
   /* A lot of the logic here is made obscure by the fact that this
      routine gets called as part of the rtx cost estimation process.
      We don't want those calls to affect any assumptions about the real
      function; and further, we can't call entry_of_function() until we
      start the real expansion process.  */
-  if (!crtl->uses_pic_offset_table)
+  if (!crtl->uses_pic_offset_table || compute_now)
     {
-      gcc_assert (can_create_pseudo_p ());
+      gcc_assert (can_create_pseudo_p ()
+		  || (pic_reg != NULL_RTX
+		      && REG_P (pic_reg)
+		      && GET_MODE (pic_reg) == Pmode));
       if (arm_pic_register != INVALID_REGNUM
+	  && !compute_now
 	  && !(TARGET_THUMB1 && arm_pic_register > LAST_LO_REGNUM))
 	{
 	  if (!cfun->machine->pic_reg)
@@ -7401,8 +7414,10 @@ require_pic_register (void)
 	{
 	  rtx_insn *seq, *insn;
 
+	  if (pic_reg == NULL_RTX)
+	    pic_reg = gen_reg_rtx (Pmode);
 	  if (!cfun->machine->pic_reg)
-	    cfun->machine->pic_reg = gen_reg_rtx (Pmode);
+	    cfun->machine->pic_reg = pic_reg;
 
 	  /* Play games to avoid marking the function as needing pic
 	     if we are being called as part of the cost-estimation
@@ -7413,11 +7428,12 @@ require_pic_register (void)
 	      start_sequence ();
 
 	      if (TARGET_THUMB1 && arm_pic_register != INVALID_REGNUM
-		  && arm_pic_register > LAST_LO_REGNUM)
+		  && arm_pic_register > LAST_LO_REGNUM
+		  && !compute_now)
 		emit_move_insn (cfun->machine->pic_reg,
 				gen_rtx_REG (Pmode, arm_pic_register));
 	      else
-		arm_load_pic_register (0UL);
+		arm_load_pic_register (0UL, pic_reg);
 
 	      seq = get_insns ();
 	      end_sequence ();
@@ -7430,16 +7446,33 @@ require_pic_register (void)
 	         we can't yet emit instructions directly in the final
 		 insn stream.  Queue the insns on the entry edge, they will
 		 be committed after everything else is expanded.  */
-	      insert_insn_on_edge (seq,
-				   single_succ_edge (ENTRY_BLOCK_PTR_FOR_FN (cfun)));
+	      if (currently_expanding_to_rtl)
+		insert_insn_on_edge (seq,
+				     single_succ_edge
+				     (ENTRY_BLOCK_PTR_FOR_FN (cfun)));
+	      else
+		emit_insn (seq);
 	    }
 	}
     }
 }
 
+/* Legitimize PIC load to ORIG into REG.  If REG is NULL, a new pseudo is
+   created to hold the result of the load.  If not NULL, PIC_REG indicates
+   which register to use as PIC register, otherwise it is decided by register
+   allocator.  COMPUTE_NOW forces the PIC register to be loaded at the current
+   location in the instruction stream, regardless of whether it was loaded
+   previously.  Note that nonnull PIC_REG is only supported iff COMPUTE_NOW is
+   true and null PIC_REG is only supported iff COMPUTE_NOW is false.
+
+   Returns the register REG into which the PIC load is performed.  */
+
 rtx
-legitimize_pic_address (rtx orig, machine_mode mode, rtx reg)
+legitimize_pic_address (rtx orig, machine_mode mode, rtx reg, rtx pic_reg,
+			bool compute_now)
 {
+  gcc_assert (compute_now == (pic_reg != NULL_RTX));
+
   if (GET_CODE (orig) == SYMBOL_REF
       || GET_CODE (orig) == LABEL_REF)
     {
@@ -7472,9 +7505,12 @@ legitimize_pic_address (rtx orig, machine_mode mode, rtx reg)
 	  rtx mem;
 
 	  /* If this function doesn't have a pic register, create one now.  */
-	  require_pic_register ();
+	  require_pic_register (pic_reg, compute_now);
+
+	  if (pic_reg == NULL_RTX)
+	    pic_reg = cfun->machine->pic_reg;
 
-	  pat = gen_calculate_pic_address (reg, cfun->machine->pic_reg, orig);
+	  pat = gen_calculate_pic_address (reg, pic_reg, orig);
 
 	  /* Make the MEM as close to a constant as possible.  */
 	  mem = SET_SRC (pat);
@@ -7523,9 +7559,11 @@ legitimize_pic_address (rtx orig, machine_mode mode, rtx reg)
 
       gcc_assert (GET_CODE (XEXP (orig, 0)) == PLUS);
 
-      base = legitimize_pic_address (XEXP (XEXP (orig, 0), 0), Pmode, reg);
+      base = legitimize_pic_address (XEXP (XEXP (orig, 0), 0), Pmode, reg,
+				     pic_reg, compute_now);
       offset = legitimize_pic_address (XEXP (XEXP (orig, 0), 1), Pmode,
-				       base == reg ? 0 : reg);
+				       base == reg ? 0 : reg, pic_reg,
+				       compute_now);
 
       if (CONST_INT_P (offset))
 	{
@@ -7625,16 +7663,17 @@ static GTY(()) int pic_labelno;
    low register.  */
 
 void
-arm_load_pic_register (unsigned long saved_regs ATTRIBUTE_UNUSED)
+arm_load_pic_register (unsigned long saved_regs ATTRIBUTE_UNUSED, rtx pic_reg)
 {
-  rtx l1, labelno, pic_tmp, pic_rtx, pic_reg;
+  rtx l1, labelno, pic_tmp, pic_rtx;
 
   if (crtl->uses_pic_offset_table == 0 || TARGET_SINGLE_PIC_BASE)
     return;
 
   gcc_assert (flag_pic);
 
-  pic_reg = cfun->machine->pic_reg;
+  if (pic_reg == NULL_RTX)
+    pic_reg = cfun->machine->pic_reg;
   if (TARGET_VXWORKS_RTP)
     {
       pic_rtx = gen_rtx_SYMBOL_REF (Pmode, VXWORKS_GOTT_BASE);
@@ -8710,7 +8749,8 @@ arm_legitimize_address (rtx x, rtx orig_x, machine_mode mode)
     {
       /* We need to find and carefully transform any SYMBOL and LABEL
 	 references; so go back to the original address expression.  */
-      rtx new_x = legitimize_pic_address (orig_x, mode, NULL_RTX);
+      rtx new_x = legitimize_pic_address (orig_x, mode, NULL_RTX, NULL_RTX,
+					  false /*compute_now*/);
 
       if (new_x != orig_x)
 	x = new_x;
@@ -8778,7 +8818,8 @@ thumb_legitimize_address (rtx x, rtx orig_x, machine_mode mode)
     {
       /* We need to find and carefully transform any SYMBOL and LABEL
 	 references; so go back to the original address expression.  */
-      rtx new_x = legitimize_pic_address (orig_x, mode, NULL_RTX);
+      rtx new_x = legitimize_pic_address (orig_x, mode, NULL_RTX, NULL_RTX,
+					  false /*compute_now*/);
 
       if (new_x != orig_x)
 	x = new_x;
@@ -18066,7 +18107,7 @@ arm_emit_call_insn (rtx pat, rtx addr, bool sibcall)
 	  ? !targetm.binds_local_p (SYMBOL_REF_DECL (addr))
 	  : !SYMBOL_REF_LOCAL_P (addr)))
     {
-      require_pic_register ();
+      require_pic_register (NULL_RTX, false /*compute_now*/);
       use_reg (&CALL_INSN_FUNCTION_USAGE (insn), cfun->machine->pic_reg);
     }
 
@@ -21998,7 +22039,7 @@ arm_expand_prologue (void)
       mask &= THUMB2_WORK_REGS;
       if (!IS_NESTED (func_type))
 	mask |= (1 << IP_REGNUM);
-      arm_load_pic_register (mask);
+      arm_load_pic_register (mask, NULL_RTX);
     }
 
   /* If we are profiling, make sure no instructions are scheduled before
@@ -25229,7 +25270,7 @@ thumb1_expand_prologue (void)
   /* Load the pic register before setting the frame pointer,
      so we can use r7 as a temporary work register.  */
   if (flag_pic && arm_pic_register != INVALID_REGNUM)
-    arm_load_pic_register (live_regs_mask);
+    arm_load_pic_register (live_regs_mask, NULL_RTX);
 
   if (!frame_pointer_needed && CALLER_INTERWORKING_SLOT_SIZE > 0)
     emit_move_insn (gen_rtx_REG (Pmode, ARM_HARD_FRAME_POINTER_REGNUM),
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 270b8e454b3..1f702f81fd1 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -6021,7 +6021,8 @@
       operands[1] = legitimize_pic_address (operands[1], SImode,
 					    (!can_create_pseudo_p ()
 					     ? operands[0]
-					     : 0));
+					     : NULL_RTX), NULL_RTX,
+					    false /*compute_now*/);
   }
   "
 )
@@ -6309,7 +6310,7 @@
   /* r3 is clobbered by set/longjmp, so we can use it as a scratch
      register.  */
   if (arm_pic_register != INVALID_REGNUM)
-    arm_load_pic_register (1UL << 3);
+    arm_load_pic_register (1UL << 3, NULL_RTX);
   DONE;
 }")
 
@@ -8634,6 +8635,163 @@
    (set_attr "conds" "clob")]
 )
 
+;; Named patterns for stack smashing protection.
+(define_expand "stack_protect_combined_set"
+  [(parallel
+     [(set (match_operand:SI 0 "memory_operand" "")
+	   (unspec:SI [(match_operand:SI 1 "guard_operand" "")]
+		      UNSPEC_SP_SET))
+      (clobber (match_scratch:SI 2 ""))
+      (clobber (match_scratch:SI 3 ""))])]
+  ""
+  ""
+)
+
+;; Use a separate insn from the above expand to be able to have the mem outside
+;; the operand #1 when register allocation comes.  This is needed to prevent
+;; LRA from trying to reload the guard since we need to control how PIC access
+;; is done in the -fpic/-fPIC case (see COMPUTE_NOW parameter when calling
+;; legitimize_pic_address ()).
+(define_insn_and_split "*stack_protect_combined_set_insn"
+  [(set (match_operand:SI 0 "memory_operand" "=m,m")
+	(unspec:SI [(mem:SI (match_operand:SI 1 "guard_addr_operand" "X,X"))]
+		   UNSPEC_SP_SET))
+   (clobber (match_scratch:SI 2 "=&l,&r"))
+   (clobber (match_scratch:SI 3 "=&l,&r"))]
+  ""
+  "#"
+  "reload_completed"
+  [(parallel [(set (match_dup 0) (unspec:SI [(mem:SI (match_dup 2))]
+					    UNSPEC_SP_SET))
+	      (clobber (match_dup 2))])]
+  "
+{
+  if (flag_pic)
+    {
+      /* Forces recomputing of GOT base now.  */
+      legitimize_pic_address (operands[1], SImode, operands[2], operands[3],
+			      true /*compute_now*/);
+    }
+  else
+    {
+      if (address_operand (operands[1], SImode))
+	operands[2] = operands[1];
+      else
+	{
+	  rtx mem = XEXP (force_const_mem (SImode, operands[1]), 0);
+	  emit_move_insn (operands[2], mem);
+	}
+    }
+}"
+  [(set_attr "arch" "t1,32")]
+)
+
+(define_insn "*stack_protect_set_insn"
+  [(set (match_operand:SI 0 "memory_operand" "=m,m")
+	(unspec:SI [(mem:SI (match_operand:SI 1 "register_operand" "+&l,&r"))]
+	 UNSPEC_SP_SET))
+   (clobber (match_dup 1))]
+  ""
+  "@
+   ldr\\t%1, [%1]\;str\\t%1, %0\;movs\t%1,#0
+   ldr\\t%1, [%1]\;str\\t%1, %0\;mov\t%1,#0"
+  [(set_attr "length" "8,12")
+   (set_attr "conds" "clob,nocond")
+   (set_attr "type" "multiple")
+   (set_attr "arch" "t1,32")]
+)
+
+(define_expand "stack_protect_combined_test"
+  [(parallel
+     [(set (pc)
+	   (if_then_else
+		(eq (match_operand:SI 0 "memory_operand" "")
+		    (unspec:SI [(match_operand:SI 1 "guard_operand" "")]
+			       UNSPEC_SP_TEST))
+		(label_ref (match_operand 2))
+		(pc)))
+      (clobber (match_scratch:SI 3 ""))
+      (clobber (match_scratch:SI 4 ""))
+      (clobber (reg:CC CC_REGNUM))])]
+  ""
+  ""
+)
+
+;; Use a separate insn from the above expand so that the mem stays outside
+;; operand #1 when register allocation runs.  This prevents LRA from trying
+;; to reload the guard, since we need to control how PIC access is done in
+;; the -fpic/-fPIC case (see the COMPUTE_NOW parameter of
+;; legitimize_pic_address ()).
+(define_insn_and_split "*stack_protect_combined_test_insn"
+  [(set (pc)
+	(if_then_else
+		(eq (match_operand:SI 0 "memory_operand" "m,m")
+		    (unspec:SI [(mem:SI (match_operand:SI 1 "guard_addr_operand" "X,X"))]
+			       UNSPEC_SP_TEST))
+		(label_ref (match_operand 2))
+		(pc)))
+   (clobber (match_scratch:SI 3 "=&l,&r"))
+   (clobber (match_scratch:SI 4 "=&l,&r"))
+   (clobber (reg:CC CC_REGNUM))]
+  ""
+  "#"
+  "reload_completed"
+  [(const_int 0)]
+{
+  rtx eq;
+
+  if (flag_pic)
+    {
+      /* Forces recomputing of GOT base now.  */
+      legitimize_pic_address (operands[1], SImode, operands[3], operands[4],
+			      true /*compute_now*/);
+    }
+  else
+    {
+      if (address_operand (operands[1], SImode))
+	operands[3] = operands[1];
+      else
+	{
+	  rtx mem = XEXP (force_const_mem (SImode, operands[1]), 0);
+	  emit_move_insn (operands[3], mem);
+	}
+    }
+  if (TARGET_32BIT)
+    {
+      emit_insn (gen_arm_stack_protect_test_insn (operands[4], operands[0],
+						  operands[3]));
+      rtx cc_reg = gen_rtx_REG (CC_Zmode, CC_REGNUM);
+      eq = gen_rtx_EQ (CC_Zmode, cc_reg, const0_rtx);
+      emit_jump_insn (gen_arm_cond_branch (operands[2], eq, cc_reg));
+    }
+  else
+    {
+      emit_insn (gen_thumb1_stack_protect_test_insn (operands[4], operands[0],
+						     operands[3]));
+      eq = gen_rtx_EQ (VOIDmode, operands[4], const0_rtx);
+      emit_jump_insn (gen_cbranchsi4 (eq, operands[4], const0_rtx,
+				      operands[2]));
+    }
+  DONE;
+}
+  [(set_attr "arch" "t1,32")]
+)
+
+(define_insn "arm_stack_protect_test_insn"
+  [(set (reg:CC_Z CC_REGNUM)
+	(compare:CC_Z (unspec:SI [(match_operand:SI 1 "memory_operand" "m,m")
+				  (mem:SI (match_operand:SI 2 "register_operand" "+l,r"))]
+				 UNSPEC_SP_TEST)
+		      (const_int 0)))
+   (clobber (match_operand:SI 0 "register_operand" "=&l,&r"))
+   (clobber (match_dup 2))]
+  "TARGET_32BIT"
+  "ldr\t%0, [%2]\;ldr\t%2, %1\;eors\t%0, %2, %0"
+  [(set_attr "length" "8,12")
+   (set_attr "type" "multiple")
+   (set_attr "arch" "t,32")]
+)
+
 (define_expand "casesi"
   [(match_operand:SI 0 "s_register_operand" "")	; index to jump on
    (match_operand:SI 1 "const_int_operand" "")	; lower bound
diff --git a/gcc/config/arm/predicates.md b/gcc/config/arm/predicates.md
index 7e198f9bce4..69718ee9c7a 100644
--- a/gcc/config/arm/predicates.md
+++ b/gcc/config/arm/predicates.md
@@ -31,6 +31,23 @@
 	      || REGNO_REG_CLASS (REGNO (op)) != NO_REGS));
 })
 
+; Predicate for stack protector guard's address in
+; stack_protect_combined_set_insn and stack_protect_combined_test_insn patterns
+(define_predicate "guard_addr_operand"
+  (match_test "true")
+{
+  return (CONSTANT_ADDRESS_P (op)
+	  || !targetm.cannot_force_const_mem (mode, op));
+})
+
+; Predicate for stack protector guard in stack_protect_combined_set and
+; stack_protect_combined_test patterns
+(define_predicate "guard_operand"
+  (match_code "mem")
+{
+  return guard_addr_operand (XEXP (op, 0), mode);
+})
+
 (define_predicate "imm_for_neon_inv_logic_operand"
   (match_code "const_vector")
 {
diff --git a/gcc/config/arm/thumb1.md b/gcc/config/arm/thumb1.md
index 19dcdbcdd73..cd199c9c529 100644
--- a/gcc/config/arm/thumb1.md
+++ b/gcc/config/arm/thumb1.md
@@ -1962,4 +1962,17 @@
   }"
   [(set_attr "type" "mov_reg")]
 )
+
+(define_insn "thumb1_stack_protect_test_insn"
+  [(set (match_operand:SI 0 "register_operand" "=&l")
+	(unspec:SI [(match_operand:SI 1 "memory_operand" "m")
+		    (mem:SI (match_operand:SI 2 "register_operand" "+l"))]
+	 UNSPEC_SP_TEST))
+   (clobber (match_dup 2))]
+  "TARGET_THUMB1"
+  "ldr\t%0, [%2]\;ldr\t%2, %1\;eors\t%0, %2, %0"
+  [(set_attr "length" "8")
+   (set_attr "conds" "set")
+   (set_attr "type" "multiple")]
+)
 \f
diff --git a/gcc/config/arm/unspecs.md b/gcc/config/arm/unspecs.md
index 19416736ef9..8f9dbcb08dc 100644
--- a/gcc/config/arm/unspecs.md
+++ b/gcc/config/arm/unspecs.md
@@ -86,6 +86,9 @@
   UNSPEC_PROBE_STACK    ; Probe stack memory reference
   UNSPEC_NONSECURE_MEM	; Represent non-secure memory in ARMv8-M with
 			; security extension
+  UNSPEC_SP_SET		; Represent the setting of stack protector's canary
+  UNSPEC_SP_TEST	; Represent the testing of stack protector's canary
+			; against the guard.
 ])
 
 (define_c_enum "unspec" [
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 4801d68a207..0667a242ef3 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -7424,22 +7424,61 @@ builtins.
 The get/set patterns have a single output/input operand respectively,
 with @var{mode} intended to be @code{Pmode}.
 
+@cindex @code{stack_protect_combined_set} instruction pattern
+@item @samp{stack_protect_combined_set}
+This pattern, if defined, moves a @code{ptr_mode} value from an address
+whose declaration RTX is given in operand 1 to the memory in operand 0
+without leaving the value in a register afterward.  If several
+instructions are needed by the target to perform the operation (e.g.@: to
+load the address from a GOT entry then load the @code{ptr_mode} value
+and finally store it), it is the backend's responsibility to ensure no
+intermediate result gets spilled.  This is to avoid leaking the value
+some place that an attacker might use to rewrite the stack guard slot
+after having clobbered it.
+
+If this pattern is not defined, then the address declaration is
+expanded first in the standard way and a @code{stack_protect_set}
+pattern is then generated to move the value from that address to the
+address in operand 0.
+
 @cindex @code{stack_protect_set} instruction pattern
 @item @samp{stack_protect_set}
-This pattern, if defined, moves a @code{ptr_mode} value from the memory
-in operand 1 to the memory in operand 0 without leaving the value in
-a register afterward.  This is to avoid leaking the value some place
-that an attacker might use to rewrite the stack guard slot after
-having clobbered it.
+This pattern, if defined, moves a @code{ptr_mode} value from the valid
+memory location in operand 1 to the memory in operand 0 without leaving
+the value in a register afterward.  This is to avoid leaking the value
+some place that an attacker might use to rewrite the stack guard slot
+after having clobbered it.
+
+Note: on targets where the addressing modes do not allow loading
+directly from the stack guard address, the address is first expanded in
+the standard way, which could cause some spills.
 
 If this pattern is not defined, then a plain move pattern is generated.
 
+@cindex @code{stack_protect_combined_test} instruction pattern
+@item @samp{stack_protect_combined_test}
+This pattern, if defined, compares a @code{ptr_mode} value from an
+address whose declaration RTX is given in operand 1 with the memory in
+operand 0 without leaving the value in a register afterward and
+branches to operand 2 if the values were equal.  If several
+instructions are needed by the target to perform the operation (e.g.@: to
+load the address from a GOT entry then load the @code{ptr_mode} value
+and finally compare it), it is the backend's responsibility to ensure no
+intermediate result gets spilled.  This is to avoid leaking the value
+some place that an attacker might use to rewrite the stack guard slot
+after having clobbered it.
+
+If this pattern is not defined, then the address declaration is
+expanded first in the standard way and a @code{stack_protect_test}
+pattern is then generated to compare the value from that address to the
+value at the memory in operand 0.
+
 @cindex @code{stack_protect_test} instruction pattern
 @item @samp{stack_protect_test}
 This pattern, if defined, compares a @code{ptr_mode} value from the
-memory in operand 1 with the memory in operand 0 without leaving the
-value in a register afterward and branches to operand 2 if the values
-were equal.
+valid memory location in operand 1 with the memory in operand 0 without
+leaving the value in a register afterward and branches to operand 2 if
+the values were equal.
 
 If this pattern is not defined, then a plain compare pattern and
 conditional branch pattern is used.
diff --git a/gcc/function.c b/gcc/function.c
index 302438323c8..17aecedd981 100644
--- a/gcc/function.c
+++ b/gcc/function.c
@@ -4892,18 +4892,34 @@ stack_protect_epilogue (void)
   tree guard_decl = targetm.stack_protect_guard ();
   rtx_code_label *label = gen_label_rtx ();
   rtx x, y;
-  rtx_insn *seq;
+  rtx_insn *seq = NULL;
 
   x = expand_normal (crtl->stack_protect_guard);
-  if (guard_decl)
-    y = expand_normal (guard_decl);
+
+  if (targetm.have_stack_protect_combined_test () && guard_decl)
+    {
+      gcc_assert (DECL_P (guard_decl));
+      y = DECL_RTL (guard_decl);
+      /* Allow the target to compute address of Y and compare it with X without
+	 leaking Y into a register.  This combined address + compare pattern
+	 allows the target to prevent spilling of any intermediate results by
+	 splitting it after register allocator.  */
+      seq = targetm.gen_stack_protect_combined_test (x, y, label);
+    }
   else
-    y = const0_rtx;
+    {
+      if (guard_decl)
+	y = expand_normal (guard_decl);
+      else
+	y = const0_rtx;
+
+      /* Allow the target to compare Y with X without leaking either into
+	 a register.  */
+      if (targetm.have_stack_protect_test ())
+	seq = targetm.gen_stack_protect_test (x, y, label);
+    }
 
-  /* Allow the target to compare Y with X without leaking either into
-     a register.  */
-  if (targetm.have_stack_protect_test ()
-      && ((seq = targetm.gen_stack_protect_test (x, y, label)) != NULL_RTX))
+  if (seq)
     emit_insn (seq);
   else
     emit_cmp_and_jump_insns (x, y, EQ, NULL_RTX, ptr_mode, 1, label);
diff --git a/gcc/target-insns.def b/gcc/target-insns.def
index 9a552c3d11c..d39889b3522 100644
--- a/gcc/target-insns.def
+++ b/gcc/target-insns.def
@@ -96,7 +96,9 @@ DEF_TARGET_INSN (sibcall_value, (rtx x0, rtx x1, rtx opt2, rtx opt3,
 DEF_TARGET_INSN (simple_return, (void))
 DEF_TARGET_INSN (split_stack_prologue, (void))
 DEF_TARGET_INSN (split_stack_space_check, (rtx x0, rtx x1))
+DEF_TARGET_INSN (stack_protect_combined_set, (rtx x0, rtx x1))
 DEF_TARGET_INSN (stack_protect_set, (rtx x0, rtx x1))
+DEF_TARGET_INSN (stack_protect_combined_test, (rtx x0, rtx x1, rtx x2))
 DEF_TARGET_INSN (stack_protect_test, (rtx x0, rtx x1, rtx x2))
 DEF_TARGET_INSN (store_multiple, (rtx x0, rtx x1, rtx x2))
 DEF_TARGET_INSN (tablejump, (rtx x0, rtx x1))
diff --git a/gcc/testsuite/gcc.target/arm/pr85434.c b/gcc/testsuite/gcc.target/arm/pr85434.c
new file mode 100644
index 00000000000..4143a861f7c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/pr85434.c
@@ -0,0 +1,200 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target fstack_protector }*/
+/* { dg-require-effective-target fpic }*/
+/* { dg-additional-options "-Os -fpic -fstack-protector-strong" } */
+
+#include <stddef.h>
+#include <stdint.h>
+
+
+static const unsigned char base64_enc_map[64] =
+{
+    'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J',
+    'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T',
+    'U', 'V', 'W', 'X', 'Y', 'Z', 'a', 'b', 'c', 'd',
+    'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n',
+    'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x',
+    'y', 'z', '0', '1', '2', '3', '4', '5', '6', '7',
+    '8', '9', '+', '/'
+};
+
+#define BASE64_SIZE_T_MAX   ( (size_t) -1 ) /* SIZE_T_MAX is not standard */
+
+
+void doSmth(void *x);
+
+#include <string.h>
+
+
+void check(int n) {
+  
+    if (!(n % 2 && n % 3 && n % 5)) {
+ __asm__  (   "add    r8, r8, #1;" );
+    }
+}
+
+uint32_t test(
+  uint32_t a1,
+  uint32_t a2,
+  size_t a3,
+  size_t a4,
+  size_t a5,
+  size_t a6)
+{
+  uint32_t nResult = 0;
+  uint8_t* h = 0L;
+  uint8_t X[128];
+  uint8_t mac[64];
+  size_t len;
+
+  doSmth(&a1);
+  doSmth(&a2);
+  doSmth(&a3);
+  doSmth(&a4);
+  doSmth(&a5);
+  doSmth(&a6);
+
+  if (a1 && a2 && a3 && a4 && a5 && a6) {
+    nResult = 1;
+    h = (void*)X;
+    len = sizeof(X);
+    memset(X, a2, len);
+    len -= 64;
+    memcpy(mac ,X, len);
+    *(h + len) = a6;
+
+    {
+
+
+        unsigned char *dst = X;
+        size_t dlen = a3;
+        size_t *olen = &a6;
+        const unsigned char *src = mac;
+        size_t slen = a4;
+    size_t i, n;
+    int C1, C2, C3;
+    unsigned char *p;
+
+    if( slen == 0 )
+    {
+        *olen = 0;
+        return( 0 );
+    }
+
+    n = slen / 3 + ( slen % 3 != 0 );
+
+    if( n > ( BASE64_SIZE_T_MAX - 1 ) / 4 )
+    {
+        *olen = BASE64_SIZE_T_MAX;
+        return( 0 );
+    }
+
+    n *= 4;
+
+    if( ( dlen < n + 1 ) || ( NULL == dst ) )
+    {
+        *olen = n + 1;
+        return( 0 );
+    }
+
+    n = ( slen / 3 ) * 3;
+
+    for( i = 0, p = dst; i < n; i += 3 )
+    {
+        C1 = *src++;
+        C2 = *src++;
+        C3 = *src++;
+
+        check(i);
+
+        *p++ = base64_enc_map[(C1 >> 2) & 0x3F];
+        *p++ = base64_enc_map[(((C1 &  3) << 4) + (C2 >> 4)) & 0x3F];
+        *p++ = base64_enc_map[(((C2 & 15) << 2) + (C3 >> 6)) & 0x3F];
+        *p++ = base64_enc_map[C3 & 0x3F];
+    }
+
+    if( i < slen )
+    {
+        C1 = *src++;
+        C2 = ( ( i + 1 ) < slen ) ? *src++ : 0;
+
+        *p++ = base64_enc_map[(C1 >> 2) & 0x3F];
+        *p++ = base64_enc_map[(((C1 & 3) << 4) + (C2 >> 4)) & 0x3F];
+
+        if( ( i + 1 ) < slen )
+             *p++ = base64_enc_map[((C2 & 15) << 2) & 0x3F];
+        else *p++ = '=';
+
+        *p++ = '=';
+    }
+
+    *olen = p - dst;
+    *p = 0;
+
+}
+
+  __asm__ ("mov r8, %0;" : "=r" ( nResult ));
+  }
+  else
+  {
+    nResult = 2;
+  }
+
+  doSmth(X);
+  doSmth(mac);
+
+
+  return nResult;
+}
+
+/* The pattern below catches sequences of instructions that were generated
+   for ARM and Thumb-2 before the fix for this PR. They are of the form:
+
+   ldr     rX, <offset from sp or fp>
+   <optional non ldr instructions>
+   ldr     rY, <offset from sp or fp>
+   ldr     rZ, [rX]
+   <optional non ldr instructions>
+   cmp     rY, rZ
+   <optional non cmp instructions>
+   bl      __stack_chk_fail
+
+   Ideally the optional block would check that the various rX, rY and rZ
+   registers are not set, but this is not possible because Tcl does not
+   allow back references in lookahead expressions, and lookahead is the
+   only construct that can negate a regexp.  Instead we use the heuristic
+   of allowing non ldr/cmp instructions, on the assumptions that (i) those
+   are not part of the stack protector sequences and (ii) they would only
+   be scheduled here if they do not conflict with registers used by the
+   stack protector.
+
+   Note on the regexp logic:
+   Allowing non X instructions (where X is ldr or cmp) is done by matching a
+   newline, some non-newline whitespace, a negative lookahead for X, an
+   alphanumeric character and then anything but a newline, with the whole
+   group repeated any number of times.  The alphanumeric character forces
+   the negative lookahead for X to apply only after all the initial
+   whitespace, and thus to check the mnemonic.  This prevents the lookahead
+   from being satisfied at one of the initial spaces.  */
+/* { dg-final { scan-assembler-not {ldr[ \t]+([^,]+), \[(?:sp|fp)[^]]*\](?:\n[ \t]+(?!ldr)\w[^\n]*)*\n[ \t]+ldr[ \t]+([^,]+), \[(?:sp|fp)[^]]*\]\n[ \t]+ldr[ \t]+([^,]+), \[\1\](?:\n[ \t]+(?!ldr)\w[^\n]*)*\n[ \t]+cmp[ \t]+\2, \3(?:\n[ \t]+(?!cmp)\w[^\n]*)*\n[ \t]+bl[ \t]+__stack_chk_fail} } } */
+
+/* Likewise for Thumb-1 sequences of instructions prior to the fix for this PR
+   which had the form:
+
+   ldr     rS, <offset from sp or fp>
+   <optional non ldr instructions>
+   ldr     rT, <PC relative offset>
+   <optional non ldr instructions>
+   ldr     rX, [rS, rT]
+   <optional non ldr instructions>
+   ldr     rY, <offset from sp or fp>
+   ldr     rZ, [rX]
+   <optional non ldr instructions>
+   cmp     rY, rZ
+   <optional non cmp instructions>
+   bl      __stack_chk_fail
+
+  Note on the regexp logic:
+  PC relative offset is checked by looking for a source operand that does not
+  contain [ or ].  */
+/* { dg-final { scan-assembler-not {ldr[ \t]+([^,]+), \[(?:sp|fp)[^]]*\](?:\n[ \t]+(?!ldr)\w[^\n]*)*\n[ \t]+ldr[ \t]+([^,]+), [^][\n]*(?:\n[ \t]+(?!ldr)\w[^\n]*)*\n[ \t]+ldr[ \t]+([^,]+), \[\1, \2\](?:\n[ \t]+(?!ldr)\w[^\n]*)*\n[ \t]+ldr[ \t]+([^,]+), \[(?:sp|fp)[^]]*\]\n[ \t]+ldr[ \t]+([^,]+), \[\3\](?:\n[ \t]+(?!ldr)\w[^\n]*)*\n[ \t]+cmp[ \t]+\4, \5(?:\n[ \t]+(?!cmp)\w[^\n]*)*\n[ \t]+bl[ \t]+__stack_chk_fail} } } */
-- 
2.19.1
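
As a reading aid for the function.c hunk above, the order in which
stack_protect_epilogue picks a test strategy can be sketched as below.
This is only a minimal model under stated assumptions: struct target_caps,
enum test_strategy and choose_test_strategy are hypothetical names invented
for illustration and are not part of GCC, and the sketch assumes the
selected generator always returns an insn sequence (the real code falls
back to emit_cmp_and_jump_insns when it does not).

```c
#include <stdbool.h>

/* Hypothetical stand-in for the target hooks consulted by
   stack_protect_epilogue; this is not GCC's real targetm structure.  */
struct target_caps
{
  /* Models targetm.have_stack_protect_combined_test ().  */
  bool have_combined_test;
  /* Models targetm.have_stack_protect_test ().  */
  bool have_test;
};

/* Hypothetical labels for the three code paths in the hunk.  */
enum test_strategy
{
  STRATEGY_COMBINED_TEST,   /* gen_stack_protect_combined_test on DECL_RTL.  */
  STRATEGY_PLAIN_TEST,      /* gen_stack_protect_test on the expanded guard.  */
  STRATEGY_GENERIC_COMPARE  /* emit_cmp_and_jump_insns fallback.  */
};

/* Return the path taken, assuming the chosen generator succeeds.
   The combined pattern is only tried when the target provides a guard
   declaration; otherwise the pre-existing behaviour is kept.  */
static enum test_strategy
choose_test_strategy (const struct target_caps *caps, bool have_guard_decl)
{
  if (caps->have_combined_test && have_guard_decl)
    return STRATEGY_COMBINED_TEST;
  if (caps->have_test)
    return STRATEGY_PLAIN_TEST;
  return STRATEGY_GENERIC_COMPARE;
}
```

With this model, a target defining only the combined pattern takes the
combined path when a guard declaration exists, and drops to the generic
compare and branch when it does not, which matches the else branch of
the hunk.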



* Re: [PATCH, ARM, ping3] PR85434: Prevent spilling of stack protector guard's address on ARM
  2018-11-10 15:07                   ` Thomas Preudhomme
@ 2018-11-16 14:57                     ` Thomas Preudhomme
  2018-11-21  0:32                       ` Jeff Law
                                         ` (2 more replies)
  0 siblings, 3 replies; 20+ messages in thread
From: Thomas Preudhomme @ 2018-11-16 14:57 UTC (permalink / raw)
  To: kyrylo.tkachov; +Cc: Ramana Radhakrishnan, Richard Earnshaw, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 20186 bytes --]

Ping?

Best regards,

Thomas

On Sat, 10 Nov 2018 at 15:07, Thomas Preudhomme
<thomas.preudhomme@linaro.org> wrote:
>
> Thanks Kyrill.
>
> Updated patch in attachment. Best regards,
>
> Thomas
> On Thu, 8 Nov 2018 at 15:53, Kyrill Tkachov <kyrylo.tkachov@foss.arm.com> wrote:
> >
> > Hi Thomas,
> >
> > On 08/11/18 09:52, Thomas Preudhomme wrote:
> > > Ping?
> > >
> > > Best regards,
> > >
> > > Thomas
> > >
> > > On Thu, 1 Nov 2018 at 16:03, Thomas Preudhomme
> > > <thomas.preudhomme@linaro.org> wrote:
> > >> Ping?
> > >>
> > >> Best regards,
> > >>
> > >> Thomas
> > >> On Fri, 26 Oct 2018 at 22:41, Thomas Preudhomme
> > >> <thomas.preudhomme@linaro.org> wrote:
> > >>> Hi,
> > >>>
> > >>> Please find updated patch to fix PR85434: spilling of stack protector
> > >>> guard's address on ARM. Quite a few changes have been made to the ARM
> > >>> part since last round of review so I think it makes more sense to
> > >>> review it anew. Ran bootstrap + regression testsuite + glibc build +
> > >>> glibc regression testsuite for Arm and Thumb-2 and bootstrap +
> > >>> regression testsuite for Thumb-1. GCC's regression testsuite was run
> > >>> in 3 configurations in all those cases:
> > >>>
> > >>> - default configuration (no RUNTESTFLAGS)
> > >>> - with -fstack-protector-all
> > >>> - with -fPIC -fstack-protector-all (to exercise both codepath in stack
> > >>> protector's split code)
> > >>>
> > > >>> None of these shows any regression beyond some new scan failures with
> > > >>> -fstack-protector-all or -fPIC due to unexpected code sequences for the
> > > >>> testcases concerned and some guality swings due to less optimization
> > > >>> with the new stack protector on.
> > >>>
> > >>> Patch description and ChangeLog below.
> > >>>
> > >>> In case of high register pressure in PIC mode, address of the stack
> > >>> protector's guard can be spilled on ARM targets as shown in PR85434,
> > >>> thus allowing an attacker to control what the canary would be compared
> > >>> against. ARM does lack stack_protect_set and stack_protect_test insn
> > > >>> patterns, but defining them does not help as the address is expanded
> > >>> regularly and the patterns only deal with the copy and test of the
> > >>> guard with the canary.
> > >>>
> > >>> This problem does not occur for x86 targets because the PIC access and
> > >>> the test can be done in the same instruction. Aarch64 is exempt too
> > > >>> because its PIC access insn patterns are movs of UNSPEC, which prevents
> > > >>> the second access in the epilogue from being CSEd in the cse_local pass
> > > >>> with the first access in the prologue.
> > >>>
> > >>> The approach followed here is to create new "combined" set and test
> > >>> standard pattern names that take the unexpanded guard and do the set or
> > > >>> test. This allows the target to use an opaque pattern (e.g. using UNSPEC)
> > > >>> to hide the individual instructions being generated from the compiler and
> > > >>> split the pattern into generic load, compare and branch instructions
> > > >>> after register allocation, therefore avoiding any spilling. This is
> > > >>> implemented here for the ARM targets. For targets not implementing these new
> > >>> standard pattern names, the existing stack_protect_set and
> > >>> stack_protect_test pattern names are used.
> > >>>
> > >>> To be able to split PIC access after register allocation, the functions
> > >>> had to be augmented to force a new PIC register load and to control
> > >>> which register it loads into. This is because sharing the PIC register
> > >>> between prologue and epilogue could lead to spilling due to CSE again
> > >>> which an attacker could use to control what the canary gets compared
> > >>> against.
> > >>>
> > >>> ChangeLog entries are as follows:
> > >>>
> > >>> *** gcc/ChangeLog ***
> > >>>
> > >>> 2018-10-26  Thomas Preud'homme  <thomas.preudhomme@linaro.org>
> > >>>
> > >>> * target-insns.def (stack_protect_combined_set): Define new standard
> > >>> pattern name.
> > >>> (stack_protect_combined_test): Likewise.
> > >>> * cfgexpand.c (stack_protect_prologue): Try new
> > >>> stack_protect_combined_set pattern first.
> > >>> * function.c (stack_protect_epilogue): Try new
> > >>> stack_protect_combined_test pattern first.
> > >>> * config/arm/arm.c (require_pic_register): Add pic_reg and compute_now
> > >>> parameters to control which register to use as PIC register and force
> > >>> reloading PIC register respectively.  Insert in the stream of insns if
> > >>> possible.
> > >>> (legitimize_pic_address): Expose above new parameters in prototype and
> > >>> adapt recursive calls accordingly.  Use pic_reg if non null instead of
> > >>> cached one.
> > >>> (arm_load_pic_register): Add pic_reg parameter and use it if non null.
> > >>> (arm_legitimize_address): Adapt to new legitimize_pic_address
> > >>> prototype.
> > >>> (thumb_legitimize_address): Likewise.
> > >>> (arm_emit_call_insn): Adapt to require_pic_register prototype change.
> > >>> (arm_expand_prologue): Adapt to arm_load_pic_register prototype change.
> > >>> (thumb1_expand_prologue): Likewise.
> > >>> * config/arm/arm-protos.h (legitimize_pic_address): Adapt to prototype
> > >>> change.
> > >>> (arm_load_pic_register): Likewise.
> > > >>> * config/arm/predicates.md (guard_addr_operand): New predicate.
> > >>> (guard_operand): New predicate.
> > >>> * config/arm/arm.md (movsi expander): Adapt to legitimize_pic_address
> > >>> prototype change.
> > > >>> (builtin_setjmp_receiver expander): Adapt to arm_load_pic_register
> > >>> prototype change.
> > > >>> (stack_protect_combined_set): New expander.
> > >>> (stack_protect_combined_set_insn): New insn_and_split pattern.
> > >>> (stack_protect_set_insn): New insn pattern.
> > >>> (stack_protect_combined_test): New expander.
> > >>> (stack_protect_combined_test_insn): New insn_and_split pattern.
> > >>> (arm_stack_protect_test_insn): New insn pattern.
> > >>> * config/arm/thumb1.md (thumb1_stack_protect_test_insn): New insn pattern.
> > >>> * config/arm/unspecs.md (UNSPEC_SP_SET): New unspec.
> > >>> (UNSPEC_SP_TEST): Likewise.
> > >>> * doc/md.texi (stack_protect_combined_set): Document new standard
> > >>> pattern name.
> > >>> (stack_protect_set): Clarify that the operand for guard's address is
> > >>> legal.
> > >>> (stack_protect_combined_test): Document new standard pattern name.
> > >>> (stack_protect_test): Clarify that the operand for guard's address is
> > >>> legal.
> > >>>
> > >>> *** gcc/testsuite/ChangeLog ***
> > >>>
> > >>> 2018-07-05  Thomas Preud'homme  <thomas.preudhomme@linaro.org>
> > >>>
> > >>> * gcc.target/arm/pr85434.c: New test.
> > >>>
> > >>> Is this ok for trunk?
> > >>>
> > >>> Best regards,
> > >>>
> > >>> Thomas
> > >>> On Thu, 25 Oct 2018 at 15:54, Thomas Preudhomme
> > >>> <thomas.preudhomme@linaro.org> wrote:
> > >>>> Good thing I did, found a missing earlyclobber in the process.
> > >>>> Rerunning all tests again.
> > >>>>
> > >>>> Best regards,
> > >>>>
> > >>>> Thomas
> > >>>> On Wed, 24 Oct 2018 at 10:13, Thomas Preudhomme
> > >>>> <thomas.preudhomme@linaro.org> wrote:
> > >>>>> Please hold on for the reviews, found a small improvement that could
> > >>>>> be done. Am testing it right now, should have something by tonight or
> > >>>>> tomorrow.
> > >>>>>
> > >>>>> Best regards,
> > >>>>>
> > >>>>> Thomas
> > >>>>> On Tue, 23 Oct 2018 at 13:35, Thomas Preudhomme
> > >>>>> <thomas.preudhomme@linaro.org> wrote:
> > >>>>>> [Removing Jeff Law since middle end code hasn't changed]
> > >>>>>>
> > >>>>>> Hi,
> > >>>>>>
> > > >>>>>> Given how memory operands are reloaded even with an X constraint, I've
> > > >>>>>> reworked the patch for the combined set and combined test instructions
> > > >>>>>> to keep the mem out of the match_operand and used an expander to
> > > >>>>>> generate the right instruction pattern. I've also fixed some
> > >>>>>> longstanding issues with the patch when flag_pic is true and with
> > >>>>>> constraints for Thumb-1 that I hadn't noticed before due to using
> > >>>>>> dg-cmp-results in conjunction with test_summary which does not show
> > >>>>>> NA->FAIL (see [1]).
> > >>>>>>
> > >>>>>> All in all, I think the Arm code would do with a fresh review rather
> > >>>>>> than looking at the changes since last posted version. (unchanged)
> > >>>>>> ChangeLog entries are as follows:
> > >>>>>>
> > >>>>>> *** gcc/ChangeLog ***
> > >>>>>>
> > >>>>>> 2018-08-09  Thomas Preud'homme  <thomas.preudhomme@linaro.org>
> > >>>>>>
> > >>>>>>      * target-insns.def (stack_protect_combined_set): Define new standard
> > >>>>>>      pattern name.
> > >>>>>>      (stack_protect_combined_test): Likewise.
> > >>>>>>      * cfgexpand.c (stack_protect_prologue): Try new
> > >>>>>>      stack_protect_combined_set pattern first.
> > >>>>>>      * function.c (stack_protect_epilogue): Try new
> > >>>>>>      stack_protect_combined_test pattern first.
> > >>>>>>      * config/arm/arm.c (require_pic_register): Add pic_reg and compute_now
> > >>>>>>      parameters to control which register to use as PIC register and force
> > >>>>>>      reloading PIC register respectively.  Insert in the stream of insns if
> > >>>>>>      possible.
> > >>>>>>      (legitimize_pic_address): Expose above new parameters in prototype and
> > >>>>>>      adapt recursive calls accordingly.  Use pic_reg if non null instead of
> > >>>>>>      cached one.
> > >>>>>>      (arm_load_pic_register): Add pic_reg parameter and use it if non null.
> > >>>>>>      (arm_legitimize_address): Adapt to new legitimize_pic_address
> > >>>>>>      prototype.
> > >>>>>>      (thumb_legitimize_address): Likewise.
> > >>>>>>      (arm_emit_call_insn): Adapt to require_pic_register prototype change.
> > >>>>>>      (arm_expand_prologue): Adapt to arm_load_pic_register prototype change.
> > >>>>>>      (thumb1_expand_prologue): Likewise.
> > >>>>>>      * config/arm/arm-protos.h (legitimize_pic_address): Adapt to prototype
> > >>>>>>      change.
> > >>>>>>      (arm_load_pic_register): Likewise.
> > >>>>>>      * config/arm/predicated.md (guard_addr_operand): New predicate.
> > >>>>>>      (guard_operand): New predicate.
> > >>>>>>      * config/arm/arm.md (movsi expander): Adapt to legitimize_pic_address
> > >>>>>>      prototype change.
> > >>>>>>      (builtin_setjmp_receiver expander): Adapt to thumb1_expand_prologue
> > >>>>>>      prototype change.
> > >>>>>>      (stack_protect_combined_set): New expander..
> > >>>>>>      (stack_protect_combined_set_insn): New insn_and_split pattern.
> > >>>>>>      (stack_protect_set_insn): New insn pattern.
> > >>>>>>      (stack_protect_combined_test): New expander.
> > >>>>>>      (stack_protect_combined_test_insn): New insn_and_split pattern.
> > >>>>>>      (stack_protect_test_insn): New insn pattern.
> > >>>>>>      * config/arm/unspecs.md (UNSPEC_SP_SET): New unspec.
> > >>>>>>      (UNSPEC_SP_TEST): Likewise.
> > >>>>>>      * doc/md.texi (stack_protect_combined_set): Document new standard
> > >>>>>>      pattern name.
> > >>>>>>      (stack_protect_set): Clarify that the operand for guard's address is
> > >>>>>>      legal.
> > >>>>>>      (stack_protect_combined_test): Document new standard pattern name.
> > >>>>>>      (stack_protect_test): Clarify that the operand for guard's address is
> > >>>>>>      legal.
> > >>>>>>
> > >>>>>> *** gcc/testsuite/ChangeLog ***
> > >>>>>>
> > >>>>>> 2018-07-05  Thomas Preud'homme  <thomas.preudhomme@linaro.org>
> > >>>>>>
> > >>>>>>      * gcc.target/arm/pr85434.c: New test.
> > >>>>>>
> > >>>>>> Testing: Bootstrap and regression testing for Arm, Thumb-1 and Thumb-2
> > > >>>>>> with (i) default flags, (ii) an extra -fstack-protector-all and (iii)
> > > >>>>>> -fPIC -fstack-protector-all. A glibc build and testsuite run was also
> > >>>>>> performed for Arm and Thumb-2. Default flags show no regression and
> > >>>>>> the other runs have some expected scan-assembler failing (due to stack
> > >>>>>> protector or fPIC code sequence), as well as guality fail (due to less
> > >>>>>> optimized code with the new stack protector code) and some execution
> > >>>>>> failures in sibcall-9 and sibcall-10 under -fPIC -fstack-protector-all
> > >>>>>> due to the PIC sequence for the global variable making the frame
> > >>>>>> layout different for the 2 functions (these become PASS if making the
> > >>>>>> global variable static).
> > >>>>>>
> > >>>>>> Is this ok for trunk?
> > >>>>>>
> > >>>>>> Best regards,
> > >>>>>>
> > >>>>>> Thomas
> > >>>>>>
> > >>>>>> [1] https://gcc.gnu.org/ml/gcc-patches/2018-10/msg01412.html
> > >>>>>>
> > >>>>>>
> > >>>>>> On Tue, 25 Sep 2018 at 17:10, Kyrill Tkachov
> > >>>>>> <kyrylo.tkachov@foss.arm.com> wrote:
> > >>>>>>> Hi Thomas,
> > >>>>>>>
> > >>>>>>> On 29/08/18 10:51, Thomas Preudhomme wrote:
> > >>>>>>>> Resend hopefully without HTML this time.
> > >>>>>>>>
> > >>>>>>>> On Wed, 29 Aug 2018 at 10:49, Thomas Preudhomme
> > >>>>>>>> <thomas.preudhomme@linaro.org> wrote:
> > >>>>>>>>> Hi,
> > >>>>>>>>>
> > >>>>>>>>> I've reworked the patch fixing PR85434 (spilling of stack protector guard's address on ARM) to address the testsuite regression on powerpc and x86 as well as glibc testsuite regression on ARM. Issues were due to unconditionally attempting to generate the new patterns. The code now tests if there is a pattern for them for the target before generating them. In the ARM side of the patch, I've also added a more specific predicate for the new patterns. The new patch is found below.
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>> In case of high register pressure in PIC mode, address of the stack
> > >>>>>>>>> protector's guard can be spilled on ARM targets as shown in PR85434,
> > >>>>>>>>> thus allowing an attacker to control what the canary would be compared
> > >>>>>>>>> against. ARM does lack stack_protect_set and stack_protect_test insn
> > >>>>>>>>> patterns, defining them does not help as the address is expanded
> > >>>>>>>>> regularly and the patterns only deal with the copy and test of the
> > >>>>>>>>> guard with the canary.
> > >>>>>>>>>
> > >>>>>>>>> This problem does not occur for x86 targets because the PIC access and
> > >>>>>>>>> the test can be done in the same instruction. Aarch64 is exempt too
> > >>>>>>>>> because PIC access insn pattern are mov of UNSPEC which prevents it from
> > >>>>>>>>> the second access in the epilogue being CSEd in cse_local pass with the
> > >>>>>>>>> first access in the prologue.
> > >>>>>>>>>
> > >>>>>>>>> The approach followed here is to create new "combined" set and test
> > >>>>>>>>> standard pattern names that take the unexpanded guard and do the set or
> > >>>>>>>>> test. This allows the target to use an opaque pattern (eg. using UNSPEC)
> > >>>>>>>>> to hide the individual instructions being generated to the compiler and
> > >>>>>>>>> split the pattern into generic load, compare and branch instruction
> > >>>>>>>>> after register allocator, therefore avoiding any spilling. This is here
> > >>>>>>>>> implemented for the ARM targets. For targets not implementing these new
> > >>>>>>>>> standard pattern names, the existing stack_protect_set and
> > >>>>>>>>> stack_protect_test pattern names are used.
> > >>>>>>>>>
> > >>>>>>>>> To be able to split PIC access after register allocation, the functions
> > >>>>>>>>> had to be augmented to force a new PIC register load and to control
> > >>>>>>>>> which register it loads into. This is because sharing the PIC register
> > >>>>>>>>> between prologue and epilogue could lead to spilling due to CSE again
> > >>>>>>>>> which an attacker could use to control what the canary gets compared
> > >>>>>>>>> against.
> > >>>>>>>>>
> > >>>>>>>>> ChangeLog entries are as follows:
> > >>>>>>>>>
> > >>>>>>>>> *** gcc/ChangeLog ***
> > >>>>>>>>>
> > >>>>>>>>> 2018-08-09  Thomas Preud'homme  <thomas.preudhomme@linaro.org>
> > >>>>>>>>>
> > >>>>>>>>>       * target-insns.def (stack_protect_combined_set): Define new standard
> > >>>>>>>>>       pattern name.
> > >>>>>>>>>       (stack_protect_combined_test): Likewise.
> > >>>>>>>>>       * cfgexpand.c (stack_protect_prologue): Try new
> > >>>>>>>>>       stack_protect_combined_set pattern first.
> > >>>>>>>>>       * function.c (stack_protect_epilogue): Try new
> > >>>>>>>>>       stack_protect_combined_test pattern first.
> > >>>>>>>>>       * config/arm/arm.c (require_pic_register): Add pic_reg and compute_now
> > >>>>>>>>>       parameters to control which register to use as PIC register and force
> > >>>>>>>>>       reloading PIC register respectively.  Insert in the stream of insns if
> > >>>>>>>>>       possible.
> > >>>>>>>>>       (legitimize_pic_address): Expose above new parameters in prototype and
> > >>>>>>>>>       adapt recursive calls accordingly.
> > >>>>>>>>>       (arm_legitimize_address): Adapt to new legitimize_pic_address
> > >>>>>>>>>       prototype.
> > >>>>>>>>>       (thumb_legitimize_address): Likewise.
> > >>>>>>>>>       (arm_emit_call_insn): Adapt to new require_pic_register prototype.
> > >>>>>>>>>       * config/arm/arm-protos.h (legitimize_pic_address): Adapt to prototype
> > >>>>>>>>>       change.
> > >>>>>>>>>       * config/arm/predicated.md (guard_operand): New predicate.
> > >>>>>>> Typo, predicates.md is the filename.
> > >>>>>>>
> > >>>>>>> Looks ok to me otherwise.
> > >>>>>>> Thank you for your patience.
> > >>>>>>>
> > >>>>>>> Kyrill
> > >>>>>>>
> > >>>>>>>>>       * config/arm/arm.md (movsi expander): Adapt to legitimize_pic_address
> > >>>>>>>>>       prototype change.
> > >>>>>>>>>       (stack_protect_combined_set): New insn_and_split pattern.
> > >>>>>>>>>       (stack_protect_set): New insn pattern.
> > >>>>>>>>>       (stack_protect_combined_test): New insn_and_split pattern.
> > >>>>>>>>>       (stack_protect_test): New insn pattern.
> > >>>>>>>>>       * config/arm/unspecs.md (UNSPEC_SP_SET): New unspec.
> > >>>>>>>>>       (UNSPEC_SP_TEST): Likewise.
> > >>>>>>>>>       * doc/md.texi (stack_protect_combined_set): Document new standard
> > >>>>>>>>>       pattern name.
> > >>>>>>>>>       (stack_protect_set): Clarify that the operand for guard's address is
> > >>>>>>>>>       legal.
> > >>>>>>>>>       (stack_protect_combined_test): Document new standard pattern name.
> > >>>>>>>>>       (stack_protect_test): Clarify that the operand for guard's address is
> > >>>>>>>>>       legal.
> > >>>>>>>>>
> > >>>>>>>>> *** gcc/testsuite/ChangeLog ***
> > >>>>>>>>>
> > >>>>>>>>> 2018-07-05  Thomas Preud'homme  <thomas.preudhomme@linaro.org>
> > >>>>>>>>>
> > >>>>>>>>>       * gcc.target/arm/pr85434.c: New test.
> > >>>>>>>>> Testing:
> > >>>>>>>>>
> > >>>>>>>>> native x86_64: bootstrap + testsuite -> no regression, can see failures with previous version of patch but not with new version
> > >>>>>>>>> native powerpc64: bootstrap + testsuite -> no regression, can see failures from pr86834 with previous version of patch but not with new version
> > >>>>>>>>> cross ARM Linux: build + testsuite -> no regression
> > >>>>>>>>> native ARM Thumb-2: bootstrap + testsuite + glibc build + glibc test -> no regression
> > >>>>>>>>> native ARM Arm: bootstrap + testsuite + glibc build + glibc test -> no regression
> > >>>>>>>>> Aarch64: bootstrap + testsuite + glibc build + glibc test-> no regression
> > >>>>>>>>>
> > >>>>>>>>> Is this ok for trunk?
> > >>>>>>>>>
> > >>>>>>>>> Best regards,
> > >>>>>>>>>
> > >>>>>>>>> Thomas
> >
> >
> > @@ -21998,7 +22039,7 @@ arm_expand_prologue (void)
> >         mask &= THUMB2_WORK_REGS;
> >         if (!IS_NESTED (func_type))
> >       mask |= (1 << IP_REGNUM);
> > -      arm_load_pic_register (mask);
> > +      arm_load_pic_register (mask, 0);
> >
> >
> >
> > Please use NULL_RTX rather than 0 here and in the other occurrences in the patch.
> > At a glance the changes look ok, but I'll have a deeper look later.
> >
> > Thanks,
> > Kyrill

[-- Attachment #2: fix_pr85434_prevent_spilling_stack_protector_guard_address.patch --]
[-- Type: text/x-patch, Size: 35699 bytes --]

From ab3b83022b775e1e05a03f20c743f4dedd9e536c Mon Sep 17 00:00:00 2001
From: Thomas Preud'homme <thomas.preudhomme@linaro.org>
Date: Tue, 8 May 2018 15:47:05 +0100
Subject: [PATCH] PR85434: Prevent spilling of stack protector guard's address
 on ARM

Under high register pressure in PIC mode, the address of the stack
protector's guard can be spilled on ARM targets as shown in PR85434,
thus allowing an attacker to control what the canary would be compared
against. ARM does lack stack_protect_set and stack_protect_test insn
patterns, but defining them does not help as the address is expanded
regularly and the patterns only deal with the copy and test of the
guard with the canary.

This problem does not occur for x86 targets because the PIC access and
the test can be done in the same instruction. AArch64 is exempt too
because its PIC access insn patterns are moves of an UNSPEC, which
prevents the second access in the epilogue from being CSEd in the
cse_local pass with the first access in the prologue.

The approach followed here is to create new "combined" set and test
standard pattern names that take the unexpanded guard and do the set or
test. This allows the target to use an opaque pattern (eg. using UNSPEC)
to hide the individual instructions being generated from the compiler
and to split the pattern into generic load, compare and branch
instructions after register allocation, therefore avoiding any spilling.
This is implemented here for the ARM targets. For targets not
implementing these new standard pattern names, the existing
stack_protect_set and stack_protect_test pattern names are used.
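
As a rough illustration of the fallback order described above, here is a
minimal sketch in plain C. It is not the GCC code: gen_fn, guard_path,
choose_set_path and the two stub generators are hypothetical names used
only to model "try the combined pattern first, then stack_protect_set,
then a plain move".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an optional target pattern generator: returns
   true if it produced an insn, false if the target declined.  A NULL
   pointer models a target that does not define the pattern at all.  */
typedef bool (*gen_fn) (void);

enum guard_path { PATH_COMBINED, PATH_SEPARATE, PATH_PLAIN_MOVE };

/* Choose how the canary is set: the combined pattern is tried first and
   only when a guard declaration is available; otherwise fall back to the
   existing stack_protect_set pattern, and finally to a plain move.  */
static enum guard_path
choose_set_path (gen_fn combined_set, gen_fn separate_set,
		 bool have_guard_decl)
{
  if (combined_set != NULL && have_guard_decl && combined_set ())
    return PATH_COMBINED;
  if (separate_set != NULL && separate_set ())
    return PATH_SEPARATE;
  return PATH_PLAIN_MOVE;
}

/* Stub generators used by the usage example below.  */
static bool gen_ok (void) { return true; }
static bool gen_fail (void) { return false; }

/* Usage example: the three possible outcomes.  */
static void
self_check (void)
{
  assert (choose_set_path (gen_ok, gen_ok, true) == PATH_COMBINED);
  assert (choose_set_path (NULL, gen_ok, true) == PATH_SEPARATE);
  assert (choose_set_path (gen_fail, NULL, true) == PATH_PLAIN_MOVE);
}
```

In this model a target that leaves the combined generator undefined keeps
its current behaviour, which is the property the fallback is meant to
preserve.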

To be able to split PIC access after register allocation, the PIC
legitimization functions (require_pic_register, legitimize_pic_address
and arm_load_pic_register) had to be augmented to force a new PIC
register load and to control which register it loads into. This is
because sharing the PIC register between prologue and epilogue could
again lead to spilling due to CSE, which an attacker could use to
control what the canary gets compared against.
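
The contract between the two new parameters can be sketched with a toy
model in plain C. This is only an illustration under simplifying
assumptions: registers are plain integers, fn_state, NO_REG and
require_pic are hypothetical names, and the fixed arm_pic_register case
handled by the real require_pic_register is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: a register is just a number, -1 means "none".  */
#define NO_REG (-1)

struct fn_state
{
  int cached_pic_reg;	/* Models cfun->machine->pic_reg.  */
  int loads_emitted;	/* Number of PIC base loads emitted so far.  */
  int next_pseudo;	/* Next fresh pseudo register number.  */
};

/* Return the register holding the PIC base for the current access.
   A caller-chosen PIC_REG is only accepted together with COMPUTE_NOW,
   in which case a fresh load is always emitted; otherwise the cached
   register is reused once it has been loaded.  */
static int
require_pic (struct fn_state *st, int pic_reg, bool compute_now)
{
  /* Same invariant as the gcc_assert added by the patch.  */
  assert (compute_now == (pic_reg != NO_REG));

  if (st->cached_pic_reg == NO_REG || compute_now)
    {
      if (pic_reg == NO_REG)
	pic_reg = st->next_pseudo++;
      if (st->cached_pic_reg == NO_REG)
	st->cached_pic_reg = pic_reg;
      st->loads_emitted++;
      return pic_reg;
    }
  return st->cached_pic_reg;
}
```

In this model an ordinary access reuses the cached register, while a
stack protector access passes its own scratch register and always gets a
fresh load, so nothing computed in the prologue is shared with the
epilogue.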

ChangeLog entries are as follows:

*** gcc/ChangeLog ***

2018-10-26  Thomas Preud'homme  <thomas.preudhomme@linaro.org>

	* target-insns.def (stack_protect_combined_set): Define new standard
	pattern name.
	(stack_protect_combined_test): Likewise.
	* cfgexpand.c (stack_protect_prologue): Try new
	stack_protect_combined_set pattern first.
	* function.c (stack_protect_epilogue): Try new
	stack_protect_combined_test pattern first.
	* config/arm/arm.c (require_pic_register): Add pic_reg and compute_now
	parameters to control which register to use as PIC register and force
	reloading PIC register respectively.  Insert in the stream of insns if
	possible.
	(legitimize_pic_address): Expose above new parameters in prototype and
	adapt recursive calls accordingly.  Use pic_reg if non null instead of
	cached one.
	(arm_load_pic_register): Add pic_reg parameter and use it if non null.
	(arm_legitimize_address): Adapt to new legitimize_pic_address
	prototype.
	(thumb_legitimize_address): Likewise.
	(arm_emit_call_insn): Adapt to require_pic_register prototype change.
	(arm_expand_prologue): Adapt to arm_load_pic_register prototype change.
	(thumb1_expand_prologue): Likewise.
	* config/arm/arm-protos.h (legitimize_pic_address): Adapt to prototype
	change.
	(arm_load_pic_register): Likewise.
	* config/arm/predicates.md (guard_addr_operand): New predicate.
	(guard_operand): New predicate.
	* config/arm/arm.md (movsi expander): Adapt to legitimize_pic_address
	prototype change.
	(builtin_setjmp_receiver expander): Adapt to thumb1_expand_prologue
	prototype change.
	(stack_protect_combined_set): New expander.
	(stack_protect_combined_set_insn): New insn_and_split pattern.
	(stack_protect_set_insn): New insn pattern.
	(stack_protect_combined_test): New expander.
	(stack_protect_combined_test_insn): New insn_and_split pattern.
	(arm_stack_protect_test_insn): New insn pattern.
	* config/arm/thumb1.md (thumb1_stack_protect_test_insn): New insn pattern.
	* config/arm/unspecs.md (UNSPEC_SP_SET): New unspec.
	(UNSPEC_SP_TEST): Likewise.
	* doc/md.texi (stack_protect_combined_set): Document new standard
	pattern name.
	(stack_protect_set): Clarify that the operand for guard's address is
	legal.
	(stack_protect_combined_test): Document new standard pattern name.
	(stack_protect_test): Clarify that the operand for guard's address is
	legal.

*** gcc/testsuite/ChangeLog ***

2018-07-05  Thomas Preud'homme  <thomas.preudhomme@linaro.org>

	* gcc.target/arm/pr85434.c: New test.

Testing: Bootstrapped on ARM in both Arm and Thumb-2 mode as well as on
Aarch64. The testsuite shows no regression on these 3 variants either,
both with default flags and with -fstack-protector-all.

Is this ok for trunk? If yes, would this be acceptable as a backport to
GCC 6, 7 and 8 provided that no regression is found?

Best regards,

Thomas

Change-Id: I993343e3063fb570af706624e08b475732a5ec57
---
 gcc/cfgexpand.c                        |  17 +++
 gcc/config/arm/arm-protos.h            |   4 +-
 gcc/config/arm/arm.c                   |  87 ++++++++---
 gcc/config/arm/arm.md                  | 162 +++++++++++++++++++-
 gcc/config/arm/predicates.md           |  17 +++
 gcc/config/arm/thumb1.md               |  13 ++
 gcc/config/arm/unspecs.md              |   3 +
 gcc/doc/md.texi                        |  55 ++++++-
 gcc/function.c                         |  32 +++-
 gcc/target-insns.def                   |   2 +
 gcc/testsuite/gcc.target/arm/pr85434.c | 200 +++++++++++++++++++++++++
 11 files changed, 549 insertions(+), 43 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/arm/pr85434.c

diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
index 35ca276e4ad..c8d0374f8ae 100644
--- a/gcc/cfgexpand.c
+++ b/gcc/cfgexpand.c
@@ -6131,6 +6131,23 @@ stack_protect_prologue (void)
   rtx x, y;
 
   x = expand_normal (crtl->stack_protect_guard);
+
+  if (targetm.have_stack_protect_combined_set () && guard_decl)
+    {
+      gcc_assert (DECL_P (guard_decl));
+      y = DECL_RTL (guard_decl);
+
+      /* Allow the target to compute address of Y and copy it to X without
+	 leaking Y into a register.  This combined address + copy pattern
+	 allows the target to prevent spilling of any intermediate results by
+	 splitting it after register allocator.  */
+      if (rtx_insn *insn = targetm.gen_stack_protect_combined_set (x, y))
+	{
+	  emit_insn (insn);
+	  return;
+	}
+    }
+
   if (guard_decl)
     y = expand_normal (guard_decl);
   else
diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 0dfb3ac59a6..f508bc5a455 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -28,7 +28,7 @@ extern enum unwind_info_type arm_except_unwind_info (struct gcc_options *);
 extern int use_return_insn (int, rtx);
 extern bool use_simple_return_p (void);
 extern enum reg_class arm_regno_class (int);
-extern void arm_load_pic_register (unsigned long);
+extern void arm_load_pic_register (unsigned long, rtx);
 extern int arm_volatile_func (void);
 extern void arm_expand_prologue (void);
 extern void arm_expand_epilogue (bool);
@@ -69,7 +69,7 @@ extern int const_ok_for_dimode_op (HOST_WIDE_INT, enum rtx_code);
 extern int arm_split_constant (RTX_CODE, machine_mode, rtx,
 			       HOST_WIDE_INT, rtx, rtx, int);
 extern int legitimate_pic_operand_p (rtx);
-extern rtx legitimize_pic_address (rtx, machine_mode, rtx);
+extern rtx legitimize_pic_address (rtx, machine_mode, rtx, rtx, bool);
 extern rtx legitimize_tls_address (rtx, rtx);
 extern bool arm_legitimate_address_p (machine_mode, rtx, bool);
 extern int arm_legitimate_address_outer_p (machine_mode, rtx, RTX_CODE, int);
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 8810df53aa3..96b8150d34c 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -7371,21 +7371,34 @@ legitimate_pic_operand_p (rtx x)
   return 1;
 }
 
-/* Record that the current function needs a PIC register.  Initialize
-   cfun->machine->pic_reg if we have not already done so.  */
+/* Record that the current function needs a PIC register.  If PIC_REG is null,
+   a new pseudo is allocated as PIC register, otherwise PIC_REG is used.  In
+   both cases cfun->machine->pic_reg is initialized if we have not already done
+   so.  COMPUTE_NOW decides whether and where to set the PIC register.  If true,
+   PIC register is reloaded in the current position of the instruction stream
+   regardless of whether it was loaded before.  Otherwise, it is only loaded
+   if not already done so (crtl->uses_pic_offset_table is null).  Note that
+   nonnull PIC_REG is only supported iff COMPUTE_NOW is true and null PIC_REG
+   is only supported iff COMPUTE_NOW is false.  */
 
 static void
-require_pic_register (void)
+require_pic_register (rtx pic_reg, bool compute_now)
 {
+  gcc_assert (compute_now == (pic_reg != NULL_RTX));
+
   /* A lot of the logic here is made obscure by the fact that this
      routine gets called as part of the rtx cost estimation process.
      We don't want those calls to affect any assumptions about the real
      function; and further, we can't call entry_of_function() until we
      start the real expansion process.  */
-  if (!crtl->uses_pic_offset_table)
+  if (!crtl->uses_pic_offset_table || compute_now)
     {
-      gcc_assert (can_create_pseudo_p ());
+      gcc_assert (can_create_pseudo_p ()
+		  || (pic_reg != NULL_RTX
+		      && REG_P (pic_reg)
+		      && GET_MODE (pic_reg) == Pmode));
       if (arm_pic_register != INVALID_REGNUM
+	  && !compute_now
 	  && !(TARGET_THUMB1 && arm_pic_register > LAST_LO_REGNUM))
 	{
 	  if (!cfun->machine->pic_reg)
@@ -7401,8 +7414,10 @@ require_pic_register (void)
 	{
 	  rtx_insn *seq, *insn;
 
+	  if (pic_reg == NULL_RTX)
+	    pic_reg = gen_reg_rtx (Pmode);
 	  if (!cfun->machine->pic_reg)
-	    cfun->machine->pic_reg = gen_reg_rtx (Pmode);
+	    cfun->machine->pic_reg = pic_reg;
 
 	  /* Play games to avoid marking the function as needing pic
 	     if we are being called as part of the cost-estimation
@@ -7413,11 +7428,12 @@ require_pic_register (void)
 	      start_sequence ();
 
 	      if (TARGET_THUMB1 && arm_pic_register != INVALID_REGNUM
-		  && arm_pic_register > LAST_LO_REGNUM)
+		  && arm_pic_register > LAST_LO_REGNUM
+		  && !compute_now)
 		emit_move_insn (cfun->machine->pic_reg,
 				gen_rtx_REG (Pmode, arm_pic_register));
 	      else
-		arm_load_pic_register (0UL);
+		arm_load_pic_register (0UL, pic_reg);
 
 	      seq = get_insns ();
 	      end_sequence ();
@@ -7430,16 +7446,33 @@ require_pic_register (void)
 	         we can't yet emit instructions directly in the final
 		 insn stream.  Queue the insns on the entry edge, they will
 		 be committed after everything else is expanded.  */
-	      insert_insn_on_edge (seq,
-				   single_succ_edge (ENTRY_BLOCK_PTR_FOR_FN (cfun)));
+	      if (currently_expanding_to_rtl)
+		insert_insn_on_edge (seq,
+				     single_succ_edge
+				     (ENTRY_BLOCK_PTR_FOR_FN (cfun)));
+	      else
+		emit_insn (seq);
 	    }
 	}
     }
 }
 
+/* Legitimize PIC load to ORIG into REG.  If REG is NULL, a new pseudo is
+   created to hold the result of the load.  If not NULL, PIC_REG indicates
+   which register to use as PIC register, otherwise it is decided by register
+   allocator.  COMPUTE_NOW forces the PIC register to be loaded at the current
+   location in the instruction stream, regardless of whether it was loaded
+   previously.  Note that nonnull PIC_REG is only supported iff COMPUTE_NOW is
+   true and null PIC_REG is only supported iff COMPUTE_NOW is false.
+
+   Returns the register REG into which the PIC load is performed.  */
+
 rtx
-legitimize_pic_address (rtx orig, machine_mode mode, rtx reg)
+legitimize_pic_address (rtx orig, machine_mode mode, rtx reg, rtx pic_reg,
+			bool compute_now)
 {
+  gcc_assert (compute_now == (pic_reg != NULL_RTX));
+
   if (GET_CODE (orig) == SYMBOL_REF
       || GET_CODE (orig) == LABEL_REF)
     {
@@ -7472,9 +7505,12 @@ legitimize_pic_address (rtx orig, machine_mode mode, rtx reg)
 	  rtx mem;
 
 	  /* If this function doesn't have a pic register, create one now.  */
-	  require_pic_register ();
+	  require_pic_register (pic_reg, compute_now);
+
+	  if (pic_reg == NULL_RTX)
+	    pic_reg = cfun->machine->pic_reg;
 
-	  pat = gen_calculate_pic_address (reg, cfun->machine->pic_reg, orig);
+	  pat = gen_calculate_pic_address (reg, pic_reg, orig);
 
 	  /* Make the MEM as close to a constant as possible.  */
 	  mem = SET_SRC (pat);
@@ -7523,9 +7559,11 @@ legitimize_pic_address (rtx orig, machine_mode mode, rtx reg)
 
       gcc_assert (GET_CODE (XEXP (orig, 0)) == PLUS);
 
-      base = legitimize_pic_address (XEXP (XEXP (orig, 0), 0), Pmode, reg);
+      base = legitimize_pic_address (XEXP (XEXP (orig, 0), 0), Pmode, reg,
+				     pic_reg, compute_now);
       offset = legitimize_pic_address (XEXP (XEXP (orig, 0), 1), Pmode,
-				       base == reg ? 0 : reg);
+				       base == reg ? 0 : reg, pic_reg,
+				       compute_now);
 
       if (CONST_INT_P (offset))
 	{
@@ -7625,16 +7663,17 @@ static GTY(()) int pic_labelno;
    low register.  */
 
 void
-arm_load_pic_register (unsigned long saved_regs ATTRIBUTE_UNUSED)
+arm_load_pic_register (unsigned long saved_regs ATTRIBUTE_UNUSED, rtx pic_reg)
 {
-  rtx l1, labelno, pic_tmp, pic_rtx, pic_reg;
+  rtx l1, labelno, pic_tmp, pic_rtx;
 
   if (crtl->uses_pic_offset_table == 0 || TARGET_SINGLE_PIC_BASE)
     return;
 
   gcc_assert (flag_pic);
 
-  pic_reg = cfun->machine->pic_reg;
+  if (pic_reg == NULL_RTX)
+    pic_reg = cfun->machine->pic_reg;
   if (TARGET_VXWORKS_RTP)
     {
       pic_rtx = gen_rtx_SYMBOL_REF (Pmode, VXWORKS_GOTT_BASE);
@@ -8710,7 +8749,8 @@ arm_legitimize_address (rtx x, rtx orig_x, machine_mode mode)
     {
       /* We need to find and carefully transform any SYMBOL and LABEL
 	 references; so go back to the original address expression.  */
-      rtx new_x = legitimize_pic_address (orig_x, mode, NULL_RTX);
+      rtx new_x = legitimize_pic_address (orig_x, mode, NULL_RTX, NULL_RTX,
+					  false /*compute_now*/);
 
       if (new_x != orig_x)
 	x = new_x;
@@ -8778,7 +8818,8 @@ thumb_legitimize_address (rtx x, rtx orig_x, machine_mode mode)
     {
       /* We need to find and carefully transform any SYMBOL and LABEL
 	 references; so go back to the original address expression.  */
-      rtx new_x = legitimize_pic_address (orig_x, mode, NULL_RTX);
+      rtx new_x = legitimize_pic_address (orig_x, mode, NULL_RTX, NULL_RTX,
+					  false /*compute_now*/);
 
       if (new_x != orig_x)
 	x = new_x;
@@ -18066,7 +18107,7 @@ arm_emit_call_insn (rtx pat, rtx addr, bool sibcall)
 	  ? !targetm.binds_local_p (SYMBOL_REF_DECL (addr))
 	  : !SYMBOL_REF_LOCAL_P (addr)))
     {
-      require_pic_register ();
+      require_pic_register (NULL_RTX, false /*compute_now*/);
       use_reg (&CALL_INSN_FUNCTION_USAGE (insn), cfun->machine->pic_reg);
     }
 
@@ -21998,7 +22039,7 @@ arm_expand_prologue (void)
       mask &= THUMB2_WORK_REGS;
       if (!IS_NESTED (func_type))
 	mask |= (1 << IP_REGNUM);
-      arm_load_pic_register (mask);
+      arm_load_pic_register (mask, NULL_RTX);
     }
 
   /* If we are profiling, make sure no instructions are scheduled before
@@ -25229,7 +25270,7 @@ thumb1_expand_prologue (void)
   /* Load the pic register before setting the frame pointer,
      so we can use r7 as a temporary work register.  */
   if (flag_pic && arm_pic_register != INVALID_REGNUM)
-    arm_load_pic_register (live_regs_mask);
+    arm_load_pic_register (live_regs_mask, NULL_RTX);
 
   if (!frame_pointer_needed && CALLER_INTERWORKING_SLOT_SIZE > 0)
     emit_move_insn (gen_rtx_REG (Pmode, ARM_HARD_FRAME_POINTER_REGNUM),
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 270b8e454b3..1f702f81fd1 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -6021,7 +6021,8 @@
       operands[1] = legitimize_pic_address (operands[1], SImode,
 					    (!can_create_pseudo_p ()
 					     ? operands[0]
-					     : 0));
+					     : NULL_RTX), NULL_RTX,
+					    false /*compute_now*/);
   }
   "
 )
@@ -6309,7 +6310,7 @@
   /* r3 is clobbered by set/longjmp, so we can use it as a scratch
      register.  */
   if (arm_pic_register != INVALID_REGNUM)
-    arm_load_pic_register (1UL << 3);
+    arm_load_pic_register (1UL << 3, NULL_RTX);
   DONE;
 }")
 
@@ -8634,6 +8635,163 @@
    (set_attr "conds" "clob")]
 )
 
+;; Named patterns for stack smashing protection.
+(define_expand "stack_protect_combined_set"
+  [(parallel
+     [(set (match_operand:SI 0 "memory_operand" "")
+	   (unspec:SI [(match_operand:SI 1 "guard_operand" "")]
+		      UNSPEC_SP_SET))
+      (clobber (match_scratch:SI 2 ""))
+      (clobber (match_scratch:SI 3 ""))])]
+  ""
+  ""
+)
+
+;; Use a separate insn from the above expand to be able to have the mem outside
+;; the operand #1 when register allocation comes. This is needed to prevent LRA
+;; from trying to reload the guard since we need to control how PIC access is
+;; done in the -fpic/-fPIC case (see COMPUTE_NOW parameter when calling
+;; legitimize_pic_address ()).
+(define_insn_and_split "*stack_protect_combined_set_insn"
+  [(set (match_operand:SI 0 "memory_operand" "=m,m")
+	(unspec:SI [(mem:SI (match_operand:SI 1 "guard_addr_operand" "X,X"))]
+		   UNSPEC_SP_SET))
+   (clobber (match_scratch:SI 2 "=&l,&r"))
+   (clobber (match_scratch:SI 3 "=&l,&r"))]
+  ""
+  "#"
+  "reload_completed"
+  [(parallel [(set (match_dup 0) (unspec:SI [(mem:SI (match_dup 2))]
+					    UNSPEC_SP_SET))
+	      (clobber (match_dup 2))])]
+  "
+{
+  if (flag_pic)
+    {
+      /* Forces recomputing of GOT base now.  */
+      legitimize_pic_address (operands[1], SImode, operands[2], operands[3],
+			      true /*compute_now*/);
+    }
+  else
+    {
+      if (address_operand (operands[1], SImode))
+	operands[2] = operands[1];
+      else
+	{
+	  rtx mem = XEXP (force_const_mem (SImode, operands[1]), 0);
+	  emit_move_insn (operands[2], mem);
+	}
+    }
+}"
+  [(set_attr "arch" "t1,32")]
+)
+
+(define_insn "*stack_protect_set_insn"
+  [(set (match_operand:SI 0 "memory_operand" "=m,m")
+	(unspec:SI [(mem:SI (match_operand:SI 1 "register_operand" "+&l,&r"))]
+	 UNSPEC_SP_SET))
+   (clobber (match_dup 1))]
+  ""
+  "@
+   ldr\\t%1, [%1]\;str\\t%1, %0\;movs\t%1,#0
+   ldr\\t%1, [%1]\;str\\t%1, %0\;mov\t%1,#0"
+  [(set_attr "length" "8,12")
+   (set_attr "conds" "clob,nocond")
+   (set_attr "type" "multiple")
+   (set_attr "arch" "t1,32")]
+)
+
+(define_expand "stack_protect_combined_test"
+  [(parallel
+     [(set (pc)
+	   (if_then_else
+		(eq (match_operand:SI 0 "memory_operand" "")
+		    (unspec:SI [(match_operand:SI 1 "guard_operand" "")]
+			       UNSPEC_SP_TEST))
+		(label_ref (match_operand 2))
+		(pc)))
+      (clobber (match_scratch:SI 3 ""))
+      (clobber (match_scratch:SI 4 ""))
+      (clobber (reg:CC CC_REGNUM))])]
+  ""
+  ""
+)
+
+;; Use a separate insn from the above expand to be able to have the mem outside
+;; the operand #1 when register allocation comes. This is needed to prevent LRA
+;; from trying to reload the guard since we need to control how PIC access is
+;; done in the -fpic/-fPIC case (see COMPUTE_NOW parameter when calling
+;; legitimize_pic_address ()).
+(define_insn_and_split "*stack_protect_combined_test_insn"
+  [(set (pc)
+	(if_then_else
+		(eq (match_operand:SI 0 "memory_operand" "m,m")
+		    (unspec:SI [(mem:SI (match_operand:SI 1 "guard_addr_operand" "X,X"))]
+			       UNSPEC_SP_TEST))
+		(label_ref (match_operand 2))
+		(pc)))
+   (clobber (match_scratch:SI 3 "=&l,&r"))
+   (clobber (match_scratch:SI 4 "=&l,&r"))
+   (clobber (reg:CC CC_REGNUM))]
+  ""
+  "#"
+  "reload_completed"
+  [(const_int 0)]
+{
+  rtx eq;
+
+  if (flag_pic)
+    {
+      /* Forces recomputing of GOT base now.  */
+      legitimize_pic_address (operands[1], SImode, operands[3], operands[4],
+			      true /*compute_now*/);
+    }
+  else
+    {
+      if (address_operand (operands[1], SImode))
+	operands[3] = operands[1];
+      else
+	{
+	  rtx mem = XEXP (force_const_mem (SImode, operands[1]), 0);
+	  emit_move_insn (operands[3], mem);
+	}
+    }
+  if (TARGET_32BIT)
+    {
+      emit_insn (gen_arm_stack_protect_test_insn (operands[4], operands[0],
+						  operands[3]));
+      rtx cc_reg = gen_rtx_REG (CC_Zmode, CC_REGNUM);
+      eq = gen_rtx_EQ (CC_Zmode, cc_reg, const0_rtx);
+      emit_jump_insn (gen_arm_cond_branch (operands[2], eq, cc_reg));
+    }
+  else
+    {
+      emit_insn (gen_thumb1_stack_protect_test_insn (operands[4], operands[0],
+						     operands[3]));
+      eq = gen_rtx_EQ (VOIDmode, operands[4], const0_rtx);
+      emit_jump_insn (gen_cbranchsi4 (eq, operands[4], const0_rtx,
+				      operands[2]));
+    }
+  DONE;
+}
+  [(set_attr "arch" "t1,32")]
+)
+
+(define_insn "arm_stack_protect_test_insn"
+  [(set (reg:CC_Z CC_REGNUM)
+	(compare:CC_Z (unspec:SI [(match_operand:SI 1 "memory_operand" "m,m")
+				  (mem:SI (match_operand:SI 2 "register_operand" "+l,r"))]
+				 UNSPEC_SP_TEST)
+		      (const_int 0)))
+   (clobber (match_operand:SI 0 "register_operand" "=&l,&r"))
+   (clobber (match_dup 2))]
+  "TARGET_32BIT"
+  "ldr\t%0, [%2]\;ldr\t%2, %1\;eors\t%0, %2, %0"
+  [(set_attr "length" "8,12")
+   (set_attr "type" "multiple")
+   (set_attr "arch" "t,32")]
+)
+
 (define_expand "casesi"
   [(match_operand:SI 0 "s_register_operand" "")	; index to jump on
    (match_operand:SI 1 "const_int_operand" "")	; lower bound
diff --git a/gcc/config/arm/predicates.md b/gcc/config/arm/predicates.md
index 7e198f9bce4..69718ee9c7a 100644
--- a/gcc/config/arm/predicates.md
+++ b/gcc/config/arm/predicates.md
@@ -31,6 +31,23 @@
 	      || REGNO_REG_CLASS (REGNO (op)) != NO_REGS));
 })
 
+; Predicate for stack protector guard's address in
+; stack_protect_combined_set_insn and stack_protect_combined_test_insn patterns
+(define_predicate "guard_addr_operand"
+  (match_test "true")
+{
+  return (CONSTANT_ADDRESS_P (op)
+	  || !targetm.cannot_force_const_mem (mode, op));
+})
+
+; Predicate for stack protector guard in stack_protect_combined_set and
+; stack_protect_combined_test patterns
+(define_predicate "guard_operand"
+  (match_code "mem")
+{
+  return guard_addr_operand (XEXP (op, 0), mode);
+})
+
 (define_predicate "imm_for_neon_inv_logic_operand"
   (match_code "const_vector")
 {
diff --git a/gcc/config/arm/thumb1.md b/gcc/config/arm/thumb1.md
index 19dcdbcdd73..cd199c9c529 100644
--- a/gcc/config/arm/thumb1.md
+++ b/gcc/config/arm/thumb1.md
@@ -1962,4 +1962,17 @@
   }"
   [(set_attr "type" "mov_reg")]
 )
+
+(define_insn "thumb1_stack_protect_test_insn"
+  [(set (match_operand:SI 0 "register_operand" "=&l")
+	(unspec:SI [(match_operand:SI 1 "memory_operand" "m")
+		    (mem:SI (match_operand:SI 2 "register_operand" "+l"))]
+	 UNSPEC_SP_TEST))
+   (clobber (match_dup 2))]
+  "TARGET_THUMB1"
+  "ldr\t%0, [%2]\;ldr\t%2, %1\;eors\t%0, %2, %0"
+  [(set_attr "length" "8")
+   (set_attr "conds" "set")
+   (set_attr "type" "multiple")]
+)
 \f
diff --git a/gcc/config/arm/unspecs.md b/gcc/config/arm/unspecs.md
index 19416736ef9..8f9dbcb08dc 100644
--- a/gcc/config/arm/unspecs.md
+++ b/gcc/config/arm/unspecs.md
@@ -86,6 +86,9 @@
   UNSPEC_PROBE_STACK    ; Probe stack memory reference
   UNSPEC_NONSECURE_MEM	; Represent non-secure memory in ARMv8-M with
 			; security extension
+  UNSPEC_SP_SET		; Represent the setting of stack protector's canary
+  UNSPEC_SP_TEST	; Represent the testing of stack protector's canary
+			; against the guard.
 ])
 
 (define_c_enum "unspec" [
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 4801d68a207..0667a242ef3 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -7424,22 +7424,61 @@ builtins.
 The get/set patterns have a single output/input operand respectively,
 with @var{mode} intended to be @code{Pmode}.
 
+@cindex @code{stack_protect_combined_set} instruction pattern
+@item @samp{stack_protect_combined_set}
+This pattern, if defined, moves a @code{ptr_mode} value from an address
+whose declaration RTX is given in operand 1 to the memory in operand 0
+without leaving the value in a register afterward.  If several
+instructions are needed by the target to perform the operation (eg. to
+load the address from a GOT entry then load the @code{ptr_mode} value
+and finally store it), it is the backend's responsibility to ensure no
+intermediate result gets spilled.  This is to avoid leaking the value
+some place that an attacker might use to rewrite the stack guard slot
+after having clobbered it.
+
+If this pattern is not defined, then the address declaration is
+expanded first in the standard way and a @code{stack_protect_set}
+pattern is then generated to move the value from that address to the
+address in operand 0.
+
 @cindex @code{stack_protect_set} instruction pattern
 @item @samp{stack_protect_set}
-This pattern, if defined, moves a @code{ptr_mode} value from the memory
-in operand 1 to the memory in operand 0 without leaving the value in
-a register afterward.  This is to avoid leaking the value some place
-that an attacker might use to rewrite the stack guard slot after
-having clobbered it.
+This pattern, if defined, moves a @code{ptr_mode} value from the valid
+memory location in operand 1 to the memory in operand 0 without leaving
+the value in a register afterward.  This is to avoid leaking the value
+some place that an attacker might use to rewrite the stack guard slot
+after having clobbered it.
+
+Note: on targets where the addressing modes do not allow loading
+directly from the stack guard address, the address is first expanded in
+the standard way, which could cause some spills.
 
 If this pattern is not defined, then a plain move pattern is generated.
 
+@cindex @code{stack_protect_combined_test} instruction pattern
+@item @samp{stack_protect_combined_test}
+This pattern, if defined, compares a @code{ptr_mode} value from an
+address whose declaration RTX is given in operand 1 with the memory in
+operand 0 without leaving the value in a register afterward and
+branches to operand 2 if the values were equal.  If several
+instructions are needed by the target to perform the operation (eg. to
+load the address from a GOT entry then load the @code{ptr_mode} value
+and finally compare it), it is the backend's responsibility to ensure no
+intermediate result gets spilled.  This is to avoid leaking the value
+some place that an attacker might use to rewrite the stack guard slot
+after having clobbered it.
+
+If this pattern is not defined, then the address declaration is
+expanded first in the standard way and a @code{stack_protect_test}
+pattern is then generated to compare the value from that address to the
+value at the memory in operand 0.
+
 @cindex @code{stack_protect_test} instruction pattern
 @item @samp{stack_protect_test}
 This pattern, if defined, compares a @code{ptr_mode} value from the
-memory in operand 1 with the memory in operand 0 without leaving the
-value in a register afterward and branches to operand 2 if the values
-were equal.
+valid memory location in operand 1 with the memory in operand 0 without
+leaving the value in a register afterward and branches to operand 2 if
+the values were equal.
 
 If this pattern is not defined, then a plain compare pattern and
 conditional branch pattern is used.
diff --git a/gcc/function.c b/gcc/function.c
index 302438323c8..17aecedd981 100644
--- a/gcc/function.c
+++ b/gcc/function.c
@@ -4892,18 +4892,34 @@ stack_protect_epilogue (void)
   tree guard_decl = targetm.stack_protect_guard ();
   rtx_code_label *label = gen_label_rtx ();
   rtx x, y;
-  rtx_insn *seq;
+  rtx_insn *seq = NULL;
 
   x = expand_normal (crtl->stack_protect_guard);
-  if (guard_decl)
-    y = expand_normal (guard_decl);
+
+  if (targetm.have_stack_protect_combined_test () && guard_decl)
+    {
+      gcc_assert (DECL_P (guard_decl));
+      y = DECL_RTL (guard_decl);
+      /* Allow the target to compute address of Y and compare it with X without
+	 leaking Y into a register.  This combined address + compare pattern
+	 allows the target to prevent spilling of any intermediate results by
+	 splitting it after register allocator.  */
+      seq = targetm.gen_stack_protect_combined_test (x, y, label);
+    }
   else
-    y = const0_rtx;
+    {
+      if (guard_decl)
+	y = expand_normal (guard_decl);
+      else
+	y = const0_rtx;
+
+      /* Allow the target to compare Y with X without leaking either into
+	 a register.  */
+      if (targetm.have_stack_protect_test ())
+	seq = targetm.gen_stack_protect_test (x, y, label);
+    }
 
-  /* Allow the target to compare Y with X without leaking either into
-     a register.  */
-  if (targetm.have_stack_protect_test ()
-      && ((seq = targetm.gen_stack_protect_test (x, y, label)) != NULL_RTX))
+  if (seq)
     emit_insn (seq);
   else
     emit_cmp_and_jump_insns (x, y, EQ, NULL_RTX, ptr_mode, 1, label);
diff --git a/gcc/target-insns.def b/gcc/target-insns.def
index 9a552c3d11c..d39889b3522 100644
--- a/gcc/target-insns.def
+++ b/gcc/target-insns.def
@@ -96,7 +96,9 @@ DEF_TARGET_INSN (sibcall_value, (rtx x0, rtx x1, rtx opt2, rtx opt3,
 DEF_TARGET_INSN (simple_return, (void))
 DEF_TARGET_INSN (split_stack_prologue, (void))
 DEF_TARGET_INSN (split_stack_space_check, (rtx x0, rtx x1))
+DEF_TARGET_INSN (stack_protect_combined_set, (rtx x0, rtx x1))
 DEF_TARGET_INSN (stack_protect_set, (rtx x0, rtx x1))
+DEF_TARGET_INSN (stack_protect_combined_test, (rtx x0, rtx x1, rtx x2))
 DEF_TARGET_INSN (stack_protect_test, (rtx x0, rtx x1, rtx x2))
 DEF_TARGET_INSN (store_multiple, (rtx x0, rtx x1, rtx x2))
 DEF_TARGET_INSN (tablejump, (rtx x0, rtx x1))
diff --git a/gcc/testsuite/gcc.target/arm/pr85434.c b/gcc/testsuite/gcc.target/arm/pr85434.c
new file mode 100644
index 00000000000..4143a861f7c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/pr85434.c
@@ -0,0 +1,200 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target fstack_protector }*/
+/* { dg-require-effective-target fpic }*/
+/* { dg-additional-options "-Os -fpic -fstack-protector-strong" } */
+
+#include <stddef.h>
+#include <stdint.h>
+
+
+static const unsigned char base64_enc_map[64] =
+{
+    'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J',
+    'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T',
+    'U', 'V', 'W', 'X', 'Y', 'Z', 'a', 'b', 'c', 'd',
+    'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n',
+    'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x',
+    'y', 'z', '0', '1', '2', '3', '4', '5', '6', '7',
+    '8', '9', '+', '/'
+};
+
+#define BASE64_SIZE_T_MAX   ( (size_t) -1 ) /* SIZE_T_MAX is not standard */
+
+
+void doSmth(void *x);
+
+#include <string.h>
+
+
+void check(int n) {
+  
+    if (!(n % 2 && n % 3 && n % 5)) {
+ __asm__  (   "add    r8, r8, #1;" );
+    }
+}
+
+uint32_t test(
+  uint32_t a1,
+  uint32_t a2,
+  size_t a3,
+  size_t a4,
+  size_t a5,
+  size_t a6)
+{
+  uint32_t nResult = 0;
+  uint8_t* h = 0L;
+  uint8_t X[128];
+  uint8_t mac[64];
+  size_t len;
+
+  doSmth(&a1);
+  doSmth(&a2);
+  doSmth(&a3);
+  doSmth(&a4);
+  doSmth(&a5);
+  doSmth(&a6);
+
+  if (a1 && a2 && a3 && a4 && a5 && a6) {
+    nResult = 1;
+    h = (void*)X;
+    len = sizeof(X);
+    memset(X, a2, len);
+    len -= 64;
+    memcpy(mac ,X, len);
+    *(h + len) = a6;
+
+    {
+
+
+        unsigned char *dst = X;
+        size_t dlen = a3;
+        size_t *olen = &a6;
+        const unsigned char *src = mac;
+        size_t slen = a4;
+    size_t i, n;
+    int C1, C2, C3;
+    unsigned char *p;
+
+    if( slen == 0 )
+    {
+        *olen = 0;
+        return( 0 );
+    }
+
+    n = slen / 3 + ( slen % 3 != 0 );
+
+    if( n > ( BASE64_SIZE_T_MAX - 1 ) / 4 )
+    {
+        *olen = BASE64_SIZE_T_MAX;
+        return( 0 );
+    }
+
+    n *= 4;
+
+    if( ( dlen < n + 1 ) || ( NULL == dst ) )
+    {
+        *olen = n + 1;
+        return( 0 );
+    }
+
+    n = ( slen / 3 ) * 3;
+
+    for( i = 0, p = dst; i < n; i += 3 )
+    {
+        C1 = *src++;
+        C2 = *src++;
+        C3 = *src++;
+
+        check(i);
+
+        *p++ = base64_enc_map[(C1 >> 2) & 0x3F];
+        *p++ = base64_enc_map[(((C1 &  3) << 4) + (C2 >> 4)) & 0x3F];
+        *p++ = base64_enc_map[(((C2 & 15) << 2) + (C3 >> 6)) & 0x3F];
+        *p++ = base64_enc_map[C3 & 0x3F];
+    }
+
+    if( i < slen )
+    {
+        C1 = *src++;
+        C2 = ( ( i + 1 ) < slen ) ? *src++ : 0;
+
+        *p++ = base64_enc_map[(C1 >> 2) & 0x3F];
+        *p++ = base64_enc_map[(((C1 & 3) << 4) + (C2 >> 4)) & 0x3F];
+
+        if( ( i + 1 ) < slen )
+             *p++ = base64_enc_map[((C2 & 15) << 2) & 0x3F];
+        else *p++ = '=';
+
+        *p++ = '=';
+    }
+
+    *olen = p - dst;
+    *p = 0;
+
+}
+
+  __asm__ ("mov r8, %0;" : "=r" ( nResult ));
+  }
+  else
+  {
+    nResult = 2;
+  }
+
+  doSmth(X);
+  doSmth(mac);
+
+
+  return nResult;
+}
+
+/* The pattern below catches sequences of instructions that were generated
+   for ARM and Thumb-2 before the fix for this PR. They are of the form:
+
+   ldr     rX, <offset from sp or fp>
+   <optional non ldr instructions>
+   ldr     rY, <offset from sp or fp>
+   ldr     rZ, [rX]
+   <optional non ldr instructions>
+   cmp     rY, rZ
+   <optional non cmp instructions>
+   bl      __stack_chk_fail
+
+   Ideally the optional block would check that the various rX, rY and rZ
+   registers are not set, but this is not possible because Tcl forbids back
+   references in lookahead expressions, and lookahead is the only construct
+   that can negate a regexp, so the negation cannot refer back to those
+   registers.  Instead we go for the heuristic of allowing non ldr/cmp
+   instructions with the assumptions that (i) those are not part of the stack
+   protector sequences and (ii) they would only be scheduled here if they don't
+   conflict with registers used by the stack protector.
+
+   Note on the regexp logic:
+   Allowing non X instructions (where X is ldr or cmp) is done by looking for
+   some non newline spaces, followed by something which is not X, followed by
+   an alphanumeric character followed by anything but a newline and ended by a
+   newline, the whole thing repeated an undetermined number of times.  The
+   alphanumeric character is there to force the negative lookahead for X to
+   be matched only after all the initial spaces and thus to check the
+   mnemonic.  This prevents it from matching one of the initial spaces.  */
+/* { dg-final { scan-assembler-not {ldr[ \t]+([^,]+), \[(?:sp|fp)[^]]*\](?:\n[ \t]+(?!ldr)\w[^\n]*)*\n[ \t]+ldr[ \t]+([^,]+), \[(?:sp|fp)[^]]*\]\n[ \t]+ldr[ \t]+([^,]+), \[\1\](?:\n[ \t]+(?!ldr)\w[^\n]*)*\n[ \t]+cmp[ \t]+\2, \3(?:\n[ \t]+(?!cmp)\w[^\n]*)*\n[ \t]+bl[ \t]+__stack_chk_fail} } } */
+
+/* Likewise for Thumb-1 sequences of instructions prior to the fix for this PR
+   which had the form:
+
+   ldr     rS, <offset from sp or fp>
+   <optional non ldr instructions>
+   ldr     rT, <PC relative offset>
+   <optional non ldr instructions>
+   ldr     rX, [rS, rT]
+   <optional non ldr instructions>
+   ldr     rY, <offset from sp or fp>
+   ldr     rZ, [rX]
+   <optional non ldr instructions>
+   cmp     rY, rZ
+   <optional non cmp instructions>
+   bl      __stack_chk_fail
+
+  Note on the regexp logic:
+  PC relative offset is checked by looking for a source operand that does not
+  contain [ or ].  */
+/* { dg-final { scan-assembler-not {ldr[ \t]+([^,]+), \[(?:sp|fp)[^]]*\](?:\n[ \t]+(?!ldr)\w[^\n]*)*\n[ \t]+ldr[ \t]+([^,]+), [^][\n]*(?:\n[ \t]+(?!ldr)\w[^\n]*)*\n[ \t]+ldr[ \t]+([^,]+), \[\1, \2\](?:\n[ \t]+(?!ldr)\w[^\n]*)*\n[ \t]+ldr[ \t]+([^,]+), \[(?:sp|fp)[^]]*\]\n[ \t]+ldr[ \t]+([^,]+), \[\3\](?:\n[ \t]+(?!ldr)\w[^\n]*)*\n[ \t]+cmp[ \t]+\4, \5(?:\n[ \t]+(?!cmp)\w[^\n]*)*\n[ \t]+bl[ \t]+__stack_chk_fail} } } */
-- 
2.19.1
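
For readers following the function.c hunk above, the order in which
stack_protect_epilogue picks a comparison strategy can be modelled as below.
This is a minimal sketch in plain C, not GCC code: struct target_caps,
enum epilogue_path and choose_epilogue_path are hypothetical names standing in
for the targetm.have_* hooks, and the model ignores the case where a pattern
generator returns no insn sequence.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the two target capability hooks consulted by
   the epilogue code; these are not GCC's real data structures.  */
struct target_caps
{
  bool have_combined_test;	/* models have_stack_protect_combined_test  */
  bool have_test;		/* models have_stack_protect_test  */
};

/* The three ways the canary comparison can end up being emitted.  */
enum epilogue_path
{
  PATH_COMBINED_TEST,		/* address computation + compare in one pattern  */
  PATH_PLAIN_TEST,		/* address expanded first, then the test pattern  */
  PATH_CMP_AND_JUMP		/* generic compare and conditional branch  */
};

/* Return the path taken for a target with capabilities T.  HAVE_GUARD_DECL
   models a non-null guard declaration.  */
enum epilogue_path
choose_epilogue_path (const struct target_caps *t, bool have_guard_decl)
{
  /* The combined pattern needs a guard declaration to work from.  */
  if (t->have_combined_test && have_guard_decl)
    return PATH_COMBINED_TEST;

  /* Otherwise fall back to the pre-existing test pattern if present.  */
  if (t->have_test)
    return PATH_PLAIN_TEST;

  /* Last resort: an ordinary compare and jump.  */
  return PATH_CMP_AND_JUMP;
}
```

In this model a target that only provides the combined pattern and is handed
no guard declaration ends up on the generic compare-and-jump path, which
mirrors the else branch of the hunk.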


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH, ARM, ping3] PR85434: Prevent spilling of stack protector guard's address on ARM
  2018-11-16 14:57                     ` [PATCH, ARM, ping3] " Thomas Preudhomme
@ 2018-11-21  0:32                       ` Jeff Law
  2018-11-21 10:35                         ` Thomas Preudhomme
  2018-11-21 16:07                       ` Kyrill Tkachov
  2018-11-21 17:54                       ` Segher Boessenkool
  2 siblings, 1 reply; 20+ messages in thread
From: Jeff Law @ 2018-11-21  0:32 UTC (permalink / raw)
  To: Thomas Preudhomme, kyrylo.tkachov
  Cc: Ramana Radhakrishnan, Richard Earnshaw, gcc-patches

On 11/16/18 7:56 AM, Thomas Preudhomme wrote:
> Ping?
I thought I acked the target independent stuff a while back.  What's
still waiting on review here?

jeff

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH, ARM, ping3] PR85434: Prevent spilling of stack protector guard's address on ARM
  2018-11-21  0:32                       ` Jeff Law
@ 2018-11-21 10:35                         ` Thomas Preudhomme
  0 siblings, 0 replies; 20+ messages in thread
From: Thomas Preudhomme @ 2018-11-21 10:35 UTC (permalink / raw)
  To: Jeff Law
  Cc: kyrylo.tkachov, Ramana Radhakrishnan, Richard Earnshaw, gcc-patches

Yes, you did indeed, which is why I didn't include you in the To list.
I've reworked the Arm part significantly since it was last approved,
the ping is meant for the Arm maintainers.

Thanks for enquiring about it. Best regards,

Thomas
On Wed, 21 Nov 2018 at 00:32, Jeff Law <law@redhat.com> wrote:
>
> On 11/16/18 7:56 AM, Thomas Preudhomme wrote:
> > Ping?
> I thought I acked the target independent stuff a while back.  What's
> still waiting on review here?
>
> jeff

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH, ARM, ping3] PR85434: Prevent spilling of stack protector guard's address on ARM
  2018-11-16 14:57                     ` [PATCH, ARM, ping3] " Thomas Preudhomme
  2018-11-21  0:32                       ` Jeff Law
@ 2018-11-21 16:07                       ` Kyrill Tkachov
  2018-11-22 14:49                         ` Thomas Preudhomme
  2018-11-21 17:54                       ` Segher Boessenkool
  2 siblings, 1 reply; 20+ messages in thread
From: Kyrill Tkachov @ 2018-11-21 16:07 UTC (permalink / raw)
  To: Thomas Preudhomme; +Cc: Ramana Radhakrishnan, Richard Earnshaw, gcc-patches

Hi Thomas,

Sorry for the delay.

On 16/11/18 14:56, Thomas Preudhomme wrote:
> Ping?
>
> Best regards,
>
> Thomas
>
> On Sat, 10 Nov 2018 at 15:07, Thomas Preudhomme
> <thomas.preudhomme@linaro.org> wrote:
>> Thanks Kyrill.
>>
>> Updated patch in attachment. Best regards,
>>
>> Thomas
>> On Thu, 8 Nov 2018 at 15:53, Kyrill Tkachov <kyrylo.tkachov@foss.arm.com> wrote:
>>> Hi Thomas,
>>>
>>> On 08/11/18 09:52, Thomas Preudhomme wrote:
>>>> Ping?
>>>>
>>>> Best regards,
>>>>
>>>> Thomas
>>>>
>>>> On Thu, 1 Nov 2018 at 16:03, Thomas Preudhomme
>>>> <thomas.preudhomme@linaro.org> wrote:
>>>>> Ping?
>>>>>
>>>>> Best regards,
>>>>>
>>>>> Thomas
>>>>> On Fri, 26 Oct 2018 at 22:41, Thomas Preudhomme
>>>>> <thomas.preudhomme@linaro.org> wrote:
>>>>>> Hi,
>>>>>>
>>>>>> Please find updated patch to fix PR85434: spilling of stack protector
>>>>>> guard's address on ARM. Quite a few changes have been made to the ARM
>>>>>> part since last round of review so I think it makes more sense to
>>>>>> review it anew. Ran bootstrap + regression testsuite + glibc build +
>>>>>> glibc regression testsuite for Arm and Thumb-2 and bootstrap +
>>>>>> regression testsuite for Thumb-1. GCC's regression testsuite was run
>>>>>> in 3 configurations in all those cases:
>>>>>>
>>>>>> - default configuration (no RUNTESTFLAGS)
>>>>>> - with -fstack-protector-all
>>>>>> - with -fPIC -fstack-protector-all (to exercise both codepath in stack
>>>>>> protector's split code)
>>>>>>
>>>>>> None of these show any regression beyond some new scan failures with
>>>>>> -fstack-protector-all or -fPIC due to unexpected code sequences for the
>>>>>> testcases concerned and some guality swings due to less optimization
>>>>>> with the new stack protector on.
>>>>>>
>>>>>> Patch description and ChangeLog below.
>>>>>>
>>>>>> In case of high register pressure in PIC mode, address of the stack
>>>>>> protector's guard can be spilled on ARM targets as shown in PR85434,
>>>>>> thus allowing an attacker to control what the canary would be compared
>>>>>> against. ARM does lack stack_protect_set and stack_protect_test insn
>>>>>> patterns, defining them does not help as the address is expanded
>>>>>> regularly and the patterns only deal with the copy and test of the
>>>>>> guard with the canary.
>>>>>>
>>>>>> This problem does not occur for x86 targets because the PIC access and
>>>>>> the test can be done in the same instruction. Aarch64 is exempt too
>>>>>> because its PIC access insn patterns are mov of UNSPEC, which prevents
>>>>>> the second access in the epilogue from being CSEd in the cse_local pass
>>>>>> with the first access in the prologue.
>>>>>>
>>>>>> The approach followed here is to create new "combined" set and test
>>>>>> standard pattern names that take the unexpanded guard and do the set or
>>>>>> test. This allows the target to use an opaque pattern (eg. using UNSPEC)
>>>>>> to hide the individual instructions being generated from the compiler and
>>>>>> split the pattern into generic load, compare and branch instruction
>>>>>> after register allocator, therefore avoiding any spilling. This is here
>>>>>> implemented for the ARM targets. For targets not implementing these new
>>>>>> standard pattern names, the existing stack_protect_set and
>>>>>> stack_protect_test pattern names are used.
>>>>>>
>>>>>> To be able to split PIC access after register allocation, the functions
>>>>>> had to be augmented to force a new PIC register load and to control
>>>>>> which register it loads into. This is because sharing the PIC register
>>>>>> between prologue and epilogue could lead to spilling due to CSE again
>>>>>> which an attacker could use to control what the canary gets compared
>>>>>> against.
>>>>>>
>>>>>> ChangeLog entries are as follows:
>>>>>>
>>>>>> *** gcc/ChangeLog ***
>>>>>>
>>>>>> 2018-10-26  Thomas Preud'homme  <thomas.preudhomme@linaro.org>
>>>>>>
>>>>>> * target-insns.def (stack_protect_combined_set): Define new standard
>>>>>> pattern name.
>>>>>> (stack_protect_combined_test): Likewise.
>>>>>> * cfgexpand.c (stack_protect_prologue): Try new
>>>>>> stack_protect_combined_set pattern first.
>>>>>> * function.c (stack_protect_epilogue): Try new
>>>>>> stack_protect_combined_test pattern first.
>>>>>> * config/arm/arm.c (require_pic_register): Add pic_reg and compute_now
>>>>>> parameters to control which register to use as PIC register and force
>>>>>> reloading PIC register respectively.  Insert in the stream of insns if
>>>>>> possible.
>>>>>> (legitimize_pic_address): Expose above new parameters in prototype and
>>>>>> adapt recursive calls accordingly.  Use pic_reg if non null instead of
>>>>>> cached one.
>>>>>> (arm_load_pic_register): Add pic_reg parameter and use it if non null.
>>>>>> (arm_legitimize_address): Adapt to new legitimize_pic_address
>>>>>> prototype.
>>>>>> (thumb_legitimize_address): Likewise.
>>>>>> (arm_emit_call_insn): Adapt to require_pic_register prototype change.
>>>>>> (arm_expand_prologue): Adapt to arm_load_pic_register prototype change.
>>>>>> (thumb1_expand_prologue): Likewise.
>>>>>> * config/arm/arm-protos.h (legitimize_pic_address): Adapt to prototype
>>>>>> change.
>>>>>> (arm_load_pic_register): Likewise.
>>>>>> * config/arm/predicates.md (guard_addr_operand): New predicate.
>>>>>> (guard_operand): New predicate.
>>>>>> * config/arm/arm.md (movsi expander): Adapt to legitimize_pic_address
>>>>>> prototype change.
>>>>>> (builtin_setjmp_receiver expander): Adapt to thumb1_expand_prologue
>>>>>> prototype change.
>>>>>> (stack_protect_combined_set): New expander.
>>>>>> (stack_protect_combined_set_insn): New insn_and_split pattern.
>>>>>> (stack_protect_set_insn): New insn pattern.
>>>>>> (stack_protect_combined_test): New expander.
>>>>>> (stack_protect_combined_test_insn): New insn_and_split pattern.
>>>>>> (arm_stack_protect_test_insn): New insn pattern.
>>>>>> * config/arm/thumb1.md (thumb1_stack_protect_test_insn): New insn pattern.
>>>>>> * config/arm/unspecs.md (UNSPEC_SP_SET): New unspec.
>>>>>> (UNSPEC_SP_TEST): Likewise.
>>>>>> * doc/md.texi (stack_protect_combined_set): Document new standard
>>>>>> pattern name.
>>>>>> (stack_protect_set): Clarify that the operand for guard's address is
>>>>>> legal.
>>>>>> (stack_protect_combined_test): Document new standard pattern name.
>>>>>> (stack_protect_test): Clarify that the operand for guard's address is
>>>>>> legal.
>>>>>>
>>>>>> *** gcc/testsuite/ChangeLog ***
>>>>>>
>>>>>> 2018-07-05  Thomas Preud'homme  <thomas.preudhomme@linaro.org>
>>>>>>
>>>>>> * gcc.target/arm/pr85434.c: New test.
>>>>>>
>>>>>> Is this ok for trunk?
>>>>>>
>>>>>> Best regards,
>>>>>>
>>>>>> Thomas
>>>>>> On Thu, 25 Oct 2018 at 15:54, Thomas Preudhomme
>>>>>> <thomas.preudhomme@linaro.org> wrote:
>>>>>>> Good thing I did, found a missing earlyclobber in the process.
>>>>>>> Rerunning all tests again.
>>>>>>>
>>>>>>> Best regards,
>>>>>>>
>>>>>>> Thomas
>>>>>>> On Wed, 24 Oct 2018 at 10:13, Thomas Preudhomme
>>>>>>> <thomas.preudhomme@linaro.org> wrote:
>>>>>>>> Please hold on for the reviews, found a small improvement that could
>>>>>>>> be done. Am testing it right now, should have something by tonight or
>>>>>>>> tomorrow.
>>>>>>>>
>>>>>>>> Best regards,
>>>>>>>>
>>>>>>>> Thomas
>>>>>>>> On Tue, 23 Oct 2018 at 13:35, Thomas Preudhomme
>>>>>>>> <thomas.preudhomme@linaro.org> wrote:
>>>>>>>>> [Removing Jeff Law since middle end code hasn't changed]
>>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> Given how memory operands are reloaded even with an X constraint, I've
>>>>>>>>> reworked the patch for the combined set and combined test instructions
>>>>>>>>> to keep the mem out of the match_operand and used an expander to
>>>>>>>>> generate the right instruction pattern. I've also fixed some
>>>>>>>>> longstanding issues with the patch when flag_pic is true and with
>>>>>>>>> constraints for Thumb-1 that I hadn't noticed before due to using
>>>>>>>>> dg-cmp-results in conjunction with test_summary which does not show
>>>>>>>>> NA->FAIL (see [1]).
>>>>>>>>>
>>>>>>>>> All in all, I think the Arm code would do with a fresh review rather
>>>>>>>>> than looking at the changes since last posted version. (unchanged)
>>>>>>>>> ChangeLog entries are as follows:
>>>>>>>>>
>>>>>>>>> *** gcc/ChangeLog ***
>>>>>>>>>
>>>>>>>>> 2018-08-09  Thomas Preud'homme  <thomas.preudhomme@linaro.org>
>>>>>>>>>
>>>>>>>>>       * target-insns.def (stack_protect_combined_set): Define new standard
>>>>>>>>>       pattern name.
>>>>>>>>>       (stack_protect_combined_test): Likewise.
>>>>>>>>>       * cfgexpand.c (stack_protect_prologue): Try new
>>>>>>>>>       stack_protect_combined_set pattern first.
>>>>>>>>>       * function.c (stack_protect_epilogue): Try new
>>>>>>>>>       stack_protect_combined_test pattern first.
>>>>>>>>>       * config/arm/arm.c (require_pic_register): Add pic_reg and compute_now
>>>>>>>>>       parameters to control which register to use as PIC register and force
>>>>>>>>>       reloading PIC register respectively.  Insert in the stream of insns if
>>>>>>>>>       possible.
>>>>>>>>>       (legitimize_pic_address): Expose above new parameters in prototype and
>>>>>>>>>       adapt recursive calls accordingly.  Use pic_reg if non null instead of
>>>>>>>>>       cached one.
>>>>>>>>>       (arm_load_pic_register): Add pic_reg parameter and use it if non null.
>>>>>>>>>       (arm_legitimize_address): Adapt to new legitimize_pic_address
>>>>>>>>>       prototype.
>>>>>>>>>       (thumb_legitimize_address): Likewise.
>>>>>>>>>       (arm_emit_call_insn): Adapt to require_pic_register prototype change.
>>>>>>>>>       (arm_expand_prologue): Adapt to arm_load_pic_register prototype change.
>>>>>>>>>       (thumb1_expand_prologue): Likewise.
>>>>>>>>>       * config/arm/arm-protos.h (legitimize_pic_address): Adapt to prototype
>>>>>>>>>       change.
>>>>>>>>>       (arm_load_pic_register): Likewise.
>>>>>>>>>       * config/arm/predicates.md (guard_addr_operand): New predicate.
>>>>>>>>>       (guard_operand): New predicate.
>>>>>>>>>       * config/arm/arm.md (movsi expander): Adapt to legitimize_pic_address
>>>>>>>>>       prototype change.
>>>>>>>>>       (builtin_setjmp_receiver expander): Adapt to thumb1_expand_prologue
>>>>>>>>>       prototype change.
>>>>>>>>>       (stack_protect_combined_set): New expander.
>>>>>>>>>       (stack_protect_combined_set_insn): New insn_and_split pattern.
>>>>>>>>>       (stack_protect_set_insn): New insn pattern.
>>>>>>>>>       (stack_protect_combined_test): New expander.
>>>>>>>>>       (stack_protect_combined_test_insn): New insn_and_split pattern.
>>>>>>>>>       (stack_protect_test_insn): New insn pattern.
>>>>>>>>>       * config/arm/unspecs.md (UNSPEC_SP_SET): New unspec.
>>>>>>>>>       (UNSPEC_SP_TEST): Likewise.
>>>>>>>>>       * doc/md.texi (stack_protect_combined_set): Document new standard
>>>>>>>>>       pattern name.
>>>>>>>>>       (stack_protect_set): Clarify that the operand for guard's address is
>>>>>>>>>       legal.
>>>>>>>>>       (stack_protect_combined_test): Document new standard pattern name.
>>>>>>>>>       (stack_protect_test): Clarify that the operand for guard's address is
>>>>>>>>>       legal.
>>>>>>>>>
>>>>>>>>> *** gcc/testsuite/ChangeLog ***
>>>>>>>>>
>>>>>>>>> 2018-07-05  Thomas Preud'homme  <thomas.preudhomme@linaro.org>
>>>>>>>>>
>>>>>>>>>       * gcc.target/arm/pr85434.c: New test.
>>>>>>>>>
>>>>>>>>> Testing: Bootstrap and regression testing for Arm, Thumb-1 and Thumb-2
>>>>>>>>> with (i) default flags, (ii) an extra -fstack-protector-all and (iii)
>>>>>>>>> -fPIC -fstack-protector-all. A glibc build and testsuite run was also
>>>>>>>>> performed for Arm and Thumb-2. Default flags show no regression and
>>>>>>>>> the other runs have some expected scan-assembler failing (due to stack
>>>>>>>>> protector or fPIC code sequence), as well as guality fail (due to less
>>>>>>>>> optimized code with the new stack protector code) and some execution
>>>>>>>>> failures in sibcall-9 and sibcall-10 under -fPIC -fstack-protector-all
>>>>>>>>> due to the PIC sequence for the global variable making the frame
>>>>>>>>> layout different for the 2 functions (these become PASS if making the
>>>>>>>>> global variable static).
>>>>>>>>>
>>>>>>>>> Is this ok for trunk?
>>>>>>>>>
>>>>>>>>> Best regards,
>>>>>>>>>
>>>>>>>>> Thomas
>>>>>>>>>
>>>>>>>>> [1] https://gcc.gnu.org/ml/gcc-patches/2018-10/msg01412.html
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, 25 Sep 2018 at 17:10, Kyrill Tkachov
>>>>>>>>> <kyrylo.tkachov@foss.arm.com> wrote:
>>>>>>>>>> Hi Thomas,
>>>>>>>>>>
>>>>>>>>>> On 29/08/18 10:51, Thomas Preudhomme wrote:
>>>>>>>>>>> Resend hopefully without HTML this time.
>>>>>>>>>>>
>>>>>>>>>>> On Wed, 29 Aug 2018 at 10:49, Thomas Preudhomme
>>>>>>>>>>> <thomas.preudhomme@linaro.org> wrote:
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> I've reworked the patch fixing PR85434 (spilling of stack protector guard's address on ARM) to address the testsuite regression on powerpc and x86 as well as glibc testsuite regression on ARM. Issues were due to unconditionally attempting to generate the new patterns. The code now tests if there is a pattern for them for the target before generating them. In the ARM side of the patch, I've also added a more specific predicate for the new patterns. The new patch is found below.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> In case of high register pressure in PIC mode, address of the stack
>>>>>>>>>>>> protector's guard can be spilled on ARM targets as shown in PR85434,
>>>>>>>>>>>> thus allowing an attacker to control what the canary would be compared
>>>>>>>>>>>> against. ARM does lack stack_protect_set and stack_protect_test insn
>>>>>>>>>>>> patterns, defining them does not help as the address is expanded
>>>>>>>>>>>> regularly and the patterns only deal with the copy and test of the
>>>>>>>>>>>> guard with the canary.
>>>>>>>>>>>>
>>>>>>>>>>>> This problem does not occur for x86 targets because the PIC access and
>>>>>>>>>>>> the test can be done in the same instruction. Aarch64 is exempt too
>>>>>>>>>>>> because its PIC access insn patterns are mov of UNSPEC, which prevents
>>>>>>>>>>>> the second access in the epilogue from being CSEd in the cse_local pass
>>>>>>>>>>>> with the first access in the prologue.
>>>>>>>>>>>>
>>>>>>>>>>>> The approach followed here is to create new "combined" set and test
>>>>>>>>>>>> standard pattern names that take the unexpanded guard and do the set or
>>>>>>>>>>>> test. This allows the target to use an opaque pattern (eg. using UNSPEC)
>>>>>>>>>>>> to hide the individual instructions being generated from the compiler and
>>>>>>>>>>>> split the pattern into generic load, compare and branch instruction
>>>>>>>>>>>> after register allocator, therefore avoiding any spilling. This is here
>>>>>>>>>>>> implemented for the ARM targets. For targets not implementing these new
>>>>>>>>>>>> standard pattern names, the existing stack_protect_set and
>>>>>>>>>>>> stack_protect_test pattern names are used.
>>>>>>>>>>>>
>>>>>>>>>>>> To be able to split PIC access after register allocation, the functions
>>>>>>>>>>>> had to be augmented to force a new PIC register load and to control
>>>>>>>>>>>> which register it loads into. This is because sharing the PIC register
>>>>>>>>>>>> between prologue and epilogue could lead to spilling due to CSE again
>>>>>>>>>>>> which an attacker could use to control what the canary gets compared
>>>>>>>>>>>> against.
>>>>>>>>>>>>
>>>>>>>>>>>> ChangeLog entries are as follows:
>>>>>>>>>>>>
>>>>>>>>>>>> *** gcc/ChangeLog ***
>>>>>>>>>>>>
>>>>>>>>>>>> 2018-08-09  Thomas Preud'homme  <thomas.preudhomme@linaro.org>
>>>>>>>>>>>>
>>>>>>>>>>>>        * target-insns.def (stack_protect_combined_set): Define new standard
>>>>>>>>>>>>        pattern name.
>>>>>>>>>>>>        (stack_protect_combined_test): Likewise.
>>>>>>>>>>>>        * cfgexpand.c (stack_protect_prologue): Try new
>>>>>>>>>>>>        stack_protect_combined_set pattern first.
>>>>>>>>>>>>        * function.c (stack_protect_epilogue): Try new
>>>>>>>>>>>>        stack_protect_combined_test pattern first.
>>>>>>>>>>>>        * config/arm/arm.c (require_pic_register): Add pic_reg and compute_now
>>>>>>>>>>>>        parameters to control which register to use as PIC register and force
>>>>>>>>>>>>        reloading PIC register respectively.  Insert in the stream of insns if
>>>>>>>>>>>>        possible.
>>>>>>>>>>>>        (legitimize_pic_address): Expose above new parameters in prototype and
>>>>>>>>>>>>        adapt recursive calls accordingly.
>>>>>>>>>>>>        (arm_legitimize_address): Adapt to new legitimize_pic_address
>>>>>>>>>>>>        prototype.
>>>>>>>>>>>>        (thumb_legitimize_address): Likewise.
>>>>>>>>>>>>        (arm_emit_call_insn): Adapt to new require_pic_register prototype.
>>>>>>>>>>>>        * config/arm/arm-protos.h (legitimize_pic_address): Adapt to prototype
>>>>>>>>>>>>        change.
>>>>>>>>>>>>        * config/arm/predicated.md (guard_operand): New predicate.
>>>>>>>>>> Typo, predicates.md is the filename.
>>>>>>>>>>
>>>>>>>>>> Looks ok to me otherwise.
>>>>>>>>>> Thank you for your patience.
>>>>>>>>>>
>>>>>>>>>> Kyrill
>>>>>>>>>>
>>>>>>>>>>>>        * config/arm/arm.md (movsi expander): Adapt to legitimize_pic_address
>>>>>>>>>>>>        prototype change.
>>>>>>>>>>>>        (stack_protect_combined_set): New insn_and_split pattern.
>>>>>>>>>>>>        (stack_protect_set): New insn pattern.
>>>>>>>>>>>>        (stack_protect_combined_test): New insn_and_split pattern.
>>>>>>>>>>>>        (stack_protect_test): New insn pattern.
>>>>>>>>>>>>        * config/arm/unspecs.md (UNSPEC_SP_SET): New unspec.
>>>>>>>>>>>>        (UNSPEC_SP_TEST): Likewise.
>>>>>>>>>>>>        * doc/md.texi (stack_protect_combined_set): Document new standard
>>>>>>>>>>>>        pattern name.
>>>>>>>>>>>>        (stack_protect_set): Clarify that the operand for guard's address is
>>>>>>>>>>>>        legal.
>>>>>>>>>>>>        (stack_protect_combined_test): Document new standard pattern name.
>>>>>>>>>>>>        (stack_protect_test): Clarify that the operand for guard's address is
>>>>>>>>>>>>        legal.
>>>>>>>>>>>>
>>>>>>>>>>>> *** gcc/testsuite/ChangeLog ***
>>>>>>>>>>>>
>>>>>>>>>>>> 2018-07-05  Thomas Preud'homme  <thomas.preudhomme@linaro.org>
>>>>>>>>>>>>
>>>>>>>>>>>>        * gcc.target/arm/pr85434.c: New test.
>>>>>>>>>>>> Testing:
>>>>>>>>>>>>
>>>>>>>>>>>> native x86_64: bootstrap + testsuite -> no regression, can see failures with previous version of patch but not with new version
>>>>>>>>>>>> native powerpc64: bootstrap + testsuite -> no regression, can see failures from pr86834 with previous version of patch but not with new version
>>>>>>>>>>>> cross ARM Linux: build + testsuite -> no regression
>>>>>>>>>>>> native ARM Thumb-2: bootstrap + testsuite + glibc build + glibc test -> no regression
>>>>>>>>>>>> native ARM Arm: bootstrap + testsuite + glibc build + glibc test -> no regression
>>>>>>>>>>>> Aarch64: bootstrap + testsuite + glibc build + glibc test-> no regression
>>>>>>>>>>>>
>>>>>>>>>>>> Is this ok for trunk?
>>>>>>>>>>>>
>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>
>>>>>>>>>>>> Thomas
>>>
>>> @@ -21998,7 +22039,7 @@ arm_expand_prologue (void)
>>>          mask &= THUMB2_WORK_REGS;
>>>          if (!IS_NESTED (func_type))
>>>        mask |= (1 << IP_REGNUM);
>>> -      arm_load_pic_register (mask);
>>> +      arm_load_pic_register (mask, 0);
>>>
>>>
>>>
>>> Please use NULL_RTX rather than 0 here and in the other occurrences in the patch.
>>> At a glance the changes look ok, but I'll have a deeper look later.
>>>
>>> Thanks,
>>> Kyrill

diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
index 35ca276e4ad..c8d0374f8ae 100644
--- a/gcc/cfgexpand.c
+++ b/gcc/cfgexpand.c
@@ -6131,6 +6131,23 @@ stack_protect_prologue (void)
    rtx x, y;
  
    x = expand_normal (crtl->stack_protect_guard);
+
+  if (targetm.have_stack_protect_combined_set () && guard_decl)
+    {
+      gcc_assert (DECL_P (guard_decl));
+      y = DECL_RTL (guard_decl);
+
+      /* Allow the target to compute address of Y and copy it to X without
+	 leaking Y into a register.  This combined address + copy pattern
+	 allows the target to prevent spilling of any intermediate results by
+	 splitting it after register allocator.  */
+      if (rtx_insn *insn = targetm.gen_stack_protect_combined_set (x, y))
+	{
+	  emit_insn (insn);
+	  return;
+	}
+    }
+
    if (guard_decl)
      y = expand_normal (guard_decl);
    else
diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 0dfb3ac59a6..f508bc5a455 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -28,7 +28,7 @@ extern enum unwind_info_type arm_except_unwind_info (struct gcc_options *);
  extern int use_return_insn (int, rtx);
  extern bool use_simple_return_p (void);
  extern enum reg_class arm_regno_class (int);
-extern void arm_load_pic_register (unsigned long);
+extern void arm_load_pic_register (unsigned long, rtx);
  extern int arm_volatile_func (void);
  extern void arm_expand_prologue (void);
  extern void arm_expand_epilogue (bool);
@@ -69,7 +69,7 @@ extern int const_ok_for_dimode_op (HOST_WIDE_INT, enum rtx_code);
  extern int arm_split_constant (RTX_CODE, machine_mode, rtx,
  			       HOST_WIDE_INT, rtx, rtx, int);
  extern int legitimate_pic_operand_p (rtx);
-extern rtx legitimize_pic_address (rtx, machine_mode, rtx);
+extern rtx legitimize_pic_address (rtx, machine_mode, rtx, rtx, bool);
  extern rtx legitimize_tls_address (rtx, rtx);
  extern bool arm_legitimate_address_p (machine_mode, rtx, bool);
  extern int arm_legitimate_address_outer_p (machine_mode, rtx, RTX_CODE, int);
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 8810df53aa3..96b8150d34c 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -7371,21 +7371,34 @@ legitimate_pic_operand_p (rtx x)
    return 1;
  }
  
-/* Record that the current function needs a PIC register.  Initialize
-   cfun->machine->pic_reg if we have not already done so.  */
+/* Record that the current function needs a PIC register.  If PIC_REG is null,
+   a new pseudo is allocated as PIC register, otherwise PIC_REG is used.  In
+   both cases cfun->machine->pic_reg is initialized if we have not already done
+   so.  COMPUTE_NOW decides whether and where to set the PIC register.  If true,
+   PIC register is reloaded in the current position of the instruction stream
+   regardless of whether it was loaded before.  Otherwise, it is only loaded
+   if not already done so (crtl->uses_pic_offset_table is null).  Note that
+   nonnull PIC_REG is only supported iff COMPUTE_NOW is true and null PIC_REG
+   is only supported iff COMPUTE_NOW is false.  */
  
  static void
-require_pic_register (void)
+require_pic_register (rtx pic_reg, bool compute_now)
  {
+  gcc_assert (compute_now == (pic_reg != NULL_RTX));
+
    /* A lot of the logic here is made obscure by the fact that this
       routine gets called as part of the rtx cost estimation process.
       We don't want those calls to affect any assumptions about the real
       function; and further, we can't call entry_of_function() until we
       start the real expansion process.  */
-  if (!crtl->uses_pic_offset_table)
+  if (!crtl->uses_pic_offset_table || compute_now)
      {
-      gcc_assert (can_create_pseudo_p ());
+      gcc_assert (can_create_pseudo_p ()
+		  || (pic_reg != NULL_RTX
+		      && REG_P (pic_reg)
+		      && GET_MODE (pic_reg) == Pmode));
        if (arm_pic_register != INVALID_REGNUM
+	  && !compute_now
  	  && !(TARGET_THUMB1 && arm_pic_register > LAST_LO_REGNUM))
  	{
  	  if (!cfun->machine->pic_reg)
@@ -7401,8 +7414,10 @@ require_pic_register (void)
  	{
  	  rtx_insn *seq, *insn;
  
+	  if (pic_reg == NULL_RTX)
+	    pic_reg = gen_reg_rtx (Pmode);
  	  if (!cfun->machine->pic_reg)
-	    cfun->machine->pic_reg = gen_reg_rtx (Pmode);
+	    cfun->machine->pic_reg = pic_reg;
  
  	  /* Play games to avoid marking the function as needing pic
  	     if we are being called as part of the cost-estimation
@@ -7413,11 +7428,12 @@ require_pic_register (void)
  	      start_sequence ();
  
  	      if (TARGET_THUMB1 && arm_pic_register != INVALID_REGNUM
-		  && arm_pic_register > LAST_LO_REGNUM)
+		  && arm_pic_register > LAST_LO_REGNUM
+		  && !compute_now)
  		emit_move_insn (cfun->machine->pic_reg,
  				gen_rtx_REG (Pmode, arm_pic_register));
  	      else
-		arm_load_pic_register (0UL);
+		arm_load_pic_register (0UL, pic_reg);
  
  	      seq = get_insns ();
  	      end_sequence ();
@@ -7430,16 +7446,33 @@ require_pic_register (void)
  	         we can't yet emit instructions directly in the final
  		 insn stream.  Queue the insns on the entry edge, they will
  		 be committed after everything else is expanded.  */
-	      insert_insn_on_edge (seq,
-				   single_succ_edge (ENTRY_BLOCK_PTR_FOR_FN (cfun)));
+	      if (currently_expanding_to_rtl)
+		insert_insn_on_edge (seq,
+				     single_succ_edge
+				     (ENTRY_BLOCK_PTR_FOR_FN (cfun)));
+	      else
+		emit_insn (seq);
  	    }
  	}
      }
  }
  
+/* Legitimize PIC load to ORIG into REG.  If REG is NULL, a new pseudo is
+   created to hold the result of the load.  If not NULL, PIC_REG indicates
+   which register to use as PIC register, otherwise it is decided by register
+   allocator.  COMPUTE_NOW forces the PIC register to be loaded at the current
+   location in the instruction stream, regardless of whether it was loaded
+   previously.  Note that nonnull PIC_REG is only supported iff COMPUTE_NOW is
+   true and null PIC_REG is only supported iff COMPUTE_NOW is false.
+
+   Returns the register REG into which the PIC load is performed.  */
+
  rtx
-legitimize_pic_address (rtx orig, machine_mode mode, rtx reg)
+legitimize_pic_address (rtx orig, machine_mode mode, rtx reg, rtx pic_reg,
+			bool compute_now)
  {
+  gcc_assert (compute_now == (pic_reg != NULL_RTX));
+
    if (GET_CODE (orig) == SYMBOL_REF
        || GET_CODE (orig) == LABEL_REF)
      {
@@ -7472,9 +7505,12 @@ legitimize_pic_address (rtx orig, machine_mode mode, rtx reg)
  	  rtx mem;
  
  	  /* If this function doesn't have a pic register, create one now.  */
-	  require_pic_register ();
+	  require_pic_register (pic_reg, compute_now);
+
+	  if (pic_reg == NULL_RTX)
+	    pic_reg = cfun->machine->pic_reg;
  
-	  pat = gen_calculate_pic_address (reg, cfun->machine->pic_reg, orig);
+	  pat = gen_calculate_pic_address (reg, pic_reg, orig);
  
  	  /* Make the MEM as close to a constant as possible.  */
  	  mem = SET_SRC (pat);
@@ -7523,9 +7559,11 @@ legitimize_pic_address (rtx orig, machine_mode mode, rtx reg)
  
        gcc_assert (GET_CODE (XEXP (orig, 0)) == PLUS);
  
-      base = legitimize_pic_address (XEXP (XEXP (orig, 0), 0), Pmode, reg);
+      base = legitimize_pic_address (XEXP (XEXP (orig, 0), 0), Pmode, reg,
+				     pic_reg, compute_now);
        offset = legitimize_pic_address (XEXP (XEXP (orig, 0), 1), Pmode,
-				       base == reg ? 0 : reg);
+				       base == reg ? 0 : reg, pic_reg,
+				       compute_now);
  
        if (CONST_INT_P (offset))
  	{
@@ -7625,16 +7663,17 @@ static GTY(()) int pic_labelno;
     low register.  */
  
  void
-arm_load_pic_register (unsigned long saved_regs ATTRIBUTE_UNUSED)
+arm_load_pic_register (unsigned long saved_regs ATTRIBUTE_UNUSED, rtx pic_reg)
  {
-  rtx l1, labelno, pic_tmp, pic_rtx, pic_reg;
+  rtx l1, labelno, pic_tmp, pic_rtx;
  
    if (crtl->uses_pic_offset_table == 0 || TARGET_SINGLE_PIC_BASE)
      return;
  
    gcc_assert (flag_pic);
  
-  pic_reg = cfun->machine->pic_reg;
+  if (pic_reg == NULL_RTX)
+    pic_reg = cfun->machine->pic_reg;
    if (TARGET_VXWORKS_RTP)
      {
        pic_rtx = gen_rtx_SYMBOL_REF (Pmode, VXWORKS_GOTT_BASE);
@@ -8710,7 +8749,8 @@ arm_legitimize_address (rtx x, rtx orig_x, machine_mode mode)
      {
        /* We need to find and carefully transform any SYMBOL and LABEL
  	 references; so go back to the original address expression.  */
-      rtx new_x = legitimize_pic_address (orig_x, mode, NULL_RTX);
+      rtx new_x = legitimize_pic_address (orig_x, mode, NULL_RTX, NULL_RTX,
+					  false /*compute_now*/);
  
        if (new_x != orig_x)
  	x = new_x;
@@ -8778,7 +8818,8 @@ thumb_legitimize_address (rtx x, rtx orig_x, machine_mode mode)
      {
        /* We need to find and carefully transform any SYMBOL and LABEL
  	 references; so go back to the original address expression.  */
-      rtx new_x = legitimize_pic_address (orig_x, mode, NULL_RTX);
+      rtx new_x = legitimize_pic_address (orig_x, mode, NULL_RTX, NULL_RTX,
+					  false /*compute_now*/);
  
        if (new_x != orig_x)
  	x = new_x;
@@ -18066,7 +18107,7 @@ arm_emit_call_insn (rtx pat, rtx addr, bool sibcall)
  	  ? !targetm.binds_local_p (SYMBOL_REF_DECL (addr))
  	  : !SYMBOL_REF_LOCAL_P (addr)))
      {
-      require_pic_register ();
+      require_pic_register (NULL_RTX, false /*compute_now*/);
        use_reg (&CALL_INSN_FUNCTION_USAGE (insn), cfun->machine->pic_reg);
      }
  
@@ -21998,7 +22039,7 @@ arm_expand_prologue (void)
        mask &= THUMB2_WORK_REGS;
        if (!IS_NESTED (func_type))
  	mask |= (1 << IP_REGNUM);
-      arm_load_pic_register (mask);
+      arm_load_pic_register (mask, NULL_RTX);
      }
  
    /* If we are profiling, make sure no instructions are scheduled before
@@ -25229,7 +25270,7 @@ thumb1_expand_prologue (void)
    /* Load the pic register before setting the frame pointer,
       so we can use r7 as a temporary work register.  */
    if (flag_pic && arm_pic_register != INVALID_REGNUM)
-    arm_load_pic_register (live_regs_mask);
+    arm_load_pic_register (live_regs_mask, NULL_RTX);
  
    if (!frame_pointer_needed && CALLER_INTERWORKING_SLOT_SIZE > 0)
      emit_move_insn (gen_rtx_REG (Pmode, ARM_HARD_FRAME_POINTER_REGNUM),
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 270b8e454b3..1f702f81fd1 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -6021,7 +6021,8 @@
        operands[1] = legitimize_pic_address (operands[1], SImode,
  					    (!can_create_pseudo_p ()
  					     ? operands[0]
-					     : 0));
+					     : NULL_RTX), NULL_RTX,
+					    false /*compute_now*/);
    }
    "
  )
@@ -6309,7 +6310,7 @@
    /* r3 is clobbered by set/longjmp, so we can use it as a scratch
       register.  */
    if (arm_pic_register != INVALID_REGNUM)
-    arm_load_pic_register (1UL << 3);
+    arm_load_pic_register (1UL << 3, NULL_RTX);
    DONE;
  }")
  
@@ -8634,6 +8635,163 @@
     (set_attr "conds" "clob")]
  )
  
+;; Named patterns for stack smashing protection.
+(define_expand "stack_protect_combined_set"
+  [(parallel
+     [(set (match_operand:SI 0 "memory_operand" "")
+	   (unspec:SI [(match_operand:SI 1 "guard_operand" "")]
+		      UNSPEC_SP_SET))
+      (clobber (match_scratch:SI 2 ""))
+      (clobber (match_scratch:SI 3 ""))])]
+  ""
+  ""
+)
+
+;; Use a separate insn from the above expand to be able to have the mem outside
+;; the operand #1 when register allocation comes. This is needed to avoid LRA
+;; try to reload the guard since we need to control how PIC access is done in
+;; the -fpic/-fPIC case (see COMPUTE_NOW parameter when calling
+;; legitimize_pic_address ()).
+(define_insn_and_split "*stack_protect_combined_set_insn"
+  [(set (match_operand:SI 0 "memory_operand" "=m,m")
+	(unspec:SI [(mem:SI (match_operand:SI 1 "guard_addr_operand" "X,X"))]
+		   UNSPEC_SP_SET))
+   (clobber (match_scratch:SI 2 "=&l,&r"))
+   (clobber (match_scratch:SI 3 "=&l,&r"))]
+  ""
+  "#"
+  "reload_completed"
+  [(parallel [(set (match_dup 0) (unspec:SI [(mem:SI (match_dup 2))]
+					    UNSPEC_SP_SET))
+	      (clobber (match_dup 2))])]
+  "
+{
+  if (flag_pic)
+    {
+      /* Forces recomputing of GOT base now.  */
+      legitimize_pic_address (operands[1], SImode, operands[2], operands[3],
+			      true /*compute_now*/);
+    }
+  else
+    {
+      if (address_operand (operands[1], SImode))
+	operands[2] = operands[1];
+      else
+	{
+	  rtx mem = XEXP (force_const_mem (SImode, operands[1]), 0);
+	  emit_move_insn (operands[2], mem);
+	}
+    }
+}"
+  [(set_attr "arch" "t1,32")]
+)
+
+(define_insn "*stack_protect_set_insn"
+  [(set (match_operand:SI 0 "memory_operand" "=m,m")
+	(unspec:SI [(mem:SI (match_operand:SI 1 "register_operand" "+&l,&r"))]
+	 UNSPEC_SP_SET))
+   (clobber (match_dup 1))]
+  ""
+  "@
+   ldr\\t%1, [%1]\;str\\t%1, %0\;movs\t%1,#0
+   ldr\\t%1, [%1]\;str\\t%1, %0\;mov\t%1,#0"
+  [(set_attr "length" "8,12")
+   (set_attr "conds" "clob,nocond")
+   (set_attr "type" "multiple")
+   (set_attr "arch" "t1,32")]
+)
+
+(define_expand "stack_protect_combined_test"
+  [(parallel
+     [(set (pc)
+	   (if_then_else
+		(eq (match_operand:SI 0 "memory_operand" "")
+		    (unspec:SI [(match_operand:SI 1 "guard_operand" "")]
+			       UNSPEC_SP_TEST))
+		(label_ref (match_operand 2))
+		(pc)))
+      (clobber (match_scratch:SI 3 ""))
+      (clobber (match_scratch:SI 4 ""))
+      (clobber (reg:CC CC_REGNUM))])]
+  ""
+  ""
+)
+
+;; Use a separate insn from the above expand to be able to have the mem outside
+;; the operand #1 when register allocation comes. This is needed to avoid LRA
+;; try to reload the guard since we need to control how PIC access is done in
+;; the -fpic/-fPIC case (see COMPUTE_NOW parameter when calling
+;; legitimize_pic_address ()).
+(define_insn_and_split "*stack_protect_combined_test_insn"
+  [(set (pc)
+	(if_then_else
+		(eq (match_operand:SI 0 "memory_operand" "m,m")
+		    (unspec:SI [(mem:SI (match_operand:SI 1 "guard_addr_operand" "X,X"))]
+			       UNSPEC_SP_TEST))
+		(label_ref (match_operand 2))
+		(pc)))
+   (clobber (match_scratch:SI 3 "=&l,&r"))
+   (clobber (match_scratch:SI 4 "=&l,&r"))
+   (clobber (reg:CC CC_REGNUM))]
+  ""
+  "#"
+  "reload_completed"
+  [(const_int 0)]
+{
+  rtx eq;
+
+  if (flag_pic)
+    {
+      /* Forces recomputing of GOT base now.  */
+      legitimize_pic_address (operands[1], SImode, operands[3], operands[4],
+			      true /*compute_now*/);
+    }
+  else
+    {
+      if (address_operand (operands[1], SImode))
+	operands[3] = operands[1];
+      else
+	{
+	  rtx mem = XEXP (force_const_mem (SImode, operands[1]), 0);
+	  emit_move_insn (operands[3], mem);
+	}
+    }
+  if (TARGET_32BIT)
+    {
+      emit_insn (gen_arm_stack_protect_test_insn (operands[4], operands[0],
+						  operands[3]));
+      rtx cc_reg = gen_rtx_REG (CC_Zmode, CC_REGNUM);
+      eq = gen_rtx_EQ (CC_Zmode, cc_reg, const0_rtx);
+      emit_jump_insn (gen_arm_cond_branch (operands[2], eq, cc_reg));
+    }
+  else
+    {
+      emit_insn (gen_thumb1_stack_protect_test_insn (operands[4], operands[0],
+						     operands[3]));
+      eq = gen_rtx_EQ (VOIDmode, operands[4], const0_rtx);
+      emit_jump_insn (gen_cbranchsi4 (eq, operands[4], const0_rtx,
+				      operands[2]));
+    }
+  DONE;
+}
+  [(set_attr "arch" "t1,32")]
+)
+
+(define_insn "arm_stack_protect_test_insn"
+  [(set (reg:CC_Z CC_REGNUM)
+	(compare:CC_Z (unspec:SI [(match_operand:SI 1 "memory_operand" "m,m")
+				  (mem:SI (match_operand:SI 2 "register_operand" "+l,r"))]
+				 UNSPEC_SP_TEST)
+		      (const_int 0)))
+   (clobber (match_operand:SI 0 "register_operand" "=&l,&r"))
+   (clobber (match_dup 2))]
+  "TARGET_32BIT"
+  "ldr\t%0, [%2]\;ldr\t%2, %1\;eors\t%0, %2, %0"
+  [(set_attr "length" "8,12")
+   (set_attr "type" "multiple")
+   (set_attr "arch" "t,32")]


I believe this needs to set the "conds" attribute to "set" so that the final_prescan stuff handles it correctly.

Ok with that change.

Thanks,
Kyrill

* Re: [PATCH, ARM, ping3] PR85434: Prevent spilling of stack protector guard's address on ARM
  2018-11-16 14:57                     ` [PATCH, ARM, ping3] " Thomas Preudhomme
  2018-11-21  0:32                       ` Jeff Law
  2018-11-21 16:07                       ` Kyrill Tkachov
@ 2018-11-21 17:54                       ` Segher Boessenkool
  2018-11-22 16:06                         ` Thomas Preudhomme
  2 siblings, 1 reply; 20+ messages in thread
From: Segher Boessenkool @ 2018-11-21 17:54 UTC (permalink / raw)
  To: Thomas Preudhomme
  Cc: kyrylo.tkachov, Ramana Radhakrishnan, Richard Earnshaw, gcc-patches

On Fri, Nov 16, 2018 at 02:56:46PM +0000, Thomas Preudhomme wrote:
> In case of high register pressure in PIC mode, address of the stack
> protector's guard can be spilled on ARM targets as shown in PR85434,
> thus allowing an attacker to control what the canary would be compared
> against. ARM does lack stack_protect_set and stack_protect_test insn
> patterns, but defining them does not help as the address is expanded
> regularly and the patterns only deal with the copy and test of the
> guard with the canary.
> 
> This problem does not occur for x86 targets because the PIC access and
> the test can be done in the same instruction. Aarch64 is exempt too
> because PIC access insn patterns are mov of UNSPEC, which prevents
> the second access in the epilogue from being CSEd in the cse_local pass
> with the first access in the prologue.

The unspecs are not CSEd because they are *different* unspecs (UNSPEC_SP_SET
vs. UNSPEC_SP_TEST; they have different args too, different number of args
even).  Two identical unspecs can be CSEd just fine.
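
To make that concrete, here is a toy model in C.  It is only a sketch: the
struct, enum and function names are hypothetical and have nothing to do with
the real data structures in cse.c, which compares RTL expressions
structurally.  The point it illustrates is just that two unspec expressions
are candidates for CSE when the unspec number, the operand count and every
operand match, and not otherwise.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the two unspec numbers discussed above.  */
enum toy_unspec { TOY_UNSPEC_SP_SET, TOY_UNSPEC_SP_TEST };

/* Toy unspec expression: an unspec number plus a short operand vector.
   Operands are modelled as plain strings for illustration only.  */
struct toy_unspec_expr
{
  enum toy_unspec code;
  size_t nargs;
  const char *args[2];
};

/* Return true if A and B are structurally identical, i.e. the same unspec
   number, the same number of operands and pairwise equal operands.  */
static bool
toy_unspec_equal (const struct toy_unspec_expr *a,
                  const struct toy_unspec_expr *b)
{
  if (a->code != b->code || a->nargs != b->nargs)
    return false;
  for (size_t i = 0; i < a->nargs; i++)
    if (strcmp (a->args[i], b->args[i]) != 0)
      return false;
  return true;
}
```

In this model a one-operand TOY_UNSPEC_SP_SET never compares equal to a
two-operand TOY_UNSPEC_SP_TEST, while two TOY_UNSPEC_SP_SET expressions with
the same operand do.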


Segher

* Re: [PATCH, ARM, ping3] PR85434: Prevent spilling of stack protector guard's address on ARM
  2018-11-21 16:07                       ` Kyrill Tkachov
@ 2018-11-22 14:49                         ` Thomas Preudhomme
  0 siblings, 0 replies; 20+ messages in thread
From: Thomas Preudhomme @ 2018-11-22 14:49 UTC (permalink / raw)
  To: kyrylo.tkachov; +Cc: Ramana Radhakrishnan, Richard Earnshaw, gcc-patches

[-- Attachment #1: Type: text/plain, Size: 39629 bytes --]

Thanks Kyrill. Committed the attached patch.

Best regards,

Thomas
On Wed, 21 Nov 2018 at 16:06, Kyrill Tkachov
<kyrylo.tkachov@foss.arm.com> wrote:
>
> Hi Thomas,
>
> Sorry for the delay.
>
> On 16/11/18 14:56, Thomas Preudhomme wrote:
> > Ping?
> >
> > Best regards,
> >
> > Thomas
> >
> > On Sat, 10 Nov 2018 at 15:07, Thomas Preudhomme
> > <thomas.preudhomme@linaro.org> wrote:
> >> Thanks Kyrill.
> >>
> >> Updated patch in attachment. Best regards,
> >>
> >> Thomas
> >> On Thu, 8 Nov 2018 at 15:53, Kyrill Tkachov <kyrylo.tkachov@foss.arm.com> wrote:
> >>> Hi Thomas,
> >>>
> >>> On 08/11/18 09:52, Thomas Preudhomme wrote:
> >>>> Ping?
> >>>>
> >>>> Best regards,
> >>>>
> >>>> Thomas
> >>>>
> >>>> On Thu, 1 Nov 2018 at 16:03, Thomas Preudhomme
> >>>> <thomas.preudhomme@linaro.org> wrote:
> >>>>> Ping?
> >>>>>
> >>>>> Best regards,
> >>>>>
> >>>>> Thomas
> >>>>> On Fri, 26 Oct 2018 at 22:41, Thomas Preudhomme
> >>>>> <thomas.preudhomme@linaro.org> wrote:
> >>>>>> Hi,
> >>>>>>
> >>>>>> Please find updated patch to fix PR85434: spilling of stack protector
> >>>>>> guard's address on ARM. Quite a few changes have been made to the ARM
> >>>>>> part since last round of review so I think it makes more sense to
> >>>>>> review it anew. Ran bootstrap + regression testsuite + glibc build +
> >>>>>> glibc regression testsuite for Arm and Thumb-2 and bootstrap +
> >>>>>> regression testsuite for Thumb-1. GCC's regression testsuite was run
> >>>>>> in 3 configurations in all those cases:
> >>>>>>
> >>>>>> - default configuration (no RUNTESTFLAGS)
> >>>>>> - with -fstack-protector-all
> >>>>>> - with -fPIC -fstack-protector-all (to exercise both code paths in the
> >>>>>> stack protector's split code)
> >>>>>>
> >>>>>> None of these shows any regression beyond some new scan failures with
> >>>>>> -fstack-protector-all or -fPIC due to unexpected code sequences for the
> >>>>>> testcases concerned and some guality swings due to less optimization
> >>>>>> with the new stack protector on.
> >>>>>>
> >>>>>> Patch description and ChangeLog below.
> >>>>>>
> >>>>>> In case of high register pressure in PIC mode, address of the stack
> >>>>>> protector's guard can be spilled on ARM targets as shown in PR85434,
> >>>>>> thus allowing an attacker to control what the canary would be compared
> >>>>>> against. ARM does lack stack_protect_set and stack_protect_test insn
> >>>>>> patterns, but defining them does not help as the address is expanded
> >>>>>> regularly and the patterns only deal with the copy and test of the
> >>>>>> guard with the canary.
> >>>>>>
> >>>>>> This problem does not occur for x86 targets because the PIC access and
> >>>>>> the test can be done in the same instruction. Aarch64 is exempt too
> >>>>>> because PIC access insn patterns are mov of UNSPEC, which prevents
> >>>>>> the second access in the epilogue from being CSEd in the cse_local pass
> >>>>>> with the first access in the prologue.
> >>>>>>
> >>>>>> The approach followed here is to create new "combined" set and test
> >>>>>> standard pattern names that take the unexpanded guard and do the set or
> >>>>>> test. This allows the target to use an opaque pattern (e.g. using UNSPEC)
> >>>>>> to hide the individual instructions being generated from the compiler and
> >>>>>> split the pattern into generic load, compare and branch instructions
> >>>>>> after register allocation, therefore avoiding any spilling. This is
> >>>>>> implemented here for the ARM targets. For targets not implementing these
> >>>>>> new standard pattern names, the existing stack_protect_set and
> >>>>>> stack_protect_test pattern names are used.
> >>>>>>
> >>>>>> To be able to split PIC access after register allocation, the PIC helper
> >>>>>> functions (require_pic_register, legitimize_pic_address and
> >>>>>> arm_load_pic_register) had to be augmented to force a new PIC register
> >>>>>> load and to control which register it loads into. This is because sharing
> >>>>>> the PIC register between prologue and epilogue could lead to spilling due
> >>>>>> to CSE again, which an attacker could use to control what the canary gets
> >>>>>> compared against.
> >>>>>>
> >>>>>> ChangeLog entries are as follows:
> >>>>>>
> >>>>>> *** gcc/ChangeLog ***
> >>>>>>
> >>>>>> 2018-10-26  Thomas Preud'homme  <thomas.preudhomme@linaro.org>
> >>>>>>
> >>>>>> * target-insns.def (stack_protect_combined_set): Define new standard
> >>>>>> pattern name.
> >>>>>> (stack_protect_combined_test): Likewise.
> >>>>>> * cfgexpand.c (stack_protect_prologue): Try new
> >>>>>> stack_protect_combined_set pattern first.
> >>>>>> * function.c (stack_protect_epilogue): Try new
> >>>>>> stack_protect_combined_test pattern first.
> >>>>>> * config/arm/arm.c (require_pic_register): Add pic_reg and compute_now
> >>>>>> parameters to control which register to use as PIC register and force
> >>>>>> reloading PIC register respectively.  Insert in the stream of insns if
> >>>>>> possible.
> >>>>>> (legitimize_pic_address): Expose above new parameters in prototype and
> >>>>>> adapt recursive calls accordingly.  Use pic_reg if non null instead of
> >>>>>> cached one.
> >>>>>> (arm_load_pic_register): Add pic_reg parameter and use it if non null.
> >>>>>> (arm_legitimize_address): Adapt to new legitimize_pic_address
> >>>>>> prototype.
> >>>>>> (thumb_legitimize_address): Likewise.
> >>>>>> (arm_emit_call_insn): Adapt to require_pic_register prototype change.
> >>>>>> (arm_expand_prologue): Adapt to arm_load_pic_register prototype change.
> >>>>>> (thumb1_expand_prologue): Likewise.
> >>>>>> * config/arm/arm-protos.h (legitimize_pic_address): Adapt to prototype
> >>>>>> change.
> >>>>>> (arm_load_pic_register): Likewise.
> >>>>>> * config/arm/predicates.md (guard_addr_operand): New predicate.
> >>>>>> (guard_operand): New predicate.
> >>>>>> * config/arm/arm.md (movsi expander): Adapt to legitimize_pic_address
> >>>>>> prototype change.
> >>>>>> (builtin_setjmp_receiver expander): Adapt to thumb1_expand_prologue
> >>>>>> prototype change.
> >>>>>> (stack_protect_combined_set): New expander.
> >>>>>> (stack_protect_combined_set_insn): New insn_and_split pattern.
> >>>>>> (stack_protect_set_insn): New insn pattern.
> >>>>>> (stack_protect_combined_test): New expander.
> >>>>>> (stack_protect_combined_test_insn): New insn_and_split pattern.
> >>>>>> (arm_stack_protect_test_insn): New insn pattern.
> >>>>>> * config/arm/thumb1.md (thumb1_stack_protect_test_insn): New insn pattern.
> >>>>>> * config/arm/unspecs.md (UNSPEC_SP_SET): New unspec.
> >>>>>> (UNSPEC_SP_TEST): Likewise.
> >>>>>> * doc/md.texi (stack_protect_combined_set): Document new standard
> >>>>>> pattern name.
> >>>>>> (stack_protect_set): Clarify that the operand for guard's address is
> >>>>>> legal.
> >>>>>> (stack_protect_combined_test): Document new standard pattern name.
> >>>>>> (stack_protect_test): Clarify that the operand for guard's address is
> >>>>>> legal.
> >>>>>>
> >>>>>> *** gcc/testsuite/ChangeLog ***
> >>>>>>
> >>>>>> 2018-07-05  Thomas Preud'homme  <thomas.preudhomme@linaro.org>
> >>>>>>
> >>>>>> * gcc.target/arm/pr85434.c: New test.
> >>>>>>
> >>>>>> Is this ok for trunk?
> >>>>>>
> >>>>>> Best regards,
> >>>>>>
> >>>>>> Thomas
> >>>>>> On Thu, 25 Oct 2018 at 15:54, Thomas Preudhomme
> >>>>>> <thomas.preudhomme@linaro.org> wrote:
> >>>>>>> Good thing I did, found a missing earlyclobber in the process.
> >>>>>>> Rerunning all tests again.
> >>>>>>>
> >>>>>>> Best regards,
> >>>>>>>
> >>>>>>> Thomas
> >>>>>>> On Wed, 24 Oct 2018 at 10:13, Thomas Preudhomme
> >>>>>>> <thomas.preudhomme@linaro.org> wrote:
> >>>>>>>> Please hold on for the reviews, found a small improvement that could
> >>>>>>>> be done. Am testing it right now, should have something by tonight or
> >>>>>>>> tomorrow.
> >>>>>>>>
> >>>>>>>> Best regards,
> >>>>>>>>
> >>>>>>>> Thomas
> >>>>>>>> On Tue, 23 Oct 2018 at 13:35, Thomas Preudhomme
> >>>>>>>> <thomas.preudhomme@linaro.org> wrote:
> >>>>>>>>> [Removing Jeff Law since middle end code hasn't changed]
> >>>>>>>>>
> >>>>>>>>> Hi,
> >>>>>>>>>
> >>>>>>>>> Given how memory operands are reloaded even with an X constraint, I've
> >>>>>>>>> reworked the patch for the combined set and combined test instructions
> >>>>>>>>> to keep the mem out of the match_operand and used an expander to
> >>>>>>>>> generate the right instruction pattern. I've also fixed some
> >>>>>>>>> longstanding issues with the patch when flag_pic is true and with
> >>>>>>>>> constraints for Thumb-1 that I hadn't noticed before due to using
> >>>>>>>>> dg-cmp-results in conjunction with test_summary which does not show
> >>>>>>>>> NA->FAIL (see [1]).
> >>>>>>>>>
> >>>>>>>>> All in all, I think the Arm code would do with a fresh review rather
> >>>>>>>>> than looking at the changes since last posted version. (unchanged)
> >>>>>>>>> ChangeLog entries are as follows:
> >>>>>>>>>
> >>>>>>>>> *** gcc/ChangeLog ***
> >>>>>>>>>
> >>>>>>>>> 2018-08-09  Thomas Preud'homme  <thomas.preudhomme@linaro.org>
> >>>>>>>>>
> >>>>>>>>>       * target-insns.def (stack_protect_combined_set): Define new standard
> >>>>>>>>>       pattern name.
> >>>>>>>>>       (stack_protect_combined_test): Likewise.
> >>>>>>>>>       * cfgexpand.c (stack_protect_prologue): Try new
> >>>>>>>>>       stack_protect_combined_set pattern first.
> >>>>>>>>>       * function.c (stack_protect_epilogue): Try new
> >>>>>>>>>       stack_protect_combined_test pattern first.
> >>>>>>>>>       * config/arm/arm.c (require_pic_register): Add pic_reg and compute_now
> >>>>>>>>>       parameters to control which register to use as PIC register and force
> >>>>>>>>>       reloading PIC register respectively.  Insert in the stream of insns if
> >>>>>>>>>       possible.
> >>>>>>>>>       (legitimize_pic_address): Expose above new parameters in prototype and
> >>>>>>>>>       adapt recursive calls accordingly.  Use pic_reg if non null instead of
> >>>>>>>>>       cached one.
> >>>>>>>>>       (arm_load_pic_register): Add pic_reg parameter and use it if non null.
> >>>>>>>>>       (arm_legitimize_address): Adapt to new legitimize_pic_address
> >>>>>>>>>       prototype.
> >>>>>>>>>       (thumb_legitimize_address): Likewise.
> >>>>>>>>>       (arm_emit_call_insn): Adapt to require_pic_register prototype change.
> >>>>>>>>>       (arm_expand_prologue): Adapt to arm_load_pic_register prototype change.
> >>>>>>>>>       (thumb1_expand_prologue): Likewise.
> >>>>>>>>>       * config/arm/arm-protos.h (legitimize_pic_address): Adapt to prototype
> >>>>>>>>>       change.
> >>>>>>>>>       (arm_load_pic_register): Likewise.
> >>>>>>>>>       * config/arm/predicates.md (guard_addr_operand): New predicate.
> >>>>>>>>>       (guard_operand): New predicate.
> >>>>>>>>>       * config/arm/arm.md (movsi expander): Adapt to legitimize_pic_address
> >>>>>>>>>       prototype change.
> >>>>>>>>>       (builtin_setjmp_receiver expander): Adapt to thumb1_expand_prologue
> >>>>>>>>>       prototype change.
> >>>>>>>>>       (stack_protect_combined_set): New expander.
> >>>>>>>>>       (stack_protect_combined_set_insn): New insn_and_split pattern.
> >>>>>>>>>       (stack_protect_set_insn): New insn pattern.
> >>>>>>>>>       (stack_protect_combined_test): New expander.
> >>>>>>>>>       (stack_protect_combined_test_insn): New insn_and_split pattern.
> >>>>>>>>>       (stack_protect_test_insn): New insn pattern.
> >>>>>>>>>       * config/arm/unspecs.md (UNSPEC_SP_SET): New unspec.
> >>>>>>>>>       (UNSPEC_SP_TEST): Likewise.
> >>>>>>>>>       * doc/md.texi (stack_protect_combined_set): Document new standard
> >>>>>>>>>       pattern name.
> >>>>>>>>>       (stack_protect_set): Clarify that the operand for guard's address is
> >>>>>>>>>       legal.
> >>>>>>>>>       (stack_protect_combined_test): Document new standard pattern name.
> >>>>>>>>>       (stack_protect_test): Clarify that the operand for guard's address is
> >>>>>>>>>       legal.
> >>>>>>>>>
> >>>>>>>>> *** gcc/testsuite/ChangeLog ***
> >>>>>>>>>
> >>>>>>>>> 2018-07-05  Thomas Preud'homme  <thomas.preudhomme@linaro.org>
> >>>>>>>>>
> >>>>>>>>>       * gcc.target/arm/pr85434.c: New test.
> >>>>>>>>>
> >>>>>>>>> Testing: Bootstrap and regression testing for Arm, Thumb-1 and Thumb-2
> >>>>>>>>> with (i) default flags, (ii) an extra -fstack-protector-all and (iii)
> >>>>>>>>> -fPIC -fstack-protector-all. A glibc build and testsuite run was also
> >>>>>>>>> performed for Arm and Thumb-2. Default flags show no regression and
> >>>>>>>>> the other runs have some expected scan-assembler failures (due to the
> >>>>>>>>> stack protector or -fPIC code sequence), as well as guality failures
> >>>>>>>>> (due to less optimized code with the new stack protector code) and some
> >>>>>>>>> execution failures in sibcall-9 and sibcall-10 under -fPIC
> >>>>>>>>> -fstack-protector-all due to the PIC sequence for the global variable
> >>>>>>>>> making the frame layout different for the 2 functions (these become
> >>>>>>>>> PASS when the global variable is made static).
> >>>>>>>>>
> >>>>>>>>> Is this ok for trunk?
> >>>>>>>>>
> >>>>>>>>> Best regards,
> >>>>>>>>>
> >>>>>>>>> Thomas
> >>>>>>>>>
> >>>>>>>>> [1] https://gcc.gnu.org/ml/gcc-patches/2018-10/msg01412.html
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On Tue, 25 Sep 2018 at 17:10, Kyrill Tkachov
> >>>>>>>>> <kyrylo.tkachov@foss.arm.com> wrote:
> >>>>>>>>>> Hi Thomas,
> >>>>>>>>>>
> >>>>>>>>>> On 29/08/18 10:51, Thomas Preudhomme wrote:
> >>>>>>>>>>> Resend hopefully without HTML this time.
> >>>>>>>>>>>
> >>>>>>>>>>> On Wed, 29 Aug 2018 at 10:49, Thomas Preudhomme
> >>>>>>>>>>> <thomas.preudhomme@linaro.org> wrote:
> >>>>>>>>>>>> Hi,
> >>>>>>>>>>>>
> >>>>>>>>>>>> I've reworked the patch fixing PR85434 (spilling of stack protector guard's address on ARM) to address the testsuite regression on powerpc and x86 as well as glibc testsuite regression on ARM. Issues were due to unconditionally attempting to generate the new patterns. The code now tests if there is a pattern for them for the target before generating them. In the ARM side of the patch, I've also added a more specific predicate for the new patterns. The new patch is found below.
> >>>>>>>>>>>>
> >>>>>>>>>>>>
> >>>>>>>>>>>> In case of high register pressure in PIC mode, address of the stack
> >>>>>>>>>>>> protector's guard can be spilled on ARM targets as shown in PR85434,
> >>>>>>>>>>>> thus allowing an attacker to control what the canary would be compared
> >>>>>>>>>>>> against. ARM does lack stack_protect_set and stack_protect_test insn
> >>>>>>>>>>>> patterns, defining them does not help as the address is expanded
> >>>>>>>>>>>> regularly and the patterns only deal with the copy and test of the
> >>>>>>>>>>>> guard with the canary.
> >>>>>>>>>>>>
> >>>>>>>>>>>> This problem does not occur for x86 targets because the PIC access and
> >>>>>>>>>>>> the test can be done in the same instruction. Aarch64 is exempt too
> >>>>>>>>>>>> because PIC access insn pattern are mov of UNSPEC which prevents it from
> >>>>>>>>>>>> the second access in the epilogue being CSEd in cse_local pass with the
> >>>>>>>>>>>> first access in the prologue.
> >>>>>>>>>>>>
> >>>>>>>>>>>> The approach followed here is to create new "combined" set and test
> >>>>>>>>>>>> standard pattern names that take the unexpanded guard and do the set or
> >>>>>>>>>>>> test. This allows the target to use an opaque pattern (eg. using UNSPEC)
> >>>>>>>>>>>> to hide the individual instructions being generated to the compiler and
> >>>>>>>>>>>> split the pattern into generic load, compare and branch instruction
> >>>>>>>>>>>> after register allocator, therefore avoiding any spilling. This is here
> >>>>>>>>>>>> implemented for the ARM targets. For targets not implementing these new
> >>>>>>>>>>>> standard pattern names, the existing stack_protect_set and
> >>>>>>>>>>>> stack_protect_test pattern names are used.
> >>>>>>>>>>>>
> >>>>>>>>>>>> To be able to split PIC access after register allocation, the functions
> >>>>>>>>>>>> had to be augmented to force a new PIC register load and to control
> >>>>>>>>>>>> which register it loads into. This is because sharing the PIC register
> >>>>>>>>>>>> between prologue and epilogue could lead to spilling due to CSE again
> >>>>>>>>>>>> which an attacker could use to control what the canary gets compared
> >>>>>>>>>>>> against.
> >>>>>>>>>>>>
> >>>>>>>>>>>> ChangeLog entries are as follows:
> >>>>>>>>>>>>
> >>>>>>>>>>>> *** gcc/ChangeLog ***
> >>>>>>>>>>>>
> >>>>>>>>>>>> 2018-08-09  Thomas Preud'homme  <thomas.preudhomme@linaro.org>
> >>>>>>>>>>>>
> >>>>>>>>>>>>        * target-insns.def (stack_protect_combined_set): Define new standard
> >>>>>>>>>>>>        pattern name.
> >>>>>>>>>>>>        (stack_protect_combined_test): Likewise.
> >>>>>>>>>>>>        * cfgexpand.c (stack_protect_prologue): Try new
> >>>>>>>>>>>>        stack_protect_combined_set pattern first.
> >>>>>>>>>>>>        * function.c (stack_protect_epilogue): Try new
> >>>>>>>>>>>>        stack_protect_combined_test pattern first.
> >>>>>>>>>>>>        * config/arm/arm.c (require_pic_register): Add pic_reg and compute_now
> >>>>>>>>>>>>        parameters to control which register to use as PIC register and force
> >>>>>>>>>>>>        reloading PIC register respectively.  Insert in the stream of insns if
> >>>>>>>>>>>>        possible.
> >>>>>>>>>>>>        (legitimize_pic_address): Expose above new parameters in prototype and
> >>>>>>>>>>>>        adapt recursive calls accordingly.
> >>>>>>>>>>>>        (arm_legitimize_address): Adapt to new legitimize_pic_address
> >>>>>>>>>>>>        prototype.
> >>>>>>>>>>>>        (thumb_legitimize_address): Likewise.
> >>>>>>>>>>>>        (arm_emit_call_insn): Adapt to new require_pic_register prototype.
> >>>>>>>>>>>>        * config/arm/arm-protos.h (legitimize_pic_address): Adapt to prototype
> >>>>>>>>>>>>        change.
> >>>>>>>>>>>>        * config/arm/predicated.md (guard_operand): New predicate.
> >>>>>>>>>> Typo, predicates.md is the filename.
> >>>>>>>>>>
> >>>>>>>>>> Looks ok to me otherwise.
> >>>>>>>>>> Thank you for your patience.
> >>>>>>>>>>
> >>>>>>>>>> Kyrill
> >>>>>>>>>>
> >>>>>>>>>>>>        * config/arm/arm.md (movsi expander): Adapt to legitimize_pic_address
> >>>>>>>>>>>>        prototype change.
> >>>>>>>>>>>>        (stack_protect_combined_set): New insn_and_split pattern.
> >>>>>>>>>>>>        (stack_protect_set): New insn pattern.
> >>>>>>>>>>>>        (stack_protect_combined_test): New insn_and_split pattern.
> >>>>>>>>>>>>        (stack_protect_test): New insn pattern.
> >>>>>>>>>>>>        * config/arm/unspecs.md (UNSPEC_SP_SET): New unspec.
> >>>>>>>>>>>>        (UNSPEC_SP_TEST): Likewise.
> >>>>>>>>>>>>        * doc/md.texi (stack_protect_combined_set): Document new standard
> >>>>>>>>>>>>        pattern name.
> >>>>>>>>>>>>        (stack_protect_set): Clarify that the operand for guard's address is
> >>>>>>>>>>>>        legal.
> >>>>>>>>>>>>        (stack_protect_combined_test): Document new standard pattern name.
> >>>>>>>>>>>>        (stack_protect_test): Clarify that the operand for guard's address is
> >>>>>>>>>>>>        legal.
> >>>>>>>>>>>>
> >>>>>>>>>>>> *** gcc/testsuite/ChangeLog ***
> >>>>>>>>>>>>
> >>>>>>>>>>>> 2018-07-05  Thomas Preud'homme  <thomas.preudhomme@linaro.org>
> >>>>>>>>>>>>
> >>>>>>>>>>>>        * gcc.target/arm/pr85434.c: New test.
> >>>>>>>>>>>> Testing:
> >>>>>>>>>>>>
> >>>>>>>>>>>> native x86_64: bootstrap + testsuite -> no regression, can see failures with previous version of patch but not with new version
> >>>>>>>>>>>> native powerpc64: bootstrap + testsuite -> no regression, can see failures from pr86834 with previous version of patch but not with new version
> >>>>>>>>>>>> cross ARM Linux: build + testsuite -> no regression
> >>>>>>>>>>>> native ARM Thumb-2: bootstrap + testsuite + glibc build + glibc test -> no regression
> >>>>>>>>>>>> native ARM Arm: bootstrap + testsuite + glibc build + glibc test -> no regression
> >>>>>>>>>>>> Aarch64: bootstrap + testsuite + glibc build + glibc test-> no regression
> >>>>>>>>>>>>
> >>>>>>>>>>>> Is this ok for trunk?
> >>>>>>>>>>>>
> >>>>>>>>>>>> Best regards,
> >>>>>>>>>>>>
> >>>>>>>>>>>> Thomas
> >>>
> >>> @@ -21998,7 +22039,7 @@ arm_expand_prologue (void)
> >>>          mask &= THUMB2_WORK_REGS;
> >>>          if (!IS_NESTED (func_type))
> >>>        mask |= (1 << IP_REGNUM);
> >>> -      arm_load_pic_register (mask);
> >>> +      arm_load_pic_register (mask, 0);
> >>>
> >>>
> >>>
> >>> Please use NULL_RTX rather than 0 here and in the other occurrences in the patch.
> >>> At a glance the changes look ok, but I'll have a deeper look later.
> >>>
> >>> Thanks,
> >>> Kyrill
>
> diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
> index 35ca276e4ad..c8d0374f8ae 100644
> --- a/gcc/cfgexpand.c
> +++ b/gcc/cfgexpand.c
> @@ -6131,6 +6131,23 @@ stack_protect_prologue (void)
>     rtx x, y;
>
>     x = expand_normal (crtl->stack_protect_guard);
> +
> +  if (targetm.have_stack_protect_combined_set () && guard_decl)
> +    {
> +      gcc_assert (DECL_P (guard_decl));
> +      y = DECL_RTL (guard_decl);
> +
> +      /* Allow the target to compute address of Y and copy it to X without
> +        leaking Y into a register.  This combined address + copy pattern
> +        allows the target to prevent spilling of any intermediate results by
> +        splitting it after register allocator.  */
> +      if (rtx_insn *insn = targetm.gen_stack_protect_combined_set (x, y))
> +       {
> +         emit_insn (insn);
> +         return;
> +       }
> +    }
> +
>     if (guard_decl)
>       y = expand_normal (guard_decl);
>     else
> diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
> index 0dfb3ac59a6..f508bc5a455 100644
> --- a/gcc/config/arm/arm-protos.h
> +++ b/gcc/config/arm/arm-protos.h
> @@ -28,7 +28,7 @@ extern enum unwind_info_type arm_except_unwind_info (struct gcc_options *);
>   extern int use_return_insn (int, rtx);
>   extern bool use_simple_return_p (void);
>   extern enum reg_class arm_regno_class (int);
> -extern void arm_load_pic_register (unsigned long);
> +extern void arm_load_pic_register (unsigned long, rtx);
>   extern int arm_volatile_func (void);
>   extern void arm_expand_prologue (void);
>   extern void arm_expand_epilogue (bool);
> @@ -69,7 +69,7 @@ extern int const_ok_for_dimode_op (HOST_WIDE_INT, enum rtx_code);
>   extern int arm_split_constant (RTX_CODE, machine_mode, rtx,
>                                HOST_WIDE_INT, rtx, rtx, int);
>   extern int legitimate_pic_operand_p (rtx);
> -extern rtx legitimize_pic_address (rtx, machine_mode, rtx);
> +extern rtx legitimize_pic_address (rtx, machine_mode, rtx, rtx, bool);
>   extern rtx legitimize_tls_address (rtx, rtx);
>   extern bool arm_legitimate_address_p (machine_mode, rtx, bool);
>   extern int arm_legitimate_address_outer_p (machine_mode, rtx, RTX_CODE, int);
> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
> index 8810df53aa3..96b8150d34c 100644
> --- a/gcc/config/arm/arm.c
> +++ b/gcc/config/arm/arm.c
> @@ -7371,21 +7371,34 @@ legitimate_pic_operand_p (rtx x)
>     return 1;
>   }
>
> -/* Record that the current function needs a PIC register.  Initialize
> -   cfun->machine->pic_reg if we have not already done so.  */
> +/* Record that the current function needs a PIC register.  If PIC_REG is null,
> +   a new pseudo is allocated as PIC register, otherwise PIC_REG is used.  In
> +   both case cfun->machine->pic_reg is initialized if we have not already done
> +   so.  COMPUTE_NOW decide whether and where to set the PIC register.  If true,
> +   PIC register is reloaded in the current position of the instruction stream
> +   irregardless of whether it was loaded before.  Otherwise, it is only loaded
> +   if not already done so (crtl->uses_pic_offset_table is null).  Note that
> +   nonnull PIC_REG is only supported iff COMPUTE_NOW is true and null PIC_REG
> +   is only supported iff COMPUTE_NOW is false.  */
>
>   static void
> -require_pic_register (void)
> +require_pic_register (rtx pic_reg, bool compute_now)
>   {
> +  gcc_assert (compute_now == (pic_reg != NULL_RTX));
> +
>     /* A lot of the logic here is made obscure by the fact that this
>        routine gets called as part of the rtx cost estimation process.
>        We don't want those calls to affect any assumptions about the real
>        function; and further, we can't call entry_of_function() until we
>        start the real expansion process.  */
> -  if (!crtl->uses_pic_offset_table)
> +  if (!crtl->uses_pic_offset_table || compute_now)
>       {
> -      gcc_assert (can_create_pseudo_p ());
> +      gcc_assert (can_create_pseudo_p ()
> +                 || (pic_reg != NULL_RTX
> +                     && REG_P (pic_reg)
> +                     && GET_MODE (pic_reg) == Pmode));
>         if (arm_pic_register != INVALID_REGNUM
> +         && !compute_now
>           && !(TARGET_THUMB1 && arm_pic_register > LAST_LO_REGNUM))
>         {
>           if (!cfun->machine->pic_reg)
> @@ -7401,8 +7414,10 @@ require_pic_register (void)
>         {
>           rtx_insn *seq, *insn;
>
> +         if (pic_reg == NULL_RTX)
> +           pic_reg = gen_reg_rtx (Pmode);
>           if (!cfun->machine->pic_reg)
> -           cfun->machine->pic_reg = gen_reg_rtx (Pmode);
> +           cfun->machine->pic_reg = pic_reg;
>
>           /* Play games to avoid marking the function as needing pic
>              if we are being called as part of the cost-estimation
> @@ -7413,11 +7428,12 @@ require_pic_register (void)
>               start_sequence ();
>
>               if (TARGET_THUMB1 && arm_pic_register != INVALID_REGNUM
> -                 && arm_pic_register > LAST_LO_REGNUM)
> +                 && arm_pic_register > LAST_LO_REGNUM
> +                 && !compute_now)
>                 emit_move_insn (cfun->machine->pic_reg,
>                                 gen_rtx_REG (Pmode, arm_pic_register));
>               else
> -               arm_load_pic_register (0UL);
> +               arm_load_pic_register (0UL, pic_reg);
>
>               seq = get_insns ();
>               end_sequence ();
> @@ -7430,16 +7446,33 @@ require_pic_register (void)
>                  we can't yet emit instructions directly in the final
>                  insn stream.  Queue the insns on the entry edge, they will
>                  be committed after everything else is expanded.  */
> -             insert_insn_on_edge (seq,
> -                                  single_succ_edge (ENTRY_BLOCK_PTR_FOR_FN (cfun)));
> +             if (currently_expanding_to_rtl)
> +               insert_insn_on_edge (seq,
> +                                    single_succ_edge
> +                                    (ENTRY_BLOCK_PTR_FOR_FN (cfun)));
> +             else
> +               emit_insn (seq);
>             }
>         }
>       }
>   }
>
> +/* Legitimize PIC load to ORIG into REG.  If REG is NULL, a new pseudo is
> +   created to hold the result of the load.  If not NULL, PIC_REG indicates
> +   which register to use as PIC register, otherwise it is decided by register
> +   allocator.  COMPUTE_NOW forces the PIC register to be loaded at the current
> +   location in the instruction stream, irregardless of whether it was loaded
> +   previously.  Note that nonnull PIC_REG is only supported iff COMPUTE_NOW is
> +   true and null PIC_REG is only supported iff COMPUTE_NOW is false.
> +
> +   Returns the register REG into which the PIC load is performed.  */
> +
>   rtx
> -legitimize_pic_address (rtx orig, machine_mode mode, rtx reg)
> +legitimize_pic_address (rtx orig, machine_mode mode, rtx reg, rtx pic_reg,
> +                       bool compute_now)
>   {
> +  gcc_assert (compute_now == (pic_reg != NULL_RTX));
> +
>     if (GET_CODE (orig) == SYMBOL_REF
>         || GET_CODE (orig) == LABEL_REF)
>       {
> @@ -7472,9 +7505,12 @@ legitimize_pic_address (rtx orig, machine_mode mode, rtx reg)
>           rtx mem;
>
>           /* If this function doesn't have a pic register, create one now.  */
> -         require_pic_register ();
> +         require_pic_register (pic_reg, compute_now);
> +
> +         if (pic_reg == NULL_RTX)
> +           pic_reg = cfun->machine->pic_reg;
>
> -         pat = gen_calculate_pic_address (reg, cfun->machine->pic_reg, orig);
> +         pat = gen_calculate_pic_address (reg, pic_reg, orig);
>
>           /* Make the MEM as close to a constant as possible.  */
>           mem = SET_SRC (pat);
> @@ -7523,9 +7559,11 @@ legitimize_pic_address (rtx orig, machine_mode mode, rtx reg)
>
>         gcc_assert (GET_CODE (XEXP (orig, 0)) == PLUS);
>
> -      base = legitimize_pic_address (XEXP (XEXP (orig, 0), 0), Pmode, reg);
> +      base = legitimize_pic_address (XEXP (XEXP (orig, 0), 0), Pmode, reg,
> +                                    pic_reg, compute_now);
>         offset = legitimize_pic_address (XEXP (XEXP (orig, 0), 1), Pmode,
> -                                      base == reg ? 0 : reg);
> +                                      base == reg ? 0 : reg, pic_reg,
> +                                      compute_now);
>
>         if (CONST_INT_P (offset))
>         {
> @@ -7625,16 +7663,17 @@ static GTY(()) int pic_labelno;
>      low register.  */
>
>   void
> -arm_load_pic_register (unsigned long saved_regs ATTRIBUTE_UNUSED)
> +arm_load_pic_register (unsigned long saved_regs ATTRIBUTE_UNUSED, rtx pic_reg)
>   {
> -  rtx l1, labelno, pic_tmp, pic_rtx, pic_reg;
> +  rtx l1, labelno, pic_tmp, pic_rtx;
>
>     if (crtl->uses_pic_offset_table == 0 || TARGET_SINGLE_PIC_BASE)
>       return;
>
>     gcc_assert (flag_pic);
>
> -  pic_reg = cfun->machine->pic_reg;
> +  if (pic_reg == NULL_RTX)
> +    pic_reg = cfun->machine->pic_reg;
>     if (TARGET_VXWORKS_RTP)
>       {
>         pic_rtx = gen_rtx_SYMBOL_REF (Pmode, VXWORKS_GOTT_BASE);
> @@ -8710,7 +8749,8 @@ arm_legitimize_address (rtx x, rtx orig_x, machine_mode mode)
>       {
>         /* We need to find and carefully transform any SYMBOL and LABEL
>          references; so go back to the original address expression.  */
> -      rtx new_x = legitimize_pic_address (orig_x, mode, NULL_RTX);
> +      rtx new_x = legitimize_pic_address (orig_x, mode, NULL_RTX, NULL_RTX,
> +                                         false /*compute_now*/);
>
>         if (new_x != orig_x)
>         x = new_x;
> @@ -8778,7 +8818,8 @@ thumb_legitimize_address (rtx x, rtx orig_x, machine_mode mode)
>       {
>         /* We need to find and carefully transform any SYMBOL and LABEL
>          references; so go back to the original address expression.  */
> -      rtx new_x = legitimize_pic_address (orig_x, mode, NULL_RTX);
> +      rtx new_x = legitimize_pic_address (orig_x, mode, NULL_RTX, NULL_RTX,
> +                                         false /*compute_now*/);
>
>         if (new_x != orig_x)
>         x = new_x;
> @@ -18066,7 +18107,7 @@ arm_emit_call_insn (rtx pat, rtx addr, bool sibcall)
>           ? !targetm.binds_local_p (SYMBOL_REF_DECL (addr))
>           : !SYMBOL_REF_LOCAL_P (addr)))
>       {
> -      require_pic_register ();
> +      require_pic_register (NULL_RTX, false /*compute_now*/);
>         use_reg (&CALL_INSN_FUNCTION_USAGE (insn), cfun->machine->pic_reg);
>       }
>
> @@ -21998,7 +22039,7 @@ arm_expand_prologue (void)
>         mask &= THUMB2_WORK_REGS;
>         if (!IS_NESTED (func_type))
>         mask |= (1 << IP_REGNUM);
> -      arm_load_pic_register (mask);
> +      arm_load_pic_register (mask, NULL_RTX);
>       }
>
>     /* If we are profiling, make sure no instructions are scheduled before
> @@ -25229,7 +25270,7 @@ thumb1_expand_prologue (void)
>     /* Load the pic register before setting the frame pointer,
>        so we can use r7 as a temporary work register.  */
>     if (flag_pic && arm_pic_register != INVALID_REGNUM)
> -    arm_load_pic_register (live_regs_mask);
> +    arm_load_pic_register (live_regs_mask, NULL_RTX);
>
>     if (!frame_pointer_needed && CALLER_INTERWORKING_SLOT_SIZE > 0)
>       emit_move_insn (gen_rtx_REG (Pmode, ARM_HARD_FRAME_POINTER_REGNUM),
> diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
> index 270b8e454b3..1f702f81fd1 100644
> --- a/gcc/config/arm/arm.md
> +++ b/gcc/config/arm/arm.md
> @@ -6021,7 +6021,8 @@
>         operands[1] = legitimize_pic_address (operands[1], SImode,
>                                             (!can_create_pseudo_p ()
>                                              ? operands[0]
> -                                            : 0));
> +                                            : NULL_RTX), NULL_RTX,
> +                                           false /*compute_now*/);
>     }
>     "
>   )
> @@ -6309,7 +6310,7 @@
>     /* r3 is clobbered by set/longjmp, so we can use it as a scratch
>        register.  */
>     if (arm_pic_register != INVALID_REGNUM)
> -    arm_load_pic_register (1UL << 3);
> +    arm_load_pic_register (1UL << 3, NULL_RTX);
>     DONE;
>   }")
>
> @@ -8634,6 +8635,163 @@
>      (set_attr "conds" "clob")]
>   )
>
> +;; Named patterns for stack smashing protection.
> +(define_expand "stack_protect_combined_set"
> +  [(parallel
> +     [(set (match_operand:SI 0 "memory_operand" "")
> +          (unspec:SI [(match_operand:SI 1 "guard_operand" "")]
> +                     UNSPEC_SP_SET))
> +      (clobber (match_scratch:SI 2 ""))
> +      (clobber (match_scratch:SI 3 ""))])]
> +  ""
> +  ""
> +)
> +
> +;; Use a separate insn from the above expand to be able to have the mem outside
> +;; the operand #1 when register allocation comes. This is needed to avoid LRA
> +;; try to reload the guard since we need to control how PIC access is done in
> +;; the -fpic/-fPIC case (see COMPUTE_NOW parameter when calling
> +;; legitimize_pic_address ()).
> +(define_insn_and_split "*stack_protect_combined_set_insn"
> +  [(set (match_operand:SI 0 "memory_operand" "=m,m")
> +       (unspec:SI [(mem:SI (match_operand:SI 1 "guard_addr_operand" "X,X"))]
> +                  UNSPEC_SP_SET))
> +   (clobber (match_scratch:SI 2 "=&l,&r"))
> +   (clobber (match_scratch:SI 3 "=&l,&r"))]
> +  ""
> +  "#"
> +  "reload_completed"
> +  [(parallel [(set (match_dup 0) (unspec:SI [(mem:SI (match_dup 2))]
> +                                           UNSPEC_SP_SET))
> +             (clobber (match_dup 2))])]
> +  "
> +{
> +  if (flag_pic)
> +    {
> +      /* Forces recomputing of GOT base now.  */
> +      legitimize_pic_address (operands[1], SImode, operands[2], operands[3],
> +                             true /*compute_now*/);
> +    }
> +  else
> +    {
> +      if (address_operand (operands[1], SImode))
> +       operands[2] = operands[1];
> +      else
> +       {
> +         rtx mem = XEXP (force_const_mem (SImode, operands[1]), 0);
> +         emit_move_insn (operands[2], mem);
> +       }
> +    }
> +}"
> +  [(set_attr "arch" "t1,32")]
> +)
> +
> +(define_insn "*stack_protect_set_insn"
> +  [(set (match_operand:SI 0 "memory_operand" "=m,m")
> +       (unspec:SI [(mem:SI (match_operand:SI 1 "register_operand" "+&l,&r"))]
> +        UNSPEC_SP_SET))
> +   (clobber (match_dup 1))]
> +  ""
> +  "@
> +   ldr\\t%1, [%1]\;str\\t%1, %0\;movs\t%1,#0
> +   ldr\\t%1, [%1]\;str\\t%1, %0\;mov\t%1,#0"
> +  [(set_attr "length" "8,12")
> +   (set_attr "conds" "clob,nocond")
> +   (set_attr "type" "multiple")
> +   (set_attr "arch" "t1,32")]
> +)
> +
> +(define_expand "stack_protect_combined_test"
> +  [(parallel
> +     [(set (pc)
> +          (if_then_else
> +               (eq (match_operand:SI 0 "memory_operand" "")
> +                   (unspec:SI [(match_operand:SI 1 "guard_operand" "")]
> +                              UNSPEC_SP_TEST))
> +               (label_ref (match_operand 2))
> +               (pc)))
> +      (clobber (match_scratch:SI 3 ""))
> +      (clobber (match_scratch:SI 4 ""))
> +      (clobber (reg:CC CC_REGNUM))])]
> +  ""
> +  ""
> +)
> +
> +;; Use a separate insn from the above expand to be able to have the mem outside
> +;; the operand #1 when register allocation comes. This is needed to avoid LRA
> +;; try to reload the guard since we need to control how PIC access is done in
> +;; the -fpic/-fPIC case (see COMPUTE_NOW parameter when calling
> +;; legitimize_pic_address ()).
> +(define_insn_and_split "*stack_protect_combined_test_insn"
> +  [(set (pc)
> +       (if_then_else
> +               (eq (match_operand:SI 0 "memory_operand" "m,m")
> +                   (unspec:SI [(mem:SI (match_operand:SI 1 "guard_addr_operand" "X,X"))]
> +                              UNSPEC_SP_TEST))
> +               (label_ref (match_operand 2))
> +               (pc)))
> +   (clobber (match_scratch:SI 3 "=&l,&r"))
> +   (clobber (match_scratch:SI 4 "=&l,&r"))
> +   (clobber (reg:CC CC_REGNUM))]
> +  ""
> +  "#"
> +  "reload_completed"
> +  [(const_int 0)]
> +{
> +  rtx eq;
> +
> +  if (flag_pic)
> +    {
> +      /* Forces recomputing of GOT base now.  */
> +      legitimize_pic_address (operands[1], SImode, operands[3], operands[4],
> +                             true /*compute_now*/);
> +    }
> +  else
> +    {
> +      if (address_operand (operands[1], SImode))
> +       operands[3] = operands[1];
> +      else
> +       {
> +         rtx mem = XEXP (force_const_mem (SImode, operands[1]), 0);
> +         emit_move_insn (operands[3], mem);
> +       }
> +    }
> +  if (TARGET_32BIT)
> +    {
> +      emit_insn (gen_arm_stack_protect_test_insn (operands[4], operands[0],
> +                                                 operands[3]));
> +      rtx cc_reg = gen_rtx_REG (CC_Zmode, CC_REGNUM);
> +      eq = gen_rtx_EQ (CC_Zmode, cc_reg, const0_rtx);
> +      emit_jump_insn (gen_arm_cond_branch (operands[2], eq, cc_reg));
> +    }
> +  else
> +    {
> +      emit_insn (gen_thumb1_stack_protect_test_insn (operands[4], operands[0],
> +                                                    operands[3]));
> +      eq = gen_rtx_EQ (VOIDmode, operands[4], const0_rtx);
> +      emit_jump_insn (gen_cbranchsi4 (eq, operands[4], const0_rtx,
> +                                     operands[2]));
> +    }
> +  DONE;
> +}
> +  [(set_attr "arch" "t1,32")]
> +)
> +
> +(define_insn "arm_stack_protect_test_insn"
> +  [(set (reg:CC_Z CC_REGNUM)
> +       (compare:CC_Z (unspec:SI [(match_operand:SI 1 "memory_operand" "m,m")
> +                                 (mem:SI (match_operand:SI 2 "register_operand" "+l,r"))]
> +                                UNSPEC_SP_TEST)
> +                     (const_int 0)))
> +   (clobber (match_operand:SI 0 "register_operand" "=&l,&r"))
> +   (clobber (match_dup 2))]
> +  "TARGET_32BIT"
> +  "ldr\t%0, [%2]\;ldr\t%2, %1\;eors\t%0, %2, %0"
> +  [(set_attr "length" "8,12")
> +   (set_attr "type" "multiple")
> +   (set_attr "arch" "t,32")]
>
>
> I believe this needs to set the "conds" attribute to "set" so that the final_prescan stuff handles it correctly.
>
> Ok with that change.
>
> Thanks,
> Kyrill
>

[-- Attachment #2: fix_pr85434_prevent_spilling_stack_protector_guard_address.patch --]
[-- Type: text/x-patch, Size: 35728 bytes --]

From a098336953823160e976679a9dd6bec2ba2592bd Mon Sep 17 00:00:00 2001
From: Thomas Preud'homme <thomas.preudhomme@linaro.org>
Date: Tue, 8 May 2018 15:47:05 +0100
Subject: [PATCH] PR85434: Prevent spilling of stack protector guard's address
 on ARM

In case of high register pressure in PIC mode, the address of the stack
protector's guard can be spilled on ARM targets as shown in PR85434,
thus allowing an attacker to control what the canary would be compared
against. ARM lacks stack_protect_set and stack_protect_test insn
patterns, but defining them would not help because the address is
expanded normally and the patterns only deal with the copy and test of
the guard with the canary.

This problem does not occur for x86 targets because the PIC access and
the test can be done in the same instruction. AArch64 is exempt too
because its PIC access insn patterns are a mov of an UNSPEC, which
prevents the second access in the epilogue from being CSEd with the
first access in the prologue by the cse_local pass.

The approach followed here is to create new "combined" set and test
standard pattern names that take the unexpanded guard and do the set or
test. This allows the target to use an opaque pattern (e.g. using
UNSPEC) to hide the individual instructions being generated from the
compiler and to split the pattern into generic load, compare and branch
instructions after register allocation, therefore avoiding any
spilling. This is implemented here for the ARM targets. For targets not
implementing these new standard pattern names, the existing
stack_protect_set and stack_protect_test pattern names are used.
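
The "try the combined pattern first, otherwise fall back" dispatch
described above can be sketched in plain C. This is only an
illustrative model, not GCC code: struct target_hooks, emit_guard_set
and arm_gen_combined_set are hypothetical names standing in for the
targetm.have_stack_protect_combined_set () and
targetm.gen_stack_protect_combined_set () hooks and for the existing
stack_protect_set path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the target hook table; not the real targetm.  */
struct target_hooks
{
  /* NULL when the target has no combined pattern.  Returns true if the
     combined address + copy pattern was generated.  */
  bool (*gen_combined_set) (const char *guard);
};

enum set_kind { SET_COMBINED, SET_LEGACY };

/* Model of a target (such as ARM after this patch) that provides the
   combined pattern and splits it after register allocation.  */
static bool
arm_gen_combined_set (const char *guard)
{
  printf ("combined set of %s, split after register allocation\n", guard);
  return true;
}

/* Try the combined pattern first; fall back to the existing
   expand-address-then-copy sequence when the target lacks the pattern
   or declines to generate it.  */
static enum set_kind
emit_guard_set (const struct target_hooks *t, const char *guard)
{
  assert (t != NULL && guard != NULL);

  if (t->gen_combined_set != NULL && t->gen_combined_set (guard))
    return SET_COMBINED;

  printf ("legacy set of %s, address expanded normally\n", guard);
  return SET_LEGACY;
}
```

With a hook table whose gen_combined_set is NULL, emit_guard_set
returns SET_LEGACY, which mirrors how targets without the new pattern
names keep their current behaviour.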

To be able to split the PIC access after register allocation, the ARM
PIC functions (require_pic_register, legitimize_pic_address and
arm_load_pic_register) had to be augmented to force a new PIC register
load and to control which register it loads into. This is because
sharing the PIC register between prologue and epilogue could again lead
to spilling due to CSE, which an attacker could use to control what the
canary gets compared against.
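
The contract between the new pic_reg and compute_now parameters can be
sketched as a small C model. It is a simplified illustration under
stated assumptions, not the code in arm.c: struct pic_state, NO_REG,
FIRST_PSEUDO and require_pic_register_model are hypothetical, registers
are plain integers, and the model returns the register it used purely
so the behaviour can be checked (the real require_pic_register returns
void).

```c
#include <assert.h>
#include <stdbool.h>

#define NO_REG (-1)        /* hypothetical stand-in for NULL_RTX */
#define FIRST_PSEUDO 100   /* hypothetical first pseudo register number */

/* Hypothetical, much simplified model of the per-function PIC state.  */
struct pic_state
{
  int cached_reg;   /* models cfun->machine->pic_reg, NO_REG if unset */
  int next_pseudo;  /* next number a gen_reg_rtx-like call would return */
  int loads;        /* number of PIC base loads emitted so far */
};

static int
require_pic_register_model (struct pic_state *s, int pic_reg,
                            bool compute_now)
{
  /* Same contract as the assert added by the patch: an explicit
     register is passed exactly when a reload is forced.  */
  assert (compute_now == (pic_reg != NO_REG));

  /* Ordinary callers reuse the register loaded earlier.  */
  if (s->cached_reg != NO_REG && !compute_now)
    return s->cached_reg;

  /* No explicit register: allocate a fresh pseudo.  */
  if (pic_reg == NO_REG)
    pic_reg = s->next_pseudo++;
  if (s->cached_reg == NO_REG)
    s->cached_reg = pic_reg;

  /* Emit a load of the PIC base into pic_reg.  With compute_now this
     happens even if a load was already emitted earlier.  */
  s->loads++;
  return pic_reg;
}
```

In this model, a forced reload into a caller-chosen register leaves the
cached register untouched, so the stack protector sequence does not
depend on a value computed in the prologue.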

ChangeLog entries are as follows:

*** gcc/ChangeLog ***

2018-10-26  Thomas Preud'homme  <thomas.preudhomme@linaro.org>

	* target-insns.def (stack_protect_combined_set): Define new standard
	pattern name.
	(stack_protect_combined_test): Likewise.
	* cfgexpand.c (stack_protect_prologue): Try new
	stack_protect_combined_set pattern first.
	* function.c (stack_protect_epilogue): Try new
	stack_protect_combined_test pattern first.
	* config/arm/arm.c (require_pic_register): Add pic_reg and compute_now
	parameters to control which register to use as PIC register and force
	reloading PIC register respectively.  Insert in the stream of insns if
	possible.
	(legitimize_pic_address): Expose above new parameters in prototype and
	adapt recursive calls accordingly.  Use pic_reg if non null instead of
	cached one.
	(arm_load_pic_register): Add pic_reg parameter and use it if non null.
	(arm_legitimize_address): Adapt to new legitimize_pic_address
	prototype.
	(thumb_legitimize_address): Likewise.
	(arm_emit_call_insn): Adapt to require_pic_register prototype change.
	(arm_expand_prologue): Adapt to arm_load_pic_register prototype change.
	(thumb1_expand_prologue): Likewise.
	* config/arm/arm-protos.h (legitimize_pic_address): Adapt to prototype
	change.
	(arm_load_pic_register): Likewise.
	* config/arm/predicates.md (guard_addr_operand): New predicate.
	(guard_operand): New predicate.
	* config/arm/arm.md (movsi expander): Adapt to legitimize_pic_address
	prototype change.
	(builtin_setjmp_receiver expander): Adapt to arm_load_pic_register
	prototype change.
	(stack_protect_combined_set): New expander.
	(stack_protect_combined_set_insn): New insn_and_split pattern.
	(stack_protect_set_insn): New insn pattern.
	(stack_protect_combined_test): New expander.
	(stack_protect_combined_test_insn): New insn_and_split pattern.
	(arm_stack_protect_test_insn): New insn pattern.
	* config/arm/thumb1.md (thumb1_stack_protect_test_insn): New insn pattern.
	* config/arm/unspecs.md (UNSPEC_SP_SET): New unspec.
	(UNSPEC_SP_TEST): Likewise.
	* doc/md.texi (stack_protect_combined_set): Document new standard
	pattern name.
	(stack_protect_set): Clarify that the operand for guard's address is
	legal.
	(stack_protect_combined_test): Document new standard pattern name.
	(stack_protect_test): Clarify that the operand for guard's address is
	legal.

*** gcc/testsuite/ChangeLog ***

2018-07-05  Thomas Preud'homme  <thomas.preudhomme@linaro.org>

	* gcc.target/arm/pr85434.c: New test.

Testing: Bootstrapped on ARM in both Arm and Thumb-2 mode as well as on
AArch64. The testsuite shows no regression on these 3 variants, either
with default flags or with -fstack-protector-all.

Is this ok for trunk? If yes, would this be acceptable as a backport to
GCC 6, 7 and 8 provided that no regression is found?

Best regards,

Thomas

Change-Id: I993343e3063fb570af706624e08b475732a5ec57
---
 gcc/cfgexpand.c                        |  17 +++
 gcc/config/arm/arm-protos.h            |   4 +-
 gcc/config/arm/arm.c                   |  87 ++++++++---
 gcc/config/arm/arm.md                  | 163 +++++++++++++++++++-
 gcc/config/arm/predicates.md           |  17 +++
 gcc/config/arm/thumb1.md               |  13 ++
 gcc/config/arm/unspecs.md              |   3 +
 gcc/doc/md.texi                        |  55 ++++++-
 gcc/function.c                         |  32 +++-
 gcc/target-insns.def                   |   2 +
 gcc/testsuite/gcc.target/arm/pr85434.c | 200 +++++++++++++++++++++++++
 11 files changed, 550 insertions(+), 43 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/arm/pr85434.c

diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
index 35ca276e4ad..c8d0374f8ae 100644
--- a/gcc/cfgexpand.c
+++ b/gcc/cfgexpand.c
@@ -6131,6 +6131,23 @@ stack_protect_prologue (void)
   rtx x, y;
 
   x = expand_normal (crtl->stack_protect_guard);
+
+  if (targetm.have_stack_protect_combined_set () && guard_decl)
+    {
+      gcc_assert (DECL_P (guard_decl));
+      y = DECL_RTL (guard_decl);
+
+      /* Allow the target to compute address of Y and copy it to X without
+	 leaking Y into a register.  This combined address + copy pattern
+	 allows the target to prevent spilling of any intermediate results by
+	 splitting it after register allocation.  */
+      if (rtx_insn *insn = targetm.gen_stack_protect_combined_set (x, y))
+	{
+	  emit_insn (insn);
+	  return;
+	}
+    }
+
   if (guard_decl)
     y = expand_normal (guard_decl);
   else
diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 0dfb3ac59a6..f508bc5a455 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -28,7 +28,7 @@ extern enum unwind_info_type arm_except_unwind_info (struct gcc_options *);
 extern int use_return_insn (int, rtx);
 extern bool use_simple_return_p (void);
 extern enum reg_class arm_regno_class (int);
-extern void arm_load_pic_register (unsigned long);
+extern void arm_load_pic_register (unsigned long, rtx);
 extern int arm_volatile_func (void);
 extern void arm_expand_prologue (void);
 extern void arm_expand_epilogue (bool);
@@ -69,7 +69,7 @@ extern int const_ok_for_dimode_op (HOST_WIDE_INT, enum rtx_code);
 extern int arm_split_constant (RTX_CODE, machine_mode, rtx,
 			       HOST_WIDE_INT, rtx, rtx, int);
 extern int legitimate_pic_operand_p (rtx);
-extern rtx legitimize_pic_address (rtx, machine_mode, rtx);
+extern rtx legitimize_pic_address (rtx, machine_mode, rtx, rtx, bool);
 extern rtx legitimize_tls_address (rtx, rtx);
 extern bool arm_legitimate_address_p (machine_mode, rtx, bool);
 extern int arm_legitimate_address_outer_p (machine_mode, rtx, RTX_CODE, int);
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 8810df53aa3..96b8150d34c 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -7371,21 +7371,34 @@ legitimate_pic_operand_p (rtx x)
   return 1;
 }
 
-/* Record that the current function needs a PIC register.  Initialize
-   cfun->machine->pic_reg if we have not already done so.  */
+/* Record that the current function needs a PIC register.  If PIC_REG is null,
+   a new pseudo is allocated as PIC register, otherwise PIC_REG is used.  In
+   both cases cfun->machine->pic_reg is initialized if we have not already done
+   so.  COMPUTE_NOW decides whether and where to set the PIC register.  If true,
+   PIC register is reloaded in the current position of the instruction stream
+   regardless of whether it was loaded before.  Otherwise, it is only loaded
+   if not already done so (crtl->uses_pic_offset_table is null).  Note that
+   nonnull PIC_REG is only supported iff COMPUTE_NOW is true and null PIC_REG
+   is only supported iff COMPUTE_NOW is false.  */
 
 static void
-require_pic_register (void)
+require_pic_register (rtx pic_reg, bool compute_now)
 {
+  gcc_assert (compute_now == (pic_reg != NULL_RTX));
+
   /* A lot of the logic here is made obscure by the fact that this
      routine gets called as part of the rtx cost estimation process.
      We don't want those calls to affect any assumptions about the real
      function; and further, we can't call entry_of_function() until we
      start the real expansion process.  */
-  if (!crtl->uses_pic_offset_table)
+  if (!crtl->uses_pic_offset_table || compute_now)
     {
-      gcc_assert (can_create_pseudo_p ());
+      gcc_assert (can_create_pseudo_p ()
+		  || (pic_reg != NULL_RTX
+		      && REG_P (pic_reg)
+		      && GET_MODE (pic_reg) == Pmode));
       if (arm_pic_register != INVALID_REGNUM
+	  && !compute_now
 	  && !(TARGET_THUMB1 && arm_pic_register > LAST_LO_REGNUM))
 	{
 	  if (!cfun->machine->pic_reg)
@@ -7401,8 +7414,10 @@ require_pic_register (void)
 	{
 	  rtx_insn *seq, *insn;
 
+	  if (pic_reg == NULL_RTX)
+	    pic_reg = gen_reg_rtx (Pmode);
 	  if (!cfun->machine->pic_reg)
-	    cfun->machine->pic_reg = gen_reg_rtx (Pmode);
+	    cfun->machine->pic_reg = pic_reg;
 
 	  /* Play games to avoid marking the function as needing pic
 	     if we are being called as part of the cost-estimation
@@ -7413,11 +7428,12 @@ require_pic_register (void)
 	      start_sequence ();
 
 	      if (TARGET_THUMB1 && arm_pic_register != INVALID_REGNUM
-		  && arm_pic_register > LAST_LO_REGNUM)
+		  && arm_pic_register > LAST_LO_REGNUM
+		  && !compute_now)
 		emit_move_insn (cfun->machine->pic_reg,
 				gen_rtx_REG (Pmode, arm_pic_register));
 	      else
-		arm_load_pic_register (0UL);
+		arm_load_pic_register (0UL, pic_reg);
 
 	      seq = get_insns ();
 	      end_sequence ();
@@ -7430,16 +7446,33 @@ require_pic_register (void)
 	         we can't yet emit instructions directly in the final
 		 insn stream.  Queue the insns on the entry edge, they will
 		 be committed after everything else is expanded.  */
-	      insert_insn_on_edge (seq,
-				   single_succ_edge (ENTRY_BLOCK_PTR_FOR_FN (cfun)));
+	      if (currently_expanding_to_rtl)
+		insert_insn_on_edge (seq,
+				     single_succ_edge
+				     (ENTRY_BLOCK_PTR_FOR_FN (cfun)));
+	      else
+		emit_insn (seq);
 	    }
 	}
     }
 }
 
+/* Legitimize PIC load to ORIG into REG.  If REG is NULL, a new pseudo is
+   created to hold the result of the load.  If not NULL, PIC_REG indicates
+   which register to use as PIC register, otherwise it is decided by register
+   allocator.  COMPUTE_NOW forces the PIC register to be loaded at the current
+   location in the instruction stream, regardless of whether it was loaded
+   previously.  Note that nonnull PIC_REG is only supported iff COMPUTE_NOW is
+   true and null PIC_REG is only supported iff COMPUTE_NOW is false.
+
+   Returns the register REG into which the PIC load is performed.  */
+
 rtx
-legitimize_pic_address (rtx orig, machine_mode mode, rtx reg)
+legitimize_pic_address (rtx orig, machine_mode mode, rtx reg, rtx pic_reg,
+			bool compute_now)
 {
+  gcc_assert (compute_now == (pic_reg != NULL_RTX));
+
   if (GET_CODE (orig) == SYMBOL_REF
       || GET_CODE (orig) == LABEL_REF)
     {
@@ -7472,9 +7505,12 @@ legitimize_pic_address (rtx orig, machine_mode mode, rtx reg)
 	  rtx mem;
 
 	  /* If this function doesn't have a pic register, create one now.  */
-	  require_pic_register ();
+	  require_pic_register (pic_reg, compute_now);
+
+	  if (pic_reg == NULL_RTX)
+	    pic_reg = cfun->machine->pic_reg;
 
-	  pat = gen_calculate_pic_address (reg, cfun->machine->pic_reg, orig);
+	  pat = gen_calculate_pic_address (reg, pic_reg, orig);
 
 	  /* Make the MEM as close to a constant as possible.  */
 	  mem = SET_SRC (pat);
@@ -7523,9 +7559,11 @@ legitimize_pic_address (rtx orig, machine_mode mode, rtx reg)
 
       gcc_assert (GET_CODE (XEXP (orig, 0)) == PLUS);
 
-      base = legitimize_pic_address (XEXP (XEXP (orig, 0), 0), Pmode, reg);
+      base = legitimize_pic_address (XEXP (XEXP (orig, 0), 0), Pmode, reg,
+				     pic_reg, compute_now);
       offset = legitimize_pic_address (XEXP (XEXP (orig, 0), 1), Pmode,
-				       base == reg ? 0 : reg);
+				       base == reg ? 0 : reg, pic_reg,
+				       compute_now);
 
       if (CONST_INT_P (offset))
 	{
@@ -7625,16 +7663,17 @@ static GTY(()) int pic_labelno;
    low register.  */
 
 void
-arm_load_pic_register (unsigned long saved_regs ATTRIBUTE_UNUSED)
+arm_load_pic_register (unsigned long saved_regs ATTRIBUTE_UNUSED, rtx pic_reg)
 {
-  rtx l1, labelno, pic_tmp, pic_rtx, pic_reg;
+  rtx l1, labelno, pic_tmp, pic_rtx;
 
   if (crtl->uses_pic_offset_table == 0 || TARGET_SINGLE_PIC_BASE)
     return;
 
   gcc_assert (flag_pic);
 
-  pic_reg = cfun->machine->pic_reg;
+  if (pic_reg == NULL_RTX)
+    pic_reg = cfun->machine->pic_reg;
   if (TARGET_VXWORKS_RTP)
     {
       pic_rtx = gen_rtx_SYMBOL_REF (Pmode, VXWORKS_GOTT_BASE);
@@ -8710,7 +8749,8 @@ arm_legitimize_address (rtx x, rtx orig_x, machine_mode mode)
     {
       /* We need to find and carefully transform any SYMBOL and LABEL
 	 references; so go back to the original address expression.  */
-      rtx new_x = legitimize_pic_address (orig_x, mode, NULL_RTX);
+      rtx new_x = legitimize_pic_address (orig_x, mode, NULL_RTX, NULL_RTX,
+					  false /*compute_now*/);
 
       if (new_x != orig_x)
 	x = new_x;
@@ -8778,7 +8818,8 @@ thumb_legitimize_address (rtx x, rtx orig_x, machine_mode mode)
     {
       /* We need to find and carefully transform any SYMBOL and LABEL
 	 references; so go back to the original address expression.  */
-      rtx new_x = legitimize_pic_address (orig_x, mode, NULL_RTX);
+      rtx new_x = legitimize_pic_address (orig_x, mode, NULL_RTX, NULL_RTX,
+					  false /*compute_now*/);
 
       if (new_x != orig_x)
 	x = new_x;
@@ -18066,7 +18107,7 @@ arm_emit_call_insn (rtx pat, rtx addr, bool sibcall)
 	  ? !targetm.binds_local_p (SYMBOL_REF_DECL (addr))
 	  : !SYMBOL_REF_LOCAL_P (addr)))
     {
-      require_pic_register ();
+      require_pic_register (NULL_RTX, false /*compute_now*/);
       use_reg (&CALL_INSN_FUNCTION_USAGE (insn), cfun->machine->pic_reg);
     }
 
@@ -21998,7 +22039,7 @@ arm_expand_prologue (void)
       mask &= THUMB2_WORK_REGS;
       if (!IS_NESTED (func_type))
 	mask |= (1 << IP_REGNUM);
-      arm_load_pic_register (mask);
+      arm_load_pic_register (mask, NULL_RTX);
     }
 
   /* If we are profiling, make sure no instructions are scheduled before
@@ -25229,7 +25270,7 @@ thumb1_expand_prologue (void)
   /* Load the pic register before setting the frame pointer,
      so we can use r7 as a temporary work register.  */
   if (flag_pic && arm_pic_register != INVALID_REGNUM)
-    arm_load_pic_register (live_regs_mask);
+    arm_load_pic_register (live_regs_mask, NULL_RTX);
 
   if (!frame_pointer_needed && CALLER_INTERWORKING_SLOT_SIZE > 0)
     emit_move_insn (gen_rtx_REG (Pmode, ARM_HARD_FRAME_POINTER_REGNUM),
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 270b8e454b3..c2a8e696115 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -6021,7 +6021,8 @@
       operands[1] = legitimize_pic_address (operands[1], SImode,
 					    (!can_create_pseudo_p ()
 					     ? operands[0]
-					     : 0));
+					     : NULL_RTX), NULL_RTX,
+					    false /*compute_now*/);
   }
   "
 )
@@ -6309,7 +6310,7 @@
   /* r3 is clobbered by set/longjmp, so we can use it as a scratch
      register.  */
   if (arm_pic_register != INVALID_REGNUM)
-    arm_load_pic_register (1UL << 3);
+    arm_load_pic_register (1UL << 3, NULL_RTX);
   DONE;
 }")
 
@@ -8634,6 +8635,164 @@
    (set_attr "conds" "clob")]
 )
 
+;; Named patterns for stack smashing protection.
+(define_expand "stack_protect_combined_set"
+  [(parallel
+     [(set (match_operand:SI 0 "memory_operand" "")
+	   (unspec:SI [(match_operand:SI 1 "guard_operand" "")]
+		      UNSPEC_SP_SET))
+      (clobber (match_scratch:SI 2 ""))
+      (clobber (match_scratch:SI 3 ""))])]
+  ""
+  ""
+)
+
+;; Use a separate insn from the above expand to be able to have the mem outside
+;; the operand #1 when register allocation comes. This is needed to prevent LRA
+;; from trying to reload the guard since we need to control how PIC access is
+;; done in the -fpic/-fPIC case (see COMPUTE_NOW parameter when calling
+;; legitimize_pic_address ()).
+(define_insn_and_split "*stack_protect_combined_set_insn"
+  [(set (match_operand:SI 0 "memory_operand" "=m,m")
+	(unspec:SI [(mem:SI (match_operand:SI 1 "guard_addr_operand" "X,X"))]
+		   UNSPEC_SP_SET))
+   (clobber (match_scratch:SI 2 "=&l,&r"))
+   (clobber (match_scratch:SI 3 "=&l,&r"))]
+  ""
+  "#"
+  "reload_completed"
+  [(parallel [(set (match_dup 0) (unspec:SI [(mem:SI (match_dup 2))]
+					    UNSPEC_SP_SET))
+	      (clobber (match_dup 2))])]
+  "
+{
+  if (flag_pic)
+    {
+      /* Forces recomputing of GOT base now.  */
+      legitimize_pic_address (operands[1], SImode, operands[2], operands[3],
+			      true /*compute_now*/);
+    }
+  else
+    {
+      if (address_operand (operands[1], SImode))
+	operands[2] = operands[1];
+      else
+	{
+	  rtx mem = XEXP (force_const_mem (SImode, operands[1]), 0);
+	  emit_move_insn (operands[2], mem);
+	}
+    }
+}"
+  [(set_attr "arch" "t1,32")]
+)
+
+(define_insn "*stack_protect_set_insn"
+  [(set (match_operand:SI 0 "memory_operand" "=m,m")
+	(unspec:SI [(mem:SI (match_operand:SI 1 "register_operand" "+&l,&r"))]
+	 UNSPEC_SP_SET))
+   (clobber (match_dup 1))]
+  ""
+  "@
+   ldr\\t%1, [%1]\;str\\t%1, %0\;movs\t%1,#0
+   ldr\\t%1, [%1]\;str\\t%1, %0\;mov\t%1,#0"
+  [(set_attr "length" "8,12")
+   (set_attr "conds" "clob,nocond")
+   (set_attr "type" "multiple")
+   (set_attr "arch" "t1,32")]
+)
+
+(define_expand "stack_protect_combined_test"
+  [(parallel
+     [(set (pc)
+	   (if_then_else
+		(eq (match_operand:SI 0 "memory_operand" "")
+		    (unspec:SI [(match_operand:SI 1 "guard_operand" "")]
+			       UNSPEC_SP_TEST))
+		(label_ref (match_operand 2))
+		(pc)))
+      (clobber (match_scratch:SI 3 ""))
+      (clobber (match_scratch:SI 4 ""))
+      (clobber (reg:CC CC_REGNUM))])]
+  ""
+  ""
+)
+
+;; Use a separate insn from the above expand to be able to have the mem outside
+;; the operand #1 when register allocation comes. This is needed to prevent LRA
+;; from trying to reload the guard since we need to control how PIC access is
+;; done in the -fpic/-fPIC case (see COMPUTE_NOW parameter when calling
+;; legitimize_pic_address ()).
+(define_insn_and_split "*stack_protect_combined_test_insn"
+  [(set (pc)
+	(if_then_else
+		(eq (match_operand:SI 0 "memory_operand" "m,m")
+		    (unspec:SI [(mem:SI (match_operand:SI 1 "guard_addr_operand" "X,X"))]
+			       UNSPEC_SP_TEST))
+		(label_ref (match_operand 2))
+		(pc)))
+   (clobber (match_scratch:SI 3 "=&l,&r"))
+   (clobber (match_scratch:SI 4 "=&l,&r"))
+   (clobber (reg:CC CC_REGNUM))]
+  ""
+  "#"
+  "reload_completed"
+  [(const_int 0)]
+{
+  rtx eq;
+
+  if (flag_pic)
+    {
+      /* Forces recomputing of GOT base now.  */
+      legitimize_pic_address (operands[1], SImode, operands[3], operands[4],
+			      true /*compute_now*/);
+    }
+  else
+    {
+      if (address_operand (operands[1], SImode))
+	operands[3] = operands[1];
+      else
+	{
+	  rtx mem = XEXP (force_const_mem (SImode, operands[1]), 0);
+	  emit_move_insn (operands[3], mem);
+	}
+    }
+  if (TARGET_32BIT)
+    {
+      emit_insn (gen_arm_stack_protect_test_insn (operands[4], operands[0],
+						  operands[3]));
+      rtx cc_reg = gen_rtx_REG (CC_Zmode, CC_REGNUM);
+      eq = gen_rtx_EQ (CC_Zmode, cc_reg, const0_rtx);
+      emit_jump_insn (gen_arm_cond_branch (operands[2], eq, cc_reg));
+    }
+  else
+    {
+      emit_insn (gen_thumb1_stack_protect_test_insn (operands[4], operands[0],
+						     operands[3]));
+      eq = gen_rtx_EQ (VOIDmode, operands[4], const0_rtx);
+      emit_jump_insn (gen_cbranchsi4 (eq, operands[4], const0_rtx,
+				      operands[2]));
+    }
+  DONE;
+}
+  [(set_attr "arch" "t1,32")]
+)
+
+(define_insn "arm_stack_protect_test_insn"
+  [(set (reg:CC_Z CC_REGNUM)
+	(compare:CC_Z (unspec:SI [(match_operand:SI 1 "memory_operand" "m,m")
+				  (mem:SI (match_operand:SI 2 "register_operand" "+l,r"))]
+				 UNSPEC_SP_TEST)
+		      (const_int 0)))
+   (clobber (match_operand:SI 0 "register_operand" "=&l,&r"))
+   (clobber (match_dup 2))]
+  "TARGET_32BIT"
+  "ldr\t%0, [%2]\;ldr\t%2, %1\;eors\t%0, %2, %0"
+  [(set_attr "length" "8,12")
+   (set_attr "conds" "set")
+   (set_attr "type" "multiple")
+   (set_attr "arch" "t,32")]
+)
+
 (define_expand "casesi"
   [(match_operand:SI 0 "s_register_operand" "")	; index to jump on
    (match_operand:SI 1 "const_int_operand" "")	; lower bound
diff --git a/gcc/config/arm/predicates.md b/gcc/config/arm/predicates.md
index 7e198f9bce4..69718ee9c7a 100644
--- a/gcc/config/arm/predicates.md
+++ b/gcc/config/arm/predicates.md
@@ -31,6 +31,23 @@
 	      || REGNO_REG_CLASS (REGNO (op)) != NO_REGS));
 })
 
+; Predicate for stack protector guard's address in
+; stack_protect_combined_set_insn and stack_protect_combined_test_insn patterns
+(define_predicate "guard_addr_operand"
+  (match_test "true")
+{
+  return (CONSTANT_ADDRESS_P (op)
+	  || !targetm.cannot_force_const_mem (mode, op));
+})
+
+; Predicate for stack protector guard in stack_protect_combined_set and
+; stack_protect_combined_test patterns
+(define_predicate "guard_operand"
+  (match_code "mem")
+{
+  return guard_addr_operand (XEXP (op, 0), mode);
+})
+
 (define_predicate "imm_for_neon_inv_logic_operand"
   (match_code "const_vector")
 {
diff --git a/gcc/config/arm/thumb1.md b/gcc/config/arm/thumb1.md
index 19dcdbcdd73..cd199c9c529 100644
--- a/gcc/config/arm/thumb1.md
+++ b/gcc/config/arm/thumb1.md
@@ -1962,4 +1962,17 @@
   }"
   [(set_attr "type" "mov_reg")]
 )
+
+(define_insn "thumb1_stack_protect_test_insn"
+  [(set (match_operand:SI 0 "register_operand" "=&l")
+	(unspec:SI [(match_operand:SI 1 "memory_operand" "m")
+		    (mem:SI (match_operand:SI 2 "register_operand" "+l"))]
+	 UNSPEC_SP_TEST))
+   (clobber (match_dup 2))]
+  "TARGET_THUMB1"
+  "ldr\t%0, [%2]\;ldr\t%2, %1\;eors\t%0, %2, %0"
+  [(set_attr "length" "8")
+   (set_attr "conds" "set")
+   (set_attr "type" "multiple")]
+)
 \f
diff --git a/gcc/config/arm/unspecs.md b/gcc/config/arm/unspecs.md
index 19416736ef9..8f9dbcb08dc 100644
--- a/gcc/config/arm/unspecs.md
+++ b/gcc/config/arm/unspecs.md
@@ -86,6 +86,9 @@
   UNSPEC_PROBE_STACK    ; Probe stack memory reference
   UNSPEC_NONSECURE_MEM	; Represent non-secure memory in ARMv8-M with
 			; security extension
+  UNSPEC_SP_SET		; Represent the setting of stack protector's canary
+  UNSPEC_SP_TEST	; Represent the testing of stack protector's canary
+			; against the guard.
 ])
 
 (define_c_enum "unspec" [
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 4801d68a207..0667a242ef3 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -7424,22 +7424,61 @@ builtins.
 The get/set patterns have a single output/input operand respectively,
 with @var{mode} intended to be @code{Pmode}.
 
+@cindex @code{stack_protect_combined_set} instruction pattern
+@item @samp{stack_protect_combined_set}
+This pattern, if defined, moves a @code{ptr_mode} value from an address
+whose declaration RTX is given in operand 1 to the memory in operand 0
+without leaving the value in a register afterward.  If several
+instructions are needed by the target to perform the operation (e.g. to
+load the address from a GOT entry then load the @code{ptr_mode} value
+and finally store it), it is the backend's responsibility to ensure no
+intermediate result gets spilled.  This is to avoid leaking the value
+some place that an attacker might use to rewrite the stack guard slot
+after having clobbered it.
+
+If this pattern is not defined, then the address declaration is
+expanded first in the standard way and a @code{stack_protect_set}
+pattern is then generated to move the value from that address to the
+address in operand 0.
+
 @cindex @code{stack_protect_set} instruction pattern
 @item @samp{stack_protect_set}
-This pattern, if defined, moves a @code{ptr_mode} value from the memory
-in operand 1 to the memory in operand 0 without leaving the value in
-a register afterward.  This is to avoid leaking the value some place
-that an attacker might use to rewrite the stack guard slot after
-having clobbered it.
+This pattern, if defined, moves a @code{ptr_mode} value from the valid
+memory location in operand 1 to the memory in operand 0 without leaving
+the value in a register afterward.  This is to avoid leaking the value
+some place that an attacker might use to rewrite the stack guard slot
+after having clobbered it.
+
+Note: on targets where the addressing modes do not allow loading
+directly from the stack guard address, the address is first expanded in
+the standard way, which could cause some spills.
 
 If this pattern is not defined, then a plain move pattern is generated.
 
+@cindex @code{stack_protect_combined_test} instruction pattern
+@item @samp{stack_protect_combined_test}
+This pattern, if defined, compares a @code{ptr_mode} value from an
+address whose declaration RTX is given in operand 1 with the memory in
+operand 0 without leaving the value in a register afterward and
+branches to operand 2 if the values were equal.  If several
+instructions are needed by the target to perform the operation (e.g. to
+load the address from a GOT entry then load the @code{ptr_mode} value
+and finally compare it), it is the backend's responsibility to ensure no
+intermediate result gets spilled.  This is to avoid leaking the value
+some place that an attacker might use to rewrite the stack guard slot
+after having clobbered it.
+
+If this pattern is not defined, then the address declaration is
+expanded first in the standard way and a @code{stack_protect_test}
+pattern is then generated to compare the value from that address to the
+value at the memory in operand 0.
+
 @cindex @code{stack_protect_test} instruction pattern
 @item @samp{stack_protect_test}
 This pattern, if defined, compares a @code{ptr_mode} value from the
-memory in operand 1 with the memory in operand 0 without leaving the
-value in a register afterward and branches to operand 2 if the values
-were equal.
+valid memory location in operand 1 with the memory in operand 0 without
+leaving the value in a register afterward and branches to operand 2 if
+the values were equal.
 
 If this pattern is not defined, then a plain compare pattern and
 conditional branch pattern is used.
diff --git a/gcc/function.c b/gcc/function.c
index 302438323c8..17aecedd981 100644
--- a/gcc/function.c
+++ b/gcc/function.c
@@ -4892,18 +4892,34 @@ stack_protect_epilogue (void)
   tree guard_decl = targetm.stack_protect_guard ();
   rtx_code_label *label = gen_label_rtx ();
   rtx x, y;
-  rtx_insn *seq;
+  rtx_insn *seq = NULL;
 
   x = expand_normal (crtl->stack_protect_guard);
-  if (guard_decl)
-    y = expand_normal (guard_decl);
+
+  if (targetm.have_stack_protect_combined_test () && guard_decl)
+    {
+      gcc_assert (DECL_P (guard_decl));
+      y = DECL_RTL (guard_decl);
+      /* Allow the target to compute address of Y and compare it with X without
+	 leaking Y into a register.  This combined address + compare pattern
+	 allows the target to prevent spilling of any intermediate results by
+	 splitting it after register allocation.  */
+      seq = targetm.gen_stack_protect_combined_test (x, y, label);
+    }
   else
-    y = const0_rtx;
+    {
+      if (guard_decl)
+	y = expand_normal (guard_decl);
+      else
+	y = const0_rtx;
+
+      /* Allow the target to compare Y with X without leaking either into
+	 a register.  */
+      if (targetm.have_stack_protect_test ())
+	seq = targetm.gen_stack_protect_test (x, y, label);
+    }
 
-  /* Allow the target to compare Y with X without leaking either into
-     a register.  */
-  if (targetm.have_stack_protect_test ()
-      && ((seq = targetm.gen_stack_protect_test (x, y, label)) != NULL_RTX))
+  if (seq)
     emit_insn (seq);
   else
     emit_cmp_and_jump_insns (x, y, EQ, NULL_RTX, ptr_mode, 1, label);
diff --git a/gcc/target-insns.def b/gcc/target-insns.def
index 9a552c3d11c..d39889b3522 100644
--- a/gcc/target-insns.def
+++ b/gcc/target-insns.def
@@ -96,7 +96,9 @@ DEF_TARGET_INSN (sibcall_value, (rtx x0, rtx x1, rtx opt2, rtx opt3,
 DEF_TARGET_INSN (simple_return, (void))
 DEF_TARGET_INSN (split_stack_prologue, (void))
 DEF_TARGET_INSN (split_stack_space_check, (rtx x0, rtx x1))
+DEF_TARGET_INSN (stack_protect_combined_set, (rtx x0, rtx x1))
 DEF_TARGET_INSN (stack_protect_set, (rtx x0, rtx x1))
+DEF_TARGET_INSN (stack_protect_combined_test, (rtx x0, rtx x1, rtx x2))
 DEF_TARGET_INSN (stack_protect_test, (rtx x0, rtx x1, rtx x2))
 DEF_TARGET_INSN (store_multiple, (rtx x0, rtx x1, rtx x2))
 DEF_TARGET_INSN (tablejump, (rtx x0, rtx x1))
diff --git a/gcc/testsuite/gcc.target/arm/pr85434.c b/gcc/testsuite/gcc.target/arm/pr85434.c
new file mode 100644
index 00000000000..4143a861f7c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/pr85434.c
@@ -0,0 +1,200 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target fstack_protector }*/
+/* { dg-require-effective-target fpic }*/
+/* { dg-additional-options "-Os -fpic -fstack-protector-strong" } */
+
+#include <stddef.h>
+#include <stdint.h>
+
+
+static const unsigned char base64_enc_map[64] =
+{
+    'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J',
+    'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T',
+    'U', 'V', 'W', 'X', 'Y', 'Z', 'a', 'b', 'c', 'd',
+    'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n',
+    'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x',
+    'y', 'z', '0', '1', '2', '3', '4', '5', '6', '7',
+    '8', '9', '+', '/'
+};
+
+#define BASE64_SIZE_T_MAX   ( (size_t) -1 ) /* SIZE_T_MAX is not standard */
+
+
+void doSmth(void *x);
+
+#include <string.h>
+
+
+void check(int n) {
+  
+    if (!(n % 2 && n % 3 && n % 5)) {
+ __asm__  (   "add    r8, r8, #1;" );
+    }
+}
+
+uint32_t test(
+  uint32_t a1,
+  uint32_t a2,
+  size_t a3,
+  size_t a4,
+  size_t a5,
+  size_t a6)
+{
+  uint32_t nResult = 0;
+  uint8_t* h = 0L;
+  uint8_t X[128];
+  uint8_t mac[64];
+  size_t len;
+
+  doSmth(&a1);
+  doSmth(&a2);
+  doSmth(&a3);
+  doSmth(&a4);
+  doSmth(&a5);
+  doSmth(&a6);
+
+  if (a1 && a2 && a3 && a4 && a5 && a6) {
+    nResult = 1;
+    h = (void*)X;
+    len = sizeof(X);
+    memset(X, a2, len);
+    len -= 64;
+    memcpy(mac ,X, len);
+    *(h + len) = a6;
+
+    {
+
+
+        unsigned char *dst = X;
+        size_t dlen = a3;
+        size_t *olen = &a6;
+        const unsigned char *src = mac;
+        size_t slen = a4;
+    size_t i, n;
+    int C1, C2, C3;
+    unsigned char *p;
+
+    if( slen == 0 )
+    {
+        *olen = 0;
+        return( 0 );
+    }
+
+    n = slen / 3 + ( slen % 3 != 0 );
+
+    if( n > ( BASE64_SIZE_T_MAX - 1 ) / 4 )
+    {
+        *olen = BASE64_SIZE_T_MAX;
+        return( 0 );
+    }
+
+    n *= 4;
+
+    if( ( dlen < n + 1 ) || ( NULL == dst ) )
+    {
+        *olen = n + 1;
+        return( 0 );
+    }
+
+    n = ( slen / 3 ) * 3;
+
+    for( i = 0, p = dst; i < n; i += 3 )
+    {
+        C1 = *src++;
+        C2 = *src++;
+        C3 = *src++;
+
+        check(i);
+
+        *p++ = base64_enc_map[(C1 >> 2) & 0x3F];
+        *p++ = base64_enc_map[(((C1 &  3) << 4) + (C2 >> 4)) & 0x3F];
+        *p++ = base64_enc_map[(((C2 & 15) << 2) + (C3 >> 6)) & 0x3F];
+        *p++ = base64_enc_map[C3 & 0x3F];
+    }
+
+    if( i < slen )
+    {
+        C1 = *src++;
+        C2 = ( ( i + 1 ) < slen ) ? *src++ : 0;
+
+        *p++ = base64_enc_map[(C1 >> 2) & 0x3F];
+        *p++ = base64_enc_map[(((C1 & 3) << 4) + (C2 >> 4)) & 0x3F];
+
+        if( ( i + 1 ) < slen )
+             *p++ = base64_enc_map[((C2 & 15) << 2) & 0x3F];
+        else *p++ = '=';
+
+        *p++ = '=';
+    }
+
+    *olen = p - dst;
+    *p = 0;
+
+}
+
+  __asm__ ("mov r8, %0;" : "=r" ( nResult ));
+  }
+  else
+  {
+    nResult = 2;
+  }
+
+  doSmth(X);
+  doSmth(mac);
+
+
+  return nResult;
+}
+
+/* The pattern below catches sequences of instructions that were generated
+   for ARM and Thumb-2 before the fix for this PR. They are of the form:
+
+   ldr     rX, <offset from sp or fp>
+   <optional non ldr instructions>
+   ldr     rY, <offset from sp or fp>
+   ldr     rZ, [rX]
+   <optional non ldr instructions>
+   cmp     rY, rZ
+   <optional non cmp instructions>
+   bl      __stack_chk_fail
+
+   Ideally the optional block would check that the various rX, rY and rZ
+   registers are not set, but this is not possible because back references
+   are illegal in lookahead expressions in Tcl: the only construct that
+   allows negating a regexp cannot use the back references to those
+   registers.  Instead we go for the heuristic of allowing non ldr/cmp
+   instructions with the assumptions that (i) those are not part of the stack
+   protector sequences and (ii) they would only be scheduled here if they don't
+   conflict with registers used by stack protector.
+
+   Note on the regexp logic:
+   Allowing non X instructions (where X is ldr or cmp) is done by looking for
+   some non newline spaces, followed by something which is not X, followed by
+   an alphanumeric character, followed by anything but a newline and ended by
+   a newline, the whole thing an undetermined number of times.  The
+   alphanumeric character forces the negative lookahead for X to be matched
+   only after all the initial spaces and thus to check the mnemonic.  This
+   prevents it from matching one of the initial spaces.  */
+/* { dg-final { scan-assembler-not {ldr[ \t]+([^,]+), \[(?:sp|fp)[^]]*\](?:\n[ \t]+(?!ldr)\w[^\n]*)*\n[ \t]+ldr[ \t]+([^,]+), \[(?:sp|fp)[^]]*\]\n[ \t]+ldr[ \t]+([^,]+), \[\1\](?:\n[ \t]+(?!ldr)\w[^\n]*)*\n[ \t]+cmp[ \t]+\2, \3(?:\n[ \t]+(?!cmp)\w[^\n]*)*\n[ \t]+bl[ \t]+__stack_chk_fail} } } */
+
+/* Likewise for Thumb-1 sequences of instructions prior to the fix for this PR
+   which had the form:
+
+   ldr     rS, <offset from sp or fp>
+   <optional non ldr instructions>
+   ldr     rT, <PC relative offset>
+   <optional non ldr instructions>
+   ldr     rX, [rS, rT]
+   <optional non ldr instructions>
+   ldr     rY, <offset from sp or fp>
+   ldr     rZ, [rX]
+   <optional non ldr instructions>
+   cmp     rY, rZ
+   <optional non cmp instructions>
+   bl      __stack_chk_fail
+
+  Note on the regexp logic:
+  PC relative offset is checked by looking for a source operand that does not
+  contain [ or ].  */
+/* { dg-final { scan-assembler-not {ldr[ \t]+([^,]+), \[(?:sp|fp)[^]]*\](?:\n[ \t]+(?!ldr)\w[^\n]*)*\n[ \t]+ldr[ \t]+([^,]+), [^][\n]*(?:\n[ \t]+(?!ldr)\w[^\n]*)*\n[ \t]+ldr[ \t]+([^,]+), \[\1, \2\](?:\n[ \t]+(?!ldr)\w[^\n]*)*\n[ \t]+ldr[ \t]+([^,]+), \[(?:sp|fp)[^]]*\]\n[ \t]+ldr[ \t]+([^,]+), \[\3\](?:\n[ \t]+(?!ldr)\w[^\n]*)*\n[ \t]+cmp[ \t]+\4, \5(?:\n[ \t]+(?!cmp)\w[^\n]*)*\n[ \t]+bl[ \t]+__stack_chk_fail} } } */
-- 
2.19.1



* Re: [PATCH, ARM, ping3] PR85434: Prevent spilling of stack protector guard's address on ARM
  2018-11-21 17:54                       ` Segher Boessenkool
@ 2018-11-22 16:06                         ` Thomas Preudhomme
  0 siblings, 0 replies; 20+ messages in thread
From: Thomas Preudhomme @ 2018-11-22 16:06 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: kyrylo.tkachov, Ramana Radhakrishnan, Richard Earnshaw, gcc-patches

I'm talking about the PIC access to the guard's variable. See for
example the pr85434.c testcase contributed with this patch when
compiled for aarch64 with -Os -fpic -march=armv8-a
-fstack-protector-strong:

(insn 227 226 228 33 (set (reg:DI 90)
        (high:DI (symbol_ref:DI ("_GLOBAL_OFFSET_TABLE_")))) "/data/dev/checkouts/private/linaro/gcc/gcc/testsuite/gcc.target/arm/pr85434.c":148:1 -1
     (nil))
(insn 228 227 229 33 (set (reg/f:DI 244)
        (unspec:DI [
                (mem/u/c:DI (lo_sum:DI (reg:DI 90)
                        (symbol_ref:DI ("__stack_chk_guard") [flags 0xc0]  <var_decl 0x7f93f8778750 __stack_chk_guard>)) [0  S8 A8])
            ] UNSPEC_GOTSMALLPIC28K)) "/data/dev/checkouts/private/linaro/gcc/gcc/testsuite/gcc.target/arm/pr85434.c":148:1 -1
     (expr_list:REG_EQUAL (symbol_ref:DI ("__stack_chk_guard") [flags 0xc0]  <var_decl 0x7f93f8778750 __stack_chk_guard>)
        (nil)))
(insn 229 228 230 33 (parallel [
            (set (reg:DI 245)
                (unspec:DI [
                        (mem/v/f/c:DI (plus:DI (reg/f:DI 85 virtual-stack-vars)
                                (const_int -8 [0xfffffffffffffff8])) [4 D.3715+0 S8 A64])
                        (mem/v/f/c:DI (reg/f:DI 244) [4 __stack_chk_guard+0 S8 A64])
                    ] UNSPEC_SP_TEST))
            (clobber (scratch:DI))
        ]) "/data/dev/checkouts/private/linaro/gcc/gcc/testsuite/gcc.target/arm/pr85434.c":148:1 -1
     (nil))

The unspec in insn 228 is not CSEd in my experiment despite the same
instruction occurring in the prologue to set the canary. In the arm
backend the access was CSEd, but there the PIC access is of the form
(mem (reg) (unspec offset)), i.e. the outermost rtx in the source is not
an unspec.
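
To make the concern concrete, here is a toy C model (not GCC code, and
not what the compiler emits) of why it matters whether the epilogue
reuses the prologue's address computation. All names (toy_guard,
toy_frame, toy_smash, ...) are hypothetical and the frame layout is an
assumption chosen for illustration. The overflow is modelled by writing
the fields directly rather than by actually overrunning the buffer.

```c
#include <stdint.h>

/* Toy stand-in for __stack_chk_guard; the value is arbitrary.  */
uintptr_t toy_guard = 0x5a5a5a5au;

/* Assumed frame layout: the guard's address was spilled next to the
   canary, so an overflow of BUF could reach both slots.  */
struct toy_frame
{
  unsigned char buf[8];
  uintptr_t *spilled_guard_addr;
  uintptr_t canary;
};

/* Prologue: compute the guard's address once, spill it, set the canary.  */
void
toy_prologue (struct toy_frame *f)
{
  f->spilled_guard_addr = &toy_guard;
  f->canary = *f->spilled_guard_addr;
}

/* Epilogue that reuses the spilled address (the problematic case).  */
int
toy_check_spilled (const struct toy_frame *f)
{
  return *f->spilled_guard_addr == f->canary;
}

/* Epilogue that recomputes the guard's address instead of reloading it.  */
int
toy_check_recomputed (const struct toy_frame *f)
{
  return toy_guard == f->canary;
}

/* Model of an overflow: the attacker rewrites the canary and redirects the
   spilled address to a word holding the same value.  */
void
toy_smash (struct toy_frame *f, uintptr_t *attacker_word)
{
  f->canary = *attacker_word;
  f->spilled_guard_addr = attacker_word;
}
```

After toy_prologue followed by toy_smash, toy_check_spilled still
reports a match (the corruption goes unnoticed) while
toy_check_recomputed reports a mismatch, since it never reads the
address back from the frame.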

Best regards,

Thomas

On Wed, 21 Nov 2018 at 17:54, Segher Boessenkool
<segher@kernel.crashing.org> wrote:
>
> On Fri, Nov 16, 2018 at 02:56:46PM +0000, Thomas Preudhomme wrote:
> > In case of high register pressure in PIC mode, address of the stack
> > protector's guard can be spilled on ARM targets as shown in PR85434,
> > thus allowing an attacker to control what the canary would be compared
> > against. ARM does lack stack_protect_set and stack_protect_test insn
> > patterns, defining them does not help as the address is expanded
> > regularly and the patterns only deal with the copy and test of the
> > guard with the canary.
> >
> > This problem does not occur for x86 targets because the PIC access and
> > the test can be done in the same instruction. Aarch64 is exempt too
> > because PIC access insn pattern are mov of UNSPEC which prevents it from
> > the second access in the epilogue being CSEd in cse_local pass with the
> > first access in the prologue.
>
> The unspecs are not CSEd because they are *different* unspecs (UNSPEC_SP_SET
> vs. UNSPEC_SP_TEST; they have different args too, different number of args
> even).  Two the same unspecs can be CSEd just fine.
>
>
> Segher
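
As a rough illustration of the point about unspec identity in the quoted
reply, here is a toy Python model of a local CSE pass that treats two
sources as equivalent only when they are structurally identical. This is
not GCC code: the Unspec class, the local_cse function, the string
operands and the register names r100/r101 are all assumptions made up
for the sketch, and it ignores everything a real pass must track
(invalidation by stores, volatility, modes, basic block boundaries).

```python
from typing import Dict, List, NamedTuple, Tuple


class Unspec(NamedTuple):
    """Toy stand-in for an RTL unspec; not GCC's real rtx representation."""
    code: str                   # unspec number, e.g. "UNSPEC_SP_TEST"
    operands: Tuple[str, ...]   # operands modelled as plain strings


def local_cse(insns: List[Tuple[str, Unspec]]) -> Dict[str, str]:
    """Return {dest: earlier_dest} for every insn whose source is
    structurally identical to the source of an earlier insn."""
    seen: Dict[Unspec, str] = {}
    replaced: Dict[str, str] = {}
    for dest, src in insns:
        if src in seen:
            replaced[dest] = seen[src]   # reuse the earlier result
        else:
            seen[src] = dest
    return replaced


# Hypothetical prologue/epilogue sequence loosely modelled on the dump above.
GOT_OPERAND = "mem(lo_sum(r90, __stack_chk_guard))"
insns = [
    ("r100", Unspec("UNSPEC_GOTSMALLPIC28K", (GOT_OPERAND,))),  # prologue
    ("r101", Unspec("UNSPEC_SP_SET", ("mem(r100)",))),
    ("r244", Unspec("UNSPEC_GOTSMALLPIC28K", (GOT_OPERAND,))),  # epilogue
    ("r245", Unspec("UNSPEC_SP_TEST", ("mem(canary)", "mem(r244)"))),
]

replaced = local_cse(insns)
print(replaced)
```

In this model the two address loads share the same unspec number and
operands, so the second one is a candidate for reuse, while the
UNSPEC_SP_SET and UNSPEC_SP_TEST sources never match each other because
both the code and the operand count differ.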


end of thread, other threads:[~2018-11-22 16:06 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <CAKnkMGvQoj2Wuz_r-PGX8aJkusA=hzVLW-AHaVjDK78ioHUMxQ@mail.gmail.com>
2018-08-29  9:51 ` [PATCH, ARM] PR85434: Prevent spilling of stack protector guard's address on ARM Thomas Preudhomme
2018-08-29 10:07   ` Thomas Preudhomme
2018-09-13 12:02     ` Thomas Preudhomme
2018-09-18  0:57   ` Jeff Law
2018-09-25 16:13   ` Kyrill Tkachov
2018-10-23 13:17     ` Thomas Preudhomme
2018-10-24 10:38       ` Thomas Preudhomme
2018-10-25 16:10         ` Thomas Preudhomme
2018-10-27  4:37           ` Thomas Preudhomme
2018-11-01 16:03             ` [PATCH, ARM, ping] " Thomas Preudhomme
2018-11-08  9:53               ` [PATCH, ARM, ping2] " Thomas Preudhomme
2018-11-08 15:53                 ` Kyrill Tkachov
2018-11-10 15:07                   ` Thomas Preudhomme
2018-11-16 14:57                     ` [PATCH, ARM, ping3] " Thomas Preudhomme
2018-11-21  0:32                       ` Jeff Law
2018-11-21 10:35                         ` Thomas Preudhomme
2018-11-21 16:07                       ` Kyrill Tkachov
2018-11-22 14:49                         ` Thomas Preudhomme
2018-11-21 17:54                       ` Segher Boessenkool
2018-11-22 16:06                         ` Thomas Preudhomme
