public inbox for gcc-patches@gcc.gnu.org
* Straight Line Speculation (SLS) mitigation.
@ 2020-06-08 14:10 Matthew Malcomson
  2020-06-08 14:10 ` [Patch 1/3] aarch64: New Straight Line Speculation (SLS) mitigation flags Matthew Malcomson
                   ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Matthew Malcomson @ 2020-06-08 14:10 UTC (permalink / raw)
  To: gcc-patches
  Cc: Kyrylo.Tkachov, richard.sandiford, Kristof.Beyls, Richard.Earnshaw, nd

[-- Attachment #1: Type: text/plain, Size: 3346 bytes --]

Hi,

A new speculative cache side-channel vulnerability has been published at
the link below, named "straight-line speculation" (SLS in this patch series).
https://developer.arm.com/support/arm-security-updates/speculative-processor-vulnerability/downloads/straight-line-speculation
This vulnerability has been given CVE number CVE-2020-13844.

We have prepared some toolchain mitigations for this vulnerability.  These
mitigate against the RET, BR case and the BLR case mentioned in the linked
whitepaper.

The part of the vulnerability relevant to these toolchain mitigations is as
follows:
Some processors may speculatively execute the instructions immediately
following what should be a change in control flow.  The examples we mitigate in
this patch series are the instructions RET (return), BR (indirect jump) and BLR
(indirect function call).
Where the speculative path contains a suitable code sequence, often described
by researchers as a "Spectre Revelation Gadget", such straight-line speculation
could lead to changes in the caches and similar structures that are indicative
of secrets, making those secrets vulnerable to revelation through timing
analysis.

The gist of the mitigation posted here is:

Every RET and BR instruction has a speculation barrier placed directly after
it.  These speculation barriers should never be architecturally executed, so
the performance cost is expected to be low.

Each BLR instruction is replaced by a BL to a function stub consisting of a BR
instruction followed by a speculation barrier.
This alternative approach is used because the instructions directly after a BLR
are usually architecturally executed, and it keeps the speculation barrier off
that architecturally executed path.
Arm has been unable to demonstrate straight line speculation past a BL or B
instruction, and so we believe the BL instruction can be used without a
barrier.

In summary, a
  RET
will be transformed to
  RET
  <speculation barrier>

While a
  BLR x<N>
will be transformed to a
  BL __call_indirect_x<N>
call, with __call_indirect_x<N> being a thunk that looks like
__call_indirect_x<N>:
  BR x<N>
  <speculation barrier>
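
At the source level, the kind of C code affected looks roughly like the
following (a minimal sketch with hypothetical names, not taken from the
patches):

  /* An indirect call through a function pointer normally compiles to a BLR,
     which the mitigation turns into a BL to the __call_indirect_x<N> thunk.
     The "+ 1" keeps this from being turned into a tail call.  */
  int
  call_and_add (int (*fn) (int), int x)
  {
    return fn (x) + 1;
  }  /* The function return is a RET and gets a barrier placed after it.  */

  /* An indirect call in tail position may instead be emitted as a BR (an
     indirect sibling call), which likewise gets a barrier after it.  */
  void
  tail_call (void (*fn) (void))
  {
    fn ();
  }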


The patch series is structured as follows:
    1) Introduce new command line arguments.
    2) The RET/BR mitigation.
    3) The BLR mitigation.

There are a few known places where this toolchain mitigation does not protect
against straight-line speculation:
- Some accesses to thread-local variables use a code sequence including a BLR
  instruction.  This code sequence is part of the binary interface between
  compiler and linker.  If this BLR instruction needs to be mitigated, it'd
  probably be best to do so in the linker.
  It seems that the code sequence for thread-local variable access is unlikely
  to lead to a Spectre Revelation Gadget.
- PLT stubs are produced by the linker, and each contains a BLR instruction.
  It seems that at most this could introduce one Spectre Revelation Gadget
  after the last PLT stub.
- Use of BR, RET, or BLR instructions in assembly is not mitigated.
- Use of BR, RET, or BLR instructions in libraries and run-time library
  routines that are not recompiled with this toolchain mitigation is not
  mitigated.

N.b. patches with similar functionality are being posted to LLVM.

Thanks,
Matthew.


Entire patch series attached to cover letter.

[-- Attachment #2: all-patches.tar.gz --]
[-- Type: application/gzip, Size: 10455 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Patch 1/3] aarch64: New Straight Line Speculation (SLS) mitigation flags
  2020-06-08 14:10 Straight Line Speculation (SLS) mitigation Matthew Malcomson
@ 2020-06-08 14:10 ` Matthew Malcomson
  2020-06-23 15:48   ` Richard Sandiford
  2020-06-08 14:10 ` [Patch 2/3] aarch64: Introduce SLS mitigation for RET and BR instructions Matthew Malcomson
  2020-06-08 14:10 ` [Patch 3/3] aarch64: Mitigate SLS for BLR instruction Matthew Malcomson
  2 siblings, 1 reply; 17+ messages in thread
From: Matthew Malcomson @ 2020-06-08 14:10 UTC (permalink / raw)
  To: gcc-patches
  Cc: Kyrylo.Tkachov, richard.sandiford, Kristof.Beyls, Richard.Earnshaw, nd

[-- Attachment #1: Type: text/plain, Size: 6834 bytes --]

Here we introduce the flags that will be used to control the straight line
speculation mitigations.

The new flag introduced is `-mharden-sls=`.
This flag can take arguments of `none`, `all`, or a comma-separated list of one
or more of `retbr` or `blr`.
`none` indicates no special mitigation of the straight line speculation
vulnerability.
`all` requests all mitigations currently implemented.
`retbr` requests that the RET and BR instructions have a speculation barrier
inserted after them.
`blr` requests that BLR instructions be replaced by a BL to a function stub
using a BR with a speculation barrier after it.
For example, `-mharden-sls=retbr,blr` requests both mitigations and is
currently equivalent to `-mharden-sls=all`.

Setting this on a per-function basis using attributes or the like is not
currently supported, but may be in the future.

gcc/ChangeLog:

2020-06-08  Matthew Malcomson  <matthew.malcomson@arm.com>

	* config/aarch64/aarch64-protos.h (aarch64_harden_sls_retbr_p):
	New.
	(aarch64_harden_sls_blr_p): New.
	* config/aarch64/aarch64.c (enum aarch64_sls_hardening_type):
	New.
	(aarch64_harden_sls_retbr_p): New.
	(aarch64_harden_sls_blr_p): New.
	(aarch64_validate_sls_mitigation): New.
	(aarch64_override_options): Parse options for SLS mitigation.
	* config/aarch64/aarch64.opt (-mharden-sls): New option.
	* doc/invoke.texi: Document new option.



###############     Attachment also inlined for ease of reply    ###############


diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 9e43adb7db0373df6cc5ef1d2b22f217aca2aad2..8ca67d7e69edaf73c84f079e7e1c483009ad10c0 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -780,4 +780,7 @@ extern const atomic_ool_names aarch64_ool_ldeor_names;
 
 tree aarch64_resolve_overloaded_builtin_general (location_t, tree, void *);
 
+extern bool aarch64_harden_sls_retbr_p (void);
+extern bool aarch64_harden_sls_blr_p (void);
+
 #endif /* GCC_AARCH64_PROTOS_H */
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index e92c7e69fcb7a8689a8b7098b86ff050dc9ab78b..775f49991e5f599a843d3ef490b8cd044acfe78f 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -14466,6 +14466,81 @@ aarch64_validate_mcpu (const char *str, const struct processor **res,
   return false;
 }
 
+
+/* Straight line speculation indicators.  */
+enum aarch64_sls_hardening_type
+{
+    SLS_NONE = 0,
+    SLS_RETBR = 1,
+    SLS_BLR = 2,
+    SLS_ALL = 3,
+};
+static enum aarch64_sls_hardening_type aarch64_sls_hardening;
+/* Return whether we should mitigate Straight Line Speculation for the RET
+   and BR instructions.  */
+bool
+aarch64_harden_sls_retbr_p (void)
+{
+  return aarch64_sls_hardening & SLS_RETBR;
+}
+/* Return whether we should mitigate Straight Line Speculation for the BLR
+   instruction.  */
+bool
+aarch64_harden_sls_blr_p (void)
+{
+  return aarch64_sls_hardening & SLS_BLR;
+}
+
+/* As of yet we only allow setting these options globally, in the future we may
+   allow setting them per function.  */
+static void
+aarch64_validate_sls_mitigation (const char *const_str)
+{
+  char *str_root = xstrdup (const_str);
+  char *token_save = NULL;
+  char *str = NULL;
+  int temp = SLS_NONE;
+
+  aarch64_sls_hardening = SLS_NONE;
+  if (strcmp (str_root, "none") == 0)
+    goto finish;
+  if (strcmp (str_root, "all") == 0)
+    {
+      aarch64_sls_hardening = SLS_ALL;
+      goto finish;
+    }
+
+  str = strtok_r (str_root, ",", &token_save);
+  if (!str)
+    {
+      error ("invalid argument given to %<-mharden-sls=%>");
+      goto finish;
+    }
+
+  while (str)
+    {
+      if (strcmp (str, "blr") == 0)
+	temp |= SLS_BLR;
+      else if (strcmp (str, "retbr") == 0)
+	temp |= SLS_RETBR;
+      else if (strcmp (str, "none") == 0 || strcmp (str, "all") == 0)
+	{
+	  error ("%<%s%> must be by itself for %<-mharden-sls=%>", str);
+	  break;
+	}
+      else
+	{
+	  error ("invalid argument %<%s%> for %<-mharden-sls=%>", str);
+	  break;
+	}
+      str = strtok_r (NULL, ",", &token_save);
+    }
+  aarch64_sls_hardening = (aarch64_sls_hardening_type) temp;
+finish:
+  free (str_root);
+  return;
+}
+
 /* Parses CONST_STR for branch protection features specified in
    aarch64_branch_protect_types, and set any global variables required.  Returns
    the parsing result and assigns LAST_STR to the last processed token from
@@ -14710,6 +14785,9 @@ aarch64_override_options (void)
   selected_arch = NULL;
   selected_tune = NULL;
 
+  if (aarch64_harden_sls_string)
+      aarch64_validate_sls_mitigation (aarch64_harden_sls_string);
+
   if (aarch64_branch_protection_string)
     aarch64_validate_mbranch_protection (aarch64_branch_protection_string);
 
diff --git a/gcc/config/aarch64/aarch64.opt b/gcc/config/aarch64/aarch64.opt
index d99d14c137d8774d3c8dab860d475f68c01a2817..5170361fd5e5721e044d1664e522b2718f654b8e 100644
--- a/gcc/config/aarch64/aarch64.opt
+++ b/gcc/config/aarch64/aarch64.opt
@@ -71,6 +71,10 @@ mgeneral-regs-only
 Target Report RejectNegative Mask(GENERAL_REGS_ONLY) Save
 Generate code which uses only the general registers.
 
+mharden-sls=
+Target RejectNegative Joined Var(aarch64_harden_sls_string)
+Generate code to mitigate against straight line speculation.
+
 mfix-cortex-a53-835769
 Target Report Var(aarch64_fix_a53_err835769) Init(2) Save
 Workaround for ARM Cortex-A53 Erratum number 835769.
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 35e8242af5fa4c52744fd2c3e2cfee0a617e22bb..8a3fab2964c9bb06c820766d284768751d63ac9a 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -696,6 +696,7 @@ Objective-C and Objective-C++ Dialects}.
 -msign-return-address=@var{scope} @gol
 -mbranch-protection=@var{none}|@var{standard}|@var{pac-ret}[+@var{leaf}
 +@var{b-key}]|@var{bti} @gol
+-mharden-sls=@var{none}|@var{all}|@var{retbr}|@var{blr} @gol
 -march=@var{name}  -mcpu=@var{name}  -mtune=@var{name}  @gol
 -moverride=@var{string}  -mverbose-cost-dump @gol
 -mstack-protector-guard=@var{guard} -mstack-protector-guard-reg=@var{sysreg} @gol
@@ -17045,6 +17046,15 @@ functions.  The optional argument @samp{b-key} can be used to sign the functions
 with the B-key instead of the A-key.
 @samp{bti} turns on branch target identification mechanism.
 
+@item -mharden-sls=@var{none}|@var{all}|@var{retbr}|@var{blr}
+@opindex mharden-sls
+Enable compiler hardening against straight line speculation (SLS).
+There are two mitigations that can be requested individually.
+@samp{retbr} inserts a speculation barrier after every @samp{br} and
+@samp{ret} instruction, while @samp{blr} replaces @samp{blr} instructions
+with a @samp{bl} to a function stub that ends in a speculation barrier.
+@samp{all} enables all SLS hardening, while @samp{none} does not enable any.
+
 @item -msve-vector-bits=@var{bits}
 @opindex msve-vector-bits
 Specify the number of bits in an SVE vector register.  This option only has



^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Patch 2/3] aarch64: Introduce SLS mitigation for RET and BR instructions
  2020-06-08 14:10 Straight Line Speculation (SLS) mitigation Matthew Malcomson
  2020-06-08 14:10 ` [Patch 1/3] aarch64: New Straight Line Speculation (SLS) mitigation flags Matthew Malcomson
@ 2020-06-08 14:10 ` Matthew Malcomson
  2020-06-23 16:17   ` Richard Sandiford
  2020-06-08 14:10 ` [Patch 3/3] aarch64: Mitigate SLS for BLR instruction Matthew Malcomson
  2 siblings, 1 reply; 17+ messages in thread
From: Matthew Malcomson @ 2020-06-08 14:10 UTC (permalink / raw)
  To: gcc-patches
  Cc: Kyrylo.Tkachov, richard.sandiford, Kristof.Beyls, Richard.Earnshaw, nd

[-- Attachment #1: Type: text/plain, Size: 18930 bytes --]

Instructions following RET or BR are not necessarily executed.  In order
to avoid speculation past RET and BR we can simply append a speculation
barrier.

Since these speculation barriers will not be architecturally executed,
they are not expected to add a significant performance penalty.

The speculation barrier used is SB when targeting architectures that have
the SB extension enabled, and DSB SY followed by ISB otherwise.

We add tests for each of the cases where such an instruction was seen.

This is implemented by modifying each machine description pattern that
emits either a RET or a BR instruction.  We choose not to use something
like `TARGET_ASM_FUNCTION_EPILOGUE` since it does not affect the
`indirect_jump`, `jump`, `sibcall_insn` and `sibcall_value_insn`
patterns and we find it preferable to implement the functionality in the
same way for every pattern.

There is one particular case which is slightly tricky.  The
implementation of TARGET_ASM_TRAMPOLINE_TEMPLATE uses a BR which needs
to be mitigated against.  The trampoline template is used *once* per
compilation unit, and TRAMPOLINE_SIZE is exposed to the user via the
builtin macro __LIBGCC_TRAMPOLINE_SIZE__.
In the future we may implement function specific attributes to turn
hardening on and off on a per-function basis.
Because the trampoline template is fixed for the whole compilation unit,
it is safer to ensure this speculation barrier is always used.
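
As a concrete illustration (a minimal sketch with hypothetical names, along
the same lines as the retbr_trampolines test added below), the trampoline
case is triggered by taking the address of a nested function:

  extern void register_callback (void (*) (void *), void *);
  extern void process (void *);

  void
  outer (void *data)
  {
    /* A nested function that refers to the enclosing frame.  Passing its
       address to another function makes GCC build a stack trampoline from
       the fixed template, whose BR is followed by the always-emitted
       speculation barrier.  */
    void inner (void *p)
    {
      if (p == data)
        process (p);
    }
    register_callback (inner, data);
  }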

Testing:
  Bootstrap and regtest done on aarch64-none-linux
  Used a temporary hack(1) to use these options on every test in the
  testsuite and a script to check that the output never emitted an
  unmitigated RET or BR.


1) The temporary hack was a change to the testsuite to always use
`-save-temps` and to run a script on the assembly output of every
compilation that produced one, checking that each RET or BR is immediately
followed by a speculation barrier.


gcc/ChangeLog:

2020-06-08  Matthew Malcomson  <matthew.malcomson@arm.com>

	* config/aarch64/aarch64-protos.h (aarch64_sls_barrier): New.
	* config/aarch64/aarch64.c (aarch64_output_casesi): Emit
	speculation barrier after BR instruction if needs be.
	(aarch64_sls_barrier): New.
	(aarch64_asm_trampoline_template): Add needed barriers.
	* config/aarch64/aarch64.h (AARCH64_ISA_SB): New.
	(TARGET_SB): New.
	(TRAMPOLINE_SIZE): Account for barrier.
	* config/aarch64/aarch64.md (indirect_jump, *casesi_dispatch,
	*do_return, simple_return, *sibcall_insn, *sibcall_value_insn):
	Emit barrier if needs be, also account for possible barrier in
	"length" attribute.
	* config/aarch64/aarch64.opt (-mharden-sls-retbr): Introduce new
	option.

gcc/testsuite/ChangeLog:

2020-06-08  Matthew Malcomson  <matthew.malcomson@arm.com>

	* gcc.target/aarch64/sls-mitigation/sls-miti-retbr.c: New test.
	* gcc.target/aarch64/sls-mitigation/sls-miti-retbr-pacret.c:
	New test.
	* gcc.target/aarch64/sls-mitigation/sls-mitigation.exp: New file.
	* lib/target-supports.exp (check_effective_target_aarch64_asm_sb_ok):
	New proc.



###############     Attachment also inlined for ease of reply    ###############


diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 8ca67d7e69edaf73c84f079e7e1c483009ad10c0..d2eb739bc89ecd9d0212416b8dc3ee4ba236a271 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -780,6 +780,7 @@ extern const atomic_ool_names aarch64_ool_ldeor_names;
 
 tree aarch64_resolve_overloaded_builtin_general (location_t, tree, void *);
 
+const char * aarch64_sls_barrier (int);
 extern bool aarch64_harden_sls_retbr_p (void);
 extern bool aarch64_harden_sls_blr_p (void);
 
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 24767c747bab0d711627c5c646937c42f210d70b..5da3d94e335fc315e1d90e6a674f2f09cf1a4529 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -281,6 +281,7 @@ extern unsigned aarch64_architecture_version;
 #define AARCH64_ISA_F32MM	   (aarch64_isa_flags & AARCH64_FL_F32MM)
 #define AARCH64_ISA_F64MM	   (aarch64_isa_flags & AARCH64_FL_F64MM)
 #define AARCH64_ISA_BF16	   (aarch64_isa_flags & AARCH64_FL_BF16)
+#define AARCH64_ISA_SB  	   (aarch64_isa_flags & AARCH64_FL_SB)
 
 /* Crypto is an optional extension to AdvSIMD.  */
 #define TARGET_CRYPTO (TARGET_SIMD && AARCH64_ISA_CRYPTO)
@@ -378,6 +379,9 @@ extern unsigned aarch64_architecture_version;
 #define TARGET_FIX_ERR_A53_835769_DEFAULT 1
 #endif
 
+/* SB instruction is enabled through +sb.  */
+#define TARGET_SB (AARCH64_ISA_SB)
+
 /* Apply the workaround for Cortex-A53 erratum 835769.  */
 #define TARGET_FIX_ERR_A53_835769	\
   ((aarch64_fix_a53_err835769 == 2)	\
@@ -1058,8 +1062,11 @@ typedef struct
 
 #define RETURN_ADDR_RTX aarch64_return_addr
 
-/* BTI c + 3 insns + 2 pointer-sized entries.  */
-#define TRAMPOLINE_SIZE	(TARGET_ILP32 ? 24 : 32)
+/* BTI c + 3 insns
+   + sls barrier of DSB + ISB.
+   + 2 pointer-sized entries.  */
+#define TRAMPOLINE_SIZE	(24 \
+			 + (TARGET_ILP32 ? 8 : 16))
 
 /* Trampolines contain dwords, so must be dword aligned.  */
 #define TRAMPOLINE_ALIGNMENT 64
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 775f49991e5f599a843d3ef490b8cd044acfe78f..9356937fe266c68196392a1589b3cf96607de104 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -10822,8 +10822,8 @@ aarch64_return_addr (int count, rtx frame ATTRIBUTE_UNUSED)
 static void
 aarch64_asm_trampoline_template (FILE *f)
 {
-  int offset1 = 16;
-  int offset2 = 20;
+  int offset1 = 24;
+  int offset2 = 28;
 
   if (aarch64_bti_enabled ())
     {
@@ -10846,6 +10846,17 @@ aarch64_asm_trampoline_template (FILE *f)
     }
   asm_fprintf (f, "\tbr\t%s\n", reg_names [IP1_REGNUM]);
 
+  /* We always emit a speculation barrier.
+     This is because the same trampoline template is used for every nested
+     function.  Since nested functions are not particularly common or
+     performant we don't worry too much about the extra instructions to copy
+     around.
+     This is not yet a problem, since we have not yet implemented function
+     specific attributes to choose between hardening against straight line
+     speculation or not, but such function specific attributes are likely to
+     happen in the future.  */
+  output_asm_insn ("dsb\tsy\n\tisb", NULL);
+
   /* The trampoline needs an extra padding instruction.  In case if BTI is
      enabled the padding instruction is replaced by the BTI instruction at
      the beginning.  */
@@ -10860,7 +10871,7 @@ static void
 aarch64_trampoline_init (rtx m_tramp, tree fndecl, rtx chain_value)
 {
   rtx fnaddr, mem, a_tramp;
-  const int tramp_code_sz = 16;
+  const int tramp_code_sz = 24;
 
   /* Don't need to copy the trailing D-words, we fill those in below.  */
   emit_block_move (m_tramp, assemble_trampoline_template (),
@@ -11054,6 +11065,7 @@ aarch64_output_casesi (rtx *operands)
   output_asm_insn (buf, operands);
   output_asm_insn (patterns[index][1], operands);
   output_asm_insn ("br\t%3", operands);
+  output_asm_insn (aarch64_sls_barrier (aarch64_harden_sls_retbr_p ()), operands);
   assemble_label (asm_out_file, label);
   return "";
 }
@@ -22895,6 +22907,22 @@ aarch64_file_end_indicate_exec_stack ()
 #undef GNU_PROPERTY_AARCH64_FEATURE_1_BTI
 #undef GNU_PROPERTY_AARCH64_FEATURE_1_AND
 
+/* Helper function for straight line speculation.
+   Return what barrier should be emitted for straight line speculation
+   mitigation.
+   When not mitigating against straight line speculation this function returns
+   an empty string.
+   When mitigating against straight line speculation, use:
+   * SB when the v8.5-A SB extension is enabled.
+   * DSB+ISB otherwise.  */
+const char *
+aarch64_sls_barrier (int mitigation_required)
+{
+  return mitigation_required
+    ? (TARGET_SB ? "sb" : "dsb\tsy\n\tisb")
+    : "";
+}
+
 /* Target-specific selftests.  */
 
 #if CHECKING_P
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index ff15505d45546124868d2531b7f4e5b0f1f5bebc..75ef87a3b4674cc73cb42cc82cfb8e782acf77f6 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -447,8 +447,15 @@
 (define_insn "indirect_jump"
   [(set (pc) (match_operand:DI 0 "register_operand" "r"))]
   ""
-  "br\\t%0"
-  [(set_attr "type" "branch")]
+  {
+    output_asm_insn ("br\\t%0", operands);
+    return aarch64_sls_barrier (aarch64_harden_sls_retbr_p ());
+  }
+  [(set_attr "type" "branch")
+   (set (attr "length")
+	(cond [(match_test "!aarch64_harden_sls_retbr_p ()") (const_int 4)
+	       (match_test "TARGET_SB") (const_int 8)]
+	      (const_int 12)))]
 )
 
 (define_insn "jump"
@@ -765,7 +772,10 @@
   "*
   return aarch64_output_casesi (operands);
   "
-  [(set_attr "length" "16")
+  [(set (attr "length")
+	(cond [(match_test "!aarch64_harden_sls_retbr_p ()") (const_int 16)
+	       (match_test "TARGET_SB") (const_int 20)]
+	      (const_int 24)))
    (set_attr "type" "branch")]
 )
 
@@ -844,18 +854,26 @@
   [(return)]
   ""
   {
+    const char *ret = NULL;
     if (aarch64_return_address_signing_enabled ()
 	&& TARGET_ARMV8_3
 	&& !crtl->calls_eh_return)
       {
 	if (aarch64_ra_sign_key == AARCH64_KEY_B)
-	  return "retab";
+	  ret = "retab";
 	else
-	  return "retaa";
+	  ret = "retaa";
       }
-    return "ret";
+    else
+      ret = "ret";
+    output_asm_insn (ret, operands);
+    return aarch64_sls_barrier (aarch64_harden_sls_retbr_p ());
   }
-  [(set_attr "type" "branch")]
+  [(set_attr "type" "branch")
+   (set (attr "length")
+	(cond [(match_test "!aarch64_harden_sls_retbr_p ()") (const_int 4)
+	       (match_test "TARGET_SB") (const_int 8)]
+	      (const_int 12)))]
 )
 
 (define_expand "return"
@@ -867,8 +885,15 @@
 (define_insn "simple_return"
   [(simple_return)]
   ""
-  "ret"
-  [(set_attr "type" "branch")]
+  {
+    output_asm_insn ("ret", operands);
+    return aarch64_sls_barrier (aarch64_harden_sls_retbr_p ());
+  }
+  [(set_attr "type" "branch")
+   (set (attr "length")
+	(cond [(match_test "!aarch64_harden_sls_retbr_p ()") (const_int 4)
+	       (match_test "TARGET_SB") (const_int 8)]
+	      (const_int 12)))]
 )
 
 (define_insn "*cb<optab><mode>1"
@@ -1066,10 +1091,20 @@
    (unspec:DI [(match_operand:DI 2 "const_int_operand")] UNSPEC_CALLEE_ABI)
    (return)]
   "SIBLING_CALL_P (insn)"
-  "@
-   br\\t%0
-   b\\t%c0"
-  [(set_attr "type" "branch, branch")]
+  {
+    if (which_alternative == 0)
+      {
+	output_asm_insn ("br\\t%0", operands);
+	return aarch64_sls_barrier (aarch64_harden_sls_retbr_p ());
+      }
+    return "b\\t%c0";
+  }
+  [(set_attr "type" "branch, branch")
+   (set_attr_alternative "length"
+     [(cond [(match_test "!aarch64_harden_sls_retbr_p ()") (const_int 4)
+	     (match_test "TARGET_SB") (const_int 8)]
+	(const_int 12))
+      (const_int 4)])]
 )
 
 (define_insn "*sibcall_value_insn"
@@ -1080,10 +1115,20 @@
    (unspec:DI [(match_operand:DI 3 "const_int_operand")] UNSPEC_CALLEE_ABI)
    (return)]
   "SIBLING_CALL_P (insn)"
-  "@
-   br\\t%1
-   b\\t%c1"
-  [(set_attr "type" "branch, branch")]
+  {
+    if (which_alternative == 0)
+      {
+	output_asm_insn ("br\\t%1", operands);
+	return aarch64_sls_barrier (aarch64_harden_sls_retbr_p ());
+      }
+    return "b\\t%c1";
+  }
+  [(set_attr "type" "branch, branch")
+   (set_attr_alternative "length"
+     [(cond [(match_test "!aarch64_harden_sls_retbr_p ()") (const_int 4)
+	     (match_test "TARGET_SB") (const_int 8)]
+	(const_int 12))
+      (const_int 4)])]
 )
 
 ;; Call subroutine returning any type.
diff --git a/gcc/testsuite/gcc.target/aarch64/sls-mitigation/sls-miti-retbr-pacret.c b/gcc/testsuite/gcc.target/aarch64/sls-mitigation/sls-miti-retbr-pacret.c
new file mode 100644
index 0000000000000000000000000000000000000000..11f614b4ef2eb0fa3707cb46a55583d6685b89d0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sls-mitigation/sls-miti-retbr-pacret.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-mharden-sls=retbr -mbranch-protection=pac-ret -march=armv8.3-a" } */
+
+/* Testing the do_return pattern for retaa and retab.  */
+long retbr_subcall(void);
+long retbr_do_return_retaa(void)
+{
+    return retbr_subcall()+1;
+}
+__attribute__((target("branch-protection=pac-ret+b-key")))
+long retbr_do_return_retab(void)
+{
+    return retbr_subcall()+1;
+}
+
+/* Ensure there are no BR or RET instructions which are not directly followed
+   by a speculation barrier.  */
+/* { dg-final { scan-assembler-not "\t(br|ret|retaa|retab)\tx\[0-9\]\[0-9\]?\n\t(?!dsb\tsy\n\tisb|sb)" } } */
+/* { dg-final { scan-assembler-not "ret\t" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sls-mitigation/sls-miti-retbr.c b/gcc/testsuite/gcc.target/aarch64/sls-mitigation/sls-miti-retbr.c
new file mode 100644
index 0000000000000000000000000000000000000000..5cd4da6bbb719a5135faa2c9818dc873e3d5af70
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sls-mitigation/sls-miti-retbr.c
@@ -0,0 +1,121 @@
+/* We ensure that -Wpedantic is off since it complains about the trampolines
+   we explicitly want to test.  */
+/* { dg-additional-options "-mharden-sls=retbr -Wno-pedantic " } */
+/*
+   Ensure that the SLS hardening of RET and BR leaves no unprotected RET/BR
+   instructions.
+  */
+typedef int (foo) (int, int);
+typedef void (bar) (int, int);
+struct sls_testclass {
+    foo *x;
+    bar *y;
+    int left;
+    int right;
+};
+
+int
+retbr_sibcall_value_insn (struct sls_testclass x)
+{
+  return x.x(x.left, x.right);
+}
+
+void
+retbr_sibcall_insn (struct sls_testclass x)
+{
+  x.y(x.left, x.right);
+}
+
+/* Aim to test two different returns.
+   One that introduces a tail call in the middle of the function, and one that
+   has a normal return.  */
+int
+retbr_multiple_returns (struct sls_testclass x)
+{
+  int temp;
+  if (x.left % 10)
+    return x.x(x.left, 100);
+  else if (x.right % 20)
+    {
+      return x.x(x.left * x.right, 100);
+    }
+  temp = x.left % x.right;
+  temp *= 100;
+  temp /= 2;
+  return temp % 3;
+}
+
+void
+retbr_multiple_returns_void (struct sls_testclass x)
+{
+  if (x.left % 10)
+    {
+      x.y(x.left, 100);
+    }
+  else if (x.right % 20)
+    {
+      x.y(x.left * x.right, 100);
+    }
+  return;
+}
+
+/* Testing the casesi jump via register.  */
+__attribute__ ((optimize ("Os")))
+int
+retbr_casesi_dispatch (struct sls_testclass x)
+{
+  switch (x.left)
+    {
+    case -5:
+      return -2;
+    case -3:
+      return -1;
+    case 0:
+      return 0;
+    case 3:
+      return 1;
+    case 5:
+      break;
+    default:
+      __builtin_unreachable ();
+    }
+  return x.right;
+}
+
+/* Testing the BR in trampolines is mitigated against.  */
+void f1 (void *);
+void f3 (void *, void (*)(void *));
+void f2 (void *);
+
+int
+retbr_trampolines (void *a, int b)
+{
+  if (!b)
+    {
+      f1 (a);
+      return 1;
+    }
+  if (b)
+    {
+      void retbr_tramp_internal (void *c)
+      {
+	if (c == a)
+	  f2 (c);
+      }
+      f3 (a, retbr_tramp_internal);
+    }
+  return 0;
+}
+
+/* Testing the indirect_jump pattern.  */
+typedef signed __attribute__((mode(DI))) intptr_t;
+intptr_t BUF[5];
+void
+retbr_indirect_jump (intptr_t *buf)
+{
+  __builtin_longjmp(buf, 1);
+}
+
+/* Ensure there are no BR or RET instructions which are not directly followed
+   by a speculation barrier.  */
+/* { dg-final { scan-assembler-not "\t(br|ret|retaa|retab)\tx\[0-9\]\[0-9\]?\n\t(?!dsb\tsy\n\tisb|sb)" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sls-mitigation/sls-mitigation.exp b/gcc/testsuite/gcc.target/aarch64/sls-mitigation/sls-mitigation.exp
new file mode 100644
index 0000000000000000000000000000000000000000..fb63c6dfe230e64b11919381c30a3a05eee52e16
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sls-mitigation/sls-mitigation.exp
@@ -0,0 +1,73 @@
+#  Regression driver for SLS mitigation on AArch64.
+#  Copyright (C) 2020-2020 Free Software Foundation, Inc.
+#  Contributed by ARM Ltd.
+#
+#  This file is part of GCC.
+#
+#  GCC is free software; you can redistribute it and/or modify it
+#  under the terms of the GNU General Public License as published by
+#  the Free Software Foundation; either version 3, or (at your option)
+#  any later version.
+#
+#  GCC is distributed in the hope that it will be useful, but
+#  WITHOUT ANY WARRANTY; without even the implied warranty of
+#  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+#  General Public License for more details.
+#
+#  You should have received a copy of the GNU General Public License
+#  along with GCC; see the file COPYING3.  If not see
+#  <http://www.gnu.org/licenses/>.  */
+
+# Exit immediately if this isn't an AArch64 target.
+if {![istarget aarch64*-*-*] } then {
+  return
+}
+
+# Load support procs.
+load_lib gcc-dg.exp
+load_lib torture-options.exp
+
+# If a testcase doesn't have special options, use these.
+global DEFAULT_CFLAGS
+if ![info exists DEFAULT_CFLAGS] then {
+    set DEFAULT_CFLAGS " "
+}
+
+# Initialize `dg'.
+dg-init
+torture-init
+
+# Use different architectures as well as the normal optimisation options.
+# (i.e. use both SB and DSB+ISB barriers).
+
+set save-dg-do-what-default ${dg-do-what-default}
+# Main loop.
+# Run with torture tests (i.e. a bunch of different optimisation levels) just
+# to increase test coverage.
+set dg-do-what-default assemble
+gcc-dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/*.\[cCS\]]] \
+	"-save-temps" $DEFAULT_CFLAGS
+
+# Run the same tests but this time with SB extension.
+# Since not all supported assemblers will support that extension we decide
+# whether to assemble or just compile based on whether the extension is
+# supported for the available assembler.
+
+set templist {}
+foreach x $DG_TORTURE_OPTIONS {
+  lappend templist "$x -march=armv8.3-a+sb "
+  lappend templist "$x -march=armv8-a+sb "
+}
+set-torture-options $templist
+if { [check_effective_target_aarch64_asm_sb_ok] } {
+    set dg-do-what-default assemble
+} else {
+    set dg-do-what-default compile
+}
+gcc-dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/*.\[cCS\]]] \
+	"-save-temps" $DEFAULT_CFLAGS
+set dg-do-what-default ${save-dg-do-what-default}
+
+# All done.
+torture-finish
+dg-finish
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index a96b0525ba902b4d39e21123186171d951bd4e9d..6018a3ce4069d462087102b6d267d8e25b6f04dd 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -9349,7 +9349,7 @@ proc check_effective_target_aarch64_tiny { } {
 # various architecture extensions via the .arch_extension pseudo-op.
 
 foreach { aarch64_ext } { "fp" "simd" "crypto" "crc" "lse" "dotprod" "sve"
-			  "i8mm" "f32mm" "f64mm" "bf16" } {
+			  "i8mm" "f32mm" "f64mm" "bf16" "sb" } {
     eval [string map [list FUNC $aarch64_ext] {
 	proc check_effective_target_aarch64_asm_FUNC_ok { } {
 	  if { [istarget aarch64*-*-*] } {



^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Patch 3/3] aarch64: Mitigate SLS for BLR instruction
  2020-06-08 14:10 Straight Line Speculation (SLS) mitigation Matthew Malcomson
  2020-06-08 14:10 ` [Patch 1/3] aarch64: New Straight Line Speculation (SLS) mitigation flags Matthew Malcomson
  2020-06-08 14:10 ` [Patch 2/3] aarch64: Introduce SLS mitigation for RET and BR instructions Matthew Malcomson
@ 2020-06-08 14:10 ` Matthew Malcomson
  2020-06-23 14:57   ` [Patch v2 " Matthew Malcomson
  2 siblings, 1 reply; 17+ messages in thread
From: Matthew Malcomson @ 2020-06-08 14:10 UTC (permalink / raw)
  To: gcc-patches
  Cc: Kyrylo.Tkachov, richard.sandiford, Kristof.Beyls, Richard.Earnshaw, nd

[-- Attachment #1: Type: text/plain, Size: 17495 bytes --]

This patch introduces the mitigation for Straight Line Speculation past
the BLR instruction.

This mitigation replaces BLR instructions with a BL to a stub which
simply consists of a BR to the original register.  A speculation barrier
is then appended to these function stubs to ensure no straight line
speculation happens after these jumps.

When optimising for speed we use a set of stubs for each function since
this should help the branch predictor make more accurate predictions
about where a stub should branch.

When optimising for size we use one set of stubs for the entire
compilation unit.
This set of stubs can have human readable names, and we are currently
using `__call_indirect_x<N>` for register x<N>.

As an example when optimising for size:
a
    BLR x0
instruction would get transformed to
    BL __call_indirect_x0
with __call_indirect_x0 labelling a thunk that contains
__call_indirect_x0:
    BR x0
    speculation barrier

Since we add these function stubs to the assembly output all in one
chunk, we need not add the speculation barrier directly after each one.
This is because we know for certain that the instructions directly after
the BR in all but the last function stub will be from another one of
these stubs and hence will not contain a speculation gadget.
Instead we add a speculation barrier at the end of the sequence of
stubs.

Special care needs to be given to this transformation occurring in
a context where BTI is enabled.  A BLR can jump to a `BTI c` target,
while a BR can only jump to a `BTI c` target if it uses the registers
x16 or x17.
Hence we use constraints to limit the registers used when this
transformation is being made in an environment that uses BTI.
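
As a rough illustration (a minimal sketch with hypothetical names, similar
in spirit to the sls-miti-blr-bti.c test added by this patch), the
constraint applies to any ordinary indirect call in a BTI-enabled build:

  /* Compiled with something like -mharden-sls=blr -mbranch-protection=bti.
     The call through `handler` would normally be a BLR; with the mitigation
     it becomes a BL to a stub, and since BTI only allows an indirect branch
     onto a `BTI c` landing pad when it comes from x16 or x17, the register
     holding the target is constrained to one of those two.  The "+ 1" keeps
     this from being a tail call.  */
  int
  dispatch (int (*handler) (int), int arg)
  {
    return handler (arg) + 1;
  }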

This mitigation does not apply for BLR instructions in the following
places:
- Some accesses to thread-local variables use a code sequence with a BLR
  instruction.  This code sequence is part of the binary interface between
  compiler and linker.  If this BLR instruction needs to be mitigated, it'd
  probably be best to do so in the linker.  It seems that the code sequence
  for thread-local variable access is unlikely to lead to a Spectre Revelation
  Gadget.
- PLT stubs are produced by the linker and each contains a BLR instruction.
  It seems that at most a Spectre Revelation Gadget could appear after the
  last PLT stub.

Testing:
  Bootstrap and regtest on AArch64
    (with BOOT_CFLAGS="-mharden-sls=retbr,blr")
  Used a temporary hack(1) in gcc-dg.exp to use these options on every
  test in the testsuite, a slight modification to emit the speculation
  barrier after every function stub, and a script to check that the
  output never emitted a BLR, or unmitigated BR or RET instruction.
  Similar on an aarch64-none-elf cross-compiler.

1) The temporary hack emitted a speculation barrier at the end of every stub
function, and used a script to ensure that:
  a) Every RET or BR is immediately followed by a speculation barrier.
  b) No BLR instruction is emitted by the compiler.


gcc/ChangeLog:

2020-06-08  Matthew Malcomson  <matthew.malcomson@arm.com>

	* config/aarch64/aarch64-protos.h (aarch64_indirect_call_asm):
	New declaration.
	* config/aarch64/aarch64.c (aarch64_use_return_insn_p): Return
	false if hardening BLR instructions.
	(aarch64_sls_shared_thunks): Global array to store stub labels.
	(aarch64_sls_create_blr_label): New.
	(print_asm_branch): New macro.
	(aarch64_sls_emit_blr_function_thunks): New.
	(aarch64_sls_emit_shared_blr_thunks): New.
	(aarch64_asm_file_end): New.
	(aarch64_indirect_call_asm): New.
	(TARGET_ASM_FILE_END): Use aarch64_asm_file_end.
	(TARGET_ASM_FUNCTION_EPILOGUE): Use
	aarch64_sls_emit_blr_function_thunks.
	* config/aarch64/aarch64.h (struct machine_function): Introduce
	`call_via` array to store function-local stub labels.
	* config/aarch64/aarch64.md (*call_insn, *call_value_insn): Use
	aarch64_indirect_call_asm to emit code when hardening BLR
	instructions.

gcc/testsuite/ChangeLog:

2020-06-08  Matthew Malcomson  <matthew.malcomson@arm.com>

	* gcc.target/aarch64/sls-mitigation/sls-miti-blr-bti.c: New test.
	* gcc.target/aarch64/sls-mitigation/sls-miti-blr.c: New test.



###############     Attachment also inlined for ease of reply    ###############


diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index d2eb739bc89ecd9d0212416b8dc3ee4ba236a271..e79f9cbc783e75132e999395ff975f9768436419 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -781,6 +781,7 @@ extern const atomic_ool_names aarch64_ool_ldeor_names;
 tree aarch64_resolve_overloaded_builtin_general (location_t, tree, void *);
 
 const char * aarch64_sls_barrier (int);
+const char * aarch64_indirect_call_asm (rtx);
 extern bool aarch64_harden_sls_retbr_p (void);
 extern bool aarch64_harden_sls_blr_p (void);
 
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -862,8 +862,9 @@ typedef struct GTY (()) machine_function
   struct aarch64_frame frame;
   /* One entry for each hard register.  */
   bool reg_is_wrapped_separately[LAST_SAVED_REGNUM];
+  rtx call_via[LAST_SAVED_REGNUM];
   bool label_is_assembled;
 } machine_function;
 #endif
 
 /* Which ABI to use.  */
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 9356937fe266c68196392a1589b3cf96607de104..93552acda553e3258ccebdb9b82979b72489ba8e 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -8075,6 +8075,13 @@ aarch64_expand_prologue (void)
 bool
 aarch64_use_return_insn_p (void)
 {
+  /* Documentation says we should not have the "return" pattern enabled if we
+     wish to use the TARGET_ASM_FUNCTION_EPILOGUE hook.  We wish to use that
+     hook to implement the BLR function stubs, so we always disable this
+     pattern when using those stubs.  */
+  if (aarch64_harden_sls_blr_p ())
+    return false;
+
   if (!reload_completed)
     return false;
 
@@ -22923,6 +22930,180 @@ aarch64_sls_barrier (int mitigation_required)
     : "";
 }
 
+static GTY (()) rtx aarch64_sls_shared_thunks[31];
+static GTY (()) bool aarch64_sls_shared_thunks_needed = false;
+const char *indirect_symbol_names[31] = {
+    "__call_indirect_x0",
+    "__call_indirect_x1",
+    "__call_indirect_x2",
+    "__call_indirect_x3",
+    "__call_indirect_x4",
+    "__call_indirect_x5",
+    "__call_indirect_x6",
+    "__call_indirect_x7",
+    "__call_indirect_x8",
+    "__call_indirect_x9",
+    "__call_indirect_x10",
+    "__call_indirect_x11",
+    "__call_indirect_x12",
+    "__call_indirect_x13",
+    "__call_indirect_x14",
+    "__call_indirect_x15",
+    "__call_indirect_x16",
+    "__call_indirect_x17",
+    "__call_indirect_x18",
+    "__call_indirect_x19",
+    "__call_indirect_x20",
+    "__call_indirect_x21",
+    "__call_indirect_x22",
+    "__call_indirect_x23",
+    "__call_indirect_x24",
+    "__call_indirect_x25",
+    "__call_indirect_x26",
+    "__call_indirect_x27",
+    "__call_indirect_x28",
+    "__call_indirect_x29",
+    "__call_indirect_x30",
+};
+
+/* Function to create a BLR thunk.  This thunk is used to mitigate straight
+   line speculation.  Instead of a simple BLR that can be speculated past,
+   code emits a BL to this thunk, and this thunk emits a BR to the relevant
+   register.  These thunks have the relevant speculation barriers put after
+   their indirect branch so that speculation is blocked.
+
+   We use such a thunk so the speculation barriers are kept off the
+   architecturally executed path in order to reduce the performance overhead.
+
+   When optimising for size we use stubs shared by the entire compilation unit.
+   When optimising for performance we emit stubs for each function in the hope
+   that the branch predictor can better train on jumps specific for a given
+   function.  */
+rtx
+aarch64_sls_create_blr_label (int regnum)
+{
+  gcc_assert (regnum < 31);
+  if (optimize_function_for_size_p (cfun))
+    {
+      /* For the thunks shared between different functions in this compilation
+	 unit we use a named symbol -- this is just for users to more easily
+	 understand the generated assembly.  */
+      aarch64_sls_shared_thunks_needed = true;
+      if (aarch64_sls_shared_thunks[regnum] == NULL)
+	aarch64_sls_shared_thunks[regnum]
+	  = gen_rtx_SYMBOL_REF (Pmode, indirect_symbol_names[regnum]);
+
+      return aarch64_sls_shared_thunks[regnum];
+    }
+
+  if (cfun->machine->call_via[regnum] == NULL)
+    cfun->machine->call_via[regnum]
+      = gen_rtx_LABEL_REF(Pmode, gen_label_rtx ());
+  return cfun->machine->call_via[regnum];
+}
+
+/* Emit all BLR stubs for this particular function.
+   Here we emit all the BLR stubs needed for the current function.  Since we
+   emit these stubs in a consecutive block we know there will be no speculation
+   gadgets between each stub, and hence we only emit a speculation barrier at
+   the end of the stub sequences.
+
+   This is called in the TARGET_ASM_FUNCTION_EPILOGUE hook.  */
+#define print_asm_branch(regno) asm_fprintf (out_file, "\tbr\tx%d\n", regno)
+void
+aarch64_sls_emit_blr_function_thunks (FILE *out_file)
+{
+  if (! aarch64_harden_sls_blr_p ())
+    return;
+
+  bool any_functions_emitted = false;
+  /* We must save and restore the current function section since this assembly
+     is emitted at the end of the function.  This means it can be emitted *just
+     after* the cold section of a function.  That cold part would be emitted in
+     a different section. That switch would trigger a `.cfi_endproc` directive
+     to be emitted in the original section and a `.cfi_startproc` directive to
+     be emitted in the new section.  Switching to the original section without
+     restoring would mean that the `.cfi_endproc` emitted as a function ends
+     would happen in a different section -- leaving an unmatched
+     `.cfi_startproc` in the cold text section and an unmatched `.cfi_endproc`
+     in the standard text section.  */
+  section *save_text_section = in_section;
+  switch_to_section (function_section (current_function_decl));
+  for (int regnum = 0; regnum < 31; ++regnum)
+    {
+      rtx specu_label = cfun->machine->call_via[regnum];
+      if (specu_label == NULL)
+	continue;
+
+      output_operand (specu_label, 0);
+      asm_fprintf (out_file, ":\n");
+      print_asm_branch (regnum);
+      any_functions_emitted = true;
+    }
+  if (any_functions_emitted)
+    /* Can use the SB if needs be here, since this stub will only be used
+      by the current function, and hence for the current target.  */
+    output_asm_insn (aarch64_sls_barrier (true), NULL);
+  switch_to_section (save_text_section);
+}
+
+/* Emit all BLR stubs for the current compilation unit.
+   Over the course of compiling this unit we may have converted some BLR
+   instructions to a BL to a shared stub function.  This is where we emit those
+   stub functions.
+   This function is for the stubs shared between different functions in this
+   compilation unit.  We share when optimising for size instead of speed.
+
+   This function is called through the TARGET_ASM_FILE_END hook.  */
+void
+aarch64_sls_emit_shared_blr_thunks (FILE *out_file)
+{
+  if (! aarch64_sls_shared_thunks_needed)
+    return;
+
+  switch_to_section (text_section);
+  ASM_OUTPUT_ALIGN (out_file, 2);
+  for (int regnum = 0; regnum < 31; ++regnum)
+    {
+      rtx specu_label = aarch64_sls_shared_thunks[regnum];
+      if (!specu_label)
+	continue;
+
+      ASM_OUTPUT_LABEL (out_file, indirect_symbol_names[regnum]);
+      print_asm_branch (regnum);
+    }
+  /* Use the most conservative target to ensure it can always be used by any
+     function in the translation unit.  */
+  asm_fprintf (out_file, "\tdsb\tsy\n\tisb\n");
+}
+#undef print_asm_branch
+
+/* Implement TARGET_ASM_FILE_END.  */
+void
+aarch64_asm_file_end ()
+{
+  aarch64_sls_emit_shared_blr_thunks (asm_out_file);
+  /* Since this function will be called for the ASM_FILE_END hook, we ensure
+     that what would be called otherwise (e.g. `file_end_indicate_exec_stack`
+     for FreeBSD) still gets called.  */
+#ifdef TARGET_ASM_FILE_END
+  TARGET_ASM_FILE_END ();
+#endif
+}
+
+const char *
+aarch64_indirect_call_asm (rtx addr)
+{
+  if (aarch64_harden_sls_blr_p () && REG_P (addr))
+    {
+      rtx stub_label = aarch64_sls_create_blr_label (REGNO (addr));
+      output_asm_insn ("bl\t%0", &stub_label);
+    }
+  else
+    output_asm_insn ("blr\t%0", &addr);
+  return "";
+}
+
 /* Target-specific selftests.  */
 
 #if CHECKING_P
@@ -23473,6 +23654,12 @@ aarch64_libgcc_floating_mode_supported_p
 #undef TARGET_MD_ASM_ADJUST
 #define TARGET_MD_ASM_ADJUST arm_md_asm_adjust
 
+#undef TARGET_ASM_FILE_END
+#define TARGET_ASM_FILE_END aarch64_asm_file_end
+
+#undef TARGET_ASM_FUNCTION_EPILOGUE
+#define TARGET_ASM_FUNCTION_EPILOGUE aarch64_sls_emit_blr_function_thunks
+
 struct gcc_target targetm = TARGET_INITIALIZER;
 
 #include "gt-aarch64.h"
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -1019,15 +1019,22 @@
 )
 
 (define_insn "*call_insn"
-  [(call (mem:DI (match_operand:DI 0 "aarch64_call_insn_operand" "r, Usf"))
+  [(call (mem:DI (match_operand:DI 0 "aarch64_call_insn_operand" "r, Ucs, Usf"))
 	 (match_operand 1 "" ""))
    (unspec:DI [(match_operand:DI 2 "const_int_operand")] UNSPEC_CALLEE_ABI)
    (clobber (reg:DI LR_REGNUM))]
   ""
   "@
-  blr\\t%0
+  * return aarch64_indirect_call_asm (operands[0]);
+  * return aarch64_indirect_call_asm (operands[0]);
   bl\\t%c0"
-  [(set_attr "type" "call, call")]
+  [(set_attr "type" "call, call, call")
+   (set_attr_alternative
+   "enabled" [(if_then_else (and (match_test "aarch64_enable_bti")
+				 (match_test "aarch64_harden_sls_blr_p ()"))
+			    (const_string "no")
+			    (const_string "yes"))
+	      (const_string "yes") (const_string "yes")])]
 )
 
 (define_expand "call_value"
@@ -1047,15 +1054,22 @@
 
 (define_insn "*call_value_insn"
   [(set (match_operand 0 "" "")
-	(call (mem:DI (match_operand:DI 1 "aarch64_call_insn_operand" "r, Usf"))
+	(call (mem:DI (match_operand:DI 1 "aarch64_call_insn_operand" "r, Ucs, Usf"))
 		      (match_operand 2 "" "")))
    (unspec:DI [(match_operand:DI 3 "const_int_operand")] UNSPEC_CALLEE_ABI)
    (clobber (reg:DI LR_REGNUM))]
   ""
   "@
-  blr\\t%1
+  * return aarch64_indirect_call_asm (operands[1]);
+  * return aarch64_indirect_call_asm (operands[1]);
   bl\\t%c1"
-  [(set_attr "type" "call, call")]
+  [(set_attr "type" "call, call, call")
+   (set_attr_alternative
+   "enabled" [(if_then_else (and (match_test "aarch64_enable_bti")
+				 (match_test "aarch64_harden_sls_blr_p ()"))
+			    (const_string "no")
+			    (const_string "yes"))
+	      (const_string "yes") (const_string "yes")])]
 )
 
 (define_expand "sibcall"
diff --git a/gcc/testsuite/gcc.target/aarch64/sls-mitigation/sls-miti-blr-bti.c b/gcc/testsuite/gcc.target/aarch64/sls-mitigation/sls-miti-blr-bti.c
new file mode 100644
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sls-mitigation/sls-miti-blr-bti.c
@@ -0,0 +1,40 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-mharden-sls=blr -mbranch-protection=bti" } */
+/*
+   Ensure that the SLS hardening of BLR leaves no BLR instructions.
+   Here we also check that there are no BR instructions with anything except an
+   x16 or x17 register.  This is because a `BTI c` instruction can be branched
+   to using a BLR instruction using any register, but can only be branched to
+   with a BR using an x16 or x17 register.
+  */
+typedef int (foo) (int, int);
+typedef void (bar) (int, int);
+struct sls_testclass {
+    foo *x;
+    bar *y;
+    int left;
+    int right;
+};
+
+/* We test both RTL patterns for a call which returns a value and a call which
+   does not.  */
+int blr_call_value (struct sls_testclass x)
+{
+  int retval = x.x(x.left, x.right);
+  if (retval % 10)
+    return 100;
+  return 9;
+}
+
+int blr_call (struct sls_testclass x)
+{
+  x.y(x.left, x.right);
+  if (x.left % 10)
+    return 100;
+  return 9;
+}
+
+/* { dg-final { scan-assembler-not "\tblr\t" } } */
+/* { dg-final { scan-assembler-not "\tbr\tx(?!16|17)" } } */
+/* { dg-final { scan-assembler "\tbr\tx(16|17)" } } */
+
diff --git a/gcc/testsuite/gcc.target/aarch64/sls-mitigation/sls-miti-blr.c b/gcc/testsuite/gcc.target/aarch64/sls-mitigation/sls-miti-blr.c
new file mode 100644
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sls-mitigation/sls-miti-blr.c
@@ -0,0 +1,35 @@
+/* { dg-additional-options "-mharden-sls=blr -save-temps" } */
+/*
+   Ensure that the SLS hardening of BLR leaves no BLR instructions.
+   We only test that all BLR instructions have been removed, not that the
+   resulting code makes sense. 
+  */
+typedef int (foo) (int, int);
+typedef void (bar) (int, int);
+struct sls_testclass {
+    foo *x;
+    bar *y;
+    int left;
+    int right;
+};
+
+/* We test both RTL patterns for a call which returns a value and a call which
+   does not.  */
+int blr_call_value (struct sls_testclass x)
+{
+  int retval = x.x(x.left, x.right);
+  if (retval % 10)
+    return 100;
+  return 9;
+}
+
+int blr_call (struct sls_testclass x)
+{
+  x.y(x.left, x.right);
+  if (x.left % 10)
+    return 100;
+  return 9;
+}
+
+/* { dg-final { scan-assembler-not "\tblr\t" } } */
+/* { dg-final { scan-assembler "\tbr\tx\[0-9\]\[0-9\]?" } } */


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [Patch v2 3/3] aarch64: Mitigate SLS for BLR instruction
  2020-06-08 14:10 ` [Patch 3/3] aarch64: Mitigate SLS for BLR instruction Matthew Malcomson
@ 2020-06-23 14:57   ` Matthew Malcomson
  2020-06-23 16:31     ` Richard Sandiford
  0 siblings, 1 reply; 17+ messages in thread
From: Matthew Malcomson @ 2020-06-23 14:57 UTC (permalink / raw)
  To: gcc-patches
  Cc: Richard.Earnshaw, Kyrylo.Tkachov, Marcus.Shawcroft,
	Kristof.Beyls, richard.sandiford

[-- Attachment #1: Type: text/plain, Size: 25507 bytes --]

This patch introduces the mitigation for Straight Line Speculation past
the BLR instruction.

This mitigation replaces BLR instructions with a BL to a stub which uses
a BR to jump to the value of the original register.  A speculation
barrier is then appended to these function stubs to ensure no straight
line speculation happens after these jumps.

When optimising for speed we use a set of stubs for each function since
this should help the branch predictor make more accurate predictions
about where a stub should branch.

When optimising for size we use one set of stubs for all functions.
This set of stubs can have human readable names, and we are using
`__call_indirect_x<N>` for register x<N>.

When BTI branch protection is enabled the BLR instruction can jump to a
`BTI c` instruction using any register, while the BR instruction can
only jump to a `BTI c` instruction using the x16 or x17 registers.
Hence, in order to ensure this transformation is safe, we move the value
of the original register into x16 and use x16 for the BR.

As an example when optimising for size:
a
    BLR x0
instruction would get transformed to something like
    BL __call_indirect_x0
where __call_indirect_x0 labels a thunk that contains
__call_indirect_x0:
    MOV x16, x0
    BR x16
    <speculation barrier>
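
When optimising for speed the stub is instead local to the calling function,
so the same call would look roughly like (the label name below is made up --
the compiler uses an automatically generated local label):

    BL .Lindirect_call_x0
with, appended after the body of the calling function,
.Lindirect_call_x0:
    MOV x16, x0
    BR x16

followed by a single speculation barrier after the function's whole group of
stubs.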


The first version of this patch used local symbols specific to a
compilation unit to try and avoid relocations.
This was mistaken since functions coming from the same compilation unit
can still be in different sections, and the assembler will insert
relocations at jumps between sections.

On any relocation the linker is permitted to emit a veneer to handle
jumps between symbols that are very far apart.  The registers x16 and
x17 may be clobbered by these veneers.
Hence the function stubs cannot rely on the values of x16 and x17 being
the same as just before the function stub is called.
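
For context, a linker-generated long-branch veneer typically looks something
like the following (the exact form is up to the linker; this is only
indicative):

__some_function_veneer:
    ADRP x16, some_function
    ADD  x16, x16, :lo12:some_function
    BR   x16

which is why the stubs cannot assume that x16 or x17 still hold the values
they had at the call site.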

The same applies to the hot/cold partitioning of single functions, so
function-local stubs have the same restriction.

This updated version of the patch never emits function stubs for x16 and
x17, and instead forces other registers to be used.


Given the above, there is now no benefit to local symbols (since they
are not enough to avoid dealing with linker intricacies).  This patch
now uses global symbols with hidden visibility each stored in their own
COMDAT section.  This means stubs can be shared between compilation
units while still avoiding the PLT indirection.


This patch also removes the `__call_indirect_x30` stub (and the
function-local equivalent): since the BL used to reach such a stub
overwrites x30 with the return address, the stub would simply jump back
to the original location.


The function-local stubs are emitted to the assembly output file in one
chunk, which means we need not add the speculation barrier directly
after each one.
This is because we know for certain that the instructions directly after
the BR in all but the last function stub will be from another one of
these stubs and hence will not contain a speculation gadget.
Instead we add a speculation barrier at the end of the sequence of
stubs.

The global stubs are emitted in COMDAT/.linkonce sections by
themselves so that the linker can remove duplicates from multiple object
files.  This means they are not emitted in one chunk, and each one must
include the speculation barrier.

Another difference is that since the global stubs are shared across
compilation units we do not know that all functions will be targeting an
architecture supporting the SB instruction.
Rather than provide multiple stubs for each architecture, we provide a
stub that will work for all architectures -- using the DSB+ISB barrier.
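
Concretely, the shared stub for x0 would then look roughly like the
following on an ELF target (the directive spellings, section flags and
alignment shown here are illustrative and vary with the target and
assembler):

    .section .text.__call_indirect_x0,"axG",%progbits,__call_indirect_x0,comdat
    .align 2
    .global __call_indirect_x0
    .hidden __call_indirect_x0
    .type __call_indirect_x0, %function
__call_indirect_x0:
    mov x16, x0
    br x16
    dsb sy
    isb
    .size __call_indirect_x0, .-__call_indirect_x0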


This mitigation does not apply for BLR instructions in the following
places:
- Some accesses to thread-local variables use a code sequence with a BLR
  instruction.  This code sequence is part of the binary interface between
  compiler and linker.  If this BLR instruction needs to be mitigated, it'd
  probably be best to do so in the linker.  It seems that the code sequence
  for thread-local variable access is unlikely to lead to a Spectre
  Revelation Gadget.
- PLT stubs are produced by the linker and each contain a BLR instruction.
  It seems that at most a Spectre Revelation Gadget could appear after the
  last PLT stub.

Testing:
  Bootstrap and regtest on AArch64
    (with BOOT_CFLAGS="-mharden-sls=retbr,blr")
  Used a temporary hack(1) in gcc-dg.exp to use these options on every
  test in the testsuite, a slight modification to emit the speculation
  barrier after every function stub, and a script to check that the
  output never emitted a BLR, or unmitigated BR or RET instruction.
  Similar on an aarch64-none-elf cross-compiler.

1) Temporary hack emitted a speculation barrier at the end of every stub
function, and used a script to ensure that:
  a) Every RET or BR is immediately followed by a speculation barrier.
  b) No BLR instruction is emitted by compiler.


gcc/ChangeLog:

2020-06-23  Matthew Malcomson  <matthew.malcomson@arm.com>

	* config/aarch64/aarch64-protos.h (aarch64_indirect_call_asm):
	New declaration.
	* config/aarch64/aarch64.c (aarch64_regno_regclass): Handle new
	stub registers class.
	(aarch64_class_max_nregs): Likewise.
	(aarch64_register_move_cost): Likewise.
	(aarch64_sls_shared_thunks): Global array to store stub labels.
	(aarch64_sls_emit_function_stub): New.
	(aarch64_sls_create_blr_label): New.
	(aarch64_sls_emit_blr_function_thunks): New.
	(aarch64_sls_emit_shared_blr_thunks): New.
	(aarch64_asm_file_end): New.
	(aarch64_indirect_call_asm): New.
	(TARGET_ASM_FILE_END): Use aarch64_asm_file_end.
	(TARGET_ASM_FUNCTION_EPILOGUE): Use
	aarch64_sls_emit_blr_function_thunks.
	* config/aarch64/aarch64.h (STUB_REGNUM_P): New.
	(enum reg_class): Add STUB_REGS class.
	(machine_function): Introduce `call_via` array for
	function-local stub labels.
	* config/aarch64/aarch64.md (*call_insn, *call_value_insn): Use
	aarch64_indirect_call_asm to emit code when hardening BLR
	instructions.
	* config/aarch64/constraints.md (Ucr): New constraint
	representing registers for indirect calls.  This is usually
	GENERAL_REGS, but STUB_REGS when hardening the BLR instruction
	against SLS.
	* config/aarch64/predicates.md (aarch64_general_reg): Also accept
	registers in the STUB_REGS class.

gcc/testsuite/ChangeLog:

2020-06-23  Matthew Malcomson  <matthew.malcomson@arm.com>

	* gcc.target/aarch64/sls-mitigation/sls-miti-blr-bti.c: New test.
	* gcc.target/aarch64/sls-mitigation/sls-miti-blr.c: New test.



###############     Attachment also inlined for ease of reply    ###############


diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index d2eb739bc89ecd9d0212416b8dc3ee4ba236a271..e79f9cbc783e75132e999395ff975f9768436419 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -781,6 +781,7 @@ extern const atomic_ool_names aarch64_ool_ldeor_names;
 tree aarch64_resolve_overloaded_builtin_general (location_t, tree, void *);
 
 const char * aarch64_sls_barrier (int);
+const char * aarch64_indirect_call_asm (rtx);
 extern bool aarch64_harden_sls_retbr_p (void);
 extern bool aarch64_harden_sls_blr_p (void);
 
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index f996472d6990b7709602ae93f7a2cb7daa0e84b0..9795c929b8733f89722d3660456f5e7d6405d902 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -643,6 +643,16 @@ extern unsigned aarch64_architecture_version;
 #define GP_REGNUM_P(REGNO)						\
   (((unsigned) (REGNO - R0_REGNUM)) <= (R30_REGNUM - R0_REGNUM))
 
+/* Registers known to be preserved over a BL instruction.  This consists of the
+   GENERAL_REGS without x16, x17, and x30.  The x30 register is changed by the BL
+   instruction itself, while the x16 and x17 registers may be used by veneers
+   which can be inserted by the linker.  */
+#define STUB_REGNUM_P(REGNO) \
+  (GP_REGNUM_P (REGNO) \
+   && ((unsigned) (REGNO - R0_REGNUM)) != (R16_REGNUM - R0_REGNUM) \
+   && ((unsigned) (REGNO - R0_REGNUM)) != (R17_REGNUM - R0_REGNUM) \
+   && ((unsigned) (REGNO - R0_REGNUM)) != (R30_REGNUM - R0_REGNUM)) \
+
 #define FP_REGNUM_P(REGNO)			\
   (((unsigned) (REGNO - V0_REGNUM)) <= (V31_REGNUM - V0_REGNUM))
 
@@ -667,6 +677,7 @@ enum reg_class
 {
   NO_REGS,
   TAILCALL_ADDR_REGS,
+  STUB_REGS,
   GENERAL_REGS,
   STACK_REG,
   POINTER_REGS,
@@ -689,6 +700,7 @@ enum reg_class
 {						\
   "NO_REGS",					\
   "TAILCALL_ADDR_REGS",				\
+  "STUB_REGS",					\
   "GENERAL_REGS",				\
   "STACK_REG",					\
   "POINTER_REGS",				\
@@ -708,6 +720,7 @@ enum reg_class
 {									\
   { 0x00000000, 0x00000000, 0x00000000 },	/* NO_REGS */		\
   { 0x00030000, 0x00000000, 0x00000000 },	/* TAILCALL_ADDR_REGS */\
+  { 0x3ffcffff, 0x00000000, 0x00000000 },	/* STUB_REGS */		\
   { 0x7fffffff, 0x00000000, 0x00000003 },	/* GENERAL_REGS */	\
   { 0x80000000, 0x00000000, 0x00000000 },	/* STACK_REG */		\
   { 0xffffffff, 0x00000000, 0x00000003 },	/* POINTER_REGS */	\
@@ -879,6 +892,8 @@ typedef struct GTY (()) machine_function
   struct aarch64_frame frame;
   /* One entry for each hard register.  */
   bool reg_is_wrapped_separately[LAST_SAVED_REGNUM];
+  /* One entry for each general purpose register.  */
+  rtx call_via[SP_REGNUM];
   bool label_is_assembled;
 } machine_function;
 #endif
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 27a6b78ec6925106f7b745d949b510b6f273c651..17b040e2d09a8a4960fd6b02d53f4ccee78f9e93 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -10607,6 +10607,9 @@ aarch64_label_mentioned_p (rtx x)
 enum reg_class
 aarch64_regno_regclass (unsigned regno)
 {
+  if (STUB_REGNUM_P (regno))
+    return STUB_REGS;
+
   if (GP_REGNUM_P (regno))
     return GENERAL_REGS;
 
@@ -10869,7 +10872,7 @@ aarch64_asm_trampoline_template (FILE *f)
      specific attributes to choose between hardening against straight line
      speculation or not, but such function specific attributes are likely to
      happen in the future.  */
-  output_asm_insn ("dsb\tsy\n\tisb", NULL);
+  asm_fprintf (f, "\tdsb\tsy\n\tisb\n");
 
   /* The trampoline needs an extra padding instruction.  In case if BTI is
      enabled the padding instruction is replaced by the BTI instruction at
@@ -10919,6 +10922,7 @@ aarch64_class_max_nregs (reg_class_t regclass, machine_mode mode)
   unsigned int nregs, vec_flags;
   switch (regclass)
     {
+    case STUB_REGS:
     case TAILCALL_ADDR_REGS:
     case POINTER_REGS:
     case GENERAL_REGS:
@@ -13157,10 +13161,12 @@ aarch64_register_move_cost (machine_mode mode,
     = aarch64_tune_params.regmove_cost;
 
   /* Caller save and pointer regs are equivalent to GENERAL_REGS.  */
-  if (to == TAILCALL_ADDR_REGS || to == POINTER_REGS)
+  if (to == TAILCALL_ADDR_REGS || to == POINTER_REGS
+      || to == STUB_REGS)
     to = GENERAL_REGS;
 
-  if (from == TAILCALL_ADDR_REGS || from == POINTER_REGS)
+  if (from == TAILCALL_ADDR_REGS || from == POINTER_REGS
+      || from == STUB_REGS)
     from = GENERAL_REGS;
 
   /* Make RDFFR very expensive.  In particular, if we know that the FFR
@@ -22964,6 +22970,215 @@ aarch64_sls_barrier (int mitigation_required)
     : "";
 }
 
+static GTY (()) tree aarch64_sls_shared_thunks[30];
+static GTY (()) bool aarch64_sls_shared_thunks_needed = false;
+const char *indirect_symbol_names[30] = {
+    "__call_indirect_x0",
+    "__call_indirect_x1",
+    "__call_indirect_x2",
+    "__call_indirect_x3",
+    "__call_indirect_x4",
+    "__call_indirect_x5",
+    "__call_indirect_x6",
+    "__call_indirect_x7",
+    "__call_indirect_x8",
+    "__call_indirect_x9",
+    "__call_indirect_x10",
+    "__call_indirect_x11",
+    "__call_indirect_x12",
+    "__call_indirect_x13",
+    "__call_indirect_x14",
+    "__call_indirect_x15",
+    "", /* "__call_indirect_x16",  */
+    "", /* "__call_indirect_x17",  */
+    "__call_indirect_x18",
+    "__call_indirect_x19",
+    "__call_indirect_x20",
+    "__call_indirect_x21",
+    "__call_indirect_x22",
+    "__call_indirect_x23",
+    "__call_indirect_x24",
+    "__call_indirect_x25",
+    "__call_indirect_x26",
+    "__call_indirect_x27",
+    "__call_indirect_x28",
+    "__call_indirect_x29",
+};
+
+/* Function to create a BLR thunk.  This thunk is used to mitigate straight
+   line speculation.  Instead of a simple BLR that can be speculated past,
+   we emit a BL to this thunk, and this thunk contains a BR to the relevant
+   register.  These thunks have the relevant speculation barriers put after
+   their indirect branch so that speculation is blocked.
+
+   We use such a thunk so the speculation barriers are kept off the
+   architecturally executed path in order to reduce the performance overhead.
+
+   When optimising for size we use stubs shared by the linked object.
+   When optimising for performance we emit stubs for each function in the hope
+   that the branch predictor can better train on jumps specific for a given
+   function.  */
+rtx
+aarch64_sls_create_blr_label (int regnum)
+{
+  gcc_assert (regnum < 30 && regnum != 16 && regnum != 17);
+  if (optimize_function_for_size_p (cfun))
+    {
+      /* For the thunks shared between different functions in this compilation
+	 unit we use a named symbol -- this is just for users to more easily
+	 understand the generated assembly.  */
+      aarch64_sls_shared_thunks_needed = true;
+      const char *thunk_name = indirect_symbol_names[regnum];
+      if (aarch64_sls_shared_thunks[regnum] == NULL)
+	{
+	  /* Build a decl representing this function stub and record it for
+	     later.  We build a decl here so we can use the GCC machinery for
+	     handling sections automatically (through `get_named_section` and
+	     `make_decl_one_only`).  That saves us a lot of trouble handling
+	     the specifics of different output file formats.  */
+	  tree decl = build_decl (BUILTINS_LOCATION, FUNCTION_DECL,
+				  get_identifier (thunk_name),
+				  build_function_type_list (void_type_node,
+							    NULL_TREE));
+	  DECL_RESULT (decl) = build_decl (BUILTINS_LOCATION, RESULT_DECL,
+					   NULL_TREE, void_type_node);
+	  TREE_PUBLIC (decl) = 1;
+	  TREE_STATIC (decl) = 1;
+	  DECL_IGNORED_P (decl) = 1;
+	  DECL_ARTIFICIAL (decl) = 1;
+	  make_decl_one_only (decl, DECL_ASSEMBLER_NAME (decl));
+	  resolve_unique_section (decl, 0, false);
+	  aarch64_sls_shared_thunks[regnum] = decl;
+	}
+
+      return gen_rtx_SYMBOL_REF (Pmode, thunk_name);
+    }
+
+  if (cfun->machine->call_via[regnum] == NULL)
+    cfun->machine->call_via[regnum]
+      = gen_rtx_LABEL_REF (Pmode, gen_label_rtx ());
+  return cfun->machine->call_via[regnum];
+}
+
+/* Helper function for aarch64_sls_emit_blr_function_thunks and
+   aarch64_sls_emit_shared_blr_thunks below.  */
+static void
+aarch64_sls_emit_function_stub (FILE *out_file, int regnum)
+{
+  /* Save in x16 and branch to that function so this transformation does
+     not prevent jumping to `BTI c` instructions.  */
+  asm_fprintf (out_file, "\tmov\tx16, x%d\n", regnum);
+  asm_fprintf (out_file, "\tbr\tx16\n");
+}
+
+/* Emit all BLR stubs for this particular function.
+   Here we emit all the BLR stubs needed for the current function.  Since we
+   emit these stubs in a consecutive block we know there will be no speculation
+   gadgets between each stub, and hence we only emit a speculation barrier at
+   the end of the stub sequences.
+
+   This is called in the TARGET_ASM_FUNCTION_EPILOGUE hook.  */
+void
+aarch64_sls_emit_blr_function_thunks (FILE *out_file)
+{
+  if (! aarch64_harden_sls_blr_p ())
+    return;
+
+  bool any_functions_emitted = false;
+  /* We must save and restore the current function section since this assembly
+     is emitted at the end of the function.  This means it can be emitted *just
+     after* the cold section of a function.  That cold part would be emitted in
+     a different section. That switch would trigger a `.cfi_endproc` directive
+     to be emitted in the original section and a `.cfi_startproc` directive to
+     be emitted in the new section.  Switching to the original section without
+     restoring would mean that the `.cfi_endproc` emitted as a function ends
+     would happen in a different section -- leaving an unmatched
+     `.cfi_startproc` in the cold text section and an unmatched `.cfi_endproc`
+     in the standard text section.  */
+  section *save_text_section = in_section;
+  switch_to_section (function_section (current_function_decl));
+  for (int regnum = 0; regnum < 30; ++regnum)
+    {
+      rtx specu_label = cfun->machine->call_via[regnum];
+      if (specu_label == NULL)
+	continue;
+
+      targetm.asm_out.print_operand (out_file, specu_label, 0);
+      asm_fprintf (out_file, ":\n");
+      aarch64_sls_emit_function_stub (out_file, regnum);
+      any_functions_emitted = true;
+    }
+  if (any_functions_emitted)
+    /* Can use the SB if needs be here, since this stub will only be used
+      by the current function, and hence for the current target.  */
+    asm_fprintf (out_file, "\t%s\n", aarch64_sls_barrier (true));
+  switch_to_section (save_text_section);
+}
+
+/* Emit shared BLR stubs for the current compilation unit.
+   Over the course of compiling this unit we may have converted some BLR
+   instructions to a BL to a shared stub function.  This is where we emit those
+   stub functions.
+   This function is for the stubs shared between different functions in this
+   compilation unit.  We share when optimising for size instead of speed.
+
+   This function is called through the TARGET_ASM_FILE_END hook.  */
+void
+aarch64_sls_emit_shared_blr_thunks (FILE *out_file)
+{
+  if (! aarch64_sls_shared_thunks_needed)
+    return;
+
+  for (int regnum = 0; regnum < 30; ++regnum)
+    {
+      tree decl = aarch64_sls_shared_thunks[regnum];
+      if (!decl)
+	continue;
+
+      const char *name = indirect_symbol_names[regnum];
+      switch_to_section (get_named_section (decl, NULL, 0));
+      ASM_OUTPUT_ALIGN (out_file, 2);
+      targetm.asm_out.globalize_label (out_file, name);
+      /* Only emits if the compiler is configured for an assembler that can
+	 handle visibility directives.  */
+      targetm.asm_out.assemble_visibility (decl, VISIBILITY_HIDDEN);
+      ASM_OUTPUT_TYPE_DIRECTIVE (out_file, name, "function");
+      ASM_OUTPUT_LABEL (out_file, name);
+      aarch64_sls_emit_function_stub (out_file, regnum);
+      /* Use the most conservative target to ensure it can always be used by any
+	 function in the translation unit.  */
+      asm_fprintf (out_file, "\tdsb\tsy\n\tisb\n");
+      ASM_DECLARE_FUNCTION_SIZE (out_file, name, decl);
+    }
+}
+
+/* Implement TARGET_ASM_FILE_END.  */
+void
+aarch64_asm_file_end ()
+{
+  aarch64_sls_emit_shared_blr_thunks (asm_out_file);
+  /* Since this function will be called for the ASM_FILE_END hook, we ensure
+     that what would be called otherwise (e.g. `file_end_indicate_exec_stack`
+     for FreeBSD) still gets called.  */
+#ifdef TARGET_ASM_FILE_END
+  TARGET_ASM_FILE_END ();
+#endif
+}
+
+const char *
+aarch64_indirect_call_asm (rtx addr)
+{
+  gcc_assert (REG_P (addr));
+  if (aarch64_harden_sls_blr_p ())
+    {
+      rtx stub_label = aarch64_sls_create_blr_label (REGNO (addr));
+      output_asm_insn ("bl\t%0", &stub_label);
+    }
+  else
+   output_asm_insn ("blr\t%0", &addr);
+  return "";
+}
+
 /* Target-specific selftests.  */
 
 #if CHECKING_P
@@ -23514,6 +23729,12 @@ aarch64_libgcc_floating_mode_supported_p
 #undef TARGET_MD_ASM_ADJUST
 #define TARGET_MD_ASM_ADJUST arm_md_asm_adjust
 
+#undef TARGET_ASM_FILE_END
+#define TARGET_ASM_FILE_END aarch64_asm_file_end
+
+#undef TARGET_ASM_FUNCTION_EPILOGUE
+#define TARGET_ASM_FUNCTION_EPILOGUE aarch64_sls_emit_blr_function_thunks
+
 struct gcc_target targetm = TARGET_INITIALIZER;
 
 #include "gt-aarch64.h"
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 2a29d650a24cb5e576620f81b7f6541b0c08d044..660eb207fc87477b9cadbe74b102fca53d64400d 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -1019,16 +1019,15 @@
 )
 
 (define_insn "*call_insn"
-  [(call (mem:DI (match_operand:DI 0 "aarch64_call_insn_operand" "r, Usf"))
+  [(call (mem:DI (match_operand:DI 0 "aarch64_call_insn_operand" "Ucr, Usf"))
 	 (match_operand 1 "" ""))
    (unspec:DI [(match_operand:DI 2 "const_int_operand")] UNSPEC_CALLEE_ABI)
    (clobber (reg:DI LR_REGNUM))]
   ""
   "@
-  blr\\t%0
+  * return aarch64_indirect_call_asm (operands[0]);
   bl\\t%c0"
-  [(set_attr "type" "call, call")]
-)
+  [(set_attr "type" "call, call")])
 
 (define_expand "call_value"
   [(parallel
@@ -1047,13 +1046,13 @@
 
 (define_insn "*call_value_insn"
   [(set (match_operand 0 "" "")
-	(call (mem:DI (match_operand:DI 1 "aarch64_call_insn_operand" "r, Usf"))
+	(call (mem:DI (match_operand:DI 1 "aarch64_call_insn_operand" "Ucr, Usf"))
 		      (match_operand 2 "" "")))
    (unspec:DI [(match_operand:DI 3 "const_int_operand")] UNSPEC_CALLEE_ABI)
    (clobber (reg:DI LR_REGNUM))]
   ""
   "@
-  blr\\t%1
+  * return aarch64_indirect_call_asm (operands[1]);
   bl\\t%c1"
   [(set_attr "type" "call, call")]
 )
diff --git a/gcc/config/aarch64/constraints.md b/gcc/config/aarch64/constraints.md
index d993268a187fad9c80c32b16d8e95b26783bde24..8cc6f50888122b707a087984afc6d5ec354e1e2c 100644
--- a/gcc/config/aarch64/constraints.md
+++ b/gcc/config/aarch64/constraints.md
@@ -24,6 +24,15 @@
 (define_register_constraint "Ucs" "TAILCALL_ADDR_REGS"
   "@internal Registers suitable for an indirect tail call")
 
+(define_register_constraint "Ucr"
+    "aarch64_harden_sls_blr_p () ? STUB_REGS : GENERAL_REGS"
+  "@internal Registers to be used for an indirect call.
+   This is usually the general registers, but when we are hardening against
+   Straight Line Speculation we disallow x16, x17, and x30 so we can use
+   indirection stubs.  These indirection stubs cannot use the above registers
+   since they will be reached by a BL that may have to go through a linker
+   veneer.")
+
 (define_register_constraint "w" "FP_REGS"
   "Floating point and SIMD vector registers.")
 
diff --git a/gcc/config/aarch64/predicates.md b/gcc/config/aarch64/predicates.md
index 215fcec5955340288572e816216274faf84ce7b0..1754b1eff9f9bfa1117e03acaf226fde36d53375 100644
--- a/gcc/config/aarch64/predicates.md
+++ b/gcc/config/aarch64/predicates.md
@@ -32,7 +32,8 @@
 
 (define_predicate "aarch64_general_reg"
   (and (match_operand 0 "register_operand")
-       (match_test "REGNO_REG_CLASS (REGNO (op)) == GENERAL_REGS")))
+       (match_test "REGNO_REG_CLASS (REGNO (op)) == STUB_REGS
+		    || REGNO_REG_CLASS (REGNO (op)) == GENERAL_REGS")))
 
 ;; Return true if OP a (const_int 0) operand.
 (define_predicate "const0_operand"
diff --git a/gcc/testsuite/gcc.target/aarch64/sls-mitigation/sls-miti-blr-bti.c b/gcc/testsuite/gcc.target/aarch64/sls-mitigation/sls-miti-blr-bti.c
new file mode 100644
index 0000000000000000000000000000000000000000..8adf753b4c5b4802bc80c725c9b36a5e9997b52f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sls-mitigation/sls-miti-blr-bti.c
@@ -0,0 +1,40 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-mharden-sls=blr -mbranch-protection=bti" } */
+/*
+   Ensure that the SLS hardening of BLR leaves no BLR instructions.
+   Here we also check that there are no BR instructions with anything except an
+   x16 or x17 register.  This is because a `BTI c` instruction can be branched
+   to using a BLR instruction using any register, but can only be branched to
+   with a BR using an x16 or x17 register.
+  */
+typedef int (foo) (int, int);
+typedef void (bar) (int, int);
+struct sls_testclass {
+    foo *x;
+    bar *y;
+    int left;
+    int right;
+};
+
+/* We test both RTL patterns for a call which returns a value and a call which
+   does not.  */
+int blr_call_value (struct sls_testclass x)
+{
+  int retval = x.x(x.left, x.right);
+  if (retval % 10)
+    return 100;
+  return 9;
+}
+
+int blr_call (struct sls_testclass x)
+{
+  x.y(x.left, x.right);
+  if (x.left % 10)
+    return 100;
+  return 9;
+}
+
+/* { dg-final { scan-assembler-not "\tblr\t" } } */
+/* { dg-final { scan-assembler-not "\tbr\tx(?!16|17)" } } */
+/* { dg-final { scan-assembler "\tbr\tx(16|17)" } } */
+
diff --git a/gcc/testsuite/gcc.target/aarch64/sls-mitigation/sls-miti-blr.c b/gcc/testsuite/gcc.target/aarch64/sls-mitigation/sls-miti-blr.c
new file mode 100644
index 0000000000000000000000000000000000000000..e8d22f438b22e763e1ee3171efc1b8c464b17185
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sls-mitigation/sls-miti-blr.c
@@ -0,0 +1,35 @@
+/* { dg-additional-options "-mharden-sls=blr -save-temps" } */
+/*
+   Ensure that the SLS hardening of BLR leaves no BLR instructions.
+   We only test that all BLR instructions have been removed, not that the
+   resulting code makes sense. 
+  */
+typedef int (foo) (int, int);
+typedef void (bar) (int, int);
+struct sls_testclass {
+    foo *x;
+    bar *y;
+    int left;
+    int right;
+};
+
+/* We test both RTL patterns for a call which returns a value and a call which
+   does not.  */
+int blr_call_value (struct sls_testclass x)
+{
+  int retval = x.x(x.left, x.right);
+  if (retval % 10)
+    return 100;
+  return 9;
+}
+
+int blr_call (struct sls_testclass x)
+{
+  x.y(x.left, x.right);
+  if (x.left % 10)
+    return 100;
+  return 9;
+}
+
+/* { dg-final { scan-assembler-not "\tblr\t" } } */
+/* { dg-final { scan-assembler "\tbr\tx\[0-9\]\[0-9\]?" } } */


[-- Attachment #2: sls-indirect.patch --]
[-- Type: text/plain, Size: 18960 bytes --]

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index d2eb739bc89ecd9d0212416b8dc3ee4ba236a271..e79f9cbc783e75132e999395ff975f9768436419 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -781,6 +781,7 @@ extern const atomic_ool_names aarch64_ool_ldeor_names;
 tree aarch64_resolve_overloaded_builtin_general (location_t, tree, void *);
 
 const char * aarch64_sls_barrier (int);
+const char * aarch64_indirect_call_asm (rtx);
 extern bool aarch64_harden_sls_retbr_p (void);
 extern bool aarch64_harden_sls_blr_p (void);
 
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index f996472d6990b7709602ae93f7a2cb7daa0e84b0..9795c929b8733f89722d3660456f5e7d6405d902 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -643,6 +643,16 @@ extern unsigned aarch64_architecture_version;
 #define GP_REGNUM_P(REGNO)						\
   (((unsigned) (REGNO - R0_REGNUM)) <= (R30_REGNUM - R0_REGNUM))
 
+/* Registers known to be preserved over a BL instruction.  This consists of the
+   GENERAL_REGS without x16, x17, and x30.  The x30 register is changed by the BL
+   instruction itself, while the x16 and x17 registers may be used by veneers
+   which can be inserted by the linker.  */
+#define STUB_REGNUM_P(REGNO) \
+  (GP_REGNUM_P (REGNO) \
+   && ((unsigned) (REGNO - R0_REGNUM)) != (R16_REGNUM - R0_REGNUM) \
+   && ((unsigned) (REGNO - R0_REGNUM)) != (R17_REGNUM - R0_REGNUM) \
+   && ((unsigned) (REGNO - R0_REGNUM)) != (R30_REGNUM - R0_REGNUM)) \
+
 #define FP_REGNUM_P(REGNO)			\
   (((unsigned) (REGNO - V0_REGNUM)) <= (V31_REGNUM - V0_REGNUM))
 
@@ -667,6 +677,7 @@ enum reg_class
 {
   NO_REGS,
   TAILCALL_ADDR_REGS,
+  STUB_REGS,
   GENERAL_REGS,
   STACK_REG,
   POINTER_REGS,
@@ -689,6 +700,7 @@ enum reg_class
 {						\
   "NO_REGS",					\
   "TAILCALL_ADDR_REGS",				\
+  "STUB_REGS",					\
   "GENERAL_REGS",				\
   "STACK_REG",					\
   "POINTER_REGS",				\
@@ -708,6 +720,7 @@ enum reg_class
 {									\
   { 0x00000000, 0x00000000, 0x00000000 },	/* NO_REGS */		\
   { 0x00030000, 0x00000000, 0x00000000 },	/* TAILCALL_ADDR_REGS */\
+  { 0x3ffcffff, 0x00000000, 0x00000000 },	/* STUB_REGS */		\
   { 0x7fffffff, 0x00000000, 0x00000003 },	/* GENERAL_REGS */	\
   { 0x80000000, 0x00000000, 0x00000000 },	/* STACK_REG */		\
   { 0xffffffff, 0x00000000, 0x00000003 },	/* POINTER_REGS */	\
@@ -879,6 +892,8 @@ typedef struct GTY (()) machine_function
   struct aarch64_frame frame;
   /* One entry for each hard register.  */
   bool reg_is_wrapped_separately[LAST_SAVED_REGNUM];
+  /* One entry for each general purpose register.  */
+  rtx call_via[SP_REGNUM];
   bool label_is_assembled;
 } machine_function;
 #endif
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 27a6b78ec6925106f7b745d949b510b6f273c651..17b040e2d09a8a4960fd6b02d53f4ccee78f9e93 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -10607,6 +10607,9 @@ aarch64_label_mentioned_p (rtx x)
 enum reg_class
 aarch64_regno_regclass (unsigned regno)
 {
+  if (STUB_REGNUM_P (regno))
+    return STUB_REGS;
+
   if (GP_REGNUM_P (regno))
     return GENERAL_REGS;
 
@@ -10869,7 +10872,7 @@ aarch64_asm_trampoline_template (FILE *f)
      specific attributes to choose between hardening against straight line
      speculation or not, but such function specific attributes are likely to
      happen in the future.  */
-  output_asm_insn ("dsb\tsy\n\tisb", NULL);
+  asm_fprintf (f, "\tdsb\tsy\n\tisb\n");
 
   /* The trampoline needs an extra padding instruction.  In case if BTI is
      enabled the padding instruction is replaced by the BTI instruction at
@@ -10919,6 +10922,7 @@ aarch64_class_max_nregs (reg_class_t regclass, machine_mode mode)
   unsigned int nregs, vec_flags;
   switch (regclass)
     {
+    case STUB_REGS:
     case TAILCALL_ADDR_REGS:
     case POINTER_REGS:
     case GENERAL_REGS:
@@ -13157,10 +13161,12 @@ aarch64_register_move_cost (machine_mode mode,
     = aarch64_tune_params.regmove_cost;
 
   /* Caller save and pointer regs are equivalent to GENERAL_REGS.  */
-  if (to == TAILCALL_ADDR_REGS || to == POINTER_REGS)
+  if (to == TAILCALL_ADDR_REGS || to == POINTER_REGS
+      || to == STUB_REGS)
     to = GENERAL_REGS;
 
-  if (from == TAILCALL_ADDR_REGS || from == POINTER_REGS)
+  if (from == TAILCALL_ADDR_REGS || from == POINTER_REGS
+      || from == STUB_REGS)
     from = GENERAL_REGS;
 
   /* Make RDFFR very expensive.  In particular, if we know that the FFR
@@ -22964,6 +22970,215 @@ aarch64_sls_barrier (int mitigation_required)
     : "";
 }
 
+static GTY (()) tree aarch64_sls_shared_thunks[30];
+static GTY (()) bool aarch64_sls_shared_thunks_needed = false;
+const char *indirect_symbol_names[30] = {
+    "__call_indirect_x0",
+    "__call_indirect_x1",
+    "__call_indirect_x2",
+    "__call_indirect_x3",
+    "__call_indirect_x4",
+    "__call_indirect_x5",
+    "__call_indirect_x6",
+    "__call_indirect_x7",
+    "__call_indirect_x8",
+    "__call_indirect_x9",
+    "__call_indirect_x10",
+    "__call_indirect_x11",
+    "__call_indirect_x12",
+    "__call_indirect_x13",
+    "__call_indirect_x14",
+    "__call_indirect_x15",
+    "", /* "__call_indirect_x16",  */
+    "", /* "__call_indirect_x17",  */
+    "__call_indirect_x18",
+    "__call_indirect_x19",
+    "__call_indirect_x20",
+    "__call_indirect_x21",
+    "__call_indirect_x22",
+    "__call_indirect_x23",
+    "__call_indirect_x24",
+    "__call_indirect_x25",
+    "__call_indirect_x26",
+    "__call_indirect_x27",
+    "__call_indirect_x28",
+    "__call_indirect_x29",
+};
+
+/* Function to create a BLR thunk.  This thunk is used to mitigate straight
+   line speculation.  Instead of a simple BLR that can be speculated past,
+   we emit a BL to this thunk, and this thunk contains a BR to the relevant
+   register.  These thunks have the relevant speculation barriers put after
+   their indirect branch so that speculation is blocked.
+
+   We use such a thunk so the speculation barriers are kept off the
+   architecturally executed path in order to reduce the performance overhead.
+
+   When optimising for size we use stubs shared by the linked object.
+   When optimising for performance we emit stubs for each function in the hope
+   that the branch predictor can better train on jumps specific for a given
+   function.  */
+rtx
+aarch64_sls_create_blr_label (int regnum)
+{
+  gcc_assert (regnum < 30 && regnum != 16 && regnum != 17);
+  if (optimize_function_for_size_p (cfun))
+    {
+      /* For the thunks shared between different functions in this compilation
+	 unit we use a named symbol -- this is just for users to more easily
+	 understand the generated assembly.  */
+      aarch64_sls_shared_thunks_needed = true;
+      const char *thunk_name = indirect_symbol_names[regnum];
+      if (aarch64_sls_shared_thunks[regnum] == NULL)
+	{
+	  /* Build a decl representing this function stub and record it for
+	     later.  We build a decl here so we can use the GCC machinery for
+	     handling sections automatically (through `get_named_section` and
+	     `make_decl_one_only`).  That saves us a lot of trouble handling
+	     the specifics of different output file formats.  */
+	  tree decl = build_decl (BUILTINS_LOCATION, FUNCTION_DECL,
+				  get_identifier (thunk_name),
+				  build_function_type_list (void_type_node,
+							    NULL_TREE));
+	  DECL_RESULT (decl) = build_decl (BUILTINS_LOCATION, RESULT_DECL,
+					   NULL_TREE, void_type_node);
+	  TREE_PUBLIC (decl) = 1;
+	  TREE_STATIC (decl) = 1;
+	  DECL_IGNORED_P (decl) = 1;
+	  DECL_ARTIFICIAL (decl) = 1;
+	  make_decl_one_only (decl, DECL_ASSEMBLER_NAME (decl));
+	  resolve_unique_section (decl, 0, false);
+	  aarch64_sls_shared_thunks[regnum] = decl;
+	}
+
+      return gen_rtx_SYMBOL_REF (Pmode, thunk_name);
+    }
+
+  if (cfun->machine->call_via[regnum] == NULL)
+    cfun->machine->call_via[regnum]
+      = gen_rtx_LABEL_REF (Pmode, gen_label_rtx ());
+  return cfun->machine->call_via[regnum];
+}
+
+/* Helper function for aarch64_sls_emit_blr_function_thunks and
+   aarch64_sls_emit_shared_blr_thunks below.  */
+static void
+aarch64_sls_emit_function_stub (FILE *out_file, int regnum)
+{
+  /* Save in x16 and branch to that function so this transformation does
+     not prevent jumping to `BTI c` instructions.  */
+  asm_fprintf (out_file, "\tmov\tx16, x%d\n", regnum);
+  asm_fprintf (out_file, "\tbr\tx16\n");
+}
+
+/* Emit all BLR stubs for this particular function.
+   Here we emit all the BLR stubs needed for the current function.  Since we
+   emit these stubs in a consecutive block we know there will be no speculation
+   gadgets between each stub, and hence we only emit a speculation barrier at
+   the end of the stub sequences.
+
+   This is called in the TARGET_ASM_FUNCTION_EPILOGUE hook.  */
+void
+aarch64_sls_emit_blr_function_thunks (FILE *out_file)
+{
+  if (! aarch64_harden_sls_blr_p ())
+    return;
+
+  bool any_functions_emitted = false;
+  /* We must save and restore the current function section since this assembly
+     is emitted at the end of the function.  This means it can be emitted *just
+     after* the cold section of a function.  That cold part would be emitted in
+     a different section. That switch would trigger a `.cfi_endproc` directive
+     to be emitted in the original section and a `.cfi_startproc` directive to
+     be emitted in the new section.  Switching to the original section without
+     restoring would mean that the `.cfi_endproc` emitted as a function ends
+     would happen in a different section -- leaving an unmatched
+     `.cfi_startproc` in the cold text section and an unmatched `.cfi_endproc`
+     in the standard text section.  */
+  section *save_text_section = in_section;
+  switch_to_section (function_section (current_function_decl));
+  for (int regnum = 0; regnum < 30; ++regnum)
+    {
+      rtx specu_label = cfun->machine->call_via[regnum];
+      if (specu_label == NULL)
+	continue;
+
+      targetm.asm_out.print_operand (out_file, specu_label, 0);
+      asm_fprintf (out_file, ":\n");
+      aarch64_sls_emit_function_stub (out_file, regnum);
+      any_functions_emitted = true;
+    }
+  if (any_functions_emitted)
+    /* Can use the SB if needs be here, since this stub will only be used
+      by the current function, and hence for the current target.  */
+    asm_fprintf (out_file, "\t%s\n", aarch64_sls_barrier (true));
+  switch_to_section (save_text_section);
+}
+
+/* Emit shared BLR stubs for the current compilation unit.
+   Over the course of compiling this unit we may have converted some BLR
+   instructions to a BL to a shared stub function.  This is where we emit those
+   stub functions.
+   This function is for the stubs shared between different functions in this
+   compilation unit.  We share when optimising for size instead of speed.
+
+   This function is called through the TARGET_ASM_FILE_END hook.  */
+void
+aarch64_sls_emit_shared_blr_thunks (FILE *out_file)
+{
+  if (! aarch64_sls_shared_thunks_needed)
+    return;
+
+  for (int regnum = 0; regnum < 30; ++regnum)
+    {
+      tree decl = aarch64_sls_shared_thunks[regnum];
+      if (!decl)
+	continue;
+
+      const char *name = indirect_symbol_names[regnum];
+      switch_to_section (get_named_section (decl, NULL, 0));
+      ASM_OUTPUT_ALIGN (out_file, 2);
+      targetm.asm_out.globalize_label (out_file, name);
+      /* Only emits if the compiler is configured for an assembler that can
+	 handle visibility directives.  */
+      targetm.asm_out.assemble_visibility (decl, VISIBILITY_HIDDEN);
+      ASM_OUTPUT_TYPE_DIRECTIVE (out_file, name, "function");
+      ASM_OUTPUT_LABEL (out_file, name);
+      aarch64_sls_emit_function_stub (out_file, regnum);
+      /* Use the most conservative target to ensure it can always be used by any
+	 function in the translation unit.  */
+      asm_fprintf (out_file, "\tdsb\tsy\n\tisb\n");
+      ASM_DECLARE_FUNCTION_SIZE (out_file, name, decl);
+    }
+}
+
+/* Implement TARGET_ASM_FILE_END.  */
+void
+aarch64_asm_file_end ()
+{
+  aarch64_sls_emit_shared_blr_thunks (asm_out_file);
+  /* Since this function will be called for the ASM_FILE_END hook, we ensure
+     that what would be called otherwise (e.g. `file_end_indicate_exec_stack`
+     for FreeBSD) still gets called.  */
+#ifdef TARGET_ASM_FILE_END
+  TARGET_ASM_FILE_END ();
+#endif
+}
+
+const char *
+aarch64_indirect_call_asm (rtx addr)
+{
+  gcc_assert (REG_P (addr));
+  if (aarch64_harden_sls_blr_p ())
+    {
+      rtx stub_label = aarch64_sls_create_blr_label (REGNO (addr));
+      output_asm_insn ("bl\t%0", &stub_label);
+    }
+  else
+   output_asm_insn ("blr\t%0", &addr);
+  return "";
+}
+
 /* Target-specific selftests.  */
 
 #if CHECKING_P
@@ -23514,6 +23729,12 @@ aarch64_libgcc_floating_mode_supported_p
 #undef TARGET_MD_ASM_ADJUST
 #define TARGET_MD_ASM_ADJUST arm_md_asm_adjust
 
+#undef TARGET_ASM_FILE_END
+#define TARGET_ASM_FILE_END aarch64_asm_file_end
+
+#undef TARGET_ASM_FUNCTION_EPILOGUE
+#define TARGET_ASM_FUNCTION_EPILOGUE aarch64_sls_emit_blr_function_thunks
+
 struct gcc_target targetm = TARGET_INITIALIZER;
 
 #include "gt-aarch64.h"
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 2a29d650a24cb5e576620f81b7f6541b0c08d044..660eb207fc87477b9cadbe74b102fca53d64400d 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -1019,16 +1019,15 @@
 )
 
 (define_insn "*call_insn"
-  [(call (mem:DI (match_operand:DI 0 "aarch64_call_insn_operand" "r, Usf"))
+  [(call (mem:DI (match_operand:DI 0 "aarch64_call_insn_operand" "Ucr, Usf"))
 	 (match_operand 1 "" ""))
    (unspec:DI [(match_operand:DI 2 "const_int_operand")] UNSPEC_CALLEE_ABI)
    (clobber (reg:DI LR_REGNUM))]
   ""
   "@
-  blr\\t%0
+  * return aarch64_indirect_call_asm (operands[0]);
   bl\\t%c0"
-  [(set_attr "type" "call, call")]
-)
+  [(set_attr "type" "call, call")])
 
 (define_expand "call_value"
   [(parallel
@@ -1047,13 +1046,13 @@
 
 (define_insn "*call_value_insn"
   [(set (match_operand 0 "" "")
-	(call (mem:DI (match_operand:DI 1 "aarch64_call_insn_operand" "r, Usf"))
+	(call (mem:DI (match_operand:DI 1 "aarch64_call_insn_operand" "Ucr, Usf"))
 		      (match_operand 2 "" "")))
    (unspec:DI [(match_operand:DI 3 "const_int_operand")] UNSPEC_CALLEE_ABI)
    (clobber (reg:DI LR_REGNUM))]
   ""
   "@
-  blr\\t%1
+  * return aarch64_indirect_call_asm (operands[1]);
   bl\\t%c1"
   [(set_attr "type" "call, call")]
 )
diff --git a/gcc/config/aarch64/constraints.md b/gcc/config/aarch64/constraints.md
index d993268a187fad9c80c32b16d8e95b26783bde24..8cc6f50888122b707a087984afc6d5ec354e1e2c 100644
--- a/gcc/config/aarch64/constraints.md
+++ b/gcc/config/aarch64/constraints.md
@@ -24,6 +24,15 @@
 (define_register_constraint "Ucs" "TAILCALL_ADDR_REGS"
   "@internal Registers suitable for an indirect tail call")
 
+(define_register_constraint "Ucr"
+    "aarch64_harden_sls_blr_p () ? STUB_REGS : GENERAL_REGS"
+  "@internal Registers to be used for an indirect call.
+   This is usually the general registers, but when we are hardening against
+   Straight Line Speculation we disallow x16, x17, and x30 so we can use
+   indirection stubs.  These indirection stubs cannot use the above registers
+   since they will be reached by a BL that may have to go through a linker
+   veneer.")
+
 (define_register_constraint "w" "FP_REGS"
   "Floating point and SIMD vector registers.")
 
diff --git a/gcc/config/aarch64/predicates.md b/gcc/config/aarch64/predicates.md
index 215fcec5955340288572e816216274faf84ce7b0..1754b1eff9f9bfa1117e03acaf226fde36d53375 100644
--- a/gcc/config/aarch64/predicates.md
+++ b/gcc/config/aarch64/predicates.md
@@ -32,7 +32,8 @@
 
 (define_predicate "aarch64_general_reg"
   (and (match_operand 0 "register_operand")
-       (match_test "REGNO_REG_CLASS (REGNO (op)) == GENERAL_REGS")))
+       (match_test "REGNO_REG_CLASS (REGNO (op)) == STUB_REGS
+		    || REGNO_REG_CLASS (REGNO (op)) == GENERAL_REGS")))
 
 ;; Return true if OP a (const_int 0) operand.
 (define_predicate "const0_operand"
diff --git a/gcc/testsuite/gcc.target/aarch64/sls-mitigation/sls-miti-blr-bti.c b/gcc/testsuite/gcc.target/aarch64/sls-mitigation/sls-miti-blr-bti.c
new file mode 100644
index 0000000000000000000000000000000000000000..8adf753b4c5b4802bc80c725c9b36a5e9997b52f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sls-mitigation/sls-miti-blr-bti.c
@@ -0,0 +1,40 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-mharden-sls=blr -mbranch-protection=bti" } */
+/*
+   Ensure that the SLS hardening of BLR leaves no BLR instructions.
+   Here we also check that there are no BR instructions with anything except an
+   x16 or x17 register.  This is because a `BTI c` instruction can be branched
+   to using a BLR instruction using any register, but can only be branched to
+   with a BR using an x16 or x17 register.
+  */
+typedef int (foo) (int, int);
+typedef void (bar) (int, int);
+struct sls_testclass {
+    foo *x;
+    bar *y;
+    int left;
+    int right;
+};
+
+/* We test both RTL patterns for a call which returns a value and a call which
+   does not.  */
+int blr_call_value (struct sls_testclass x)
+{
+  int retval = x.x(x.left, x.right);
+  if (retval % 10)
+    return 100;
+  return 9;
+}
+
+int blr_call (struct sls_testclass x)
+{
+  x.y(x.left, x.right);
+  if (x.left % 10)
+    return 100;
+  return 9;
+}
+
+/* { dg-final { scan-assembler-not "\tblr\t" } } */
+/* { dg-final { scan-assembler-not "\tbr\tx(?!16|17)" } } */
+/* { dg-final { scan-assembler "\tbr\tx(16|17)" } } */
+
diff --git a/gcc/testsuite/gcc.target/aarch64/sls-mitigation/sls-miti-blr.c b/gcc/testsuite/gcc.target/aarch64/sls-mitigation/sls-miti-blr.c
new file mode 100644
index 0000000000000000000000000000000000000000..e8d22f438b22e763e1ee3171efc1b8c464b17185
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sls-mitigation/sls-miti-blr.c
@@ -0,0 +1,35 @@
+/* { dg-additional-options "-mharden-sls=blr -save-temps" } */
+/*
+   Ensure that the SLS hardening of BLR leaves no BLR instructions.
+   We only test that all BLR instructions have been removed, not that the
+   resulting code makes sense. 
+  */
+typedef int (foo) (int, int);
+typedef void (bar) (int, int);
+struct sls_testclass {
+    foo *x;
+    bar *y;
+    int left;
+    int right;
+};
+
+/* We test both RTL patterns for a call which returns a value and a call which
+   does not.  */
+int blr_call_value (struct sls_testclass x)
+{
+  int retval = x.x(x.left, x.right);
+  if (retval % 10)
+    return 100;
+  return 9;
+}
+
+int blr_call (struct sls_testclass x)
+{
+  x.y(x.left, x.right);
+  if (x.left % 10)
+    return 100;
+  return 9;
+}
+
+/* { dg-final { scan-assembler-not "\tblr\t" } } */
+/* { dg-final { scan-assembler "\tbr\tx\[0-9\]\[0-9\]?" } } */


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Patch 1/3] aarch64: New Straight Line Speculation (SLS) mitigation flags
  2020-06-08 14:10 ` [Patch 1/3] aarch64: New Straight Line Speculation (SLS) mitigation flags Matthew Malcomson
@ 2020-06-23 15:48   ` Richard Sandiford
  2020-06-23 17:07     ` Matthew Malcomson
  0 siblings, 1 reply; 17+ messages in thread
From: Richard Sandiford @ 2020-06-23 15:48 UTC (permalink / raw)
  To: Matthew Malcomson
  Cc: gcc-patches, Kyrylo.Tkachov, Kristof.Beyls, Richard.Earnshaw, nd

Matthew Malcomson <matthew.malcomson@arm.com> writes:
> @@ -14466,6 +14466,81 @@ aarch64_validate_mcpu (const char *str, const struct processor **res,
>    return false;
>  }
>  
> +

Should just be one blank line here.

> +/* Straight line speculation indicators.  */
> +enum aarch64_sls_hardening_type
> +{
> +    SLS_NONE = 0,
> +    SLS_RETBR = 1,
> +    SLS_BLR = 2,
> +    SLS_ALL = 3,

Just indent by two spaces rather than four.

> +};
> +static enum aarch64_sls_hardening_type aarch64_sls_hardening;

Maybe easier to read with a line break here.

> +/* Return whether we should mitigatate Straight Line Speculation for the RET
> +   and BR instructions.  */
> +bool
> +aarch64_harden_sls_retbr_p (void)
> +{
> +  return aarch64_sls_hardening & SLS_RETBR;
> +}

…and here.

> +/* Return whether we should mitigatate Straight Line Speculation for the RET
> +   and BR instructions.  */
> +bool
> +aarch64_harden_sls_blr_p (void)
> +{
> +  return aarch64_sls_hardening & SLS_BLR;
> +}

Pasto: returns true for BLR speculation instead of RET + BR.

> +
> +/* As of yet we only allow setting these options globally, in the future we may
> +   allow setting them per function.  */
> +static void
> +aarch64_validate_sls_mitigation (const char *const_str)
> +{
> +  char *str_root = xstrdup (const_str);
> +  char *token_save = NULL;
> +  char *str = NULL;
> +  int temp = SLS_NONE;
> +
> +  aarch64_sls_hardening = SLS_NONE;
> +  if (strcmp (str_root, "none") == 0)
> +    goto finish;

In Clang I think this would override any previous option, so should
we set aarch64_sls_hardening to 0?

> +  if (strcmp (str_root, "all") == 0)
> +    {
> +      aarch64_sls_hardening = SLS_ALL;
> +      goto finish;
> +    }
> +
> +  str = strtok_r (str_root, ",", &token_save);
> +  if (!str)
> +    {
> +      error ("invalid argument given to %<-mharden-sls=%>");
> +      goto finish;
> +    }

I'm not particularly anti-goto, but in this case it looks simpler
to do the full-string comparisons on const_str and only duplicate
the string before the strtok_r.
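
Something like this (untested sketch, reusing the names from your patch)
is what I have in mind:

  static void
  aarch64_validate_sls_mitigation (const char *const_str)
  {
    aarch64_sls_hardening = SLS_NONE;
    if (strcmp (const_str, "none") == 0)
      return;
    if (strcmp (const_str, "all") == 0)
      {
	aarch64_sls_hardening = SLS_ALL;
	return;
      }
    /* Only duplicate the string once we know we need to tokenise it.  */
    char *token_save = NULL;
    char *str_root = xstrdup (const_str);
    char *str = strtok_r (str_root, ",", &token_save);
    if (!str)
      error ("invalid argument given to %<-mharden-sls=%>");
    /* ... the existing retbr/blr loop, unchanged ...  */
    free (str_root);
  }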

> +  while (str)
> +    {
> +      if (strcmp (str, "blr") == 0)
> +	temp |= SLS_BLR;
> +      else if (strcmp (str, "retbr") == 0)
> +	temp |= SLS_RETBR;
> +      else if (strcmp (str, "none") == 0 || strcmp (str, "all") == 0)
> +	{
> +	  error ("%<%s%> must be by itself for %<-mharden-sls=%>", str);
> +	  break;
> +	}
> +      else
> +	{
> +	  error ("invalid argument %<%s%> for %<-mharden-sls=%>", str);
> +	  break;
> +	}
> +      str = strtok_r (NULL, ",", &token_save);
> +    }
> +  aarch64_sls_hardening = (aarch64_sls_hardening_type) temp;
> +finish:
> +  free (str_root);
> +  return;
> +}

Think it's more usual in gcc not to have explicit end-of-function void
returns.

>  /* Parses CONST_STR for branch protection features specified in
>     aarch64_branch_protect_types, and set any global variables required.  Returns
>     the parsing result and assigns LAST_STR to the last processed token from
> @@ -14710,6 +14785,9 @@ aarch64_override_options (void)
>    selected_arch = NULL;
>    selected_tune = NULL;
>  
> +  if (aarch64_harden_sls_string)
> +      aarch64_validate_sls_mitigation (aarch64_harden_sls_string);

Last line is indented two spaces too many.

> +
>    if (aarch64_branch_protection_string)
>      aarch64_validate_mbranch_protection (aarch64_branch_protection_string);
>  
> diff --git a/gcc/config/aarch64/aarch64.opt b/gcc/config/aarch64/aarch64.opt
> index d99d14c137d8774d3c8dab860d475f68c01a2817..5170361fd5e5721e044d1664e522b2718f654b8e 100644
> --- a/gcc/config/aarch64/aarch64.opt
> +++ b/gcc/config/aarch64/aarch64.opt
> @@ -71,6 +71,10 @@ mgeneral-regs-only
>  Target Report RejectNegative Mask(GENERAL_REGS_ONLY) Save
>  Generate code which uses only the general registers.
>  
> +mharden-sls=
> +Target RejectNegative Joined Var(aarch64_harden_sls_string)
> +Generate code to mitigate against straight line speculation.
> +
>  mfix-cortex-a53-835769
>  Target Report Var(aarch64_fix_a53_err835769) Init(2) Save
>  Workaround for ARM Cortex-A53 Erratum number 835769.
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 35e8242af5fa4c52744fd2c3e2cfee0a617e22bb..8a3fab2964c9bb06c820766d284768751d63ac9a 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -696,6 +696,7 @@ Objective-C and Objective-C++ Dialects}.
>  -msign-return-address=@var{scope} @gol
>  -mbranch-protection=@var{none}|@var{standard}|@var{pac-ret}[+@var{leaf}
>  +@var{b-key}]|@var{bti} @gol
> +-mharden-sls=@var{none}|@var{all}|@var{retbr}|@var{blr} @gol
>  -march=@var{name}  -mcpu=@var{name}  -mtune=@var{name}  @gol
>  -moverride=@var{string}  -mverbose-cost-dump @gol
>  -mstack-protector-guard=@var{guard} -mstack-protector-guard-reg=@var{sysreg} @gol
> @@ -17045,6 +17046,15 @@ functions.  The optional argument @samp{b-key} can be used to sign the functions
>  with the B-key instead of the A-key.
>  @samp{bti} turns on branch target identification mechanism.
>  
> +@item -mharden-sls=@var{none}|@var{all}|@var{retbr}|@var{blr}
> +@opindex mharden-sls
> +Enable compiler hardening against straight line speculation (SLS).
> +There are two options for hardening against straight line speculation.
> +@samp{retbr} allows inserting speculation barriers after every
> +@samp{br} and @samp{ret} instruction.  While @samp{blr} enables replacing
> +@samp{blr} instructions with a @samp{bl} to a function stub.
> +@samp{all} enables all SLS hardening, while @samp{none} does not enable any.

OK, so this is even more picky, sorry, but the syntax and description
imply to me that you can choose only one of the four options.  I think
it would be more accurate to say something like:

@item -mharden-sls=@var{opts}
@opindex mharden-sls
Enable compiler hardening against straight line speculation (SLS).
@var{opts} is a comma-separated list of the following options:
@table @samp
@item retbr
…
@item blr
…
@end table
In addition, @samp{-mharden-sls=all} enables all SLS hardening
while @samp{-mharden-sls=none} disables all SLS hardening.

(assuming the above behaviour change for “none”)

Thanks,
Richard


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Patch 2/3] aarch64: Introduce SLS mitigation for RET and BR instructions
  2020-06-08 14:10 ` [Patch 2/3] aarch64: Introduce SLS mitigation for RET and BR instructions Matthew Malcomson
@ 2020-06-23 16:17   ` Richard Sandiford
  2020-06-23 16:49     ` Matthew Malcomson
  0 siblings, 1 reply; 17+ messages in thread
From: Richard Sandiford @ 2020-06-23 16:17 UTC (permalink / raw)
  To: Matthew Malcomson
  Cc: gcc-patches, Kyrylo.Tkachov, Kristof.Beyls, Richard.Earnshaw, nd

Matthew Malcomson <matthew.malcomson@arm.com> writes:
> --- a/gcc/config/aarch64/aarch64-protos.h
> +++ b/gcc/config/aarch64/aarch64-protos.h
> @@ -780,6 +780,7 @@ extern const atomic_ool_names aarch64_ool_ldeor_names;
>  
>  tree aarch64_resolve_overloaded_builtin_general (location_t, tree, void *);
>  
> +const char * aarch64_sls_barrier (int);

Should be no space after the “*”.

>  extern bool aarch64_harden_sls_retbr_p (void);
>  extern bool aarch64_harden_sls_blr_p (void);
>  
> diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
> index 24767c747bab0d711627c5c646937c42f210d70b..5da3d94e335fc315e1d90e6a674f2f09cf1a4529 100644
> --- a/gcc/config/aarch64/aarch64.h
> +++ b/gcc/config/aarch64/aarch64.h
> @@ -281,6 +281,7 @@ extern unsigned aarch64_architecture_version;
>  #define AARCH64_ISA_F32MM	   (aarch64_isa_flags & AARCH64_FL_F32MM)
>  #define AARCH64_ISA_F64MM	   (aarch64_isa_flags & AARCH64_FL_F64MM)
>  #define AARCH64_ISA_BF16	   (aarch64_isa_flags & AARCH64_FL_BF16)
> +#define AARCH64_ISA_SB  	   (aarch64_isa_flags & AARCH64_FL_SB)
>  
>  /* Crypto is an optional extension to AdvSIMD.  */
>  #define TARGET_CRYPTO (TARGET_SIMD && AARCH64_ISA_CRYPTO)
> @@ -378,6 +379,9 @@ extern unsigned aarch64_architecture_version;
>  #define TARGET_FIX_ERR_A53_835769_DEFAULT 1
>  #endif
>  
> +/* SB instruction is enabled through +sb.  */
> +#define TARGET_SB (AARCH64_ISA_SB)
> +
>  /* Apply the workaround for Cortex-A53 erratum 835769.  */
>  #define TARGET_FIX_ERR_A53_835769	\
>    ((aarch64_fix_a53_err835769 == 2)	\
> @@ -1058,8 +1062,11 @@ typedef struct
>  
>  #define RETURN_ADDR_RTX aarch64_return_addr
>  
> -/* BTI c + 3 insns + 2 pointer-sized entries.  */
> -#define TRAMPOLINE_SIZE	(TARGET_ILP32 ? 24 : 32)
> +/* BTI c + 3 insns
> +   + sls barrier of DSB + ISB.
> +   + 2 pointer-sized entries.  */
> +#define TRAMPOLINE_SIZE	(24 \
> +			 + (TARGET_ILP32 ? 8 : 16))

Personal taste, sorry, but IMO this is easier to read on one line.

>  
>  /* Trampolines contain dwords, so must be dword aligned.  */
>  #define TRAMPOLINE_ALIGNMENT 64
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 775f49991e5f599a843d3ef490b8cd044acfe78f..9356937fe266c68196392a1589b3cf96607de104 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -10822,8 +10822,8 @@ aarch64_return_addr (int count, rtx frame ATTRIBUTE_UNUSED)
>  static void
>  aarch64_asm_trampoline_template (FILE *f)
>  {
> -  int offset1 = 16;
> -  int offset2 = 20;
> +  int offset1 = 24;
> +  int offset2 = 28;

Huh, the offset handling in this function is a bit twisty, but that's
not your fault :-)

> […]
> @@ -11054,6 +11065,7 @@ aarch64_output_casesi (rtx *operands)
>    output_asm_insn (buf, operands);
>    output_asm_insn (patterns[index][1], operands);
>    output_asm_insn ("br\t%3", operands);
> +  output_asm_insn (aarch64_sls_barrier (aarch64_harden_sls_retbr_p ()), operands);

Long line.

> […]
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index ff15505d45546124868d2531b7f4e5b0f1f5bebc..75ef87a3b4674cc73cb42cc82cfb8e782acf77f6 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -447,8 +447,15 @@
>  (define_insn "indirect_jump"
>    [(set (pc) (match_operand:DI 0 "register_operand" "r"))]
>    ""
> -  "br\\t%0"
> -  [(set_attr "type" "branch")]
> +  {
> +    output_asm_insn ("br\\t%0", operands);
> +    return aarch64_sls_barrier (aarch64_harden_sls_retbr_p ());
> +  }
> +  [(set_attr "type" "branch")
> +   (set (attr "length")
> +	(cond [(match_test "!aarch64_harden_sls_retbr_p ()") (const_int 4)
> +	       (match_test "TARGET_SB") (const_int 8)]
> +	      (const_int 12)))]

Rather than duplicating this several times, I think it would be better
to add a new attribute like “sls_mitigation”, set that attribute in the
define_insns, and then use “sls_mitigation” in the default “length”
calculation.  See e.g. what rth did with “movprfx”.

> […]
> diff --git a/gcc/testsuite/gcc.target/aarch64/sls-mitigation/sls-miti-retbr-pacret.c b/gcc/testsuite/gcc.target/aarch64/sls-mitigation/sls-miti-retbr-pacret.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..11f614b4ef2eb0fa3707cb46a55583d6685b89d0
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sls-mitigation/sls-miti-retbr-pacret.c
> @@ -0,0 +1,19 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-mharden-sls=retbr -mbranch-protection=pac-ret -march=armv8.3-a" } */
> +
> +/* Testing the do_return pattern for retaa and retab.  */
> +long retbr_subcall(void);
> +long retbr_do_return_retaa(void)
> +{
> +    return retbr_subcall()+1;
> +}
> +__attribute__((target("branch-protection=pac-ret+b-key")))
> +long retbr_do_return_retab(void)
> +{
> +    return retbr_subcall()+1;
> +}
> +
> +/* Ensure there are no BR or RET instructions which are not directly followed
> +   by a speculation barrier.  */
> +/* { dg-final { scan-assembler-not "\t(br|ret|retaa|retab)\tx\[0-9\]\[0-9\]?\n\t(?!dsb\tsy\n\tisb|sb)" } } */

Isn't the “sb” alternative invalid given the -march option?

Probably slightly easier to read if the regexp is quoted using {…}
rather than "…".  Same for the other tests.

> […]
> +/* { dg-final { scan-assembler-not "ret\t" } } */
> diff --git a/gcc/testsuite/gcc.target/aarch64/sls-mitigation/sls-miti-retbr.c b/gcc/testsuite/gcc.target/aarch64/sls-mitigation/sls-miti-retbr.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..5cd4da6bbb719a5135faa2c9818dc873e3d5af70
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sls-mitigation/sls-miti-retbr.c
> […]
> +/* Testing the indirect_jump pattern.  */
> +typedef signed __attribute__((mode(DI))) intptr_t;

Just to check, have you tested this with -mabi=ilp32?  Looks like it'll
probably be OK, was just suspicious because this isn't “intptr_t” there.

> diff --git a/gcc/testsuite/gcc.target/aarch64/sls-mitigation/sls-mitigation.exp b/gcc/testsuite/gcc.target/aarch64/sls-mitigation/sls-mitigation.exp
> new file mode 100644
> index 0000000000000000000000000000000000000000..fb63c6dfe230e64b11919381c30a3a05eee52e16
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sls-mitigation/sls-mitigation.exp
> @@ -0,0 +1,73 @@
> +#  Regression driver for SLS mitigation on AArch64.
> +#  Copyright (C) 2020-2020 Free Software Foundation, Inc.

Just 2020.

Thanks,
Richard

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Patch v2 3/3] aarch64: Mitigate SLS for BLR instruction
  2020-06-23 14:57   ` [Patch v2 " Matthew Malcomson
@ 2020-06-23 16:31     ` Richard Sandiford
  0 siblings, 0 replies; 17+ messages in thread
From: Richard Sandiford @ 2020-06-23 16:31 UTC (permalink / raw)
  To: Matthew Malcomson
  Cc: gcc-patches, Richard.Earnshaw, Kyrylo.Tkachov, Marcus.Shawcroft,
	Kristof.Beyls

Matthew Malcomson <matthew.malcomson@arm.com> writes:
> diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
> index f996472d6990b7709602ae93f7a2cb7daa0e84b0..9795c929b8733f89722d3660456f5e7d6405d902 100644
> --- a/gcc/config/aarch64/aarch64.h
> +++ b/gcc/config/aarch64/aarch64.h
> @@ -643,6 +643,16 @@ extern unsigned aarch64_architecture_version;
>  #define GP_REGNUM_P(REGNO)						\
>    (((unsigned) (REGNO - R0_REGNUM)) <= (R30_REGNUM - R0_REGNUM))
>  
> +/* Registers known to be preserved over a BL instruction.  This consists of the
> +   GENERAL_REGS without x16, x17, and x30.  The x30 register is changed by the BL

Long line.

> +   instruction itself, while the x16 and x17 registers may be used by veneers
> +   which can be inserted by the linker.  */
> +#define STUB_REGNUM_P(REGNO) \
> +  (GP_REGNUM_P (REGNO) \
> +   && ((unsigned) (REGNO - R0_REGNUM)) != (R16_REGNUM - R0_REGNUM) \
> +   && ((unsigned) (REGNO - R0_REGNUM)) != (R17_REGNUM - R0_REGNUM) \
> +   && ((unsigned) (REGNO - R0_REGNUM)) != (R30_REGNUM - R0_REGNUM)) \

Sorry, I should have noticed this before, but we can just compare
(REGNO) directly with R16_REGNUM etc, with subtracting R0_REGNUM from
both sides.  The R0_REGNUM stuff is only needed for range comparisons,
where the idea is to avoid reevaluating REGNO.
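
I.e. (untested) something like:

  #define STUB_REGNUM_P(REGNO) \
    (GP_REGNUM_P (REGNO) \
     && (REGNO) != R16_REGNUM \
     && (REGNO) != R17_REGNUM \
     && (REGNO) != R30_REGNUM)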

> […]
> @@ -10869,7 +10872,7 @@ aarch64_asm_trampoline_template (FILE *f)
>       specific attributes to choose between hardening against straight line
>       speculation or not, but such function specific attributes are likely to
>       happen in the future.  */
> -  output_asm_insn ("dsb\tsy\n\tisb", NULL);
> +  asm_fprintf (f, "\tdsb\tsy\n\tisb\n");

Looks like this should be part of 2/3.

> […]
> +rtx
> +aarch64_sls_create_blr_label (int regnum)
> +{
> +  gcc_assert (regnum < 30 && regnum != 16 && regnum != 17);

Can just use STUB_REGNUM_P here.

> […]
> +/* Emit shared BLR stubs for the current compilation unit.
> +   Over the course of compiling this unit we may have converted some BLR
> +   instructions to a BL to a shared stub function.  This is where we emit those
> +   stub functions.
> +   This function is for the stubs shared between different functions in this
> +   compilation unit.  We share when optimising for size instead of speed.

optimizing (alas).

> […]
> +/* { dg-final { scan-assembler "\tbr\tx\[0-9\]\[0-9\]?" } } */

Probably easier to read with {…} quoting rather than "…" quoting,
so that no backslashes are needed for [ and ].

OK with those changes, thanks.

Richard

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Patch 2/3] aarch64: Introduce SLS mitigation for RET and BR instructions
  2020-06-23 16:17   ` Richard Sandiford
@ 2020-06-23 16:49     ` Matthew Malcomson
  2020-06-23 16:56       ` Richard Sandiford
  0 siblings, 1 reply; 17+ messages in thread
From: Matthew Malcomson @ 2020-06-23 16:49 UTC (permalink / raw)
  To: gcc-patches, Kyrylo.Tkachov, Kristof.Beyls, Richard.Earnshaw, nd,
	richard.sandiford

On 23/06/2020 17:17, Richard Sandiford wrote:
> Matthew Malcomson <matthew.malcomson@arm.com> writes:
>> --- a/gcc/config/aarch64/aarch64-protos.h
>> +/* Ensure there are no BR or RET instructions which are not directly followed
>> +   by a speculation barrier.  */
>> +/* { dg-final { scan-assembler-not "\t(br|ret|retaa|retab)\tx\[0-9\]\[0-9\]?\n\t(?!dsb\tsy\n\tisb|sb)" } } */
> 
> Isn't the “sb” alternative invalid given the -march option?
> 
> Probably slightly easier to read if the regexp is quoted using {…}
> rather than "…".  Same for the other tests.
> 

Just to check before I respin:  Using {} instead of "" means I need to 
replace \t with a literal tab -- do you still prefer it?

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Patch 2/3] aarch64: Introduce SLS mitigation for RET and BR instructions
  2020-06-23 16:49     ` Matthew Malcomson
@ 2020-06-23 16:56       ` Richard Sandiford
  2020-06-23 16:58         ` Matthew Malcomson
  2020-07-03 13:33         ` Matthew Malcomson
  0 siblings, 2 replies; 17+ messages in thread
From: Richard Sandiford @ 2020-06-23 16:56 UTC (permalink / raw)
  To: Matthew Malcomson
  Cc: gcc-patches, Kyrylo.Tkachov, Kristof.Beyls, Richard.Earnshaw, nd

Matthew Malcomson <matthew.malcomson@arm.com> writes:
> On 23/06/2020 17:17, Richard Sandiford wrote:
>> Matthew Malcomson <matthew.malcomson@arm.com> writes:
>>> --- a/gcc/config/aarch64/aarch64-protos.h
>>> +/* Ensure there are no BR or RET instructions which are not directly followed
>>> +   by a speculation barrier.  */
>>> +/* { dg-final { scan-assembler-not "\t(br|ret|retaa|retab)\tx\[0-9\]\[0-9\]?\n\t(?!dsb\tsy\n\tisb|sb)" } } */
>> 
>> Isn't the “sb” alternative invalid given the -march option?
>> 
>> Probably slightly easier to read if the regexp is quoted using {…}
>> rather than "…".  Same for the other tests.
>> 
>
> Just to check before I respin:  Using {} instead of "" means I need to 
> replace \t with a literal tab -- do you still prefer it?

Are you sure?  We've been using tests like:

/* { dg-final { scan-assembler-times {\tadd\tz[0-9]+\.s, z[0-9]+\.s, z[0-9]+\.s\n} 1 } } */

for SVE without problems.  Using {…} means that backslash quoting
is applied by the regexp parser rather than the Tcl string parser,
but both should work for things like \t.

Richard

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Patch 2/3] aarch64: Introduce SLS mitigation for RET and BR instructions
  2020-06-23 16:56       ` Richard Sandiford
@ 2020-06-23 16:58         ` Matthew Malcomson
  2020-07-03 13:33         ` Matthew Malcomson
  1 sibling, 0 replies; 17+ messages in thread
From: Matthew Malcomson @ 2020-06-23 16:58 UTC (permalink / raw)
  To: gcc-patches, Kyrylo.Tkachov, Kristof.Beyls, Richard.Earnshaw, nd,
	richard.sandiford

On 23/06/2020 17:56, Richard Sandiford wrote:
> Matthew Malcomson <matthew.malcomson@arm.com> writes:
>> On 23/06/2020 17:17, Richard Sandiford wrote:
>>> Matthew Malcomson <matthew.malcomson@arm.com> writes:
>>>> --- a/gcc/config/aarch64/aarch64-protos.h
>>>> +/* Ensure there are no BR or RET instructions which are not directly followed
>>>> +   by a speculation barrier.  */
>>>> +/* { dg-final { scan-assembler-not "\t(br|ret|retaa|retab)\tx\[0-9\]\[0-9\]?\n\t(?!dsb\tsy\n\tisb|sb)" } } */
>>>
>>> Isn't the “sb” alternative invalid given the -march option?
>>>
>>> Probably slightly easier to read if the regexp is quoted using {…}
>>> rather than "…".  Same for the other tests.
>>>
>>
>> Just to check before I respin:  Using {} instead of "" means I need to
>> replace \t with a literal tab -- do you still prefer it?
> 
> Are you sure?  We've been using tests like:
> 
> /* { dg-final { scan-assembler-times {\tadd\tz[0-9]+\.s, z[0-9]+\.s, z[0-9]+\.s\n} 1 } } */
> 
> for SVE without problems.  Using {…} means that backslash quoting
> is applied by the regexp parser rather than the Tcl string parser,
> but both should work for things like \t.
> 
> Richard
> 

Ah -- my mistake -- I was just checking with `string compare` while 
making the change and didn't think too hard when I saw a -1.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Patch 1/3] aarch64: New Straight Line Speculation (SLS) mitigation flags
  2020-06-23 15:48   ` Richard Sandiford
@ 2020-06-23 17:07     ` Matthew Malcomson
  2020-06-23 17:12       ` Richard Sandiford
  0 siblings, 1 reply; 17+ messages in thread
From: Matthew Malcomson @ 2020-06-23 17:07 UTC (permalink / raw)
  To: gcc-patches, Kyrylo.Tkachov, Kristof.Beyls, Richard.Earnshaw, nd,
	richard.sandiford

On 23/06/2020 16:48, Richard Sandiford wrote:
> Matthew Malcomson <matthew.malcomson@arm.com> writes:
>> @@ -14466,6 +14466,81 @@ aarch64_validate_mcpu (const char *str, const struct processor **res,
>>     return false;
>>   mfix-cortex-a53-835769
>>   Target Report Var(aarch64_fix_a53_err835769) Init(2) Save
>>   Workaround for ARM Cortex-A53 Erratum number 835769.
>> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
>> index 35e8242af5fa4c52744fd2c3e2cfee0a617e22bb..8a3fab2964c9bb06c820766d284768751d63ac9a 100644
>> --- a/gcc/doc/invoke.texi
>> +++ b/gcc/doc/invoke.texi
>> @@ -696,6 +696,7 @@ Objective-C and Objective-C++ Dialects}.
>>   -msign-return-address=@var{scope} @gol
>>   -mbranch-protection=@var{none}|@var{standard}|@var{pac-ret}[+@var{leaf}
>>   +@var{b-key}]|@var{bti} @gol
>> +-mharden-sls=@var{none}|@var{all}|@var{retbr}|@var{blr} @gol
>>   -march=@var{name}  -mcpu=@var{name}  -mtune=@var{name}  @gol
>>   -moverride=@var{string}  -mverbose-cost-dump @gol
>>   -mstack-protector-guard=@var{guard} -mstack-protector-guard-reg=@var{sysreg} @gol
>> @@ -17045,6 +17046,15 @@ functions.  The optional argument @samp{b-key} can be used to sign the functions
>>   with the B-key instead of the A-key.
>>   @samp{bti} turns on branch target identification mechanism.
>>   
>> +@item -mharden-sls=@var{none}|@var{all}|@var{retbr}|@var{blr}
>> +@opindex mharden-sls
>> +Enable compiler hardening against straight line speculation (SLS).
>> +There are two options for hardening against straight line speculation.
>> +@samp{retbr} allows inserting speculation barriers after every
>> +@samp{br} and @samp{ret} instruction.  While @samp{blr} enables replacing
>> +@samp{blr} instructions with a @samp{bl} to a function stub.
>> +@samp{all} enables all SLS hardening, while @samp{none} does not enable any.
> 
> OK, so this is even more picky, sorry, but the syntax and description
> imply to me that you can choose only one of the four options.  I think
> it would be more accurate to say something like:
> 
> @item -mharden-sls=@var{opts}
> @opindex mharden-sls
> Enable compiler hardening against straight line speculation (SLS).
> @var{opts} is a comma-separated list of the following options:
> @table @samp
> @item retbr
> …
> @item blr
> …
> @end table
> In addition, @samp{-mharden-sls=all} enables all SLS hardening
> while @samp{-mharden-sls=none} disables all SLS hardening.
> 
> (assuming the above behaviour change for “none”)
> 
> Thanks,
> Richard
> 

Another "just to check": the same change should be made in the short 
form right? (i.e. the hunk above is now `-mharden-sls=@var{opts}`)

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Patch 1/3] aarch64: New Straight Line Speculation (SLS) mitigation flags
  2020-06-23 17:07     ` Matthew Malcomson
@ 2020-06-23 17:12       ` Richard Sandiford
  2020-07-03 13:27         ` Matthew Malcomson
  0 siblings, 1 reply; 17+ messages in thread
From: Richard Sandiford @ 2020-06-23 17:12 UTC (permalink / raw)
  To: Matthew Malcomson
  Cc: gcc-patches, Kyrylo.Tkachov, Kristof.Beyls, Richard.Earnshaw, nd

Matthew Malcomson <matthew.malcomson@arm.com> writes:
> On 23/06/2020 16:48, Richard Sandiford wrote:
>> Matthew Malcomson <matthew.malcomson@arm.com> writes:
>>> @@ -14466,6 +14466,81 @@ aarch64_validate_mcpu (const char *str, const struct processor **res,
>>>     return false;
>>>   mfix-cortex-a53-835769
>>>   Target Report Var(aarch64_fix_a53_err835769) Init(2) Save
>>>   Workaround for ARM Cortex-A53 Erratum number 835769.
>>> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
>>> index 35e8242af5fa4c52744fd2c3e2cfee0a617e22bb..8a3fab2964c9bb06c820766d284768751d63ac9a 100644
>>> --- a/gcc/doc/invoke.texi
>>> +++ b/gcc/doc/invoke.texi
>>> @@ -696,6 +696,7 @@ Objective-C and Objective-C++ Dialects}.
>>>   -msign-return-address=@var{scope} @gol
>>>   -mbranch-protection=@var{none}|@var{standard}|@var{pac-ret}[+@var{leaf}
>>>   +@var{b-key}]|@var{bti} @gol
>>> +-mharden-sls=@var{none}|@var{all}|@var{retbr}|@var{blr} @gol
>>>   -march=@var{name}  -mcpu=@var{name}  -mtune=@var{name}  @gol
>>>   -moverride=@var{string}  -mverbose-cost-dump @gol
>>>   -mstack-protector-guard=@var{guard} -mstack-protector-guard-reg=@var{sysreg} @gol
>>> @@ -17045,6 +17046,15 @@ functions.  The optional argument @samp{b-key} can be used to sign the functions
>>>   with the B-key instead of the A-key.
>>>   @samp{bti} turns on branch target identification mechanism.
>>>   
>>> +@item -mharden-sls=@var{none}|@var{all}|@var{retbr}|@var{blr}
>>> +@opindex mharden-sls
>>> +Enable compiler hardening against straight line speculation (SLS).
>>> +There are two options for hardening against straight line speculation.
>>> +@samp{retbr} allows inserting speculation barriers after every
>>> +@samp{br} and @samp{ret} instruction.  While @samp{blr} enables replacing
>>> +@samp{blr} instructions with a @samp{bl} to a function stub.
>>> +@samp{all} enables all SLS hardening, while @samp{none} does not enable any.
>> 
>> OK, so this is even more picky, sorry, but the syntax and description
>> imply to me that you can choose only one of the four options.  I think
>> it would be more accurate to say something like:
>> 
>> @item -mharden-sls=@var{opts}
>> @opindex mharden-sls
>> Enable compiler hardening against straight line speculation (SLS).
>> @var{opts} is a comma-separated list of the following options:
>> @table @samp
>> @item retbr
>> …
>> @item blr
>> …
>> @end table
>> In addition, @samp{-mharden-sls=all} enables all SLS hardening
>> while @samp{-mharden-sls=none} disables all SLS hardening.
>> 
>> (assuming the above behaviour change for “none”)
>> 
>> Thanks,
>> Richard
>> 
>
> Another "just to check": the same change should be made in the short 
> form right? (i.e. the hunk above is now `-mharden-sls=@var{opts}`)

Yeah.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Patch 1/3] aarch64: New Straight Line Speculation (SLS) mitigation flags
  2020-06-23 17:12       ` Richard Sandiford
@ 2020-07-03 13:27         ` Matthew Malcomson
  2020-07-06 10:05           ` Richard Sandiford
  0 siblings, 1 reply; 17+ messages in thread
From: Matthew Malcomson @ 2020-07-03 13:27 UTC (permalink / raw)
  To: Richard Sandiford
  Cc: gcc-patches, Kyrylo.Tkachov, Kristof.Beyls, Richard.Earnshaw, nd

[-- Attachment #1: Type: text/plain, Size: 6662 bytes --]

With suggestions applied.


Here we introduce the flags that will be used for straight line speculation.

The new flag introduced is `-mharden-sls=`.
This flag can take arguments of `none`, `all`, or a comma-separated list of one
or more of `retbr` or `blr`.
`none` indicates no special mitigation of the straight line speculation
vulnerability.
`all` requests all mitigations currently implemented.
`retbr` requests that the RET and BR instructions have a speculation barrier
inserted after them.
`blr` requests that BLR instructions are replaced by a BL to a function stub
using a BR with a speculation barrier after it.

Setting this on a per-function basis using attributes or the like is not
enabled, but may be in the future.
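
For example (illustrative spellings only, with the behaviour described above):

  -mharden-sls=retbr      barriers after every RET and BR
  -mharden-sls=blr        BLR replaced by a BL to a stub
  -mharden-sls=retbr,blr  both mitigations
  -mharden-sls=all        all mitigations currently implemented
  -mharden-sls=none       no SLS hardening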

gcc/ChangeLog:

2020-07-03  Matthew Malcomson  <matthew.malcomson@arm.com>

	* config/aarch64/aarch64-protos.h (aarch64_harden_sls_retbr_p):
	New.
	(aarch64_harden_sls_blr_p): New.
	* config/aarch64/aarch64.c (enum aarch64_sls_hardening_type):
	New.
	(aarch64_harden_sls_retbr_p): New.
	(aarch64_harden_sls_blr_p): New.
	(aarch64_validate_sls_mitigation): New.
	(aarch64_override_options): Parse options for SLS mitigation.
	* config/aarch64/aarch64.opt (-mharden-sls): New option.
	* doc/invoke.texi: Document new option.



###############     Attachment also inlined for ease of reply    ###############


diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 9e43adb7db0373df6cc5ef1d2b22f217aca2aad2..8ca67d7e69edaf73c84f079e7e1c483009ad10c0 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -780,4 +780,7 @@ extern const atomic_ool_names aarch64_ool_ldeor_names;
 
 tree aarch64_resolve_overloaded_builtin_general (location_t, tree, void *);
 
+extern bool aarch64_harden_sls_retbr_p (void);
+extern bool aarch64_harden_sls_blr_p (void);
+
 #endif /* GCC_AARCH64_PROTOS_H */
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index f3551a73d87c4e686540f39224985592c3c66fd1..b1a7c10c4eaadd78eb45926c23efc51a8272b5fd 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -14502,6 +14502,80 @@ aarch64_validate_mcpu (const char *str, const struct processor **res,
   return false;
 }
 
+/* Straight line speculation indicators.  */
+enum aarch64_sls_hardening_type
+{
+  SLS_NONE = 0,
+  SLS_RETBR = 1,
+  SLS_BLR = 2,
+  SLS_ALL = 3,
+};
+static enum aarch64_sls_hardening_type aarch64_sls_hardening;
+
+/* Return whether we should mitigate Straight Line Speculation for the RET
+   and BR instructions.  */
+bool
+aarch64_harden_sls_retbr_p (void)
+{
+  return aarch64_sls_hardening & SLS_RETBR;
+}
+
+/* Return whether we should mitigate Straight Line Speculation for the BLR
+   instruction.  */
+bool
+aarch64_harden_sls_blr_p (void)
+{
+  return aarch64_sls_hardening & SLS_BLR;
+}
+
+/* As of yet we only allow setting these options globally, in the future we may
+   allow setting them per function.  */
+static void
+aarch64_validate_sls_mitigation (const char *const_str)
+{
+  char *token_save = NULL;
+  char *str = NULL;
+
+  aarch64_sls_hardening = SLS_NONE;
+  if (strcmp (const_str, "none") == 0)
+    {
+      aarch64_sls_hardening = SLS_NONE;
+      return;
+    }
+  if (strcmp (const_str, "all") == 0)
+    {
+      aarch64_sls_hardening = SLS_ALL;
+      return;
+    }
+
+  char *str_root = xstrdup (const_str);
+  str = strtok_r (str_root, ",", &token_save);
+  if (!str)
+    error ("invalid argument given to %<-mharden-sls=%>");
+
+  int temp = SLS_NONE;
+  while (str)
+    {
+      if (strcmp (str, "blr") == 0)
+	temp |= SLS_BLR;
+      else if (strcmp (str, "retbr") == 0)
+	temp |= SLS_RETBR;
+      else if (strcmp (str, "none") == 0 || strcmp (str, "all") == 0)
+	{
+	  error ("%<%s%> must be by itself for %<-mharden-sls=%>", str);
+	  break;
+	}
+      else
+	{
+	  error ("invalid argument %<%s%> for %<-mharden-sls=%>", str);
+	  break;
+	}
+      str = strtok_r (NULL, ",", &token_save);
+    }
+  aarch64_sls_hardening = (aarch64_sls_hardening_type) temp;
+  free (str_root);
+}
+
 /* Parses CONST_STR for branch protection features specified in
    aarch64_branch_protect_types, and set any global variables required.  Returns
    the parsing result and assigns LAST_STR to the last processed token from
@@ -14746,6 +14820,9 @@ aarch64_override_options (void)
   selected_arch = NULL;
   selected_tune = NULL;
 
+  if (aarch64_harden_sls_string)
+    aarch64_validate_sls_mitigation (aarch64_harden_sls_string);
+
   if (aarch64_branch_protection_string)
     aarch64_validate_mbranch_protection (aarch64_branch_protection_string);
 
diff --git a/gcc/config/aarch64/aarch64.opt b/gcc/config/aarch64/aarch64.opt
index d99d14c137d8774d3c8dab860d475f68c01a2817..5170361fd5e5721e044d1664e522b2718f654b8e 100644
--- a/gcc/config/aarch64/aarch64.opt
+++ b/gcc/config/aarch64/aarch64.opt
@@ -71,6 +71,10 @@ mgeneral-regs-only
 Target Report RejectNegative Mask(GENERAL_REGS_ONLY) Save
 Generate code which uses only the general registers.
 
+mharden-sls=
+Target RejectNegative Joined Var(aarch64_harden_sls_string)
+Generate code to mitigate against straight line speculation.
+
 mfix-cortex-a53-835769
 Target Report Var(aarch64_fix_a53_err835769) Init(2) Save
 Workaround for ARM Cortex-A53 Erratum number 835769.
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 98cc0f2f0de1d89928d98edcb2f2fa99f040f195..fd71f7c79019b681008d08284eff2878790b7ffa 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -698,6 +698,7 @@ Objective-C and Objective-C++ Dialects}.
 -msign-return-address=@var{scope} @gol
 -mbranch-protection=@var{none}|@var{standard}|@var{pac-ret}[+@var{leaf}
 +@var{b-key}]|@var{bti} @gol
+-mharden-sls=@var{opts} @gol
 -march=@var{name}  -mcpu=@var{name}  -mtune=@var{name}  @gol
 -moverride=@var{string}  -mverbose-cost-dump @gol
 -mstack-protector-guard=@var{guard} -mstack-protector-guard-reg=@var{sysreg} @gol
@@ -17367,6 +17368,17 @@ functions.  The optional argument @samp{b-key} can be used to sign the functions
 with the B-key instead of the A-key.
 @samp{bti} turns on branch target identification mechanism.
 
+@item -mharden-sls=@var{opts}
+@opindex mharden-sls
+Enable compiler hardening against straight line speculation (SLS).
+@var{opts} is a comma-separated list of the following options:
+@table @samp
+@item retbr
+@item blr
+@end table
+In addition, @samp{-mharden-sls=all} enables all SLS hardening while
+@samp{-mharden-sls=none} disables all SLS hardening.
+
 @item -msve-vector-bits=@var{bits}
 @opindex msve-vector-bits
 Specify the number of bits in an SVE vector register.  This option only has



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Patch 2/3] aarch64: Introduce SLS mitigation for RET and BR instructions
  2020-06-23 16:56       ` Richard Sandiford
  2020-06-23 16:58         ` Matthew Malcomson
@ 2020-07-03 13:33         ` Matthew Malcomson
  2020-07-08 14:30           ` Richard Sandiford
  1 sibling, 1 reply; 17+ messages in thread
From: Matthew Malcomson @ 2020-07-03 13:33 UTC (permalink / raw)
  To: Richard Sandiford
  Cc: gcc-patches, Kyrylo.Tkachov, Kristof.Beyls, Richard.Earnshaw,
	matthew.malcomson

[-- Attachment #1: Type: text/plain, Size: 19919 bytes --]

With suggestions applied.
Testing with `-mabi=ilp32` found a bug around the trampoline
initialisation where the new larger size of the trampoline caused a
different execution path of `emit_block_move` which ICE'd on the
pre-existing `ptr_mode` address.


Commit Message
------

Instructions following RET or BR are not necessarily executed, but can
still be reached speculatively.  In order to avoid speculation past RET
and BR we can simply append a speculation barrier.

Since these speculation barriers will not be architecturally executed,
they are not expected to add a high performance penalty.

The speculation barrier is SB when targeting architectures where the SB
extension is enabled, and DSB SY + ISB otherwise.
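
As a purely illustrative aside, the barrier selection above can be
modelled in a few lines of Python.  This mirrors the
aarch64_sls_barrier helper added by the patch; the function and
argument names here are hypothetical:

  def sls_barrier(mitigation_required, have_sb):
      # No barrier text is emitted when the hardening is disabled.
      if not mitigation_required:
          return ''
      # A single SB instruction when the +sb extension is available...
      if have_sb:
          return 'sb'
      # ...and the DSB SY + ISB pair otherwise.
      return 'dsb\tsy\n\tisb'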

We add tests for each of the cases where such an instruction was seen.

This is implemented by modifying each machine description pattern that
emits either a RET or a BR instruction.  We choose not to use something
like `TARGET_ASM_FUNCTION_EPILOGUE` since it does not affect the
`indirect_jump`, `jump`, `sibcall_insn` and `sibcall_value_insn`
patterns and we find it preferable to implement the functionality in the
same way for every pattern.

There is one particular case which is slightly tricky.  The
implementation of TARGET_ASM_TRAMPOLINE_TEMPLATE uses a BR which needs
to be mitigated against.  The trampoline template is used *once* per
compilation unit, and the TRAMPOLINE_SIZE is exposed to the user via the
builtin macro __LIBGCC_TRAMPOLINE_SIZE__.
In the future we may implement function-specific attributes to turn
hardening on and off on a per-function basis.
Because the trampoline template and its size are fixed as described
above, it is safer to ensure this speculation barrier is always emitted
in it.

Testing:
  Bootstrap and regtest done on aarch64-none-linux
  Used a temporary hack(1) to use these options on every test in the
  testsuite and a script to check that the output never emitted an
  unmitigated RET or BR.


1) The temporary hack was a change to the testsuite to always pass
`-save-temps` and to run a script on the assembly output of every
compilation that produced one, checking that each RET or BR is
immediately followed by a speculation barrier.
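
For reference, a rough sketch of what such a checking script could look
like is below.  This is an illustrative Python reimplementation written
for this description, not the script that was actually used; the exact
regular expressions and file handling are assumptions:

  import re
  import sys

  # Barrier forms accepted: a single SB, or the DSB SY + ISB pair.
  SB = re.compile(r'^\s*sb\b')
  DSB_SY = re.compile(r'^\s*dsb\s+sy\b')
  ISB = re.compile(r'^\s*isb\b')
  # Instructions that must be immediately followed by a barrier.
  BRANCH = re.compile(r'^\s*(retaa|retab|ret|br)\b')

  def unmitigated(lines):
      """Yield (line number, text) for each RET/BR lacking a barrier."""
      for i, line in enumerate(lines):
          if not BRANCH.match(line):
              continue
          nxt = lines[i + 1:i + 3]
          if nxt and SB.match(nxt[0]):
              continue
          if len(nxt) == 2 and DSB_SY.match(nxt[0]) and ISB.match(nxt[1]):
              continue
          yield i + 1, line.strip()

  if __name__ == '__main__':
      with open(sys.argv[1]) as f:
          asm = f.readlines()
      for lineno, text in unmitigated(asm):
          print('%s:%d: unmitigated RET/BR: %s' % (sys.argv[1], lineno, text))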


gcc/ChangeLog:

2020-07-03  Matthew Malcomson  <matthew.malcomson@arm.com>

	* config/aarch64/aarch64-protos.h (aarch64_sls_barrier): New.
	* config/aarch64/aarch64.c (aarch64_output_casesi): Emit
	speculation barrier after BR instruction if needs be.
	(aarch64_trampoline_init): Handle ptr_mode value & adjust size
	of code copied.
	(aarch64_sls_barrier): New.
	(aarch64_asm_trampoline_template): Add needed barriers.
	* config/aarch64/aarch64.h (AARCH64_ISA_SB): New.
	(TARGET_SB): New.
	(TRAMPOLINE_SIZE): Account for barrier.
	* config/aarch64/aarch64.md (indirect_jump, *casesi_dispatch,
	*do_return, simple_return, *sibcall_insn, *sibcall_value_insn):
	Emit barrier if needs be, also account for possible barrier using
	"sls_length" attribute.
	(sls_length): New attribute.
	(length): Determine default using any non-default sls_length
	value.
	* config/aarch64/aarch64.opt (-mharden-sls-retbr): Introduce new
	option.

gcc/testsuite/ChangeLog:

2020-07-03  Matthew Malcomson  <matthew.malcomson@arm.com>

	* gcc.target/aarch64/sls-mitigation/sls-miti-retbr.c: New test.
	* gcc.target/aarch64/sls-mitigation/sls-miti-retbr-pacret.c:
	New test.
	* gcc.target/aarch64/sls-mitigation/sls-mitigation.exp: New file.
	* lib/target-supports.exp (check_effective_target_aarch64_asm_sb_ok):
	New proc.



###############     Attachment also inlined for ease of reply    ###############


diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 8ca67d7e69edaf73c84f079e7e1c483009ad10c0..b035e4ec78e2ef1c9a931148dffacf6a50345b84 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -780,6 +780,7 @@ extern const atomic_ool_names aarch64_ool_ldeor_names;
 
 tree aarch64_resolve_overloaded_builtin_general (location_t, tree, void *);
 
+const char *aarch64_sls_barrier (int);
 extern bool aarch64_harden_sls_retbr_p (void);
 extern bool aarch64_harden_sls_blr_p (void);
 
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 2be52fd4d73def0007795159298e3d3e8fc4399d..d60d295830d3e422bb4267de275597d2087b99e6 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -281,6 +281,7 @@ extern unsigned aarch64_architecture_version;
 #define AARCH64_ISA_F32MM	   (aarch64_isa_flags & AARCH64_FL_F32MM)
 #define AARCH64_ISA_F64MM	   (aarch64_isa_flags & AARCH64_FL_F64MM)
 #define AARCH64_ISA_BF16	   (aarch64_isa_flags & AARCH64_FL_BF16)
+#define AARCH64_ISA_SB		   (aarch64_isa_flags & AARCH64_FL_SB)
 
 /* Crypto is an optional extension to AdvSIMD.  */
 #define TARGET_CRYPTO (TARGET_SIMD && AARCH64_ISA_CRYPTO)
@@ -378,6 +379,9 @@ extern unsigned aarch64_architecture_version;
 #define TARGET_FIX_ERR_A53_835769_DEFAULT 1
 #endif
 
+/* SB instruction is enabled through +sb.  */
+#define TARGET_SB (AARCH64_ISA_SB)
+
 /* Apply the workaround for Cortex-A53 erratum 835769.  */
 #define TARGET_FIX_ERR_A53_835769	\
   ((aarch64_fix_a53_err835769 == 2)	\
@@ -1075,8 +1079,10 @@ typedef struct
 
 #define RETURN_ADDR_RTX aarch64_return_addr
 
-/* BTI c + 3 insns + 2 pointer-sized entries.  */
-#define TRAMPOLINE_SIZE	(TARGET_ILP32 ? 24 : 32)
+/* BTI c + 3 insns
+   + sls barrier of DSB + ISB.
+   + 2 pointer-sized entries.  */
+#define TRAMPOLINE_SIZE	(24 + (TARGET_ILP32 ? 8 : 16))
 
 /* Trampolines contain dwords, so must be dword aligned.  */
 #define TRAMPOLINE_ALIGNMENT 64
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index b1a7c10c4eaadd78eb45926c23efc51a8272b5fd..18ac55ab15d4f42c1e3744ac3741b5b90f888c91 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -10836,8 +10836,8 @@ aarch64_return_addr (int count, rtx frame ATTRIBUTE_UNUSED)
 static void
 aarch64_asm_trampoline_template (FILE *f)
 {
-  int offset1 = 16;
-  int offset2 = 20;
+  int offset1 = 24;
+  int offset2 = 28;
 
   if (aarch64_bti_enabled ())
     {
@@ -10860,6 +10860,17 @@ aarch64_asm_trampoline_template (FILE *f)
     }
   asm_fprintf (f, "\tbr\t%s\n", reg_names [IP1_REGNUM]);
 
+  /* We always emit a speculation barrier.
+     This is because the same trampoline template is used for every nested
+     function.  Since nested functions are not particularly common or
+     performant we don't worry too much about the extra instructions to copy
+     around.
+     This is not yet a problem, since we have not yet implemented function
+     specific attributes to choose between hardening against straight line
+     speculation or not, but such function specific attributes are likely to
+     happen in the future.  */
+  asm_fprintf (f, "\tdsb\tsy\n\tisb\n");
+
   /* The trampoline needs an extra padding instruction.  In case if BTI is
      enabled the padding instruction is replaced by the BTI instruction at
      the beginning.  */
@@ -10874,10 +10885,14 @@ static void
 aarch64_trampoline_init (rtx m_tramp, tree fndecl, rtx chain_value)
 {
   rtx fnaddr, mem, a_tramp;
-  const int tramp_code_sz = 16;
+  const int tramp_code_sz = 24;
 
   /* Don't need to copy the trailing D-words, we fill those in below.  */
-  emit_block_move (m_tramp, assemble_trampoline_template (),
+  /* We create our own memory address in Pmode so that `emit_block_move` can
+     use parts of the backend which expect Pmode addresses.  */
+  rtx temp = convert_memory_address (Pmode, XEXP (m_tramp, 0));
+  emit_block_move (gen_rtx_MEM (BLKmode, temp),
+		   assemble_trampoline_template (),
 		   GEN_INT (tramp_code_sz), BLOCK_OP_NORMAL);
   mem = adjust_address (m_tramp, ptr_mode, tramp_code_sz);
   fnaddr = XEXP (DECL_RTL (fndecl), 0);
@@ -11068,6 +11083,8 @@ aarch64_output_casesi (rtx *operands)
   output_asm_insn (buf, operands);
   output_asm_insn (patterns[index][1], operands);
   output_asm_insn ("br\t%3", operands);
+  output_asm_insn (aarch64_sls_barrier (aarch64_harden_sls_retbr_p ()),
+		   operands);
   assemble_label (asm_out_file, label);
   return "";
 }
@@ -22935,6 +22952,22 @@ aarch64_file_end_indicate_exec_stack ()
 #undef GNU_PROPERTY_AARCH64_FEATURE_1_BTI
 #undef GNU_PROPERTY_AARCH64_FEATURE_1_AND
 
+/* Helper function for straight line speculation.
+   Return what barrier should be emitted for straight line speculation
+   mitigation.
+   When not mitigating against straight line speculation this function returns
+   an empty string.
+   When mitigating against straight line speculation, use:
+   * SB when the v8.5-A SB extension is enabled.
+   * DSB+ISB otherwise.  */
+const char *
+aarch64_sls_barrier (int mitigation_required)
+{
+  return mitigation_required
+    ? (TARGET_SB ? "sb" : "dsb\tsy\n\tisb")
+    : "";
+}
+
 /* Target-specific selftests.  */
 
 #if CHECKING_P
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index deca0004fedcb41c1e6b88ef3f8b4b187b4eecf8..9e76741d2d7e3e5d238fbbe8b41e6d824f97bd35 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -407,10 +407,25 @@
 ;; Attribute that specifies whether the alternative uses MOVPRFX.
 (define_attr "movprfx" "no,yes" (const_string "no"))
 
+;; Attribute to specify that an alternative has the length of a single
+;; instruction plus a speculation barrier.
+(define_attr "sls_length" "none,retbr,casesi" (const_string "none"))
+
 (define_attr "length" ""
   (cond [(eq_attr "movprfx" "yes")
            (const_int 8)
-        ] (const_int 4)))
+
+	 (eq_attr "sls_length" "retbr")
+	   (cond [(match_test "!aarch64_harden_sls_retbr_p ()") (const_int 4)
+		  (match_test "TARGET_SB") (const_int 8)]
+		 (const_int 12))
+
+	 (eq_attr "sls_length" "casesi")
+	   (cond [(match_test "!aarch64_harden_sls_retbr_p ()") (const_int 16)
+		  (match_test "TARGET_SB") (const_int 20)]
+		 (const_int 24))
+	]
+	  (const_int 4)))
 
 ;; Strictly for compatibility with AArch32 in pipeline models, since AArch64 has
 ;; no predicated insns.
@@ -447,8 +462,12 @@
 (define_insn "indirect_jump"
   [(set (pc) (match_operand:DI 0 "register_operand" "r"))]
   ""
-  "br\\t%0"
-  [(set_attr "type" "branch")]
+  {
+    output_asm_insn ("br\\t%0", operands);
+    return aarch64_sls_barrier (aarch64_harden_sls_retbr_p ());
+  }
+  [(set_attr "type" "branch")
+   (set_attr "sls_length" "retbr")]
 )
 
 (define_insn "jump"
@@ -765,7 +784,7 @@
   "*
   return aarch64_output_casesi (operands);
   "
-  [(set_attr "length" "16")
+  [(set_attr "sls_length" "casesi")
    (set_attr "type" "branch")]
 )
 
@@ -844,18 +863,23 @@
   [(return)]
   ""
   {
+    const char *ret = NULL;
     if (aarch64_return_address_signing_enabled ()
 	&& TARGET_ARMV8_3
 	&& !crtl->calls_eh_return)
       {
 	if (aarch64_ra_sign_key == AARCH64_KEY_B)
-	  return "retab";
+	  ret = "retab";
 	else
-	  return "retaa";
+	  ret = "retaa";
       }
-    return "ret";
+    else
+      ret = "ret";
+    output_asm_insn (ret, operands);
+    return aarch64_sls_barrier (aarch64_harden_sls_retbr_p ());
   }
-  [(set_attr "type" "branch")]
+  [(set_attr "type" "branch")
+   (set_attr "sls_length" "retbr")]
 )
 
 (define_expand "return"
@@ -867,8 +891,12 @@
 (define_insn "simple_return"
   [(simple_return)]
   ""
-  "ret"
-  [(set_attr "type" "branch")]
+  {
+    output_asm_insn ("ret", operands);
+    return aarch64_sls_barrier (aarch64_harden_sls_retbr_p ());
+  }
+  [(set_attr "type" "branch")
+   (set_attr "sls_length" "retbr")]
 )
 
 (define_insn "*cb<optab><mode>1"
@@ -1066,10 +1094,16 @@
    (unspec:DI [(match_operand:DI 2 "const_int_operand")] UNSPEC_CALLEE_ABI)
    (return)]
   "SIBLING_CALL_P (insn)"
-  "@
-   br\\t%0
-   b\\t%c0"
-  [(set_attr "type" "branch, branch")]
+  {
+    if (which_alternative == 0)
+      {
+	output_asm_insn ("br\\t%0", operands);
+	return aarch64_sls_barrier (aarch64_harden_sls_retbr_p ());
+      }
+    return "b\\t%c0";
+  }
+  [(set_attr "type" "branch, branch")
+   (set_attr "sls_length" "retbr,none")]
 )
 
 (define_insn "*sibcall_value_insn"
@@ -1080,10 +1114,16 @@
    (unspec:DI [(match_operand:DI 3 "const_int_operand")] UNSPEC_CALLEE_ABI)
    (return)]
   "SIBLING_CALL_P (insn)"
-  "@
-   br\\t%1
-   b\\t%c1"
-  [(set_attr "type" "branch, branch")]
+  {
+    if (which_alternative == 0)
+      {
+	output_asm_insn ("br\\t%1", operands);
+	return aarch64_sls_barrier (aarch64_harden_sls_retbr_p ());
+      }
+    return "b\\t%c1";
+  }
+  [(set_attr "type" "branch, branch")
+   (set_attr "sls_length" "retbr,none")]
 )
 
 ;; Call subroutine returning any type.
diff --git a/gcc/testsuite/gcc.target/aarch64/sls-mitigation/sls-miti-retbr-pacret.c b/gcc/testsuite/gcc.target/aarch64/sls-mitigation/sls-miti-retbr-pacret.c
new file mode 100644
index 0000000000000000000000000000000000000000..fa1887a71e75d11be6cfff8bb5a7f4d89627d01f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sls-mitigation/sls-miti-retbr-pacret.c
@@ -0,0 +1,21 @@
+/* Avoid ILP32 since pacret is only available for LP64 */
+/* { dg-do compile { target { ! ilp32 } } } */
+/* { dg-additional-options "-mharden-sls=retbr -mbranch-protection=pac-ret -march=armv8.3-a" } */
+
+/* Testing the do_return pattern for retaa and retab.  */
+long retbr_subcall(void);
+long retbr_do_return_retaa(void)
+{
+    return retbr_subcall()+1;
+}
+
+__attribute__((target("branch-protection=pac-ret+b-key")))
+long retbr_do_return_retab(void)
+{
+    return retbr_subcall()+1;
+}
+
+/* Ensure there are no BR or RET instructions which are not directly followed
+   by a speculation barrier.  */
+/* { dg-final { scan-assembler-not {\t(br|ret|retaa|retab)\tx[0-9][0-9]?\n\t(?!dsb\tsy\n\tisb)} } } */
+/* { dg-final { scan-assembler-not {ret\t} } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sls-mitigation/sls-miti-retbr.c b/gcc/testsuite/gcc.target/aarch64/sls-mitigation/sls-miti-retbr.c
new file mode 100644
index 0000000000000000000000000000000000000000..76b8d03afe499227c359245251dbebc0ab453c7d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sls-mitigation/sls-miti-retbr.c
@@ -0,0 +1,119 @@
+/* We ensure that -Wpedantic is off since it complains about the trampolines
+   we explicitly want to test.  */
+/* { dg-additional-options "-mharden-sls=retbr -Wno-pedantic " } */
+/*
+   Ensure that the SLS hardening of RET and BR leaves no unprotected RET/BR
+   instructions.
+  */
+typedef int (foo) (int, int);
+typedef void (bar) (int, int);
+struct sls_testclass {
+    foo *x;
+    bar *y;
+    int left;
+    int right;
+};
+
+int
+retbr_sibcall_value_insn (struct sls_testclass x)
+{
+  return x.x(x.left, x.right);
+}
+
+void
+retbr_sibcall_insn (struct sls_testclass x)
+{
+  x.y(x.left, x.right);
+}
+
+/* Aim to test two different returns.
+   One that introduces a tail call in the middle of the function, and one that
+   has a normal return.  */
+int
+retbr_multiple_returns (struct sls_testclass x)
+{
+  int temp;
+  if (x.left % 10)
+    return x.x(x.left, 100);
+  else if (x.right % 20)
+    {
+      return x.x(x.left * x.right, 100);
+    }
+  temp = x.left % x.right;
+  temp *= 100;
+  temp /= 2;
+  return temp % 3;
+}
+
+void
+retbr_multiple_returns_void (struct sls_testclass x)
+{
+  if (x.left % 10)
+    {
+      x.y(x.left, 100);
+    }
+  else if (x.right % 20)
+    {
+      x.y(x.left * x.right, 100);
+    }
+  return;
+}
+
+/* Testing the casesi jump via register.  */
+__attribute__ ((optimize ("Os")))
+int
+retbr_casesi_dispatch (struct sls_testclass x)
+{
+  switch (x.left)
+    {
+    case -5:
+      return -2;
+    case -3:
+      return -1;
+    case 0:
+      return 0;
+    case 3:
+      return 1;
+    case 5:
+      break;
+    default:
+      __builtin_unreachable ();
+    }
+  return x.right;
+}
+
+/* Testing the BR in trampolines is mitigated against.  */
+void f1 (void *);
+void f3 (void *, void (*)(void *));
+void f2 (void *);
+
+int
+retbr_trampolines (void *a, int b)
+{
+  if (!b)
+    {
+      f1 (a);
+      return 1;
+    }
+  if (b)
+    {
+      void retbr_tramp_internal (void *c)
+      {
+	if (c == a)
+	  f2 (c);
+      }
+      f3 (a, retbr_tramp_internal);
+    }
+  return 0;
+}
+
+/* Testing the indirect_jump pattern.  */
+void
+retbr_indirect_jump (int *buf)
+{
+  __builtin_longjmp(buf, 1);
+}
+
+/* Ensure there are no BR or RET instructions which are not directly followed
+   by a speculation barrier.  */
+/* { dg-final { scan-assembler-not {\t(br|ret|retaa|retab)\tx[0-9][0-9]?\n\t(?!dsb\tsy\n\tisb|sb)} } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sls-mitigation/sls-mitigation.exp b/gcc/testsuite/gcc.target/aarch64/sls-mitigation/sls-mitigation.exp
new file mode 100644
index 0000000000000000000000000000000000000000..812250379f877b3a924667c6e53b06cd7fecca7b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sls-mitigation/sls-mitigation.exp
@@ -0,0 +1,73 @@
+#  Regression driver for SLS mitigation on AArch64.
+#  Copyright (C) 2020 Free Software Foundation, Inc.
+#  Contributed by ARM Ltd.
+#
+#  This file is part of GCC.
+#
+#  GCC is free software; you can redistribute it and/or modify it
+#  under the terms of the GNU General Public License as published by
+#  the Free Software Foundation; either version 3, or (at your option)
+#  any later version.
+#
+#  GCC is distributed in the hope that it will be useful, but
+#  WITHOUT ANY WARRANTY; without even the implied warranty of
+#  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+#  General Public License for more details.
+#
+#  You should have received a copy of the GNU General Public License
+#  along with GCC; see the file COPYING3.  If not see
+#  <http://www.gnu.org/licenses/>.  */
+
+# Exit immediately if this isn't an AArch64 target.
+if {![istarget aarch64*-*-*] } then {
+  return
+}
+
+# Load support procs.
+load_lib gcc-dg.exp
+load_lib torture-options.exp
+
+# If a testcase doesn't have special options, use these.
+global DEFAULT_CFLAGS
+if ![info exists DEFAULT_CFLAGS] then {
+    set DEFAULT_CFLAGS " "
+}
+
+# Initialize `dg'.
+dg-init
+torture-init
+
+# Use different architectures as well as the normal optimisation options.
+# (i.e. use both SB and DSB+ISB barriers).
+
+set save-dg-do-what-default ${dg-do-what-default}
+# Main loop.
+# Run with torture tests (i.e. a bunch of different optimisation levels) just
+# to increase test coverage.
+set dg-do-what-default assemble
+gcc-dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/*.\[cCS\]]] \
+	"-save-temps" $DEFAULT_CFLAGS
+
+# Run the same tests but this time with SB extension.
+# Since not all supported assemblers will support that extension we decide
+# whether to assemble or just compile based on whether the extension is
+# supported for the available assembler.
+
+set templist {}
+foreach x $DG_TORTURE_OPTIONS {
+  lappend templist "$x -march=armv8.3-a+sb "
+  lappend templist "$x -march=armv8-a+sb "
+}
+set-torture-options $templist
+if { [check_effective_target_aarch64_asm_sb_ok] } {
+    set dg-do-what-default assemble
+} else {
+    set dg-do-what-default compile
+}
+gcc-dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/*.\[cCS\]]] \
+	"-save-temps" $DEFAULT_CFLAGS
+set dg-do-what-default ${save-dg-do-what-default}
+
+# All done.
+torture-finish
+dg-finish
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index cf0cfa11eb982698e1c2a4384c76789399a32664..5b305f126174a605372bb23f5e7ec44c10691eae 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -9429,7 +9429,7 @@ proc check_effective_target_aarch64_tiny { } {
 # various architecture extensions via the .arch_extension pseudo-op.
 
 foreach { aarch64_ext } { "fp" "simd" "crypto" "crc" "lse" "dotprod" "sve"
-			  "i8mm" "f32mm" "f64mm" "bf16" } {
+			  "i8mm" "f32mm" "f64mm" "bf16" "sb" } {
     eval [string map [list FUNC $aarch64_ext] {
 	proc check_effective_target_aarch64_asm_FUNC_ok { } {
 	  if { [istarget aarch64*-*-*] } {


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Patch 1/3] aarch64: New Straight Line Speculation (SLS) mitigation flags
  2020-07-03 13:27         ` Matthew Malcomson
@ 2020-07-06 10:05           ` Richard Sandiford
  0 siblings, 0 replies; 17+ messages in thread
From: Richard Sandiford @ 2020-07-06 10:05 UTC (permalink / raw)
  To: Matthew Malcomson
  Cc: gcc-patches, Kyrylo.Tkachov, Kristof.Beyls, Richard.Earnshaw, nd

Matthew Malcomson <matthew.malcomson@arm.com> writes:
> +  aarch64_sls_hardening = SLS_NONE;
> +  if (strcmp (const_str, "none") == 0)
> +    {
> +      aarch64_sls_hardening = SLS_NONE;
> +      return;

Gah, I totally misread the previous patch and didn't see that
you were already setting aarch64_sls_hardening to SLS_NONE above.
So this line is obviously redundant after all, sorry for the noise.

OK with the line above removed, thanks.

Richard

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [Patch 2/3] aarch64: Introduce SLS mitigation for RET and BR instructions
  2020-07-03 13:33         ` Matthew Malcomson
@ 2020-07-08 14:30           ` Richard Sandiford
  0 siblings, 0 replies; 17+ messages in thread
From: Richard Sandiford @ 2020-07-08 14:30 UTC (permalink / raw)
  To: Matthew Malcomson
  Cc: gcc-patches, Kyrylo.Tkachov, Kristof.Beyls, Richard.Earnshaw

Matthew Malcomson <matthew.malcomson@arm.com> writes:
> With suggestions applied.
> Testing with `-mabi=ilp32` found a bug around the trampoline
> initialisation where the new larger size of the trampoline caused a
> different execution path of `emit_block_move` which ICE'd on the
> pre-existing `ptr_mode` address.

OK, thanks, and sorry for the slow review.

Richard

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2020-07-08 14:30 UTC | newest]

Thread overview: 17+ messages
2020-06-08 14:10 Straight Line Speculation (SLS) mitigation Matthew Malcomson
2020-06-08 14:10 ` [Patch 1/3] aarch64: New Straight Line Speculation (SLS) mitigation flags Matthew Malcomson
2020-06-23 15:48   ` Richard Sandiford
2020-06-23 17:07     ` Matthew Malcomson
2020-06-23 17:12       ` Richard Sandiford
2020-07-03 13:27         ` Matthew Malcomson
2020-07-06 10:05           ` Richard Sandiford
2020-06-08 14:10 ` [Patch 2/3] aarch64: Introduce SLS mitigation for RET and BR instructions Matthew Malcomson
2020-06-23 16:17   ` Richard Sandiford
2020-06-23 16:49     ` Matthew Malcomson
2020-06-23 16:56       ` Richard Sandiford
2020-06-23 16:58         ` Matthew Malcomson
2020-07-03 13:33         ` Matthew Malcomson
2020-07-08 14:30           ` Richard Sandiford
2020-06-08 14:10 ` [Patch 3/3] aarch64: Mitigate SLS for BLR instruction Matthew Malcomson
2020-06-23 14:57   ` [Patch v2 " Matthew Malcomson
2020-06-23 16:31     ` Richard Sandiford
