public inbox for gcc@gcc.gnu.org
 help / color / mirror / Atom feed
* Spectre V1 diagnostic / mitigation
@ 2018-12-18 15:37 Richard Biener
  2018-12-18 16:17 ` Jeff Law
  2018-12-18 16:48 ` Richard Earnshaw (lists)
  0 siblings, 2 replies; 17+ messages in thread
From: Richard Biener @ 2018-12-18 15:37 UTC (permalink / raw)
  To: gcc


Hi,

in the past weeks I've been looking into prototyping both spectre V1 
(speculative array bound bypass) diagnostics and mitigation in an
architecture independent manner to assess feasability and some kind
of upper bound on the performance impact one can expect.
https://lists.llvm.org/pipermail/llvm-dev/2018-March/122085.html is
an interesting read in this context as well.

For simplicity I have implemented mitigation on GIMPLE right before
RTL expansion and have chosen TLS to do mitigation across function
boundaries.  Diagnostics sit in the same place but both are not in
any way dependent on each other.

The mitigation strategy chosen is that of tracking speculation
state via a mask that can be used to zero parts of the addresses
that leak the actual data.  That's similar to what aarch64 does
with -mtrack-speculation (but oddly there's no mitigation there).

I've optimized things to the point that is reasonable when working
target independent on GIMPLE but I've only looked at x86 assembly
and performance.  I expect any "final" mitigation if we choose to
implement and integrate such would be after RTL expansion since
RTL expansion can end up introducing quite some control flow whose
speculation state is not properly tracked by the prototype.

I'm cut&pasting single-runs of SPEC INT 2006/2017 here, the runs
were done with -O2 [-fspectre-v1={2,3}] where =2 is function-local
mitigation and =3 does mitigation global with passing the state
via TLS memory.

The following was measured on a Haswell desktop CPU:

	-O2 vs. -O2 -fspectre-v1=2

                                  Estimated                       Estimated
                Base     Base       Base        Peak     Peak       Peak
Benchmarks      Ref.   Run Time     Ratio       Ref.   Run Time     Ratio
-------------- ------  ---------  ---------    ------  ---------  ---------
400.perlbench    9770        245       39.8 *    9770        452       21.6 *  184%
401.bzip2        9650        378       25.5 *    9650        726       13.3 *  192%
403.gcc          8050        236       34.2 *    8050        352       22.8 *  149%
429.mcf          9120        223       40.9 *    9120        656       13.9 *  294%
445.gobmk       10490        400       26.2 *   10490        666       15.8 *  167%
456.hmmer        9330        388       24.1 *    9330        536       17.4 *  138%
458.sjeng       12100        437       27.7 *   12100        661       18.3 *  151%
462.libquantum  20720        300       69.1 *   20720        384       53.9 *  128%
464.h264ref     22130        451       49.1 *   22130        586       37.8 *  130%
471.omnetpp      6250        291       21.5 *    6250        398       15.7 *  137%
473.astar        7020        334       21.0 *    7020        522       13.5 *  156%
483.xalancbmk    6900        182       37.9 *    6900        306       22.6 *  168%
 Est. SPECint_base2006                   --
 Est. SPECint2006                                                        --

   -O2 -fspectre-v1=3

                                  Estimated                       Estimated
                Base     Base       Base        Peak     Peak       Peak
Benchmarks      Ref.   Run Time     Ratio       Ref.   Run Time     Ratio
-------------- ------  ---------  ---------    ------  ---------  ---------
400.perlbench                                    9770        497       19.6 *  203%
401.bzip2                                        9650        772       12.5 *  204%
403.gcc                                          8050        427       18.9 *  181%
429.mcf                                          9120        696       13.1 *  312%
445.gobmk                                       10490        726       14.4 *  181%
456.hmmer                                        9330        537       17.4 *  138%
458.sjeng                                       12100        721       16.8 *  165%
462.libquantum                                  20720        446       46.4 *  149%
464.h264ref                                     22130        613       36.1 *  136%
471.omnetpp                                      6250        471       13.3 *  162%
473.astar                                        7020        579       12.1 *  173%
483.xalancbmk                                    6900        350       19.7 *  192%
 Est. SPECint(R)_base2006           Not Run
 Est. SPECint2006                                                        --


While the following was measured on a Zen Epyc server:

-O2 vs -O2 -fspectre-v1=2

                       Estimated                       Estimated
                 Base     Base        Base        Peak     Peak        Peak
Benchmarks       Copies  Run Time     Rate        Copies  Run Time     Rate
--------------- -------  ---------  ---------    -------  ---------  ---------
500.perlbench_r       1        499       3.19  *       1        621       2.56  * 124%
502.gcc_r             1        286       4.95  *       1        392       3.61  * 137%
505.mcf_r             1        331       4.88  *       1        456       3.55  * 138%
520.omnetpp_r         1        454       2.89  *       1        563       2.33  * 124%
523.xalancbmk_r       1        328       3.22  *       1        569       1.86  * 173%
525.x264_r            1        518       3.38  *       1        776       2.26  * 150%
531.deepsjeng_r       1        365       3.14  *       1        448       2.56  * 123%
541.leela_r           1        598       2.77  *       1        729       2.27  * 122%
548.exchange2_r       1        460       5.69  *       1        756       3.46  * 164%
557.xz_r              1        403       2.68  *       1        586       1.84  * 145%
 Est. SPECrate2017_int_base              3.55
 Est. SPECrate2017_int_peak                                               2.56    72%

-O2 -fspectre-v2=3

                       Estimated                       Estimated
                 Base     Base        Base        Peak     Peak        Peak
Benchmarks       Copies  Run Time     Rate        Copies  Run Time     Rate
--------------- -------  ---------  ---------    -------  ---------  ---------
500.perlbench_r                               NR       1        700       2.27  * 140%
502.gcc_r                                     NR       1        485       2.92  * 170%
505.mcf_r                                     NR       1        596       2.71  * 180%
520.omnetpp_r                                 NR       1        604       2.17  * 133%
523.xalancbmk_r                               NR       1        643       1.64  * 196%
525.x264_r                                    NR       1        797       2.20  * 154%
531.deepsjeng_r                               NR       1        542       2.12  * 149%
541.leela_r                                   NR       1        872       1.90  * 146%
548.exchange2_r                               NR       1        761       3.44  * 165%
557.xz_r                                      NR       1        595       1.81  * 148%
 Est. SPECrate2017_int_base           Not Run
 Est. SPECrate2017_int_peak                                               2.26    64%



you can see, even thoug we're comparing apples and oranges, that the 
performance impact is quite dependent on the microarchitecture.

Similarly interesting as performance is the effect on text size which is
surprisingly high (_best_ case is 13 bytes per conditional branch plus 3
bytes per instrumented memory).

CPU2016:
   BASE  -O2
   text	   data	    bss	    dec	    hex	filename
1117726	  20928	  12704	1151358	 11917e	400.perlbench
  56568	   3800	   4416	  64784	   fd10	401.bzip2
3419568	   7912	 751520	4179000	 3fc438	403.gcc
  12212	    712	  11984	  24908	   614c	429.mcf
1460694	2081772	2330096	5872562	 599bb2	445.gobmk
 284929	   5956	  82040	 372925	  5b0bd	456.hmmer
 130782	   2152	2576896	2709830	 295946	458.sjeng
  41915	    764	     96	  42775	   a717	462.libquantum
 505452	  11220	 372320	 888992	  d90a0	464.h264ref
 638188	   9584	  14664	 662436	  a1ba4	471.omnetpp
  38859	    900	   5216	  44975	   afaf	473.astar
4033878	 140248	  12168	4186294	 3fe0b6	483.xalancbmk
   PEAK -O2 -fspectre-v1=2
   text	   data	    bss	    dec	    hex	filename
1508032	  20928	  12704	1541664	 178620	400.perlbench	135%
  76098	   3800	   4416	  84314	  1495a	401.bzip2	135%
4483530	   7912	 751520	5242962	 500052	403.gcc		131%
  16006	    712	  11984	  28702	   701e	429.mcf		131%
1647384	2081772	2330096	6059252	 5c74f4	445.gobmk	112%
 377259	   5956	  82040	 465255	  71967	456.hmmer	132%
 164672	   2152	2576896	2743720	 29dda8	458.sjeng	126%
  47901	    764	     96	  48761	   be79	462.libquantum	114%
 649854	  11220	 372320	1033394	  fc4b2	464.h264ref	129%
 706908	   9584	  14664	 731156	  b2814	471.omnetpp	111%
  48493	    900	   5216	  54609	   d551	473.astar	125%
4862056	 140248	  12168	5014472	 4c83c8	483.xalancbmk	121%
   PEAK -O2 -fspectre-v1=3
   text	   data	    bss	    dec	    hex	filename
1742008	  20936	  12704	1775648	 1b1820	400.perlbench	156%
  83338	   3808	   4416	  91562	  165aa	401.bzip2	147%
5219850	   7920	 751520	5979290	 5b3c9a	403.gcc		153%
  17422	    720	  11984	  30126	   75ae	429.mcf		143%
1801688	2081780	2330096	6213564	 5ecfbc	445.gobmk	123%
 431827	   5964	  82040	 519831	  7ee97	456.hmmer	152%
 182200	   2160	2576896	2761256	 2a2228	458.sjeng	139%
  53773	    772	     96	  54641	   d571	462.libquantum	128%
 691798	  11228	 372320	1075346	 106892	464.h264ref	137%
 976692	   9592	  14664	1000948	  f45f4	471.omnetpp	153%
  54525	    908	   5216	  60649	   ece9	473.astar	140%
5808306	 140256	  12168	5960730	 5af41a	483.xalancbmk	144%

CPU2017:
   BASE -O2 -g
   text    data     bss     dec     hex filename
2209713    8576    9080 2227369  21fca9 500.perlbench_r
9295702   37432 1150664 10483798 9ff856 502.gcc_r
  21795     712     744   23251    5ad3 505.mcf_r
2067560    8984   46888 2123432  2066a8 520.omnetpp_r
5763577  142584   20040 5926201  5a6d39 523.xalancbmk_r
 508402    6102   29592  544096   84d60 525.x264_r
  84222     784 12138360 12223366 ba8386 531.deepsjeng_r
 223480    8544   30072  262096   3ffd0 541.leela_r
  70554     864    6384   77802   12fea 548.exchange2_r
 180640     884   17704  199228   30a3c 557.xz_r
   PEAK -fspectre-v2=2
   text    data     bss     dec     hex filename
2991161    8576    9080 3008817  2de931 500.perlbench_r	135%
12244886  37432 1150664 13432982 ccf896 502.gcc_r	132%
  28475     712     744   29931    74eb 505.mcf_r	131%
2397026    8984   46888 2452898  256da2 520.omnetpp_r	116%
6846853  142584   20040 7009477  6af4c5 523.xalancbmk_r	119%
 645730    6102   29592  681424   a65d0 525.x264_r	127%
 111166     784 12138360 12250310 baecc6 531.deepsjeng_r 132%
 260835    8544   30072  299451   491bb 541.leela_r     117%
  96874     864    6384  104122   196ba 548.exchange2_r	137%
 215288     884   17704  233876   39194 557.xz_r	119%
   PEAK -fspectre-v2=3
   text    data     bss     dec     hex filename
3365945    8584    9080 3383609  33a139 500.perlbench_r	152%
14790638  37440 1150664 15978742 f3d0f6 502.gcc_r	159%
  31419     720     744   32883    8073 505.mcf_r	144%
2867893    8992   46888 2923773  2c9cfd 520.omnetpp_r	139%
8183689  142592   20040 8346321  7f5ad1 523.xalancbmk_r	142%
 697434    6110   29592  733136   b2fd0 525.x264_r	137%
 123638     792 12138360 12262790 bb1d86 531.deepsjeng_r 147%
 315347    8552   30072  353971   566b3 541.leela_r	141%
  98578     872    6384  105834   19d6a 548.exchange2_r	140%
 239144     892   17704  257740   3eecc 557.xz_r	133%


The patch relies heavily on RTL optimizations for DCE purposes.  At the
same time we rely on RTL not statically computing the mask (RTL has no
conditional constant propagation).  Full instrumentation of the classic
Spectre V1 testcase

char a[1024];
int b[1024];
int foo (int i, int bound)
{
  if (i < bound)
    return b[a[i]];
}

is the following:

foo:
.LFB0:  
        .cfi_startproc
        xorl    %eax, %eax
        cmpl    %esi, %edi
        setge   %al
        subq    $1, %rax
        jne     .L4
        ret
        .p2align 4,,10
        .p2align 3
.L4:
        andl    %eax, %edi
        movslq  %edi, %rdi
        movsbq  a(%rdi), %rax
        movl    b(,%rax,4), %eax
        ret

so the generated GIMPLE was "tuned" for reasonable x86 assembler outcome.

Patch below for reference (and your own testing in case you are curious).
I do not plan to pursue this further at this point.

Richard.

From 01e4a5a43e266065d32489daa50de0cf2425d5f5 Mon Sep 17 00:00:00 2001
From: Richard Guenther <rguenther@suse.de>
Date: Wed, 5 Dec 2018 13:17:02 +0100
Subject: [PATCH] warn-spectrev1


diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 7960cace16a..64d472d7fa0 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1334,6 +1334,7 @@ OBJS = \
 	gimple-ssa-sprintf.o \
 	gimple-ssa-warn-alloca.o \
 	gimple-ssa-warn-restrict.o \
+	gimple-ssa-spectrev1.o \
 	gimple-streamer-in.o \
 	gimple-streamer-out.o \
 	gimple-walk.o \
diff --git a/gcc/common.opt b/gcc/common.opt
index 45d7f6189e5..1ae7fcfe177 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -702,6 +702,10 @@ Warn when one local variable shadows another local variable or parameter of comp
 Wshadow-compatible-local
 Common Warning Undocumented Alias(Wshadow=compatible-local)
 
+Wspectre-v1
+Common Var(warn_spectrev1) Warning
+Warn about code susceptible to spectre v1 style attacks.
+
 Wstack-protector
 Common Var(warn_stack_protect) Warning
 Warn when not issuing stack smashing protection for some reason.
@@ -2406,6 +2410,14 @@ fsingle-precision-constant
 Common Report Var(flag_single_precision_constant) Optimization
 Convert floating point constants to single precision constants.
 
+fspectre-v1
+Common Alias(fspectre-v1=, 2, 0)
+Insert code to mitigate spectre v1 style attacks.
+
+fspectre-v1=
+Common Report RejectNegative Joined UInteger IntegerRange(0, 3) Var(flag_spectrev1) Optimization
+Insert code to mitigate spectre v1 style attacks.
+
 fsplit-ivs-in-unroller
 Common Report Var(flag_split_ivs_in_unroller) Init(1) Optimization
 Split lifetimes of induction variables when loops are unrolled.
diff --git a/gcc/gimple-ssa-spectrev1.cc b/gcc/gimple-ssa-spectrev1.cc
new file mode 100644
index 00000000000..c2a5dc95324
--- /dev/null
+++ b/gcc/gimple-ssa-spectrev1.cc
@@ -0,0 +1,824 @@
+/* Loop interchange.
+   Copyright (C) 2017-2018 Free Software Foundation, Inc.
+   Contributed by ARM Ltd.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT
+ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "backend.h"
+#include "is-a.h"
+#include "tree.h"
+#include "gimple.h"
+#include "tree-pass.h"
+#include "ssa.h"
+#include "gimple-pretty-print.h"
+#include "gimple-iterator.h"
+#include "params.h"
+#include "tree-ssa.h"
+#include "cfganal.h"
+#include "gimple-walk.h"
+#include "tree-ssa-loop.h"
+#include "tree-dfa.h"
+#include "tree-cfg.h"
+#include "fold-const.h"
+#include "builtins.h"
+#include "alias.h"
+#include "cfgloop.h"
+#include "varasm.h"
+#include "cgraph.h"
+#include "gimple-fold.h"
+#include "diagnostic.h"
+
+/* The Spectre V1 situation is as follows:
+
+      if (attacker_controlled_idx < bound)  // speculated as true but is false
+        {
+	  // out-of-bound access, returns value interesting to attacker
+	  val = mem[attacker_controlled_idx];
+	  // access that causes a cache-line to be brought in - canary
+	  ... = attacker_controlled_mem[val];
+	}
+
+   The last load provides the side-channel.  The pattern can be split
+   into multiple functions or translation units.  Conservatively we'd
+   have to warn about
+
+      int foo (int *a) {  return *a; }
+
+   thus any indirect (or indexed) memory access.  That's obvioulsy
+   not useful.
+
+   The next level would be to warn only when we see load of val as
+   well.  That then misses cases like
+
+      int foo (int *a, int *b)
+      {
+        int idx = load_it (a);
+	return load_it (&b[idx]);
+      }
+
+   Still we'd warn about cases like
+
+      struct Foo { int *a; };
+      int foo (struct Foo *a) { return *a->a; }
+
+   though dereferencing VAL isn't really an interesting case.  It's
+   hard to exclude this conservatively so the obvious solution is
+   to restrict the kind of loads that produce val, for example based
+   on its type or its number of bits.  It's tempting to do this at
+   the point of the load producing val but in the end what matters
+   is the number of bits that reach the second loads [as index] given
+   there are practical limits on the size of the canary.  For this
+   we have to consider
+
+      int foo (struct Foo *a, int *b)
+      {
+        int *c = a->a;
+	int idx = *b;
+	return *(c + idx);
+      }
+
+   where idx has too many bits to be an interesting attack vector(?).
+ */
+
+/* The pass does two things, first it performs data flow analysis
+   to be able to warn about the second load.  This is controlled
+   via -Wspectre-v1.
+
+   Second it instruments control flow in the program to track a
+   mask which is all-ones but all-zeroes if the CPU speculated
+   a branch in the wrong direction.  This mask is then used to
+   mask the address[-part(s)] of loads with non-invariant addresses,
+   effectively mitigating the attack.  This is controlled by
+   -fpectre-v1[=N] where N is default 2 and
+     1  optimistically omit some instrumentations (currently
+        backedge control flow instructions do not update the
+	speculation mask)
+     2  instrument conservatively using a function-local speculation
+        mask
+     3  instrument conservatively using a global (TLS) speculation
+        mask.  This adds TLS loads/stores of the speculation mask
+	at function boundaries and before and after calls.
+ */
+
+/* We annotate statements whose defs cannot be used to leaking data
+   speculatively via loads with SV1_SAFE.  This is used to optimize
+   masking of indices where masked indices (and derived by constant
+   ones) are not masked again.  Note this works only up to the points
+   that possibly change the speculation mask value.  */
+#define SV1_SAFE GF_PLF_1
+
+namespace {
+
+const pass_data pass_data_spectrev1 =
+{
+  GIMPLE_PASS, /* type */
+  "spectrev1", /* name */
+  OPTGROUP_NONE, /* optinfo_flags */
+  TV_NONE, /* tv_id */
+  PROP_cfg|PROP_ssa, /* properties_required */
+  0, /* properties_provided */
+  0, /* properties_destroyed */
+  0, /* todo_flags_start */
+  TODO_update_ssa, /* todo_flags_finish */
+};
+
+class pass_spectrev1 : public gimple_opt_pass
+{
+public:
+  pass_spectrev1 (gcc::context *ctxt)
+    : gimple_opt_pass (pass_data_spectrev1, ctxt)
+  {}
+
+  /* opt_pass methods: */
+  opt_pass * clone () { return new pass_spectrev1 (m_ctxt); }
+  virtual bool gate (function *) { return warn_spectrev1 || flag_spectrev1; }
+  virtual unsigned int execute (function *);
+
+  static bool stmt_is_indexed_load (gimple *);
+  static bool stmt_mangles_index (gimple *, tree);
+  static bool find_value_dependent_guard (gimple *, tree);
+  static void mark_influencing_outgoing_flow (basic_block, tree);
+  static tree instrument_mem (gimple_stmt_iterator *, tree, tree);
+}; // class pass_spectrev1
+
+bitmap_head *influencing_outgoing_flow;
+
+static bool
+call_between (gimple *first, gimple *second)
+{
+  gcc_assert (gimple_bb (first) == gimple_bb (second));
+  /* ???  This is inefficient.  Maybe we can use gimple_uid to assign
+     unique IDs to stmts belonging to groups with the same speculation
+     mask state.  */
+  for (gimple_stmt_iterator gsi = gsi_for_stmt (first);
+       gsi_stmt (gsi) != second; gsi_next (&gsi))
+    if (is_gimple_call (gsi_stmt (gsi)))
+      return true;
+  return false;
+}
+
+basic_block ctx_bb;
+gimple *ctx_stmt;
+static bool
+gather_indexes (tree, tree *idx, void *data)
+{
+  vec<tree *> *indexes = (vec<tree *> *)data;
+  if (TREE_CODE (*idx) != SSA_NAME)
+    return true;
+  if (!SSA_NAME_IS_DEFAULT_DEF (*idx)
+      && gimple_bb (SSA_NAME_DEF_STMT (*idx)) == ctx_bb
+      && gimple_plf (SSA_NAME_DEF_STMT (*idx), SV1_SAFE)
+      && (flag_spectrev1 < 3
+	  || !call_between (SSA_NAME_DEF_STMT (*idx), ctx_stmt)))
+    return true;
+  if (indexes->is_empty ())
+    indexes->safe_push (idx);
+  else if (*(*indexes)[0] == *idx)
+    indexes->safe_push (idx);
+  else
+    return false;
+  return true;
+}
+
+tree
+pass_spectrev1::instrument_mem (gimple_stmt_iterator *gsi, tree mem, tree mask)
+{
+  /* First try to see if we can find a single index we can zero which
+     has the chance of repeating in other loads and also avoids separate
+     LEA and memory references decreasing code size and AGU occupancy.  */
+  auto_vec<tree *, 8> indexes;
+  ctx_bb = gsi_bb (*gsi);
+  ctx_stmt = gsi_stmt (*gsi);
+  if (PARAM_VALUE (PARAM_SPECTRE_V1_MAX_INSTRUMENT_INDICES) > 0
+      && for_each_index (&mem, gather_indexes, (void *)&indexes))
+    {
+      /* All indices are safe.  */
+      if (indexes.is_empty ())
+	return mem;
+      if (TYPE_PRECISION (TREE_TYPE (*indexes[0]))
+	  <= TYPE_PRECISION (TREE_TYPE (mask)))
+	{
+	  tree idx = *indexes[0];
+	  gcc_assert (INTEGRAL_TYPE_P (TREE_TYPE (idx))
+		      || POINTER_TYPE_P (TREE_TYPE (idx)));
+	  /* Instead of instrumenting IDX directly we could look at
+	     definitions with a single SSA use and instrument that
+	     instead.  But we have to do some work to make SV1_SAFE
+	     propagation updated then - this would really ask to first
+	     gather all indexes of all refs we want to instrument and
+	     compute some optimal set of instrumentations.  */
+	  gimple_seq seq = NULL;
+	  tree idx_mask = gimple_convert (&seq, TREE_TYPE (idx), mask);
+	  tree masked_idx = gimple_build (&seq, BIT_AND_EXPR,
+					  TREE_TYPE (idx), idx, idx_mask);
+	  /* Mark the instrumentation sequence as visited.  */
+	  for (gimple_stmt_iterator si = gsi_start (seq);
+	       !gsi_end_p (si); gsi_next (&si))
+	    gimple_set_visited (gsi_stmt (si), true);
+	  gsi_insert_seq_before (gsi, seq, GSI_SAME_STMT);
+	  gimple_set_plf (SSA_NAME_DEF_STMT (masked_idx), SV1_SAFE, true);
+	  /* Replace downstream users in the BB which reduces register pressure
+	     and allows SV1_SAFE propagation to work (which stops at call/BB
+	     boundaries though).
+	     ???  This is really reg-pressure vs. dependence chains so not
+	     a generally easy thing.  Making the following propagate into
+	     all uses dominated by the insert slows down 429.mcf even more.
+	     ???  We can actually track SV1_SAFE across PHIs but then we
+	     have to propagate into PHIs here.  */
+	  gimple *use_stmt;
+	  use_operand_p use_p;
+	  imm_use_iterator iter;
+	  FOR_EACH_IMM_USE_STMT (use_stmt, iter, idx)
+	    if (gimple_bb (use_stmt) == gsi_bb (*gsi)
+		&& gimple_code (use_stmt) != GIMPLE_PHI
+		&& !gimple_visited_p (use_stmt))
+	      {
+		FOR_EACH_IMM_USE_ON_STMT (use_p, iter)
+		  SET_USE (use_p, masked_idx);
+		update_stmt (use_stmt);
+	      }
+	  /* Modify MEM in place...  (our stmt is already marked visited).  */
+	  for (unsigned i = 0; i < indexes.length (); ++i)
+	    *indexes[i] = masked_idx;
+	  return mem;
+	}
+    }
+
+  /* ???  Can we handle TYPE_REVERSE_STORAGE_ORDER at all?  Need to
+     handle BIT_FIELD_REFs.  */
+
+  /* Strip a bitfield reference to re-apply it at the end.  */
+  tree bitfield = NULL_TREE;
+  tree bitfield_off = NULL_TREE;
+  if (TREE_CODE (mem) == COMPONENT_REF
+      && DECL_BIT_FIELD (TREE_OPERAND (mem, 1)))
+    {
+      bitfield = TREE_OPERAND (mem, 1);
+      bitfield_off = TREE_OPERAND (mem, 2);
+      mem = TREE_OPERAND (mem, 0);
+    }
+
+  tree ptr_base = mem;
+  /* VIEW_CONVERT_EXPRs do not change offset, strip them, they get folded
+     into the MEM_REF we create.  */
+  while (TREE_CODE (ptr_base) == VIEW_CONVERT_EXPR)
+    ptr_base = TREE_OPERAND (ptr_base, 0);
+
+  tree ptr = make_ssa_name (ptr_type_node);
+  gimple *new_stmt = gimple_build_assign (ptr, build_fold_addr_expr (ptr_base));
+  gimple_set_visited (new_stmt, true);
+  gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT);
+  ptr = make_ssa_name (ptr_type_node);
+  new_stmt = gimple_build_assign (ptr, BIT_AND_EXPR,
+				  gimple_assign_lhs (new_stmt), mask);
+  gimple_set_visited (new_stmt, true);
+  gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT);
+  tree type = TREE_TYPE (mem);
+  unsigned align = get_object_alignment (mem);
+  if (align != TYPE_ALIGN (type))
+    type = build_aligned_type (type, align);
+
+  tree new_mem = build2 (MEM_REF, type, ptr,
+			 build_int_cst (reference_alias_ptr_type (mem), 0));
+  if (bitfield)
+    new_mem = build3 (COMPONENT_REF, TREE_TYPE (bitfield), new_mem,
+		      bitfield, bitfield_off);
+  return new_mem;
+}
+
+bool
+check_spectrev1_2nd_load (tree, tree *idx, void *data)
+{
+  sbitmap value_from_indexed_load = (sbitmap)data;
+  if (TREE_CODE (*idx) == SSA_NAME
+      && bitmap_bit_p (value_from_indexed_load, SSA_NAME_VERSION (*idx)))
+    return false;
+  return true;
+}
+
+bool
+check_spectrev1_2nd_load (gimple *, tree, tree ref, void *data)
+{
+  return !for_each_index (&ref, check_spectrev1_2nd_load, data);
+}
+
+void
+pass_spectrev1::mark_influencing_outgoing_flow (basic_block bb, tree op)
+{
+  if (!bitmap_set_bit (&influencing_outgoing_flow[SSA_NAME_VERSION (op)],
+		       bb->index))
+    return;
+
+  /* Note we are deliberately non-conservatively stop at call and
+     memory boundaries here expecting earlier optimization to expose
+     value dependences via SSA chains.  */
+  gimple *def_stmt = SSA_NAME_DEF_STMT (op);
+  if (gimple_vuse (def_stmt)
+      || !is_gimple_assign (def_stmt))
+    return;
+
+  ssa_op_iter i;
+  FOR_EACH_SSA_TREE_OPERAND (op, def_stmt, i, SSA_OP_USE)
+    mark_influencing_outgoing_flow (bb, op);
+}
+
+bool
+pass_spectrev1::find_value_dependent_guard (gimple *stmt, tree op)
+{
+  bitmap_iterator bi;
+  unsigned i;
+  EXECUTE_IF_SET_IN_BITMAP (&influencing_outgoing_flow[SSA_NAME_VERSION (op)],
+			    0, i, bi)
+    /* ???  If control-dependent on.
+       ???  Make bits in influencing_outgoing_flow the index of the BB
+       in RPO order so we could walk bits from STMT "upwards" finding
+       the nearest one.  */
+    if (dominated_by_p (CDI_DOMINATORS,
+			gimple_bb (stmt), BASIC_BLOCK_FOR_FN (cfun, i)))
+      {
+	if (dump_enabled_p ())
+	  dump_printf_loc (MSG_NOTE, stmt, "Condition %G in block %d "
+			   "is related to indexes used in %G\n",
+			   last_stmt (BASIC_BLOCK_FOR_FN (cfun, i)),
+			   i, stmt);
+	return true;
+      }
+
+  /* Note we are deliberately non-conservatively stop at call and
+     memory boundaries here expecting earlier optimization to expose
+     value dependences via SSA chains.  */
+  gimple *def_stmt = SSA_NAME_DEF_STMT (op);
+  if (gimple_vuse (def_stmt)
+      || !is_gimple_assign (def_stmt))
+    return false;
+
+  ssa_op_iter it;
+  FOR_EACH_SSA_TREE_OPERAND (op, def_stmt, it, SSA_OP_USE)
+    if (find_value_dependent_guard (stmt, op))
+      /* Others may be "nearer".  */
+      return true;
+
+  return false;
+}
+
+bool
+pass_spectrev1::stmt_is_indexed_load (gimple *stmt)
+{
+  /* Given we ignore the function boundary for incoming parameters
+     let's ignore return values of calls as well for the purpose
+     of being the first indexed load (also ignore inline-asms).  */
+  if (!gimple_assign_load_p (stmt))
+    return false;
+
+  /* Exclude esp. pointers from the index load itself (but also floats,
+     vectors, etc. - quite a bit handwaving here).  */
+  if (!INTEGRAL_TYPE_P (TREE_TYPE (gimple_assign_lhs (stmt))))
+    return false;
+
+  /* If we do not have any SSA uses the load cannot be one indexed
+     by an attacker controlled value.  */
+  if (zero_ssa_operands (stmt, SSA_OP_USE))
+    return false;
+
+  return true;
+}
+
+/* Return true whether the index in the use operand OP in STMT is
+   not transfered to STMTs defs.  */
+
+bool
+pass_spectrev1::stmt_mangles_index (gimple *stmt, tree op)
+{
+  if (gimple_assign_load_p (stmt))
+    return true;
+  if (gassign *ass = dyn_cast <gassign *> (stmt))
+    {
+      enum tree_code code = gimple_assign_rhs_code (ass);
+      switch (code)
+	{
+	case TRUNC_DIV_EXPR:
+	case CEIL_DIV_EXPR:
+	case FLOOR_DIV_EXPR:
+	case ROUND_DIV_EXPR:
+	case EXACT_DIV_EXPR:
+	case RDIV_EXPR:
+	case TRUNC_MOD_EXPR:
+	case CEIL_MOD_EXPR:
+	case FLOOR_MOD_EXPR:
+	case ROUND_MOD_EXPR:
+	case LSHIFT_EXPR:
+	case RSHIFT_EXPR:
+	case LROTATE_EXPR:
+	case RROTATE_EXPR:
+	  /* Division, modulus or shifts by the index do not produce
+	     something useful for the attacker.  */
+	  if (gimple_assign_rhs2 (ass) == op)
+	    return true;
+	  break;
+	default:;
+	  /* Comparisons do not produce an index value.  */
+	  if (TREE_CODE_CLASS (code) == tcc_comparison)
+	    return true;
+	}
+    }
+  /* ???  We could handle builtins here.  */
+  return false;
+}
+
+static GTY(()) tree spectrev1_tls_mask_decl;
+
+/* Main entry for spectrev1 pass.  */
+
+unsigned int
+pass_spectrev1::execute (function *fn)
+{
+  calculate_dominance_info (CDI_DOMINATORS);
+  loop_optimizer_init (AVOID_CFG_MODIFICATIONS);
+
+  int *rpo = XNEWVEC (int, n_basic_blocks_for_fn (cfun));
+  int rpo_num = pre_and_rev_post_order_compute_fn (fn, NULL, rpo, false);
+
+  /* We track for each SSA name whether its value (may) depend(s) on
+     the result of an indexed load.
+     A set of operation will kill a value (enough).  */
+  auto_sbitmap value_from_indexed_load (num_ssa_names);
+  bitmap_clear (value_from_indexed_load);
+
+  unsigned orig_num_ssa_names = num_ssa_names;
+  influencing_outgoing_flow = XCNEWVEC (bitmap_head, num_ssa_names);
+  for (unsigned i = 1; i < num_ssa_names; ++i)
+    bitmap_initialize (&influencing_outgoing_flow[i], &bitmap_default_obstack);
+
+
+  /* Diagnosis.  */
+
+  /* Function arguments are not indexed loads unless we want to
+     be conservative to a level no longer useful.  */
+
+  for (int i = 0; i < rpo_num; ++i)
+    {
+      basic_block bb = BASIC_BLOCK_FOR_FN (fn, rpo[i]);
+
+      for (gphi_iterator gpi = gsi_start_phis (bb);
+	   !gsi_end_p (gpi); gsi_next (&gpi))
+	{
+	  gphi *phi = gpi.phi ();
+	  bool value_from_indexed_load_p = false;
+	  use_operand_p arg_p;
+	  ssa_op_iter it;
+	  FOR_EACH_PHI_ARG (arg_p, phi, it, SSA_OP_USE)
+	    {
+	      tree arg = USE_FROM_PTR (arg_p);
+	      if (TREE_CODE (arg) == SSA_NAME
+		  && bitmap_bit_p (value_from_indexed_load,
+				   SSA_NAME_VERSION (arg)))
+		value_from_indexed_load_p = true;
+	    }
+	  if (value_from_indexed_load_p)
+	    bitmap_set_bit (value_from_indexed_load,
+			    SSA_NAME_VERSION (PHI_RESULT (phi)));
+	}
+
+      for (gimple_stmt_iterator gsi = gsi_start_bb (bb);
+	   !gsi_end_p (gsi); gsi_next (&gsi))
+	{
+	  gimple *stmt = gsi_stmt (gsi);
+	  if (is_gimple_debug (stmt))
+	    continue;
+
+	  if (walk_stmt_load_store_ops (stmt, value_from_indexed_load,
+					check_spectrev1_2nd_load,
+					check_spectrev1_2nd_load))
+	    warning_at (gimple_location (stmt), OPT_Wspectre_v1, "%Gspectrev1",
+			stmt);
+
+	  bool value_from_indexed_load_p = false;
+	  if (stmt_is_indexed_load (stmt))
+	    {
+	      /* We are interested in indexes to later loads so ultimatively
+		 register values that all happen to separate SSA defs.
+		 Interesting aggregates will be decomposed by later loads
+		 which we then mark as producing an index.  Simply mark
+		 all SSA defs as coming from an indexed load.  */
+	      /* We are handling a single load in STMT right now.  */
+	      ssa_op_iter it;
+	      tree op;
+	      FOR_EACH_SSA_TREE_OPERAND (op, stmt, it, SSA_OP_USE)
+	        if (find_value_dependent_guard (stmt, op))
+		  {
+		    /* ???  Somehow record the dependence to point to it in
+		       diagnostics.  */
+		    value_from_indexed_load_p = true;
+		    break;
+		  }
+	    }
+
+	  tree op;
+	  ssa_op_iter it;
+	  FOR_EACH_SSA_TREE_OPERAND (op, stmt, it, SSA_OP_USE)
+	    if (bitmap_bit_p (value_from_indexed_load,
+			      SSA_NAME_VERSION (op))
+		&& !stmt_mangles_index (stmt, op))
+	      {
+		value_from_indexed_load_p = true;
+		break;
+	      }
+
+	  if (value_from_indexed_load_p)
+	    FOR_EACH_SSA_TREE_OPERAND (op, stmt, it, SSA_OP_DEF)
+	      /* ???  We could cut off single-bit values from the chain
+	         here or pretain that float loads will be never turned
+		 into integer indices, etc.  */
+	      bitmap_set_bit (value_from_indexed_load,
+			      SSA_NAME_VERSION (op));
+	}
+
+      if (EDGE_COUNT (bb->succs) > 1)
+	{
+	  gcond *stmt = safe_dyn_cast <gcond *> (last_stmt (bb));
+	  /* ???  What about switches?  What about badly speculated EH?  */
+	  if (!stmt)
+	    continue;
+	  /* We could constrain conditions here to those more likely
+	     being "bounds checks".  For example common guards for
+	     indirect accesses are NULL pointer checks.
+	     ???  This isn't fully safe, but it drops the number of
+	     spectre warnings for dwarf2out.i from cc1files from 70 to 16.  */
+	  if ((gimple_cond_code (stmt) == EQ_EXPR
+	       || gimple_cond_code (stmt) == NE_EXPR)
+	      && integer_zerop (gimple_cond_rhs (stmt))
+	      && POINTER_TYPE_P (TREE_TYPE (gimple_cond_lhs (stmt))))
+	    ;
+	  else
+	    {
+	      ssa_op_iter it;
+	      tree op;
+	      FOR_EACH_SSA_TREE_OPERAND (op, stmt, it, SSA_OP_USE)
+		mark_influencing_outgoing_flow (bb, op);
+	    }
+	}
+    }
+
+  for (unsigned i = 1; i < orig_num_ssa_names; ++i)
+    bitmap_release (&influencing_outgoing_flow[i]);
+  XDELETEVEC (influencing_outgoing_flow);
+
+
+
+  /* Instrumentation.  */
+  if (!flag_spectrev1)
+    return 0;
+
+  /* Create the default all-ones mask.  When doing IPA instrumentation
+     this should initialize the mask from TLS memory and outgoing edges
+     need to save the mask to TLS memory.  */
+  gimple *new_stmt;
+  if (!spectrev1_tls_mask_decl
+      && flag_spectrev1 >= 3)
+    {
+      /* Use a smaller variable in case sign-extending loads are
+	 available?  */
+      spectrev1_tls_mask_decl
+	  = build_decl (BUILTINS_LOCATION,
+			VAR_DECL, NULL_TREE, ptr_type_node);
+      TREE_STATIC (spectrev1_tls_mask_decl) = 1;
+      TREE_PUBLIC (spectrev1_tls_mask_decl) = 1;
+      DECL_VISIBILITY (spectrev1_tls_mask_decl) = VISIBILITY_HIDDEN;
+      DECL_VISIBILITY_SPECIFIED (spectrev1_tls_mask_decl) = 1;
+      DECL_INITIAL (spectrev1_tls_mask_decl)
+	  = build_all_ones_cst (ptr_type_node);
+      DECL_NAME (spectrev1_tls_mask_decl) = get_identifier ("__SV1MSK");
+      DECL_ARTIFICIAL (spectrev1_tls_mask_decl) = 1;
+      DECL_IGNORED_P (spectrev1_tls_mask_decl) = 1;
+      varpool_node::finalize_decl (spectrev1_tls_mask_decl);
+      make_decl_one_only (spectrev1_tls_mask_decl,
+			  DECL_ASSEMBLER_NAME (spectrev1_tls_mask_decl));
+      set_decl_tls_model (spectrev1_tls_mask_decl,
+			  decl_default_tls_model (spectrev1_tls_mask_decl));
+    }
+
+  /* We let the SSA rewriter cope with rewriting mask into SSA and
+     inserting PHI nodes.  */
+  tree mask = create_tmp_reg (ptr_type_node, "spectre_v1_mask");
+  new_stmt = gimple_build_assign (mask,
+				  flag_spectrev1 >= 3
+				  ? spectrev1_tls_mask_decl
+				  : build_all_ones_cst (ptr_type_node));
+  gimple_stmt_iterator gsi
+      = gsi_after_labels (single_succ (ENTRY_BLOCK_PTR_FOR_FN (fn)));
+  gsi_insert_before (&gsi, new_stmt, GSI_CONTINUE_LINKING);
+
+  /* We are using the visited flag to track stmts downstream in a BB.  */
+  for (int i = 0; i < rpo_num; ++i)
+    {
+      basic_block bb = BASIC_BLOCK_FOR_FN (fn, rpo[i]);
+      for (gphi_iterator gpi = gsi_start_phis (bb);
+	   !gsi_end_p (gpi); gsi_next (&gpi))
+	gimple_set_visited (gpi.phi (), false);
+      for (gimple_stmt_iterator gsi = gsi_start_bb (bb);
+	   !gsi_end_p (gsi); gsi_next (&gsi))
+	gimple_set_visited (gsi_stmt (gsi), false);
+    }
+
+  for (int i = 0; i < rpo_num; ++i)
+    {
+      basic_block bb = BASIC_BLOCK_FOR_FN (fn, rpo[i]);
+
+      for (gphi_iterator gpi = gsi_start_phis (bb);
+	   !gsi_end_p (gpi); gsi_next (&gpi))
+	{
+	  gphi *phi = gpi.phi ();
+	  /* ???  We can merge SAFE state across BB boundaries in
+	     some cases, like when edges are not critical and the
+	     state was made SAFE in the tail of the predecessors
+	     and not invalidated by calls.   */
+	  gimple_set_plf (phi, SV1_SAFE, false);
+	}
+
+      bool instrumented_call_p = false;
+      for (gimple_stmt_iterator gsi = gsi_start_bb (bb);
+	   !gsi_end_p (gsi); gsi_next (&gsi))
+	{
+	  gimple *stmt = gsi_stmt (gsi);
+	  gimple_set_visited (stmt, true);
+	  if (is_gimple_debug (stmt))
+	    continue;
+
+	  tree op;
+	  ssa_op_iter it;
+	  bool safe = is_gimple_assign (stmt);
+	  if (safe)
+	    FOR_EACH_SSA_TREE_OPERAND (op, stmt, it, SSA_OP_USE)
+	      {
+		if (safe
+		    && (SSA_NAME_IS_DEFAULT_DEF (op)
+			|| !gimple_plf (SSA_NAME_DEF_STMT (op), SV1_SAFE)
+			/* Once mask can have changed we cannot further
+			   propagate safe state.  */
+			|| gimple_bb (SSA_NAME_DEF_STMT (op)) != bb
+			/* That includes calls if we have instrumented one
+			   in this block.  */
+			|| (instrumented_call_p
+			    && call_between (SSA_NAME_DEF_STMT (op), stmt))))
+		  {
+		    safe = false;
+		    break;
+		  }
+	      }
+	  gimple_set_plf (stmt, SV1_SAFE, safe);
+
+	  /* Instrument bounded loads.
+	     We instrument non-aggregate loads with non-invariant address.
+	     The idea is to reliably instrument the bounded load while
+	     leaving the canary, being it load or store, aggregate or
+	     non-aggregate, alone.  */
+	  if (gimple_assign_single_p (stmt)
+	      && gimple_vuse (stmt)
+	      && !gimple_vdef (stmt)
+	      && !zero_ssa_operands (stmt, SSA_OP_USE))
+	    {
+	      tree new_mem = instrument_mem (&gsi, gimple_assign_rhs1 (stmt),
+					     mask);
+	      gimple_assign_set_rhs1 (stmt, new_mem);
+	      update_stmt (stmt);
+	      /* The value loaded my a masked load is "safe".  */
+	      gimple_set_plf (stmt, SV1_SAFE, true);
+	    }
+
+	  /* Instrument return store to TLS mask.  */
+	  if (flag_spectrev1 >= 3
+	      && gimple_code (stmt) == GIMPLE_RETURN)
+	    {
+	      new_stmt = gimple_build_assign (spectrev1_tls_mask_decl, mask);
+	      gsi_insert_before (&gsi, new_stmt, GSI_SAME_STMT);
+	    }
+	  /* Instrument calls with store/load to/from TLS mask.
+	     ???  Placement of the stores/loads can be optimized in a LCM
+	     way.  */
+	  else if (flag_spectrev1 >= 3
+		   && is_gimple_call (stmt)
+		   && gimple_vuse (stmt))
+	    {
+	      new_stmt = gimple_build_assign (spectrev1_tls_mask_decl, mask);
+	      gsi_insert_before (&gsi, new_stmt, GSI_SAME_STMT);
+	      if (!stmt_ends_bb_p (stmt))
+		{
+		  new_stmt = gimple_build_assign (mask,
+						  spectrev1_tls_mask_decl);
+		  gsi_insert_after (&gsi, new_stmt, GSI_NEW_STMT);
+		}
+	      else
+		{
+		  edge_iterator ei;
+		  edge e;
+		  FOR_EACH_EDGE (e, ei, bb->succs)
+		    {
+		      if (e->flags & EDGE_ABNORMAL)
+			continue;
+		      new_stmt = gimple_build_assign (mask,
+						      spectrev1_tls_mask_decl);
+		      gsi_insert_on_edge (e, new_stmt);
+		    }
+		}
+	      instrumented_call_p = true;
+	    }
+	}
+
+      if (EDGE_COUNT (bb->succs) > 1)
+	{
+	  gcond *stmt = safe_dyn_cast <gcond *> (last_stmt (bb));
+	  /* ???  What about switches?  What about badly speculated EH?  */
+	  if (!stmt)
+	    continue;
+
+	  /* Instrument conditional branches to track mis-speculation
+	     via a pointer-sized mask.
+	     ???  We could restrict to instrumenting those conditions
+	     that control interesting loads or apply simple heuristics
+	     like not instrumenting FP compares or equality compares
+	     which are unlikely bounds checks.  But we have to instrument
+	     bool != 0 because multiple conditions might have been
+	     combined.  */
+	  edge truee, falsee;
+	  extract_true_false_edges_from_block (bb, &truee, &falsee);
+	  /* Unless -fspectre-v1=2 we do not instrument loop exit tests.  */
+	  if (flag_spectrev1 >= 2
+	      || !loop_exits_from_bb_p (bb->loop_father, bb))
+	    {
+	      gimple_stmt_iterator gsi = gsi_last_bb (bb);
+
+	      /* Instrument
+	           if (a_1 > b_2)
+		 as
+	           tem_mask_3 = a_1 > b_2 ? -1 : 0;
+		   if (tem_mask_3 != 0)
+		 this will result in a
+		   xor %eax, %eax; cmp|test; setCC %al; sub $0x1, %eax; jne
+		 sequence which is faster in practice than when retaining
+		 the original jump condition.  This is 10 bytes overhead
+		 on x86_64 plus 3 bytes for an and on the true path and
+		 5 bytes for an and and not on the false path.  */
+	      tree tem_mask = make_ssa_name (ptr_type_node);
+	      new_stmt = gimple_build_assign (tem_mask, COND_EXPR,
+					      build2 (gimple_cond_code (stmt),
+						      boolean_type_node,
+						      gimple_cond_lhs (stmt),
+						      gimple_cond_rhs (stmt)),
+					      build_all_ones_cst (ptr_type_node),
+					      build_zero_cst (ptr_type_node));
+	      gsi_insert_before (&gsi, new_stmt, GSI_SAME_STMT);
+	      gimple_cond_set_code (stmt, NE_EXPR);
+	      gimple_cond_set_lhs (stmt, tem_mask);
+	      gimple_cond_set_rhs (stmt, build_zero_cst (ptr_type_node));
+	      update_stmt (stmt);
+
+	      /* On the false edge
+	           mask = mask & ~tem_mask_3;  */
+	      gimple_seq tems = NULL;
+	      tree tem_mask2 = make_ssa_name (ptr_type_node);
+	      new_stmt = gimple_build_assign (tem_mask2, BIT_NOT_EXPR,
+					      tem_mask);
+	      gimple_seq_add_stmt_without_update (&tems, new_stmt);
+	      new_stmt = gimple_build_assign (mask, BIT_AND_EXPR,
+					      mask, tem_mask2);
+	      gimple_seq_add_stmt_without_update (&tems, new_stmt);
+	      gsi_insert_seq_on_edge (falsee, tems);
+
+	      /* On the true edge
+	           mask = mask & tem_mask_3;  */
+	      new_stmt = gimple_build_assign (mask, BIT_AND_EXPR,
+					      mask, tem_mask);
+	      gsi_insert_on_edge (truee, new_stmt);
+	    }
+	}
+    }
+
+  gsi_commit_edge_inserts ();
+
+  return 0;
+}
+
+} // anon namespace
+
+gimple_opt_pass *
+make_pass_spectrev1 (gcc::context *ctxt)
+{
+  return new pass_spectrev1 (ctxt);
+}
diff --git a/gcc/params.def b/gcc/params.def
index 6f98fccd291..19f7dbf4dad 100644
--- a/gcc/params.def
+++ b/gcc/params.def
@@ -1378,6 +1378,11 @@ DEFPARAM(PARAM_LOOP_VERSIONING_MAX_OUTER_INSNS,
 	 " loops.",
 	 100, 0, 0)
 
+DEFPARAM(PARAM_SPECTRE_V1_MAX_INSTRUMENT_INDICES,
+	 "spectre-v1-max-instrument-indices",
+	 "Maximum number of indices to instrument before instrumenting the whole address.",
+	 1, 0, 0)
+
 /*
 
 Local variables:
diff --git a/gcc/passes.def b/gcc/passes.def
index 144df4fa417..2fe0cdcfa7e 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -400,6 +400,7 @@ along with GCC; see the file COPYING3.  If not see
   NEXT_PASS (pass_lower_resx);
   NEXT_PASS (pass_nrv);
   NEXT_PASS (pass_cleanup_cfg_post_optimizing);
+  NEXT_PASS (pass_spectrev1);
   NEXT_PASS (pass_warn_function_noreturn);
   NEXT_PASS (pass_gen_hsail);
 
diff --git a/gcc/testsuite/gcc.dg/Wspectre-v1-1.c b/gcc/testsuite/gcc.dg/Wspectre-v1-1.c
new file mode 100644
index 00000000000..3ac647e72fd
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/Wspectre-v1-1.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-Wspectre-v1" } */
+
+unsigned char a[1024];
+int b[256];
+int foo (int i, int bound)
+{
+  if (i < bound)
+    return b[a[i]];  /* { dg-warning "spectrev1" } */
+}
diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
index 9f9d85fdbc3..f5c164f465f 100644
--- a/gcc/tree-pass.h
+++ b/gcc/tree-pass.h
@@ -625,6 +625,7 @@ extern gimple_opt_pass *make_pass_local_fn_summary (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_update_address_taken (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_convert_switch (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_lower_vaarg (gcc::context *ctxt);
+extern gimple_opt_pass *make_pass_spectrev1 (gcc::context *ctxt);
 
 /* Current optimization pass.  */
 extern opt_pass *current_pass;

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Spectre V1 diagnostic / mitigation
  2018-12-18 15:37 Spectre V1 diagnostic / mitigation Richard Biener
@ 2018-12-18 16:17 ` Jeff Law
  2018-12-19 11:16   ` Richard Biener
  2018-12-18 16:48 ` Richard Earnshaw (lists)
  1 sibling, 1 reply; 17+ messages in thread
From: Jeff Law @ 2018-12-18 16:17 UTC (permalink / raw)
  To: Richard Biener, gcc

On 12/18/18 8:36 AM, Richard Biener wrote:
> 
> Hi,
> 
> in the past weeks I've been looking into prototyping both spectre V1 
> (speculative array bound bypass) diagnostics and mitigation in an
> architecture independent manner to assess feasability and some kind
> of upper bound on the performance impact one can expect.
> https://lists.llvm.org/pipermail/llvm-dev/2018-March/122085.html is
> an interesting read in this context as well.
> 
> For simplicity I have implemented mitigation on GIMPLE right before
> RTL expansion and have chosen TLS to do mitigation across function
> boundaries.  Diagnostics sit in the same place but both are not in
> any way dependent on each other.
> 
> The mitigation strategy chosen is that of tracking speculation
> state via a mask that can be used to zero parts of the addresses
> that leak the actual data.  That's similar to what aarch64 does
> with -mtrack-speculation (but oddly there's no mitigation there).
> 
> I've optimized things to the point that is reasonable when working
> target independent on GIMPLE but I've only looked at x86 assembly
> and performance.  I expect any "final" mitigation if we choose to
> implement and integrate such would be after RTL expansion since
> RTL expansion can end up introducing quite some control flow whose
> speculation state is not properly tracked by the prototype.
> 
> I'm cut&pasting single-runs of SPEC INT 2006/2017 here, the runs
> were done with -O2 [-fspectre-v1={2,3}] where =2 is function-local
> mitigation and =3 does mitigation global with passing the state
> via TLS memory.
> 
> The following was measured on a Haswell desktop CPU:
[ ... ]
Interesting.  So we'd been kicking this issue around a bit internally.

The number of packages where we'd want to turn this on was very small
and thus it was difficult to justify burning resources in this space.
LLVM might be an option for those limited packages, but LLVM is missing
other security things we don't want to lose (such as stack clash
mitigation).

In the end we punted for the immediate future.  We'll almost certainly
revisit at some point and your prototype would obviously factor into the
calculus around future decisions.

[ ... ]


> 
> 
> The patch relies heavily on RTL optimizations for DCE purposes.  At the
> same time we rely on RTL not statically computing the mask (RTL has no
> conditional constant propagation).  Full instrumentation of the classic
> Spectre V1 testcase
Right. But it does do constant propagation into arms of conditionals as
well as jump threading.  I'd fear they might compromise things.
Obviously we'd need to look further into those issues.  But even if they
do, something like what you've done may mitigate enough vulnerable
sequences that it's worth doing, even if there's some gaps due to "over"
optimization in the RTL space.

[  ... ]

> 
> so the generated GIMPLE was "tuned" for reasonable x86 assembler outcome.
> 
> Patch below for reference (and your own testing in case you are curious).
> I do not plan to pursue this further at this point.
Understood.  Thanks for posting it.  We're not currently working in this
space, but again, we may re-evaluate that stance in the future.

jeff

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Spectre V1 diagnostic / mitigation
  2018-12-18 15:37 Spectre V1 diagnostic / mitigation Richard Biener
  2018-12-18 16:17 ` Jeff Law
@ 2018-12-18 16:48 ` Richard Earnshaw (lists)
  2018-12-19 11:25   ` Richard Biener
  1 sibling, 1 reply; 17+ messages in thread
From: Richard Earnshaw (lists) @ 2018-12-18 16:48 UTC (permalink / raw)
  To: Richard Biener, gcc

On 18/12/2018 15:36, Richard Biener wrote:
> 
> Hi,
> 
> in the past weeks I've been looking into prototyping both spectre V1 
> (speculative array bound bypass) diagnostics and mitigation in an
> architecture independent manner to assess feasability and some kind
> of upper bound on the performance impact one can expect.
> https://lists.llvm.org/pipermail/llvm-dev/2018-March/122085.html is
> an interesting read in this context as well.

Interesting, thanks for posting this.

> 
> For simplicity I have implemented mitigation on GIMPLE right before
> RTL expansion and have chosen TLS to do mitigation across function
> boundaries.  Diagnostics sit in the same place but both are not in
> any way dependent on each other.

We considered using TLS for propagating the state across call-boundaries
on AArch64, but rejected it for several reasons.

- It's quite expensive to have to set up the TLS state in every function;
- It requires some global code to initialize the state variable - that's
kind of ABI;
- It also seems likely to be vulnerable to Spectre variant 4 - unless
the CPU can always correctly store-to-load forward the speculation
state, then you have the situation where the load may see an old value
of the state - and that's almost certain to say "we're not speculating".

The last one is really the killer here.

> 
> The mitigation strategy chosen is that of tracking speculation
> state via a mask that can be used to zero parts of the addresses
> that leak the actual data.  That's similar to what aarch64 does
> with -mtrack-speculation (but oddly there's no mitigation there).

We rely on the user inserting the new builtin, which we can more
effectively optimize if the compiler is generating speculation state
tracking data.  That doesn't preclude a full solution at a later date,
but it looked like it was likely overkill for protecting every load and
safely pruning the loads is not an easy problem to solve.  Of course,
the builtin does require the programmer to do some work to identify
which memory accesses might be vulnerable.

R.


> 
> I've optimized things to the point that is reasonable when working
> target independent on GIMPLE but I've only looked at x86 assembly
> and performance.  I expect any "final" mitigation if we choose to
> implement and integrate such would be after RTL expansion since
> RTL expansion can end up introducing quite some control flow whose
> speculation state is not properly tracked by the prototype.
> 
> I'm cut&pasting single-runs of SPEC INT 2006/2017 here, the runs
> were done with -O2 [-fspectre-v1={2,3}] where =2 is function-local
> mitigation and =3 does mitigation global with passing the state
> via TLS memory.
> 
> The following was measured on a Haswell desktop CPU:
> 
> 	-O2 vs. -O2 -fspectre-v1=2
> 
>                                   Estimated                       Estimated
>                 Base     Base       Base        Peak     Peak       Peak
> Benchmarks      Ref.   Run Time     Ratio       Ref.   Run Time     Ratio
> -------------- ------  ---------  ---------    ------  ---------  ---------
> 400.perlbench    9770        245       39.8 *    9770        452       21.6 *  184%
> 401.bzip2        9650        378       25.5 *    9650        726       13.3 *  192%
> 403.gcc          8050        236       34.2 *    8050        352       22.8 *  149%
> 429.mcf          9120        223       40.9 *    9120        656       13.9 *  294%
> 445.gobmk       10490        400       26.2 *   10490        666       15.8 *  167%
> 456.hmmer        9330        388       24.1 *    9330        536       17.4 *  138%
> 458.sjeng       12100        437       27.7 *   12100        661       18.3 *  151%
> 462.libquantum  20720        300       69.1 *   20720        384       53.9 *  128%
> 464.h264ref     22130        451       49.1 *   22130        586       37.8 *  130%
> 471.omnetpp      6250        291       21.5 *    6250        398       15.7 *  137%
> 473.astar        7020        334       21.0 *    7020        522       13.5 *  156%
> 483.xalancbmk    6900        182       37.9 *    6900        306       22.6 *  168%
>  Est. SPECint_base2006                   --
>  Est. SPECint2006                                                        --
> 
>    -O2 -fspectre-v1=3
> 
>                                   Estimated                       Estimated
>                 Base     Base       Base        Peak     Peak       Peak
> Benchmarks      Ref.   Run Time     Ratio       Ref.   Run Time     Ratio
> -------------- ------  ---------  ---------    ------  ---------  ---------
> 400.perlbench                                    9770        497       19.6 *  203%
> 401.bzip2                                        9650        772       12.5 *  204%
> 403.gcc                                          8050        427       18.9 *  181%
> 429.mcf                                          9120        696       13.1 *  312%
> 445.gobmk                                       10490        726       14.4 *  181%
> 456.hmmer                                        9330        537       17.4 *  138%
> 458.sjeng                                       12100        721       16.8 *  165%
> 462.libquantum                                  20720        446       46.4 *  149%
> 464.h264ref                                     22130        613       36.1 *  136%
> 471.omnetpp                                      6250        471       13.3 *  162%
> 473.astar                                        7020        579       12.1 *  173%
> 483.xalancbmk                                    6900        350       19.7 *  192%
>  Est. SPECint(R)_base2006           Not Run
>  Est. SPECint2006                                                        --
> 
> 
> While the following was measured on a Zen Epyc server:
> 
> -O2 vs -O2 -fspectre-v1=2
> 
>                        Estimated                       Estimated
>                  Base     Base        Base        Peak     Peak        Peak
> Benchmarks       Copies  Run Time     Rate        Copies  Run Time     Rate
> --------------- -------  ---------  ---------    -------  ---------  ---------
> 500.perlbench_r       1        499       3.19  *       1        621       2.56  * 124%
> 502.gcc_r             1        286       4.95  *       1        392       3.61  * 137%
> 505.mcf_r             1        331       4.88  *       1        456       3.55  * 138%
> 520.omnetpp_r         1        454       2.89  *       1        563       2.33  * 124%
> 523.xalancbmk_r       1        328       3.22  *       1        569       1.86  * 173%
> 525.x264_r            1        518       3.38  *       1        776       2.26  * 150%
> 531.deepsjeng_r       1        365       3.14  *       1        448       2.56  * 123%
> 541.leela_r           1        598       2.77  *       1        729       2.27  * 122%
> 548.exchange2_r       1        460       5.69  *       1        756       3.46  * 164%
> 557.xz_r              1        403       2.68  *       1        586       1.84  * 145%
>  Est. SPECrate2017_int_base              3.55
>  Est. SPECrate2017_int_peak                                               2.56    72%
> 
> -O2 -fspectre-v2=3
> 
>                        Estimated                       Estimated
>                  Base     Base        Base        Peak     Peak        Peak
> Benchmarks       Copies  Run Time     Rate        Copies  Run Time     Rate
> --------------- -------  ---------  ---------    -------  ---------  ---------
> 500.perlbench_r                               NR       1        700       2.27  * 140%
> 502.gcc_r                                     NR       1        485       2.92  * 170%
> 505.mcf_r                                     NR       1        596       2.71  * 180%
> 520.omnetpp_r                                 NR       1        604       2.17  * 133%
> 523.xalancbmk_r                               NR       1        643       1.64  * 196%
> 525.x264_r                                    NR       1        797       2.20  * 154%
> 531.deepsjeng_r                               NR       1        542       2.12  * 149%
> 541.leela_r                                   NR       1        872       1.90  * 146%
> 548.exchange2_r                               NR       1        761       3.44  * 165%
> 557.xz_r                                      NR       1        595       1.81  * 148%
>  Est. SPECrate2017_int_base           Not Run
>  Est. SPECrate2017_int_peak                                               2.26    64%
> 
> 
> 
> you can see, even thoug we're comparing apples and oranges, that the 
> performance impact is quite dependent on the microarchitecture.
> 
> Similarly interesting as performance is the effect on text size which is
> surprisingly high (_best_ case is 13 bytes per conditional branch plus 3
> bytes per instrumented memory).
> 
> CPU2016:
>    BASE  -O2
>    text	   data	    bss	    dec	    hex	filename
> 1117726	  20928	  12704	1151358	 11917e	400.perlbench
>   56568	   3800	   4416	  64784	   fd10	401.bzip2
> 3419568	   7912	 751520	4179000	 3fc438	403.gcc
>   12212	    712	  11984	  24908	   614c	429.mcf
> 1460694	2081772	2330096	5872562	 599bb2	445.gobmk
>  284929	   5956	  82040	 372925	  5b0bd	456.hmmer
>  130782	   2152	2576896	2709830	 295946	458.sjeng
>   41915	    764	     96	  42775	   a717	462.libquantum
>  505452	  11220	 372320	 888992	  d90a0	464.h264ref
>  638188	   9584	  14664	 662436	  a1ba4	471.omnetpp
>   38859	    900	   5216	  44975	   afaf	473.astar
> 4033878	 140248	  12168	4186294	 3fe0b6	483.xalancbmk
>    PEAK -O2 -fspectre-v1=2
>    text	   data	    bss	    dec	    hex	filename
> 1508032	  20928	  12704	1541664	 178620	400.perlbench	135%
>   76098	   3800	   4416	  84314	  1495a	401.bzip2	135%
> 4483530	   7912	 751520	5242962	 500052	403.gcc		131%
>   16006	    712	  11984	  28702	   701e	429.mcf		131%
> 1647384	2081772	2330096	6059252	 5c74f4	445.gobmk	112%
>  377259	   5956	  82040	 465255	  71967	456.hmmer	132%
>  164672	   2152	2576896	2743720	 29dda8	458.sjeng	126%
>   47901	    764	     96	  48761	   be79	462.libquantum	114%
>  649854	  11220	 372320	1033394	  fc4b2	464.h264ref	129%
>  706908	   9584	  14664	 731156	  b2814	471.omnetpp	111%
>   48493	    900	   5216	  54609	   d551	473.astar	125%
> 4862056	 140248	  12168	5014472	 4c83c8	483.xalancbmk	121%
>    PEAK -O2 -fspectre-v1=3
>    text	   data	    bss	    dec	    hex	filename
> 1742008	  20936	  12704	1775648	 1b1820	400.perlbench	156%
>   83338	   3808	   4416	  91562	  165aa	401.bzip2	147%
> 5219850	   7920	 751520	5979290	 5b3c9a	403.gcc		153%
>   17422	    720	  11984	  30126	   75ae	429.mcf		143%
> 1801688	2081780	2330096	6213564	 5ecfbc	445.gobmk	123%
>  431827	   5964	  82040	 519831	  7ee97	456.hmmer	152%
>  182200	   2160	2576896	2761256	 2a2228	458.sjeng	139%
>   53773	    772	     96	  54641	   d571	462.libquantum	128%
>  691798	  11228	 372320	1075346	 106892	464.h264ref	137%
>  976692	   9592	  14664	1000948	  f45f4	471.omnetpp	153%
>   54525	    908	   5216	  60649	   ece9	473.astar	140%
> 5808306	 140256	  12168	5960730	 5af41a	483.xalancbmk	144%
> 
> CPU2017:
>    BASE -O2 -g
>    text    data     bss     dec     hex filename
> 2209713    8576    9080 2227369  21fca9 500.perlbench_r
> 9295702   37432 1150664 10483798 9ff856 502.gcc_r
>   21795     712     744   23251    5ad3 505.mcf_r
> 2067560    8984   46888 2123432  2066a8 520.omnetpp_r
> 5763577  142584   20040 5926201  5a6d39 523.xalancbmk_r
>  508402    6102   29592  544096   84d60 525.x264_r
>   84222     784 12138360 12223366 ba8386 531.deepsjeng_r
>  223480    8544   30072  262096   3ffd0 541.leela_r
>   70554     864    6384   77802   12fea 548.exchange2_r
>  180640     884   17704  199228   30a3c 557.xz_r
>    PEAK -fspectre-v2=2
>    text    data     bss     dec     hex filename
> 2991161    8576    9080 3008817  2de931 500.perlbench_r	135%
> 12244886  37432 1150664 13432982 ccf896 502.gcc_r	132%
>   28475     712     744   29931    74eb 505.mcf_r	131%
> 2397026    8984   46888 2452898  256da2 520.omnetpp_r	116%
> 6846853  142584   20040 7009477  6af4c5 523.xalancbmk_r	119%
>  645730    6102   29592  681424   a65d0 525.x264_r	127%
>  111166     784 12138360 12250310 baecc6 531.deepsjeng_r 132%
>  260835    8544   30072  299451   491bb 541.leela_r     117%
>   96874     864    6384  104122   196ba 548.exchange2_r	137%
>  215288     884   17704  233876   39194 557.xz_r	119%
>    PEAK -fspectre-v2=3
>    text    data     bss     dec     hex filename
> 3365945    8584    9080 3383609  33a139 500.perlbench_r	152%
> 14790638  37440 1150664 15978742 f3d0f6 502.gcc_r	159%
>   31419     720     744   32883    8073 505.mcf_r	144%
> 2867893    8992   46888 2923773  2c9cfd 520.omnetpp_r	139%
> 8183689  142592   20040 8346321  7f5ad1 523.xalancbmk_r	142%
>  697434    6110   29592  733136   b2fd0 525.x264_r	137%
>  123638     792 12138360 12262790 bb1d86 531.deepsjeng_r 147%
>  315347    8552   30072  353971   566b3 541.leela_r	141%
>   98578     872    6384  105834   19d6a 548.exchange2_r	140%
>  239144     892   17704  257740   3eecc 557.xz_r	133%
> 
> 
> The patch relies heavily on RTL optimizations for DCE purposes.  At the
> same time we rely on RTL not statically computing the mask (RTL has no
> conditional constant propagation).  Full instrumentation of the classic
> Spectre V1 testcase
> 
> char a[1024];
> int b[1024];
> int foo (int i, int bound)
> {
>   if (i < bound)
>     return b[a[i]];
> }
> 
> is the following:
> 
> foo:
> .LFB0:  
>         .cfi_startproc
>         xorl    %eax, %eax
>         cmpl    %esi, %edi
>         setge   %al
>         subq    $1, %rax
>         jne     .L4
>         ret
>         .p2align 4,,10
>         .p2align 3
> .L4:
>         andl    %eax, %edi
>         movslq  %edi, %rdi
>         movsbq  a(%rdi), %rax
>         movl    b(,%rax,4), %eax
>         ret
> 
> so the generated GIMPLE was "tuned" for reasonable x86 assembler outcome.
> 
> Patch below for reference (and your own testing in case you are curious).
> I do not plan to pursue this further at this point.
> 
> Richard.
> 
> From 01e4a5a43e266065d32489daa50de0cf2425d5f5 Mon Sep 17 00:00:00 2001
> From: Richard Guenther <rguenther@suse.de>
> Date: Wed, 5 Dec 2018 13:17:02 +0100
> Subject: [PATCH] warn-spectrev1
> 
> 
> diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> index 7960cace16a..64d472d7fa0 100644
> --- a/gcc/Makefile.in
> +++ b/gcc/Makefile.in
> @@ -1334,6 +1334,7 @@ OBJS = \
>  	gimple-ssa-sprintf.o \
>  	gimple-ssa-warn-alloca.o \
>  	gimple-ssa-warn-restrict.o \
> +	gimple-ssa-spectrev1.o \
>  	gimple-streamer-in.o \
>  	gimple-streamer-out.o \
>  	gimple-walk.o \
> diff --git a/gcc/common.opt b/gcc/common.opt
> index 45d7f6189e5..1ae7fcfe177 100644
> --- a/gcc/common.opt
> +++ b/gcc/common.opt
> @@ -702,6 +702,10 @@ Warn when one local variable shadows another local variable or parameter of comp
>  Wshadow-compatible-local
>  Common Warning Undocumented Alias(Wshadow=compatible-local)
>  
> +Wspectre-v1
> +Common Var(warn_spectrev1) Warning
> +Warn about code susceptible to spectre v1 style attacks.
> +
>  Wstack-protector
>  Common Var(warn_stack_protect) Warning
>  Warn when not issuing stack smashing protection for some reason.
> @@ -2406,6 +2410,14 @@ fsingle-precision-constant
>  Common Report Var(flag_single_precision_constant) Optimization
>  Convert floating point constants to single precision constants.
>  
> +fspectre-v1
> +Common Alias(fspectre-v1=, 2, 0)
> +Insert code to mitigate spectre v1 style attacks.
> +
> +fspectre-v1=
> +Common Report RejectNegative Joined UInteger IntegerRange(0, 3) Var(flag_spectrev1) Optimization
> +Insert code to mitigate spectre v1 style attacks.
> +
>  fsplit-ivs-in-unroller
>  Common Report Var(flag_split_ivs_in_unroller) Init(1) Optimization
>  Split lifetimes of induction variables when loops are unrolled.
> diff --git a/gcc/gimple-ssa-spectrev1.cc b/gcc/gimple-ssa-spectrev1.cc
> new file mode 100644
> index 00000000000..c2a5dc95324
> --- /dev/null
> +++ b/gcc/gimple-ssa-spectrev1.cc
> @@ -0,0 +1,824 @@
> +/* Loop interchange.
> +   Copyright (C) 2017-2018 Free Software Foundation, Inc.
> +   Contributed by ARM Ltd.
> +
> +This file is part of GCC.
> +
> +GCC is free software; you can redistribute it and/or modify it
> +under the terms of the GNU General Public License as published by the
> +Free Software Foundation; either version 3, or (at your option) any
> +later version.
> +
> +GCC is distributed in the hope that it will be useful, but WITHOUT
> +ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
> +for more details.
> +
> +You should have received a copy of the GNU General Public License
> +along with GCC; see the file COPYING3.  If not see
> +<http://www.gnu.org/licenses/>.  */
> +
> +#include "config.h"
> +#include "system.h"
> +#include "coretypes.h"
> +#include "backend.h"
> +#include "is-a.h"
> +#include "tree.h"
> +#include "gimple.h"
> +#include "tree-pass.h"
> +#include "ssa.h"
> +#include "gimple-pretty-print.h"
> +#include "gimple-iterator.h"
> +#include "params.h"
> +#include "tree-ssa.h"
> +#include "cfganal.h"
> +#include "gimple-walk.h"
> +#include "tree-ssa-loop.h"
> +#include "tree-dfa.h"
> +#include "tree-cfg.h"
> +#include "fold-const.h"
> +#include "builtins.h"
> +#include "alias.h"
> +#include "cfgloop.h"
> +#include "varasm.h"
> +#include "cgraph.h"
> +#include "gimple-fold.h"
> +#include "diagnostic.h"
> +
> +/* The Spectre V1 situation is as follows:
> +
> +      if (attacker_controlled_idx < bound)  // speculated as true but is false
> +        {
> +	  // out-of-bound access, returns value interesting to attacker
> +	  val = mem[attacker_controlled_idx];
> +	  // access that causes a cache-line to be brought in - canary
> +	  ... = attacker_controlled_mem[val];
> +	}
> +
> +   The last load provides the side-channel.  The pattern can be split
> +   into multiple functions or translation units.  Conservatively we'd
> +   have to warn about
> +
> +      int foo (int *a) {  return *a; }
> +
> +   thus any indirect (or indexed) memory access.  That's obvioulsy
> +   not useful.
> +
> +   The next level would be to warn only when we see load of val as
> +   well.  That then misses cases like
> +
> +      int foo (int *a, int *b)
> +      {
> +        int idx = load_it (a);
> +	return load_it (&b[idx]);
> +      }
> +
> +   Still we'd warn about cases like
> +
> +      struct Foo { int *a; };
> +      int foo (struct Foo *a) { return *a->a; }
> +
> +   though dereferencing VAL isn't really an interesting case.  It's
> +   hard to exclude this conservatively so the obvious solution is
> +   to restrict the kind of loads that produce val, for example based
> +   on its type or its number of bits.  It's tempting to do this at
> +   the point of the load producing val but in the end what matters
> +   is the number of bits that reach the second loads [as index] given
> +   there are practical limits on the size of the canary.  For this
> +   we have to consider
> +
> +      int foo (struct Foo *a, int *b)
> +      {
> +        int *c = a->a;
> +	int idx = *b;
> +	return *(c + idx);
> +      }
> +
> +   where idx has too many bits to be an interesting attack vector(?).
> + */
> +
> +/* The pass does two things, first it performs data flow analysis
> +   to be able to warn about the second load.  This is controlled
> +   via -Wspectre-v1.
> +
> +   Second it instruments control flow in the program to track a
> +   mask which is all-ones but all-zeroes if the CPU speculated
> +   a branch in the wrong direction.  This mask is then used to
> +   mask the address[-part(s)] of loads with non-invariant addresses,
> +   effectively mitigating the attack.  This is controlled by
> +   -fpectre-v1[=N] where N is default 2 and
> +     1  optimistically omit some instrumentations (currently
> +        backedge control flow instructions do not update the
> +	speculation mask)
> +     2  instrument conservatively using a function-local speculation
> +        mask
> +     3  instrument conservatively using a global (TLS) speculation
> +        mask.  This adds TLS loads/stores of the speculation mask
> +	at function boundaries and before and after calls.
> + */
> +
> +/* We annotate statements whose defs cannot be used to leaking data
> +   speculatively via loads with SV1_SAFE.  This is used to optimize
> +   masking of indices where masked indices (and derived by constant
> +   ones) are not masked again.  Note this works only up to the points
> +   that possibly change the speculation mask value.  */
> +#define SV1_SAFE GF_PLF_1
> +
> +namespace {
> +
> +const pass_data pass_data_spectrev1 =
> +{
> +  GIMPLE_PASS, /* type */
> +  "spectrev1", /* name */
> +  OPTGROUP_NONE, /* optinfo_flags */
> +  TV_NONE, /* tv_id */
> +  PROP_cfg|PROP_ssa, /* properties_required */
> +  0, /* properties_provided */
> +  0, /* properties_destroyed */
> +  0, /* todo_flags_start */
> +  TODO_update_ssa, /* todo_flags_finish */
> +};
> +
> +class pass_spectrev1 : public gimple_opt_pass
> +{
> +public:
> +  pass_spectrev1 (gcc::context *ctxt)
> +    : gimple_opt_pass (pass_data_spectrev1, ctxt)
> +  {}
> +
> +  /* opt_pass methods: */
> +  opt_pass * clone () { return new pass_spectrev1 (m_ctxt); }
> +  virtual bool gate (function *) { return warn_spectrev1 || flag_spectrev1; }
> +  virtual unsigned int execute (function *);
> +
> +  static bool stmt_is_indexed_load (gimple *);
> +  static bool stmt_mangles_index (gimple *, tree);
> +  static bool find_value_dependent_guard (gimple *, tree);
> +  static void mark_influencing_outgoing_flow (basic_block, tree);
> +  static tree instrument_mem (gimple_stmt_iterator *, tree, tree);
> +}; // class pass_spectrev1
> +
> +bitmap_head *influencing_outgoing_flow;
> +
> +static bool
> +call_between (gimple *first, gimple *second)
> +{
> +  gcc_assert (gimple_bb (first) == gimple_bb (second));
> +  /* ???  This is inefficient.  Maybe we can use gimple_uid to assign
> +     unique IDs to stmts belonging to groups with the same speculation
> +     mask state.  */
> +  for (gimple_stmt_iterator gsi = gsi_for_stmt (first);
> +       gsi_stmt (gsi) != second; gsi_next (&gsi))
> +    if (is_gimple_call (gsi_stmt (gsi)))
> +      return true;
> +  return false;
> +}
> +
> +basic_block ctx_bb;
> +gimple *ctx_stmt;
> +static bool
> +gather_indexes (tree, tree *idx, void *data)
> +{
> +  vec<tree *> *indexes = (vec<tree *> *)data;
> +  if (TREE_CODE (*idx) != SSA_NAME)
> +    return true;
> +  if (!SSA_NAME_IS_DEFAULT_DEF (*idx)
> +      && gimple_bb (SSA_NAME_DEF_STMT (*idx)) == ctx_bb
> +      && gimple_plf (SSA_NAME_DEF_STMT (*idx), SV1_SAFE)
> +      && (flag_spectrev1 < 3
> +	  || !call_between (SSA_NAME_DEF_STMT (*idx), ctx_stmt)))
> +    return true;
> +  if (indexes->is_empty ())
> +    indexes->safe_push (idx);
> +  else if (*(*indexes)[0] == *idx)
> +    indexes->safe_push (idx);
> +  else
> +    return false;
> +  return true;
> +}
> +
> +tree
> +pass_spectrev1::instrument_mem (gimple_stmt_iterator *gsi, tree mem, tree mask)
> +{
> +  /* First try to see if we can find a single index we can zero which
> +     has the chance of repeating in other loads and also avoids separate
> +     LEA and memory references decreasing code size and AGU occupancy.  */
> +  auto_vec<tree *, 8> indexes;
> +  ctx_bb = gsi_bb (*gsi);
> +  ctx_stmt = gsi_stmt (*gsi);
> +  if (PARAM_VALUE (PARAM_SPECTRE_V1_MAX_INSTRUMENT_INDICES) > 0
> +      && for_each_index (&mem, gather_indexes, (void *)&indexes))
> +    {
> +      /* All indices are safe.  */
> +      if (indexes.is_empty ())
> +	return mem;
> +      if (TYPE_PRECISION (TREE_TYPE (*indexes[0]))
> +	  <= TYPE_PRECISION (TREE_TYPE (mask)))
> +	{
> +	  tree idx = *indexes[0];
> +	  gcc_assert (INTEGRAL_TYPE_P (TREE_TYPE (idx))
> +		      || POINTER_TYPE_P (TREE_TYPE (idx)));
> +	  /* Instead of instrumenting IDX directly we could look at
> +	     definitions with a single SSA use and instrument that
> +	     instead.  But we have to do some work to make SV1_SAFE
> +	     propagation updated then - this would really ask to first
> +	     gather all indexes of all refs we want to instrument and
> +	     compute some optimal set of instrumentations.  */
> +	  gimple_seq seq = NULL;
> +	  tree idx_mask = gimple_convert (&seq, TREE_TYPE (idx), mask);
> +	  tree masked_idx = gimple_build (&seq, BIT_AND_EXPR,
> +					  TREE_TYPE (idx), idx, idx_mask);
> +	  /* Mark the instrumentation sequence as visited.  */
> +	  for (gimple_stmt_iterator si = gsi_start (seq);
> +	       !gsi_end_p (si); gsi_next (&si))
> +	    gimple_set_visited (gsi_stmt (si), true);
> +	  gsi_insert_seq_before (gsi, seq, GSI_SAME_STMT);
> +	  gimple_set_plf (SSA_NAME_DEF_STMT (masked_idx), SV1_SAFE, true);
> +	  /* Replace downstream users in the BB which reduces register pressure
> +	     and allows SV1_SAFE propagation to work (which stops at call/BB
> +	     boundaries though).
> +	     ???  This is really reg-pressure vs. dependence chains so not
> +	     a generally easy thing.  Making the following propagate into
> +	     all uses dominated by the insert slows down 429.mcf even more.
> +	     ???  We can actually track SV1_SAFE across PHIs but then we
> +	     have to propagate into PHIs here.  */
> +	  gimple *use_stmt;
> +	  use_operand_p use_p;
> +	  imm_use_iterator iter;
> +	  FOR_EACH_IMM_USE_STMT (use_stmt, iter, idx)
> +	    if (gimple_bb (use_stmt) == gsi_bb (*gsi)
> +		&& gimple_code (use_stmt) != GIMPLE_PHI
> +		&& !gimple_visited_p (use_stmt))
> +	      {
> +		FOR_EACH_IMM_USE_ON_STMT (use_p, iter)
> +		  SET_USE (use_p, masked_idx);
> +		update_stmt (use_stmt);
> +	      }
> +	  /* Modify MEM in place...  (our stmt is already marked visited).  */
> +	  for (unsigned i = 0; i < indexes.length (); ++i)
> +	    *indexes[i] = masked_idx;
> +	  return mem;
> +	}
> +    }
> +
> +  /* ???  Can we handle TYPE_REVERSE_STORAGE_ORDER at all?  Need to
> +     handle BIT_FIELD_REFs.  */
> +
> +  /* Strip a bitfield reference to re-apply it at the end.  */
> +  tree bitfield = NULL_TREE;
> +  tree bitfield_off = NULL_TREE;
> +  if (TREE_CODE (mem) == COMPONENT_REF
> +      && DECL_BIT_FIELD (TREE_OPERAND (mem, 1)))
> +    {
> +      bitfield = TREE_OPERAND (mem, 1);
> +      bitfield_off = TREE_OPERAND (mem, 2);
> +      mem = TREE_OPERAND (mem, 0);
> +    }
> +
> +  tree ptr_base = mem;
> +  /* VIEW_CONVERT_EXPRs do not change offset, strip them, they get folded
> +     into the MEM_REF we create.  */
> +  while (TREE_CODE (ptr_base) == VIEW_CONVERT_EXPR)
> +    ptr_base = TREE_OPERAND (ptr_base, 0);
> +
> +  tree ptr = make_ssa_name (ptr_type_node);
> +  gimple *new_stmt = gimple_build_assign (ptr, build_fold_addr_expr (ptr_base));
> +  gimple_set_visited (new_stmt, true);
> +  gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT);
> +  ptr = make_ssa_name (ptr_type_node);
> +  new_stmt = gimple_build_assign (ptr, BIT_AND_EXPR,
> +				  gimple_assign_lhs (new_stmt), mask);
> +  gimple_set_visited (new_stmt, true);
> +  gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT);
> +  tree type = TREE_TYPE (mem);
> +  unsigned align = get_object_alignment (mem);
> +  if (align != TYPE_ALIGN (type))
> +    type = build_aligned_type (type, align);
> +
> +  tree new_mem = build2 (MEM_REF, type, ptr,
> +			 build_int_cst (reference_alias_ptr_type (mem), 0));
> +  if (bitfield)
> +    new_mem = build3 (COMPONENT_REF, TREE_TYPE (bitfield), new_mem,
> +		      bitfield, bitfield_off);
> +  return new_mem;
> +}
> +
> +bool
> +check_spectrev1_2nd_load (tree, tree *idx, void *data)
> +{
> +  sbitmap value_from_indexed_load = (sbitmap)data;
> +  if (TREE_CODE (*idx) == SSA_NAME
> +      && bitmap_bit_p (value_from_indexed_load, SSA_NAME_VERSION (*idx)))
> +    return false;
> +  return true;
> +}
> +
> +bool
> +check_spectrev1_2nd_load (gimple *, tree, tree ref, void *data)
> +{
> +  return !for_each_index (&ref, check_spectrev1_2nd_load, data);
> +}
> +
> +void
> +pass_spectrev1::mark_influencing_outgoing_flow (basic_block bb, tree op)
> +{
> +  if (!bitmap_set_bit (&influencing_outgoing_flow[SSA_NAME_VERSION (op)],
> +		       bb->index))
> +    return;
> +
> +  /* Note we are deliberately non-conservatively stop at call and
> +     memory boundaries here expecting earlier optimization to expose
> +     value dependences via SSA chains.  */
> +  gimple *def_stmt = SSA_NAME_DEF_STMT (op);
> +  if (gimple_vuse (def_stmt)
> +      || !is_gimple_assign (def_stmt))
> +    return;
> +
> +  ssa_op_iter i;
> +  FOR_EACH_SSA_TREE_OPERAND (op, def_stmt, i, SSA_OP_USE)
> +    mark_influencing_outgoing_flow (bb, op);
> +}
> +
> +bool
> +pass_spectrev1::find_value_dependent_guard (gimple *stmt, tree op)
> +{
> +  bitmap_iterator bi;
> +  unsigned i;
> +  EXECUTE_IF_SET_IN_BITMAP (&influencing_outgoing_flow[SSA_NAME_VERSION (op)],
> +			    0, i, bi)
> +    /* ???  If control-dependent on.
> +       ???  Make bits in influencing_outgoing_flow the index of the BB
> +       in RPO order so we could walk bits from STMT "upwards" finding
> +       the nearest one.  */
> +    if (dominated_by_p (CDI_DOMINATORS,
> +			gimple_bb (stmt), BASIC_BLOCK_FOR_FN (cfun, i)))
> +      {
> +	if (dump_enabled_p ())
> +	  dump_printf_loc (MSG_NOTE, stmt, "Condition %G in block %d "
> +			   "is related to indexes used in %G\n",
> +			   last_stmt (BASIC_BLOCK_FOR_FN (cfun, i)),
> +			   i, stmt);
> +	return true;
> +      }
> +
> +  /* Note we are deliberately non-conservatively stop at call and
> +     memory boundaries here expecting earlier optimization to expose
> +     value dependences via SSA chains.  */
> +  gimple *def_stmt = SSA_NAME_DEF_STMT (op);
> +  if (gimple_vuse (def_stmt)
> +      || !is_gimple_assign (def_stmt))
> +    return false;
> +
> +  ssa_op_iter it;
> +  FOR_EACH_SSA_TREE_OPERAND (op, def_stmt, it, SSA_OP_USE)
> +    if (find_value_dependent_guard (stmt, op))
> +      /* Others may be "nearer".  */
> +      return true;
> +
> +  return false;
> +}
> +
> +bool
> +pass_spectrev1::stmt_is_indexed_load (gimple *stmt)
> +{
> +  /* Given we ignore the function boundary for incoming parameters
> +     let's ignore return values of calls as well for the purpose
> +     of being the first indexed load (also ignore inline-asms).  */
> +  if (!gimple_assign_load_p (stmt))
> +    return false;
> +
> +  /* Exclude esp. pointers from the index load itself (but also floats,
> +     vectors, etc. - quite a bit handwaving here).  */
> +  if (!INTEGRAL_TYPE_P (TREE_TYPE (gimple_assign_lhs (stmt))))
> +    return false;
> +
> +  /* If we do not have any SSA uses the load cannot be one indexed
> +     by an attacker controlled value.  */
> +  if (zero_ssa_operands (stmt, SSA_OP_USE))
> +    return false;
> +
> +  return true;
> +}
> +
> +/* Return true whether the index in the use operand OP in STMT is
> +   not transfered to STMTs defs.  */
> +
> +bool
> +pass_spectrev1::stmt_mangles_index (gimple *stmt, tree op)
> +{
> +  if (gimple_assign_load_p (stmt))
> +    return true;
> +  if (gassign *ass = dyn_cast <gassign *> (stmt))
> +    {
> +      enum tree_code code = gimple_assign_rhs_code (ass);
> +      switch (code)
> +	{
> +	case TRUNC_DIV_EXPR:
> +	case CEIL_DIV_EXPR:
> +	case FLOOR_DIV_EXPR:
> +	case ROUND_DIV_EXPR:
> +	case EXACT_DIV_EXPR:
> +	case RDIV_EXPR:
> +	case TRUNC_MOD_EXPR:
> +	case CEIL_MOD_EXPR:
> +	case FLOOR_MOD_EXPR:
> +	case ROUND_MOD_EXPR:
> +	case LSHIFT_EXPR:
> +	case RSHIFT_EXPR:
> +	case LROTATE_EXPR:
> +	case RROTATE_EXPR:
> +	  /* Division, modulus or shifts by the index do not produce
> +	     something useful for the attacker.  */
> +	  if (gimple_assign_rhs2 (ass) == op)
> +	    return true;
> +	  break;
> +	default:;
> +	  /* Comparisons do not produce an index value.  */
> +	  if (TREE_CODE_CLASS (code) == tcc_comparison)
> +	    return true;
> +	}
> +    }
> +  /* ???  We could handle builtins here.  */
> +  return false;
> +}
> +
> +static GTY(()) tree spectrev1_tls_mask_decl;
> +
> +/* Main entry for spectrev1 pass.  */
> +
> +unsigned int
> +pass_spectrev1::execute (function *fn)
> +{
> +  calculate_dominance_info (CDI_DOMINATORS);
> +  loop_optimizer_init (AVOID_CFG_MODIFICATIONS);
> +
> +  int *rpo = XNEWVEC (int, n_basic_blocks_for_fn (cfun));
> +  int rpo_num = pre_and_rev_post_order_compute_fn (fn, NULL, rpo, false);
> +
> +  /* We track for each SSA name whether its value (may) depend(s) on
> +     the result of an indexed load.
> +     A set of operation will kill a value (enough).  */
> +  auto_sbitmap value_from_indexed_load (num_ssa_names);
> +  bitmap_clear (value_from_indexed_load);
> +
> +  unsigned orig_num_ssa_names = num_ssa_names;
> +  influencing_outgoing_flow = XCNEWVEC (bitmap_head, num_ssa_names);
> +  for (unsigned i = 1; i < num_ssa_names; ++i)
> +    bitmap_initialize (&influencing_outgoing_flow[i], &bitmap_default_obstack);
> +
> +
> +  /* Diagnosis.  */
> +
> +  /* Function arguments are not indexed loads unless we want to
> +     be conservative to a level no longer useful.  */
> +
> +  for (int i = 0; i < rpo_num; ++i)
> +    {
> +      basic_block bb = BASIC_BLOCK_FOR_FN (fn, rpo[i]);
> +
> +      for (gphi_iterator gpi = gsi_start_phis (bb);
> +	   !gsi_end_p (gpi); gsi_next (&gpi))
> +	{
> +	  gphi *phi = gpi.phi ();
> +	  bool value_from_indexed_load_p = false;
> +	  use_operand_p arg_p;
> +	  ssa_op_iter it;
> +	  FOR_EACH_PHI_ARG (arg_p, phi, it, SSA_OP_USE)
> +	    {
> +	      tree arg = USE_FROM_PTR (arg_p);
> +	      if (TREE_CODE (arg) == SSA_NAME
> +		  && bitmap_bit_p (value_from_indexed_load,
> +				   SSA_NAME_VERSION (arg)))
> +		value_from_indexed_load_p = true;
> +	    }
> +	  if (value_from_indexed_load_p)
> +	    bitmap_set_bit (value_from_indexed_load,
> +			    SSA_NAME_VERSION (PHI_RESULT (phi)));
> +	}
> +
> +      for (gimple_stmt_iterator gsi = gsi_start_bb (bb);
> +	   !gsi_end_p (gsi); gsi_next (&gsi))
> +	{
> +	  gimple *stmt = gsi_stmt (gsi);
> +	  if (is_gimple_debug (stmt))
> +	    continue;
> +
> +	  if (walk_stmt_load_store_ops (stmt, value_from_indexed_load,
> +					check_spectrev1_2nd_load,
> +					check_spectrev1_2nd_load))
> +	    warning_at (gimple_location (stmt), OPT_Wspectre_v1, "%Gspectrev1",
> +			stmt);
> +
> +	  bool value_from_indexed_load_p = false;
> +	  if (stmt_is_indexed_load (stmt))
> +	    {
> +	      /* We are interested in indexes to later loads so ultimatively
> +		 register values that all happen to separate SSA defs.
> +		 Interesting aggregates will be decomposed by later loads
> +		 which we then mark as producing an index.  Simply mark
> +		 all SSA defs as coming from an indexed load.  */
> +	      /* We are handling a single load in STMT right now.  */
> +	      ssa_op_iter it;
> +	      tree op;
> +	      FOR_EACH_SSA_TREE_OPERAND (op, stmt, it, SSA_OP_USE)
> +	        if (find_value_dependent_guard (stmt, op))
> +		  {
> +		    /* ???  Somehow record the dependence to point to it in
> +		       diagnostics.  */
> +		    value_from_indexed_load_p = true;
> +		    break;
> +		  }
> +	    }
> +
> +	  tree op;
> +	  ssa_op_iter it;
> +	  FOR_EACH_SSA_TREE_OPERAND (op, stmt, it, SSA_OP_USE)
> +	    if (bitmap_bit_p (value_from_indexed_load,
> +			      SSA_NAME_VERSION (op))
> +		&& !stmt_mangles_index (stmt, op))
> +	      {
> +		value_from_indexed_load_p = true;
> +		break;
> +	      }
> +
> +	  if (value_from_indexed_load_p)
> +	    FOR_EACH_SSA_TREE_OPERAND (op, stmt, it, SSA_OP_DEF)
> +	      /* ???  We could cut off single-bit values from the chain
> +	         here or pretain that float loads will be never turned
> +		 into integer indices, etc.  */
> +	      bitmap_set_bit (value_from_indexed_load,
> +			      SSA_NAME_VERSION (op));
> +	}
> +
> +      if (EDGE_COUNT (bb->succs) > 1)
> +	{
> +	  gcond *stmt = safe_dyn_cast <gcond *> (last_stmt (bb));
> +	  /* ???  What about switches?  What about badly speculated EH?  */
> +	  if (!stmt)
> +	    continue;
> +	  /* We could constrain conditions here to those more likely
> +	     being "bounds checks".  For example common guards for
> +	     indirect accesses are NULL pointer checks.
> +	     ???  This isn't fully safe, but it drops the number of
> +	     spectre warnings for dwarf2out.i from cc1files from 70 to 16.  */
> +	  if ((gimple_cond_code (stmt) == EQ_EXPR
> +	       || gimple_cond_code (stmt) == NE_EXPR)
> +	      && integer_zerop (gimple_cond_rhs (stmt))
> +	      && POINTER_TYPE_P (TREE_TYPE (gimple_cond_lhs (stmt))))
> +	    ;
> +	  else
> +	    {
> +	      ssa_op_iter it;
> +	      tree op;
> +	      FOR_EACH_SSA_TREE_OPERAND (op, stmt, it, SSA_OP_USE)
> +		mark_influencing_outgoing_flow (bb, op);
> +	    }
> +	}
> +    }
> +
> +  for (unsigned i = 1; i < orig_num_ssa_names; ++i)
> +    bitmap_release (&influencing_outgoing_flow[i]);
> +  XDELETEVEC (influencing_outgoing_flow);
> +
> +
> +
> +  /* Instrumentation.  */
> +  if (!flag_spectrev1)
> +    return 0;
> +
> +  /* Create the default all-ones mask.  When doing IPA instrumentation
> +     this should initialize the mask from TLS memory and outgoing edges
> +     need to save the mask to TLS memory.  */
> +  gimple *new_stmt;
> +  if (!spectrev1_tls_mask_decl
> +      && flag_spectrev1 >= 3)
> +    {
> +      /* Use a smaller variable in case sign-extending loads are
> +	 available?  */
> +      spectrev1_tls_mask_decl
> +	  = build_decl (BUILTINS_LOCATION,
> +			VAR_DECL, NULL_TREE, ptr_type_node);
> +      TREE_STATIC (spectrev1_tls_mask_decl) = 1;
> +      TREE_PUBLIC (spectrev1_tls_mask_decl) = 1;
> +      DECL_VISIBILITY (spectrev1_tls_mask_decl) = VISIBILITY_HIDDEN;
> +      DECL_VISIBILITY_SPECIFIED (spectrev1_tls_mask_decl) = 1;
> +      DECL_INITIAL (spectrev1_tls_mask_decl)
> +	  = build_all_ones_cst (ptr_type_node);
> +      DECL_NAME (spectrev1_tls_mask_decl) = get_identifier ("__SV1MSK");
> +      DECL_ARTIFICIAL (spectrev1_tls_mask_decl) = 1;
> +      DECL_IGNORED_P (spectrev1_tls_mask_decl) = 1;
> +      varpool_node::finalize_decl (spectrev1_tls_mask_decl);
> +      make_decl_one_only (spectrev1_tls_mask_decl,
> +			  DECL_ASSEMBLER_NAME (spectrev1_tls_mask_decl));
> +      set_decl_tls_model (spectrev1_tls_mask_decl,
> +			  decl_default_tls_model (spectrev1_tls_mask_decl));
> +    }
> +
> +  /* We let the SSA rewriter cope with rewriting mask into SSA and
> +     inserting PHI nodes.  */
> +  tree mask = create_tmp_reg (ptr_type_node, "spectre_v1_mask");
> +  new_stmt = gimple_build_assign (mask,
> +				  flag_spectrev1 >= 3
> +				  ? spectrev1_tls_mask_decl
> +				  : build_all_ones_cst (ptr_type_node));
> +  gimple_stmt_iterator gsi
> +      = gsi_after_labels (single_succ (ENTRY_BLOCK_PTR_FOR_FN (fn)));
> +  gsi_insert_before (&gsi, new_stmt, GSI_CONTINUE_LINKING);
> +
> +  /* We are using the visited flag to track stmts downstream in a BB.  */
> +  for (int i = 0; i < rpo_num; ++i)
> +    {
> +      basic_block bb = BASIC_BLOCK_FOR_FN (fn, rpo[i]);
> +      for (gphi_iterator gpi = gsi_start_phis (bb);
> +	   !gsi_end_p (gpi); gsi_next (&gpi))
> +	gimple_set_visited (gpi.phi (), false);
> +      for (gimple_stmt_iterator gsi = gsi_start_bb (bb);
> +	   !gsi_end_p (gsi); gsi_next (&gsi))
> +	gimple_set_visited (gsi_stmt (gsi), false);
> +    }
> +
> +  for (int i = 0; i < rpo_num; ++i)
> +    {
> +      basic_block bb = BASIC_BLOCK_FOR_FN (fn, rpo[i]);
> +
> +      for (gphi_iterator gpi = gsi_start_phis (bb);
> +	   !gsi_end_p (gpi); gsi_next (&gpi))
> +	{
> +	  gphi *phi = gpi.phi ();
> +	  /* ???  We can merge SAFE state across BB boundaries in
> +	     some cases, like when edges are not critical and the
> +	     state was made SAFE in the tail of the predecessors
> +	     and not invalidated by calls.   */
> +	  gimple_set_plf (phi, SV1_SAFE, false);
> +	}
> +
> +      bool instrumented_call_p = false;
> +      for (gimple_stmt_iterator gsi = gsi_start_bb (bb);
> +	   !gsi_end_p (gsi); gsi_next (&gsi))
> +	{
> +	  gimple *stmt = gsi_stmt (gsi);
> +	  gimple_set_visited (stmt, true);
> +	  if (is_gimple_debug (stmt))
> +	    continue;
> +
> +	  tree op;
> +	  ssa_op_iter it;
> +	  bool safe = is_gimple_assign (stmt);
> +	  if (safe)
> +	    FOR_EACH_SSA_TREE_OPERAND (op, stmt, it, SSA_OP_USE)
> +	      {
> +		if (safe
> +		    && (SSA_NAME_IS_DEFAULT_DEF (op)
> +			|| !gimple_plf (SSA_NAME_DEF_STMT (op), SV1_SAFE)
> +			/* Once mask can have changed we cannot further
> +			   propagate safe state.  */
> +			|| gimple_bb (SSA_NAME_DEF_STMT (op)) != bb
> +			/* That includes calls if we have instrumented one
> +			   in this block.  */
> +			|| (instrumented_call_p
> +			    && call_between (SSA_NAME_DEF_STMT (op), stmt))))
> +		  {
> +		    safe = false;
> +		    break;
> +		  }
> +	      }
> +	  gimple_set_plf (stmt, SV1_SAFE, safe);
> +
> +	  /* Instrument bounded loads.
> +	     We instrument non-aggregate loads with non-invariant address.
> +	     The idea is to reliably instrument the bounded load while
> +	     leaving the canary, being it load or store, aggregate or
> +	     non-aggregate, alone.  */
> +	  if (gimple_assign_single_p (stmt)
> +	      && gimple_vuse (stmt)
> +	      && !gimple_vdef (stmt)
> +	      && !zero_ssa_operands (stmt, SSA_OP_USE))
> +	    {
> +	      tree new_mem = instrument_mem (&gsi, gimple_assign_rhs1 (stmt),
> +					     mask);
> +	      gimple_assign_set_rhs1 (stmt, new_mem);
> +	      update_stmt (stmt);
> +	      /* The value loaded my a masked load is "safe".  */
> +	      gimple_set_plf (stmt, SV1_SAFE, true);
> +	    }
> +
> +	  /* Instrument return store to TLS mask.  */
> +	  if (flag_spectrev1 >= 3
> +	      && gimple_code (stmt) == GIMPLE_RETURN)
> +	    {
> +	      new_stmt = gimple_build_assign (spectrev1_tls_mask_decl, mask);
> +	      gsi_insert_before (&gsi, new_stmt, GSI_SAME_STMT);
> +	    }
> +	  /* Instrument calls with store/load to/from TLS mask.
> +	     ???  Placement of the stores/loads can be optimized in a LCM
> +	     way.  */
> +	  else if (flag_spectrev1 >= 3
> +		   && is_gimple_call (stmt)
> +		   && gimple_vuse (stmt))
> +	    {
> +	      new_stmt = gimple_build_assign (spectrev1_tls_mask_decl, mask);
> +	      gsi_insert_before (&gsi, new_stmt, GSI_SAME_STMT);
> +	      if (!stmt_ends_bb_p (stmt))
> +		{
> +		  new_stmt = gimple_build_assign (mask,
> +						  spectrev1_tls_mask_decl);
> +		  gsi_insert_after (&gsi, new_stmt, GSI_NEW_STMT);
> +		}
> +	      else
> +		{
> +		  edge_iterator ei;
> +		  edge e;
> +		  FOR_EACH_EDGE (e, ei, bb->succs)
> +		    {
> +		      if (e->flags & EDGE_ABNORMAL)
> +			continue;
> +		      new_stmt = gimple_build_assign (mask,
> +						      spectrev1_tls_mask_decl);
> +		      gsi_insert_on_edge (e, new_stmt);
> +		    }
> +		}
> +	      instrumented_call_p = true;
> +	    }
> +	}
> +
> +      if (EDGE_COUNT (bb->succs) > 1)
> +	{
> +	  gcond *stmt = safe_dyn_cast <gcond *> (last_stmt (bb));
> +	  /* ???  What about switches?  What about badly speculated EH?  */
> +	  if (!stmt)
> +	    continue;
> +
> +	  /* Instrument conditional branches to track mis-speculation
> +	     via a pointer-sized mask.
> +	     ???  We could restrict to instrumenting those conditions
> +	     that control interesting loads or apply simple heuristics
> +	     like not instrumenting FP compares or equality compares
> +	     which are unlikely bounds checks.  But we have to instrument
> +	     bool != 0 because multiple conditions might have been
> +	     combined.  */
> +	  edge truee, falsee;
> +	  extract_true_false_edges_from_block (bb, &truee, &falsee);
> +	  /* Unless -fspectre-v1=2 we do not instrument loop exit tests.  */
> +	  if (flag_spectrev1 >= 2
> +	      || !loop_exits_from_bb_p (bb->loop_father, bb))
> +	    {
> +	      gimple_stmt_iterator gsi = gsi_last_bb (bb);
> +
> +	      /* Instrument
> +	           if (a_1 > b_2)
> +		 as
> +	           tem_mask_3 = a_1 > b_2 ? -1 : 0;
> +		   if (tem_mask_3 != 0)
> +		 this will result in a
> +		   xor %eax, %eax; cmp|test; setCC %al; sub $0x1, %eax; jne
> +		 sequence which is faster in practice than when retaining
> +		 the original jump condition.  This is 10 bytes overhead
> +		 on x86_64 plus 3 bytes for an and on the true path and
> +		 5 bytes for an and and not on the false path.  */
> +	      tree tem_mask = make_ssa_name (ptr_type_node);
> +	      new_stmt = gimple_build_assign (tem_mask, COND_EXPR,
> +					      build2 (gimple_cond_code (stmt),
> +						      boolean_type_node,
> +						      gimple_cond_lhs (stmt),
> +						      gimple_cond_rhs (stmt)),
> +					      build_all_ones_cst (ptr_type_node),
> +					      build_zero_cst (ptr_type_node));
> +	      gsi_insert_before (&gsi, new_stmt, GSI_SAME_STMT);
> +	      gimple_cond_set_code (stmt, NE_EXPR);
> +	      gimple_cond_set_lhs (stmt, tem_mask);
> +	      gimple_cond_set_rhs (stmt, build_zero_cst (ptr_type_node));
> +	      update_stmt (stmt);
> +
> +	      /* On the false edge
> +	           mask = mask & ~tem_mask_3;  */
> +	      gimple_seq tems = NULL;
> +	      tree tem_mask2 = make_ssa_name (ptr_type_node);
> +	      new_stmt = gimple_build_assign (tem_mask2, BIT_NOT_EXPR,
> +					      tem_mask);
> +	      gimple_seq_add_stmt_without_update (&tems, new_stmt);
> +	      new_stmt = gimple_build_assign (mask, BIT_AND_EXPR,
> +					      mask, tem_mask2);
> +	      gimple_seq_add_stmt_without_update (&tems, new_stmt);
> +	      gsi_insert_seq_on_edge (falsee, tems);
> +
> +	      /* On the true edge
> +	           mask = mask & tem_mask_3;  */
> +	      new_stmt = gimple_build_assign (mask, BIT_AND_EXPR,
> +					      mask, tem_mask);
> +	      gsi_insert_on_edge (truee, new_stmt);
> +	    }
> +	}
> +    }
> +
> +  gsi_commit_edge_inserts ();
> +
> +  return 0;
> +}
> +
> +} // anon namespace
> +
> +gimple_opt_pass *
> +make_pass_spectrev1 (gcc::context *ctxt)
> +{
> +  return new pass_spectrev1 (ctxt);
> +}
> diff --git a/gcc/params.def b/gcc/params.def
> index 6f98fccd291..19f7dbf4dad 100644
> --- a/gcc/params.def
> +++ b/gcc/params.def
> @@ -1378,6 +1378,11 @@ DEFPARAM(PARAM_LOOP_VERSIONING_MAX_OUTER_INSNS,
>  	 " loops.",
>  	 100, 0, 0)
>  
> +DEFPARAM(PARAM_SPECTRE_V1_MAX_INSTRUMENT_INDICES,
> +	 "spectre-v1-max-instrument-indices",
> +	 "Maximum number of indices to instrument before instrumenting the whole address.",
> +	 1, 0, 0)
> +
>  /*
>  
>  Local variables:
> diff --git a/gcc/passes.def b/gcc/passes.def
> index 144df4fa417..2fe0cdcfa7e 100644
> --- a/gcc/passes.def
> +++ b/gcc/passes.def
> @@ -400,6 +400,7 @@ along with GCC; see the file COPYING3.  If not see
>    NEXT_PASS (pass_lower_resx);
>    NEXT_PASS (pass_nrv);
>    NEXT_PASS (pass_cleanup_cfg_post_optimizing);
> +  NEXT_PASS (pass_spectrev1);
>    NEXT_PASS (pass_warn_function_noreturn);
>    NEXT_PASS (pass_gen_hsail);
>  
> diff --git a/gcc/testsuite/gcc.dg/Wspectre-v1-1.c b/gcc/testsuite/gcc.dg/Wspectre-v1-1.c
> new file mode 100644
> index 00000000000..3ac647e72fd
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/Wspectre-v1-1.c
> @@ -0,0 +1,10 @@
> +/* { dg-do compile } */
> +/* { dg-options "-Wspectre-v1" } */
> +
> +unsigned char a[1024];
> +int b[256];
> +int foo (int i, int bound)
> +{
> +  if (i < bound)
> +    return b[a[i]];  /* { dg-warning "spectrev1" } */
> +}
> diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
> index 9f9d85fdbc3..f5c164f465f 100644
> --- a/gcc/tree-pass.h
> +++ b/gcc/tree-pass.h
> @@ -625,6 +625,7 @@ extern gimple_opt_pass *make_pass_local_fn_summary (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_update_address_taken (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_convert_switch (gcc::context *ctxt);
>  extern gimple_opt_pass *make_pass_lower_vaarg (gcc::context *ctxt);
> +extern gimple_opt_pass *make_pass_spectrev1 (gcc::context *ctxt);
>  
>  /* Current optimization pass.  */
>  extern opt_pass *current_pass;
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Spectre V1 diagnostic / mitigation
  2018-12-18 16:17 ` Jeff Law
@ 2018-12-19 11:16   ` Richard Biener
  0 siblings, 0 replies; 17+ messages in thread
From: Richard Biener @ 2018-12-19 11:16 UTC (permalink / raw)
  To: Jeff Law; +Cc: gcc

On Tue, 18 Dec 2018, Jeff Law wrote:

> On 12/18/18 8:36 AM, Richard Biener wrote:
> > 
> > Hi,
> > 
> > in the past weeks I've been looking into prototyping both spectre V1 
> > (speculative array bound bypass) diagnostics and mitigation in an
> > architecture independent manner to assess feasability and some kind
> > of upper bound on the performance impact one can expect.
> > https://lists.llvm.org/pipermail/llvm-dev/2018-March/122085.html is
> > an interesting read in this context as well.
> > 
> > For simplicity I have implemented mitigation on GIMPLE right before
> > RTL expansion and have chosen TLS to do mitigation across function
> > boundaries.  Diagnostics sit in the same place but both are not in
> > any way dependent on each other.
> > 
> > The mitigation strategy chosen is that of tracking speculation
> > state via a mask that can be used to zero parts of the addresses
> > that leak the actual data.  That's similar to what aarch64 does
> > with -mtrack-speculation (but oddly there's no mitigation there).
> > 
> > I've optimized things to the point that is reasonable when working
> > target independent on GIMPLE but I've only looked at x86 assembly
> > and performance.  I expect any "final" mitigation if we choose to
> > implement and integrate such would be after RTL expansion since
> > RTL expansion can end up introducing quite some control flow whose
> > speculation state is not properly tracked by the prototype.
> > 
> > I'm cut&pasting single-runs of SPEC INT 2006/2017 here, the runs
> > were done with -O2 [-fspectre-v1={2,3}] where =2 is function-local
> > mitigation and =3 does mitigation global with passing the state
> > via TLS memory.
> > 
> > The following was measured on a Haswell desktop CPU:
> [ ... ]
> Interesting.  So we'd been kicking this issue around a bit internally.
> 
> The number of packages where we'd want to turn this on was very small
> and thus it was difficult to justify burning resources in this space.
> LLVM might be an option for those limited packages, but LLVM is missing
> other security things we don't want to lose (such as stack clash
> mitigation).
> 
> In the end we punted for the immediate future.  We'll almost certainly
> revisit at some point and your prototype would obviously factor into the
> calculus around future decisions.
> 
> [ ... ]
> 
> 
> > 
> > 
> > The patch relies heavily on RTL optimizations for DCE purposes.  At the
> > same time we rely on RTL not statically computing the mask (RTL has no
> > conditional constant propagation).  Full instrumentation of the classic
> > Spectre V1 testcase
> Right. But it does do constant propagation into arms of conditionals as
> well as jump threading.  I'd fear they might compromise things.

jump threading shouldn't be an issue since that elides the conditional.
I didn't see constant propagation into arms of conditionals happening.
We don't do that on GIMPLE either ;)  I guess I have avoided this
by making the condition data dependent on the mask.  That is, I
transform

  if (a > b)

to

  mask = a > b ? -1 : 0;
  if (mask)
    ...

so one need to replace the condition with the mask computation
conditional.

But yes, for a "final" solution that also gives more control to
targets I thought of allowing (with fallback doing sth like above)
the targets to supply a set-mask-and-jump pattern combining
conditional, mask generation and jump.  I guess those would look
similar to the -fwrapv plusv patterns we have in i386.md.

> Obviously we'd need to look further into those issues.  But even if they
> do, something like what you've done may mitigate enough vulnerable
> sequences that it's worth doing, even if there's some gaps due to "over"
> optimization in the RTL space.

Yeah.  Note I was just lazy and thus didn't elide useless loads/stores
of the TLS var for adjacent calls or avoided instrumenting cases
where there will be no uses of the mask, etc.  With some simple
(even non-LCM) insertion optimization the dependence on dce/dse
can be avoided.

Richard.

> [  ... ]
> 
> > 
> > so the generated GIMPLE was "tuned" for reasonable x86 assembler outcome.
> > 
> > Patch below for reference (and your own testing in case you are curious).
> > I do not plan to pursue this further at this point.
> Understood.  Thanks for posting it.  We're not currently working in this
> space, but again, we may re-evaluate that stance in the future.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Spectre V1 diagnostic / mitigation
  2018-12-18 16:48 ` Richard Earnshaw (lists)
@ 2018-12-19 11:25   ` Richard Biener
  2018-12-19 11:34     ` Florian Weimer
  2018-12-19 15:42     ` Richard Earnshaw (lists)
  0 siblings, 2 replies; 17+ messages in thread
From: Richard Biener @ 2018-12-19 11:25 UTC (permalink / raw)
  To: Richard Earnshaw (lists); +Cc: gcc

On Tue, 18 Dec 2018, Richard Earnshaw (lists) wrote:

> On 18/12/2018 15:36, Richard Biener wrote:
> > 
> > Hi,
> > 
> > in the past weeks I've been looking into prototyping both spectre V1 
> > (speculative array bound bypass) diagnostics and mitigation in an
> > architecture independent manner to assess feasability and some kind
> > of upper bound on the performance impact one can expect.
> > https://lists.llvm.org/pipermail/llvm-dev/2018-March/122085.html is
> > an interesting read in this context as well.
> 
> Interesting, thanks for posting this.
> 
> > 
> > For simplicity I have implemented mitigation on GIMPLE right before
> > RTL expansion and have chosen TLS to do mitigation across function
> > boundaries.  Diagnostics sit in the same place but both are not in
> > any way dependent on each other.
> 
> We considered using TLS for propagating the state across call-boundaries
> on AArch64, but rejected it for several reasons.
> 
> - It's quite expensive to have to set up the TLS state in every function;
> - It requires some global code to initialize the state variable - that's
> kind of ABI;

The cost is probably target dependent - on x86 it's simply a $fs based
load/store.  For initialization a static initializer seemed to work
for me (but honestly I didn't do any testing besides running the
testsuite for correctness - so at least the mask wasn't zero initialized).
Note the LLVM people use an inverted mask and cancel values by
OR-ing -1 instead of AND-ing 0.  At least default zero-initialization
should be possible with TLS vars.

That said, my choice of TLS was to make this trivially work across
targets - if a target can do better then it should.  And of course
the target may not have any TLS support besides emultls which would
be prohibitly expensive.

> - It also seems likely to be vulnerable to Spectre variant 4 - unless
> the CPU can always correctly store-to-load forward the speculation
> state, then you have the situation where the load may see an old value
> of the state - and that's almost certain to say "we're not speculating".
> 
> The last one is really the killer here.

Hmm, as far as I understood v4 only happens when store-forwarding
doesn't work.  And I hope it doesn't fail "randomly" but works
reliable when all accesses to the memory are aligned and have
the same size as is the case with these compiler-generated TLS
accesses.  But yes, if that's not guaranteed then using memory
doesn't work at all.  Not sure what else target independent there
is though that doesn't break the ABI like simply adding another
parameter.  And even adding a parameter might not work in case
there's only stack passing and V4 happens on the stack accesses...

> > 
> > The mitigation strategy chosen is that of tracking speculation
> > state via a mask that can be used to zero parts of the addresses
> > that leak the actual data.  That's similar to what aarch64 does
> > with -mtrack-speculation (but oddly there's no mitigation there).
> 
> We rely on the user inserting the new builtin, which we can more
> effectively optimize if the compiler is generating speculation state
> tracking data.  That doesn't preclude a full solution at a later date,
> but it looked like it was likely overkill for protecting every load and
> safely pruning the loads is not an easy problem to solve.  Of course,
> the builtin does require the programmer to do some work to identify
> which memory accesses might be vulnerable.

My main question was how in earth the -mtrack-speculation overhead
is reasonable for the very few expected explicit builtin uses...

Richard.

> R.
> 
> 
> > 
> > I've optimized things to the point that is reasonable when working
> > target independent on GIMPLE but I've only looked at x86 assembly
> > and performance.  I expect any "final" mitigation if we choose to
> > implement and integrate such would be after RTL expansion since
> > RTL expansion can end up introducing quite some control flow whose
> > speculation state is not properly tracked by the prototype.
> > 
> > I'm cut&pasting single-runs of SPEC INT 2006/2017 here, the runs
> > were done with -O2 [-fspectre-v1={2,3}] where =2 is function-local
> > mitigation and =3 does mitigation global with passing the state
> > via TLS memory.
> > 
> > The following was measured on a Haswell desktop CPU:
> > 
> > 	-O2 vs. -O2 -fspectre-v1=2
> > 
> >                                   Estimated                       Estimated
> >                 Base     Base       Base        Peak     Peak       Peak
> > Benchmarks      Ref.   Run Time     Ratio       Ref.   Run Time     Ratio
> > -------------- ------  ---------  ---------    ------  ---------  ---------
> > 400.perlbench    9770        245       39.8 *    9770        452       21.6 *  184%
> > 401.bzip2        9650        378       25.5 *    9650        726       13.3 *  192%
> > 403.gcc          8050        236       34.2 *    8050        352       22.8 *  149%
> > 429.mcf          9120        223       40.9 *    9120        656       13.9 *  294%
> > 445.gobmk       10490        400       26.2 *   10490        666       15.8 *  167%
> > 456.hmmer        9330        388       24.1 *    9330        536       17.4 *  138%
> > 458.sjeng       12100        437       27.7 *   12100        661       18.3 *  151%
> > 462.libquantum  20720        300       69.1 *   20720        384       53.9 *  128%
> > 464.h264ref     22130        451       49.1 *   22130        586       37.8 *  130%
> > 471.omnetpp      6250        291       21.5 *    6250        398       15.7 *  137%
> > 473.astar        7020        334       21.0 *    7020        522       13.5 *  156%
> > 483.xalancbmk    6900        182       37.9 *    6900        306       22.6 *  168%
> >  Est. SPECint_base2006                   --
> >  Est. SPECint2006                                                        --
> > 
> >    -O2 -fspectre-v1=3
> > 
> >                                   Estimated                       Estimated
> >                 Base     Base       Base        Peak     Peak       Peak
> > Benchmarks      Ref.   Run Time     Ratio       Ref.   Run Time     Ratio
> > -------------- ------  ---------  ---------    ------  ---------  ---------
> > 400.perlbench                                    9770        497       19.6 *  203%
> > 401.bzip2                                        9650        772       12.5 *  204%
> > 403.gcc                                          8050        427       18.9 *  181%
> > 429.mcf                                          9120        696       13.1 *  312%
> > 445.gobmk                                       10490        726       14.4 *  181%
> > 456.hmmer                                        9330        537       17.4 *  138%
> > 458.sjeng                                       12100        721       16.8 *  165%
> > 462.libquantum                                  20720        446       46.4 *  149%
> > 464.h264ref                                     22130        613       36.1 *  136%
> > 471.omnetpp                                      6250        471       13.3 *  162%
> > 473.astar                                        7020        579       12.1 *  173%
> > 483.xalancbmk                                    6900        350       19.7 *  192%
> >  Est. SPECint(R)_base2006           Not Run
> >  Est. SPECint2006                                                        --
> > 
> > 
> > While the following was measured on a Zen Epyc server:
> > 
> > -O2 vs -O2 -fspectre-v1=2
> > 
> >                        Estimated                       Estimated
> >                  Base     Base        Base        Peak     Peak        Peak
> > Benchmarks       Copies  Run Time     Rate        Copies  Run Time     Rate
> > --------------- -------  ---------  ---------    -------  ---------  ---------
> > 500.perlbench_r       1        499       3.19  *       1        621       2.56  * 124%
> > 502.gcc_r             1        286       4.95  *       1        392       3.61  * 137%
> > 505.mcf_r             1        331       4.88  *       1        456       3.55  * 138%
> > 520.omnetpp_r         1        454       2.89  *       1        563       2.33  * 124%
> > 523.xalancbmk_r       1        328       3.22  *       1        569       1.86  * 173%
> > 525.x264_r            1        518       3.38  *       1        776       2.26  * 150%
> > 531.deepsjeng_r       1        365       3.14  *       1        448       2.56  * 123%
> > 541.leela_r           1        598       2.77  *       1        729       2.27  * 122%
> > 548.exchange2_r       1        460       5.69  *       1        756       3.46  * 164%
> > 557.xz_r              1        403       2.68  *       1        586       1.84  * 145%
> >  Est. SPECrate2017_int_base              3.55
> >  Est. SPECrate2017_int_peak                                               2.56    72%
> > 
> > -O2 -fspectre-v2=3
> > 
> >                        Estimated                       Estimated
> >                  Base     Base        Base        Peak     Peak        Peak
> > Benchmarks       Copies  Run Time     Rate        Copies  Run Time     Rate
> > --------------- -------  ---------  ---------    -------  ---------  ---------
> > 500.perlbench_r                               NR       1        700       2.27  * 140%
> > 502.gcc_r                                     NR       1        485       2.92  * 170%
> > 505.mcf_r                                     NR       1        596       2.71  * 180%
> > 520.omnetpp_r                                 NR       1        604       2.17  * 133%
> > 523.xalancbmk_r                               NR       1        643       1.64  * 196%
> > 525.x264_r                                    NR       1        797       2.20  * 154%
> > 531.deepsjeng_r                               NR       1        542       2.12  * 149%
> > 541.leela_r                                   NR       1        872       1.90  * 146%
> > 548.exchange2_r                               NR       1        761       3.44  * 165%
> > 557.xz_r                                      NR       1        595       1.81  * 148%
> >  Est. SPECrate2017_int_base           Not Run
> >  Est. SPECrate2017_int_peak                                               2.26    64%
> > 
> > 
> > 
> > you can see, even thoug we're comparing apples and oranges, that the 
> > performance impact is quite dependent on the microarchitecture.
> > 
> > Similarly interesting as performance is the effect on text size which is
> > surprisingly high (_best_ case is 13 bytes per conditional branch plus 3
> > bytes per instrumented memory).
> > 
> > CPU2016:
> >    BASE  -O2
> >    text	   data	    bss	    dec	    hex	filename
> > 1117726	  20928	  12704	1151358	 11917e	400.perlbench
> >   56568	   3800	   4416	  64784	   fd10	401.bzip2
> > 3419568	   7912	 751520	4179000	 3fc438	403.gcc
> >   12212	    712	  11984	  24908	   614c	429.mcf
> > 1460694	2081772	2330096	5872562	 599bb2	445.gobmk
> >  284929	   5956	  82040	 372925	  5b0bd	456.hmmer
> >  130782	   2152	2576896	2709830	 295946	458.sjeng
> >   41915	    764	     96	  42775	   a717	462.libquantum
> >  505452	  11220	 372320	 888992	  d90a0	464.h264ref
> >  638188	   9584	  14664	 662436	  a1ba4	471.omnetpp
> >   38859	    900	   5216	  44975	   afaf	473.astar
> > 4033878	 140248	  12168	4186294	 3fe0b6	483.xalancbmk
> >    PEAK -O2 -fspectre-v1=2
> >    text	   data	    bss	    dec	    hex	filename
> > 1508032	  20928	  12704	1541664	 178620	400.perlbench	135%
> >   76098	   3800	   4416	  84314	  1495a	401.bzip2	135%
> > 4483530	   7912	 751520	5242962	 500052	403.gcc		131%
> >   16006	    712	  11984	  28702	   701e	429.mcf		131%
> > 1647384	2081772	2330096	6059252	 5c74f4	445.gobmk	112%
> >  377259	   5956	  82040	 465255	  71967	456.hmmer	132%
> >  164672	   2152	2576896	2743720	 29dda8	458.sjeng	126%
> >   47901	    764	     96	  48761	   be79	462.libquantum	114%
> >  649854	  11220	 372320	1033394	  fc4b2	464.h264ref	129%
> >  706908	   9584	  14664	 731156	  b2814	471.omnetpp	111%
> >   48493	    900	   5216	  54609	   d551	473.astar	125%
> > 4862056	 140248	  12168	5014472	 4c83c8	483.xalancbmk	121%
> >    PEAK -O2 -fspectre-v1=3
> >    text	   data	    bss	    dec	    hex	filename
> > 1742008	  20936	  12704	1775648	 1b1820	400.perlbench	156%
> >   83338	   3808	   4416	  91562	  165aa	401.bzip2	147%
> > 5219850	   7920	 751520	5979290	 5b3c9a	403.gcc		153%
> >   17422	    720	  11984	  30126	   75ae	429.mcf		143%
> > 1801688	2081780	2330096	6213564	 5ecfbc	445.gobmk	123%
> >  431827	   5964	  82040	 519831	  7ee97	456.hmmer	152%
> >  182200	   2160	2576896	2761256	 2a2228	458.sjeng	139%
> >   53773	    772	     96	  54641	   d571	462.libquantum	128%
> >  691798	  11228	 372320	1075346	 106892	464.h264ref	137%
> >  976692	   9592	  14664	1000948	  f45f4	471.omnetpp	153%
> >   54525	    908	   5216	  60649	   ece9	473.astar	140%
> > 5808306	 140256	  12168	5960730	 5af41a	483.xalancbmk	144%
> > 
> > CPU2017:
> >    BASE -O2 -g
> >    text    data     bss     dec     hex filename
> > 2209713    8576    9080 2227369  21fca9 500.perlbench_r
> > 9295702   37432 1150664 10483798 9ff856 502.gcc_r
> >   21795     712     744   23251    5ad3 505.mcf_r
> > 2067560    8984   46888 2123432  2066a8 520.omnetpp_r
> > 5763577  142584   20040 5926201  5a6d39 523.xalancbmk_r
> >  508402    6102   29592  544096   84d60 525.x264_r
> >   84222     784 12138360 12223366 ba8386 531.deepsjeng_r
> >  223480    8544   30072  262096   3ffd0 541.leela_r
> >   70554     864    6384   77802   12fea 548.exchange2_r
> >  180640     884   17704  199228   30a3c 557.xz_r
> >    PEAK -fspectre-v2=2
> >    text    data     bss     dec     hex filename
> > 2991161    8576    9080 3008817  2de931 500.perlbench_r	135%
> > 12244886  37432 1150664 13432982 ccf896 502.gcc_r	132%
> >   28475     712     744   29931    74eb 505.mcf_r	131%
> > 2397026    8984   46888 2452898  256da2 520.omnetpp_r	116%
> > 6846853  142584   20040 7009477  6af4c5 523.xalancbmk_r	119%
> >  645730    6102   29592  681424   a65d0 525.x264_r	127%
> >  111166     784 12138360 12250310 baecc6 531.deepsjeng_r 132%
> >  260835    8544   30072  299451   491bb 541.leela_r     117%
> >   96874     864    6384  104122   196ba 548.exchange2_r	137%
> >  215288     884   17704  233876   39194 557.xz_r	119%
> >    PEAK -fspectre-v2=3
> >    text    data     bss     dec     hex filename
> > 3365945    8584    9080 3383609  33a139 500.perlbench_r	152%
> > 14790638  37440 1150664 15978742 f3d0f6 502.gcc_r	159%
> >   31419     720     744   32883    8073 505.mcf_r	144%
> > 2867893    8992   46888 2923773  2c9cfd 520.omnetpp_r	139%
> > 8183689  142592   20040 8346321  7f5ad1 523.xalancbmk_r	142%
> >  697434    6110   29592  733136   b2fd0 525.x264_r	137%
> >  123638     792 12138360 12262790 bb1d86 531.deepsjeng_r 147%
> >  315347    8552   30072  353971   566b3 541.leela_r	141%
> >   98578     872    6384  105834   19d6a 548.exchange2_r	140%
> >  239144     892   17704  257740   3eecc 557.xz_r	133%
> > 
> > 
> > The patch relies heavily on RTL optimizations for DCE purposes.  At the
> > same time we rely on RTL not statically computing the mask (RTL has no
> > conditional constant propagation).  Full instrumentation of the classic
> > Spectre V1 testcase
> > 
> > char a[1024];
> > int b[1024];
> > int foo (int i, int bound)
> > {
> >   if (i < bound)
> >     return b[a[i]];
> > }
> > 
> > is the following:
> > 
> > foo:
> > .LFB0:  
> >         .cfi_startproc
> >         xorl    %eax, %eax
> >         cmpl    %esi, %edi
> >         setge   %al
> >         subq    $1, %rax
> >         jne     .L4
> >         ret
> >         .p2align 4,,10
> >         .p2align 3
> > .L4:
> >         andl    %eax, %edi
> >         movslq  %edi, %rdi
> >         movsbq  a(%rdi), %rax
> >         movl    b(,%rax,4), %eax
> >         ret
> > 
> > so the generated GIMPLE was "tuned" for reasonable x86 assembler outcome.
> > 
> > Patch below for reference (and your own testing in case you are curious).
> > I do not plan to pursue this further at this point.
> > 
> > Richard.
> > 
> > From 01e4a5a43e266065d32489daa50de0cf2425d5f5 Mon Sep 17 00:00:00 2001
> > From: Richard Guenther <rguenther@suse.de>
> > Date: Wed, 5 Dec 2018 13:17:02 +0100
> > Subject: [PATCH] warn-spectrev1
> > 
> > 
> > diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> > index 7960cace16a..64d472d7fa0 100644
> > --- a/gcc/Makefile.in
> > +++ b/gcc/Makefile.in
> > @@ -1334,6 +1334,7 @@ OBJS = \
> >  	gimple-ssa-sprintf.o \
> >  	gimple-ssa-warn-alloca.o \
> >  	gimple-ssa-warn-restrict.o \
> > +	gimple-ssa-spectrev1.o \
> >  	gimple-streamer-in.o \
> >  	gimple-streamer-out.o \
> >  	gimple-walk.o \
> > diff --git a/gcc/common.opt b/gcc/common.opt
> > index 45d7f6189e5..1ae7fcfe177 100644
> > --- a/gcc/common.opt
> > +++ b/gcc/common.opt
> > @@ -702,6 +702,10 @@ Warn when one local variable shadows another local variable or parameter of comp
> >  Wshadow-compatible-local
> >  Common Warning Undocumented Alias(Wshadow=compatible-local)
> >  
> > +Wspectre-v1
> > +Common Var(warn_spectrev1) Warning
> > +Warn about code susceptible to spectre v1 style attacks.
> > +
> >  Wstack-protector
> >  Common Var(warn_stack_protect) Warning
> >  Warn when not issuing stack smashing protection for some reason.
> > @@ -2406,6 +2410,14 @@ fsingle-precision-constant
> >  Common Report Var(flag_single_precision_constant) Optimization
> >  Convert floating point constants to single precision constants.
> >  
> > +fspectre-v1
> > +Common Alias(fspectre-v1=, 2, 0)
> > +Insert code to mitigate spectre v1 style attacks.
> > +
> > +fspectre-v1=
> > +Common Report RejectNegative Joined UInteger IntegerRange(0, 3) Var(flag_spectrev1) Optimization
> > +Insert code to mitigate spectre v1 style attacks.
> > +
> >  fsplit-ivs-in-unroller
> >  Common Report Var(flag_split_ivs_in_unroller) Init(1) Optimization
> >  Split lifetimes of induction variables when loops are unrolled.
> > diff --git a/gcc/gimple-ssa-spectrev1.cc b/gcc/gimple-ssa-spectrev1.cc
> > new file mode 100644
> > index 00000000000..c2a5dc95324
> > --- /dev/null
> > +++ b/gcc/gimple-ssa-spectrev1.cc
> > @@ -0,0 +1,824 @@
> > +/* Loop interchange.
> > +   Copyright (C) 2017-2018 Free Software Foundation, Inc.
> > +   Contributed by ARM Ltd.
> > +
> > +This file is part of GCC.
> > +
> > +GCC is free software; you can redistribute it and/or modify it
> > +under the terms of the GNU General Public License as published by the
> > +Free Software Foundation; either version 3, or (at your option) any
> > +later version.
> > +
> > +GCC is distributed in the hope that it will be useful, but WITHOUT
> > +ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> > +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
> > +for more details.
> > +
> > +You should have received a copy of the GNU General Public License
> > +along with GCC; see the file COPYING3.  If not see
> > +<http://www.gnu.org/licenses/>.  */
> > +
> > +#include "config.h"
> > +#include "system.h"
> > +#include "coretypes.h"
> > +#include "backend.h"
> > +#include "is-a.h"
> > +#include "tree.h"
> > +#include "gimple.h"
> > +#include "tree-pass.h"
> > +#include "ssa.h"
> > +#include "gimple-pretty-print.h"
> > +#include "gimple-iterator.h"
> > +#include "params.h"
> > +#include "tree-ssa.h"
> > +#include "cfganal.h"
> > +#include "gimple-walk.h"
> > +#include "tree-ssa-loop.h"
> > +#include "tree-dfa.h"
> > +#include "tree-cfg.h"
> > +#include "fold-const.h"
> > +#include "builtins.h"
> > +#include "alias.h"
> > +#include "cfgloop.h"
> > +#include "varasm.h"
> > +#include "cgraph.h"
> > +#include "gimple-fold.h"
> > +#include "diagnostic.h"
> > +
> > +/* The Spectre V1 situation is as follows:
> > +
> > +      if (attacker_controlled_idx < bound)  // speculated as true but is false
> > +        {
> > +	  // out-of-bound access, returns value interesting to attacker
> > +	  val = mem[attacker_controlled_idx];
> > +	  // access that causes a cache-line to be brought in - canary
> > +	  ... = attacker_controlled_mem[val];
> > +	}
> > +
> > +   The last load provides the side-channel.  The pattern can be split
> > +   into multiple functions or translation units.  Conservatively we'd
> > +   have to warn about
> > +
> > +      int foo (int *a) {  return *a; }
> > +
> > +   thus any indirect (or indexed) memory access.  That's obvioulsy
> > +   not useful.
> > +
> > +   The next level would be to warn only when we see load of val as
> > +   well.  That then misses cases like
> > +
> > +      int foo (int *a, int *b)
> > +      {
> > +        int idx = load_it (a);
> > +	return load_it (&b[idx]);
> > +      }
> > +
> > +   Still we'd warn about cases like
> > +
> > +      struct Foo { int *a; };
> > +      int foo (struct Foo *a) { return *a->a; }
> > +
> > +   though dereferencing VAL isn't really an interesting case.  It's
> > +   hard to exclude this conservatively so the obvious solution is
> > +   to restrict the kind of loads that produce val, for example based
> > +   on its type or its number of bits.  It's tempting to do this at
> > +   the point of the load producing val but in the end what matters
> > +   is the number of bits that reach the second loads [as index] given
> > +   there are practical limits on the size of the canary.  For this
> > +   we have to consider
> > +
> > +      int foo (struct Foo *a, int *b)
> > +      {
> > +        int *c = a->a;
> > +	int idx = *b;
> > +	return *(c + idx);
> > +      }
> > +
> > +   where idx has too many bits to be an interesting attack vector(?).
> > + */
> > +
> > +/* The pass does two things, first it performs data flow analysis
> > +   to be able to warn about the second load.  This is controlled
> > +   via -Wspectre-v1.
> > +
> > +   Second it instruments control flow in the program to track a
> > +   mask which is all-ones but all-zeroes if the CPU speculated
> > +   a branch in the wrong direction.  This mask is then used to
> > +   mask the address[-part(s)] of loads with non-invariant addresses,
> > +   effectively mitigating the attack.  This is controlled by
> > +   -fpectre-v1[=N] where N is default 2 and
> > +     1  optimistically omit some instrumentations (currently
> > +        backedge control flow instructions do not update the
> > +	speculation mask)
> > +     2  instrument conservatively using a function-local speculation
> > +        mask
> > +     3  instrument conservatively using a global (TLS) speculation
> > +        mask.  This adds TLS loads/stores of the speculation mask
> > +	at function boundaries and before and after calls.
> > + */
> > +
> > +/* We annotate statements whose defs cannot be used to leaking data
> > +   speculatively via loads with SV1_SAFE.  This is used to optimize
> > +   masking of indices where masked indices (and derived by constant
> > +   ones) are not masked again.  Note this works only up to the points
> > +   that possibly change the speculation mask value.  */
> > +#define SV1_SAFE GF_PLF_1
> > +
> > +namespace {
> > +
> > +const pass_data pass_data_spectrev1 =
> > +{
> > +  GIMPLE_PASS, /* type */
> > +  "spectrev1", /* name */
> > +  OPTGROUP_NONE, /* optinfo_flags */
> > +  TV_NONE, /* tv_id */
> > +  PROP_cfg|PROP_ssa, /* properties_required */
> > +  0, /* properties_provided */
> > +  0, /* properties_destroyed */
> > +  0, /* todo_flags_start */
> > +  TODO_update_ssa, /* todo_flags_finish */
> > +};
> > +
> > +class pass_spectrev1 : public gimple_opt_pass
> > +{
> > +public:
> > +  pass_spectrev1 (gcc::context *ctxt)
> > +    : gimple_opt_pass (pass_data_spectrev1, ctxt)
> > +  {}
> > +
> > +  /* opt_pass methods: */
> > +  opt_pass * clone () { return new pass_spectrev1 (m_ctxt); }
> > +  virtual bool gate (function *) { return warn_spectrev1 || flag_spectrev1; }
> > +  virtual unsigned int execute (function *);
> > +
> > +  static bool stmt_is_indexed_load (gimple *);
> > +  static bool stmt_mangles_index (gimple *, tree);
> > +  static bool find_value_dependent_guard (gimple *, tree);
> > +  static void mark_influencing_outgoing_flow (basic_block, tree);
> > +  static tree instrument_mem (gimple_stmt_iterator *, tree, tree);
> > +}; // class pass_spectrev1
> > +
> > +bitmap_head *influencing_outgoing_flow;
> > +
> > +static bool
> > +call_between (gimple *first, gimple *second)
> > +{
> > +  gcc_assert (gimple_bb (first) == gimple_bb (second));
> > +  /* ???  This is inefficient.  Maybe we can use gimple_uid to assign
> > +     unique IDs to stmts belonging to groups with the same speculation
> > +     mask state.  */
> > +  for (gimple_stmt_iterator gsi = gsi_for_stmt (first);
> > +       gsi_stmt (gsi) != second; gsi_next (&gsi))
> > +    if (is_gimple_call (gsi_stmt (gsi)))
> > +      return true;
> > +  return false;
> > +}
> > +
> > +basic_block ctx_bb;
> > +gimple *ctx_stmt;
> > +static bool
> > +gather_indexes (tree, tree *idx, void *data)
> > +{
> > +  vec<tree *> *indexes = (vec<tree *> *)data;
> > +  if (TREE_CODE (*idx) != SSA_NAME)
> > +    return true;
> > +  if (!SSA_NAME_IS_DEFAULT_DEF (*idx)
> > +      && gimple_bb (SSA_NAME_DEF_STMT (*idx)) == ctx_bb
> > +      && gimple_plf (SSA_NAME_DEF_STMT (*idx), SV1_SAFE)
> > +      && (flag_spectrev1 < 3
> > +	  || !call_between (SSA_NAME_DEF_STMT (*idx), ctx_stmt)))
> > +    return true;
> > +  if (indexes->is_empty ())
> > +    indexes->safe_push (idx);
> > +  else if (*(*indexes)[0] == *idx)
> > +    indexes->safe_push (idx);
> > +  else
> > +    return false;
> > +  return true;
> > +}
> > +
> > +tree
> > +pass_spectrev1::instrument_mem (gimple_stmt_iterator *gsi, tree mem, tree mask)
> > +{
> > +  /* First try to see if we can find a single index we can zero which
> > +     has the chance of repeating in other loads and also avoids separate
> > +     LEA and memory references decreasing code size and AGU occupancy.  */
> > +  auto_vec<tree *, 8> indexes;
> > +  ctx_bb = gsi_bb (*gsi);
> > +  ctx_stmt = gsi_stmt (*gsi);
> > +  if (PARAM_VALUE (PARAM_SPECTRE_V1_MAX_INSTRUMENT_INDICES) > 0
> > +      && for_each_index (&mem, gather_indexes, (void *)&indexes))
> > +    {
> > +      /* All indices are safe.  */
> > +      if (indexes.is_empty ())
> > +	return mem;
> > +      if (TYPE_PRECISION (TREE_TYPE (*indexes[0]))
> > +	  <= TYPE_PRECISION (TREE_TYPE (mask)))
> > +	{
> > +	  tree idx = *indexes[0];
> > +	  gcc_assert (INTEGRAL_TYPE_P (TREE_TYPE (idx))
> > +		      || POINTER_TYPE_P (TREE_TYPE (idx)));
> > +	  /* Instead of instrumenting IDX directly we could look at
> > +	     definitions with a single SSA use and instrument that
> > +	     instead.  But we have to do some work to make SV1_SAFE
> > +	     propagation updated then - this would really ask to first
> > +	     gather all indexes of all refs we want to instrument and
> > +	     compute some optimal set of instrumentations.  */
> > +	  gimple_seq seq = NULL;
> > +	  tree idx_mask = gimple_convert (&seq, TREE_TYPE (idx), mask);
> > +	  tree masked_idx = gimple_build (&seq, BIT_AND_EXPR,
> > +					  TREE_TYPE (idx), idx, idx_mask);
> > +	  /* Mark the instrumentation sequence as visited.  */
> > +	  for (gimple_stmt_iterator si = gsi_start (seq);
> > +	       !gsi_end_p (si); gsi_next (&si))
> > +	    gimple_set_visited (gsi_stmt (si), true);
> > +	  gsi_insert_seq_before (gsi, seq, GSI_SAME_STMT);
> > +	  gimple_set_plf (SSA_NAME_DEF_STMT (masked_idx), SV1_SAFE, true);
> > +	  /* Replace downstream users in the BB which reduces register pressure
> > +	     and allows SV1_SAFE propagation to work (which stops at call/BB
> > +	     boundaries though).
> > +	     ???  This is really reg-pressure vs. dependence chains so not
> > +	     a generally easy thing.  Making the following propagate into
> > +	     all uses dominated by the insert slows down 429.mcf even more.
> > +	     ???  We can actually track SV1_SAFE across PHIs but then we
> > +	     have to propagate into PHIs here.  */
> > +	  gimple *use_stmt;
> > +	  use_operand_p use_p;
> > +	  imm_use_iterator iter;
> > +	  FOR_EACH_IMM_USE_STMT (use_stmt, iter, idx)
> > +	    if (gimple_bb (use_stmt) == gsi_bb (*gsi)
> > +		&& gimple_code (use_stmt) != GIMPLE_PHI
> > +		&& !gimple_visited_p (use_stmt))
> > +	      {
> > +		FOR_EACH_IMM_USE_ON_STMT (use_p, iter)
> > +		  SET_USE (use_p, masked_idx);
> > +		update_stmt (use_stmt);
> > +	      }
> > +	  /* Modify MEM in place...  (our stmt is already marked visited).  */
> > +	  for (unsigned i = 0; i < indexes.length (); ++i)
> > +	    *indexes[i] = masked_idx;
> > +	  return mem;
> > +	}
> > +    }
> > +
> > +  /* ???  Can we handle TYPE_REVERSE_STORAGE_ORDER at all?  Need to
> > +     handle BIT_FIELD_REFs.  */
> > +
> > +  /* Strip a bitfield reference to re-apply it at the end.  */
> > +  tree bitfield = NULL_TREE;
> > +  tree bitfield_off = NULL_TREE;
> > +  if (TREE_CODE (mem) == COMPONENT_REF
> > +      && DECL_BIT_FIELD (TREE_OPERAND (mem, 1)))
> > +    {
> > +      bitfield = TREE_OPERAND (mem, 1);
> > +      bitfield_off = TREE_OPERAND (mem, 2);
> > +      mem = TREE_OPERAND (mem, 0);
> > +    }
> > +
> > +  tree ptr_base = mem;
> > +  /* VIEW_CONVERT_EXPRs do not change offset, strip them, they get folded
> > +     into the MEM_REF we create.  */
> > +  while (TREE_CODE (ptr_base) == VIEW_CONVERT_EXPR)
> > +    ptr_base = TREE_OPERAND (ptr_base, 0);
> > +
> > +  tree ptr = make_ssa_name (ptr_type_node);
> > +  gimple *new_stmt = gimple_build_assign (ptr, build_fold_addr_expr (ptr_base));
> > +  gimple_set_visited (new_stmt, true);
> > +  gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT);
> > +  ptr = make_ssa_name (ptr_type_node);
> > +  new_stmt = gimple_build_assign (ptr, BIT_AND_EXPR,
> > +				  gimple_assign_lhs (new_stmt), mask);
> > +  gimple_set_visited (new_stmt, true);
> > +  gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT);
> > +  tree type = TREE_TYPE (mem);
> > +  unsigned align = get_object_alignment (mem);
> > +  if (align != TYPE_ALIGN (type))
> > +    type = build_aligned_type (type, align);
> > +
> > +  tree new_mem = build2 (MEM_REF, type, ptr,
> > +			 build_int_cst (reference_alias_ptr_type (mem), 0));
> > +  if (bitfield)
> > +    new_mem = build3 (COMPONENT_REF, TREE_TYPE (bitfield), new_mem,
> > +		      bitfield, bitfield_off);
> > +  return new_mem;
> > +}
> > +
> > +bool
> > +check_spectrev1_2nd_load (tree, tree *idx, void *data)
> > +{
> > +  sbitmap value_from_indexed_load = (sbitmap)data;
> > +  if (TREE_CODE (*idx) == SSA_NAME
> > +      && bitmap_bit_p (value_from_indexed_load, SSA_NAME_VERSION (*idx)))
> > +    return false;
> > +  return true;
> > +}
> > +
> > +bool
> > +check_spectrev1_2nd_load (gimple *, tree, tree ref, void *data)
> > +{
> > +  return !for_each_index (&ref, check_spectrev1_2nd_load, data);
> > +}
> > +
> > +void
> > +pass_spectrev1::mark_influencing_outgoing_flow (basic_block bb, tree op)
> > +{
> > +  if (!bitmap_set_bit (&influencing_outgoing_flow[SSA_NAME_VERSION (op)],
> > +		       bb->index))
> > +    return;
> > +
> > +  /* Note we are deliberately non-conservatively stop at call and
> > +     memory boundaries here expecting earlier optimization to expose
> > +     value dependences via SSA chains.  */
> > +  gimple *def_stmt = SSA_NAME_DEF_STMT (op);
> > +  if (gimple_vuse (def_stmt)
> > +      || !is_gimple_assign (def_stmt))
> > +    return;
> > +
> > +  ssa_op_iter i;
> > +  FOR_EACH_SSA_TREE_OPERAND (op, def_stmt, i, SSA_OP_USE)
> > +    mark_influencing_outgoing_flow (bb, op);
> > +}
> > +
> > +bool
> > +pass_spectrev1::find_value_dependent_guard (gimple *stmt, tree op)
> > +{
> > +  bitmap_iterator bi;
> > +  unsigned i;
> > +  EXECUTE_IF_SET_IN_BITMAP (&influencing_outgoing_flow[SSA_NAME_VERSION (op)],
> > +			    0, i, bi)
> > +    /* ???  If control-dependent on.
> > +       ???  Make bits in influencing_outgoing_flow the index of the BB
> > +       in RPO order so we could walk bits from STMT "upwards" finding
> > +       the nearest one.  */
> > +    if (dominated_by_p (CDI_DOMINATORS,
> > +			gimple_bb (stmt), BASIC_BLOCK_FOR_FN (cfun, i)))
> > +      {
> > +	if (dump_enabled_p ())
> > +	  dump_printf_loc (MSG_NOTE, stmt, "Condition %G in block %d "
> > +			   "is related to indexes used in %G\n",
> > +			   last_stmt (BASIC_BLOCK_FOR_FN (cfun, i)),
> > +			   i, stmt);
> > +	return true;
> > +      }
> > +
> > +  /* Note we are deliberately non-conservatively stop at call and
> > +     memory boundaries here expecting earlier optimization to expose
> > +     value dependences via SSA chains.  */
> > +  gimple *def_stmt = SSA_NAME_DEF_STMT (op);
> > +  if (gimple_vuse (def_stmt)
> > +      || !is_gimple_assign (def_stmt))
> > +    return false;
> > +
> > +  ssa_op_iter it;
> > +  FOR_EACH_SSA_TREE_OPERAND (op, def_stmt, it, SSA_OP_USE)
> > +    if (find_value_dependent_guard (stmt, op))
> > +      /* Others may be "nearer".  */
> > +      return true;
> > +
> > +  return false;
> > +}
> > +
> > +bool
> > +pass_spectrev1::stmt_is_indexed_load (gimple *stmt)
> > +{
> > +  /* Given we ignore the function boundary for incoming parameters
> > +     let's ignore return values of calls as well for the purpose
> > +     of being the first indexed load (also ignore inline-asms).  */
> > +  if (!gimple_assign_load_p (stmt))
> > +    return false;
> > +
> > +  /* Exclude esp. pointers from the index load itself (but also floats,
> > +     vectors, etc. - quite a bit handwaving here).  */
> > +  if (!INTEGRAL_TYPE_P (TREE_TYPE (gimple_assign_lhs (stmt))))
> > +    return false;
> > +
> > +  /* If we do not have any SSA uses the load cannot be one indexed
> > +     by an attacker controlled value.  */
> > +  if (zero_ssa_operands (stmt, SSA_OP_USE))
> > +    return false;
> > +
> > +  return true;
> > +}
> > +
> > +/* Return true whether the index in the use operand OP in STMT is
> > +   not transfered to STMTs defs.  */
> > +
> > +bool
> > +pass_spectrev1::stmt_mangles_index (gimple *stmt, tree op)
> > +{
> > +  if (gimple_assign_load_p (stmt))
> > +    return true;
> > +  if (gassign *ass = dyn_cast <gassign *> (stmt))
> > +    {
> > +      enum tree_code code = gimple_assign_rhs_code (ass);
> > +      switch (code)
> > +	{
> > +	case TRUNC_DIV_EXPR:
> > +	case CEIL_DIV_EXPR:
> > +	case FLOOR_DIV_EXPR:
> > +	case ROUND_DIV_EXPR:
> > +	case EXACT_DIV_EXPR:
> > +	case RDIV_EXPR:
> > +	case TRUNC_MOD_EXPR:
> > +	case CEIL_MOD_EXPR:
> > +	case FLOOR_MOD_EXPR:
> > +	case ROUND_MOD_EXPR:
> > +	case LSHIFT_EXPR:
> > +	case RSHIFT_EXPR:
> > +	case LROTATE_EXPR:
> > +	case RROTATE_EXPR:
> > +	  /* Division, modulus or shifts by the index do not produce
> > +	     something useful for the attacker.  */
> > +	  if (gimple_assign_rhs2 (ass) == op)
> > +	    return true;
> > +	  break;
> > +	default:;
> > +	  /* Comparisons do not produce an index value.  */
> > +	  if (TREE_CODE_CLASS (code) == tcc_comparison)
> > +	    return true;
> > +	}
> > +    }
> > +  /* ???  We could handle builtins here.  */
> > +  return false;
> > +}
> > +
> > +static GTY(()) tree spectrev1_tls_mask_decl;
> > +
> > +/* Main entry for spectrev1 pass.  */
> > +
> > +unsigned int
> > +pass_spectrev1::execute (function *fn)
> > +{
> > +  calculate_dominance_info (CDI_DOMINATORS);
> > +  loop_optimizer_init (AVOID_CFG_MODIFICATIONS);
> > +
> > +  int *rpo = XNEWVEC (int, n_basic_blocks_for_fn (cfun));
> > +  int rpo_num = pre_and_rev_post_order_compute_fn (fn, NULL, rpo, false);
> > +
> > +  /* We track for each SSA name whether its value (may) depend(s) on
> > +     the result of an indexed load.
> > +     A set of operation will kill a value (enough).  */
> > +  auto_sbitmap value_from_indexed_load (num_ssa_names);
> > +  bitmap_clear (value_from_indexed_load);
> > +
> > +  unsigned orig_num_ssa_names = num_ssa_names;
> > +  influencing_outgoing_flow = XCNEWVEC (bitmap_head, num_ssa_names);
> > +  for (unsigned i = 1; i < num_ssa_names; ++i)
> > +    bitmap_initialize (&influencing_outgoing_flow[i], &bitmap_default_obstack);
> > +
> > +
> > +  /* Diagnosis.  */
> > +
> > +  /* Function arguments are not indexed loads unless we want to
> > +     be conservative to a level no longer useful.  */
> > +
> > +  for (int i = 0; i < rpo_num; ++i)
> > +    {
> > +      basic_block bb = BASIC_BLOCK_FOR_FN (fn, rpo[i]);
> > +
> > +      for (gphi_iterator gpi = gsi_start_phis (bb);
> > +	   !gsi_end_p (gpi); gsi_next (&gpi))
> > +	{
> > +	  gphi *phi = gpi.phi ();
> > +	  bool value_from_indexed_load_p = false;
> > +	  use_operand_p arg_p;
> > +	  ssa_op_iter it;
> > +	  FOR_EACH_PHI_ARG (arg_p, phi, it, SSA_OP_USE)
> > +	    {
> > +	      tree arg = USE_FROM_PTR (arg_p);
> > +	      if (TREE_CODE (arg) == SSA_NAME
> > +		  && bitmap_bit_p (value_from_indexed_load,
> > +				   SSA_NAME_VERSION (arg)))
> > +		value_from_indexed_load_p = true;
> > +	    }
> > +	  if (value_from_indexed_load_p)
> > +	    bitmap_set_bit (value_from_indexed_load,
> > +			    SSA_NAME_VERSION (PHI_RESULT (phi)));
> > +	}
> > +
> > +      for (gimple_stmt_iterator gsi = gsi_start_bb (bb);
> > +	   !gsi_end_p (gsi); gsi_next (&gsi))
> > +	{
> > +	  gimple *stmt = gsi_stmt (gsi);
> > +	  if (is_gimple_debug (stmt))
> > +	    continue;
> > +
> > +	  if (walk_stmt_load_store_ops (stmt, value_from_indexed_load,
> > +					check_spectrev1_2nd_load,
> > +					check_spectrev1_2nd_load))
> > +	    warning_at (gimple_location (stmt), OPT_Wspectre_v1, "%Gspectrev1",
> > +			stmt);
> > +
> > +	  bool value_from_indexed_load_p = false;
> > +	  if (stmt_is_indexed_load (stmt))
> > +	    {
> > +	      /* We are interested in indexes to later loads so ultimatively
> > +		 register values that all happen to separate SSA defs.
> > +		 Interesting aggregates will be decomposed by later loads
> > +		 which we then mark as producing an index.  Simply mark
> > +		 all SSA defs as coming from an indexed load.  */
> > +	      /* We are handling a single load in STMT right now.  */
> > +	      ssa_op_iter it;
> > +	      tree op;
> > +	      FOR_EACH_SSA_TREE_OPERAND (op, stmt, it, SSA_OP_USE)
> > +	        if (find_value_dependent_guard (stmt, op))
> > +		  {
> > +		    /* ???  Somehow record the dependence to point to it in
> > +		       diagnostics.  */
> > +		    value_from_indexed_load_p = true;
> > +		    break;
> > +		  }
> > +	    }
> > +
> > +	  tree op;
> > +	  ssa_op_iter it;
> > +	  FOR_EACH_SSA_TREE_OPERAND (op, stmt, it, SSA_OP_USE)
> > +	    if (bitmap_bit_p (value_from_indexed_load,
> > +			      SSA_NAME_VERSION (op))
> > +		&& !stmt_mangles_index (stmt, op))
> > +	      {
> > +		value_from_indexed_load_p = true;
> > +		break;
> > +	      }
> > +
> > +	  if (value_from_indexed_load_p)
> > +	    FOR_EACH_SSA_TREE_OPERAND (op, stmt, it, SSA_OP_DEF)
> > +	      /* ???  We could cut off single-bit values from the chain
> > +	         here or pretain that float loads will be never turned
> > +		 into integer indices, etc.  */
> > +	      bitmap_set_bit (value_from_indexed_load,
> > +			      SSA_NAME_VERSION (op));
> > +	}
> > +
> > +      if (EDGE_COUNT (bb->succs) > 1)
> > +	{
> > +	  gcond *stmt = safe_dyn_cast <gcond *> (last_stmt (bb));
> > +	  /* ???  What about switches?  What about badly speculated EH?  */
> > +	  if (!stmt)
> > +	    continue;
> > +	  /* We could constrain conditions here to those more likely
> > +	     being "bounds checks".  For example common guards for
> > +	     indirect accesses are NULL pointer checks.
> > +	     ???  This isn't fully safe, but it drops the number of
> > +	     spectre warnings for dwarf2out.i from cc1files from 70 to 16.  */
> > +	  if ((gimple_cond_code (stmt) == EQ_EXPR
> > +	       || gimple_cond_code (stmt) == NE_EXPR)
> > +	      && integer_zerop (gimple_cond_rhs (stmt))
> > +	      && POINTER_TYPE_P (TREE_TYPE (gimple_cond_lhs (stmt))))
> > +	    ;
> > +	  else
> > +	    {
> > +	      ssa_op_iter it;
> > +	      tree op;
> > +	      FOR_EACH_SSA_TREE_OPERAND (op, stmt, it, SSA_OP_USE)
> > +		mark_influencing_outgoing_flow (bb, op);
> > +	    }
> > +	}
> > +    }
> > +
> > +  for (unsigned i = 1; i < orig_num_ssa_names; ++i)
> > +    bitmap_release (&influencing_outgoing_flow[i]);
> > +  XDELETEVEC (influencing_outgoing_flow);
> > +
> > +
> > +
> > +  /* Instrumentation.  */
> > +  if (!flag_spectrev1)
> > +    return 0;
> > +
> > +  /* Create the default all-ones mask.  When doing IPA instrumentation
> > +     this should initialize the mask from TLS memory and outgoing edges
> > +     need to save the mask to TLS memory.  */
> > +  gimple *new_stmt;
> > +  if (!spectrev1_tls_mask_decl
> > +      && flag_spectrev1 >= 3)
> > +    {
> > +      /* Use a smaller variable in case sign-extending loads are
> > +	 available?  */
> > +      spectrev1_tls_mask_decl
> > +	  = build_decl (BUILTINS_LOCATION,
> > +			VAR_DECL, NULL_TREE, ptr_type_node);
> > +      TREE_STATIC (spectrev1_tls_mask_decl) = 1;
> > +      TREE_PUBLIC (spectrev1_tls_mask_decl) = 1;
> > +      DECL_VISIBILITY (spectrev1_tls_mask_decl) = VISIBILITY_HIDDEN;
> > +      DECL_VISIBILITY_SPECIFIED (spectrev1_tls_mask_decl) = 1;
> > +      DECL_INITIAL (spectrev1_tls_mask_decl)
> > +	  = build_all_ones_cst (ptr_type_node);
> > +      DECL_NAME (spectrev1_tls_mask_decl) = get_identifier ("__SV1MSK");
> > +      DECL_ARTIFICIAL (spectrev1_tls_mask_decl) = 1;
> > +      DECL_IGNORED_P (spectrev1_tls_mask_decl) = 1;
> > +      varpool_node::finalize_decl (spectrev1_tls_mask_decl);
> > +      make_decl_one_only (spectrev1_tls_mask_decl,
> > +			  DECL_ASSEMBLER_NAME (spectrev1_tls_mask_decl));
> > +      set_decl_tls_model (spectrev1_tls_mask_decl,
> > +			  decl_default_tls_model (spectrev1_tls_mask_decl));
> > +    }
> > +
> > +  /* We let the SSA rewriter cope with rewriting mask into SSA and
> > +     inserting PHI nodes.  */
> > +  tree mask = create_tmp_reg (ptr_type_node, "spectre_v1_mask");
> > +  new_stmt = gimple_build_assign (mask,
> > +				  flag_spectrev1 >= 3
> > +				  ? spectrev1_tls_mask_decl
> > +				  : build_all_ones_cst (ptr_type_node));
> > +  gimple_stmt_iterator gsi
> > +      = gsi_after_labels (single_succ (ENTRY_BLOCK_PTR_FOR_FN (fn)));
> > +  gsi_insert_before (&gsi, new_stmt, GSI_CONTINUE_LINKING);
> > +
> > +  /* We are using the visited flag to track stmts downstream in a BB.  */
> > +  for (int i = 0; i < rpo_num; ++i)
> > +    {
> > +      basic_block bb = BASIC_BLOCK_FOR_FN (fn, rpo[i]);
> > +      for (gphi_iterator gpi = gsi_start_phis (bb);
> > +	   !gsi_end_p (gpi); gsi_next (&gpi))
> > +	gimple_set_visited (gpi.phi (), false);
> > +      for (gimple_stmt_iterator gsi = gsi_start_bb (bb);
> > +	   !gsi_end_p (gsi); gsi_next (&gsi))
> > +	gimple_set_visited (gsi_stmt (gsi), false);
> > +    }
> > +
> > +  for (int i = 0; i < rpo_num; ++i)
> > +    {
> > +      basic_block bb = BASIC_BLOCK_FOR_FN (fn, rpo[i]);
> > +
> > +      for (gphi_iterator gpi = gsi_start_phis (bb);
> > +	   !gsi_end_p (gpi); gsi_next (&gpi))
> > +	{
> > +	  gphi *phi = gpi.phi ();
> > +	  /* ???  We can merge SAFE state across BB boundaries in
> > +	     some cases, like when edges are not critical and the
> > +	     state was made SAFE in the tail of the predecessors
> > +	     and not invalidated by calls.   */
> > +	  gimple_set_plf (phi, SV1_SAFE, false);
> > +	}
> > +
> > +      bool instrumented_call_p = false;
> > +      for (gimple_stmt_iterator gsi = gsi_start_bb (bb);
> > +	   !gsi_end_p (gsi); gsi_next (&gsi))
> > +	{
> > +	  gimple *stmt = gsi_stmt (gsi);
> > +	  gimple_set_visited (stmt, true);
> > +	  if (is_gimple_debug (stmt))
> > +	    continue;
> > +
> > +	  tree op;
> > +	  ssa_op_iter it;
> > +	  bool safe = is_gimple_assign (stmt);
> > +	  if (safe)
> > +	    FOR_EACH_SSA_TREE_OPERAND (op, stmt, it, SSA_OP_USE)
> > +	      {
> > +		if (safe
> > +		    && (SSA_NAME_IS_DEFAULT_DEF (op)
> > +			|| !gimple_plf (SSA_NAME_DEF_STMT (op), SV1_SAFE)
> > +			/* Once mask can have changed we cannot further
> > +			   propagate safe state.  */
> > +			|| gimple_bb (SSA_NAME_DEF_STMT (op)) != bb
> > +			/* That includes calls if we have instrumented one
> > +			   in this block.  */
> > +			|| (instrumented_call_p
> > +			    && call_between (SSA_NAME_DEF_STMT (op), stmt))))
> > +		  {
> > +		    safe = false;
> > +		    break;
> > +		  }
> > +	      }
> > +	  gimple_set_plf (stmt, SV1_SAFE, safe);
> > +
> > +	  /* Instrument bounded loads.
> > +	     We instrument non-aggregate loads with non-invariant address.
> > +	     The idea is to reliably instrument the bounded load while
> > +	     leaving the canary, being it load or store, aggregate or
> > +	     non-aggregate, alone.  */
> > +	  if (gimple_assign_single_p (stmt)
> > +	      && gimple_vuse (stmt)
> > +	      && !gimple_vdef (stmt)
> > +	      && !zero_ssa_operands (stmt, SSA_OP_USE))
> > +	    {
> > +	      tree new_mem = instrument_mem (&gsi, gimple_assign_rhs1 (stmt),
> > +					     mask);
> > +	      gimple_assign_set_rhs1 (stmt, new_mem);
> > +	      update_stmt (stmt);
> > +	      /* The value loaded my a masked load is "safe".  */
> > +	      gimple_set_plf (stmt, SV1_SAFE, true);
> > +	    }
> > +
> > +	  /* Instrument return store to TLS mask.  */
> > +	  if (flag_spectrev1 >= 3
> > +	      && gimple_code (stmt) == GIMPLE_RETURN)
> > +	    {
> > +	      new_stmt = gimple_build_assign (spectrev1_tls_mask_decl, mask);
> > +	      gsi_insert_before (&gsi, new_stmt, GSI_SAME_STMT);
> > +	    }
> > +	  /* Instrument calls with store/load to/from TLS mask.
> > +	     ???  Placement of the stores/loads can be optimized in a LCM
> > +	     way.  */
> > +	  else if (flag_spectrev1 >= 3
> > +		   && is_gimple_call (stmt)
> > +		   && gimple_vuse (stmt))
> > +	    {
> > +	      new_stmt = gimple_build_assign (spectrev1_tls_mask_decl, mask);
> > +	      gsi_insert_before (&gsi, new_stmt, GSI_SAME_STMT);
> > +	      if (!stmt_ends_bb_p (stmt))
> > +		{
> > +		  new_stmt = gimple_build_assign (mask,
> > +						  spectrev1_tls_mask_decl);
> > +		  gsi_insert_after (&gsi, new_stmt, GSI_NEW_STMT);
> > +		}
> > +	      else
> > +		{
> > +		  edge_iterator ei;
> > +		  edge e;
> > +		  FOR_EACH_EDGE (e, ei, bb->succs)
> > +		    {
> > +		      if (e->flags & EDGE_ABNORMAL)
> > +			continue;
> > +		      new_stmt = gimple_build_assign (mask,
> > +						      spectrev1_tls_mask_decl);
> > +		      gsi_insert_on_edge (e, new_stmt);
> > +		    }
> > +		}
> > +	      instrumented_call_p = true;
> > +	    }
> > +	}
> > +
> > +      if (EDGE_COUNT (bb->succs) > 1)
> > +	{
> > +	  gcond *stmt = safe_dyn_cast <gcond *> (last_stmt (bb));
> > +	  /* ???  What about switches?  What about badly speculated EH?  */
> > +	  if (!stmt)
> > +	    continue;
> > +
> > +	  /* Instrument conditional branches to track mis-speculation
> > +	     via a pointer-sized mask.
> > +	     ???  We could restrict to instrumenting those conditions
> > +	     that control interesting loads or apply simple heuristics
> > +	     like not instrumenting FP compares or equality compares
> > +	     which are unlikely bounds checks.  But we have to instrument
> > +	     bool != 0 because multiple conditions might have been
> > +	     combined.  */
> > +	  edge truee, falsee;
> > +	  extract_true_false_edges_from_block (bb, &truee, &falsee);
> > +	  /* Unless -fspectre-v1=2 we do not instrument loop exit tests.  */
> > +	  if (flag_spectrev1 >= 2
> > +	      || !loop_exits_from_bb_p (bb->loop_father, bb))
> > +	    {
> > +	      gimple_stmt_iterator gsi = gsi_last_bb (bb);
> > +
> > +	      /* Instrument
> > +	           if (a_1 > b_2)
> > +		 as
> > +	           tem_mask_3 = a_1 > b_2 ? -1 : 0;
> > +		   if (tem_mask_3 != 0)
> > +		 this will result in a
> > +		   xor %eax, %eax; cmp|test; setCC %al; sub $0x1, %eax; jne
> > +		 sequence which is faster in practice than when retaining
> > +		 the original jump condition.  This is 10 bytes overhead
> > +		 on x86_64 plus 3 bytes for an and on the true path and
> > +		 5 bytes for an and and not on the false path.  */
> > +	      tree tem_mask = make_ssa_name (ptr_type_node);
> > +	      new_stmt = gimple_build_assign (tem_mask, COND_EXPR,
> > +					      build2 (gimple_cond_code (stmt),
> > +						      boolean_type_node,
> > +						      gimple_cond_lhs (stmt),
> > +						      gimple_cond_rhs (stmt)),
> > +					      build_all_ones_cst (ptr_type_node),
> > +					      build_zero_cst (ptr_type_node));
> > +	      gsi_insert_before (&gsi, new_stmt, GSI_SAME_STMT);
> > +	      gimple_cond_set_code (stmt, NE_EXPR);
> > +	      gimple_cond_set_lhs (stmt, tem_mask);
> > +	      gimple_cond_set_rhs (stmt, build_zero_cst (ptr_type_node));
> > +	      update_stmt (stmt);
> > +
> > +	      /* On the false edge
> > +	           mask = mask & ~tem_mask_3;  */
> > +	      gimple_seq tems = NULL;
> > +	      tree tem_mask2 = make_ssa_name (ptr_type_node);
> > +	      new_stmt = gimple_build_assign (tem_mask2, BIT_NOT_EXPR,
> > +					      tem_mask);
> > +	      gimple_seq_add_stmt_without_update (&tems, new_stmt);
> > +	      new_stmt = gimple_build_assign (mask, BIT_AND_EXPR,
> > +					      mask, tem_mask2);
> > +	      gimple_seq_add_stmt_without_update (&tems, new_stmt);
> > +	      gsi_insert_seq_on_edge (falsee, tems);
> > +
> > +	      /* On the true edge
> > +	           mask = mask & tem_mask_3;  */
> > +	      new_stmt = gimple_build_assign (mask, BIT_AND_EXPR,
> > +					      mask, tem_mask);
> > +	      gsi_insert_on_edge (truee, new_stmt);
> > +	    }
> > +	}
> > +    }
> > +
> > +  gsi_commit_edge_inserts ();
> > +
> > +  return 0;
> > +}
> > +
> > +} // anon namespace
> > +
> > +gimple_opt_pass *
> > +make_pass_spectrev1 (gcc::context *ctxt)
> > +{
> > +  return new pass_spectrev1 (ctxt);
> > +}
> > diff --git a/gcc/params.def b/gcc/params.def
> > index 6f98fccd291..19f7dbf4dad 100644
> > --- a/gcc/params.def
> > +++ b/gcc/params.def
> > @@ -1378,6 +1378,11 @@ DEFPARAM(PARAM_LOOP_VERSIONING_MAX_OUTER_INSNS,
> >  	 " loops.",
> >  	 100, 0, 0)
> >  
> > +DEFPARAM(PARAM_SPECTRE_V1_MAX_INSTRUMENT_INDICES,
> > +	 "spectre-v1-max-instrument-indices",
> > +	 "Maximum number of indices to instrument before instrumenting the whole address.",
> > +	 1, 0, 0)
> > +
> >  /*
> >  
> >  Local variables:
> > diff --git a/gcc/passes.def b/gcc/passes.def
> > index 144df4fa417..2fe0cdcfa7e 100644
> > --- a/gcc/passes.def
> > +++ b/gcc/passes.def
> > @@ -400,6 +400,7 @@ along with GCC; see the file COPYING3.  If not see
> >    NEXT_PASS (pass_lower_resx);
> >    NEXT_PASS (pass_nrv);
> >    NEXT_PASS (pass_cleanup_cfg_post_optimizing);
> > +  NEXT_PASS (pass_spectrev1);
> >    NEXT_PASS (pass_warn_function_noreturn);
> >    NEXT_PASS (pass_gen_hsail);
> >  
> > diff --git a/gcc/testsuite/gcc.dg/Wspectre-v1-1.c b/gcc/testsuite/gcc.dg/Wspectre-v1-1.c
> > new file mode 100644
> > index 00000000000..3ac647e72fd
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/Wspectre-v1-1.c
> > @@ -0,0 +1,10 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-Wspectre-v1" } */
> > +
> > +unsigned char a[1024];
> > +int b[256];
> > +int foo (int i, int bound)
> > +{
> > +  if (i < bound)
> > +    return b[a[i]];  /* { dg-warning "spectrev1" } */
> > +}
> > diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
> > index 9f9d85fdbc3..f5c164f465f 100644
> > --- a/gcc/tree-pass.h
> > +++ b/gcc/tree-pass.h
> > @@ -625,6 +625,7 @@ extern gimple_opt_pass *make_pass_local_fn_summary (gcc::context *ctxt);
> >  extern gimple_opt_pass *make_pass_update_address_taken (gcc::context *ctxt);
> >  extern gimple_opt_pass *make_pass_convert_switch (gcc::context *ctxt);
> >  extern gimple_opt_pass *make_pass_lower_vaarg (gcc::context *ctxt);
> > +extern gimple_opt_pass *make_pass_spectrev1 (gcc::context *ctxt);
> >  
> >  /* Current optimization pass.  */
> >  extern opt_pass *current_pass;
> > 
> 
> 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Spectre V1 diagnostic / mitigation
  2018-12-19 11:25   ` Richard Biener
@ 2018-12-19 11:34     ` Florian Weimer
  2018-12-19 11:51       ` Richard Biener
  2018-12-19 15:42     ` Richard Earnshaw (lists)
  1 sibling, 1 reply; 17+ messages in thread
From: Florian Weimer @ 2018-12-19 11:34 UTC (permalink / raw)
  To: Richard Biener; +Cc: Richard Earnshaw (lists), gcc

* Richard Biener:

> The cost is probably target dependent - on x86 it's simply a $fs based
> load/store.

Do you need to reserve something in the TCB for this?

Thanks,
Florian

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Spectre V1 diagnostic / mitigation
  2018-12-19 11:34     ` Florian Weimer
@ 2018-12-19 11:51       ` Richard Biener
  2018-12-19 13:35         ` Florian Weimer
  0 siblings, 1 reply; 17+ messages in thread
From: Richard Biener @ 2018-12-19 11:51 UTC (permalink / raw)
  To: Florian Weimer; +Cc: Richard Earnshaw (lists), gcc

On Wed, 19 Dec 2018, Florian Weimer wrote:

> * Richard Biener:
> 
> > The cost is probably target dependent - on x86 it's simply a $fs based
> > load/store.
> 
> Do you need to reserve something in the TCB for this?

No idea.  But I figured using TLS with the patch only works when
optimizing and not with -O0.  Huh.  Anyway, it should be equivalent
to what presence of

__thread void *_SV1MASK = (void *)-1l;

requires (plus I make that symbol GNU_UNIQUE).  I see this
allocates -1 in the .tdata section marked TLS.

Richard.

-- 
Richard Biener <rguenther@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Spectre V1 diagnostic / mitigation
  2018-12-19 11:51       ` Richard Biener
@ 2018-12-19 13:35         ` Florian Weimer
  2018-12-19 13:49           ` Richard Biener
  0 siblings, 1 reply; 17+ messages in thread
From: Florian Weimer @ 2018-12-19 13:35 UTC (permalink / raw)
  To: Richard Biener; +Cc: Richard Earnshaw (lists), gcc

* Richard Biener:

> On Wed, 19 Dec 2018, Florian Weimer wrote:
>
>> * Richard Biener:
>> 
>> > The cost is probably target dependent - on x86 it's simply a $fs based
>> > load/store.
>> 
>> Do you need to reserve something in the TCB for this?
>
> No idea.  But I figured using TLS with the patch only works when
> optimizing and not with -O0.  Huh.  Anyway, it should be equivalent
> to what presence of
>
> __thread void *_SV1MASK = (void *)-1l;
>
> requires (plus I make that symbol GNU_UNIQUE).  I see this
> allocates -1 in the .tdata section marked TLS.

Oh.  That's going to be substantially worse for PIC, even with the
initial-exec model, especially on architectures which do not have
arbitrary PC-relative loads.  Which is why I'm asking about the TCB
reservation.

Thanks,
Florian

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Spectre V1 diagnostic / mitigation
  2018-12-19 13:35         ` Florian Weimer
@ 2018-12-19 13:49           ` Richard Biener
  2018-12-19 14:01             ` Florian Weimer
  0 siblings, 1 reply; 17+ messages in thread
From: Richard Biener @ 2018-12-19 13:49 UTC (permalink / raw)
  To: Florian Weimer; +Cc: Richard Earnshaw (lists), gcc

On Wed, 19 Dec 2018, Florian Weimer wrote:

> * Richard Biener:
> 
> > On Wed, 19 Dec 2018, Florian Weimer wrote:
> >
> >> * Richard Biener:
> >> 
> >> > The cost is probably target dependent - on x86 it's simply a $fs based
> >> > load/store.
> >> 
> >> Do you need to reserve something in the TCB for this?
> >
> > No idea.  But I figured using TLS with the patch only works when
> > optimizing and not with -O0.  Huh.  Anyway, it should be equivalent
> > to what presence of
> >
> > __thread void *_SV1MASK = (void *)-1l;
> >
> > requires (plus I make that symbol GNU_UNIQUE).  I see this
> > allocates -1 in the .tdata section marked TLS.
> 
> Oh.  That's going to be substantially worse for PIC, even with the
> initial-exec model, especially on architectures which do not have
> arbitrary PC-relative loads.  Which is why I'm asking about the TCB
> reservation.

Sure, if we'd ever deploy this in production placing this in the
TCB for glibc targets might be beneifical.  But as said the
current implementation was just an experiment intended to be
maximum portable.  I suppose the dynamic loader takes care
of initializing the TCB data?

I would expect most targets to use tricks with the stack pointer
for passing around the mask in any case (using the msb is sth
that was suggested for example).

Richard.

> Thanks,
> Florian
> 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Spectre V1 diagnostic / mitigation
  2018-12-19 13:49           ` Richard Biener
@ 2018-12-19 14:01             ` Florian Weimer
  2018-12-19 14:19               ` Peter Bergner
  0 siblings, 1 reply; 17+ messages in thread
From: Florian Weimer @ 2018-12-19 14:01 UTC (permalink / raw)
  To: Richard Biener; +Cc: Richard Earnshaw (lists), gcc

* Richard Biener:

> Sure, if we'd ever deploy this in production placing this in the
> TCB for glibc targets might be beneifical.  But as said the
> current implementation was just an experiment intended to be
> maximum portable.  I suppose the dynamic loader takes care
> of initializing the TCB data?

Yes, the dynamic linker will initialize it.  If you need 100% reliable
initialization with something that is not zero, it's going to be tricky
though.  Initial-exec TLS memory has this covered, but in the TCB, we
only have zeroed-out reservations today.

Thanks,
Florian

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Spectre V1 diagnostic / mitigation
  2018-12-19 14:01             ` Florian Weimer
@ 2018-12-19 14:19               ` Peter Bergner
  2018-12-19 15:44                 ` Florian Weimer
  0 siblings, 1 reply; 17+ messages in thread
From: Peter Bergner @ 2018-12-19 14:19 UTC (permalink / raw)
  To: Florian Weimer
  Cc: Richard Biener, Richard Earnshaw (lists),
	gcc, Tulio Magno Quites Machado Filho

On 12/19/18 7:59 AM, Florian Weimer wrote:
> * Richard Biener:
> 
>> Sure, if we'd ever deploy this in production placing this in the
>> TCB for glibc targets might be beneifical.  But as said the
>> current implementation was just an experiment intended to be
>> maximum portable.  I suppose the dynamic loader takes care
>> of initializing the TCB data?
> 
> Yes, the dynamic linker will initialize it.  If you need 100% reliable
> initialization with something that is not zero, it's going to be tricky
> though.  Initial-exec TLS memory has this covered, but in the TCB, we
> only have zeroed-out reservations today.

We have non-zero initialized TCB entries on powerpc*-linux which are used
for the GCC __builtin_cpu_is() and __builtin_cpu_supports() builtin
functions.  Tulio would know the magic that was used to get them setup.

Peter



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Spectre V1 diagnostic / mitigation
  2018-12-19 11:25   ` Richard Biener
  2018-12-19 11:34     ` Florian Weimer
@ 2018-12-19 15:42     ` Richard Earnshaw (lists)
  2018-12-19 17:20       ` Richard Biener
  1 sibling, 1 reply; 17+ messages in thread
From: Richard Earnshaw (lists) @ 2018-12-19 15:42 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc

On 19/12/2018 11:25, Richard Biener wrote:
> On Tue, 18 Dec 2018, Richard Earnshaw (lists) wrote:
> 
>> On 18/12/2018 15:36, Richard Biener wrote:
>>>
>>> Hi,
>>>
>>> in the past weeks I've been looking into prototyping both spectre V1 
>>> (speculative array bound bypass) diagnostics and mitigation in an
>>> architecture independent manner to assess feasability and some kind
>>> of upper bound on the performance impact one can expect.
>>> https://lists.llvm.org/pipermail/llvm-dev/2018-March/122085.html is
>>> an interesting read in this context as well.
>>
>> Interesting, thanks for posting this.
>>
>>>
>>> For simplicity I have implemented mitigation on GIMPLE right before
>>> RTL expansion and have chosen TLS to do mitigation across function
>>> boundaries.  Diagnostics sit in the same place but both are not in
>>> any way dependent on each other.
>>
>> We considered using TLS for propagating the state across call-boundaries
>> on AArch64, but rejected it for several reasons.
>>
>> - It's quite expensive to have to set up the TLS state in every function;
>> - It requires some global code to initialize the state variable - that's
>> kind of ABI;
> 
> The cost is probably target dependent - on x86 it's simply a $fs based
> load/store.  For initialization a static initializer seemed to work
> for me (but honestly I didn't do any testing besides running the
> testsuite for correctness - so at least the mask wasn't zero initialized).
> Note the LLVM people use an inverted mask and cancel values by
> OR-ing -1 instead of AND-ing 0.  At least default zero-initialization
> should be possible with TLS vars.
> 
> That said, my choice of TLS was to make this trivially work across
> targets - if a target can do better then it should.  And of course
> the target may not have any TLS support besides emultls which would
> be prohibitly expensive.
> 
>> - It also seems likely to be vulnerable to Spectre variant 4 - unless
>> the CPU can always correctly store-to-load forward the speculation
>> state, then you have the situation where the load may see an old value
>> of the state - and that's almost certain to say "we're not speculating".
>>
>> The last one is really the killer here.
> 
> Hmm, as far as I understood v4 only happens when store-forwarding
> doesn't work.  And I hope it doesn't fail "randomly" but works
> reliable when all accesses to the memory are aligned and have
> the same size as is the case with these compiler-generated TLS
> accesses.  But yes, if that's not guaranteed then using memory
> doesn't work at all.  

The problem is that you can't prove this through realistic testing.
Architecturally, the result has to come out the same in the end in that
if the load does bypass the store, eventually the hardware has to replay
the instruction with the correct data and cancel any operations that
were dependent on the earlier execution.  Only side-channel data will be
left after that.

> Not sure what else target independent there
> is though that doesn't break the ABI like simply adding another
> parameter.  And even adding a parameter might not work in case
> there's only stack passing and V4 happens on the stack accesses...

Yep, exactly.

> 
>>>
>>> The mitigation strategy chosen is that of tracking speculation
>>> state via a mask that can be used to zero parts of the addresses
>>> that leak the actual data.  That's similar to what aarch64 does
>>> with -mtrack-speculation (but oddly there's no mitigation there).
>>
>> We rely on the user inserting the new builtin, which we can more
>> effectively optimize if the compiler is generating speculation state
>> tracking data.  That doesn't preclude a full solution at a later date,
>> but it looked like it was likely overkill for protecting every load and
>> safely pruning the loads is not an easy problem to solve.  Of course,
>> the builtin does require the programmer to do some work to identify
>> which memory accesses might be vulnerable.
> 
> My main question was how in earth the -mtrack-speculation overhead
> is reasonable for the very few expected explicit builtin uses...

Ultimately that will depend on what the user wants and the level of
protection needed.  The builtin gives the choice: get a hard barrier if
tracking has not been enabled, with a very high hit at the point of
execution; or take a much lower hit at that point if tracking has been
enabled.  That's a trade-off between how often you hit the barrier vs
how much you hit the tracking events to no benefit.

Your code, however, doesn't work at present.  This example shows that
the mitigation code is just optimized away by the rtl passes, at least
for -fspectre-v1=2.

int f (int a, int b, int c, char *d)
{
  if (a > 10)
    return 0;

  if (b > 64)
    return 0;

  if (c > 96)
    return 0;

  return d[a] + d[b] + d[c];
}

It works ok at level 3 because then the compiler can't prove the logical
truth of the speculation variable on the path from TLS memory and that's
sufficient to defeat the optimizers.

R.

> 
> Richard.
> 
>> R.
>>
>>
>>>
>>> I've optimized things to the point that is reasonable when working
>>> target independent on GIMPLE but I've only looked at x86 assembly
>>> and performance.  I expect any "final" mitigation if we choose to
>>> implement and integrate such would be after RTL expansion since
>>> RTL expansion can end up introducing quite some control flow whose
>>> speculation state is not properly tracked by the prototype.
>>>
>>> I'm cut&pasting single-runs of SPEC INT 2006/2017 here, the runs
>>> were done with -O2 [-fspectre-v1={2,3}] where =2 is function-local
>>> mitigation and =3 does mitigation global with passing the state
>>> via TLS memory.
>>>
>>> The following was measured on a Haswell desktop CPU:
>>>
>>> 	-O2 vs. -O2 -fspectre-v1=2
>>>
>>>                                   Estimated                       Estimated
>>>                 Base     Base       Base        Peak     Peak       Peak
>>> Benchmarks      Ref.   Run Time     Ratio       Ref.   Run Time     Ratio
>>> -------------- ------  ---------  ---------    ------  ---------  ---------
>>> 400.perlbench    9770        245       39.8 *    9770        452       21.6 *  184%
>>> 401.bzip2        9650        378       25.5 *    9650        726       13.3 *  192%
>>> 403.gcc          8050        236       34.2 *    8050        352       22.8 *  149%
>>> 429.mcf          9120        223       40.9 *    9120        656       13.9 *  294%
>>> 445.gobmk       10490        400       26.2 *   10490        666       15.8 *  167%
>>> 456.hmmer        9330        388       24.1 *    9330        536       17.4 *  138%
>>> 458.sjeng       12100        437       27.7 *   12100        661       18.3 *  151%
>>> 462.libquantum  20720        300       69.1 *   20720        384       53.9 *  128%
>>> 464.h264ref     22130        451       49.1 *   22130        586       37.8 *  130%
>>> 471.omnetpp      6250        291       21.5 *    6250        398       15.7 *  137%
>>> 473.astar        7020        334       21.0 *    7020        522       13.5 *  156%
>>> 483.xalancbmk    6900        182       37.9 *    6900        306       22.6 *  168%
>>>  Est. SPECint_base2006                   --
>>>  Est. SPECint2006                                                        --
>>>
>>>    -O2 -fspectre-v1=3
>>>
>>>                                   Estimated                       Estimated
>>>                 Base     Base       Base        Peak     Peak       Peak
>>> Benchmarks      Ref.   Run Time     Ratio       Ref.   Run Time     Ratio
>>> -------------- ------  ---------  ---------    ------  ---------  ---------
>>> 400.perlbench                                    9770        497       19.6 *  203%
>>> 401.bzip2                                        9650        772       12.5 *  204%
>>> 403.gcc                                          8050        427       18.9 *  181%
>>> 429.mcf                                          9120        696       13.1 *  312%
>>> 445.gobmk                                       10490        726       14.4 *  181%
>>> 456.hmmer                                        9330        537       17.4 *  138%
>>> 458.sjeng                                       12100        721       16.8 *  165%
>>> 462.libquantum                                  20720        446       46.4 *  149%
>>> 464.h264ref                                     22130        613       36.1 *  136%
>>> 471.omnetpp                                      6250        471       13.3 *  162%
>>> 473.astar                                        7020        579       12.1 *  173%
>>> 483.xalancbmk                                    6900        350       19.7 *  192%
>>>  Est. SPECint(R)_base2006           Not Run
>>>  Est. SPECint2006                                                        --
>>>
>>>
>>> While the following was measured on a Zen Epyc server:
>>>
>>> -O2 vs -O2 -fspectre-v1=2
>>>
>>>                        Estimated                       Estimated
>>>                  Base     Base        Base        Peak     Peak        Peak
>>> Benchmarks       Copies  Run Time     Rate        Copies  Run Time     Rate
>>> --------------- -------  ---------  ---------    -------  ---------  ---------
>>> 500.perlbench_r       1        499       3.19  *       1        621       2.56  * 124%
>>> 502.gcc_r             1        286       4.95  *       1        392       3.61  * 137%
>>> 505.mcf_r             1        331       4.88  *       1        456       3.55  * 138%
>>> 520.omnetpp_r         1        454       2.89  *       1        563       2.33  * 124%
>>> 523.xalancbmk_r       1        328       3.22  *       1        569       1.86  * 173%
>>> 525.x264_r            1        518       3.38  *       1        776       2.26  * 150%
>>> 531.deepsjeng_r       1        365       3.14  *       1        448       2.56  * 123%
>>> 541.leela_r           1        598       2.77  *       1        729       2.27  * 122%
>>> 548.exchange2_r       1        460       5.69  *       1        756       3.46  * 164%
>>> 557.xz_r              1        403       2.68  *       1        586       1.84  * 145%
>>>  Est. SPECrate2017_int_base              3.55
>>>  Est. SPECrate2017_int_peak                                               2.56    72%
>>>
>>> -O2 -fspectre-v2=3
>>>
>>>                        Estimated                       Estimated
>>>                  Base     Base        Base        Peak     Peak        Peak
>>> Benchmarks       Copies  Run Time     Rate        Copies  Run Time     Rate
>>> --------------- -------  ---------  ---------    -------  ---------  ---------
>>> 500.perlbench_r                               NR       1        700       2.27  * 140%
>>> 502.gcc_r                                     NR       1        485       2.92  * 170%
>>> 505.mcf_r                                     NR       1        596       2.71  * 180%
>>> 520.omnetpp_r                                 NR       1        604       2.17  * 133%
>>> 523.xalancbmk_r                               NR       1        643       1.64  * 196%
>>> 525.x264_r                                    NR       1        797       2.20  * 154%
>>> 531.deepsjeng_r                               NR       1        542       2.12  * 149%
>>> 541.leela_r                                   NR       1        872       1.90  * 146%
>>> 548.exchange2_r                               NR       1        761       3.44  * 165%
>>> 557.xz_r                                      NR       1        595       1.81  * 148%
>>>  Est. SPECrate2017_int_base           Not Run
>>>  Est. SPECrate2017_int_peak                                               2.26    64%
>>>
>>>
>>>
>>> you can see, even thoug we're comparing apples and oranges, that the 
>>> performance impact is quite dependent on the microarchitecture.
>>>
>>> Similarly interesting as performance is the effect on text size which is
>>> surprisingly high (_best_ case is 13 bytes per conditional branch plus 3
>>> bytes per instrumented memory).
>>>
>>> CPU2016:
>>>    BASE  -O2
>>>    text	   data	    bss	    dec	    hex	filename
>>> 1117726	  20928	  12704	1151358	 11917e	400.perlbench
>>>   56568	   3800	   4416	  64784	   fd10	401.bzip2
>>> 3419568	   7912	 751520	4179000	 3fc438	403.gcc
>>>   12212	    712	  11984	  24908	   614c	429.mcf
>>> 1460694	2081772	2330096	5872562	 599bb2	445.gobmk
>>>  284929	   5956	  82040	 372925	  5b0bd	456.hmmer
>>>  130782	   2152	2576896	2709830	 295946	458.sjeng
>>>   41915	    764	     96	  42775	   a717	462.libquantum
>>>  505452	  11220	 372320	 888992	  d90a0	464.h264ref
>>>  638188	   9584	  14664	 662436	  a1ba4	471.omnetpp
>>>   38859	    900	   5216	  44975	   afaf	473.astar
>>> 4033878	 140248	  12168	4186294	 3fe0b6	483.xalancbmk
>>>    PEAK -O2 -fspectre-v1=2
>>>    text	   data	    bss	    dec	    hex	filename
>>> 1508032	  20928	  12704	1541664	 178620	400.perlbench	135%
>>>   76098	   3800	   4416	  84314	  1495a	401.bzip2	135%
>>> 4483530	   7912	 751520	5242962	 500052	403.gcc		131%
>>>   16006	    712	  11984	  28702	   701e	429.mcf		131%
>>> 1647384	2081772	2330096	6059252	 5c74f4	445.gobmk	112%
>>>  377259	   5956	  82040	 465255	  71967	456.hmmer	132%
>>>  164672	   2152	2576896	2743720	 29dda8	458.sjeng	126%
>>>   47901	    764	     96	  48761	   be79	462.libquantum	114%
>>>  649854	  11220	 372320	1033394	  fc4b2	464.h264ref	129%
>>>  706908	   9584	  14664	 731156	  b2814	471.omnetpp	111%
>>>   48493	    900	   5216	  54609	   d551	473.astar	125%
>>> 4862056	 140248	  12168	5014472	 4c83c8	483.xalancbmk	121%
>>>    PEAK -O2 -fspectre-v1=3
>>>    text	   data	    bss	    dec	    hex	filename
>>> 1742008	  20936	  12704	1775648	 1b1820	400.perlbench	156%
>>>   83338	   3808	   4416	  91562	  165aa	401.bzip2	147%
>>> 5219850	   7920	 751520	5979290	 5b3c9a	403.gcc		153%
>>>   17422	    720	  11984	  30126	   75ae	429.mcf		143%
>>> 1801688	2081780	2330096	6213564	 5ecfbc	445.gobmk	123%
>>>  431827	   5964	  82040	 519831	  7ee97	456.hmmer	152%
>>>  182200	   2160	2576896	2761256	 2a2228	458.sjeng	139%
>>>   53773	    772	     96	  54641	   d571	462.libquantum	128%
>>>  691798	  11228	 372320	1075346	 106892	464.h264ref	137%
>>>  976692	   9592	  14664	1000948	  f45f4	471.omnetpp	153%
>>>   54525	    908	   5216	  60649	   ece9	473.astar	140%
>>> 5808306	 140256	  12168	5960730	 5af41a	483.xalancbmk	144%
>>>
>>> CPU2017:
>>>    BASE -O2 -g
>>>    text    data     bss     dec     hex filename
>>> 2209713    8576    9080 2227369  21fca9 500.perlbench_r
>>> 9295702   37432 1150664 10483798 9ff856 502.gcc_r
>>>   21795     712     744   23251    5ad3 505.mcf_r
>>> 2067560    8984   46888 2123432  2066a8 520.omnetpp_r
>>> 5763577  142584   20040 5926201  5a6d39 523.xalancbmk_r
>>>  508402    6102   29592  544096   84d60 525.x264_r
>>>   84222     784 12138360 12223366 ba8386 531.deepsjeng_r
>>>  223480    8544   30072  262096   3ffd0 541.leela_r
>>>   70554     864    6384   77802   12fea 548.exchange2_r
>>>  180640     884   17704  199228   30a3c 557.xz_r
>>>    PEAK -fspectre-v2=2
>>>    text    data     bss     dec     hex filename
>>> 2991161    8576    9080 3008817  2de931 500.perlbench_r	135%
>>> 12244886  37432 1150664 13432982 ccf896 502.gcc_r	132%
>>>   28475     712     744   29931    74eb 505.mcf_r	131%
>>> 2397026    8984   46888 2452898  256da2 520.omnetpp_r	116%
>>> 6846853  142584   20040 7009477  6af4c5 523.xalancbmk_r	119%
>>>  645730    6102   29592  681424   a65d0 525.x264_r	127%
>>>  111166     784 12138360 12250310 baecc6 531.deepsjeng_r 132%
>>>  260835    8544   30072  299451   491bb 541.leela_r     117%
>>>   96874     864    6384  104122   196ba 548.exchange2_r	137%
>>>  215288     884   17704  233876   39194 557.xz_r	119%
>>>    PEAK -fspectre-v2=3
>>>    text    data     bss     dec     hex filename
>>> 3365945    8584    9080 3383609  33a139 500.perlbench_r	152%
>>> 14790638  37440 1150664 15978742 f3d0f6 502.gcc_r	159%
>>>   31419     720     744   32883    8073 505.mcf_r	144%
>>> 2867893    8992   46888 2923773  2c9cfd 520.omnetpp_r	139%
>>> 8183689  142592   20040 8346321  7f5ad1 523.xalancbmk_r	142%
>>>  697434    6110   29592  733136   b2fd0 525.x264_r	137%
>>>  123638     792 12138360 12262790 bb1d86 531.deepsjeng_r 147%
>>>  315347    8552   30072  353971   566b3 541.leela_r	141%
>>>   98578     872    6384  105834   19d6a 548.exchange2_r	140%
>>>  239144     892   17704  257740   3eecc 557.xz_r	133%
>>>
>>>
>>> The patch relies heavily on RTL optimizations for DCE purposes.  At the
>>> same time we rely on RTL not statically computing the mask (RTL has no
>>> conditional constant propagation).  Full instrumentation of the classic
>>> Spectre V1 testcase
>>>
>>> char a[1024];
>>> int b[1024];
>>> int foo (int i, int bound)
>>> {
>>>   if (i < bound)
>>>     return b[a[i]];
>>> }
>>>
>>> is the following:
>>>
>>> foo:
>>> .LFB0:  
>>>         .cfi_startproc
>>>         xorl    %eax, %eax
>>>         cmpl    %esi, %edi
>>>         setge   %al
>>>         subq    $1, %rax
>>>         jne     .L4
>>>         ret
>>>         .p2align 4,,10
>>>         .p2align 3
>>> .L4:
>>>         andl    %eax, %edi
>>>         movslq  %edi, %rdi
>>>         movsbq  a(%rdi), %rax
>>>         movl    b(,%rax,4), %eax
>>>         ret
>>>
>>> so the generated GIMPLE was "tuned" for reasonable x86 assembler outcome.
>>>
>>> Patch below for reference (and your own testing in case you are curious).
>>> I do not plan to pursue this further at this point.
>>>
>>> Richard.
>>>
>>> From 01e4a5a43e266065d32489daa50de0cf2425d5f5 Mon Sep 17 00:00:00 2001
>>> From: Richard Guenther <rguenther@suse.de>
>>> Date: Wed, 5 Dec 2018 13:17:02 +0100
>>> Subject: [PATCH] warn-spectrev1
>>>
>>>
>>> diff --git a/gcc/Makefile.in b/gcc/Makefile.in
>>> index 7960cace16a..64d472d7fa0 100644
>>> --- a/gcc/Makefile.in
>>> +++ b/gcc/Makefile.in
>>> @@ -1334,6 +1334,7 @@ OBJS = \
>>>  	gimple-ssa-sprintf.o \
>>>  	gimple-ssa-warn-alloca.o \
>>>  	gimple-ssa-warn-restrict.o \
>>> +	gimple-ssa-spectrev1.o \
>>>  	gimple-streamer-in.o \
>>>  	gimple-streamer-out.o \
>>>  	gimple-walk.o \
>>> diff --git a/gcc/common.opt b/gcc/common.opt
>>> index 45d7f6189e5..1ae7fcfe177 100644
>>> --- a/gcc/common.opt
>>> +++ b/gcc/common.opt
>>> @@ -702,6 +702,10 @@ Warn when one local variable shadows another local variable or parameter of comp
>>>  Wshadow-compatible-local
>>>  Common Warning Undocumented Alias(Wshadow=compatible-local)
>>>  
>>> +Wspectre-v1
>>> +Common Var(warn_spectrev1) Warning
>>> +Warn about code susceptible to spectre v1 style attacks.
>>> +
>>>  Wstack-protector
>>>  Common Var(warn_stack_protect) Warning
>>>  Warn when not issuing stack smashing protection for some reason.
>>> @@ -2406,6 +2410,14 @@ fsingle-precision-constant
>>>  Common Report Var(flag_single_precision_constant) Optimization
>>>  Convert floating point constants to single precision constants.
>>>  
>>> +fspectre-v1
>>> +Common Alias(fspectre-v1=, 2, 0)
>>> +Insert code to mitigate spectre v1 style attacks.
>>> +
>>> +fspectre-v1=
>>> +Common Report RejectNegative Joined UInteger IntegerRange(0, 3) Var(flag_spectrev1) Optimization
>>> +Insert code to mitigate spectre v1 style attacks.
>>> +
>>>  fsplit-ivs-in-unroller
>>>  Common Report Var(flag_split_ivs_in_unroller) Init(1) Optimization
>>>  Split lifetimes of induction variables when loops are unrolled.
>>> diff --git a/gcc/gimple-ssa-spectrev1.cc b/gcc/gimple-ssa-spectrev1.cc
>>> new file mode 100644
>>> index 00000000000..c2a5dc95324
>>> --- /dev/null
>>> +++ b/gcc/gimple-ssa-spectrev1.cc
>>> @@ -0,0 +1,824 @@
>>> +/* Loop interchange.
>>> +   Copyright (C) 2017-2018 Free Software Foundation, Inc.
>>> +   Contributed by ARM Ltd.
>>> +
>>> +This file is part of GCC.
>>> +
>>> +GCC is free software; you can redistribute it and/or modify it
>>> +under the terms of the GNU General Public License as published by the
>>> +Free Software Foundation; either version 3, or (at your option) any
>>> +later version.
>>> +
>>> +GCC is distributed in the hope that it will be useful, but WITHOUT
>>> +ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
>>> +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
>>> +for more details.
>>> +
>>> +You should have received a copy of the GNU General Public License
>>> +along with GCC; see the file COPYING3.  If not see
>>> +<http://www.gnu.org/licenses/>.  */
>>> +
>>> +#include "config.h"
>>> +#include "system.h"
>>> +#include "coretypes.h"
>>> +#include "backend.h"
>>> +#include "is-a.h"
>>> +#include "tree.h"
>>> +#include "gimple.h"
>>> +#include "tree-pass.h"
>>> +#include "ssa.h"
>>> +#include "gimple-pretty-print.h"
>>> +#include "gimple-iterator.h"
>>> +#include "params.h"
>>> +#include "tree-ssa.h"
>>> +#include "cfganal.h"
>>> +#include "gimple-walk.h"
>>> +#include "tree-ssa-loop.h"
>>> +#include "tree-dfa.h"
>>> +#include "tree-cfg.h"
>>> +#include "fold-const.h"
>>> +#include "builtins.h"
>>> +#include "alias.h"
>>> +#include "cfgloop.h"
>>> +#include "varasm.h"
>>> +#include "cgraph.h"
>>> +#include "gimple-fold.h"
>>> +#include "diagnostic.h"
>>> +
>>> +/* The Spectre V1 situation is as follows:
>>> +
>>> +      if (attacker_controlled_idx < bound)  // speculated as true but is false
>>> +        {
>>> +	  // out-of-bound access, returns value interesting to attacker
>>> +	  val = mem[attacker_controlled_idx];
>>> +	  // access that causes a cache-line to be brought in - canary
>>> +	  ... = attacker_controlled_mem[val];
>>> +	}
>>> +
>>> +   The last load provides the side-channel.  The pattern can be split
>>> +   into multiple functions or translation units.  Conservatively we'd
>>> +   have to warn about
>>> +
>>> +      int foo (int *a) {  return *a; }
>>> +
>>> +   thus any indirect (or indexed) memory access.  That's obvioulsy
>>> +   not useful.
>>> +
>>> +   The next level would be to warn only when we see load of val as
>>> +   well.  That then misses cases like
>>> +
>>> +      int foo (int *a, int *b)
>>> +      {
>>> +        int idx = load_it (a);
>>> +	return load_it (&b[idx]);
>>> +      }
>>> +
>>> +   Still we'd warn about cases like
>>> +
>>> +      struct Foo { int *a; };
>>> +      int foo (struct Foo *a) { return *a->a; }
>>> +
>>> +   though dereferencing VAL isn't really an interesting case.  It's
>>> +   hard to exclude this conservatively so the obvious solution is
>>> +   to restrict the kind of loads that produce val, for example based
>>> +   on its type or its number of bits.  It's tempting to do this at
>>> +   the point of the load producing val but in the end what matters
>>> +   is the number of bits that reach the second loads [as index] given
>>> +   there are practical limits on the size of the canary.  For this
>>> +   we have to consider
>>> +
>>> +      int foo (struct Foo *a, int *b)
>>> +      {
>>> +        int *c = a->a;
>>> +	int idx = *b;
>>> +	return *(c + idx);
>>> +      }
>>> +
>>> +   where idx has too many bits to be an interesting attack vector(?).
>>> + */
>>> +
>>> +/* The pass does two things, first it performs data flow analysis
>>> +   to be able to warn about the second load.  This is controlled
>>> +   via -Wspectre-v1.
>>> +
>>> +   Second it instruments control flow in the program to track a
>>> +   mask which is all-ones but all-zeroes if the CPU speculated
>>> +   a branch in the wrong direction.  This mask is then used to
>>> +   mask the address[-part(s)] of loads with non-invariant addresses,
>>> +   effectively mitigating the attack.  This is controlled by
>>> +   -fpectre-v1[=N] where N is default 2 and
>>> +     1  optimistically omit some instrumentations (currently
>>> +        backedge control flow instructions do not update the
>>> +	speculation mask)
>>> +     2  instrument conservatively using a function-local speculation
>>> +        mask
>>> +     3  instrument conservatively using a global (TLS) speculation
>>> +        mask.  This adds TLS loads/stores of the speculation mask
>>> +	at function boundaries and before and after calls.
>>> + */
>>> +
>>> +/* We annotate statements whose defs cannot be used to leaking data
>>> +   speculatively via loads with SV1_SAFE.  This is used to optimize
>>> +   masking of indices where masked indices (and derived by constant
>>> +   ones) are not masked again.  Note this works only up to the points
>>> +   that possibly change the speculation mask value.  */
>>> +#define SV1_SAFE GF_PLF_1
>>> +
>>> +namespace {
>>> +
>>> +const pass_data pass_data_spectrev1 =
>>> +{
>>> +  GIMPLE_PASS, /* type */
>>> +  "spectrev1", /* name */
>>> +  OPTGROUP_NONE, /* optinfo_flags */
>>> +  TV_NONE, /* tv_id */
>>> +  PROP_cfg|PROP_ssa, /* properties_required */
>>> +  0, /* properties_provided */
>>> +  0, /* properties_destroyed */
>>> +  0, /* todo_flags_start */
>>> +  TODO_update_ssa, /* todo_flags_finish */
>>> +};
>>> +
>>> +class pass_spectrev1 : public gimple_opt_pass
>>> +{
>>> +public:
>>> +  pass_spectrev1 (gcc::context *ctxt)
>>> +    : gimple_opt_pass (pass_data_spectrev1, ctxt)
>>> +  {}
>>> +
>>> +  /* opt_pass methods: */
>>> +  opt_pass * clone () { return new pass_spectrev1 (m_ctxt); }
>>> +  virtual bool gate (function *) { return warn_spectrev1 || flag_spectrev1; }
>>> +  virtual unsigned int execute (function *);
>>> +
>>> +  static bool stmt_is_indexed_load (gimple *);
>>> +  static bool stmt_mangles_index (gimple *, tree);
>>> +  static bool find_value_dependent_guard (gimple *, tree);
>>> +  static void mark_influencing_outgoing_flow (basic_block, tree);
>>> +  static tree instrument_mem (gimple_stmt_iterator *, tree, tree);
>>> +}; // class pass_spectrev1
>>> +
>>> +bitmap_head *influencing_outgoing_flow;
>>> +
>>> +static bool
>>> +call_between (gimple *first, gimple *second)
>>> +{
>>> +  gcc_assert (gimple_bb (first) == gimple_bb (second));
>>> +  /* ???  This is inefficient.  Maybe we can use gimple_uid to assign
>>> +     unique IDs to stmts belonging to groups with the same speculation
>>> +     mask state.  */
>>> +  for (gimple_stmt_iterator gsi = gsi_for_stmt (first);
>>> +       gsi_stmt (gsi) != second; gsi_next (&gsi))
>>> +    if (is_gimple_call (gsi_stmt (gsi)))
>>> +      return true;
>>> +  return false;
>>> +}
>>> +
>>> +basic_block ctx_bb;
>>> +gimple *ctx_stmt;
>>> +static bool
>>> +gather_indexes (tree, tree *idx, void *data)
>>> +{
>>> +  vec<tree *> *indexes = (vec<tree *> *)data;
>>> +  if (TREE_CODE (*idx) != SSA_NAME)
>>> +    return true;
>>> +  if (!SSA_NAME_IS_DEFAULT_DEF (*idx)
>>> +      && gimple_bb (SSA_NAME_DEF_STMT (*idx)) == ctx_bb
>>> +      && gimple_plf (SSA_NAME_DEF_STMT (*idx), SV1_SAFE)
>>> +      && (flag_spectrev1 < 3
>>> +	  || !call_between (SSA_NAME_DEF_STMT (*idx), ctx_stmt)))
>>> +    return true;
>>> +  if (indexes->is_empty ())
>>> +    indexes->safe_push (idx);
>>> +  else if (*(*indexes)[0] == *idx)
>>> +    indexes->safe_push (idx);
>>> +  else
>>> +    return false;
>>> +  return true;
>>> +}
>>> +
>>> +tree
>>> +pass_spectrev1::instrument_mem (gimple_stmt_iterator *gsi, tree mem, tree mask)
>>> +{
>>> +  /* First try to see if we can find a single index we can zero which
>>> +     has the chance of repeating in other loads and also avoids separate
>>> +     LEA and memory references decreasing code size and AGU occupancy.  */
>>> +  auto_vec<tree *, 8> indexes;
>>> +  ctx_bb = gsi_bb (*gsi);
>>> +  ctx_stmt = gsi_stmt (*gsi);
>>> +  if (PARAM_VALUE (PARAM_SPECTRE_V1_MAX_INSTRUMENT_INDICES) > 0
>>> +      && for_each_index (&mem, gather_indexes, (void *)&indexes))
>>> +    {
>>> +      /* All indices are safe.  */
>>> +      if (indexes.is_empty ())
>>> +	return mem;
>>> +      if (TYPE_PRECISION (TREE_TYPE (*indexes[0]))
>>> +	  <= TYPE_PRECISION (TREE_TYPE (mask)))
>>> +	{
>>> +	  tree idx = *indexes[0];
>>> +	  gcc_assert (INTEGRAL_TYPE_P (TREE_TYPE (idx))
>>> +		      || POINTER_TYPE_P (TREE_TYPE (idx)));
>>> +	  /* Instead of instrumenting IDX directly we could look at
>>> +	     definitions with a single SSA use and instrument that
>>> +	     instead.  But we have to do some work to make SV1_SAFE
>>> +	     propagation updated then - this would really ask to first
>>> +	     gather all indexes of all refs we want to instrument and
>>> +	     compute some optimal set of instrumentations.  */
>>> +	  gimple_seq seq = NULL;
>>> +	  tree idx_mask = gimple_convert (&seq, TREE_TYPE (idx), mask);
>>> +	  tree masked_idx = gimple_build (&seq, BIT_AND_EXPR,
>>> +					  TREE_TYPE (idx), idx, idx_mask);
>>> +	  /* Mark the instrumentation sequence as visited.  */
>>> +	  for (gimple_stmt_iterator si = gsi_start (seq);
>>> +	       !gsi_end_p (si); gsi_next (&si))
>>> +	    gimple_set_visited (gsi_stmt (si), true);
>>> +	  gsi_insert_seq_before (gsi, seq, GSI_SAME_STMT);
>>> +	  gimple_set_plf (SSA_NAME_DEF_STMT (masked_idx), SV1_SAFE, true);
>>> +	  /* Replace downstream users in the BB which reduces register pressure
>>> +	     and allows SV1_SAFE propagation to work (which stops at call/BB
>>> +	     boundaries though).
>>> +	     ???  This is really reg-pressure vs. dependence chains so not
>>> +	     a generally easy thing.  Making the following propagate into
>>> +	     all uses dominated by the insert slows down 429.mcf even more.
>>> +	     ???  We can actually track SV1_SAFE across PHIs but then we
>>> +	     have to propagate into PHIs here.  */
>>> +	  gimple *use_stmt;
>>> +	  use_operand_p use_p;
>>> +	  imm_use_iterator iter;
>>> +	  FOR_EACH_IMM_USE_STMT (use_stmt, iter, idx)
>>> +	    if (gimple_bb (use_stmt) == gsi_bb (*gsi)
>>> +		&& gimple_code (use_stmt) != GIMPLE_PHI
>>> +		&& !gimple_visited_p (use_stmt))
>>> +	      {
>>> +		FOR_EACH_IMM_USE_ON_STMT (use_p, iter)
>>> +		  SET_USE (use_p, masked_idx);
>>> +		update_stmt (use_stmt);
>>> +	      }
>>> +	  /* Modify MEM in place...  (our stmt is already marked visited).  */
>>> +	  for (unsigned i = 0; i < indexes.length (); ++i)
>>> +	    *indexes[i] = masked_idx;
>>> +	  return mem;
>>> +	}
>>> +    }
>>> +
>>> +  /* ???  Can we handle TYPE_REVERSE_STORAGE_ORDER at all?  Need to
>>> +     handle BIT_FIELD_REFs.  */
>>> +
>>> +  /* Strip a bitfield reference to re-apply it at the end.  */
>>> +  tree bitfield = NULL_TREE;
>>> +  tree bitfield_off = NULL_TREE;
>>> +  if (TREE_CODE (mem) == COMPONENT_REF
>>> +      && DECL_BIT_FIELD (TREE_OPERAND (mem, 1)))
>>> +    {
>>> +      bitfield = TREE_OPERAND (mem, 1);
>>> +      bitfield_off = TREE_OPERAND (mem, 2);
>>> +      mem = TREE_OPERAND (mem, 0);
>>> +    }
>>> +
>>> +  tree ptr_base = mem;
>>> +  /* VIEW_CONVERT_EXPRs do not change offset, strip them, they get folded
>>> +     into the MEM_REF we create.  */
>>> +  while (TREE_CODE (ptr_base) == VIEW_CONVERT_EXPR)
>>> +    ptr_base = TREE_OPERAND (ptr_base, 0);
>>> +
>>> +  tree ptr = make_ssa_name (ptr_type_node);
>>> +  gimple *new_stmt = gimple_build_assign (ptr, build_fold_addr_expr (ptr_base));
>>> +  gimple_set_visited (new_stmt, true);
>>> +  gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT);
>>> +  ptr = make_ssa_name (ptr_type_node);
>>> +  new_stmt = gimple_build_assign (ptr, BIT_AND_EXPR,
>>> +				  gimple_assign_lhs (new_stmt), mask);
>>> +  gimple_set_visited (new_stmt, true);
>>> +  gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT);
>>> +  tree type = TREE_TYPE (mem);
>>> +  unsigned align = get_object_alignment (mem);
>>> +  if (align != TYPE_ALIGN (type))
>>> +    type = build_aligned_type (type, align);
>>> +
>>> +  tree new_mem = build2 (MEM_REF, type, ptr,
>>> +			 build_int_cst (reference_alias_ptr_type (mem), 0));
>>> +  if (bitfield)
>>> +    new_mem = build3 (COMPONENT_REF, TREE_TYPE (bitfield), new_mem,
>>> +		      bitfield, bitfield_off);
>>> +  return new_mem;
>>> +}
>>> +
>>> +bool
>>> +check_spectrev1_2nd_load (tree, tree *idx, void *data)
>>> +{
>>> +  sbitmap value_from_indexed_load = (sbitmap)data;
>>> +  if (TREE_CODE (*idx) == SSA_NAME
>>> +      && bitmap_bit_p (value_from_indexed_load, SSA_NAME_VERSION (*idx)))
>>> +    return false;
>>> +  return true;
>>> +}
>>> +
>>> +bool
>>> +check_spectrev1_2nd_load (gimple *, tree, tree ref, void *data)
>>> +{
>>> +  return !for_each_index (&ref, check_spectrev1_2nd_load, data);
>>> +}
>>> +
>>> +void
>>> +pass_spectrev1::mark_influencing_outgoing_flow (basic_block bb, tree op)
>>> +{
>>> +  if (!bitmap_set_bit (&influencing_outgoing_flow[SSA_NAME_VERSION (op)],
>>> +		       bb->index))
>>> +    return;
>>> +
>>> +  /* Note we are deliberately non-conservatively stop at call and
>>> +     memory boundaries here expecting earlier optimization to expose
>>> +     value dependences via SSA chains.  */
>>> +  gimple *def_stmt = SSA_NAME_DEF_STMT (op);
>>> +  if (gimple_vuse (def_stmt)
>>> +      || !is_gimple_assign (def_stmt))
>>> +    return;
>>> +
>>> +  ssa_op_iter i;
>>> +  FOR_EACH_SSA_TREE_OPERAND (op, def_stmt, i, SSA_OP_USE)
>>> +    mark_influencing_outgoing_flow (bb, op);
>>> +}
>>> +
>>> +bool
>>> +pass_spectrev1::find_value_dependent_guard (gimple *stmt, tree op)
>>> +{
>>> +  bitmap_iterator bi;
>>> +  unsigned i;
>>> +  EXECUTE_IF_SET_IN_BITMAP (&influencing_outgoing_flow[SSA_NAME_VERSION (op)],
>>> +			    0, i, bi)
>>> +    /* ???  If control-dependent on.
>>> +       ???  Make bits in influencing_outgoing_flow the index of the BB
>>> +       in RPO order so we could walk bits from STMT "upwards" finding
>>> +       the nearest one.  */
>>> +    if (dominated_by_p (CDI_DOMINATORS,
>>> +			gimple_bb (stmt), BASIC_BLOCK_FOR_FN (cfun, i)))
>>> +      {
>>> +	if (dump_enabled_p ())
>>> +	  dump_printf_loc (MSG_NOTE, stmt, "Condition %G in block %d "
>>> +			   "is related to indexes used in %G\n",
>>> +			   last_stmt (BASIC_BLOCK_FOR_FN (cfun, i)),
>>> +			   i, stmt);
>>> +	return true;
>>> +      }
>>> +
>>> +  /* Note we are deliberately non-conservatively stop at call and
>>> +     memory boundaries here expecting earlier optimization to expose
>>> +     value dependences via SSA chains.  */
>>> +  gimple *def_stmt = SSA_NAME_DEF_STMT (op);
>>> +  if (gimple_vuse (def_stmt)
>>> +      || !is_gimple_assign (def_stmt))
>>> +    return false;
>>> +
>>> +  ssa_op_iter it;
>>> +  FOR_EACH_SSA_TREE_OPERAND (op, def_stmt, it, SSA_OP_USE)
>>> +    if (find_value_dependent_guard (stmt, op))
>>> +      /* Others may be "nearer".  */
>>> +      return true;
>>> +
>>> +  return false;
>>> +}
>>> +
>>> +bool
>>> +pass_spectrev1::stmt_is_indexed_load (gimple *stmt)
>>> +{
>>> +  /* Given we ignore the function boundary for incoming parameters
>>> +     let's ignore return values of calls as well for the purpose
>>> +     of being the first indexed load (also ignore inline-asms).  */
>>> +  if (!gimple_assign_load_p (stmt))
>>> +    return false;
>>> +
>>> +  /* Exclude esp. pointers from the index load itself (but also floats,
>>> +     vectors, etc. - quite a bit handwaving here).  */
>>> +  if (!INTEGRAL_TYPE_P (TREE_TYPE (gimple_assign_lhs (stmt))))
>>> +    return false;
>>> +
>>> +  /* If we do not have any SSA uses the load cannot be one indexed
>>> +     by an attacker controlled value.  */
>>> +  if (zero_ssa_operands (stmt, SSA_OP_USE))
>>> +    return false;
>>> +
>>> +  return true;
>>> +}
>>> +
>>> +/* Return true whether the index in the use operand OP in STMT is
>>> +   not transfered to STMTs defs.  */
>>> +
>>> +bool
>>> +pass_spectrev1::stmt_mangles_index (gimple *stmt, tree op)
>>> +{
>>> +  if (gimple_assign_load_p (stmt))
>>> +    return true;
>>> +  if (gassign *ass = dyn_cast <gassign *> (stmt))
>>> +    {
>>> +      enum tree_code code = gimple_assign_rhs_code (ass);
>>> +      switch (code)
>>> +	{
>>> +	case TRUNC_DIV_EXPR:
>>> +	case CEIL_DIV_EXPR:
>>> +	case FLOOR_DIV_EXPR:
>>> +	case ROUND_DIV_EXPR:
>>> +	case EXACT_DIV_EXPR:
>>> +	case RDIV_EXPR:
>>> +	case TRUNC_MOD_EXPR:
>>> +	case CEIL_MOD_EXPR:
>>> +	case FLOOR_MOD_EXPR:
>>> +	case ROUND_MOD_EXPR:
>>> +	case LSHIFT_EXPR:
>>> +	case RSHIFT_EXPR:
>>> +	case LROTATE_EXPR:
>>> +	case RROTATE_EXPR:
>>> +	  /* Division, modulus or shifts by the index do not produce
>>> +	     something useful for the attacker.  */
>>> +	  if (gimple_assign_rhs2 (ass) == op)
>>> +	    return true;
>>> +	  break;
>>> +	default:;
>>> +	  /* Comparisons do not produce an index value.  */
>>> +	  if (TREE_CODE_CLASS (code) == tcc_comparison)
>>> +	    return true;
>>> +	}
>>> +    }
>>> +  /* ???  We could handle builtins here.  */
>>> +  return false;
>>> +}
>>> +
>>> +static GTY(()) tree spectrev1_tls_mask_decl;
>>> +
>>> +/* Main entry for spectrev1 pass.  */
>>> +
>>> +unsigned int
>>> +pass_spectrev1::execute (function *fn)
>>> +{
>>> +  calculate_dominance_info (CDI_DOMINATORS);
>>> +  loop_optimizer_init (AVOID_CFG_MODIFICATIONS);
>>> +
>>> +  int *rpo = XNEWVEC (int, n_basic_blocks_for_fn (cfun));
>>> +  int rpo_num = pre_and_rev_post_order_compute_fn (fn, NULL, rpo, false);
>>> +
>>> +  /* We track for each SSA name whether its value (may) depend(s) on
>>> +     the result of an indexed load.
>>> +     A set of operation will kill a value (enough).  */
>>> +  auto_sbitmap value_from_indexed_load (num_ssa_names);
>>> +  bitmap_clear (value_from_indexed_load);
>>> +
>>> +  unsigned orig_num_ssa_names = num_ssa_names;
>>> +  influencing_outgoing_flow = XCNEWVEC (bitmap_head, num_ssa_names);
>>> +  for (unsigned i = 1; i < num_ssa_names; ++i)
>>> +    bitmap_initialize (&influencing_outgoing_flow[i], &bitmap_default_obstack);
>>> +
>>> +
>>> +  /* Diagnosis.  */
>>> +
>>> +  /* Function arguments are not indexed loads unless we want to
>>> +     be conservative to a level no longer useful.  */
>>> +
>>> +  for (int i = 0; i < rpo_num; ++i)
>>> +    {
>>> +      basic_block bb = BASIC_BLOCK_FOR_FN (fn, rpo[i]);
>>> +
>>> +      for (gphi_iterator gpi = gsi_start_phis (bb);
>>> +	   !gsi_end_p (gpi); gsi_next (&gpi))
>>> +	{
>>> +	  gphi *phi = gpi.phi ();
>>> +	  bool value_from_indexed_load_p = false;
>>> +	  use_operand_p arg_p;
>>> +	  ssa_op_iter it;
>>> +	  FOR_EACH_PHI_ARG (arg_p, phi, it, SSA_OP_USE)
>>> +	    {
>>> +	      tree arg = USE_FROM_PTR (arg_p);
>>> +	      if (TREE_CODE (arg) == SSA_NAME
>>> +		  && bitmap_bit_p (value_from_indexed_load,
>>> +				   SSA_NAME_VERSION (arg)))
>>> +		value_from_indexed_load_p = true;
>>> +	    }
>>> +	  if (value_from_indexed_load_p)
>>> +	    bitmap_set_bit (value_from_indexed_load,
>>> +			    SSA_NAME_VERSION (PHI_RESULT (phi)));
>>> +	}
>>> +
>>> +      for (gimple_stmt_iterator gsi = gsi_start_bb (bb);
>>> +	   !gsi_end_p (gsi); gsi_next (&gsi))
>>> +	{
>>> +	  gimple *stmt = gsi_stmt (gsi);
>>> +	  if (is_gimple_debug (stmt))
>>> +	    continue;
>>> +
>>> +	  if (walk_stmt_load_store_ops (stmt, value_from_indexed_load,
>>> +					check_spectrev1_2nd_load,
>>> +					check_spectrev1_2nd_load))
>>> +	    warning_at (gimple_location (stmt), OPT_Wspectre_v1, "%Gspectrev1",
>>> +			stmt);
>>> +
>>> +	  bool value_from_indexed_load_p = false;
>>> +	  if (stmt_is_indexed_load (stmt))
>>> +	    {
>>> +	      /* We are interested in indexes to later loads so ultimatively
>>> +		 register values that all happen to separate SSA defs.
>>> +		 Interesting aggregates will be decomposed by later loads
>>> +		 which we then mark as producing an index.  Simply mark
>>> +		 all SSA defs as coming from an indexed load.  */
>>> +	      /* We are handling a single load in STMT right now.  */
>>> +	      ssa_op_iter it;
>>> +	      tree op;
>>> +	      FOR_EACH_SSA_TREE_OPERAND (op, stmt, it, SSA_OP_USE)
>>> +	        if (find_value_dependent_guard (stmt, op))
>>> +		  {
>>> +		    /* ???  Somehow record the dependence to point to it in
>>> +		       diagnostics.  */
>>> +		    value_from_indexed_load_p = true;
>>> +		    break;
>>> +		  }
>>> +	    }
>>> +
>>> +	  tree op;
>>> +	  ssa_op_iter it;
>>> +	  FOR_EACH_SSA_TREE_OPERAND (op, stmt, it, SSA_OP_USE)
>>> +	    if (bitmap_bit_p (value_from_indexed_load,
>>> +			      SSA_NAME_VERSION (op))
>>> +		&& !stmt_mangles_index (stmt, op))
>>> +	      {
>>> +		value_from_indexed_load_p = true;
>>> +		break;
>>> +	      }
>>> +
>>> +	  if (value_from_indexed_load_p)
>>> +	    FOR_EACH_SSA_TREE_OPERAND (op, stmt, it, SSA_OP_DEF)
>>> +	      /* ???  We could cut off single-bit values from the chain
>>> +	         here or pretain that float loads will be never turned
>>> +		 into integer indices, etc.  */
>>> +	      bitmap_set_bit (value_from_indexed_load,
>>> +			      SSA_NAME_VERSION (op));
>>> +	}
>>> +
>>> +      if (EDGE_COUNT (bb->succs) > 1)
>>> +	{
>>> +	  gcond *stmt = safe_dyn_cast <gcond *> (last_stmt (bb));
>>> +	  /* ???  What about switches?  What about badly speculated EH?  */
>>> +	  if (!stmt)
>>> +	    continue;
>>> +	  /* We could constrain conditions here to those more likely
>>> +	     being "bounds checks".  For example common guards for
>>> +	     indirect accesses are NULL pointer checks.
>>> +	     ???  This isn't fully safe, but it drops the number of
>>> +	     spectre warnings for dwarf2out.i from cc1files from 70 to 16.  */
>>> +	  if ((gimple_cond_code (stmt) == EQ_EXPR
>>> +	       || gimple_cond_code (stmt) == NE_EXPR)
>>> +	      && integer_zerop (gimple_cond_rhs (stmt))
>>> +	      && POINTER_TYPE_P (TREE_TYPE (gimple_cond_lhs (stmt))))
>>> +	    ;
>>> +	  else
>>> +	    {
>>> +	      ssa_op_iter it;
>>> +	      tree op;
>>> +	      FOR_EACH_SSA_TREE_OPERAND (op, stmt, it, SSA_OP_USE)
>>> +		mark_influencing_outgoing_flow (bb, op);
>>> +	    }
>>> +	}
>>> +    }
>>> +
>>> +  for (unsigned i = 1; i < orig_num_ssa_names; ++i)
>>> +    bitmap_release (&influencing_outgoing_flow[i]);
>>> +  XDELETEVEC (influencing_outgoing_flow);
>>> +
>>> +
>>> +
>>> +  /* Instrumentation.  */
>>> +  if (!flag_spectrev1)
>>> +    return 0;
>>> +
>>> +  /* Create the default all-ones mask.  When doing IPA instrumentation
>>> +     this should initialize the mask from TLS memory and outgoing edges
>>> +     need to save the mask to TLS memory.  */
>>> +  gimple *new_stmt;
>>> +  if (!spectrev1_tls_mask_decl
>>> +      && flag_spectrev1 >= 3)
>>> +    {
>>> +      /* Use a smaller variable in case sign-extending loads are
>>> +	 available?  */
>>> +      spectrev1_tls_mask_decl
>>> +	  = build_decl (BUILTINS_LOCATION,
>>> +			VAR_DECL, NULL_TREE, ptr_type_node);
>>> +      TREE_STATIC (spectrev1_tls_mask_decl) = 1;
>>> +      TREE_PUBLIC (spectrev1_tls_mask_decl) = 1;
>>> +      DECL_VISIBILITY (spectrev1_tls_mask_decl) = VISIBILITY_HIDDEN;
>>> +      DECL_VISIBILITY_SPECIFIED (spectrev1_tls_mask_decl) = 1;
>>> +      DECL_INITIAL (spectrev1_tls_mask_decl)
>>> +	  = build_all_ones_cst (ptr_type_node);
>>> +      DECL_NAME (spectrev1_tls_mask_decl) = get_identifier ("__SV1MSK");
>>> +      DECL_ARTIFICIAL (spectrev1_tls_mask_decl) = 1;
>>> +      DECL_IGNORED_P (spectrev1_tls_mask_decl) = 1;
>>> +      varpool_node::finalize_decl (spectrev1_tls_mask_decl);
>>> +      make_decl_one_only (spectrev1_tls_mask_decl,
>>> +			  DECL_ASSEMBLER_NAME (spectrev1_tls_mask_decl));
>>> +      set_decl_tls_model (spectrev1_tls_mask_decl,
>>> +			  decl_default_tls_model (spectrev1_tls_mask_decl));
>>> +    }
>>> +
>>> +  /* We let the SSA rewriter cope with rewriting mask into SSA and
>>> +     inserting PHI nodes.  */
>>> +  tree mask = create_tmp_reg (ptr_type_node, "spectre_v1_mask");
>>> +  new_stmt = gimple_build_assign (mask,
>>> +				  flag_spectrev1 >= 3
>>> +				  ? spectrev1_tls_mask_decl
>>> +				  : build_all_ones_cst (ptr_type_node));
>>> +  gimple_stmt_iterator gsi
>>> +      = gsi_after_labels (single_succ (ENTRY_BLOCK_PTR_FOR_FN (fn)));
>>> +  gsi_insert_before (&gsi, new_stmt, GSI_CONTINUE_LINKING);
>>> +
>>> +  /* We are using the visited flag to track stmts downstream in a BB.  */
>>> +  for (int i = 0; i < rpo_num; ++i)
>>> +    {
>>> +      basic_block bb = BASIC_BLOCK_FOR_FN (fn, rpo[i]);
>>> +      for (gphi_iterator gpi = gsi_start_phis (bb);
>>> +	   !gsi_end_p (gpi); gsi_next (&gpi))
>>> +	gimple_set_visited (gpi.phi (), false);
>>> +      for (gimple_stmt_iterator gsi = gsi_start_bb (bb);
>>> +	   !gsi_end_p (gsi); gsi_next (&gsi))
>>> +	gimple_set_visited (gsi_stmt (gsi), false);
>>> +    }
>>> +
>>> +  for (int i = 0; i < rpo_num; ++i)
>>> +    {
>>> +      basic_block bb = BASIC_BLOCK_FOR_FN (fn, rpo[i]);
>>> +
>>> +      for (gphi_iterator gpi = gsi_start_phis (bb);
>>> +	   !gsi_end_p (gpi); gsi_next (&gpi))
>>> +	{
>>> +	  gphi *phi = gpi.phi ();
>>> +	  /* ???  We can merge SAFE state across BB boundaries in
>>> +	     some cases, like when edges are not critical and the
>>> +	     state was made SAFE in the tail of the predecessors
>>> +	     and not invalidated by calls.   */
>>> +	  gimple_set_plf (phi, SV1_SAFE, false);
>>> +	}
>>> +
>>> +      bool instrumented_call_p = false;
>>> +      for (gimple_stmt_iterator gsi = gsi_start_bb (bb);
>>> +	   !gsi_end_p (gsi); gsi_next (&gsi))
>>> +	{
>>> +	  gimple *stmt = gsi_stmt (gsi);
>>> +	  gimple_set_visited (stmt, true);
>>> +	  if (is_gimple_debug (stmt))
>>> +	    continue;
>>> +
>>> +	  tree op;
>>> +	  ssa_op_iter it;
>>> +	  bool safe = is_gimple_assign (stmt);
>>> +	  if (safe)
>>> +	    FOR_EACH_SSA_TREE_OPERAND (op, stmt, it, SSA_OP_USE)
>>> +	      {
>>> +		if (safe
>>> +		    && (SSA_NAME_IS_DEFAULT_DEF (op)
>>> +			|| !gimple_plf (SSA_NAME_DEF_STMT (op), SV1_SAFE)
>>> +			/* Once mask can have changed we cannot further
>>> +			   propagate safe state.  */
>>> +			|| gimple_bb (SSA_NAME_DEF_STMT (op)) != bb
>>> +			/* That includes calls if we have instrumented one
>>> +			   in this block.  */
>>> +			|| (instrumented_call_p
>>> +			    && call_between (SSA_NAME_DEF_STMT (op), stmt))))
>>> +		  {
>>> +		    safe = false;
>>> +		    break;
>>> +		  }
>>> +	      }
>>> +	  gimple_set_plf (stmt, SV1_SAFE, safe);
>>> +
>>> +	  /* Instrument bounded loads.
>>> +	     We instrument non-aggregate loads with non-invariant address.
>>> +	     The idea is to reliably instrument the bounded load while
>>> +	     leaving the canary, being it load or store, aggregate or
>>> +	     non-aggregate, alone.  */
>>> +	  if (gimple_assign_single_p (stmt)
>>> +	      && gimple_vuse (stmt)
>>> +	      && !gimple_vdef (stmt)
>>> +	      && !zero_ssa_operands (stmt, SSA_OP_USE))
>>> +	    {
>>> +	      tree new_mem = instrument_mem (&gsi, gimple_assign_rhs1 (stmt),
>>> +					     mask);
>>> +	      gimple_assign_set_rhs1 (stmt, new_mem);
>>> +	      update_stmt (stmt);
>>> +	      /* The value loaded my a masked load is "safe".  */
>>> +	      gimple_set_plf (stmt, SV1_SAFE, true);
>>> +	    }
>>> +
>>> +	  /* Instrument return store to TLS mask.  */
>>> +	  if (flag_spectrev1 >= 3
>>> +	      && gimple_code (stmt) == GIMPLE_RETURN)
>>> +	    {
>>> +	      new_stmt = gimple_build_assign (spectrev1_tls_mask_decl, mask);
>>> +	      gsi_insert_before (&gsi, new_stmt, GSI_SAME_STMT);
>>> +	    }
>>> +	  /* Instrument calls with store/load to/from TLS mask.
>>> +	     ???  Placement of the stores/loads can be optimized in a LCM
>>> +	     way.  */
>>> +	  else if (flag_spectrev1 >= 3
>>> +		   && is_gimple_call (stmt)
>>> +		   && gimple_vuse (stmt))
>>> +	    {
>>> +	      new_stmt = gimple_build_assign (spectrev1_tls_mask_decl, mask);
>>> +	      gsi_insert_before (&gsi, new_stmt, GSI_SAME_STMT);
>>> +	      if (!stmt_ends_bb_p (stmt))
>>> +		{
>>> +		  new_stmt = gimple_build_assign (mask,
>>> +						  spectrev1_tls_mask_decl);
>>> +		  gsi_insert_after (&gsi, new_stmt, GSI_NEW_STMT);
>>> +		}
>>> +	      else
>>> +		{
>>> +		  edge_iterator ei;
>>> +		  edge e;
>>> +		  FOR_EACH_EDGE (e, ei, bb->succs)
>>> +		    {
>>> +		      if (e->flags & EDGE_ABNORMAL)
>>> +			continue;
>>> +		      new_stmt = gimple_build_assign (mask,
>>> +						      spectrev1_tls_mask_decl);
>>> +		      gsi_insert_on_edge (e, new_stmt);
>>> +		    }
>>> +		}
>>> +	      instrumented_call_p = true;
>>> +	    }
>>> +	}
>>> +
>>> +      if (EDGE_COUNT (bb->succs) > 1)
>>> +	{
>>> +	  gcond *stmt = safe_dyn_cast <gcond *> (last_stmt (bb));
>>> +	  /* ???  What about switches?  What about badly speculated EH?  */
>>> +	  if (!stmt)
>>> +	    continue;
>>> +
>>> +	  /* Instrument conditional branches to track mis-speculation
>>> +	     via a pointer-sized mask.
>>> +	     ???  We could restrict to instrumenting those conditions
>>> +	     that control interesting loads or apply simple heuristics
>>> +	     like not instrumenting FP compares or equality compares
>>> +	     which are unlikely bounds checks.  But we have to instrument
>>> +	     bool != 0 because multiple conditions might have been
>>> +	     combined.  */
>>> +	  edge truee, falsee;
>>> +	  extract_true_false_edges_from_block (bb, &truee, &falsee);
>>> +	  /* Unless -fspectre-v1=2 we do not instrument loop exit tests.  */
>>> +	  if (flag_spectrev1 >= 2
>>> +	      || !loop_exits_from_bb_p (bb->loop_father, bb))
>>> +	    {
>>> +	      gimple_stmt_iterator gsi = gsi_last_bb (bb);
>>> +
>>> +	      /* Instrument
>>> +	           if (a_1 > b_2)
>>> +		 as
>>> +	           tem_mask_3 = a_1 > b_2 ? -1 : 0;
>>> +		   if (tem_mask_3 != 0)
>>> +		 this will result in a
>>> +		   xor %eax, %eax; cmp|test; setCC %al; sub $0x1, %eax; jne
>>> +		 sequence which is faster in practice than when retaining
>>> +		 the original jump condition.  This is 10 bytes overhead
>>> +		 on x86_64 plus 3 bytes for an and on the true path and
>>> +		 5 bytes for an and and not on the false path.  */
>>> +	      tree tem_mask = make_ssa_name (ptr_type_node);
>>> +	      new_stmt = gimple_build_assign (tem_mask, COND_EXPR,
>>> +					      build2 (gimple_cond_code (stmt),
>>> +						      boolean_type_node,
>>> +						      gimple_cond_lhs (stmt),
>>> +						      gimple_cond_rhs (stmt)),
>>> +					      build_all_ones_cst (ptr_type_node),
>>> +					      build_zero_cst (ptr_type_node));
>>> +	      gsi_insert_before (&gsi, new_stmt, GSI_SAME_STMT);
>>> +	      gimple_cond_set_code (stmt, NE_EXPR);
>>> +	      gimple_cond_set_lhs (stmt, tem_mask);
>>> +	      gimple_cond_set_rhs (stmt, build_zero_cst (ptr_type_node));
>>> +	      update_stmt (stmt);
>>> +
>>> +	      /* On the false edge
>>> +	           mask = mask & ~tem_mask_3;  */
>>> +	      gimple_seq tems = NULL;
>>> +	      tree tem_mask2 = make_ssa_name (ptr_type_node);
>>> +	      new_stmt = gimple_build_assign (tem_mask2, BIT_NOT_EXPR,
>>> +					      tem_mask);
>>> +	      gimple_seq_add_stmt_without_update (&tems, new_stmt);
>>> +	      new_stmt = gimple_build_assign (mask, BIT_AND_EXPR,
>>> +					      mask, tem_mask2);
>>> +	      gimple_seq_add_stmt_without_update (&tems, new_stmt);
>>> +	      gsi_insert_seq_on_edge (falsee, tems);
>>> +
>>> +	      /* On the true edge
>>> +	           mask = mask & tem_mask_3;  */
>>> +	      new_stmt = gimple_build_assign (mask, BIT_AND_EXPR,
>>> +					      mask, tem_mask);
>>> +	      gsi_insert_on_edge (truee, new_stmt);
>>> +	    }
>>> +	}
>>> +    }
>>> +
>>> +  gsi_commit_edge_inserts ();
>>> +
>>> +  return 0;
>>> +}
>>> +
>>> +} // anon namespace
>>> +
>>> +gimple_opt_pass *
>>> +make_pass_spectrev1 (gcc::context *ctxt)
>>> +{
>>> +  return new pass_spectrev1 (ctxt);
>>> +}
>>> diff --git a/gcc/params.def b/gcc/params.def
>>> index 6f98fccd291..19f7dbf4dad 100644
>>> --- a/gcc/params.def
>>> +++ b/gcc/params.def
>>> @@ -1378,6 +1378,11 @@ DEFPARAM(PARAM_LOOP_VERSIONING_MAX_OUTER_INSNS,
>>>  	 " loops.",
>>>  	 100, 0, 0)
>>>  
>>> +DEFPARAM(PARAM_SPECTRE_V1_MAX_INSTRUMENT_INDICES,
>>> +	 "spectre-v1-max-instrument-indices",
>>> +	 "Maximum number of indices to instrument before instrumenting the whole address.",
>>> +	 1, 0, 0)
>>> +
>>>  /*
>>>  
>>>  Local variables:
>>> diff --git a/gcc/passes.def b/gcc/passes.def
>>> index 144df4fa417..2fe0cdcfa7e 100644
>>> --- a/gcc/passes.def
>>> +++ b/gcc/passes.def
>>> @@ -400,6 +400,7 @@ along with GCC; see the file COPYING3.  If not see
>>>    NEXT_PASS (pass_lower_resx);
>>>    NEXT_PASS (pass_nrv);
>>>    NEXT_PASS (pass_cleanup_cfg_post_optimizing);
>>> +  NEXT_PASS (pass_spectrev1);
>>>    NEXT_PASS (pass_warn_function_noreturn);
>>>    NEXT_PASS (pass_gen_hsail);
>>>  
>>> diff --git a/gcc/testsuite/gcc.dg/Wspectre-v1-1.c b/gcc/testsuite/gcc.dg/Wspectre-v1-1.c
>>> new file mode 100644
>>> index 00000000000..3ac647e72fd
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.dg/Wspectre-v1-1.c
>>> @@ -0,0 +1,10 @@
>>> +/* { dg-do compile } */
>>> +/* { dg-options "-Wspectre-v1" } */
>>> +
>>> +unsigned char a[1024];
>>> +int b[256];
>>> +int foo (int i, int bound)
>>> +{
>>> +  if (i < bound)
>>> +    return b[a[i]];  /* { dg-warning "spectrev1" } */
>>> +}
>>> diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
>>> index 9f9d85fdbc3..f5c164f465f 100644
>>> --- a/gcc/tree-pass.h
>>> +++ b/gcc/tree-pass.h
>>> @@ -625,6 +625,7 @@ extern gimple_opt_pass *make_pass_local_fn_summary (gcc::context *ctxt);
>>>  extern gimple_opt_pass *make_pass_update_address_taken (gcc::context *ctxt);
>>>  extern gimple_opt_pass *make_pass_convert_switch (gcc::context *ctxt);
>>>  extern gimple_opt_pass *make_pass_lower_vaarg (gcc::context *ctxt);
>>> +extern gimple_opt_pass *make_pass_spectrev1 (gcc::context *ctxt);
>>>  
>>>  /* Current optimization pass.  */
>>>  extern opt_pass *current_pass;
>>>
>>
>>
>>
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Spectre V1 diagnostic / mitigation
  2018-12-19 14:19               ` Peter Bergner
@ 2018-12-19 15:44                 ` Florian Weimer
  2018-12-19 17:17                   ` Richard Biener
  0 siblings, 1 reply; 17+ messages in thread
From: Florian Weimer @ 2018-12-19 15:44 UTC (permalink / raw)
  To: Peter Bergner
  Cc: Richard Biener, Richard Earnshaw (lists),
	gcc, Tulio Magno Quites Machado Filho

* Peter Bergner:

> On 12/19/18 7:59 AM, Florian Weimer wrote:
>> * Richard Biener:
>> 
>>> Sure, if we'd ever deploy this in production placing this in the
>>> TCB for glibc targets might be beneifical.  But as said the
>>> current implementation was just an experiment intended to be
>>> maximum portable.  I suppose the dynamic loader takes care
>>> of initializing the TCB data?
>> 
>> Yes, the dynamic linker will initialize it.  If you need 100% reliable
>> initialization with something that is not zero, it's going to be tricky
>> though.  Initial-exec TLS memory has this covered, but in the TCB, we
>> only have zeroed-out reservations today.
>
> We have non-zero initialized TCB entries on powerpc*-linux which are used
> for the GCC __builtin_cpu_is() and __builtin_cpu_supports() builtin
> functions.  Tulio would know the magic that was used to get them setup.

Yes, there's a special symbol, __parse_hwcap_and_convert_at_platform, to
verify that the dynamic linker sets up the TCB as required.  This way,
binaries which need the feature will fail to run on older loaders.  This
is why I said it's a bit tricky to implement this.  It's even more
complicated if you want to backport this into released glibcs, where we
normally do not accept ABI changes (not even ABI additions).

Thanks,
Florian

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Spectre V1 diagnostic / mitigation
  2018-12-19 15:44                 ` Florian Weimer
@ 2018-12-19 17:17                   ` Richard Biener
  2018-12-19 17:25                     ` Richard Earnshaw (lists)
  0 siblings, 1 reply; 17+ messages in thread
From: Richard Biener @ 2018-12-19 17:17 UTC (permalink / raw)
  To: Florian Weimer
  Cc: Peter Bergner, Richard Earnshaw (lists),
	gcc, Tulio Magno Quites Machado Filho

On Wed, 19 Dec 2018, Florian Weimer wrote:

> * Peter Bergner:
> 
> > On 12/19/18 7:59 AM, Florian Weimer wrote:
> >> * Richard Biener:
> >> 
> >>> Sure, if we'd ever deploy this in production placing this in the
> >>> TCB for glibc targets might be beneifical.  But as said the
> >>> current implementation was just an experiment intended to be
> >>> maximum portable.  I suppose the dynamic loader takes care
> >>> of initializing the TCB data?
> >> 
> >> Yes, the dynamic linker will initialize it.  If you need 100% reliable
> >> initialization with something that is not zero, it's going to be tricky
> >> though.  Initial-exec TLS memory has this covered, but in the TCB, we
> >> only have zeroed-out reservations today.
> >
> > We have non-zero initialized TCB entries on powerpc*-linux which are used
> > for the GCC __builtin_cpu_is() and __builtin_cpu_supports() builtin
> > functions.  Tulio would know the magic that was used to get them setup.
> 
> Yes, there's a special symbol, __parse_hwcap_and_convert_at_platform, to
> verify that the dynamic linker sets up the TCB as required.  This way,
> binaries which need the feature will fail to run on older loaders.  This
> is why I said it's a bit tricky to implement this.  It's even more
> complicated if you want to backport this into released glibcs, where we
> normally do not accept ABI changes (not even ABI additions).

It's easy to change the mitigation scheme to use a zero for the
non-speculated path, you'd simply replace ands with zero by
ors with -1.  For address parts that gets you some possible overflows
you do not want though.

Richard.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Spectre V1 diagnostic / mitigation
  2018-12-19 15:42     ` Richard Earnshaw (lists)
@ 2018-12-19 17:20       ` Richard Biener
  0 siblings, 0 replies; 17+ messages in thread
From: Richard Biener @ 2018-12-19 17:20 UTC (permalink / raw)
  To: Richard Earnshaw (lists); +Cc: gcc

On Wed, 19 Dec 2018, Richard Earnshaw (lists) wrote:

> On 19/12/2018 11:25, Richard Biener wrote:
> > On Tue, 18 Dec 2018, Richard Earnshaw (lists) wrote:
> > 
> >> On 18/12/2018 15:36, Richard Biener wrote:
> >>>
> >>> Hi,
> >>>
> >>> in the past weeks I've been looking into prototyping both spectre V1 
> >>> (speculative array bound bypass) diagnostics and mitigation in an
> >>> architecture independent manner to assess feasability and some kind
> >>> of upper bound on the performance impact one can expect.
> >>> https://lists.llvm.org/pipermail/llvm-dev/2018-March/122085.html is
> >>> an interesting read in this context as well.
> >>
> >> Interesting, thanks for posting this.
> >>
> >>>
> >>> For simplicity I have implemented mitigation on GIMPLE right before
> >>> RTL expansion and have chosen TLS to do mitigation across function
> >>> boundaries.  Diagnostics sit in the same place but both are not in
> >>> any way dependent on each other.
> >>
> >> We considered using TLS for propagating the state across call-boundaries
> >> on AArch64, but rejected it for several reasons.
> >>
> >> - It's quite expensive to have to set up the TLS state in every function;
> >> - It requires some global code to initialize the state variable - that's
> >> kind of ABI;
> > 
> > The cost is probably target dependent - on x86 it's simply a $fs based
> > load/store.  For initialization a static initializer seemed to work
> > for me (but honestly I didn't do any testing besides running the
> > testsuite for correctness - so at least the mask wasn't zero initialized).
> > Note the LLVM people use an inverted mask and cancel values by
> > OR-ing -1 instead of AND-ing 0.  At least default zero-initialization
> > should be possible with TLS vars.
> > 
> > That said, my choice of TLS was to make this trivially work across
> > targets - if a target can do better then it should.  And of course
> > the target may not have any TLS support besides emultls which would
> > be prohibitly expensive.
> > 
> >> - It also seems likely to be vulnerable to Spectre variant 4 - unless
> >> the CPU can always correctly store-to-load forward the speculation
> >> state, then you have the situation where the load may see an old value
> >> of the state - and that's almost certain to say "we're not speculating".
> >>
> >> The last one is really the killer here.
> > 
> > Hmm, as far as I understood v4 only happens when store-forwarding
> > doesn't work.  And I hope it doesn't fail "randomly" but works
> > reliable when all accesses to the memory are aligned and have
> > the same size as is the case with these compiler-generated TLS
> > accesses.  But yes, if that's not guaranteed then using memory
> > doesn't work at all.  
> 
> The problem is that you can't prove this through realistic testing.
> Architecturally, the result has to come out the same in the end in that
> if the load does bypass the store, eventually the hardware has to replay
> the instruction with the correct data and cancel any operations that
> were dependent on the earlier execution.  Only side-channel data will be
> left after that.
> 
> > Not sure what else target independent there
> > is though that doesn't break the ABI like simply adding another
> > parameter.  And even adding a parameter might not work in case
> > there's only stack passing and V4 happens on the stack accesses...
> 
> Yep, exactly.
> 
> > 
> >>>
> >>> The mitigation strategy chosen is that of tracking speculation
> >>> state via a mask that can be used to zero parts of the addresses
> >>> that leak the actual data.  That's similar to what aarch64 does
> >>> with -mtrack-speculation (but oddly there's no mitigation there).
> >>
> >> We rely on the user inserting the new builtin, which we can more
> >> effectively optimize if the compiler is generating speculation state
> >> tracking data.  That doesn't preclude a full solution at a later date,
> >> but it looked like it was likely overkill for protecting every load and
> >> safely pruning the loads is not an easy problem to solve.  Of course,
> >> the builtin does require the programmer to do some work to identify
> >> which memory accesses might be vulnerable.
> > 
> > My main question was how in earth the -mtrack-speculation overhead
> > is reasonable for the very few expected explicit builtin uses...
> 
> Ultimately that will depend on what the user wants and the level of
> protection needed.  The builtin gives the choice: get a hard barrier if
> tracking has not been enabled, with a very high hit at the point of
> execution; or take a much lower hit at that point if tracking has been
> enabled.  That's a trade-off between how often you hit the barrier vs
> how much you hit the tracking events to no benefit.
> 
> Your code, however, doesn't work at present.  This example shows that
> the mitigation code is just optimized away by the rtl passes, at least
> for -fspectre-v1=2.
> 
> int f (int a, int b, int c, char *d)
> {
>   if (a > 10)
>     return 0;
> 
>   if (b > 64)
>     return 0;
> 
>   if (c > 96)
>     return 0;
> 
>   return d[a] + d[b] + d[c];
> }
> 
> It works ok at level 3 because then the compiler can't prove the logical
> truth of the speculation variable on the path from TLS memory and that's
> sufficient to defeat the optimizers.

That was expected - now I didn't find a simple example, thanks for 
providing one ;)  The above is "mis-"optimized by combine which
seems to have code to track nonzero bits across conditionals
(likely, didn't find the part of the code yet).

Mitigation against this is moving the whole thing to RTL or, as you
say, hide the -1 initialization from it via some volatile stuff.

Richard.

> R.
> 
> > 
> > Richard.
> > 
> >> R.
> >>
> >>
> >>>
> >>> I've optimized things to the point that is reasonable when working
> >>> target independent on GIMPLE but I've only looked at x86 assembly
> >>> and performance.  I expect any "final" mitigation if we choose to
> >>> implement and integrate such would be after RTL expansion since
> >>> RTL expansion can end up introducing quite some control flow whose
> >>> speculation state is not properly tracked by the prototype.
> >>>
> >>> I'm cut&pasting single-runs of SPEC INT 2006/2017 here, the runs
> >>> were done with -O2 [-fspectre-v1={2,3}] where =2 is function-local
> >>> mitigation and =3 does mitigation global with passing the state
> >>> via TLS memory.
> >>>
> >>> The following was measured on a Haswell desktop CPU:
> >>>
> >>> 	-O2 vs. -O2 -fspectre-v1=2
> >>>
> >>>                                   Estimated                       Estimated
> >>>                 Base     Base       Base        Peak     Peak       Peak
> >>> Benchmarks      Ref.   Run Time     Ratio       Ref.   Run Time     Ratio
> >>> -------------- ------  ---------  ---------    ------  ---------  ---------
> >>> 400.perlbench    9770        245       39.8 *    9770        452       21.6 *  184%
> >>> 401.bzip2        9650        378       25.5 *    9650        726       13.3 *  192%
> >>> 403.gcc          8050        236       34.2 *    8050        352       22.8 *  149%
> >>> 429.mcf          9120        223       40.9 *    9120        656       13.9 *  294%
> >>> 445.gobmk       10490        400       26.2 *   10490        666       15.8 *  167%
> >>> 456.hmmer        9330        388       24.1 *    9330        536       17.4 *  138%
> >>> 458.sjeng       12100        437       27.7 *   12100        661       18.3 *  151%
> >>> 462.libquantum  20720        300       69.1 *   20720        384       53.9 *  128%
> >>> 464.h264ref     22130        451       49.1 *   22130        586       37.8 *  130%
> >>> 471.omnetpp      6250        291       21.5 *    6250        398       15.7 *  137%
> >>> 473.astar        7020        334       21.0 *    7020        522       13.5 *  156%
> >>> 483.xalancbmk    6900        182       37.9 *    6900        306       22.6 *  168%
> >>>  Est. SPECint_base2006                   --
> >>>  Est. SPECint2006                                                        --
> >>>
> >>>    -O2 -fspectre-v1=3
> >>>
> >>>                                   Estimated                       Estimated
> >>>                 Base     Base       Base        Peak     Peak       Peak
> >>> Benchmarks      Ref.   Run Time     Ratio       Ref.   Run Time     Ratio
> >>> -------------- ------  ---------  ---------    ------  ---------  ---------
> >>> 400.perlbench                                    9770        497       19.6 *  203%
> >>> 401.bzip2                                        9650        772       12.5 *  204%
> >>> 403.gcc                                          8050        427       18.9 *  181%
> >>> 429.mcf                                          9120        696       13.1 *  312%
> >>> 445.gobmk                                       10490        726       14.4 *  181%
> >>> 456.hmmer                                        9330        537       17.4 *  138%
> >>> 458.sjeng                                       12100        721       16.8 *  165%
> >>> 462.libquantum                                  20720        446       46.4 *  149%
> >>> 464.h264ref                                     22130        613       36.1 *  136%
> >>> 471.omnetpp                                      6250        471       13.3 *  162%
> >>> 473.astar                                        7020        579       12.1 *  173%
> >>> 483.xalancbmk                                    6900        350       19.7 *  192%
> >>>  Est. SPECint(R)_base2006           Not Run
> >>>  Est. SPECint2006                                                        --
> >>>
> >>>
> >>> While the following was measured on a Zen Epyc server:
> >>>
> >>> -O2 vs -O2 -fspectre-v1=2
> >>>
> >>>                        Estimated                       Estimated
> >>>                  Base     Base        Base        Peak     Peak        Peak
> >>> Benchmarks       Copies  Run Time     Rate        Copies  Run Time     Rate
> >>> --------------- -------  ---------  ---------    -------  ---------  ---------
> >>> 500.perlbench_r       1        499       3.19  *       1        621       2.56  * 124%
> >>> 502.gcc_r             1        286       4.95  *       1        392       3.61  * 137%
> >>> 505.mcf_r             1        331       4.88  *       1        456       3.55  * 138%
> >>> 520.omnetpp_r         1        454       2.89  *       1        563       2.33  * 124%
> >>> 523.xalancbmk_r       1        328       3.22  *       1        569       1.86  * 173%
> >>> 525.x264_r            1        518       3.38  *       1        776       2.26  * 150%
> >>> 531.deepsjeng_r       1        365       3.14  *       1        448       2.56  * 123%
> >>> 541.leela_r           1        598       2.77  *       1        729       2.27  * 122%
> >>> 548.exchange2_r       1        460       5.69  *       1        756       3.46  * 164%
> >>> 557.xz_r              1        403       2.68  *       1        586       1.84  * 145%
> >>>  Est. SPECrate2017_int_base              3.55
> >>>  Est. SPECrate2017_int_peak                                               2.56    72%
> >>>
> >>> -O2 -fspectre-v2=3
> >>>
> >>>                        Estimated                       Estimated
> >>>                  Base     Base        Base        Peak     Peak        Peak
> >>> Benchmarks       Copies  Run Time     Rate        Copies  Run Time     Rate
> >>> --------------- -------  ---------  ---------    -------  ---------  ---------
> >>> 500.perlbench_r                               NR       1        700       2.27  * 140%
> >>> 502.gcc_r                                     NR       1        485       2.92  * 170%
> >>> 505.mcf_r                                     NR       1        596       2.71  * 180%
> >>> 520.omnetpp_r                                 NR       1        604       2.17  * 133%
> >>> 523.xalancbmk_r                               NR       1        643       1.64  * 196%
> >>> 525.x264_r                                    NR       1        797       2.20  * 154%
> >>> 531.deepsjeng_r                               NR       1        542       2.12  * 149%
> >>> 541.leela_r                                   NR       1        872       1.90  * 146%
> >>> 548.exchange2_r                               NR       1        761       3.44  * 165%
> >>> 557.xz_r                                      NR       1        595       1.81  * 148%
> >>>  Est. SPECrate2017_int_base           Not Run
> >>>  Est. SPECrate2017_int_peak                                               2.26    64%
> >>>
> >>>
> >>>
> >>> you can see, even thoug we're comparing apples and oranges, that the 
> >>> performance impact is quite dependent on the microarchitecture.
> >>>
> >>> Similarly interesting as performance is the effect on text size which is
> >>> surprisingly high (_best_ case is 13 bytes per conditional branch plus 3
> >>> bytes per instrumented memory).
> >>>
> >>> CPU2016:
> >>>    BASE  -O2
> >>>    text	   data	    bss	    dec	    hex	filename
> >>> 1117726	  20928	  12704	1151358	 11917e	400.perlbench
> >>>   56568	   3800	   4416	  64784	   fd10	401.bzip2
> >>> 3419568	   7912	 751520	4179000	 3fc438	403.gcc
> >>>   12212	    712	  11984	  24908	   614c	429.mcf
> >>> 1460694	2081772	2330096	5872562	 599bb2	445.gobmk
> >>>  284929	   5956	  82040	 372925	  5b0bd	456.hmmer
> >>>  130782	   2152	2576896	2709830	 295946	458.sjeng
> >>>   41915	    764	     96	  42775	   a717	462.libquantum
> >>>  505452	  11220	 372320	 888992	  d90a0	464.h264ref
> >>>  638188	   9584	  14664	 662436	  a1ba4	471.omnetpp
> >>>   38859	    900	   5216	  44975	   afaf	473.astar
> >>> 4033878	 140248	  12168	4186294	 3fe0b6	483.xalancbmk
> >>>    PEAK -O2 -fspectre-v1=2
> >>>    text	   data	    bss	    dec	    hex	filename
> >>> 1508032	  20928	  12704	1541664	 178620	400.perlbench	135%
> >>>   76098	   3800	   4416	  84314	  1495a	401.bzip2	135%
> >>> 4483530	   7912	 751520	5242962	 500052	403.gcc		131%
> >>>   16006	    712	  11984	  28702	   701e	429.mcf		131%
> >>> 1647384	2081772	2330096	6059252	 5c74f4	445.gobmk	112%
> >>>  377259	   5956	  82040	 465255	  71967	456.hmmer	132%
> >>>  164672	   2152	2576896	2743720	 29dda8	458.sjeng	126%
> >>>   47901	    764	     96	  48761	   be79	462.libquantum	114%
> >>>  649854	  11220	 372320	1033394	  fc4b2	464.h264ref	129%
> >>>  706908	   9584	  14664	 731156	  b2814	471.omnetpp	111%
> >>>   48493	    900	   5216	  54609	   d551	473.astar	125%
> >>> 4862056	 140248	  12168	5014472	 4c83c8	483.xalancbmk	121%
> >>>    PEAK -O2 -fspectre-v1=3
> >>>    text	   data	    bss	    dec	    hex	filename
> >>> 1742008	  20936	  12704	1775648	 1b1820	400.perlbench	156%
> >>>   83338	   3808	   4416	  91562	  165aa	401.bzip2	147%
> >>> 5219850	   7920	 751520	5979290	 5b3c9a	403.gcc		153%
> >>>   17422	    720	  11984	  30126	   75ae	429.mcf		143%
> >>> 1801688	2081780	2330096	6213564	 5ecfbc	445.gobmk	123%
> >>>  431827	   5964	  82040	 519831	  7ee97	456.hmmer	152%
> >>>  182200	   2160	2576896	2761256	 2a2228	458.sjeng	139%
> >>>   53773	    772	     96	  54641	   d571	462.libquantum	128%
> >>>  691798	  11228	 372320	1075346	 106892	464.h264ref	137%
> >>>  976692	   9592	  14664	1000948	  f45f4	471.omnetpp	153%
> >>>   54525	    908	   5216	  60649	   ece9	473.astar	140%
> >>> 5808306	 140256	  12168	5960730	 5af41a	483.xalancbmk	144%
> >>>
> >>> CPU2017:
> >>>    BASE -O2 -g
> >>>    text    data     bss     dec     hex filename
> >>> 2209713    8576    9080 2227369  21fca9 500.perlbench_r
> >>> 9295702   37432 1150664 10483798 9ff856 502.gcc_r
> >>>   21795     712     744   23251    5ad3 505.mcf_r
> >>> 2067560    8984   46888 2123432  2066a8 520.omnetpp_r
> >>> 5763577  142584   20040 5926201  5a6d39 523.xalancbmk_r
> >>>  508402    6102   29592  544096   84d60 525.x264_r
> >>>   84222     784 12138360 12223366 ba8386 531.deepsjeng_r
> >>>  223480    8544   30072  262096   3ffd0 541.leela_r
> >>>   70554     864    6384   77802   12fea 548.exchange2_r
> >>>  180640     884   17704  199228   30a3c 557.xz_r
> >>>    PEAK -fspectre-v2=2
> >>>    text    data     bss     dec     hex filename
> >>> 2991161    8576    9080 3008817  2de931 500.perlbench_r	135%
> >>> 12244886  37432 1150664 13432982 ccf896 502.gcc_r	132%
> >>>   28475     712     744   29931    74eb 505.mcf_r	131%
> >>> 2397026    8984   46888 2452898  256da2 520.omnetpp_r	116%
> >>> 6846853  142584   20040 7009477  6af4c5 523.xalancbmk_r	119%
> >>>  645730    6102   29592  681424   a65d0 525.x264_r	127%
> >>>  111166     784 12138360 12250310 baecc6 531.deepsjeng_r 132%
> >>>  260835    8544   30072  299451   491bb 541.leela_r     117%
> >>>   96874     864    6384  104122   196ba 548.exchange2_r	137%
> >>>  215288     884   17704  233876   39194 557.xz_r	119%
> >>>    PEAK -fspectre-v2=3
> >>>    text    data     bss     dec     hex filename
> >>> 3365945    8584    9080 3383609  33a139 500.perlbench_r	152%
> >>> 14790638  37440 1150664 15978742 f3d0f6 502.gcc_r	159%
> >>>   31419     720     744   32883    8073 505.mcf_r	144%
> >>> 2867893    8992   46888 2923773  2c9cfd 520.omnetpp_r	139%
> >>> 8183689  142592   20040 8346321  7f5ad1 523.xalancbmk_r	142%
> >>>  697434    6110   29592  733136   b2fd0 525.x264_r	137%
> >>>  123638     792 12138360 12262790 bb1d86 531.deepsjeng_r 147%
> >>>  315347    8552   30072  353971   566b3 541.leela_r	141%
> >>>   98578     872    6384  105834   19d6a 548.exchange2_r	140%
> >>>  239144     892   17704  257740   3eecc 557.xz_r	133%
> >>>
> >>>
> >>> The patch relies heavily on RTL optimizations for DCE purposes.  At the
> >>> same time we rely on RTL not statically computing the mask (RTL has no
> >>> conditional constant propagation).  Full instrumentation of the classic
> >>> Spectre V1 testcase
> >>>
> >>> char a[1024];
> >>> int b[1024];
> >>> int foo (int i, int bound)
> >>> {
> >>>   if (i < bound)
> >>>     return b[a[i]];
> >>> }
> >>>
> >>> is the following:
> >>>
> >>> foo:
> >>> .LFB0:  
> >>>         .cfi_startproc
> >>>         xorl    %eax, %eax
> >>>         cmpl    %esi, %edi
> >>>         setge   %al
> >>>         subq    $1, %rax
> >>>         jne     .L4
> >>>         ret
> >>>         .p2align 4,,10
> >>>         .p2align 3
> >>> .L4:
> >>>         andl    %eax, %edi
> >>>         movslq  %edi, %rdi
> >>>         movsbq  a(%rdi), %rax
> >>>         movl    b(,%rax,4), %eax
> >>>         ret
> >>>
> >>> so the generated GIMPLE was "tuned" for reasonable x86 assembler outcome.
> >>>
> >>> Patch below for reference (and your own testing in case you are curious).
> >>> I do not plan to pursue this further at this point.
> >>>
> >>> Richard.
> >>>
> >>> From 01e4a5a43e266065d32489daa50de0cf2425d5f5 Mon Sep 17 00:00:00 2001
> >>> From: Richard Guenther <rguenther@suse.de>
> >>> Date: Wed, 5 Dec 2018 13:17:02 +0100
> >>> Subject: [PATCH] warn-spectrev1
> >>>
> >>>
> >>> diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> >>> index 7960cace16a..64d472d7fa0 100644
> >>> --- a/gcc/Makefile.in
> >>> +++ b/gcc/Makefile.in
> >>> @@ -1334,6 +1334,7 @@ OBJS = \
> >>>  	gimple-ssa-sprintf.o \
> >>>  	gimple-ssa-warn-alloca.o \
> >>>  	gimple-ssa-warn-restrict.o \
> >>> +	gimple-ssa-spectrev1.o \
> >>>  	gimple-streamer-in.o \
> >>>  	gimple-streamer-out.o \
> >>>  	gimple-walk.o \
> >>> diff --git a/gcc/common.opt b/gcc/common.opt
> >>> index 45d7f6189e5..1ae7fcfe177 100644
> >>> --- a/gcc/common.opt
> >>> +++ b/gcc/common.opt
> >>> @@ -702,6 +702,10 @@ Warn when one local variable shadows another local variable or parameter of comp
> >>>  Wshadow-compatible-local
> >>>  Common Warning Undocumented Alias(Wshadow=compatible-local)
> >>>  
> >>> +Wspectre-v1
> >>> +Common Var(warn_spectrev1) Warning
> >>> +Warn about code susceptible to spectre v1 style attacks.
> >>> +
> >>>  Wstack-protector
> >>>  Common Var(warn_stack_protect) Warning
> >>>  Warn when not issuing stack smashing protection for some reason.
> >>> @@ -2406,6 +2410,14 @@ fsingle-precision-constant
> >>>  Common Report Var(flag_single_precision_constant) Optimization
> >>>  Convert floating point constants to single precision constants.
> >>>  
> >>> +fspectre-v1
> >>> +Common Alias(fspectre-v1=, 2, 0)
> >>> +Insert code to mitigate spectre v1 style attacks.
> >>> +
> >>> +fspectre-v1=
> >>> +Common Report RejectNegative Joined UInteger IntegerRange(0, 3) Var(flag_spectrev1) Optimization
> >>> +Insert code to mitigate spectre v1 style attacks.
> >>> +
> >>>  fsplit-ivs-in-unroller
> >>>  Common Report Var(flag_split_ivs_in_unroller) Init(1) Optimization
> >>>  Split lifetimes of induction variables when loops are unrolled.
> >>> diff --git a/gcc/gimple-ssa-spectrev1.cc b/gcc/gimple-ssa-spectrev1.cc
> >>> new file mode 100644
> >>> index 00000000000..c2a5dc95324
> >>> --- /dev/null
> >>> +++ b/gcc/gimple-ssa-spectrev1.cc
> >>> @@ -0,0 +1,824 @@
> >>> +/* Loop interchange.
> >>> +   Copyright (C) 2017-2018 Free Software Foundation, Inc.
> >>> +   Contributed by ARM Ltd.
> >>> +
> >>> +This file is part of GCC.
> >>> +
> >>> +GCC is free software; you can redistribute it and/or modify it
> >>> +under the terms of the GNU General Public License as published by the
> >>> +Free Software Foundation; either version 3, or (at your option) any
> >>> +later version.
> >>> +
> >>> +GCC is distributed in the hope that it will be useful, but WITHOUT
> >>> +ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> >>> +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
> >>> +for more details.
> >>> +
> >>> +You should have received a copy of the GNU General Public License
> >>> +along with GCC; see the file COPYING3.  If not see
> >>> +<http://www.gnu.org/licenses/>.  */
> >>> +
> >>> +#include "config.h"
> >>> +#include "system.h"
> >>> +#include "coretypes.h"
> >>> +#include "backend.h"
> >>> +#include "is-a.h"
> >>> +#include "tree.h"
> >>> +#include "gimple.h"
> >>> +#include "tree-pass.h"
> >>> +#include "ssa.h"
> >>> +#include "gimple-pretty-print.h"
> >>> +#include "gimple-iterator.h"
> >>> +#include "params.h"
> >>> +#include "tree-ssa.h"
> >>> +#include "cfganal.h"
> >>> +#include "gimple-walk.h"
> >>> +#include "tree-ssa-loop.h"
> >>> +#include "tree-dfa.h"
> >>> +#include "tree-cfg.h"
> >>> +#include "fold-const.h"
> >>> +#include "builtins.h"
> >>> +#include "alias.h"
> >>> +#include "cfgloop.h"
> >>> +#include "varasm.h"
> >>> +#include "cgraph.h"
> >>> +#include "gimple-fold.h"
> >>> +#include "diagnostic.h"
> >>> +
> >>> +/* The Spectre V1 situation is as follows:
> >>> +
> >>> +      if (attacker_controlled_idx < bound)  // speculated as true but is false
> >>> +        {
> >>> +	  // out-of-bound access, returns value interesting to attacker
> >>> +	  val = mem[attacker_controlled_idx];
> >>> +	  // access that causes a cache-line to be brought in - canary
> >>> +	  ... = attacker_controlled_mem[val];
> >>> +	}
> >>> +
> >>> +   The last load provides the side-channel.  The pattern can be split
> >>> +   into multiple functions or translation units.  Conservatively we'd
> >>> +   have to warn about
> >>> +
> >>> +      int foo (int *a) {  return *a; }
> >>> +
> >>> +   thus any indirect (or indexed) memory access.  That's obvioulsy
> >>> +   not useful.
> >>> +
> >>> +   The next level would be to warn only when we see load of val as
> >>> +   well.  That then misses cases like
> >>> +
> >>> +      int foo (int *a, int *b)
> >>> +      {
> >>> +        int idx = load_it (a);
> >>> +	return load_it (&b[idx]);
> >>> +      }
> >>> +
> >>> +   Still we'd warn about cases like
> >>> +
> >>> +      struct Foo { int *a; };
> >>> +      int foo (struct Foo *a) { return *a->a; }
> >>> +
> >>> +   though dereferencing VAL isn't really an interesting case.  It's
> >>> +   hard to exclude this conservatively so the obvious solution is
> >>> +   to restrict the kind of loads that produce val, for example based
> >>> +   on its type or its number of bits.  It's tempting to do this at
> >>> +   the point of the load producing val but in the end what matters
> >>> +   is the number of bits that reach the second loads [as index] given
> >>> +   there are practical limits on the size of the canary.  For this
> >>> +   we have to consider
> >>> +
> >>> +      int foo (struct Foo *a, int *b)
> >>> +      {
> >>> +        int *c = a->a;
> >>> +	int idx = *b;
> >>> +	return *(c + idx);
> >>> +      }
> >>> +
> >>> +   where idx has too many bits to be an interesting attack vector(?).
> >>> + */
> >>> +
> >>> +/* The pass does two things, first it performs data flow analysis
> >>> +   to be able to warn about the second load.  This is controlled
> >>> +   via -Wspectre-v1.
> >>> +
> >>> +   Second it instruments control flow in the program to track a
> >>> +   mask which is all-ones but all-zeroes if the CPU speculated
> >>> +   a branch in the wrong direction.  This mask is then used to
> >>> +   mask the address[-part(s)] of loads with non-invariant addresses,
> >>> +   effectively mitigating the attack.  This is controlled by
> >>> +   -fpectre-v1[=N] where N is default 2 and
> >>> +     1  optimistically omit some instrumentations (currently
> >>> +        backedge control flow instructions do not update the
> >>> +	speculation mask)
> >>> +     2  instrument conservatively using a function-local speculation
> >>> +        mask
> >>> +     3  instrument conservatively using a global (TLS) speculation
> >>> +        mask.  This adds TLS loads/stores of the speculation mask
> >>> +	at function boundaries and before and after calls.
> >>> + */
> >>> +
> >>> +/* We annotate statements whose defs cannot be used to leaking data
> >>> +   speculatively via loads with SV1_SAFE.  This is used to optimize
> >>> +   masking of indices where masked indices (and derived by constant
> >>> +   ones) are not masked again.  Note this works only up to the points
> >>> +   that possibly change the speculation mask value.  */
> >>> +#define SV1_SAFE GF_PLF_1
> >>> +
> >>> +namespace {
> >>> +
> >>> +const pass_data pass_data_spectrev1 =
> >>> +{
> >>> +  GIMPLE_PASS, /* type */
> >>> +  "spectrev1", /* name */
> >>> +  OPTGROUP_NONE, /* optinfo_flags */
> >>> +  TV_NONE, /* tv_id */
> >>> +  PROP_cfg|PROP_ssa, /* properties_required */
> >>> +  0, /* properties_provided */
> >>> +  0, /* properties_destroyed */
> >>> +  0, /* todo_flags_start */
> >>> +  TODO_update_ssa, /* todo_flags_finish */
> >>> +};
> >>> +
> >>> +class pass_spectrev1 : public gimple_opt_pass
> >>> +{
> >>> +public:
> >>> +  pass_spectrev1 (gcc::context *ctxt)
> >>> +    : gimple_opt_pass (pass_data_spectrev1, ctxt)
> >>> +  {}
> >>> +
> >>> +  /* opt_pass methods: */
> >>> +  opt_pass * clone () { return new pass_spectrev1 (m_ctxt); }
> >>> +  virtual bool gate (function *) { return warn_spectrev1 || flag_spectrev1; }
> >>> +  virtual unsigned int execute (function *);
> >>> +
> >>> +  static bool stmt_is_indexed_load (gimple *);
> >>> +  static bool stmt_mangles_index (gimple *, tree);
> >>> +  static bool find_value_dependent_guard (gimple *, tree);
> >>> +  static void mark_influencing_outgoing_flow (basic_block, tree);
> >>> +  static tree instrument_mem (gimple_stmt_iterator *, tree, tree);
> >>> +}; // class pass_spectrev1
> >>> +
> >>> +bitmap_head *influencing_outgoing_flow;
> >>> +
> >>> +static bool
> >>> +call_between (gimple *first, gimple *second)
> >>> +{
> >>> +  gcc_assert (gimple_bb (first) == gimple_bb (second));
> >>> +  /* ???  This is inefficient.  Maybe we can use gimple_uid to assign
> >>> +     unique IDs to stmts belonging to groups with the same speculation
> >>> +     mask state.  */
> >>> +  for (gimple_stmt_iterator gsi = gsi_for_stmt (first);
> >>> +       gsi_stmt (gsi) != second; gsi_next (&gsi))
> >>> +    if (is_gimple_call (gsi_stmt (gsi)))
> >>> +      return true;
> >>> +  return false;
> >>> +}
> >>> +
> >>> +basic_block ctx_bb;
> >>> +gimple *ctx_stmt;
> >>> +static bool
> >>> +gather_indexes (tree, tree *idx, void *data)
> >>> +{
> >>> +  vec<tree *> *indexes = (vec<tree *> *)data;
> >>> +  if (TREE_CODE (*idx) != SSA_NAME)
> >>> +    return true;
> >>> +  if (!SSA_NAME_IS_DEFAULT_DEF (*idx)
> >>> +      && gimple_bb (SSA_NAME_DEF_STMT (*idx)) == ctx_bb
> >>> +      && gimple_plf (SSA_NAME_DEF_STMT (*idx), SV1_SAFE)
> >>> +      && (flag_spectrev1 < 3
> >>> +	  || !call_between (SSA_NAME_DEF_STMT (*idx), ctx_stmt)))
> >>> +    return true;
> >>> +  if (indexes->is_empty ())
> >>> +    indexes->safe_push (idx);
> >>> +  else if (*(*indexes)[0] == *idx)
> >>> +    indexes->safe_push (idx);
> >>> +  else
> >>> +    return false;
> >>> +  return true;
> >>> +}
> >>> +
> >>> +tree
> >>> +pass_spectrev1::instrument_mem (gimple_stmt_iterator *gsi, tree mem, tree mask)
> >>> +{
> >>> +  /* First try to see if we can find a single index we can zero which
> >>> +     has the chance of repeating in other loads and also avoids separate
> >>> +     LEA and memory references decreasing code size and AGU occupancy.  */
> >>> +  auto_vec<tree *, 8> indexes;
> >>> +  ctx_bb = gsi_bb (*gsi);
> >>> +  ctx_stmt = gsi_stmt (*gsi);
> >>> +  if (PARAM_VALUE (PARAM_SPECTRE_V1_MAX_INSTRUMENT_INDICES) > 0
> >>> +      && for_each_index (&mem, gather_indexes, (void *)&indexes))
> >>> +    {
> >>> +      /* All indices are safe.  */
> >>> +      if (indexes.is_empty ())
> >>> +	return mem;
> >>> +      if (TYPE_PRECISION (TREE_TYPE (*indexes[0]))
> >>> +	  <= TYPE_PRECISION (TREE_TYPE (mask)))
> >>> +	{
> >>> +	  tree idx = *indexes[0];
> >>> +	  gcc_assert (INTEGRAL_TYPE_P (TREE_TYPE (idx))
> >>> +		      || POINTER_TYPE_P (TREE_TYPE (idx)));
> >>> +	  /* Instead of instrumenting IDX directly we could look at
> >>> +	     definitions with a single SSA use and instrument that
> >>> +	     instead.  But we have to do some work to make SV1_SAFE
> >>> +	     propagation updated then - this would really ask to first
> >>> +	     gather all indexes of all refs we want to instrument and
> >>> +	     compute some optimal set of instrumentations.  */
> >>> +	  gimple_seq seq = NULL;
> >>> +	  tree idx_mask = gimple_convert (&seq, TREE_TYPE (idx), mask);
> >>> +	  tree masked_idx = gimple_build (&seq, BIT_AND_EXPR,
> >>> +					  TREE_TYPE (idx), idx, idx_mask);
> >>> +	  /* Mark the instrumentation sequence as visited.  */
> >>> +	  for (gimple_stmt_iterator si = gsi_start (seq);
> >>> +	       !gsi_end_p (si); gsi_next (&si))
> >>> +	    gimple_set_visited (gsi_stmt (si), true);
> >>> +	  gsi_insert_seq_before (gsi, seq, GSI_SAME_STMT);
> >>> +	  gimple_set_plf (SSA_NAME_DEF_STMT (masked_idx), SV1_SAFE, true);
> >>> +	  /* Replace downstream users in the BB which reduces register pressure
> >>> +	     and allows SV1_SAFE propagation to work (which stops at call/BB
> >>> +	     boundaries though).
> >>> +	     ???  This is really reg-pressure vs. dependence chains so not
> >>> +	     a generally easy thing.  Making the following propagate into
> >>> +	     all uses dominated by the insert slows down 429.mcf even more.
> >>> +	     ???  We can actually track SV1_SAFE across PHIs but then we
> >>> +	     have to propagate into PHIs here.  */
> >>> +	  gimple *use_stmt;
> >>> +	  use_operand_p use_p;
> >>> +	  imm_use_iterator iter;
> >>> +	  FOR_EACH_IMM_USE_STMT (use_stmt, iter, idx)
> >>> +	    if (gimple_bb (use_stmt) == gsi_bb (*gsi)
> >>> +		&& gimple_code (use_stmt) != GIMPLE_PHI
> >>> +		&& !gimple_visited_p (use_stmt))
> >>> +	      {
> >>> +		FOR_EACH_IMM_USE_ON_STMT (use_p, iter)
> >>> +		  SET_USE (use_p, masked_idx);
> >>> +		update_stmt (use_stmt);
> >>> +	      }
> >>> +	  /* Modify MEM in place...  (our stmt is already marked visited).  */
> >>> +	  for (unsigned i = 0; i < indexes.length (); ++i)
> >>> +	    *indexes[i] = masked_idx;
> >>> +	  return mem;
> >>> +	}
> >>> +    }
> >>> +
> >>> +  /* ???  Can we handle TYPE_REVERSE_STORAGE_ORDER at all?  Need to
> >>> +     handle BIT_FIELD_REFs.  */
> >>> +
> >>> +  /* Strip a bitfield reference to re-apply it at the end.  */
> >>> +  tree bitfield = NULL_TREE;
> >>> +  tree bitfield_off = NULL_TREE;
> >>> +  if (TREE_CODE (mem) == COMPONENT_REF
> >>> +      && DECL_BIT_FIELD (TREE_OPERAND (mem, 1)))
> >>> +    {
> >>> +      bitfield = TREE_OPERAND (mem, 1);
> >>> +      bitfield_off = TREE_OPERAND (mem, 2);
> >>> +      mem = TREE_OPERAND (mem, 0);
> >>> +    }
> >>> +
> >>> +  tree ptr_base = mem;
> >>> +  /* VIEW_CONVERT_EXPRs do not change offset, strip them, they get folded
> >>> +     into the MEM_REF we create.  */
> >>> +  while (TREE_CODE (ptr_base) == VIEW_CONVERT_EXPR)
> >>> +    ptr_base = TREE_OPERAND (ptr_base, 0);
> >>> +
> >>> +  tree ptr = make_ssa_name (ptr_type_node);
> >>> +  gimple *new_stmt = gimple_build_assign (ptr, build_fold_addr_expr (ptr_base));
> >>> +  gimple_set_visited (new_stmt, true);
> >>> +  gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT);
> >>> +  ptr = make_ssa_name (ptr_type_node);
> >>> +  new_stmt = gimple_build_assign (ptr, BIT_AND_EXPR,
> >>> +				  gimple_assign_lhs (new_stmt), mask);
> >>> +  gimple_set_visited (new_stmt, true);
> >>> +  gsi_insert_before (gsi, new_stmt, GSI_SAME_STMT);
> >>> +  tree type = TREE_TYPE (mem);
> >>> +  unsigned align = get_object_alignment (mem);
> >>> +  if (align != TYPE_ALIGN (type))
> >>> +    type = build_aligned_type (type, align);
> >>> +
> >>> +  tree new_mem = build2 (MEM_REF, type, ptr,
> >>> +			 build_int_cst (reference_alias_ptr_type (mem), 0));
> >>> +  if (bitfield)
> >>> +    new_mem = build3 (COMPONENT_REF, TREE_TYPE (bitfield), new_mem,
> >>> +		      bitfield, bitfield_off);
> >>> +  return new_mem;
> >>> +}
> >>> +
> >>> +bool
> >>> +check_spectrev1_2nd_load (tree, tree *idx, void *data)
> >>> +{
> >>> +  sbitmap value_from_indexed_load = (sbitmap)data;
> >>> +  if (TREE_CODE (*idx) == SSA_NAME
> >>> +      && bitmap_bit_p (value_from_indexed_load, SSA_NAME_VERSION (*idx)))
> >>> +    return false;
> >>> +  return true;
> >>> +}
> >>> +
> >>> +bool
> >>> +check_spectrev1_2nd_load (gimple *, tree, tree ref, void *data)
> >>> +{
> >>> +  return !for_each_index (&ref, check_spectrev1_2nd_load, data);
> >>> +}
> >>> +
> >>> +void
> >>> +pass_spectrev1::mark_influencing_outgoing_flow (basic_block bb, tree op)
> >>> +{
> >>> +  if (!bitmap_set_bit (&influencing_outgoing_flow[SSA_NAME_VERSION (op)],
> >>> +		       bb->index))
> >>> +    return;
> >>> +
> >>> +  /* Note we are deliberately non-conservatively stop at call and
> >>> +     memory boundaries here expecting earlier optimization to expose
> >>> +     value dependences via SSA chains.  */
> >>> +  gimple *def_stmt = SSA_NAME_DEF_STMT (op);
> >>> +  if (gimple_vuse (def_stmt)
> >>> +      || !is_gimple_assign (def_stmt))
> >>> +    return;
> >>> +
> >>> +  ssa_op_iter i;
> >>> +  FOR_EACH_SSA_TREE_OPERAND (op, def_stmt, i, SSA_OP_USE)
> >>> +    mark_influencing_outgoing_flow (bb, op);
> >>> +}
> >>> +
> >>> +bool
> >>> +pass_spectrev1::find_value_dependent_guard (gimple *stmt, tree op)
> >>> +{
> >>> +  bitmap_iterator bi;
> >>> +  unsigned i;
> >>> +  EXECUTE_IF_SET_IN_BITMAP (&influencing_outgoing_flow[SSA_NAME_VERSION (op)],
> >>> +			    0, i, bi)
> >>> +    /* ???  If control-dependent on.
> >>> +       ???  Make bits in influencing_outgoing_flow the index of the BB
> >>> +       in RPO order so we could walk bits from STMT "upwards" finding
> >>> +       the nearest one.  */
> >>> +    if (dominated_by_p (CDI_DOMINATORS,
> >>> +			gimple_bb (stmt), BASIC_BLOCK_FOR_FN (cfun, i)))
> >>> +      {
> >>> +	if (dump_enabled_p ())
> >>> +	  dump_printf_loc (MSG_NOTE, stmt, "Condition %G in block %d "
> >>> +			   "is related to indexes used in %G\n",
> >>> +			   last_stmt (BASIC_BLOCK_FOR_FN (cfun, i)),
> >>> +			   i, stmt);
> >>> +	return true;
> >>> +      }
> >>> +
> >>> +  /* Note we are deliberately non-conservatively stop at call and
> >>> +     memory boundaries here expecting earlier optimization to expose
> >>> +     value dependences via SSA chains.  */
> >>> +  gimple *def_stmt = SSA_NAME_DEF_STMT (op);
> >>> +  if (gimple_vuse (def_stmt)
> >>> +      || !is_gimple_assign (def_stmt))
> >>> +    return false;
> >>> +
> >>> +  ssa_op_iter it;
> >>> +  FOR_EACH_SSA_TREE_OPERAND (op, def_stmt, it, SSA_OP_USE)
> >>> +    if (find_value_dependent_guard (stmt, op))
> >>> +      /* Others may be "nearer".  */
> >>> +      return true;
> >>> +
> >>> +  return false;
> >>> +}
> >>> +
> >>> +bool
> >>> +pass_spectrev1::stmt_is_indexed_load (gimple *stmt)
> >>> +{
> >>> +  /* Given we ignore the function boundary for incoming parameters
> >>> +     let's ignore return values of calls as well for the purpose
> >>> +     of being the first indexed load (also ignore inline-asms).  */
> >>> +  if (!gimple_assign_load_p (stmt))
> >>> +    return false;
> >>> +
> >>> +  /* Exclude esp. pointers from the index load itself (but also floats,
> >>> +     vectors, etc. - quite a bit handwaving here).  */
> >>> +  if (!INTEGRAL_TYPE_P (TREE_TYPE (gimple_assign_lhs (stmt))))
> >>> +    return false;
> >>> +
> >>> +  /* If we do not have any SSA uses the load cannot be one indexed
> >>> +     by an attacker controlled value.  */
> >>> +  if (zero_ssa_operands (stmt, SSA_OP_USE))
> >>> +    return false;
> >>> +
> >>> +  return true;
> >>> +}
> >>> +
> >>> +/* Return true whether the index in the use operand OP in STMT is
> >>> +   not transfered to STMTs defs.  */
> >>> +
> >>> +bool
> >>> +pass_spectrev1::stmt_mangles_index (gimple *stmt, tree op)
> >>> +{
> >>> +  if (gimple_assign_load_p (stmt))
> >>> +    return true;
> >>> +  if (gassign *ass = dyn_cast <gassign *> (stmt))
> >>> +    {
> >>> +      enum tree_code code = gimple_assign_rhs_code (ass);
> >>> +      switch (code)
> >>> +	{
> >>> +	case TRUNC_DIV_EXPR:
> >>> +	case CEIL_DIV_EXPR:
> >>> +	case FLOOR_DIV_EXPR:
> >>> +	case ROUND_DIV_EXPR:
> >>> +	case EXACT_DIV_EXPR:
> >>> +	case RDIV_EXPR:
> >>> +	case TRUNC_MOD_EXPR:
> >>> +	case CEIL_MOD_EXPR:
> >>> +	case FLOOR_MOD_EXPR:
> >>> +	case ROUND_MOD_EXPR:
> >>> +	case LSHIFT_EXPR:
> >>> +	case RSHIFT_EXPR:
> >>> +	case LROTATE_EXPR:
> >>> +	case RROTATE_EXPR:
> >>> +	  /* Division, modulus or shifts by the index do not produce
> >>> +	     something useful for the attacker.  */
> >>> +	  if (gimple_assign_rhs2 (ass) == op)
> >>> +	    return true;
> >>> +	  break;
> >>> +	default:;
> >>> +	  /* Comparisons do not produce an index value.  */
> >>> +	  if (TREE_CODE_CLASS (code) == tcc_comparison)
> >>> +	    return true;
> >>> +	}
> >>> +    }
> >>> +  /* ???  We could handle builtins here.  */
> >>> +  return false;
> >>> +}
> >>> +
> >>> +static GTY(()) tree spectrev1_tls_mask_decl;
> >>> +
> >>> +/* Main entry for spectrev1 pass.  */
> >>> +
> >>> +unsigned int
> >>> +pass_spectrev1::execute (function *fn)
> >>> +{
> >>> +  calculate_dominance_info (CDI_DOMINATORS);
> >>> +  loop_optimizer_init (AVOID_CFG_MODIFICATIONS);
> >>> +
> >>> +  int *rpo = XNEWVEC (int, n_basic_blocks_for_fn (cfun));
> >>> +  int rpo_num = pre_and_rev_post_order_compute_fn (fn, NULL, rpo, false);
> >>> +
> >>> +  /* We track for each SSA name whether its value (may) depend(s) on
> >>> +     the result of an indexed load.
> >>> +     A set of operation will kill a value (enough).  */
> >>> +  auto_sbitmap value_from_indexed_load (num_ssa_names);
> >>> +  bitmap_clear (value_from_indexed_load);
> >>> +
> >>> +  unsigned orig_num_ssa_names = num_ssa_names;
> >>> +  influencing_outgoing_flow = XCNEWVEC (bitmap_head, num_ssa_names);
> >>> +  for (unsigned i = 1; i < num_ssa_names; ++i)
> >>> +    bitmap_initialize (&influencing_outgoing_flow[i], &bitmap_default_obstack);
> >>> +
> >>> +
> >>> +  /* Diagnosis.  */
> >>> +
> >>> +  /* Function arguments are not indexed loads unless we want to
> >>> +     be conservative to a level no longer useful.  */
> >>> +
> >>> +  for (int i = 0; i < rpo_num; ++i)
> >>> +    {
> >>> +      basic_block bb = BASIC_BLOCK_FOR_FN (fn, rpo[i]);
> >>> +
> >>> +      for (gphi_iterator gpi = gsi_start_phis (bb);
> >>> +	   !gsi_end_p (gpi); gsi_next (&gpi))
> >>> +	{
> >>> +	  gphi *phi = gpi.phi ();
> >>> +	  bool value_from_indexed_load_p = false;
> >>> +	  use_operand_p arg_p;
> >>> +	  ssa_op_iter it;
> >>> +	  FOR_EACH_PHI_ARG (arg_p, phi, it, SSA_OP_USE)
> >>> +	    {
> >>> +	      tree arg = USE_FROM_PTR (arg_p);
> >>> +	      if (TREE_CODE (arg) == SSA_NAME
> >>> +		  && bitmap_bit_p (value_from_indexed_load,
> >>> +				   SSA_NAME_VERSION (arg)))
> >>> +		value_from_indexed_load_p = true;
> >>> +	    }
> >>> +	  if (value_from_indexed_load_p)
> >>> +	    bitmap_set_bit (value_from_indexed_load,
> >>> +			    SSA_NAME_VERSION (PHI_RESULT (phi)));
> >>> +	}
> >>> +
> >>> +      for (gimple_stmt_iterator gsi = gsi_start_bb (bb);
> >>> +	   !gsi_end_p (gsi); gsi_next (&gsi))
> >>> +	{
> >>> +	  gimple *stmt = gsi_stmt (gsi);
> >>> +	  if (is_gimple_debug (stmt))
> >>> +	    continue;
> >>> +
> >>> +	  if (walk_stmt_load_store_ops (stmt, value_from_indexed_load,
> >>> +					check_spectrev1_2nd_load,
> >>> +					check_spectrev1_2nd_load))
> >>> +	    warning_at (gimple_location (stmt), OPT_Wspectre_v1, "%Gspectrev1",
> >>> +			stmt);
> >>> +
> >>> +	  bool value_from_indexed_load_p = false;
> >>> +	  if (stmt_is_indexed_load (stmt))
> >>> +	    {
> >>> +	      /* We are interested in indexes to later loads so ultimatively
> >>> +		 register values that all happen to separate SSA defs.
> >>> +		 Interesting aggregates will be decomposed by later loads
> >>> +		 which we then mark as producing an index.  Simply mark
> >>> +		 all SSA defs as coming from an indexed load.  */
> >>> +	      /* We are handling a single load in STMT right now.  */
> >>> +	      ssa_op_iter it;
> >>> +	      tree op;
> >>> +	      FOR_EACH_SSA_TREE_OPERAND (op, stmt, it, SSA_OP_USE)
> >>> +	        if (find_value_dependent_guard (stmt, op))
> >>> +		  {
> >>> +		    /* ???  Somehow record the dependence to point to it in
> >>> +		       diagnostics.  */
> >>> +		    value_from_indexed_load_p = true;
> >>> +		    break;
> >>> +		  }
> >>> +	    }
> >>> +
> >>> +	  tree op;
> >>> +	  ssa_op_iter it;
> >>> +	  FOR_EACH_SSA_TREE_OPERAND (op, stmt, it, SSA_OP_USE)
> >>> +	    if (bitmap_bit_p (value_from_indexed_load,
> >>> +			      SSA_NAME_VERSION (op))
> >>> +		&& !stmt_mangles_index (stmt, op))
> >>> +	      {
> >>> +		value_from_indexed_load_p = true;
> >>> +		break;
> >>> +	      }
> >>> +
> >>> +	  if (value_from_indexed_load_p)
> >>> +	    FOR_EACH_SSA_TREE_OPERAND (op, stmt, it, SSA_OP_DEF)
> >>> +	      /* ???  We could cut off single-bit values from the chain
> >>> +	         here or pretain that float loads will be never turned
> >>> +		 into integer indices, etc.  */
> >>> +	      bitmap_set_bit (value_from_indexed_load,
> >>> +			      SSA_NAME_VERSION (op));
> >>> +	}
> >>> +
> >>> +      if (EDGE_COUNT (bb->succs) > 1)
> >>> +	{
> >>> +	  gcond *stmt = safe_dyn_cast <gcond *> (last_stmt (bb));
> >>> +	  /* ???  What about switches?  What about badly speculated EH?  */
> >>> +	  if (!stmt)
> >>> +	    continue;
> >>> +	  /* We could constrain conditions here to those more likely
> >>> +	     being "bounds checks".  For example common guards for
> >>> +	     indirect accesses are NULL pointer checks.
> >>> +	     ???  This isn't fully safe, but it drops the number of
> >>> +	     spectre warnings for dwarf2out.i from cc1files from 70 to 16.  */
> >>> +	  if ((gimple_cond_code (stmt) == EQ_EXPR
> >>> +	       || gimple_cond_code (stmt) == NE_EXPR)
> >>> +	      && integer_zerop (gimple_cond_rhs (stmt))
> >>> +	      && POINTER_TYPE_P (TREE_TYPE (gimple_cond_lhs (stmt))))
> >>> +	    ;
> >>> +	  else
> >>> +	    {
> >>> +	      ssa_op_iter it;
> >>> +	      tree op;
> >>> +	      FOR_EACH_SSA_TREE_OPERAND (op, stmt, it, SSA_OP_USE)
> >>> +		mark_influencing_outgoing_flow (bb, op);
> >>> +	    }
> >>> +	}
> >>> +    }
> >>> +
> >>> +  for (unsigned i = 1; i < orig_num_ssa_names; ++i)
> >>> +    bitmap_release (&influencing_outgoing_flow[i]);
> >>> +  XDELETEVEC (influencing_outgoing_flow);
> >>> +
> >>> +
> >>> +
> >>> +  /* Instrumentation.  */
> >>> +  if (!flag_spectrev1)
> >>> +    return 0;
> >>> +
> >>> +  /* Create the default all-ones mask.  When doing IPA instrumentation
> >>> +     this should initialize the mask from TLS memory and outgoing edges
> >>> +     need to save the mask to TLS memory.  */
> >>> +  gimple *new_stmt;
> >>> +  if (!spectrev1_tls_mask_decl
> >>> +      && flag_spectrev1 >= 3)
> >>> +    {
> >>> +      /* Use a smaller variable in case sign-extending loads are
> >>> +	 available?  */
> >>> +      spectrev1_tls_mask_decl
> >>> +	  = build_decl (BUILTINS_LOCATION,
> >>> +			VAR_DECL, NULL_TREE, ptr_type_node);
> >>> +      TREE_STATIC (spectrev1_tls_mask_decl) = 1;
> >>> +      TREE_PUBLIC (spectrev1_tls_mask_decl) = 1;
> >>> +      DECL_VISIBILITY (spectrev1_tls_mask_decl) = VISIBILITY_HIDDEN;
> >>> +      DECL_VISIBILITY_SPECIFIED (spectrev1_tls_mask_decl) = 1;
> >>> +      DECL_INITIAL (spectrev1_tls_mask_decl)
> >>> +	  = build_all_ones_cst (ptr_type_node);
> >>> +      DECL_NAME (spectrev1_tls_mask_decl) = get_identifier ("__SV1MSK");
> >>> +      DECL_ARTIFICIAL (spectrev1_tls_mask_decl) = 1;
> >>> +      DECL_IGNORED_P (spectrev1_tls_mask_decl) = 1;
> >>> +      varpool_node::finalize_decl (spectrev1_tls_mask_decl);
> >>> +      make_decl_one_only (spectrev1_tls_mask_decl,
> >>> +			  DECL_ASSEMBLER_NAME (spectrev1_tls_mask_decl));
> >>> +      set_decl_tls_model (spectrev1_tls_mask_decl,
> >>> +			  decl_default_tls_model (spectrev1_tls_mask_decl));
> >>> +    }
> >>> +
> >>> +  /* We let the SSA rewriter cope with rewriting mask into SSA and
> >>> +     inserting PHI nodes.  */
> >>> +  tree mask = create_tmp_reg (ptr_type_node, "spectre_v1_mask");
> >>> +  new_stmt = gimple_build_assign (mask,
> >>> +				  flag_spectrev1 >= 3
> >>> +				  ? spectrev1_tls_mask_decl
> >>> +				  : build_all_ones_cst (ptr_type_node));
> >>> +  gimple_stmt_iterator gsi
> >>> +      = gsi_after_labels (single_succ (ENTRY_BLOCK_PTR_FOR_FN (fn)));
> >>> +  gsi_insert_before (&gsi, new_stmt, GSI_CONTINUE_LINKING);
> >>> +
> >>> +  /* We are using the visited flag to track stmts downstream in a BB.  */
> >>> +  for (int i = 0; i < rpo_num; ++i)
> >>> +    {
> >>> +      basic_block bb = BASIC_BLOCK_FOR_FN (fn, rpo[i]);
> >>> +      for (gphi_iterator gpi = gsi_start_phis (bb);
> >>> +	   !gsi_end_p (gpi); gsi_next (&gpi))
> >>> +	gimple_set_visited (gpi.phi (), false);
> >>> +      for (gimple_stmt_iterator gsi = gsi_start_bb (bb);
> >>> +	   !gsi_end_p (gsi); gsi_next (&gsi))
> >>> +	gimple_set_visited (gsi_stmt (gsi), false);
> >>> +    }
> >>> +
> >>> +  for (int i = 0; i < rpo_num; ++i)
> >>> +    {
> >>> +      basic_block bb = BASIC_BLOCK_FOR_FN (fn, rpo[i]);
> >>> +
> >>> +      for (gphi_iterator gpi = gsi_start_phis (bb);
> >>> +	   !gsi_end_p (gpi); gsi_next (&gpi))
> >>> +	{
> >>> +	  gphi *phi = gpi.phi ();
> >>> +	  /* ???  We can merge SAFE state across BB boundaries in
> >>> +	     some cases, like when edges are not critical and the
> >>> +	     state was made SAFE in the tail of the predecessors
> >>> +	     and not invalidated by calls.   */
> >>> +	  gimple_set_plf (phi, SV1_SAFE, false);
> >>> +	}
> >>> +
> >>> +      bool instrumented_call_p = false;
> >>> +      for (gimple_stmt_iterator gsi = gsi_start_bb (bb);
> >>> +	   !gsi_end_p (gsi); gsi_next (&gsi))
> >>> +	{
> >>> +	  gimple *stmt = gsi_stmt (gsi);
> >>> +	  gimple_set_visited (stmt, true);
> >>> +	  if (is_gimple_debug (stmt))
> >>> +	    continue;
> >>> +
> >>> +	  tree op;
> >>> +	  ssa_op_iter it;
> >>> +	  bool safe = is_gimple_assign (stmt);
> >>> +	  if (safe)
> >>> +	    FOR_EACH_SSA_TREE_OPERAND (op, stmt, it, SSA_OP_USE)
> >>> +	      {
> >>> +		if (safe
> >>> +		    && (SSA_NAME_IS_DEFAULT_DEF (op)
> >>> +			|| !gimple_plf (SSA_NAME_DEF_STMT (op), SV1_SAFE)
> >>> +			/* Once mask can have changed we cannot further
> >>> +			   propagate safe state.  */
> >>> +			|| gimple_bb (SSA_NAME_DEF_STMT (op)) != bb
> >>> +			/* That includes calls if we have instrumented one
> >>> +			   in this block.  */
> >>> +			|| (instrumented_call_p
> >>> +			    && call_between (SSA_NAME_DEF_STMT (op), stmt))))
> >>> +		  {
> >>> +		    safe = false;
> >>> +		    break;
> >>> +		  }
> >>> +	      }
> >>> +	  gimple_set_plf (stmt, SV1_SAFE, safe);
> >>> +
> >>> +	  /* Instrument bounded loads.
> >>> +	     We instrument non-aggregate loads with non-invariant address.
> >>> +	     The idea is to reliably instrument the bounded load while
> >>> +	     leaving the canary, being it load or store, aggregate or
> >>> +	     non-aggregate, alone.  */
> >>> +	  if (gimple_assign_single_p (stmt)
> >>> +	      && gimple_vuse (stmt)
> >>> +	      && !gimple_vdef (stmt)
> >>> +	      && !zero_ssa_operands (stmt, SSA_OP_USE))
> >>> +	    {
> >>> +	      tree new_mem = instrument_mem (&gsi, gimple_assign_rhs1 (stmt),
> >>> +					     mask);
> >>> +	      gimple_assign_set_rhs1 (stmt, new_mem);
> >>> +	      update_stmt (stmt);
> >>> +	      /* The value loaded my a masked load is "safe".  */
> >>> +	      gimple_set_plf (stmt, SV1_SAFE, true);
> >>> +	    }
> >>> +
> >>> +	  /* Instrument return store to TLS mask.  */
> >>> +	  if (flag_spectrev1 >= 3
> >>> +	      && gimple_code (stmt) == GIMPLE_RETURN)
> >>> +	    {
> >>> +	      new_stmt = gimple_build_assign (spectrev1_tls_mask_decl, mask);
> >>> +	      gsi_insert_before (&gsi, new_stmt, GSI_SAME_STMT);
> >>> +	    }
> >>> +	  /* Instrument calls with store/load to/from TLS mask.
> >>> +	     ???  Placement of the stores/loads can be optimized in a LCM
> >>> +	     way.  */
> >>> +	  else if (flag_spectrev1 >= 3
> >>> +		   && is_gimple_call (stmt)
> >>> +		   && gimple_vuse (stmt))
> >>> +	    {
> >>> +	      new_stmt = gimple_build_assign (spectrev1_tls_mask_decl, mask);
> >>> +	      gsi_insert_before (&gsi, new_stmt, GSI_SAME_STMT);
> >>> +	      if (!stmt_ends_bb_p (stmt))
> >>> +		{
> >>> +		  new_stmt = gimple_build_assign (mask,
> >>> +						  spectrev1_tls_mask_decl);
> >>> +		  gsi_insert_after (&gsi, new_stmt, GSI_NEW_STMT);
> >>> +		}
> >>> +	      else
> >>> +		{
> >>> +		  edge_iterator ei;
> >>> +		  edge e;
> >>> +		  FOR_EACH_EDGE (e, ei, bb->succs)
> >>> +		    {
> >>> +		      if (e->flags & EDGE_ABNORMAL)
> >>> +			continue;
> >>> +		      new_stmt = gimple_build_assign (mask,
> >>> +						      spectrev1_tls_mask_decl);
> >>> +		      gsi_insert_on_edge (e, new_stmt);
> >>> +		    }
> >>> +		}
> >>> +	      instrumented_call_p = true;
> >>> +	    }
> >>> +	}
> >>> +
> >>> +      if (EDGE_COUNT (bb->succs) > 1)
> >>> +	{
> >>> +	  gcond *stmt = safe_dyn_cast <gcond *> (last_stmt (bb));
> >>> +	  /* ???  What about switches?  What about badly speculated EH?  */
> >>> +	  if (!stmt)
> >>> +	    continue;
> >>> +
> >>> +	  /* Instrument conditional branches to track mis-speculation
> >>> +	     via a pointer-sized mask.
> >>> +	     ???  We could restrict to instrumenting those conditions
> >>> +	     that control interesting loads or apply simple heuristics
> >>> +	     like not instrumenting FP compares or equality compares
> >>> +	     which are unlikely bounds checks.  But we have to instrument
> >>> +	     bool != 0 because multiple conditions might have been
> >>> +	     combined.  */
> >>> +	  edge truee, falsee;
> >>> +	  extract_true_false_edges_from_block (bb, &truee, &falsee);
> >>> +	  /* Unless -fspectre-v1=2 we do not instrument loop exit tests.  */
> >>> +	  if (flag_spectrev1 >= 2
> >>> +	      || !loop_exits_from_bb_p (bb->loop_father, bb))
> >>> +	    {
> >>> +	      gimple_stmt_iterator gsi = gsi_last_bb (bb);
> >>> +
> >>> +	      /* Instrument
> >>> +	           if (a_1 > b_2)
> >>> +		 as
> >>> +	           tem_mask_3 = a_1 > b_2 ? -1 : 0;
> >>> +		   if (tem_mask_3 != 0)
> >>> +		 this will result in a
> >>> +		   xor %eax, %eax; cmp|test; setCC %al; sub $0x1, %eax; jne
> >>> +		 sequence which is faster in practice than when retaining
> >>> +		 the original jump condition.  This is 10 bytes overhead
> >>> +		 on x86_64 plus 3 bytes for an and on the true path and
> >>> +		 5 bytes for an and and not on the false path.  */
> >>> +	      tree tem_mask = make_ssa_name (ptr_type_node);
> >>> +	      new_stmt = gimple_build_assign (tem_mask, COND_EXPR,
> >>> +					      build2 (gimple_cond_code (stmt),
> >>> +						      boolean_type_node,
> >>> +						      gimple_cond_lhs (stmt),
> >>> +						      gimple_cond_rhs (stmt)),
> >>> +					      build_all_ones_cst (ptr_type_node),
> >>> +					      build_zero_cst (ptr_type_node));
> >>> +	      gsi_insert_before (&gsi, new_stmt, GSI_SAME_STMT);
> >>> +	      gimple_cond_set_code (stmt, NE_EXPR);
> >>> +	      gimple_cond_set_lhs (stmt, tem_mask);
> >>> +	      gimple_cond_set_rhs (stmt, build_zero_cst (ptr_type_node));
> >>> +	      update_stmt (stmt);
> >>> +
> >>> +	      /* On the false edge
> >>> +	           mask = mask & ~tem_mask_3;  */
> >>> +	      gimple_seq tems = NULL;
> >>> +	      tree tem_mask2 = make_ssa_name (ptr_type_node);
> >>> +	      new_stmt = gimple_build_assign (tem_mask2, BIT_NOT_EXPR,
> >>> +					      tem_mask);
> >>> +	      gimple_seq_add_stmt_without_update (&tems, new_stmt);
> >>> +	      new_stmt = gimple_build_assign (mask, BIT_AND_EXPR,
> >>> +					      mask, tem_mask2);
> >>> +	      gimple_seq_add_stmt_without_update (&tems, new_stmt);
> >>> +	      gsi_insert_seq_on_edge (falsee, tems);
> >>> +
> >>> +	      /* On the true edge
> >>> +	           mask = mask & tem_mask_3;  */
> >>> +	      new_stmt = gimple_build_assign (mask, BIT_AND_EXPR,
> >>> +					      mask, tem_mask);
> >>> +	      gsi_insert_on_edge (truee, new_stmt);
> >>> +	    }
> >>> +	}
> >>> +    }
> >>> +
> >>> +  gsi_commit_edge_inserts ();
> >>> +
> >>> +  return 0;
> >>> +}
> >>> +
> >>> +} // anon namespace
> >>> +
> >>> +gimple_opt_pass *
> >>> +make_pass_spectrev1 (gcc::context *ctxt)
> >>> +{
> >>> +  return new pass_spectrev1 (ctxt);
> >>> +}
> >>> diff --git a/gcc/params.def b/gcc/params.def
> >>> index 6f98fccd291..19f7dbf4dad 100644
> >>> --- a/gcc/params.def
> >>> +++ b/gcc/params.def
> >>> @@ -1378,6 +1378,11 @@ DEFPARAM(PARAM_LOOP_VERSIONING_MAX_OUTER_INSNS,
> >>>  	 " loops.",
> >>>  	 100, 0, 0)
> >>>  
> >>> +DEFPARAM(PARAM_SPECTRE_V1_MAX_INSTRUMENT_INDICES,
> >>> +	 "spectre-v1-max-instrument-indices",
> >>> +	 "Maximum number of indices to instrument before instrumenting the whole address.",
> >>> +	 1, 0, 0)
> >>> +
> >>>  /*
> >>>  
> >>>  Local variables:
> >>> diff --git a/gcc/passes.def b/gcc/passes.def
> >>> index 144df4fa417..2fe0cdcfa7e 100644
> >>> --- a/gcc/passes.def
> >>> +++ b/gcc/passes.def
> >>> @@ -400,6 +400,7 @@ along with GCC; see the file COPYING3.  If not see
> >>>    NEXT_PASS (pass_lower_resx);
> >>>    NEXT_PASS (pass_nrv);
> >>>    NEXT_PASS (pass_cleanup_cfg_post_optimizing);
> >>> +  NEXT_PASS (pass_spectrev1);
> >>>    NEXT_PASS (pass_warn_function_noreturn);
> >>>    NEXT_PASS (pass_gen_hsail);
> >>>  
> >>> diff --git a/gcc/testsuite/gcc.dg/Wspectre-v1-1.c b/gcc/testsuite/gcc.dg/Wspectre-v1-1.c
> >>> new file mode 100644
> >>> index 00000000000..3ac647e72fd
> >>> --- /dev/null
> >>> +++ b/gcc/testsuite/gcc.dg/Wspectre-v1-1.c
> >>> @@ -0,0 +1,10 @@
> >>> +/* { dg-do compile } */
> >>> +/* { dg-options "-Wspectre-v1" } */
> >>> +
> >>> +unsigned char a[1024];
> >>> +int b[256];
> >>> +int foo (int i, int bound)
> >>> +{
> >>> +  if (i < bound)
> >>> +    return b[a[i]];  /* { dg-warning "spectrev1" } */
> >>> +}
> >>> diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
> >>> index 9f9d85fdbc3..f5c164f465f 100644
> >>> --- a/gcc/tree-pass.h
> >>> +++ b/gcc/tree-pass.h
> >>> @@ -625,6 +625,7 @@ extern gimple_opt_pass *make_pass_local_fn_summary (gcc::context *ctxt);
> >>>  extern gimple_opt_pass *make_pass_update_address_taken (gcc::context *ctxt);
> >>>  extern gimple_opt_pass *make_pass_convert_switch (gcc::context *ctxt);
> >>>  extern gimple_opt_pass *make_pass_lower_vaarg (gcc::context *ctxt);
> >>> +extern gimple_opt_pass *make_pass_spectrev1 (gcc::context *ctxt);
> >>>  
> >>>  /* Current optimization pass.  */
> >>>  extern opt_pass *current_pass;
> >>>
> >>
> >>
> >>
> > 
> 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Spectre V1 diagnostic / mitigation
  2018-12-19 17:17                   ` Richard Biener
@ 2018-12-19 17:25                     ` Richard Earnshaw (lists)
  2018-12-19 17:29                       ` Richard Biener
  0 siblings, 1 reply; 17+ messages in thread
From: Richard Earnshaw (lists) @ 2018-12-19 17:25 UTC (permalink / raw)
  To: Richard Biener, Florian Weimer
  Cc: Peter Bergner, gcc, Tulio Magno Quites Machado Filho

On 19/12/2018 17:17, Richard Biener wrote:
> On Wed, 19 Dec 2018, Florian Weimer wrote:
> 
>> * Peter Bergner:
>>
>>> On 12/19/18 7:59 AM, Florian Weimer wrote:
>>>> * Richard Biener:
>>>>
>>>>> Sure, if we'd ever deploy this in production placing this in the
>>>>> TCB for glibc targets might be beneifical.  But as said the
>>>>> current implementation was just an experiment intended to be
>>>>> maximum portable.  I suppose the dynamic loader takes care
>>>>> of initializing the TCB data?
>>>>
>>>> Yes, the dynamic linker will initialize it.  If you need 100% reliable
>>>> initialization with something that is not zero, it's going to be tricky
>>>> though.  Initial-exec TLS memory has this covered, but in the TCB, we
>>>> only have zeroed-out reservations today.
>>>
>>> We have non-zero initialized TCB entries on powerpc*-linux which are used
>>> for the GCC __builtin_cpu_is() and __builtin_cpu_supports() builtin
>>> functions.  Tulio would know the magic that was used to get them setup.
>>
>> Yes, there's a special symbol, __parse_hwcap_and_convert_at_platform, to
>> verify that the dynamic linker sets up the TCB as required.  This way,
>> binaries which need the feature will fail to run on older loaders.  This
>> is why I said it's a bit tricky to implement this.  It's even more
>> complicated if you want to backport this into released glibcs, where we
>> normally do not accept ABI changes (not even ABI additions).
> 
> It's easy to change the mitigation scheme to use a zero for the
> non-speculated path, you'd simply replace ands with zero by
> ors with -1.  For address parts that gets you some possible overflows
> you do not want though.

And you have to invert the value before using it as a mask.

R.

> 
> Richard.
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Spectre V1 diagnostic / mitigation
  2018-12-19 17:25                     ` Richard Earnshaw (lists)
@ 2018-12-19 17:29                       ` Richard Biener
  0 siblings, 0 replies; 17+ messages in thread
From: Richard Biener @ 2018-12-19 17:29 UTC (permalink / raw)
  To: Richard Earnshaw (lists)
  Cc: Florian Weimer, Peter Bergner, gcc, Tulio Magno Quites Machado Filho

On Wed, 19 Dec 2018, Richard Earnshaw (lists) wrote:

> On 19/12/2018 17:17, Richard Biener wrote:
> > On Wed, 19 Dec 2018, Florian Weimer wrote:
> > 
> >> * Peter Bergner:
> >>
> >>> On 12/19/18 7:59 AM, Florian Weimer wrote:
> >>>> * Richard Biener:
> >>>>
> >>>>> Sure, if we'd ever deploy this in production placing this in the
> >>>>> TCB for glibc targets might be beneifical.  But as said the
> >>>>> current implementation was just an experiment intended to be
> >>>>> maximum portable.  I suppose the dynamic loader takes care
> >>>>> of initializing the TCB data?
> >>>>
> >>>> Yes, the dynamic linker will initialize it.  If you need 100% reliable
> >>>> initialization with something that is not zero, it's going to be tricky
> >>>> though.  Initial-exec TLS memory has this covered, but in the TCB, we
> >>>> only have zeroed-out reservations today.
> >>>
> >>> We have non-zero initialized TCB entries on powerpc*-linux which are used
> >>> for the GCC __builtin_cpu_is() and __builtin_cpu_supports() builtin
> >>> functions.  Tulio would know the magic that was used to get them setup.
> >>
> >> Yes, there's a special symbol, __parse_hwcap_and_convert_at_platform, to
> >> verify that the dynamic linker sets up the TCB as required.  This way,
> >> binaries which need the feature will fail to run on older loaders.  This
> >> is why I said it's a bit tricky to implement this.  It's even more
> >> complicated if you want to backport this into released glibcs, where we
> >> normally do not accept ABI changes (not even ABI additions).
> > 
> > It's easy to change the mitigation scheme to use a zero for the
> > non-speculated path, you'd simply replace ands with zero by
> > ors with -1.  For address parts that gets you some possible overflows
> > you do not want though.
> 
> And you have to invert the value before using it as a mask.

For the cases an OR doesn't work yes.  Fortunately most targets
have an and-not instruction.  (x86 doesn't unless you have BMI)

Richard.

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2018-12-19 17:29 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-12-18 15:37 Spectre V1 diagnostic / mitigation Richard Biener
2018-12-18 16:17 ` Jeff Law
2018-12-19 11:16   ` Richard Biener
2018-12-18 16:48 ` Richard Earnshaw (lists)
2018-12-19 11:25   ` Richard Biener
2018-12-19 11:34     ` Florian Weimer
2018-12-19 11:51       ` Richard Biener
2018-12-19 13:35         ` Florian Weimer
2018-12-19 13:49           ` Richard Biener
2018-12-19 14:01             ` Florian Weimer
2018-12-19 14:19               ` Peter Bergner
2018-12-19 15:44                 ` Florian Weimer
2018-12-19 17:17                   ` Richard Biener
2018-12-19 17:25                     ` Richard Earnshaw (lists)
2018-12-19 17:29                       ` Richard Biener
2018-12-19 15:42     ` Richard Earnshaw (lists)
2018-12-19 17:20       ` Richard Biener

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).