public inbox for gcc-patches@gcc.gnu.org
* fix PR46029: reimplement if conversion of loads and stores
@ 2015-06-12 21:05 Abe Skolnik
  2015-06-22 16:31 ` Alan Lawrence
  2015-06-24 16:51 ` Ramana Radhakrishnan
  0 siblings, 2 replies; 17+ messages in thread
From: Abe Skolnik @ 2015-06-12 21:05 UTC (permalink / raw)
  To: gcc-patches, alalaw01, richard.guenther, sebpop

Hi everybody!

In the current implementation of if conversion, loads and stores are
if-converted in a thread-unsafe way:

  * loads were always executed, even when they should not have been.
    Valid source code could thus misbehave, e.g. by dereferencing
    null pointers that were OK in the original program because they
    were never dereferenced.

  * writes were if-converted via load/maybe-modify/store, which
    renders some code multithreading-unsafe.
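
For concreteness, a source-level sketch of the old store transform
(shorthand: the pass works on GIMPLE, and the conditional-expression
form is taken from the old doc/invoke.texi text updated below):

  if (cond)
    A[i] = expr;             /* as written: stores only when cond holds */

  A[i] = cond ? expr : A[i]; /* old if-conversion: the store now happens
                                on every iteration, racing with another
                                thread that writes A[i] while this
                                thread's cond is false */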

This patch reimplements if-conversion of loads and stores in a safe
way using a scratchpad allocated by the compiler on the stack:

  * loads are done through an indirection, reading either the correct
    data from the correct source [if the condition is true] or reading
    from the scratchpad and later ignoring this read result [if the
    condition is false].

  * writes are also done through an indirection, writing either to the
    correct destination [if the condition is true] or to the
    scratchpad [if the condition is false].
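
And a self-contained C sketch of the new shape (the 64-byte size and
the address selection mirror the doc/invoke.texi example in this
patch; "foo", "A" and "B" are illustrative names only):

  #define N 50
  int A[N], B[N];

  void foo (int cond)
  {
    int scratch_pad[16];  /* 64 bytes; the pass uses a char array,
                             allocated once per function */
    int i;

    for (i = 0; i < N; i++)
      {
        /* Pick the real address when the condition is true and the
           scratchpad otherwise; the accesses become unconditional
           but never touch A[i] or B[i] when cond is false.  */
        int *a = cond ? &A[i] : scratch_pad;  /* store target */
        int *b = cond ? &B[i] : scratch_pad;  /* load source  */
        *a = *b + 2;
      }
  }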

Vectorization of "if-cvt-stores-vect-ifcvt-18.c" is disabled because the
old if-conversion resulted in unsafe code that could fail under
multithreading even though the as-written code _was_ thread-safe.

Passed regression testing and bootstrap on amd64-linux.
Is this OK to commit to trunk?

Regards,

Abe




2015-06-12  Sebastian Pop  <s.pop@samsung.com>
            Abe Skolnik  <a.skolnik@samsung.com>

	PR tree-optimization/46029
	* tree-data-ref.c (struct data_ref_loc_d): Moved...
	(get_references_in_stmt): Exported.
	* tree-data-ref.h (struct data_ref_loc_d): ... here.
	(get_references_in_stmt): Declared.

	* doc/invoke.texi (-ftree-loop-if-convert-stores): Update description.
	* tree-if-conv.c (struct ifc_dr): Removed.
	(IFC_DR): Removed.
	(DR_WRITTEN_AT_LEAST_ONCE): Removed.
	(DR_RW_UNCONDITIONALLY): Removed.
	(memrefs_read_or_written_unconditionally): Removed.
	(write_memrefs_written_at_least_once): Removed.
	(ifcvt_could_trap_p): Does not take refs parameter anymore.
	(ifcvt_memrefs_wont_trap): Removed.
	(has_non_addressable_refs): New.
	(if_convertible_gimple_assign_stmt_p): Call has_non_addressable_refs.
	Removed use of refs.
	(if_convertible_stmt_p): Removed use of refs.
	(if_convertible_gimple_assign_stmt_p): Same.
	(if_convertible_loop_p_1): Removed use of refs.  Remove initialization
	of dr->aux, DR_WRITTEN_AT_LEAST_ONCE, and DR_RW_UNCONDITIONALLY.
	(insert_address_of): New.
	(create_scratchpad): New.
	(create_indirect_cond_expr): New.
	(predicate_mem_writes): Call create_indirect_cond_expr.  Take an extra
	parameter for scratch_pad.
	(combine_blocks): Same.
	(tree_if_conversion): Same.

	testsuite/
	* g++.dg/tree-ssa/ifc-pr46029.C: New.
	* gcc.dg/tree-ssa/ifc-5.c: Make it exactly like the FFmpeg kernel.
	* gcc.dg/tree-ssa/ifc-8.c: New.
	* gcc.dg/tree-ssa/ifc-9.c: New.
	* gcc.dg/tree-ssa/ifc-10.c: New.
	* gcc.dg/tree-ssa/ifc-11.c: New.
	* gcc.dg/tree-ssa/ifc-12.c: New.
	* gcc.dg/vect/if-cvt-stores-vect-ifcvt-18.c: Disabled.
	* gcc.dg/vect/if-cvt-stores-vect-ifcvt-19.c: New.
---
 gcc/ChangeLog                                      |  28 ++
 gcc/doc/invoke.texi                                |  18 +-
 gcc/testsuite/g++.dg/tree-ssa/ifc-pr46029.C        |  76 ++++
 gcc/testsuite/gcc.dg/tree-ssa/ifc-10.c             |  17 +
 gcc/testsuite/gcc.dg/tree-ssa/ifc-11.c             |  16 +
 gcc/testsuite/gcc.dg/tree-ssa/ifc-12.c             |  13 +
 gcc/testsuite/gcc.dg/tree-ssa/ifc-5.c              |  19 +-
 gcc/testsuite/gcc.dg/tree-ssa/ifc-8.c              |  29 ++
 gcc/testsuite/gcc.dg/tree-ssa/ifc-9.c              |  17 +
 .../gcc.dg/vect/if-cvt-stores-vect-ifcvt-18.c      |  10 +-
 .../gcc.dg/vect/if-cvt-stores-vect-ifcvt-19.c      |  46 +++
 gcc/tree-data-ref.c                                |  13 +-
 gcc/tree-data-ref.h                                |  14 +
 gcc/tree-if-conv.c                                 | 392 +++++++++------------
 14 files changed, 460 insertions(+), 248 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/tree-ssa/ifc-pr46029.C
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ifc-10.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ifc-11.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ifc-12.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ifc-8.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ifc-9.c
 create mode 100644 gcc/testsuite/gcc.dg/vect/if-cvt-stores-vect-ifcvt-19.c

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 3dec6b1..70af07c 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,31 @@
+2015-05-18  Sebastian Pop  <s.pop@samsung.com>
+
+	PR tree-optimization/46029
+	* doc/invoke.texi (-ftree-loop-if-convert-stores): Update description.
+	* tree-if-conv.c (has_unaligned_memory_refs): New.
+	(if_convertible_gimple_assign_stmt_p): Call has_unaligned_memory_refs.
+	(create_scratchpad): New.
+	(create_indirect_cond_expr): New.
+	(predicate_mem_writes): Call create_indirect_cond_expr.  Take an extra
+	parameter for scratch_pad.
+	(combine_blocks): Same.
+	(tree_if_conversion): Same.
+	(main_tree_if_conversion): Pass to tree_if_conversion a pointer to
+	scratch_pad.
+	(struct ifc_dr): Removed.
+	(IFC_DR): Removed.
+	(DR_WRITTEN_AT_LEAST_ONCE): Removed.
+	(DR_RW_UNCONDITIONALLY): Removed.
+	(memrefs_read_or_written_unconditionally): Removed.
+	(write_memrefs_written_at_least_once): Removed.
+	(ifcvt_memrefs_wont_trap): Removed.
+	(ifcvt_could_trap_p): Does not take refs parameter anymore.
+	(if_convertible_gimple_assign_stmt_p): Same.
+	(if_convertible_stmt_p): Same.
+	(if_convertible_loop_p_1): Remove initialization of dr->aux,
+	DR_WRITTEN_AT_LEAST_ONCE, and DR_RW_UNCONDITIONALLY.
+	(if_convertible_loop_p): Remove deallocation of the same.
+
 2015-05-15  Marc Glisse  <marc.glisse@inria.fr>
 
 	PR tree-optimization/64454
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 117b5d9..c0e27e8 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -8815,20 +8815,26 @@ if vectorization is enabled.
 @item -ftree-loop-if-convert-stores
 @opindex ftree-loop-if-convert-stores
 Attempt to also if-convert conditional jumps containing memory writes.
-This transformation can be unsafe for multi-threaded programs as it
-transforms conditional memory writes into unconditional memory writes.
 For example,
 @smallexample
 for (i = 0; i < N; i++)
   if (cond)
-    A[i] = expr;
+    A[i] = B[i] + 2;
 @end smallexample
 is transformed to
 @smallexample
-for (i = 0; i < N; i++)
-  A[i] = cond ? expr : A[i];
+void *scratchpad = alloca (64);
+for (i = 0; i < N; i++) @{
+  a = cond ? &A[i] : scratchpad;
+  b = cond ? &B[i] : scratchpad;
+  *a = *b + 2;
+@}
 @end smallexample
-potentially producing data races.
+The compiler allocates a scratchpad on the stack once per
+function in which the if-conversion of memory stores or reads
+happened.  This scratchpad is used during the part of the
+computation that is discarded, i.e., when the condition
+evaluates to false.
 
 @item -ftree-loop-distribution
 @opindex ftree-loop-distribution
diff --git a/gcc/testsuite/g++.dg/tree-ssa/ifc-pr46029.C b/gcc/testsuite/g++.dg/tree-ssa/ifc-pr46029.C
new file mode 100644
index 0000000..2a54bdb
--- /dev/null
+++ b/gcc/testsuite/g++.dg/tree-ssa/ifc-pr46029.C
@@ -0,0 +1,76 @@
+// { dg-do run }
+/* { dg-options "-O -ftree-loop-if-convert-stores" } */
+
+namespace
+{
+  struct rb_tree_node_
+  {
+    rb_tree_node_ ():m_p_left (0), m_p_parent (0), m_metadata (0)
+    {
+    }
+    unsigned &get_metadata ()
+    {
+      return m_metadata;
+    }
+    rb_tree_node_ *m_p_left;
+    rb_tree_node_ *m_p_parent;
+    unsigned m_metadata;
+  };
+
+  struct bin_search_tree_const_node_it_
+  {
+    bin_search_tree_const_node_it_ (rb_tree_node_ * p_nd):m_p_nd (p_nd)
+    {
+    }
+    unsigned &get_metadata ()
+    {
+      return m_p_nd->get_metadata ();
+    }
+    bin_search_tree_const_node_it_ get_l_child ()
+    {
+      return bin_search_tree_const_node_it_ (m_p_nd->m_p_left);
+    }
+
+    rb_tree_node_ *m_p_nd;
+  };
+
+  struct bin_search_tree_no_data_
+  {
+    typedef rb_tree_node_ *node_pointer;
+      bin_search_tree_no_data_ ():m_p_head (new rb_tree_node_ ())
+    {
+    }
+    void insert_imp_empty (int r_value)
+    {
+      rb_tree_node_ *p_new_node = new rb_tree_node_ ();
+      m_p_head->m_p_parent = p_new_node;
+      p_new_node->m_p_parent = m_p_head;
+      update_to_top (m_p_head->m_p_parent);
+    }
+    void apply_update (bin_search_tree_const_node_it_ nd_it)
+    {
+      unsigned
+	l_max_endpoint
+	=
+	(nd_it.get_l_child ().m_p_nd ==
+	 0) ? 0 : nd_it.get_l_child ().get_metadata ();
+      nd_it.get_metadata () = l_max_endpoint;
+    }
+    void update_to_top (node_pointer p_nd)
+    {
+      while (p_nd != m_p_head)
+	{
+	  apply_update (p_nd);
+	  p_nd = p_nd->m_p_parent;
+	}
+    }
+
+    rb_tree_node_ * m_p_head;
+  };
+}
+
+int main ()
+{
+  bin_search_tree_no_data_ ().insert_imp_empty (0);
+  return 0;
+}
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ifc-10.c b/gcc/testsuite/gcc.dg/tree-ssa/ifc-10.c
new file mode 100644
index 0000000..3dcc202
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ifc-10.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-c -O3 -ftree-loop-if-convert-stores -fdump-tree-ifcvt-details -fdump-tree-ifcvt-stats" { target *-*-* } } */
+
+/* middle-end/PR65947 */
+
+int a[32];
+
+int main(int argc, char **argv)
+{
+  int res = 3;
+  for (int i = 0; i < 32; i++)
+    if (a[i]) res = 7;
+  return res;
+}
+
+/* { dg-final { scan-tree-dump-times "Applying if-conversion" 1 "ifcvt" } } */
+/* { dg-final { cleanup-tree-dump "ifcvt" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ifc-11.c b/gcc/testsuite/gcc.dg/tree-ssa/ifc-11.c
new file mode 100644
index 0000000..2943c04
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ifc-11.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-c -O3 -ftree-loop-if-convert-stores -fdump-tree-ifcvt-details -fdump-tree-ifcvt-stats" { target *-*-* } } */
+
+int read(int *r, int *c)
+{
+  int res = 42;
+  int i;
+  for (i = 0; i < 32; i++)
+    if (c[i])
+      res = *r;
+
+  return res;
+}
+
+/* { dg-final { scan-tree-dump-times "Applying if-conversion" 1 "ifcvt" } } */
+/* { dg-final { cleanup-tree-dump "ifcvt" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ifc-12.c b/gcc/testsuite/gcc.dg/tree-ssa/ifc-12.c
new file mode 100644
index 0000000..01acf0b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ifc-12.c
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-c -O3 -ftree-loop-if-convert-stores -fdump-tree-ifcvt-details -fdump-tree-ifcvt-stats" { target *-*-* } } */
+
+void write(int *r, int *c)
+{
+  int i;
+  for (i = 0; i < 32; i++)
+    if (c[i])
+      *r = c[i];
+}
+
+/* { dg-final { scan-tree-dump-times "Applying if-conversion" 1 "ifcvt" } } */
+/* { dg-final { cleanup-tree-dump "ifcvt" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ifc-5.c b/gcc/testsuite/gcc.dg/tree-ssa/ifc-5.c
index a9cc816..4be2cdb 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ifc-5.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ifc-5.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-c -O2 -ftree-vectorize -fdump-tree-ifcvt-stats" { target *-*-* } } */
+/* { dg-options "-c -O2 -ftree-vectorize -ftree-loop-if-convert-stores -fdump-tree-ifcvt-stats" { target *-*-* } } */
 
 void
 dct_unquantize_h263_inter_c (short *block, int n, int qscale, int nCoeffs)
@@ -12,11 +12,18 @@ dct_unquantize_h263_inter_c (short *block, int n, int qscale, int nCoeffs)
   for (i = 0; i <= nCoeffs; i++)
     {
       level = block[i];
-      if (level < 0)
-	level = level * qmul - qadd;
-      else
-	level = level * qmul + qadd;
-      block[i] = level;
+      if (level)
+        {
+          if (level < 0)
+            {
+              level = level * qmul - qadd;
+            }
+          else
+            {
+              level = level * qmul + qadd;
+            }
+          block[i] = level;
+        }
     }
 }
 
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ifc-8.c b/gcc/testsuite/gcc.dg/tree-ssa/ifc-8.c
new file mode 100644
index 0000000..d7cf279
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ifc-8.c
@@ -0,0 +1,29 @@
+/* { dg-do compile } */
+/* { dg-options "-c -O2 -ftree-vectorize" { target *-*-* } } */
+
+typedef union tree_node *tree;
+struct tree_common
+{
+  unsigned volatile_flag : 1;
+  unsigned unsigned_flag : 1;
+};
+struct tree_type
+{
+  tree next_variant;
+  tree main_variant;
+};
+union tree_node
+{
+  struct tree_common common;
+  struct tree_type type;
+};
+void finish_enum (tree enumtype)
+{
+  tree tem;
+  for (tem = ((enumtype)->type.main_variant); tem; tem = ((tem)->type.next_variant))
+    {
+      if (tem == enumtype)
+	continue;
+      ((tem)->common.unsigned_flag) = ((enumtype)->common.unsigned_flag);
+    }
+}
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ifc-9.c b/gcc/testsuite/gcc.dg/tree-ssa/ifc-9.c
new file mode 100644
index 0000000..6d82bb4
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ifc-9.c
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-c -O2 -ftree-vectorize -ftree-loop-if-convert-stores -fdump-tree-ifcvt-stats" { target *-*-* } } */
+
+/* middle-end/PR65947 */
+
+int a[32];
+
+int main(int argc, char **argv)
+{
+  int res = 3;
+  for (int i = 0; i < 32; i++)
+    if (a[i]) res = 7;
+  return res;
+}
+
+/* { dg-final { scan-tree-dump-times "Applying if-conversion" 1 "ifcvt" } } */
+/* { dg-final { cleanup-tree-dump "ifcvt" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/if-cvt-stores-vect-ifcvt-18.c b/gcc/testsuite/gcc.dg/vect/if-cvt-stores-vect-ifcvt-18.c
index cdf687a..7bc652b 100644
--- a/gcc/testsuite/gcc.dg/vect/if-cvt-stores-vect-ifcvt-18.c
+++ b/gcc/testsuite/gcc.dg/vect/if-cvt-stores-vect-ifcvt-18.c
@@ -65,5 +65,13 @@ main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { xfail { { vect_no_align && { ! vect_hw_misalign } } || { ! vect_strided2 } } } } } */
+/* "foo()" is not vectorized FOR NOW because the vectorizer
+   cannot handle conditional gather loads as of June 2015.
+
+   The old way of if-converting loads was unsafe because
+   it resulted in thread-unsafe code where the as-written code was OK.
+   The old if-conversion did not use indirection in the loads,
+   so it was simpler; therefore the vectorizer was able to pick it up.  */
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 0 "vect"  { xfail { { vect_no_align && { ! vect_hw_misalign } } || { ! vect_strided2 } } } } } */
 /* { dg-final { cleanup-tree-dump "vect" } } */
diff --git a/gcc/testsuite/gcc.dg/vect/if-cvt-stores-vect-ifcvt-19.c b/gcc/testsuite/gcc.dg/vect/if-cvt-stores-vect-ifcvt-19.c
new file mode 100644
index 0000000..6331d5a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/if-cvt-stores-vect-ifcvt-19.c
@@ -0,0 +1,46 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_int } */
+
+#include <stdarg.h>
+#include "tree-vect.h"
+
+#define N 50
+
+typedef struct {
+  short a;
+  short b;
+} data;
+
+data in1[N], in2[N], out[N];
+
+__attribute__ ((noinline)) void
+foo ()
+{
+  int i;
+  short c, d;
+
+  for (i = 0; i < N; i++)
+    {
+      c = in1[i].b;
+      d = in2[i].b;
+
+      if (c >= d)
+        {
+          out[i].b = 11;
+          out[i].a = d + 5;
+        }
+      else
+        {
+          out[i].b = d - 12;
+          out[i].a = d;
+        }
+    }
+}
+
+/* It is safe to if-convert and vectorize the above loop because
+   out[i].a and out[i].b are always written to.
+   This is the same as "if-cvt-stores-vect-ifcvt-18.c"
+   except there are no array loads inside the "if" here.  */
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { xfail { { vect_no_align && { ! vect_hw_misalign } } || { ! vect_strided2 } } } } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
diff --git a/gcc/tree-data-ref.c b/gcc/tree-data-ref.c
index 3d1d5f8..eb1c559 100644
--- a/gcc/tree-data-ref.c
+++ b/gcc/tree-data-ref.c
@@ -4393,22 +4393,11 @@ compute_all_dependences (vec<data_reference_p> datarefs,
   return true;
 }
 
-/* Describes a location of a memory reference.  */
-
-typedef struct data_ref_loc_d
-{
-  /* The memory reference.  */
-  tree ref;
-
-  /* True if the memory reference is read.  */
-  bool is_read;
-} data_ref_loc;
-
 
 /* Stores the locations of memory references in STMT to REFERENCES.  Returns
    true if STMT clobbers memory, false otherwise.  */
 
-static bool
+bool
 get_references_in_stmt (gimple stmt, vec<data_ref_loc, va_heap> *references)
 {
   bool clobbers_memory = false;
diff --git a/gcc/tree-data-ref.h b/gcc/tree-data-ref.h
index edb3b56..06a0f1a 100644
--- a/gcc/tree-data-ref.h
+++ b/gcc/tree-data-ref.h
@@ -557,4 +557,18 @@ lambda_matrix_new (int m, int n, struct obstack *lambda_obstack)
   return mat;
 }
 
+/* Describes a location of a memory reference.  */
+
+typedef struct data_ref_loc_d
+{
+  /* The memory reference.  */
+  tree ref;
+
+  /* True if the memory reference is read.  */
+  bool is_read;
+} data_ref_loc;
+
+bool
+get_references_in_stmt (gimple /* stmt */, vec<data_ref_loc, va_heap>* /* references */);
+
 #endif  /* GCC_TREE_DATA_REF_H  */
diff --git a/gcc/tree-if-conv.c b/gcc/tree-if-conv.c
index 49ff458..1272a01 100644
--- a/gcc/tree-if-conv.c
+++ b/gcc/tree-if-conv.c
@@ -19,7 +19,7 @@ along with GCC; see the file COPYING3.  If not see
 <http://www.gnu.org/licenses/>.  */
 
 /* This pass implements a tree level if-conversion of loops.  Its
-   initial goal is to help the vectorizer to vectorize loops with
+   initial goal is to help the vectorizer to vectorize loops with 
    conditions.
 
    A short description of if-conversion:
@@ -606,188 +606,39 @@ if_convertible_phi_p (struct loop *loop, basic_block bb, gphi *phi,
   return true;
 }
 
-/* Records the status of a data reference.  This struct is attached to
-   each DR->aux field.  */
-
-struct ifc_dr {
-  /* -1 when not initialized, 0 when false, 1 when true.  */
-  int written_at_least_once;
-
-  /* -1 when not initialized, 0 when false, 1 when true.  */
-  int rw_unconditionally;
-};
-
-#define IFC_DR(DR) ((struct ifc_dr *) (DR)->aux)
-#define DR_WRITTEN_AT_LEAST_ONCE(DR) (IFC_DR (DR)->written_at_least_once)
-#define DR_RW_UNCONDITIONALLY(DR) (IFC_DR (DR)->rw_unconditionally)
-
-/* Returns true when the memory references of STMT are read or written
-   unconditionally.  In other words, this function returns true when
-   for every data reference A in STMT there exist other accesses to
-   a data reference with the same base with predicates that add up (OR-up) to
-   the true predicate: this ensures that the data reference A is touched
-   (read or written) on every iteration of the if-converted loop.  */
+/* Wrapper around gimple_could_trap_p refined for the needs of the
+   if-conversion.  */
 
 static bool
-memrefs_read_or_written_unconditionally (gimple stmt,
-					 vec<data_reference_p> drs)
+ifcvt_could_trap_p (gimple stmt)
 {
-  int i, j;
-  data_reference_p a, b;
-  tree ca = bb_predicate (gimple_bb (stmt));
-
-  for (i = 0; drs.iterate (i, &a); i++)
-    if (DR_STMT (a) == stmt)
-      {
-	bool found = false;
-	int x = DR_RW_UNCONDITIONALLY (a);
-
-	if (x == 0)
-	  return false;
-
-	if (x == 1)
-	  continue;
-
-	for (j = 0; drs.iterate (j, &b); j++)
-          {
-            tree ref_base_a = DR_REF (a);
-            tree ref_base_b = DR_REF (b);
-
-            if (DR_STMT (b) == stmt)
-              continue;
-
-            while (TREE_CODE (ref_base_a) == COMPONENT_REF
-                   || TREE_CODE (ref_base_a) == IMAGPART_EXPR
-                   || TREE_CODE (ref_base_a) == REALPART_EXPR)
-              ref_base_a = TREE_OPERAND (ref_base_a, 0);
-
-            while (TREE_CODE (ref_base_b) == COMPONENT_REF
-                   || TREE_CODE (ref_base_b) == IMAGPART_EXPR
-                   || TREE_CODE (ref_base_b) == REALPART_EXPR)
-              ref_base_b = TREE_OPERAND (ref_base_b, 0);
-
-  	    if (!operand_equal_p (ref_base_a, ref_base_b, 0))
-	      {
-	        tree cb = bb_predicate (gimple_bb (DR_STMT (b)));
-
-	        if (DR_RW_UNCONDITIONALLY (b) == 1
-		    || is_true_predicate (cb)
-		    || is_true_predicate (ca
-                        = fold_or_predicates (EXPR_LOCATION (cb), ca, cb)))
-		  {
-		    DR_RW_UNCONDITIONALLY (a) = 1;
-  		    DR_RW_UNCONDITIONALLY (b) = 1;
-		    found = true;
-		    break;
-		  }
-               }
-	    }
-
-	if (!found)
-	  {
-	    DR_RW_UNCONDITIONALLY (a) = 0;
-	    return false;
-	  }
-      }
+  if (gimple_vuse (stmt)
+      && !gimple_could_trap_p_1 (stmt, false, false))
+    return false;
 
-  return true;
+  return gimple_could_trap_p (stmt);
 }
 
-/* Returns true when the memory references of STMT are unconditionally
-   written.  In other words, this function returns true when for every
-   data reference A written in STMT, there exist other writes to the
-   same data reference with predicates that add up (OR-up) to the true
-   predicate: this ensures that the data reference A is written on
-   every iteration of the if-converted loop.  */
-
+/* Returns true when STMT clobbers memory or has non-addressable refs.  */
 static bool
-write_memrefs_written_at_least_once (gimple stmt,
-				     vec<data_reference_p> drs)
+has_non_addressable_refs (gimple stmt)
 {
-  int i, j;
-  data_reference_p a, b;
-  tree ca = bb_predicate (gimple_bb (stmt));
-
-  for (i = 0; drs.iterate (i, &a); i++)
-    if (DR_STMT (a) == stmt
-	&& DR_IS_WRITE (a))
-      {
-	bool found = false;
-	int x = DR_WRITTEN_AT_LEAST_ONCE (a);
-
-	if (x == 0)
-	  return false;
-
-	if (x == 1)
-	  continue;
-
-	for (j = 0; drs.iterate (j, &b); j++)
-	  if (DR_STMT (b) != stmt
-	      && DR_IS_WRITE (b)
-	      && same_data_refs_base_objects (a, b))
-	    {
-	      tree cb = bb_predicate (gimple_bb (DR_STMT (b)));
+  vec<data_ref_loc, va_heap>* refs;
+  vec_alloc (refs, 3);
 
-	      if (DR_WRITTEN_AT_LEAST_ONCE (b) == 1
-		  || is_true_predicate (cb)
-		  || is_true_predicate (ca = fold_or_predicates (EXPR_LOCATION (cb),
-								 ca, cb)))
-		{
-		  DR_WRITTEN_AT_LEAST_ONCE (a) = 1;
-		  DR_WRITTEN_AT_LEAST_ONCE (b) = 1;
-		  found = true;
-		  break;
-		}
-	    }
+  bool res = get_references_in_stmt (stmt, refs);
+  unsigned i;
+  data_ref_loc *ref;
 
-	if (!found)
-	  {
-	    DR_WRITTEN_AT_LEAST_ONCE (a) = 0;
-	    return false;
-	  }
+  FOR_EACH_VEC_ELT (*refs, i, ref)
+    if (may_be_nonaddressable_p (ref->ref))
+      {
+	res = true;
+	break;
       }
 
-  return true;
-}
-
-/* Return true when the memory references of STMT won't trap in the
-   if-converted code.  There are two things that we have to check for:
-
-   - writes to memory occur to writable memory: if-conversion of
-   memory writes transforms the conditional memory writes into
-   unconditional writes, i.e. "if (cond) A[i] = foo" is transformed
-   into "A[i] = cond ? foo : A[i]", and as the write to memory may not
-   be executed at all in the original code, it may be a readonly
-   memory.  To check that A is not const-qualified, we check that
-   there exists at least an unconditional write to A in the current
-   function.
-
-   - reads or writes to memory are valid memory accesses for every
-   iteration.  To check that the memory accesses are correctly formed
-   and that we are allowed to read and write in these locations, we
-   check that the memory accesses to be if-converted occur at every
-   iteration unconditionally.  */
-
-static bool
-ifcvt_memrefs_wont_trap (gimple stmt, vec<data_reference_p> refs)
-{
-  return write_memrefs_written_at_least_once (stmt, refs)
-    && memrefs_read_or_written_unconditionally (stmt, refs);
-}
-
-/* Wrapper around gimple_could_trap_p refined for the needs of the
-   if-conversion.  Try to prove that the memory accesses of STMT could
-   not trap in the innermost loop containing STMT.  */
-
-static bool
-ifcvt_could_trap_p (gimple stmt, vec<data_reference_p> refs)
-{
-  if (gimple_vuse (stmt)
-      && !gimple_could_trap_p_1 (stmt, false, false)
-      && ifcvt_memrefs_wont_trap (stmt, refs))
-    return false;
-
-  return gimple_could_trap_p (stmt);
+  vec_free (refs);
+  return res;
 }
 
 /* Return true if STMT could be converted into a masked load or store
@@ -849,7 +700,6 @@ ifcvt_can_use_mask_load_store (gimple stmt)
 
 static bool
 if_convertible_gimple_assign_stmt_p (gimple stmt,
-				     vec<data_reference_p> refs,
 				     bool *any_mask_load_store)
 {
   tree lhs = gimple_assign_lhs (stmt);
@@ -883,7 +733,7 @@ if_convertible_gimple_assign_stmt_p (gimple stmt,
 
   if (flag_tree_loop_if_convert_stores)
     {
-      if (ifcvt_could_trap_p (stmt, refs))
+      if (ifcvt_could_trap_p (stmt))
 	{
 	  if (ifcvt_can_use_mask_load_store (stmt))
 	    {
@@ -892,9 +742,17 @@ if_convertible_gimple_assign_stmt_p (gimple stmt,
 	      return true;
 	    }
 	  if (dump_file && (dump_flags & TDF_DETAILS))
-	    fprintf (dump_file, "tree could trap...\n");
+	    fprintf (dump_file, "tree could trap\n");
 	  return false;
 	}
+
+      if (has_non_addressable_refs (stmt))
+	{
+	  if (dump_file && (dump_flags & TDF_DETAILS))
+	    fprintf (dump_file, "has non-addressable memory references\n");
+	  return false;
+	}
+
       return true;
     }
 
@@ -942,7 +800,7 @@ if_convertible_gimple_assign_stmt_p (gimple stmt,
    - it is builtins call.  */
 
 static bool
-if_convertible_stmt_p (gimple stmt, vec<data_reference_p> refs,
+if_convertible_stmt_p (gimple stmt, 
 		       bool *any_mask_load_store)
 {
   switch (gimple_code (stmt))
@@ -953,7 +811,7 @@ if_convertible_stmt_p (gimple stmt, vec<data_reference_p> refs,
       return true;
 
     case GIMPLE_ASSIGN:
-      return if_convertible_gimple_assign_stmt_p (stmt, refs,
+      return if_convertible_gimple_assign_stmt_p (stmt,
 						  any_mask_load_store);
 
     case GIMPLE_CALL:
@@ -1173,8 +1031,8 @@ get_loop_body_in_if_conv_order (const struct loop *loop)
    predicate_bbs first allocates the predicates of the basic blocks.
    These fields are then initialized with the tree expressions
    representing the predicates under which a basic block is executed
-   in the LOOP.  As the loop->header is executed at each iteration, it
-   has the "true" predicate.  Other statements executed under a
+   in the LOOP.  As the loop->header is executed at each iteration,
+   it has the "true" predicate.  Other statements executed under a
    condition are predicated with that condition, for example
 
    | if (x)
@@ -1322,17 +1180,7 @@ if_convertible_loop_p_1 (struct loop *loop,
     }
 
   if (flag_tree_loop_if_convert_stores)
-    {
-      data_reference_p dr;
-
-      for (i = 0; refs->iterate (i, &dr); i++)
-	{
-	  dr->aux = XNEW (struct ifc_dr);
-	  DR_WRITTEN_AT_LEAST_ONCE (dr) = -1;
-	  DR_RW_UNCONDITIONALLY (dr) = -1;
-	}
-      predicate_bbs (loop);
-    }
+    predicate_bbs (loop);
 
   for (i = 0; i < loop->num_nodes; i++)
     {
@@ -1342,7 +1190,7 @@ if_convertible_loop_p_1 (struct loop *loop,
       /* Check the if-convertibility of statements in predicated BBs.  */
       if (!dominated_by_p (CDI_DOMINATORS, loop->latch, bb))
 	for (itr = gsi_start_bb (bb); !gsi_end_p (itr); gsi_next (&itr))
-	  if (!if_convertible_stmt_p (gsi_stmt (itr), *refs,
+	  if (!if_convertible_stmt_p (gsi_stmt (itr), 
 				      any_mask_load_store))
 	    return false;
     }
@@ -1988,6 +1836,82 @@ mask_exists (int size, vec<int> vec)
   return -1;
 }
 
+/* Inserts at position GSI a statement "ADDRESS_OF_AI = &AI;" and
+   returns the ADDRESS_OF_AI.  */
+
+static tree
+insert_address_of (tree ai, gimple_stmt_iterator *gsi)
+{
+  tree addr_expr = build_fold_addr_expr (ai);
+  tree address_of_ai = create_tmp_reg (TREE_TYPE (addr_expr), "_ifc_");
+  gimple addr_stmt = gimple_build_assign (address_of_ai, addr_expr);
+
+  address_of_ai = make_ssa_name (address_of_ai, addr_stmt);
+  gimple_assign_set_lhs (addr_stmt, address_of_ai);
+  SSA_NAME_DEF_STMT (address_of_ai) = addr_stmt;
+  update_stmt (addr_stmt);
+  gsi_insert_before (gsi, addr_stmt, GSI_SAME_STMT);
+
+  return address_of_ai;
+}
+
+/* Insert at the beginning of the first basic block of the current
+   function the allocation on the stack of N_BYTES of memory and
+   return a pointer to this scratchpad memory.  */
+
+static tree
+create_scratchpad (unsigned int n_bytes)
+{
+  basic_block bb = single_succ (ENTRY_BLOCK_PTR_FOR_FN (cfun));
+  gimple_stmt_iterator gsi = gsi_after_labels (bb);
+  tree x = build_int_cst (integer_type_node, n_bytes - 1);
+  tree elt_type = char_type_node;
+  tree array_type = build_array_type (elt_type, build_index_type (x));
+  tree base = create_tmp_reg (array_type, "scratch_pad");
+  tree a0 = build4 (ARRAY_REF, elt_type, base, integer_zero_node, NULL_TREE,
+		    NULL_TREE);
+
+  return insert_address_of (a0, &gsi);
+}
+
+/* Returns a memory reference to the pointer defined by the
+   conditional expression: pointer = cond ? &A[i] : scratch_pad; and
+   inserts this code at GSI.  */
+
+static tree
+create_indirect_cond_expr (tree ai, tree cond, tree *scratch_pad,
+			   gimple_stmt_iterator *gsi, bool swap)
+{
+  tree cond_expr;
+  tree pointer;
+  gimple pointer_stmt;
+
+  /* address_of_ai = &A[i];  */
+  tree address_of_ai = insert_address_of (ai, gsi);
+
+  /* Allocate the scratch pad only once per function.  */
+  if (!*scratch_pad)
+    *scratch_pad = create_scratchpad (64);
+
+  /* pointer = cond ? address_of_ai : scratch_pad;  */
+  pointer = create_tmp_reg (TREE_TYPE (address_of_ai), "_ifc_");
+
+  cond_expr = build3 (COND_EXPR, TREE_TYPE (address_of_ai),
+		      unshare_expr (cond),
+		      swap ? *scratch_pad  : address_of_ai,
+		      swap ? address_of_ai : *scratch_pad);
+
+  pointer_stmt = gimple_build_assign (pointer, cond_expr);
+  pointer = make_ssa_name (pointer, pointer_stmt);
+  gimple_assign_set_lhs (pointer_stmt, pointer);
+  SSA_NAME_DEF_STMT (pointer) = pointer_stmt;
+  update_stmt (pointer_stmt);
+  gsi_insert_before (gsi, pointer_stmt, GSI_SAME_STMT);
+
+  return build2 (MEM_REF, TREE_TYPE (ai), pointer,
+		 build_int_cst (reference_alias_ptr_type (ai), 0));
+}
+
 /* Predicate each write to memory in LOOP.
 
    This function transforms control flow constructs containing memory
@@ -1997,12 +1921,21 @@ mask_exists (int size, vec<int> vec)
    |   if (cond)
    |     A[i] = expr;
 
-   into the following form that does not contain control flow:
+   into the following form:
 
-   | for (i = 0; i < N; i++)
-   |   A[i] = cond ? expr : A[i];
+   | void *scratch_pad = alloca (MAX_TYPE_SIZE_IN_BYTES);
+   |
+   | for (i = 0; i < N; i++) {
+   |   p = cond ? &A[i] : scratch_pad;
+   |   *p = expr;
+   | }
 
-   The original CFG looks like this:
+   SCRATCH_PAD is allocated on the stack for each function once and it is
+   large enough to contain any kind of scalar assignment or read.  All
+   values read or written to SCRATCH_PAD are not used in the computation.
+
+   In a more detailed way, the if-conversion of memory writes works
+   like this, supposing that the original CFG looks like this:
 
    | bb_0
    |   i = 0
@@ -2052,10 +1985,12 @@ mask_exists (int size, vec<int> vec)
    |   goto bb_1
    | end_bb_4
 
-   predicate_mem_writes is then predicating the memory write as follows:
+   predicate_mem_writes is then allocating SCRATCH_PAD in the basic block
+   preceding the loop header, and is predicating the memory write:
 
    | bb_0
    |   i = 0
+   |   scratch_pad = alloca (MAX_TYPE_SIZE_IN_BYTES);
    | end_bb_0
    |
    | bb_1
@@ -2063,12 +1998,14 @@ mask_exists (int size, vec<int> vec)
    | end_bb_1
    |
    | bb_2
+   |   cond = some_computation;
    |   if (cond) goto bb_3 else goto bb_4
    | end_bb_2
    |
    | bb_3
    |   cond = some_computation;
-   |   A[i] = cond ? expr : A[i];
+   |   p = cond ? &A[i] : scratch_pad;
+   |   *p = expr;
    |   goto bb_4
    | end_bb_3
    |
@@ -2081,12 +2018,14 @@ mask_exists (int size, vec<int> vec)
 
    | bb_0
    |   i = 0
+   |   scratch_pad = alloca (MAX_TYPE_SIZE_IN_BYTES);
    |   if (i < N) goto bb_5 else goto bb_1
    | end_bb_0
    |
    | bb_1
    |   cond = some_computation;
-   |   A[i] = cond ? expr : A[i];
+   |   p = cond ? &A[i] : scratch_pad;
+   |   *p = expr;
    |   if (i < N) goto bb_5 else goto bb_4
    | end_bb_1
    |
@@ -2096,7 +2035,7 @@ mask_exists (int size, vec<int> vec)
 */
 
 static void
-predicate_mem_writes (loop_p loop)
+predicate_mem_writes (loop_p loop, tree *scratch_pad)
 {
   unsigned int i, orig_loop_num_nodes = loop->num_nodes;
   auto_vec<int, 1> vect_sizes;
@@ -2179,24 +2118,29 @@ predicate_mem_writes (loop_p loop)
 	  }
 	else if (gimple_vdef (stmt))
 	  {
-	    tree lhs = gimple_assign_lhs (stmt);
-	    tree rhs = gimple_assign_rhs1 (stmt);
-	    tree type = TREE_TYPE (lhs);
 
-	    lhs = ifc_temp_var (type, unshare_expr (lhs), &gsi);
-	    rhs = ifc_temp_var (type, unshare_expr (rhs), &gsi);
-	    if (swap)
-	      {
-		tree tem = lhs;
-		lhs = rhs;
-		rhs = tem;
-	      }
-	    cond = force_gimple_operand_gsi_1 (&gsi, unshare_expr (cond),
-					       is_gimple_condexpr, NULL_TREE,
-					       true, GSI_SAME_STMT);
-	    rhs = fold_build_cond_expr (type, unshare_expr (cond), rhs, lhs);
-	    gimple_assign_set_rhs1 (stmt, ifc_temp_var (type, rhs, &gsi));
+  	    /* A[i] = x;  */
+	    tree ai = gimple_assign_lhs (stmt);
+
+	    /* pointer = cond ? &A[i] : scratch_pad;  */
+	    tree star_pointer = create_indirect_cond_expr (ai, cond,
+			  scratch_pad, &gsi, swap);
+	    /* *pointer = x;  */
+	    gimple_assign_set_lhs (stmt, star_pointer);
 	    update_stmt (stmt);
+
+	  }
+	else if (gimple_vuse (stmt))
+	  {
+	      /* x = A[i];  */
+	      tree ai = gimple_assign_rhs1 (stmt);
+
+	      /* pointer = cond ? &A[i] : scratch_pad;  */
+	      tree star_pointer = create_indirect_cond_expr (ai, cond,
+					     scratch_pad, &gsi, swap);
+	      /* x = *pointer;  */
+	      gimple_assign_set_rhs1 (stmt, star_pointer);
+	      update_stmt (stmt);
 	  }
     }
 }
@@ -2247,7 +2191,7 @@ remove_conditions_and_labels (loop_p loop)
    blocks.  Replace PHI nodes with conditional modify expressions.  */
 
 static void
-combine_blocks (struct loop *loop, bool any_mask_load_store)
+combine_blocks (struct loop *loop, bool any_mask_load_store, tree *scratch_pad)
 {
   basic_block bb, exit_bb, merge_target_bb;
   unsigned int orig_loop_num_nodes = loop->num_nodes;
@@ -2261,7 +2205,7 @@ combine_blocks (struct loop *loop, bool any_mask_load_store)
   predicate_all_scalar_phis (loop);
 
   if (flag_tree_loop_if_convert_stores || any_mask_load_store)
-    predicate_mem_writes (loop);
+    predicate_mem_writes (loop, scratch_pad);
 
   /* Merge basic blocks: first remove all the edges in the loop,
      except for those from the exit block.  */
@@ -2523,7 +2467,7 @@ ifcvt_walk_pattern_tree (tree var, vec<gimple> *defuse_list,
   return;
 }
 
-/* Returns true if STMT can be a root of bool pattern apllied
+/* Returns true if STMT can be a root of bool pattern applied
    by vectorizer.  */
 
 static bool
@@ -2553,7 +2497,7 @@ stmt_is_root_of_bool_pattern (gimple stmt)
   return false;
 }
 
-/*  Traverse all statements in BB which correspondent to loop header to
+/*  Traverse all statements in BB which correspond to loop header to
     find out all statements which can start bool pattern applied by
     vectorizer and convert multiple uses in it to conform pattern
     restrictions.  Such case can occur if the same predicate is used both
@@ -2634,7 +2578,7 @@ ifcvt_local_dce (basic_block bb)
       gimple_set_plf (phi, GF_PLF_2, true);
       worklist.safe_push (phi);
     }
-  /* Consider load/store statemnts, CALL and COND as live.  */
+  /* Consider load/store statements, CALL and COND as live.  */
   for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
     {
       stmt = gsi_stmt (gsi);
@@ -2716,13 +2660,13 @@ ifcvt_local_dce (basic_block bb)
    changed.  */
 
 static unsigned int
-tree_if_conversion (struct loop *loop)
+tree_if_conversion (struct loop *loop, tree *scratch_pad)
 {
   unsigned int todo = 0;
   ifc_bbs = NULL;
   bool any_mask_load_store = false;
 
-  /* Set-up aggressive if-conversion for loops marked with simd pragma.  */
+  /* Set up aggressive if-conversion for loops marked with simd pragma.  */
   aggressive_if_conv = loop->force_vectorize;
   /* Check either outer loop was marked with simd pragma.  */
   if (!aggressive_if_conv)
@@ -2751,7 +2695,7 @@ tree_if_conversion (struct loop *loop)
   /* Now all statements are if-convertible.  Combine all the basic
      blocks into one huge basic block doing the if-conversion
      on-the-fly.  */
-  combine_blocks (loop, any_mask_load_store);
+  combine_blocks (loop, any_mask_load_store, scratch_pad);
 
   /* Delete dead predicate computations and repair tree correspondent
      to bool pattern to delete multiple uses of preidcates.  */
@@ -2832,12 +2776,14 @@ pass_if_conversion::execute (function *fun)
   if (number_of_loops (fun) <= 1)
     return 0;
 
+  tree scratch_pad = NULL_TREE;
+
   FOR_EACH_LOOP (loop, 0)
     if (flag_tree_loop_if_convert == 1
 	|| flag_tree_loop_if_convert_stores == 1
 	|| ((flag_tree_loop_vectorize || loop->force_vectorize)
 	    && !loop->dont_vectorize))
-      todo |= tree_if_conversion (loop);
+      todo |= tree_if_conversion (loop, &scratch_pad);
 
 #ifdef ENABLE_CHECKING
   {
-- 
2.1.0.243.g30d45f7


* Re: fix PR46029: reimplement if conversion of loads and stores
  2015-06-12 21:05 fix PR46029: reimplement if conversion of loads and stores Abe Skolnik
@ 2015-06-22 16:31 ` Alan Lawrence
  2015-06-24 17:11   ` Jeff Law
  2015-06-24 16:51 ` Ramana Radhakrishnan
  1 sibling, 1 reply; 17+ messages in thread
From: Alan Lawrence @ 2015-06-22 16:31 UTC (permalink / raw)
  To: Abe Skolnik; +Cc: gcc-patches, richard.guenther, sebpop

Abe Skolnik wrote:
> Hi everybody!
> 
> In the current implementation of if conversion, loads and stores are
> if-converted in a thread-unsafe way:
> 
>   * loads were always executed, even when they should not have been.
>     Valid source code could thus misbehave, e.g. by dereferencing
>     null pointers that were OK in the original program because they
>     were never dereferenced.
> 
>   * writes were if-converted via load/maybe-modify/store, which
>     renders some code multithreading-unsafe.
> 
> This patch reimplements if-conversion of loads and stores in a safe
> way using a scratchpad allocated by the compiler on the stack:
> 
>   * loads are done through an indirection, reading either the correct
>     data from the correct source [if the condition is true] or reading
>     from the scratchpad and later ignoring this read result [if the
>     condition is false].
> 
>   * writes are also done through an indirection, writing either to the
>     correct destination [if the condition is true] or to the
>     scratchpad [if the condition is false].
> 
> Vectorization of "if-cvt-stores-vect-ifcvt-18.c" is disabled because the
> old if-conversion resulted in unsafe code that could fail under
> multithreading even though the as-written code _was_ thread-safe.
> 
> Passed regression testing and bootstrap on amd64-linux.
> Is this OK to commit to trunk?
> 
> Regards,
> 
> Abe
> 

Thanks for getting back to this!

My main thought concerns the direction we are travelling here. A major reason 
why we do if-conversion is to enable vectorization. Is this targeted at 
gathering/scattering loads? Following vectorization, different elements of the 
vector being loaded/stored may have to go to/from the scratchpad or to/from main 
memory.

Or, are we aiming at the case where the predicate or address is invariant? That 
seems unlikely - loop unswitching would be better for the predicate; loading 
from an address, we'd just peel and hoist; storing, this'd result in the address 
holding the last value written, at exit from the loop, a curious idiom. Where 
the predicate/address is invariant across the vector? (!)

Or, are we aiming at non-vectorized code?

Beyond that question...

Does the description for -ftree-loop-if-convert-stores in doc/invoke.texi 
describe what the flag now does? (It doesn't mention loads; the code doesn't 
look like we use scratchpads at all without -ftree-loop-if-convert-stores, or am 
I missing something?)

In tree-if-conv.c:
@@ -883,7 +733,7 @@ if_convertible_gimple_assign_stmt_p (gimple stmt,

    if (flag_tree_loop_if_convert_stores)
      {
-      if (ifcvt_could_trap_p (stmt, refs))
+      if (ifcvt_could_trap_p (stmt))
         {
           if (ifcvt_can_use_mask_load_store (stmt))
             {

and

+
+      if (has_non_addressable_refs (stmt))
+       {
+         if (dump_file && (dump_flags & TDF_DETAILS))
+           fprintf (dump_file, "has non-addressable memory references\n");
+         return false;
+       }
+

if it doesn't trap, but has_non_addressable_refs, can't we use 
ifcvt_can_use_mask_load_store there too?

And/or, I think I may be misunderstanding here, but if an access could trap, but 
is addressable, can't we use the scratchpad technique to get round the trapping 
problem?

(Look at it another way - this patch makes strictly more things return true from 
ifcvt_could_trap_p, which always exits immediately from 
if_convertible_gimple_assign_stmt_p...?)


Re. creation of scratchpads:
    (1) Should the '64' byte size be the result of scanning the function, for 
the largest data size to which we store? (ideally, conditionally store!)
    (2) Allocating only once per function: if we had one scratchpad per loop, it 
could/would live inside the test of "gimple_build_call_internal 
(IFN_LOOP_VECTORIZED, ...". Otherwise, if we if-convert one or more loops in the 
function, but then fail to vectorize them, we'll leave the scratchpad around for 
later phases to clean up. Is that OK?


Also some style nits:

@@ -1342,7 +1190,7 @@ if_convertible_loop_p_1 (struct loop *loop,
        /* Check the if-convertibility of statements in predicated BBs.  */
        if (!dominated_by_p (CDI_DOMINATORS, loop->latch, bb))
         for (itr = gsi_start_bb (bb); !gsi_end_p (itr); gsi_next (&itr))
-         if (!if_convertible_stmt_p (gsi_stmt (itr), *refs,
+         if (!if_convertible_stmt_p (gsi_stmt (itr),
                                       any_mask_load_store))
             return false;
      }

bet that fits on one line now.

+/* Returns a memory reference to the pointer defined by the
+   conditional expression: pointer = cond ? &A[i] : scratch_pad; and
+   inserts this code at GSI.  */
+
+static tree
+create_indirect_cond_expr (tree ai, tree cond, tree *scratch_pad,
+                          gimple_stmt_iterator *gsi, bool swap)

In the comment, should A[i] just be AI? I see nothing in 
create_indirect_cond_expr that requires ai to be an array dereference.

@@ -2063,12 +1998,14 @@ mask_exists (int size, vec<int> vec)
     | end_bb_1
     |
     | bb_2
+   |   cond = some_computation;
     |   if (cond) goto bb_3 else goto bb_4
     | end_bb_2
     |
     | bb_3
     |   cond = some_computation;
-   |   A[i] = cond ? expr : A[i];
+   |   p = cond ? &A[i] : scratch_pad;
+   |   *p = expr;
     |   goto bb_4
     | end_bb_3
     |

I think you want to remove the 'cond = some_computation' from bb_3 rather than 
copy it.

In tree-data-ref.h, data structures and macro defs come (mingled together) 
first, with the declarations of functions defined elsewhere beneath them - 
so the new struct and prototype should follow that layout.


Thanks, Alan


* Re: fix PR46029: reimplement if conversion of loads and stores
  2015-06-12 21:05 fix PR46029: reimplement if conversion of loads and stores Abe Skolnik
  2015-06-22 16:31 ` Alan Lawrence
@ 2015-06-24 16:51 ` Ramana Radhakrishnan
  2015-06-24 17:02   ` Jeff Law
  1 sibling, 1 reply; 17+ messages in thread
From: Ramana Radhakrishnan @ 2015-06-24 16:51 UTC (permalink / raw)
  To: gcc-patches; +Cc: s.pop, a.skolnik



On 12/06/15 21:50, Abe Skolnik wrote:
> Hi everybody!
>
> In the current implementation of if conversion, loads and stores are
> if-converted in a thread-unsafe way:
>
>    * loads were always executed, even when they should not have been.
>      Valid source code could thus misbehave, e.g. by dereferencing
>      null pointers that were OK in the original program because they
>      were never dereferenced.
>
>    * writes were if-converted via load/maybe-modify/store, which
>      renders some code multithreading-unsafe.
>
> This patch reimplements if-conversion of loads and stores in a safe
> way using a scratchpad allocated by the compiler on the stack:
>
>    * loads are done through an indirection, reading either the correct
>      data from the correct source [if the condition is true] or reading
>      from the scratchpad and later ignoring this read result [if the
>      condition is false].
>
>    * writes are also done through an indirection, writing either to the
>      correct destination [if the condition is true] or to the
>      scratchpad [if the condition is false].
>
> Vectorization of "if-cvt-stores-vect-ifcvt-18.c" is disabled because the
> old if-conversion resulted in unsafe code that could fail under
> multithreading even though the as-written code _was_ thread-safe.
>
> Passed regression testing and bootstrap on amd64-linux.
> Is this OK to commit to trunk?

I can't approve or reject, but this certainly looks like an improvement 
compared to where we are, as we get rid of the data races.

The only gotcha I can think of with this approach is that it introduces 
false dependencies that would cause "unnecessary" write-after-write 
hazards with the writes to the scratchpad when you unroll the loop - but 
that's not necessarily worse than where we are today.
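
To illustrate (my sketch, not code from the patch): with the loop
unrolled by two, both copies of a store may select the scratchpad:

   int *p0 = c[i]     ? &A[i]     : scratch_pad;
   int *p1 = c[i + 1] ? &A[i + 1] : scratch_pad;
   *p0 = e0;   /* when both conditions are false, both stores hit  */
   *p1 = e1;   /* scratch_pad - a write-after-write dependency     */

so the two stores can no longer be reordered or issued in parallel.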

Some fun stats from a previous Friday afternoon poke at this without 
doing any benchmarking as such.

In a bootstrap with BOOT_CFLAGS="-O2 -ftree-loop-if-convert-stores" and 
one without it, I see about 12.20% more csel's on an AArch64 bootstrap 
(goes from 7898 -> 8862) vs plain old -O2.

And I did see the one case in libquantum get sorted with this, though 
the performance results were funny, let's say (+5% in one case, -1.5% on 
another core).  I haven't analysed it deeply yet, but it does look 
interesting.

regards
Ramana


>
> Regards,
>
> Abe
>
>
>
>
> 2015-06-12  Sebastian Pop  <s.pop@samsung.com>
>              Abe Skolnik  <a.skolnik@samsung.com>
>
> 	PR tree-optimization/46029
> 	* tree-data-ref.c (struct data_ref_loc_d): Moved...
> 	(get_references_in_stmt): Exported.
> 	* tree-data-ref.h (struct data_ref_loc_d): ... here.
> 	(get_references_in_stmt): Declared.
>
> 	* doc/invoke.texi (-ftree-loop-if-convert-stores): Update description.
> 	* tree-if-conv.c (struct ifc_dr): Removed.
> 	(IFC_DR): Removed.
> 	(DR_WRITTEN_AT_LEAST_ONCE): Removed.
> 	(DR_RW_UNCONDITIONALLY): Removed.
> 	(memrefs_read_or_written_unconditionally): Removed.
> 	(write_memrefs_written_at_least_once): Removed.
> 	(ifcvt_could_trap_p): Does not take refs parameter anymore.
> 	(ifcvt_memrefs_wont_trap): Removed.
> 	(has_non_addressable_refs): New.
> 	(if_convertible_gimple_assign_stmt_p): Call has_non_addressable_refs.
> 	Removed use of refs.
> 	(if_convertible_stmt_p): Removed use of refs.
> 	(if_convertible_gimple_assign_stmt_p): Same.
> 	(if_convertible_loop_p_1): Removed use of refs.  Remove initialization
> 	of dr->aux, DR_WRITTEN_AT_LEAST_ONCE, and DR_RW_UNCONDITIONALLY.
> 	(insert_address_of): New.
> 	(create_scratchpad): New.
> 	(create_indirect_cond_expr): New.
> 	(predicate_mem_writes): Call create_indirect_cond_expr.  Take an extra
> 	parameter for scratch_pad.
> 	(combine_blocks): Same.
> 	(tree_if_conversion): Same.
>
> 	testsuite/
> 	* g++.dg/tree-ssa/ifc-pr46029.C: New.
> 	* gcc.dg/tree-ssa/ifc-5.c: Make it exactly like the FFmpeg kernel.
> 	* gcc.dg/tree-ssa/ifc-8.c: New.
> 	* gcc.dg/tree-ssa/ifc-9.c: New.
> 	* gcc.dg/tree-ssa/ifc-10.c: New.
> 	* gcc.dg/tree-ssa/ifc-11.c: New.
> 	* gcc.dg/tree-ssa/ifc-12.c: New.
> 	* gcc.dg/vect/if-cvt-stores-vect-ifcvt-18.c: Disabled.
> 	* gcc.dg/vect/if-cvt-stores-vect-ifcvt-19.c: New.
> ---
>   gcc/ChangeLog                                      |  28 ++
>   gcc/doc/invoke.texi                                |  18 +-
>   gcc/testsuite/g++.dg/tree-ssa/ifc-pr46029.C        |  76 ++++
>   gcc/testsuite/gcc.dg/tree-ssa/ifc-10.c             |  17 +
>   gcc/testsuite/gcc.dg/tree-ssa/ifc-11.c             |  16 +
>   gcc/testsuite/gcc.dg/tree-ssa/ifc-12.c             |  13 +
>   gcc/testsuite/gcc.dg/tree-ssa/ifc-5.c              |  19 +-
>   gcc/testsuite/gcc.dg/tree-ssa/ifc-8.c              |  29 ++
>   gcc/testsuite/gcc.dg/tree-ssa/ifc-9.c              |  17 +
>   .../gcc.dg/vect/if-cvt-stores-vect-ifcvt-18.c      |  10 +-
>   .../gcc.dg/vect/if-cvt-stores-vect-ifcvt-19.c      |  46 +++
>   gcc/tree-data-ref.c                                |  13 +-
>   gcc/tree-data-ref.h                                |  14 +
>   gcc/tree-if-conv.c                                 | 392 +++++++++------------
>   14 files changed, 460 insertions(+), 248 deletions(-)
>   create mode 100644 gcc/testsuite/g++.dg/tree-ssa/ifc-pr46029.C
>   create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ifc-10.c
>   create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ifc-11.c
>   create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ifc-12.c
>   create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ifc-8.c
>   create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ifc-9.c
>   create mode 100644 gcc/testsuite/gcc.dg/vect/if-cvt-stores-vect-ifcvt-19.c
>
> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
> index 3dec6b1..70af07c 100644
> --- a/gcc/ChangeLog
> +++ b/gcc/ChangeLog
> @@ -1,3 +1,31 @@
> +2015-05-18  Sebastian Pop  <s.pop@samsung.com>
> +
> +	PR tree-optimization/46029
> +	* doc/invoke.texi (-ftree-loop-if-convert-stores): Update description.
> +	* tree-if-conv.c (has_unaligned_memory_refs): New.
> +	(if_convertible_gimple_assign_stmt_p): Call has_unaligned_memory_refs.
> +	(create_scratchpad): New.
> +	(create_indirect_cond_expr): New.
> +	(predicate_mem_writes): Call create_indirect_cond_expr.  Take an extra
> +	parameter for scratch_pad.
> +	(combine_blocks): Same.
> +	(tree_if_conversion): Same.
> +	(main_tree_if_conversion): Pass to tree_if_conversion a pointer to
> +	scratch_pad.
> +	(struct ifc_dr): Removed.
> +	(IFC_DR): Removed.
> +	(DR_WRITTEN_AT_LEAST_ONCE): Removed.
> +	(DR_RW_UNCONDITIONALLY): Removed.
> +	(memrefs_read_or_written_unconditionally): Removed.
> +	(write_memrefs_written_at_least_once): Removed.
> +	(ifcvt_memrefs_wont_trap): Removed.
> +	(ifcvt_could_trap_p): Does not take refs parameter anymore.
> +	(if_convertible_gimple_assign_stmt_p): Same.
> +	(if_convertible_stmt_p): Same.
> +	(if_convertible_loop_p_1): Remove initialization of dr->aux,
> +	DR_WRITTEN_AT_LEAST_ONCE, and DR_RW_UNCONDITIONALLY.
> +	(if_convertible_loop_p): Remove deallocation of the same.
> +
>   2015-05-15  Marc Glisse  <marc.glisse@inria.fr>
>
>   	PR tree-optimization/64454
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 117b5d9..c0e27e8 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -8815,20 +8815,26 @@ if vectorization is enabled.
>   @item -ftree-loop-if-convert-stores
>   @opindex ftree-loop-if-convert-stores
>   Attempt to also if-convert conditional jumps containing memory writes.
> -This transformation can be unsafe for multi-threaded programs as it
> -transforms conditional memory writes into unconditional memory writes.
>   For example,
>   @smallexample
>   for (i = 0; i < N; i++)
>     if (cond)
> -    A[i] = expr;
> +    A[i] = B[i] + 2;
>   @end smallexample
>   is transformed to
>   @smallexample
> -for (i = 0; i < N; i++)
> -  A[i] = cond ? expr : A[i];
> +void *scratchpad = alloca (64);
> +for (i = 0; i < N; i++) @{
> +  a = cond ? &A[i] : scratchpad;
> +  b = cond ? &B[i] : scratchpad;
> +  *a = *b + 2;
> +@}
>   @end smallexample
> -potentially producing data races.
> +The compiler allocates a scratchpad memory on the stack for each
> +function in which the if-conversion of memory stores or reads
> +happened.  This scratchpad memory is used during the part of the
> +computation that is discarded, i.e., when the condition is evaluated
> +to false.
>
>   @item -ftree-loop-distribution
>   @opindex ftree-loop-distribution
> diff --git a/gcc/testsuite/g++.dg/tree-ssa/ifc-pr46029.C b/gcc/testsuite/g++.dg/tree-ssa/ifc-pr46029.C
> new file mode 100644
> index 0000000..2a54bdb
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/tree-ssa/ifc-pr46029.C
> @@ -0,0 +1,76 @@
> +// { dg-do run }
> +/* { dg-options "-O -ftree-loop-if-convert-stores" } */
> +
> +namespace
> +{
> +  struct rb_tree_node_
> +  {
> +    rb_tree_node_ ():m_p_left (0), m_p_parent (0), m_metadata (0)
> +    {
> +    }
> +    unsigned &get_metadata ()
> +    {
> +      return m_metadata;
> +    }
> +    rb_tree_node_ *m_p_left;
> +    rb_tree_node_ *m_p_parent;
> +    unsigned m_metadata;
> +  };
> +
> +  struct bin_search_tree_const_node_it_
> +  {
> +    bin_search_tree_const_node_it_ (rb_tree_node_ * p_nd):m_p_nd (p_nd)
> +    {
> +    }
> +    unsigned &get_metadata ()
> +    {
> +      return m_p_nd->get_metadata ();
> +    }
> +    bin_search_tree_const_node_it_ get_l_child ()
> +    {
> +      return bin_search_tree_const_node_it_ (m_p_nd->m_p_left);
> +    }
> +
> +    rb_tree_node_ *m_p_nd;
> +  };
> +
> +  struct bin_search_tree_no_data_
> +  {
> +    typedef rb_tree_node_ *node_pointer;
> +      bin_search_tree_no_data_ ():m_p_head (new rb_tree_node_ ())
> +    {
> +    }
> +    void insert_imp_empty (int r_value)
> +    {
> +      rb_tree_node_ *p_new_node = new rb_tree_node_ ();
> +      m_p_head->m_p_parent = p_new_node;
> +      p_new_node->m_p_parent = m_p_head;
> +      update_to_top (m_p_head->m_p_parent);
> +    }
> +    void apply_update (bin_search_tree_const_node_it_ nd_it)
> +    {
> +      unsigned
> +	l_max_endpoint
> +	=
> +	(nd_it.get_l_child ().m_p_nd ==
> +	 0) ? 0 : nd_it.get_l_child ().get_metadata ();
> +      nd_it.get_metadata () = l_max_endpoint;
> +    }
> +    void update_to_top (node_pointer p_nd)
> +    {
> +      while (p_nd != m_p_head)
> +	{
> +	  apply_update (p_nd);
> +	  p_nd = p_nd->m_p_parent;
> +	}
> +    }
> +
> +    rb_tree_node_ * m_p_head;
> +  };
> +}
> +
> +int main ()
> +{
> +  bin_search_tree_no_data_ ().insert_imp_empty (0);
> +  return 0;
> +}
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ifc-10.c b/gcc/testsuite/gcc.dg/tree-ssa/ifc-10.c
> new file mode 100644
> index 0000000..3dcc202
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ifc-10.c
> @@ -0,0 +1,17 @@
> +/* { dg-do compile } */
> +/* { dg-options "-c -O3 -ftree-loop-if-convert-stores -fdump-tree-ifcvt-details -fdump-tree-ifcvt-stats" { target *-*-* } } */
> +
> +/* middle-end/PR65947 */
> +
> +int a[32];
> +
> +int main(int argc, char **argv)
> +{
> +  int res = 3;
> +  for (int i = 0; i < 32; i++)
> +    if (a[i]) res = 7;
> +  return res;
> +}
> +
> +/* { dg-final { scan-tree-dump-times "Applying if-conversion" 1 "ifcvt" } } */
> +/* { dg-final { cleanup-tree-dump "ifcvt" } } */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ifc-11.c b/gcc/testsuite/gcc.dg/tree-ssa/ifc-11.c
> new file mode 100644
> index 0000000..2943c04
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ifc-11.c
> @@ -0,0 +1,16 @@
> +/* { dg-do compile } */
> +/* { dg-options "-c -O3 -ftree-loop-if-convert-stores -fdump-tree-ifcvt-details -fdump-tree-ifcvt-stats" { target *-*-* } } */
> +
> +int read(int *r, int *c)
> +{
> +  int res = 42;
> +  int i;
> +  for (i = 0; i < 32; i++)
> +    if (c[i])
> +      res = *r;
> +
> +  return res;
> +}
> +
> +/* { dg-final { scan-tree-dump-times "Applying if-conversion" 1 "ifcvt" } } */
> +/* { dg-final { cleanup-tree-dump "ifcvt" } } */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ifc-12.c b/gcc/testsuite/gcc.dg/tree-ssa/ifc-12.c
> new file mode 100644
> index 0000000..01acf0b
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ifc-12.c
> @@ -0,0 +1,13 @@
> +/* { dg-do compile } */
> +/* { dg-options "-c -O3 -ftree-loop-if-convert-stores -fdump-tree-ifcvt-details -fdump-tree-ifcvt-stats" { target *-*-* } } */
> +
> +void write(int *r, int *c)
> +{
> +  int i;
> +  for (i = 0; i < 32; i++)
> +    if (c[i])
> +      *r = c[i];
> +}
> +
> +/* { dg-final { scan-tree-dump-times "Applying if-conversion" 1 "ifcvt" } } */
> +/* { dg-final { cleanup-tree-dump "ifcvt" } } */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ifc-5.c b/gcc/testsuite/gcc.dg/tree-ssa/ifc-5.c
> index a9cc816..4be2cdb 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/ifc-5.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ifc-5.c
> @@ -1,5 +1,5 @@
>   /* { dg-do compile } */
> -/* { dg-options "-c -O2 -ftree-vectorize -fdump-tree-ifcvt-stats" { target *-*-* } } */
> +/* { dg-options "-c -O2 -ftree-vectorize -ftree-loop-if-convert-stores -fdump-tree-ifcvt-stats" { target *-*-* } } */
>
>   void
>   dct_unquantize_h263_inter_c (short *block, int n, int qscale, int nCoeffs)
> @@ -12,11 +12,18 @@ dct_unquantize_h263_inter_c (short *block, int n, int qscale, int nCoeffs)
>     for (i = 0; i <= nCoeffs; i++)
>       {
>         level = block[i];
> -      if (level < 0)
> -	level = level * qmul - qadd;
> -      else
> -	level = level * qmul + qadd;
> -      block[i] = level;
> +      if (level)
> +        {
> +          if (level < 0)
> +            {
> +              level = level * qmul - qadd;
> +            }
> +          else
> +            {
> +              level = level * qmul + qadd;
> +            }
> +          block[i] = level;
> +        }
>       }
>   }
>
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ifc-8.c b/gcc/testsuite/gcc.dg/tree-ssa/ifc-8.c
> new file mode 100644
> index 0000000..d7cf279
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ifc-8.c
> @@ -0,0 +1,29 @@
> +/* { dg-do compile } */
> +/* { dg-options "-c -O2 -ftree-vectorize" { target *-*-* } } */
> +
> +typedef union tree_node *tree;
> +struct tree_common
> +{
> +  unsigned volatile_flag : 1;
> +  unsigned unsigned_flag : 1;
> +};
> +struct tree_type
> +{
> +  tree next_variant;
> +  tree main_variant;
> +};
> +union tree_node
> +{
> +  struct tree_common common;
> +  struct tree_type type;
> +};
> +void finish_enum (tree enumtype)
> +{
> +  tree tem;
> +  for (tem = ((enumtype)->type.main_variant); tem; tem = ((tem)->type.next_variant))
> +    {
> +      if (tem == enumtype)
> +	continue;
> +      ((tem)->common.unsigned_flag) = ((enumtype)->common.unsigned_flag);
> +    }
> +}
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ifc-9.c b/gcc/testsuite/gcc.dg/tree-ssa/ifc-9.c
> new file mode 100644
> index 0000000..6d82bb4
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ifc-9.c
> @@ -0,0 +1,17 @@
> +/* { dg-do compile } */
> +/* { dg-options "-c -O2 -ftree-vectorize -ftree-loop-if-convert-stores -fdump-tree-ifcvt-stats" { target *-*-* } } */
> +
> +/* middle-end/PR65947 */
> +
> +int a[32];
> +
> +int main(int argc, char **argv)
> +{
> +  int res = 3;
> +  for (int i = 0; i < 32; i++)
> +    if (a[i]) res = 7;
> +  return res;
> +}
> +
> +/* { dg-final { scan-tree-dump-times "Applying if-conversion" 1 "ifcvt" } } */
> +/* { dg-final { cleanup-tree-dump "ifcvt" } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/if-cvt-stores-vect-ifcvt-18.c b/gcc/testsuite/gcc.dg/vect/if-cvt-stores-vect-ifcvt-18.c
> index cdf687a..7bc652b 100644
> --- a/gcc/testsuite/gcc.dg/vect/if-cvt-stores-vect-ifcvt-18.c
> +++ b/gcc/testsuite/gcc.dg/vect/if-cvt-stores-vect-ifcvt-18.c
> @@ -65,5 +65,13 @@ main (void)
>     return 0;
>   }
>
> -/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { xfail { { vect_no_align && { ! vect_hw_misalign } } || { ! vect_strided2 } } } } } */
> +/* "foo()" is not vectorized FOR NOW because gather-load
> +   cannot handle conditional gather loads as of June 2015.
> +
> +   The old way of if-converting loads was unsafe because
> +   it resulted in thread-unsafe code where the as-written code was OK.
> +   The old if conversion did not contain indirection in the loads,
> +   so it was simpler, therefor the vectorizer was able to pick it up.  */
> +
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 0 "vect"  { xfail { { vect_no_align && { ! vect_hw_misalign } } || { ! vect_strided2 } } } } } */
>   /* { dg-final { cleanup-tree-dump "vect" } } */
> diff --git a/gcc/testsuite/gcc.dg/vect/if-cvt-stores-vect-ifcvt-19.c b/gcc/testsuite/gcc.dg/vect/if-cvt-stores-vect-ifcvt-19.c
> new file mode 100644
> index 0000000..6331d5a
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/vect/if-cvt-stores-vect-ifcvt-19.c
> @@ -0,0 +1,46 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_int } */
> +
> +#include <stdarg.h>
> +#include "tree-vect.h"
> +
> +#define N 50
> +
> +typedef struct {
> +  short a;
> +  short b;
> +} data;
> +
> +data in1[N], in2[N], out[N];
> +
> +__attribute__ ((noinline)) void
> +foo ()
> +{
> +  int i;
> +  short c, d;
> +
> +  for (i = 0; i < N; i++)
> +    {
> +      c = in1[i].b;
> +      d = in2[i].b;
> +
> +      if (c >= d)
> +        {
> +          out[i].b = 11;
> +          out[i].a = d + 5;
> +        }
> +      else
> +        {
> +          out[i].b = d - 12;
> +          out[i].a = d;
> +        }
> +    }
> +}
> +
> +/* It is safe to if-convert and vectorize the above loop because
> +   out[i].a and out[i].b are always written to.
> +   This is the same as "if-cvt-stores-vect-ifcvt-18.c"
> +   except there are no array loads inside the "if" here.  */
> +
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  { xfail { { vect_no_align && { ! vect_hw_misalign } } || { ! vect_strided2 } } } } } */
> +/* { dg-final { cleanup-tree-dump "vect" } } */
> diff --git a/gcc/tree-data-ref.c b/gcc/tree-data-ref.c
> index 3d1d5f8..eb1c559 100644
> --- a/gcc/tree-data-ref.c
> +++ b/gcc/tree-data-ref.c
> @@ -4393,22 +4393,11 @@ compute_all_dependences (vec<data_reference_p> datarefs,
>     return true;
>   }
>
> -/* Describes a location of a memory reference.  */
> -
> -typedef struct data_ref_loc_d
> -{
> -  /* The memory reference.  */
> -  tree ref;
> -
> -  /* True if the memory reference is read.  */
> -  bool is_read;
> -} data_ref_loc;
> -
>
>   /* Stores the locations of memory references in STMT to REFERENCES.  Returns
>      true if STMT clobbers memory, false otherwise.  */
>
> -static bool
> +bool
>   get_references_in_stmt (gimple stmt, vec<data_ref_loc, va_heap> *references)
>   {
>     bool clobbers_memory = false;
> diff --git a/gcc/tree-data-ref.h b/gcc/tree-data-ref.h
> index edb3b56..06a0f1a 100644
> --- a/gcc/tree-data-ref.h
> +++ b/gcc/tree-data-ref.h
> @@ -557,4 +557,18 @@ lambda_matrix_new (int m, int n, struct obstack *lambda_obstack)
>     return mat;
>   }
>
> +/* Describes a location of a memory reference.  */
> +
> +typedef struct data_ref_loc_d
> +{
> +  /* The memory reference.  */
> +  tree ref;
> +
> +  /* True if the memory reference is read.  */
> +  bool is_read;
> +} data_ref_loc;
> +
> +bool
> +get_references_in_stmt (gimple /* stmt */, vec<data_ref_loc, va_heap>* /* references */);
> +
>   #endif  /* GCC_TREE_DATA_REF_H  */
> diff --git a/gcc/tree-if-conv.c b/gcc/tree-if-conv.c
> index 49ff458..1272a01 100644
> --- a/gcc/tree-if-conv.c
> +++ b/gcc/tree-if-conv.c
> @@ -19,7 +19,7 @@ along with GCC; see the file COPYING3.  If not see
>   <http://www.gnu.org/licenses/>.  */
>
>   /* This pass implements a tree level if-conversion of loops.  Its
> -   initial goal is to help the vectorizer to vectorize loops with
> +   initial goal is to help the vectorizer to vectorize loops with
>      conditions.
>
>      A short description of if-conversion:
> @@ -606,188 +606,39 @@ if_convertible_phi_p (struct loop *loop, basic_block bb, gphi *phi,
>     return true;
>   }
>
> -/* Records the status of a data reference.  This struct is attached to
> -   each DR->aux field.  */
> -
> -struct ifc_dr {
> -  /* -1 when not initialized, 0 when false, 1 when true.  */
> -  int written_at_least_once;
> -
> -  /* -1 when not initialized, 0 when false, 1 when true.  */
> -  int rw_unconditionally;
> -};
> -
> -#define IFC_DR(DR) ((struct ifc_dr *) (DR)->aux)
> -#define DR_WRITTEN_AT_LEAST_ONCE(DR) (IFC_DR (DR)->written_at_least_once)
> -#define DR_RW_UNCONDITIONALLY(DR) (IFC_DR (DR)->rw_unconditionally)
> -
> -/* Returns true when the memory references of STMT are read or written
> -   unconditionally.  In other words, this function returns true when
> -   for every data reference A in STMT there exist other accesses to
> -   a data reference with the same base with predicates that add up (OR-up) to
> -   the true predicate: this ensures that the data reference A is touched
> -   (read or written) on every iteration of the if-converted loop.  */
> +/* Wrapper around gimple_could_trap_p refined for the needs of
> +   if-conversion.  */
>
>   static bool
> -memrefs_read_or_written_unconditionally (gimple stmt,
> -					 vec<data_reference_p> drs)
> +ifcvt_could_trap_p (gimple stmt)
>   {
> -  int i, j;
> -  data_reference_p a, b;
> -  tree ca = bb_predicate (gimple_bb (stmt));
> -
> -  for (i = 0; drs.iterate (i, &a); i++)
> -    if (DR_STMT (a) == stmt)
> -      {
> -	bool found = false;
> -	int x = DR_RW_UNCONDITIONALLY (a);
> -
> -	if (x == 0)
> -	  return false;
> -
> -	if (x == 1)
> -	  continue;
> -
> -	for (j = 0; drs.iterate (j, &b); j++)
> -          {
> -            tree ref_base_a = DR_REF (a);
> -            tree ref_base_b = DR_REF (b);
> -
> -            if (DR_STMT (b) == stmt)
> -              continue;
> -
> -            while (TREE_CODE (ref_base_a) == COMPONENT_REF
> -                   || TREE_CODE (ref_base_a) == IMAGPART_EXPR
> -                   || TREE_CODE (ref_base_a) == REALPART_EXPR)
> -              ref_base_a = TREE_OPERAND (ref_base_a, 0);
> -
> -            while (TREE_CODE (ref_base_b) == COMPONENT_REF
> -                   || TREE_CODE (ref_base_b) == IMAGPART_EXPR
> -                   || TREE_CODE (ref_base_b) == REALPART_EXPR)
> -              ref_base_b = TREE_OPERAND (ref_base_b, 0);
> -
> -  	    if (!operand_equal_p (ref_base_a, ref_base_b, 0))
> -	      {
> -	        tree cb = bb_predicate (gimple_bb (DR_STMT (b)));
> -
> -	        if (DR_RW_UNCONDITIONALLY (b) == 1
> -		    || is_true_predicate (cb)
> -		    || is_true_predicate (ca
> -                        = fold_or_predicates (EXPR_LOCATION (cb), ca, cb)))
> -		  {
> -		    DR_RW_UNCONDITIONALLY (a) = 1;
> -  		    DR_RW_UNCONDITIONALLY (b) = 1;
> -		    found = true;
> -		    break;
> -		  }
> -               }
> -	    }
> -
> -	if (!found)
> -	  {
> -	    DR_RW_UNCONDITIONALLY (a) = 0;
> -	    return false;
> -	  }
> -      }
> +  if (gimple_vuse (stmt)
> +      && !gimple_could_trap_p_1 (stmt, false, false))
> +    return false;
>
> -  return true;
> +  return gimple_could_trap_p (stmt);
>   }
>
> -/* Returns true when the memory references of STMT are unconditionally
> -   written.  In other words, this function returns true when for every
> -   data reference A written in STMT, there exist other writes to the
> -   same data reference with predicates that add up (OR-up) to the true
> -   predicate: this ensures that the data reference A is written on
> -   every iteration of the if-converted loop.  */
> -
> +/* Returns true when STMT contains a non-addressable memory reference.  */
>   static bool
> -write_memrefs_written_at_least_once (gimple stmt,
> -				     vec<data_reference_p> drs)
> +has_non_addressable_refs (gimple stmt)
>   {
> -  int i, j;
> -  data_reference_p a, b;
> -  tree ca = bb_predicate (gimple_bb (stmt));
> -
> -  for (i = 0; drs.iterate (i, &a); i++)
> -    if (DR_STMT (a) == stmt
> -	&& DR_IS_WRITE (a))
> -      {
> -	bool found = false;
> -	int x = DR_WRITTEN_AT_LEAST_ONCE (a);
> -
> -	if (x == 0)
> -	  return false;
> -
> -	if (x == 1)
> -	  continue;
> -
> -	for (j = 0; drs.iterate (j, &b); j++)
> -	  if (DR_STMT (b) != stmt
> -	      && DR_IS_WRITE (b)
> -	      && same_data_refs_base_objects (a, b))
> -	    {
> -	      tree cb = bb_predicate (gimple_bb (DR_STMT (b)));
> +  vec<data_ref_loc, va_heap>* refs;
> +  vec_alloc (refs, 3);
>
> -	      if (DR_WRITTEN_AT_LEAST_ONCE (b) == 1
> -		  || is_true_predicate (cb)
> -		  || is_true_predicate (ca = fold_or_predicates (EXPR_LOCATION (cb),
> -								 ca, cb)))
> -		{
> -		  DR_WRITTEN_AT_LEAST_ONCE (a) = 1;
> -		  DR_WRITTEN_AT_LEAST_ONCE (b) = 1;
> -		  found = true;
> -		  break;
> -		}
> -	    }
> +  bool res = get_references_in_stmt (stmt, refs);
> +  unsigned i;
> +  data_ref_loc *ref;
>
> -	if (!found)
> -	  {
> -	    DR_WRITTEN_AT_LEAST_ONCE (a) = 0;
> -	    return false;
> -	  }
> +  FOR_EACH_VEC_ELT (*refs, i, ref)
> +    if (may_be_nonaddressable_p (ref->ref))
> +      {
> +	res = true;
> +	break;
>         }
>
> -  return true;
> -}
> -
> -/* Return true when the memory references of STMT won't trap in the
> -   if-converted code.  There are two things that we have to check for:
> -
> -   - writes to memory occur to writable memory: if-conversion of
> -   memory writes transforms the conditional memory writes into
> -   unconditional writes, i.e. "if (cond) A[i] = foo" is transformed
> -   into "A[i] = cond ? foo : A[i]", and as the write to memory may not
> -   be executed at all in the original code, it may be a readonly
> -   memory.  To check that A is not const-qualified, we check that
> -   there exists at least an unconditional write to A in the current
> -   function.
> -
> -   - reads or writes to memory are valid memory accesses for every
> -   iteration.  To check that the memory accesses are correctly formed
> -   and that we are allowed to read and write in these locations, we
> -   check that the memory accesses to be if-converted occur at every
> -   iteration unconditionally.  */
> -
> -static bool
> -ifcvt_memrefs_wont_trap (gimple stmt, vec<data_reference_p> refs)
> -{
> -  return write_memrefs_written_at_least_once (stmt, refs)
> -    && memrefs_read_or_written_unconditionally (stmt, refs);
> -}
> -
> -/* Wrapper around gimple_could_trap_p refined for the needs of the
> -   if-conversion.  Try to prove that the memory accesses of STMT could
> -   not trap in the innermost loop containing STMT.  */
> -
> -static bool
> -ifcvt_could_trap_p (gimple stmt, vec<data_reference_p> refs)
> -{
> -  if (gimple_vuse (stmt)
> -      && !gimple_could_trap_p_1 (stmt, false, false)
> -      && ifcvt_memrefs_wont_trap (stmt, refs))
> -    return false;
> -
> -  return gimple_could_trap_p (stmt);
> +  vec_free (refs);
> +  return res;
>   }
>
>   /* Return true if STMT could be converted into a masked load or store
> @@ -849,7 +700,6 @@ ifcvt_can_use_mask_load_store (gimple stmt)
>
>   static bool
>   if_convertible_gimple_assign_stmt_p (gimple stmt,
> -				     vec<data_reference_p> refs,
>   				     bool *any_mask_load_store)
>   {
>     tree lhs = gimple_assign_lhs (stmt);
> @@ -883,7 +733,7 @@ if_convertible_gimple_assign_stmt_p (gimple stmt,
>
>     if (flag_tree_loop_if_convert_stores)
>       {
> -      if (ifcvt_could_trap_p (stmt, refs))
> +      if (ifcvt_could_trap_p (stmt))
>   	{
>   	  if (ifcvt_can_use_mask_load_store (stmt))
>   	    {
> @@ -892,9 +742,17 @@ if_convertible_gimple_assign_stmt_p (gimple stmt,
>   	      return true;
>   	    }
>   	  if (dump_file && (dump_flags & TDF_DETAILS))
> -	    fprintf (dump_file, "tree could trap...\n");
> +	    fprintf (dump_file, "tree could trap\n");
>   	  return false;
>   	}
> +
> +      if (has_non_addressable_refs (stmt))
> +	{
> +	  if (dump_file && (dump_flags & TDF_DETAILS))
> +	    fprintf (dump_file, "has non-addressable memory references\n");
> +	  return false;
> +	}
> +
>         return true;
>       }
>
> @@ -942,7 +800,7 @@ if_convertible_gimple_assign_stmt_p (gimple stmt,
>      - it is builtins call.  */
>
>   static bool
> -if_convertible_stmt_p (gimple stmt, vec<data_reference_p> refs,
> +if_convertible_stmt_p (gimple stmt,
>   		       bool *any_mask_load_store)
>   {
>     switch (gimple_code (stmt))
> @@ -953,7 +811,7 @@ if_convertible_stmt_p (gimple stmt, vec<data_reference_p> refs,
>         return true;
>
>       case GIMPLE_ASSIGN:
> -      return if_convertible_gimple_assign_stmt_p (stmt, refs,
> +      return if_convertible_gimple_assign_stmt_p (stmt,
>   						  any_mask_load_store);
>
>       case GIMPLE_CALL:
> @@ -1173,8 +1031,8 @@ get_loop_body_in_if_conv_order (const struct loop *loop)
>      predicate_bbs first allocates the predicates of the basic blocks.
>      These fields are then initialized with the tree expressions
>      representing the predicates under which a basic block is executed
> -   in the LOOP.  As the loop->header is executed at each iteration, it
> -   has the "true" predicate.  Other statements executed under a
> +   in the LOOP.  As the loop->header is executed at each iteration,
> +   it has the "true" predicate.  Other statements executed under a
>      condition are predicated with that condition, for example
>
>      | if (x)
> @@ -1322,17 +1180,7 @@ if_convertible_loop_p_1 (struct loop *loop,
>       }
>
>     if (flag_tree_loop_if_convert_stores)
> -    {
> -      data_reference_p dr;
> -
> -      for (i = 0; refs->iterate (i, &dr); i++)
> -	{
> -	  dr->aux = XNEW (struct ifc_dr);
> -	  DR_WRITTEN_AT_LEAST_ONCE (dr) = -1;
> -	  DR_RW_UNCONDITIONALLY (dr) = -1;
> -	}
> -      predicate_bbs (loop);
> -    }
> +    predicate_bbs (loop);
>
>     for (i = 0; i < loop->num_nodes; i++)
>       {
> @@ -1342,7 +1190,7 @@ if_convertible_loop_p_1 (struct loop *loop,
>         /* Check the if-convertibility of statements in predicated BBs.  */
>         if (!dominated_by_p (CDI_DOMINATORS, loop->latch, bb))
>   	for (itr = gsi_start_bb (bb); !gsi_end_p (itr); gsi_next (&itr))
> -	  if (!if_convertible_stmt_p (gsi_stmt (itr), *refs,
> +	  if (!if_convertible_stmt_p (gsi_stmt (itr),
>   				      any_mask_load_store))
>   	    return false;
>       }
> @@ -1988,6 +1836,82 @@ mask_exists (int size, vec<int> vec)
>     return -1;
>   }
>
> +/* Inserts at position GSI a statement "ADDRESS_OF_AI = &AI;" and
> +   returns the ADDRESS_OF_AI.  */
> +
> +static tree
> +insert_address_of (tree ai, gimple_stmt_iterator *gsi)
> +{
> +  tree addr_expr = build_fold_addr_expr (ai);
> +  tree address_of_ai = create_tmp_reg (TREE_TYPE (addr_expr), "_ifc_");
> +  gimple addr_stmt = gimple_build_assign (address_of_ai, addr_expr);
> +
> +  address_of_ai = make_ssa_name (address_of_ai, addr_stmt);
> +  gimple_assign_set_lhs (addr_stmt, address_of_ai);
> +  SSA_NAME_DEF_STMT (address_of_ai) = addr_stmt;
> +  update_stmt (addr_stmt);
> +  gsi_insert_before (gsi, addr_stmt, GSI_SAME_STMT);
> +
> +  return address_of_ai;
> +}
> +
> +/* Insert at the beginning of the first basic block of the current
> +   function the allocation on the stack of N_BYTES of memory and
> +   return a pointer to this scratchpad memory.  */
> +
> +static tree
> +create_scratchpad (unsigned int n_bytes)
> +{
> +  basic_block bb = single_succ (ENTRY_BLOCK_PTR_FOR_FN (cfun));
> +  gimple_stmt_iterator gsi = gsi_after_labels (bb);
> +  tree x = build_int_cst (integer_type_node, n_bytes - 1);
> +  tree elt_type = char_type_node;
> +  tree array_type = build_array_type (elt_type, build_index_type (x));
> +  tree base = create_tmp_reg (array_type, "scratch_pad");
> +  tree a0 = build4 (ARRAY_REF, elt_type, base, integer_zero_node, NULL_TREE,
> +		    NULL_TREE);
> +
> +  return insert_address_of (a0, &gsi);
> +}
> +
> +/* Returns a memory reference to the pointer defined by the
> +   conditional expression: pointer = cond ? &A[i] : scratch_pad; and
> +   inserts this code at GSI.  */
> +
> +static tree
> +create_indirect_cond_expr (tree ai, tree cond, tree *scratch_pad,
> +			   gimple_stmt_iterator *gsi, bool swap)
> +{
> +  tree cond_expr;
> +  tree pointer;
> +  gimple pointer_stmt;
> +
> +  /* address_of_ai = &A[i];  */
> +  tree address_of_ai = insert_address_of (ai, gsi);
> +
> +  /* Allocate the scratch pad only once per function.  */
> +  if (!*scratch_pad)
> +    *scratch_pad = create_scratchpad (64);
> +
> +  /* pointer = cond ? address_of_ai : scratch_pad;  */
> +  pointer = create_tmp_reg (TREE_TYPE (address_of_ai), "_ifc_");
> +
> +  cond_expr = build3 (COND_EXPR, TREE_TYPE (address_of_ai),
> +		      unshare_expr (cond),
> +		      swap ? *scratch_pad  : address_of_ai,
> +		      swap ? address_of_ai : *scratch_pad);
> +
> +  pointer_stmt = gimple_build_assign (pointer, cond_expr);
> +  pointer = make_ssa_name (pointer, pointer_stmt);
> +  gimple_assign_set_lhs (pointer_stmt, pointer);
> +  SSA_NAME_DEF_STMT (pointer) = pointer_stmt;
> +  update_stmt (pointer_stmt);
> +  gsi_insert_before (gsi, pointer_stmt, GSI_SAME_STMT);
> +
> +  return build2 (MEM_REF, TREE_TYPE (ai), pointer,
> +		 build_int_cst (reference_alias_ptr_type (ai), 0));
> +}
> +
>   /* Predicate each write to memory in LOOP.
>
>      This function transforms control flow constructs containing memory
> @@ -1997,12 +1921,21 @@ mask_exists (int size, vec<int> vec)
>      |   if (cond)
>      |     A[i] = expr;
>
> -   into the following form that does not contain control flow:
> +   into the following form:
>
> -   | for (i = 0; i < N; i++)
> -   |   A[i] = cond ? expr : A[i];
> +   | void *scratch_pad = alloca (MAX_TYPE_SIZE_IN_BYTES);
> +   |
> +   | for (i = 0; i < N; i++) {
> +   |   p = cond ? &A[i] : scratch_pad;
> +   |   *p = expr;
> +   | }
>
> -   The original CFG looks like this:
> +   SCRATCH_PAD is allocated on the stack once per function and is large
> +   enough to contain any scalar load or store.  Values read from or
> +   written to SCRATCH_PAD are never used in the computation.
> +
> +   In more detail, the if-conversion of memory writes works like
> +   this, supposing that the original CFG looks like this:
>
>      | bb_0
>      |   i = 0
> @@ -2052,10 +1985,12 @@ mask_exists (int size, vec<int> vec)
>      |   goto bb_1
>      | end_bb_4
>
> -   predicate_mem_writes is then predicating the memory write as follows:
> +   predicate_mem_writes then allocates SCRATCH_PAD in the basic block
> +   preceding the loop header and predicates the memory write:
>
>      | bb_0
>      |   i = 0
> +   |   scratch_pad = alloca (MAX_TYPE_SIZE_IN_BYTES);
>      | end_bb_0
>      |
>      | bb_1
> @@ -2063,12 +1998,14 @@ mask_exists (int size, vec<int> vec)
>      | end_bb_1
>      |
>      | bb_2
> +   |   cond = some_computation;
>      |   if (cond) goto bb_3 else goto bb_4
>      | end_bb_2
>      |
>      | bb_3
>      |   cond = some_computation;
> -   |   A[i] = cond ? expr : A[i];
> +   |   p = cond ? &A[i] : scratch_pad;
> +   |   *p = expr;
>      |   goto bb_4
>      | end_bb_3
>      |
> @@ -2081,12 +2018,14 @@ mask_exists (int size, vec<int> vec)
>
>      | bb_0
>      |   i = 0
> +   |   scratch_pad = alloca (MAX_TYPE_SIZE_IN_BYTES);
>      |   if (i < N) goto bb_5 else goto bb_1
>      | end_bb_0
>      |
>      | bb_1
>      |   cond = some_computation;
> -   |   A[i] = cond ? expr : A[i];
> +   |   p = cond ? &A[i] : scratch_pad;
> +   |   *p = expr;
>      |   if (i < N) goto bb_5 else goto bb_4
>      | end_bb_1
>      |
> @@ -2096,7 +2035,7 @@ mask_exists (int size, vec<int> vec)
>   */
>
>   static void
> -predicate_mem_writes (loop_p loop)
> +predicate_mem_writes (loop_p loop, tree *scratch_pad)
>   {
>     unsigned int i, orig_loop_num_nodes = loop->num_nodes;
>     auto_vec<int, 1> vect_sizes;
> @@ -2179,24 +2118,29 @@ predicate_mem_writes (loop_p loop)
>   	  }
>   	else if (gimple_vdef (stmt))
>   	  {
> -	    tree lhs = gimple_assign_lhs (stmt);
> -	    tree rhs = gimple_assign_rhs1 (stmt);
> -	    tree type = TREE_TYPE (lhs);
>
> -	    lhs = ifc_temp_var (type, unshare_expr (lhs), &gsi);
> -	    rhs = ifc_temp_var (type, unshare_expr (rhs), &gsi);
> -	    if (swap)
> -	      {
> -		tree tem = lhs;
> -		lhs = rhs;
> -		rhs = tem;
> -	      }
> -	    cond = force_gimple_operand_gsi_1 (&gsi, unshare_expr (cond),
> -					       is_gimple_condexpr, NULL_TREE,
> -					       true, GSI_SAME_STMT);
> -	    rhs = fold_build_cond_expr (type, unshare_expr (cond), rhs, lhs);
> -	    gimple_assign_set_rhs1 (stmt, ifc_temp_var (type, rhs, &gsi));
> +  	    /* A[i] = x;  */
> +	    tree ai = gimple_assign_lhs (stmt);
> +
> +	    /* pointer = cond ? &A[i] : scratch_pad;  */
> +	    tree star_pointer = create_indirect_cond_expr (ai, cond,
> +			  scratch_pad, &gsi, swap);
> +	    /* *pointer = x;  */
> +	    gimple_assign_set_lhs (stmt, star_pointer);
>   	    update_stmt (stmt);
> +
> +	  }
> +	else if (gimple_vuse (stmt))
> +	  {
> +	      /* x = A[i];  */
> +	      tree ai = gimple_assign_rhs1 (stmt);
> +
> +	      /* pointer = cond ? &A[i] : scratch_pad;  */
> +	      tree star_pointer = create_indirect_cond_expr (ai, cond,
> +					     scratch_pad, &gsi, swap);
> +	      /* x = *pointer;  */
> +	      gimple_assign_set_rhs1 (stmt, star_pointer);
> +	      update_stmt (stmt);
>   	  }
>       }
>   }
> @@ -2247,7 +2191,7 @@ remove_conditions_and_labels (loop_p loop)
>      blocks.  Replace PHI nodes with conditional modify expressions.  */
>
>   static void
> -combine_blocks (struct loop *loop, bool any_mask_load_store)
> +combine_blocks (struct loop *loop, bool any_mask_load_store, tree *scratch_pad)
>   {
>     basic_block bb, exit_bb, merge_target_bb;
>     unsigned int orig_loop_num_nodes = loop->num_nodes;
> @@ -2261,7 +2205,7 @@ combine_blocks (struct loop *loop, bool any_mask_load_store)
>     predicate_all_scalar_phis (loop);
>
>     if (flag_tree_loop_if_convert_stores || any_mask_load_store)
> -    predicate_mem_writes (loop);
> +    predicate_mem_writes (loop, scratch_pad);
>
>     /* Merge basic blocks: first remove all the edges in the loop,
>        except for those from the exit block.  */
> @@ -2523,7 +2467,7 @@ ifcvt_walk_pattern_tree (tree var, vec<gimple> *defuse_list,
>     return;
>   }
>
> -/* Returns true if STMT can be a root of bool pattern apllied
> +/* Returns true if STMT can be a root of bool pattern applied
>      by vectorizer.  */
>
>   static bool
> @@ -2553,7 +2497,7 @@ stmt_is_root_of_bool_pattern (gimple stmt)
>     return false;
>   }
>
> -/*  Traverse all statements in BB which correspondent to loop header to
> +/*  Traverse all statements in BB which correspond to loop header to
>       find out all statements which can start bool pattern applied by
>       vectorizer and convert multiple uses in it to conform pattern
>       restrictions.  Such case can occur if the same predicate is used both
> @@ -2634,7 +2578,7 @@ ifcvt_local_dce (basic_block bb)
>         gimple_set_plf (phi, GF_PLF_2, true);
>         worklist.safe_push (phi);
>       }
> -  /* Consider load/store statemnts, CALL and COND as live.  */
> +  /* Consider load/store statements, CALL and COND as live.  */
>     for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
>       {
>         stmt = gsi_stmt (gsi);
> @@ -2716,13 +2660,13 @@ ifcvt_local_dce (basic_block bb)
>      changed.  */
>
>   static unsigned int
> -tree_if_conversion (struct loop *loop)
> +tree_if_conversion (struct loop *loop, tree *scratch_pad)
>   {
>     unsigned int todo = 0;
>     ifc_bbs = NULL;
>     bool any_mask_load_store = false;
>
> -  /* Set-up aggressive if-conversion for loops marked with simd pragma.  */
> +  /* Set up aggressive if-conversion for loops marked with simd pragma.  */
>     aggressive_if_conv = loop->force_vectorize;
>     /* Check either outer loop was marked with simd pragma.  */
>     if (!aggressive_if_conv)
> @@ -2751,7 +2695,7 @@ tree_if_conversion (struct loop *loop)
>     /* Now all statements are if-convertible.  Combine all the basic
>        blocks into one huge basic block doing the if-conversion
>        on-the-fly.  */
> -  combine_blocks (loop, any_mask_load_store);
> +  combine_blocks (loop, any_mask_load_store, scratch_pad);
>
>     /* Delete dead predicate computations and repair tree correspondent
>        to bool pattern to delete multiple uses of preidcates.  */
> @@ -2832,12 +2776,14 @@ pass_if_conversion::execute (function *fun)
>     if (number_of_loops (fun) <= 1)
>       return 0;
>
> +  tree scratch_pad = NULL_TREE;
> +
>     FOR_EACH_LOOP (loop, 0)
>       if (flag_tree_loop_if_convert == 1
>   	|| flag_tree_loop_if_convert_stores == 1
>   	|| ((flag_tree_loop_vectorize || loop->force_vectorize)
>   	    && !loop->dont_vectorize))
> -      todo |= tree_if_conversion (loop);
> +      todo |= tree_if_conversion (loop, &scratch_pad);
>
>   #ifdef ENABLE_CHECKING
>     {
>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: fix PR46029: reimplement if conversion of loads and stores
  2015-06-24 16:51 ` Ramana Radhakrishnan
@ 2015-06-24 17:02   ` Jeff Law
  0 siblings, 0 replies; 17+ messages in thread
From: Jeff Law @ 2015-06-24 17:02 UTC (permalink / raw)
  To: Ramana Radhakrishnan, gcc-patches; +Cc: s.pop, a.skolnik

On 06/24/2015 10:50 AM, Ramana Radhakrishnan wrote:
>
>
> On 12/06/15 21:50, Abe Skolnik wrote:
>> Hi everybody!
>>
>> In the current implementation of if conversion, loads and stores are
>> if-converted in a thread-unsafe way:
>>
>>    * loads were always executed, even when they should have not been.
>>      Some source code could be rendered invalid due to null pointers
>>      that were OK in the original program because they were never
>>      dereferenced.
>>
>>    * writes were if-converted via load/maybe-modify/store, which
>>      renders some code multithreading-unsafe.
>>
>> This patch reimplements if-conversion of loads and stores in a safe
>> way using a scratchpad allocated by the compiler on the stack:
>>
>>    * loads are done through an indirection, reading either the correct
>>      data from the correct source [if the condition is true] or reading
>>      from the scratchpad and later ignoring this read result [if the
>>      condition is false].
>>
>>    * writes are also done through an indirection, writing either to the
>>      correct destination [if the condition is true] or to the
>>      scratchpad [if the condition is false].
>>
>> Vectorization of "if-cvt-stores-vect-ifcvt-18.c" disabled because the
>> old if-conversion resulted in unsafe code that could fail under
>> multithreading even though the as-written code _was_ thread-safe.
>>
>> Passed regression testing and bootstrap on amd64-linux.
>> Is this OK to commit to trunk?
>
> I can't approve or reject but this certainly looks like an improvement
> compared to where we are as we get rid of the data races.
Right.  I was going to assume the primary purpose is to address 
correctness issues, not increase the amount of if-conversion for 
optimization purposes.

I have a couple of high level concerns around the scratchpad usage 
(aliasing, write-write hazards), but until I dig into the patch I don't 
know if they're real issues or not.


Jeff

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: fix PR46029: reimplement if conversion of loads and stores
  2015-06-22 16:31 ` Alan Lawrence
@ 2015-06-24 17:11   ` Jeff Law
  2015-06-25  9:48     ` Richard Biener
  0 siblings, 1 reply; 17+ messages in thread
From: Jeff Law @ 2015-06-24 17:11 UTC (permalink / raw)
  To: Alan Lawrence, Abe Skolnik; +Cc: gcc-patches, richard.guenther, sebpop

On 06/22/2015 10:27 AM, Alan Lawrence wrote:

>
> My main thought concerns the direction we are travelling here. A major
> reason why we do if-conversion is to enable vectorization. Is this is
> targetted at gathering/scattering loads? Following vectorization,
> different elements of the vector being loaded/stored may have to go
> to/from the scratchpad or to/from main memory.
>
> Or, are we aiming at the case where the predicate or address are
> invariant? That seems unlikely - loop unswitching would be better for
> the predicate; loading from an address, we'd just peel and hoist;
> storing, this'd result in the address holding the last value written, at
> exit from the loop, a curious idiom. Where the predicate/address is
> invariant across the vector? (!)
>
> Or, are we aiming at non-vectorized code?
I think we're aiming at correctness issues, particularly WRT not 
allowing the optimizers to introduce new data races for C11/C++11.

>
>
> Re. creation of scratchpads:
>     (1) Should the '64' byte size be the result of scanning the
> function, for the largest data size to which we store? (ideally,
> conditionally store!)
I suspect most functions have conditional stores, but far fewer have 
conditional stores that we'd like to if-convert.  ISTM that if we can 
lazily allocate the scratchpad that'd be best.   If this were an RTL 
pass, then I'd say query the backend for the widest mode store insn and 
use that to size the scratchpad.  We may have something similar we can 
do in gimple without resorting to querying insn backend capabilities. 
Perhaps walking the in-scope addressable variables or somesuch.
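
Something like this untested sketch of that gimple-side walk, say 
(over-approximating by looking at every local, not just the 
conditionally-stored ones):

  unsigned i;
  tree decl;
  unsigned HOST_WIDE_INT max_size = 0;

  /* Widest local decl as an upper bound for the scratchpad size.  */
  FOR_EACH_LOCAL_DECL (cfun, i, decl)
    {
      tree size = TYPE_SIZE_UNIT (TREE_TYPE (decl));
      if (size && tree_fits_uhwi_p (size))
        max_size = MAX (max_size, tree_to_uhwi (size));
    }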


>     (2) Allocating only once per function: if we had one scratchpad per
> loop, it could/would live inside the test of "gimple_build_call_internal
> (IFN_LOOP_VECTORIZED, ...". Otherwise, if we if-convert one or more
> loops in the function, but then fail to vectorize them, we'll leave the
> scratchpad around for later phases to clean up. Is that OK?
If the scratchpad is local to a function, then I'd expect we'd clean it 
up just like any other unused local.  Once it's a global, then all bets 
are off.

Anyway, I probably should just look at the patch before making more 
comments.

jeff

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: fix PR46029: reimplement if conversion of loads and stores
  2015-06-24 17:11   ` Jeff Law
@ 2015-06-25  9:48     ` Richard Biener
  2015-06-25 14:28       ` Sebastian Pop
  2015-07-06 20:46       ` Jeff Law
  0 siblings, 2 replies; 17+ messages in thread
From: Richard Biener @ 2015-06-25  9:48 UTC (permalink / raw)
  To: Jeff Law; +Cc: Alan Lawrence, Abe Skolnik, gcc-patches, sebpop

On Wed, Jun 24, 2015 at 7:10 PM, Jeff Law <law@redhat.com> wrote:
> On 06/22/2015 10:27 AM, Alan Lawrence wrote:
>
>>
>> My main thought concerns the direction we are travelling here. A major
>> reason why we do if-conversion is to enable vectorization. Is this is
>> targetted at gathering/scattering loads? Following vectorization,
>> different elements of the vector being loaded/stored may have to go
>> to/from the scratchpad or to/from main memory.
>>
>> Or, are we aiming at the case where the predicate or address are
>> invariant? That seems unlikely - loop unswitching would be better for
>> the predicate; loading from an address, we'd just peel and hoist;
>> storing, this'd result in the address holding the last value written, at
>> exit from the loop, a curious idiom. Where the predicate/address is
>> invariant across the vector? (!)
>>
>> Or, are we aiming at non-vectorized code?
>
> I think we're aiming at correctness issues, particularly WRT not allowing
> the optimizers to introduce new data races for C11/C++11.

Yes, and if you just look at scalar code then with the rewrite we can
enable store if-conversion unconditionally.

_But_

when the new scheme triggers, vectorization cannot succeed on the
result as we get

  if (cond)
    *p = val;

if-converted to

  tem = cond ? p : &scratch;
  *tem = val;

and

   if (cond)
     val = *p;

if-converted to

  scratch = val;
  tem = cond ? p : &scratch;
  val = *tem;

and thus the stores and loads appear as scatter/gather ones to
the vectorizer (if-conversion could directly generate masked
load/stores of course and not use a scratch-pad at all in that case).
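
(Schematically the masked forms would be roughly

  mask = cond;
  MASK_STORE (p, align, mask, val);
  val = MASK_LOAD (p, align, mask);

a sketch of the IR shape only, not exact dump syntax.)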

So the question is whether we get more non-vectorized if-converted
code out of this (and thus whether we want to use
--param allow-store-data-races to get the old code back which is
nicer to less capable CPUs and probably faster than using
scatter/gather or masked loads/stores).  Note that I think scatter
support is still not implemented in the vectorizer.  I also wonder
if we should really care about load data races (not sure your patch
does).
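
(I.e. compiling with something like "-O3 --param allow-store-data-races=1",
just sketching the intended use of the param, to re-enable the old-style
unconditional stores.)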

I didn't look at the patch in detail yet - please address Alan's comments
and re-post an updated patch.

In general I like the idea.

Thanks,
Richard.

>>
>>
>> Re. creation of scratchpads:
>>     (1) Should the '64' byte size be the result of scanning the
>> function, for the largest data size to which we store? (ideally,
>> conditionally store!)
>
> I suspect most functions have conditional stores, but far fewer have
> conditional stores that we'd like to if-convert.  ISTM that if we can lazily
> allocate the scratchpad that'd be best.   If this were an RTL pass, then I'd
> say query the backend for the widest mode store insn and use that to size
> the scratchpad.  We may have something similar we can do in gimple without
> resorting to querying insn backend capabilities. Perhaps walking the in-scope
> addressable variables or somesuch.
>
>
>>     (2) Allocating only once per function: if we had one scratchpad per
>> loop, it could/would live inside the test of "gimple_build_call_internal
>> (IFN_LOOP_VECTORIZED, ...". Otherwise, if we if-convert one or more
>> loops in the function, but then fail to vectorize them, we'll leave the
>> scratchpad around for later phases to clean up. Is that OK?
>
> If the scratchpad is local to a function, then I'd expect we'd clean it up
> just like any other unused local.  Once it's a global, then all bets are
> off.
>
> Anyway, I probably should just look at the patch before making more
> comments.
>
> jeff
>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: fix PR46029: reimplement if conversion of loads and stores
  2015-06-25  9:48     ` Richard Biener
@ 2015-06-25 14:28       ` Sebastian Pop
  2015-06-26 12:35         ` Alan Lawrence
  2015-07-06 20:46       ` Jeff Law
  1 sibling, 1 reply; 17+ messages in thread
From: Sebastian Pop @ 2015-06-25 14:28 UTC (permalink / raw)
  To: Richard Biener; +Cc: Jeff Law, Alan Lawrence, Abe Skolnik, gcc-patches

On Thu, Jun 25, 2015 at 4:43 AM, Richard Biener
<richard.guenther@gmail.com> wrote:
> when the new scheme triggers, vectorization cannot succeed on the
> result as we get
>
>   if (cond)
>     *p = val;
>
> if-converted to
>
>   tem = cond ? p : &scratch;
>   *tem = val;

That's correct.

>
> and
>
>    if (cond)
>      val = *p;
>
> if-converted to
>
>   scratch = val;
>   tem = cond ? p : &scratch;
>   val = *tem;

The patch does this slightly differently:

   tem = cond ? p : &scratch;
   val = cond ? *tem : val;

I think I like your version better as it has only one cond_expr.

>
> and thus the stores and loads appear as scatter/gather ones to
> the vectorizer (if-conversion could directly generate masked
> load/stores of course and not use a scratch-pad at all in that case).
>
> So the question is whether we get more non-vectorized if-converted
> code out of this

This is the case.

> (and thus whether we want to use
> --param allow-store-data-races to get the old code back which is
> nicer to less capable CPUs and probably faster than using
> scatter/gather or masked loads/stores).

A flag to allow load and store data-races is an interesting suggestion.

Abe also suggested continuing to optimize the other way in cases
where we know we write to or load from the same location on all branches:

if (c)
  A[i] = ...
else
  A[i] = ...

> scatter support is still not implemented in the vectorizer.

Correct.

> I also wonder if we should really care about load data races
> (not sure your patch does).

The patch does both loads and stores.

> I didn't look at the patch in detail yet - please address Alan's comments
> and re-post an updated patch.
>
> In general I like the idea.

Thanks for your review,
Sebastian

>
> Thanks,
> Richard.
>
>>>
>>>
>>> Re. creation of scratchpads:
>>>     (1) Should the '64' byte size be the result of scanning the
>>> function, for the largest data size to which we store? (ideally,
>>> conditionally store!)
>>
>> I suspect most functions have conditional stores, but far fewer have
>> conditional stores that we'd like to if-convert.  ISTM that if we can lazily
>> allocate the scratchpad that'd be best.   If this were an RTL pass, then I'd
>> say query the backend for the widest mode store insn and use that to size
>> the scratchpad.  We may have something similar we can do in gimple without
>> resorting to querying insn backend capabilities. Perhaps walking the in-scope
>> addressable variables or somesuch.
>>
>>
>>>     (2) Allocating only once per function: if we had one scratchpad per
>>> loop, it could/would live inside the test of "gimple_build_call_internal
>>> (IFN_LOOP_VECTORIZED, ...". Otherwise, if we if-convert one or more
>>> loops in the function, but then fail to vectorize them, we'll leave the
>>> scratchpad around for later phases to clean up. Is that OK?
>>
>> If the scratchpad is local to a function, then I'd expect we'd clean it up
>> just like any other unused local.  Once it's a global, then all bets are
>> off.
>>
>> Anyway, I probably should just look at the patch before making more
>> comments.
>>
>> jeff
>>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: fix PR46029: reimplement if conversion of loads and stores
  2015-06-25 14:28       ` Sebastian Pop
@ 2015-06-26 12:35         ` Alan Lawrence
  2015-06-26 15:10           ` Richard Biener
  0 siblings, 1 reply; 17+ messages in thread
From: Alan Lawrence @ 2015-06-26 12:35 UTC (permalink / raw)
  To: Sebastian Pop; +Cc: Richard Biener, Jeff Law, Abe Skolnik, gcc-patches

Sebastian Pop wrote:
> On Thu, Jun 25, 2015 at 4:43 AM, Richard Biener
> <richard.guenther@gmail.com> wrote:
>> when the new scheme triggers, vectorization cannot succeed on the
>> result as we get
>>
>>   if (cond)
>>     *p = val;
>>
>> if-converted to
>>
>>   tem = cond ? p : &scratch;
>>   *tem = val;
> 
> That's correct.
> 
>> and
>>
>>    if (cond)
>>      val = *p;
>>
>> if-converted to
>>
>>   scratch = val;
>>   tem = cond ? p : &scratch;
>>   val = *tem;
> 
> The patch does this slightly differently:
> 
>    tem = cond ? p : &scratch;
>    val = cond ? *tem : val;
> 
> I think I like your version better as it has only one cond_expr.

Another slight concern... by reusing scratchpads, are we at risk of creating 
lots of false dependencies here, that restrict instruction scheduling / other 
opts later?

>> [...]
>> and thus the stores and loads appear as scatter/gather ones to
>> the vectorizer (if-conversion could directly generate masked
>> load/stores of course and not use a scratch-pad at all in that case).

Thank you Richard for much better expressing what I was thinking here too :)

> Abe also suggested continuing to optimize the other way in cases
> where we know we write to or load from the same location on all branches:
> 
> if (c)
>   A[i] = ...
> else
>   A[i] = ...

The store here really should be commoned, yes.

(Related but different?: BZ 56625.)


I might add, I find this code much easier to follow than the old (removed)
code about data references, memrefs_read_or_written_unconditionally, and
write_memrefs_written_at_least_once...

Thanks, Alan

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: fix PR46029: reimplement if conversion of loads and stores
  2015-06-26 12:35         ` Alan Lawrence
@ 2015-06-26 15:10           ` Richard Biener
  2015-06-26 21:29             ` Jeff Law
  0 siblings, 1 reply; 17+ messages in thread
From: Richard Biener @ 2015-06-26 15:10 UTC (permalink / raw)
  To: Alan Lawrence, Sebastian Pop; +Cc: Jeff Law, Abe Skolnik, gcc-patches

On June 26, 2015 2:21:22 PM GMT+02:00, Alan Lawrence <alan.lawrence@arm.com> wrote:
>Sebastian Pop wrote:
>> On Thu, Jun 25, 2015 at 4:43 AM, Richard Biener
>> <richard.guenther@gmail.com> wrote:
>>> when the new scheme triggers, vectorization cannot succeed on the
>>> result as we get
>>>
>>>   if (cond)
>>>     *p = val;
>>>
>>> if-converted to
>>>
>>>   tem = cond ? p : &scratch;
>>>   *tem = val;
>> 
>> That's correct.
>> 
>>> and
>>>
>>>    if (cond)
>>>      val = *p;
>>>
>>> if-converted to
>>>
>>>   scratch = val;
>>>   tem = cond ? p : &scratch;
>>>   val = *tem;
>> 
>> The patch does this slightly differently:
>> 
>>    tem = cond ? p : &scratch;
>>    val = cond ? *tem : val;
>> 
>> I think I like your version better as it has only one cond_expr.
>
>Another slight concern... by reusing scratchpads, are we at risk of
>creating lots of false dependencies here, that restrict instruction
>scheduling / other opts later?

Another possibility is to not share them and make sure to insert clobbers for them when they become dead so later stack slot sharing can merge them again.
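
The clobber would be the usual idiom, sketched here with "scratch_decl"
and "gsi" standing in for whatever the pass has at hand:

  tree clobber = build_constructor (TREE_TYPE (scratch_decl), NULL);
  TREE_THIS_VOLATILE (clobber) = 1;
  gsi_insert_after (&gsi, gimple_build_assign (scratch_decl, clobber),
                    GSI_NEW_STMT);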

Richard.

>>> [...]
>>> and thus the stores and loads appear as scatter/gather ones to
>>> the vectorizer (if-conversion could directly generate masked
>>> load/stores of course and not use a scratch-pad at all in that
>>> case).
>
>Thank you Richard for much better expressing what I was thinking here
>too :)
>
>> Abe also suggested continuing to optimize the other way in cases
>> where we know we write to or load from the same location on all
>> branches:
>> 
>> if (c)
>>   A[i] = ...
>> else
>>   A[i] = ...
>
>The store here really should be commoned, yes.
>
>(Related but different?: BZ 56625.)
>
>
>I might add, I find this code much easier to follow than the old (removed)
>code about data references, memrefs_read_or_written_unconditionally, and
>write_memrefs_written_at_least_once...
>
>Thanks, Alan


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: fix PR46029: reimplement if conversion of loads and stores
  2015-06-26 15:10           ` Richard Biener
@ 2015-06-26 21:29             ` Jeff Law
  0 siblings, 0 replies; 17+ messages in thread
From: Jeff Law @ 2015-06-26 21:29 UTC (permalink / raw)
  To: Richard Biener, Alan Lawrence, Sebastian Pop; +Cc: Abe Skolnik, gcc-patches

On 06/26/2015 08:57 AM, Richard Biener wrote:
>
> Another possibility is to not share them and make sure to insert clobbers for them when they become dead so later stack slot sharing can merge them again.
We've certainly had cases in the past where re-using a stack slot of an
object that has gone out of scope for some other object in a different
scope with a different type was not seen as conflicting by the aliasing
machinery, resulting in incorrect code motions.

Creating distinct scratchpads and letting the existing machinery (which
has been fixed to deal with the issues noted above, I believe) handle the
sharing would make more sense than trying to reimplement the sharing
correctness stuff within if-conversion.

jeff

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: fix PR46029: reimplement if conversion of loads and stores
  2015-06-25  9:48     ` Richard Biener
  2015-06-25 14:28       ` Sebastian Pop
@ 2015-07-06 20:46       ` Jeff Law
  1 sibling, 0 replies; 17+ messages in thread
From: Jeff Law @ 2015-07-06 20:46 UTC (permalink / raw)
  To: Richard Biener; +Cc: Alan Lawrence, Abe Skolnik, gcc-patches, sebpop

On 06/25/2015 03:43 AM, Richard Biener wrote:
>
> Yes, and if you just look at scalar code then with the rewrite we can
> enable store if-conversion unconditionally.
>
> _But_
>
> when the new scheme triggers, vectorization cannot succeed on the
> result as we get
>
>    if (cond)
>      *p = val;
>
> if-converted to
>
>    tem = cond ? p : &scratch;
>    *tem = val;
>
> and
>
>     if (cond)
>       val = *p;
>
> if-converted to
>
>    scratch = val;
>    tem = cond ? p : &scratch;
>    val = *tem;
>
> and thus the stores and loads appear as scatter/gather ones to
> the vectorizer (if-conversion could directly generate masked
> load/stores of course and not use a scratch-pad at all in that case).
Right.  But are we correctly handling these cases in the current if 
conversion code?  If we're currently mis-handling them, then Abe's 
changes would seem like a step forward from a correctness standpoint, 
even if they take us a step backwards from a performance standpoint.

>
> So the question is whether we get more non-vectorized if-converted
> code out of this (and thus whether we want to use
> --param allow-store-data-races to get the old code back which is
> nicer to less capable CPUs and probably faster than using
> scatter/gather or masked loads/stores).
I do think conditionalizing some of this on the allow-store-data-races 
makes sense.

> Note that I think scatter
> support is still not implemented in the vectorizer.  I also wonder
> if we should really care about load data races (not sure your patch
> does).
I thought the consensus was that load data races weren't important from 
a correctness standpoint.  Torvald would be the man to know the answer 
on that one (triegel@redhat.com).

> In general I like the idea.
Similarly.  Just trying to find the time to actually look at it is 
proving hard :(

jeff

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: fix PR46029: reimplement if conversion of loads and stores
  2015-07-08  9:56 ` Richard Biener
@ 2015-07-08 16:27   ` Abe
  0 siblings, 0 replies; 17+ messages in thread
From: Abe @ 2015-07-08 16:27 UTC (permalink / raw)
  To: Richard Biener
  Cc: Jeff Law, Alan Lawrence, sebpop@gmail.com >> Sebastian Pop,
	gcc-patches

>>> (if-conversion could directly generate masked load/stores
>>>   of course and not use a scratch-pad at all in that case).

[Abe wrote:]
>> IMO that`s a great idea, but I don`t know how to do it.
>> Hints would be welcome.  In particular, how does one
 >> "generate masked load/stores" at the GIMPLE level?

[Richard Biener wrote:]
> It already does that, see predicate_mem_writes.
> You should definitely preserve that path

Thanks.  Yes, we have not intentionally disabled that.
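
(For the archives: the masked path there boils down to building internal
calls, roughly like this from-memory sketch in which addr, ptr and mask
are whatever the pass has already computed:

  gimple new_stmt
    = gimple_build_call_internal (IFN_MASK_LOAD, 3, addr, ptr, mask);
  gimple_call_set_lhs (new_stmt, lhs);

with IFN_MASK_STORE taking the value to store as a fourth argument.)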


> On what hardware did you test?

AMD64 arch., Intel implementation.  Nothing fancy AFAIK in the flags to make it super-specific,
e.g. "-march=nocona" or "-march=native".  Occasionally using AVX-2 flags as specified by some test cases.

Regards,

Abe

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: fix PR46029: reimplement if conversion of loads and stores
  2015-07-08  9:14 ` Alan Lawrence
@ 2015-07-08 15:52   ` Abe
  0 siblings, 0 replies; 17+ messages in thread
From: Abe @ 2015-07-08 15:52 UTC (permalink / raw)
  To: Alan Lawrence
  Cc: Jeff Law, Richard Biener,
	sebpop@gmail.com >> Sebastian Pop, gcc-patches

[Abe wrote:]
>> I believe Sebastian would agree that the new if converter  is safer than the old one
 >> in terms of correctness at the time of running the code being compiled.
[...]
>> For now, we have a few performance regressions
[snip]


[Alan wrote:]
> TBH my two cents would be that a performance-regressed, but correct, compiler, is far better to release,  than a
 > performance-"improved" one that generates unsafe code (e.g. with extra faults in the straightforward single-threaded case!)...

I strongly agree that -- by default -- correctness trumps performance.  The only times it is allowable to reverse that relationship, IMO,
is when the user* of the compiler has explicitly specified flags [e.g. "-ffast-math", "-Ofast"] that tell the compiler that the user*
currently cares more about performance than about {correctness-according-to-spec and/or safety in all conditions including null pointers}.

['*': or Makefiles, build scripts, etc.]

FYI: TTBOMK, the old if converter was not unsafe with default flags or with only "big knobs" like "-O3"; I`m unsure what it did
under "-ffast-math" and "-Ofast", if anything of interest.  The main advantage of the new if converter over the old one is that
the new one is safe in certain situations wherein the old one is unsafe, e.g. the old one may cause the vectorized code to segfault
where the non-if-converted code would have run just fine all the way to program completion with the same inputs.  This additional
safety allows the new converter to be used under more conditions, which in turn allows it to be enabled by default.  We intend
for all the safe if-conversions to be done by default whenever the vectorizer is on.  If there are any unsafe conversions left,
which I`m not sure there are, then we will enable them only when the user* specifies something like "-fif-conversion=allow-unsafe".
The "allows it to be enabled by default" property should help the code that GCC generates under "-O3" w/o any additional
flags to be faster than it currently is, for the relevant targets, *without sacrificing even _one_ _bit_ of correctness*.

Regards,

Abe

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: fix PR46029: reimplement if conversion of loads and stores
  2015-07-07 21:23 Abe
  2015-07-08  9:14 ` Alan Lawrence
@ 2015-07-08  9:56 ` Richard Biener
  2015-07-08 16:27   ` Abe
  1 sibling, 1 reply; 17+ messages in thread
From: Richard Biener @ 2015-07-08  9:56 UTC (permalink / raw)
  To: Abe
  Cc: Jeff Law, Alan Lawrence, sebpop@gmail.com >> Sebastian Pop,
	gcc-patches

On Tue, Jul 7, 2015 at 11:23 PM, Abe <abe_skolnik@yahoo.com> wrote:
>> (if-conversion could directly generate masked load/stores
>>  of course and not use a scratch-pad at all in that case).
>
>
> IMO that`s a great idea, but I don`t know how to do it.  Hints would be
> welcome.  In particular, how does one "generate masked load/stores" at the
> GIMPLE level?

It already does that, see predicate_mem_writes.  You should definitely
preserve that path (I think it does that only when versioning the loop
for if-conversion; non-vectorized loops will then not be if-converted).

>
>> But are we correctly handling these cases in the current if conversion
>> code?
>
>
> I`m uncertain what that is intended to refer to, but I believe Sebastian
> would agree that the new if converter is safer than the old one in terms of
> correctness at the time of running the code being compiled.
>
>
>> Abe's changes would seem like a step forward from a correctness standpoint
>
>
> Not to argue, but as a point of humility: Sebastian did by far the most work
> on this patch.  I just modernized it and helped move it along.  Even _that_
> was done with Sebastian`s help.
>
>
>> even if they take us a step backwards from a performance standpoint.
>
>
> For now, we have a few performance regressions, and so far we have found
> that it`s non-trivial to remove all of those regressions.

On what hardware did you test?

> We may be better off pushing the current patch to trunk and having the
> performance regressions currently introduced by the new if converter be
> fixed by later patches.
> Pushing to trunk gives us excellent "visibility" amongst GCC hackers, so the
> code will get more "eyeballs" than if it lingers in an uncommitted patch or
> in a branch.
> I, for one, would love some help in fixing these performance regressions.
> ;-)
>
> If fixing the performance regressions winds up taking too long, perhaps the
> current imperfect patch could be undone on trunk just before a release is
> tagged,
> and then we`ll push it in again when trunk goes back to being allowed to be
> unstable?  According to my analysis of the data near the end of the page at
> <https://gcc.gnu.org/develop.html>, we have until roughly April of 2016 to
> work on not-yet-perfect patches in trunk.
>
>
>>> So the question is whether we get more non-vectorized if-converted
>>> code out of this (and thus whether we want to use --param
>>> allow-store-data-races to get the old code back which is nicer to less
>>> capable CPUs and probably faster than using scatter/gather or masked
>>> loads/stores).
>
>
>> I do think conditionalizing some of this on the allow-store-data-races
>> makes sense.
>
>
> I think having both the old if-converter and the new one "live on" in GCC is
> nontrivial, but not impossible.  I also don`t think it`s the best long-term
> goal,
> but only a short-term workaround.  In the long run, IMO there should be only
> one if converter, albeit perhaps with tweaking flags [e.g.
> "-fallow-unsafe-if-conversion"].
>
>
>> I also wonder if we should really care about load data races (not sure
>> your patch does).
>
>
> According to a recent long discussion I had with Sebastian, our current
> patch does not have the flaw I was concerned it might have in terms of loads
> because:
>
>   [1] the scratchpad is only being used to if-convert assignments to
> thread-local scalars, never to globals/statics, and because...
>
>   [2] the gimplifier is supposed to detect "the address of this scalar has
> been taken" and when such is detected in the code being compiled,
>       it causes the scalar to no longer look like a scalar in GIMPLE so that
> we are also safe from stale-data problems that could come from
>       corner-case code that takes the address of a thread-local variable and
> gives that address to another thread [which then proceeds to
>       overwrite the value in the supposedly-thread-local scalar that belongs
> to a different thread from the one doing the writing]
>
>
> Regards,
>
> Abe
>
>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: fix PR46029: reimplement if conversion of loads and stores
  2015-07-07 21:23 Abe
@ 2015-07-08  9:14 ` Alan Lawrence
  2015-07-08 15:52   ` Abe
  2015-07-08  9:56 ` Richard Biener
  1 sibling, 1 reply; 17+ messages in thread
From: Alan Lawrence @ 2015-07-08  9:14 UTC (permalink / raw)
  To: Abe
  Cc: Jeff Law, Richard Biener,
	sebpop@gmail.com >> Sebastian Pop, gcc-patches

Abe wrote:

> I`m uncertain what that is intended to refer to, but I believe Sebastian would agree that the new if converter is safer than the old one in terms of correctness at the time of running the code being compiled.
>
>> even if they take us a step backwards from a performance standpoint.
> 
> For now, we have a few performance regressions, and so far we have found that it's non-trivial to remove all of those regressions.
> We may be better off pushing the current patch to trunk and having the performance regressions currently introduced by the new if converter be fixed by later patches.
> Pushing to trunk gives us excellent "visibility" amongst GCC hackers, so the code will get more "eyeballs" than if it lingers in an uncommitted patch or in a branch.
> I, for one, would love some help in fixing these performance regressions. ;-)
> 
> If fixing the performance regressions winds up taking too long, perhaps the current imperfect patch could be undone on trunk just before a release is tagged,
> and then we'll push it in again when trunk goes back to being allowed to be unstable?

TBH my two cents would be that a performance-regressed but correct compiler
is far better to release than a performance-"improved" one that generates
unsafe code (e.g. with extra faults in the straightforward single-threaded case!)...

--Alan

^ permalink raw reply	[flat|nested] 17+ messages in thread

* RE: fix PR46029: reimplement if conversion of loads and stores
@ 2015-07-07 21:23 Abe
  2015-07-08  9:14 ` Alan Lawrence
  2015-07-08  9:56 ` Richard Biener
  0 siblings, 2 replies; 17+ messages in thread
From: Abe @ 2015-07-07 21:23 UTC (permalink / raw)
  To: Jeff Law, Richard Biener, Alan Lawrence,
	sebpop@gmail.com >> Sebastian Pop
  Cc: gcc-patches

> (if-conversion could directly generate masked load/stores
>  of course and not use a scratch-pad at all in that case).

IMO that's a great idea, but I don't know how to do it.  Hints would be welcome.  In particular, how does one "generate masked load/stores" at the GIMPLE level?
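
My best guess, from looking at the IFN_MASK_LOAD / IFN_MASK_STORE path that
already sits behind ifcvt_can_use_mask_load_store, is something with roughly
the shape sketched below.  The "mask" SSA name, the zero alignment constant,
and the exact argument order are assumptions on my part, not something I
have tested:

  /* Sketch: replace a conditional load "lhs = *ref" with a masked
     internal-function call; "mask" is assumed to hold the predicate.  */
  tree ref  = gimple_assign_rhs1 (stmt);
  tree addr = force_gimple_operand_gsi (&gsi, build_fold_addr_expr (ref),
                                        true, NULL_TREE, true, GSI_SAME_STMT);
  tree ptr  = build_int_cst (reference_alias_ptr_type (ref), 0);
  gimple call = gimple_build_call_internal (IFN_MASK_LOAD, 3, addr, ptr, mask);
  gimple_call_set_lhs (call, gimple_assign_lhs (stmt));
  gsi_replace (&gsi, call, true);

  /* A store "*ref = rhs" would take the address from the lhs instead and
     pass the stored value as a fourth argument:  */
  call = gimple_build_call_internal (IFN_MASK_STORE, 4, addr, ptr, mask, rhs);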


> But are we correctly handling these cases in the current if conversion code?

I'm uncertain what that is intended to refer to, but I believe Sebastian would agree that the new if converter is safer than the old one in terms of correctness at the time of running the code being compiled.


> Abe's changes would seem like a step forward from a correctness standpoint

Not to argue, but as a point of humility: Sebastian did by far the most work on this patch.  I just modernized it and helped move it along.  Even _that_ was done with Sebastian's help.


> even if they take us a step backwards from a performance standpoint.

For now, we have a few performance regressions, and so far we have found that it's non-trivial to remove all of those regressions.
We may be better off pushing the current patch to trunk and having the performance regressions currently introduced by the new if converter be fixed by later patches.
Pushing to trunk gives us excellent "visibility" amongst GCC hackers, so the code will get more "eyeballs" than if it lingers in an uncommitted patch or in a branch.
I, for one, would love some help in fixing these performance regressions. ;-)

If fixing the performance regressions winds up taking too long, perhaps the current imperfect patch could be undone on trunk just before a release is tagged,
and then we'll push it in again when trunk goes back to being allowed to be unstable?  According to my analysis of the data near the end of the page at
<https://gcc.gnu.org/develop.html>, we have until roughly April of 2016 to work on not-yet-perfect patches in trunk.


>> So the question is whether we get more non-vectorized if-converted
>> code out of this (and thus whether we want to use --param
>> allow-store-data-races to get the old code back which is nicer to less
>> capable CPUs and probably faster than using scatter/gather or masked loads/stores).

> I do think conditionalizing some of this on the allow-store-data-races makes sense.

I think having both the old if-converter and the new one "live on" in GCC is nontrivial, but not impossible.  I also don't think it's the best long-term goal,
but only a short-term workaround.  In the long run, IMO there should be only one if converter, albeit perhaps with tweaking flags [e.g. "-fallow-unsafe-if-conversion"].
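
[For reference, my understanding of how the existing knob is spelled, if one
wanted the old racy-but-fast stores back once both behaviors coexist; treat
this as a sketch, since I have not re-checked the param machinery, and
"foo.c" is of course a placeholder:

  gcc -O3 -ftree-loop-if-convert-stores --param allow-store-data-races=1 foo.c
]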


> I also wonder if we should really care about load data races (not sure your patch does).

According to a recent long discussion I had with Sebastian, our current patch does not have the flaw I was concerned it might have in terms of loads [a small sketch follows the two points below] because:

   [1] the scratchpad is only being used to if-convert assignments to thread-local scalars, never to globals/statics, and because...

   [2] the gimplifier is supposed to detect "the address of this scalar has been taken" and when such is detected in the code being compiled,
       it causes the scalar to no longer look like a scalar in GIMPLE so that we are also safe from stale-data problems that could come from
       corner-case code that takes the address of a thread-local variable and gives that address to another thread [which then proceeds to
       overwrite the value in the supposedly-thread-local scalar that belongs to a different thread from the one doing the writing]
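
To make that concrete, here is a hedged sketch of the load shape we are
aiming for; "scratch" stands for the stack scratchpad from the patch, and
the variable names are hypothetical:

  /* Before: x is a function-local scalar; p may be unsafe to dereference
     whenever c is false.  */
  if (c)
    x = *p;

  /* After (sketch): p is dereferenced only when c holds.  */
  int *src = c ? p : (int *) &scratch;
  int tmp  = *src;        /* a safe load either way                */
  x = c ? tmp : x;        /* the scratchpad result is thrown away  */

Point [2] above is just the usual gimplifier behavior: as soon as something
like "int *q = &x;" appears, x becomes addressable and stops being a GIMPLE
register, so [as I understand our patch] it is no longer a candidate for
this treatment.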


Regards,

Abe


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: fix PR46029: reimplement if conversion of loads and stores
       [not found]   ` <680180205.479081.1435168382362.JavaMail.yahoo@mail.yahoo.com>
@ 2015-06-30  9:36     ` Alan Lawrence
  0 siblings, 0 replies; 17+ messages in thread
From: Alan Lawrence @ 2015-06-30  9:36 UTC (permalink / raw)
  To: Abe Skolnik; +Cc: gcc-patches, Richard Biener, law

Abe Skolnik wrote:

>> In tree-if-conv.c: […]
>> if it doesn't trap, but has_non_addressable_refs, can't we use
>> ifcvt_can_use_mask_load_store there too?
>> if an access could trap, but is addressable, can't we use the
>> scratchpad technique to get round the trapping problem?
> 
> That's how we deal with loads.  We use the scratchpad in case the condition
> is false.  That way it doesn't matter if the original code was hypothetically
> dereferencing a null pointer in the condition-false section, for example: we
> aren't unconditionally executing both sides exactly as written.  On the false
> side, we load from the scratchpad and then ignore the result of that load.

Ok, so I'm looking at tree-if-conv.c:

@@ -876,32 +726,40 @@ if_convertible_gimple_assign_stmt_p (gimple stmt,
        return false;
      }

    /* tree-into-ssa.c uses GF_PLF_1, so avoid it, because
       in between if_convertible_loop_p and combine_blocks
       we can perform loop versioning.  */
    gimple_set_plf (stmt, GF_PLF_2, false);

    if (flag_tree_loop_if_convert_stores)
      {
-      if (ifcvt_could_trap_p (stmt, refs))
+      if (ifcvt_could_trap_p (stmt))
         {
           if (ifcvt_can_use_mask_load_store (stmt))
             {
               gimple_set_plf (stmt, GF_PLF_2, true);
               *any_mask_load_store = true;
               return true;
             }
           if (dump_file && (dump_flags & TDF_DETAILS))
-           fprintf (dump_file, "tree could trap...\n");
+           fprintf (dump_file, "tree could trap\n");
           return false;
         }
+
+      if (has_non_addressable_refs (stmt))
+       {
+         if (dump_file && (dump_flags & TDF_DETAILS))
+           fprintf (dump_file, "has non-addressable memory references\n");
+         return false;
+       }
+
        return true;
      }


I feel I'm being stupid here, but...if ifcvt_could_trap_p &&
!ifcvt_can_use_mask_load_store, we return false here.  In particular, we
return false even in the case ifcvt_could_trap_p &&
!ifcvt_can_use_mask_load_store && !has_non_addressable_refs, i.e. when every
reference is addressable.  Shouldn't we return true in that case, so that the
scratchpad technique is used rather than bailing out?
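
In other words, I would have expected the checks to end up in something like
the following shape.  This is only a sketch of my reading, not something I
have tried to build:

  if (ifcvt_could_trap_p (stmt))
    {
      if (ifcvt_can_use_mask_load_store (stmt))
        {
          gimple_set_plf (stmt, GF_PLF_2, true);
          *any_mask_load_store = true;
          return true;
        }
      /* Otherwise fall through: a trapping but addressable ref can be
         redirected at the scratchpad instead of bailing out.  */
    }

  if (has_non_addressable_refs (stmt))
    {
      if (dump_file && (dump_flags & TDF_DETAILS))
        fprintf (dump_file, "has non-addressable memory references\n");
      return false;
    }

  return true;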

Thanks, Alan

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2015-07-08 16:27 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-06-12 21:05 fix PR46029: reimplement if conversion of loads and stores Abe Skolnik
2015-06-22 16:31 ` Alan Lawrence
2015-06-24 17:11   ` Jeff Law
2015-06-25  9:48     ` Richard Biener
2015-06-25 14:28       ` Sebastian Pop
2015-06-26 12:35         ` Alan Lawrence
2015-06-26 15:10           ` Richard Biener
2015-06-26 21:29             ` Jeff Law
2015-07-06 20:46       ` Jeff Law
2015-06-24 16:51 ` Ramana Radhakrishnan
2015-06-24 17:02   ` Jeff Law
     [not found] <001301d0ae99$0f015800$2d040800$@samsung.com>
     [not found] ` <682387955.453848.1435166632115.JavaMail.yahoo@mail.yahoo.com>
     [not found]   ` <680180205.479081.1435168382362.JavaMail.yahoo@mail.yahoo.com>
2015-06-30  9:36     ` Alan Lawrence
2015-07-07 21:23 Abe
2015-07-08  9:14 ` Alan Lawrence
2015-07-08 15:52   ` Abe
2015-07-08  9:56 ` Richard Biener
2015-07-08 16:27   ` Abe

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).