public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [0/8] Improve vector alias checks for WAR and WAW dependencies
@ 2019-11-11 18:46 Richard Sandiford
  2019-11-11 18:47 ` [1/8] Move canonicalisation of dr_with_seg_len_pair_ts Richard Sandiford
                   ` (7 more replies)
  0 siblings, 8 replies; 19+ messages in thread
From: Richard Sandiford @ 2019-11-11 18:46 UTC (permalink / raw)
  To: gcc-patches

For:

  void
  f1 (int *x, int *y)
  {
    for (int i = 0; i < 32; ++i)
      x[i] += y[i];
  }

we check at runtime whether one vector at x would overlap one vector at y.
But in cases like this, the vector code would handle x <= y just fine,
since any write to address A still happens after any read from address A.
The only problem is if x is ahead of y by less than a vector.

The same is true for two writes:

  void
  f2 (int *x, int *y)
  {
    for (int i = 0; i < 32; ++i)
      {
        x[i] = i;
        y[i] = 2;
      }
  }

If y <= x then a vector write at y after a vector write at x would
have the same net effect as the original scalar writes.

Read-after-write cases like:

  int
  f3 (int *x, int *y)
  {
    int res = 0;
    for (int i = 0; i < 32; ++i)
      {
        x[i] = i;
        res += y[i];
      }
    return res;
  }

can cope with x == y, but otherwise don't allow overlap in either
direction.  Since checking for x == y at runtime would require extra
code, we're probably better off sticking with the current overlap test.

An overlap test is also needed if the scalar or vector accesses covered
by the alias check are mixed together, rather than all statements for
the second access following all statements for the first access.

This patch series tracks whether accesses in an alias check are
well-ordered, and also tracks which combination of RAW, WAR and WAW
dependencies the alias check covers.  It then uses a more relaxed
condition for well-ordered WAR and WAW alias checks.

The most important case this allows is equal addresses/indices for WAR
dependencies.  E.g. more realistic instances of functions like f1 can
support both "constructive" and "destructive" operations, where the
destination pointer is explicitly allowed to be the same as a source
pointer or point to an independent object.

The series probably doesn't help other cases that much in practice.
However, the checks involved are just as simple as (and sometimes
slightly simpler than) the corresponding overlap tests, so there should
be no downside to using the more relaxed tests whenever we can.

The main reason for doing this now is that SVE2 has instructions that
detect RAW and WAR/WAW hazards between two vector pointers.  A follow-on
patch adds support for them.

Each patch tested individually on aarch64-linux-gnu and the series
as a whole on x86_64-linux-gnu.  OK to install?

Richard

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [2/8] Delay swapping data refs in prune_runtime_alias_test_list
  2019-11-11 18:46 [0/8] Improve vector alias checks for WAR and WAW dependencies Richard Sandiford
  2019-11-11 18:47 ` [1/8] Move canonicalisation of dr_with_seg_len_pair_ts Richard Sandiford
@ 2019-11-11 18:47 ` Richard Sandiford
  2019-11-15 11:06   ` Richard Biener
  2019-11-11 18:48 ` [3/8] Add flags to dr_with_seg_len_pair_t Richard Sandiford
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 19+ messages in thread
From: Richard Sandiford @ 2019-11-11 18:47 UTC (permalink / raw)
  To: gcc-patches

prune_runtime_alias_test_list swapped dr_as between two dr_with_seg_len
pairs before finally deciding whether to merge them.  Bailing out later
would therefore leave the pairs in an incorrect state.

IMO a better fix would be to split this out into a subroutine that
produces a temporary dr_with_seg_len on success, rather than changing
an existing one in-place.  It would then be easy to merge both the dr_as
and dr_bs if we wanted to, rather than requiring one of them to be equal.
But here I tried to do something that could be backported if necessary.


2019-11-11  Richard Sandiford  <richard.sandiford@arm.com>

gcc/
	* tree-data-ref.c (prune_runtime_alias_test_list): Delay
	swapping the dr_as based on init values until we've decided
	whether to merge them.

Index: gcc/tree-data-ref.c
===================================================================
--- gcc/tree-data-ref.c	2019-11-11 18:30:43.203244558 +0000
+++ gcc/tree-data-ref.c	2019-11-11 18:30:47.199216669 +0000
@@ -1556,13 +1556,6 @@ prune_runtime_alias_test_list (vec<dr_wi
 	  if (!ordered_p (init_a1, init_a2))
 	    continue;
 
-	  /* Make sure dr_a1 starts left of dr_a2.  */
-	  if (maybe_gt (init_a1, init_a2))
-	    {
-	      std::swap (*dr_a1, *dr_a2);
-	      std::swap (init_a1, init_a2);
-	    }
-
 	  /* Work out what the segment length would be if we did combine
 	     DR_A1 and DR_A2:
 
@@ -1579,7 +1572,10 @@ prune_runtime_alias_test_list (vec<dr_wi
 
 	     The lengths both have sizetype, so the sign is taken from
 	     the step instead.  */
-	  if (!operand_equal_p (dr_a1->seg_len, dr_a2->seg_len, 0))
+	  poly_uint64 new_seg_len = 0;
+	  bool new_seg_len_p = !operand_equal_p (dr_a1->seg_len,
+						 dr_a2->seg_len, 0);
+	  if (new_seg_len_p)
 	    {
 	      poly_uint64 seg_len_a1, seg_len_a2;
 	      if (!poly_int_tree_p (dr_a1->seg_len, &seg_len_a1)
@@ -1597,14 +1593,24 @@ prune_runtime_alias_test_list (vec<dr_wi
 	      int sign_a = tree_int_cst_sgn (indicator_a);
 	      int sign_b = tree_int_cst_sgn (indicator_b);
 
-	      poly_uint64 new_seg_len;
 	      if (sign_a <= 0 && sign_b <= 0)
 		new_seg_len = lower_bound (seg_len_a1, seg_len_a2);
 	      else if (sign_a >= 0 && sign_b >= 0)
 		new_seg_len = upper_bound (seg_len_a1, seg_len_a2);
 	      else
 		continue;
+	    }
+	  /* At this point we're committed to merging the refs.  */
 
+	  /* Make sure dr_a1 starts left of dr_a2.  */
+	  if (maybe_gt (init_a1, init_a2))
+	    {
+	      std::swap (*dr_a1, *dr_a2);
+	      std::swap (init_a1, init_a2);
+	    }
+
+	  if (new_seg_len_p)
+	    {
 	      dr_a1->seg_len = build_int_cst (TREE_TYPE (dr_a1->seg_len),
 					      new_seg_len);
 	      dr_a1->align = MIN (dr_a1->align, known_alignment (new_seg_len));

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [1/8] Move canonicalisation of dr_with_seg_len_pair_ts
  2019-11-11 18:46 [0/8] Improve vector alias checks for WAR and WAW dependencies Richard Sandiford
@ 2019-11-11 18:47 ` Richard Sandiford
  2019-11-15 11:01   ` Richard Biener
  2019-11-11 18:47 ` [2/8] Delay swapping data refs in prune_runtime_alias_test_list Richard Sandiford
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 19+ messages in thread
From: Richard Sandiford @ 2019-11-11 18:47 UTC (permalink / raw)
  To: gcc-patches

The two users of tree-data-ref's runtime alias checks both canonicalise
the order of the dr_with_seg_lens in a pair before passing them to
prune_runtime_alias_test_list.  It's more convenient for later patches
if prune_runtime_alias_test_list does that itself.


2019-11-11  Richard Sandiford  <richard.sandiford@arm.com>

gcc/
	* tree-data-ref.c (prune_runtime_alias_test_list): Sort the
	two accesses in each dr_with_seg_len_pair_t before trying to
	combine separate dr_with_seg_len_pair_ts.
	* tree-loop-distribution.c (compute_alias_check_pairs): Don't do
	that here.
	* tree-vect-data-refs.c (vect_prune_runtime_alias_test_list): Likewise.

Index: gcc/tree-data-ref.c
===================================================================
--- gcc/tree-data-ref.c	2019-07-18 09:22:13.893767915 +0100
+++ gcc/tree-data-ref.c	2019-11-11 18:30:43.203244558 +0000
@@ -1487,13 +1487,32 @@ comp_dr_with_seg_len_pair (const void *p
 prune_runtime_alias_test_list (vec<dr_with_seg_len_pair_t> *alias_pairs,
 			       poly_uint64)
 {
+  /* Canonicalize each pair so that the base components are ordered wrt
+     data_ref_compare_tree.  This allows the loop below to merge more
+     cases.  */
+  unsigned int i;
+  dr_with_seg_len_pair_t *alias_pair;
+  FOR_EACH_VEC_ELT (*alias_pairs, i, alias_pair)
+    {
+      data_reference_p dr_a = alias_pair->first.dr;
+      data_reference_p dr_b = alias_pair->second.dr;
+      int comp_res = data_ref_compare_tree (DR_BASE_ADDRESS (dr_a),
+					    DR_BASE_ADDRESS (dr_b));
+      if (comp_res == 0)
+	comp_res = data_ref_compare_tree (DR_OFFSET (dr_a), DR_OFFSET (dr_b));
+      if (comp_res == 0)
+	comp_res = data_ref_compare_tree (DR_INIT (dr_a), DR_INIT (dr_b));
+      if (comp_res > 0)
+	std::swap (alias_pair->first, alias_pair->second);
+    }
+
   /* Sort the collected data ref pairs so that we can scan them once to
      combine all possible aliasing checks.  */
   alias_pairs->qsort (comp_dr_with_seg_len_pair);
 
   /* Scan the sorted dr pairs and check if we can combine alias checks
      of two neighboring dr pairs.  */
-  for (size_t i = 1; i < alias_pairs->length (); ++i)
+  for (i = 1; i < alias_pairs->length (); ++i)
     {
       /* Deal with two ddrs (dr_a1, dr_b1) and (dr_a2, dr_b2).  */
       dr_with_seg_len *dr_a1 = &(*alias_pairs)[i-1].first,
Index: gcc/tree-loop-distribution.c
===================================================================
--- gcc/tree-loop-distribution.c	2019-07-10 19:41:20.539944929 +0100
+++ gcc/tree-loop-distribution.c	2019-11-11 18:30:43.207244530 +0000
@@ -2457,12 +2457,6 @@ compute_alias_check_pairs (class loop *l
       struct data_reference *dr_a = DDR_A (ddr);
       struct data_reference *dr_b = DDR_B (ddr);
       tree seg_length_a, seg_length_b;
-      int comp_res = data_ref_compare_tree (DR_BASE_ADDRESS (dr_a),
-					    DR_BASE_ADDRESS (dr_b));
-
-      if (comp_res == 0)
-	comp_res = data_ref_compare_tree (DR_OFFSET (dr_a), DR_OFFSET (dr_b));
-      gcc_assert (comp_res != 0);
 
       if (latch_dominated_by_data_ref (loop, dr_a))
 	seg_length_a = data_ref_segment_size (dr_a, niters_plus_one);
@@ -2485,10 +2479,6 @@ compute_alias_check_pairs (class loop *l
 	(dr_with_seg_len (dr_a, seg_length_a, access_size_a, align_a),
 	 dr_with_seg_len (dr_b, seg_length_b, access_size_b, align_b));
 
-      /* Canonicalize pairs by sorting the two DR members.  */
-      if (comp_res > 0)
-	std::swap (dr_with_seg_len_pair.first, dr_with_seg_len_pair.second);
-
       comp_alias_pairs->safe_push (dr_with_seg_len_pair);
     }
 
Index: gcc/tree-vect-data-refs.c
===================================================================
--- gcc/tree-vect-data-refs.c	2019-11-08 16:06:18.948980179 +0000
+++ gcc/tree-vect-data-refs.c	2019-11-11 18:30:43.207244530 +0000
@@ -3478,7 +3478,6 @@ vect_prune_runtime_alias_test_list (loop
   /* First, we collect all data ref pairs for aliasing checks.  */
   FOR_EACH_VEC_ELT (may_alias_ddrs, i, ddr)
     {
-      int comp_res;
       poly_uint64 lower_bound;
       tree segment_length_a, segment_length_b;
       unsigned HOST_WIDE_INT access_size_a, access_size_b;
@@ -3594,14 +3593,11 @@ vect_prune_runtime_alias_test_list (loop
       align_a = vect_vfa_align (dr_info_a);
       align_b = vect_vfa_align (dr_info_b);
 
-      comp_res = data_ref_compare_tree (DR_BASE_ADDRESS (dr_info_a->dr),
-					DR_BASE_ADDRESS (dr_info_b->dr));
-      if (comp_res == 0)
-	comp_res = data_ref_compare_tree (DR_OFFSET (dr_info_a->dr),
-					  DR_OFFSET (dr_info_b->dr));
-
       /* See whether the alias is known at compilation time.  */
-      if (comp_res == 0
+      if (operand_equal_p (DR_BASE_ADDRESS (dr_info_a->dr),
+			   DR_BASE_ADDRESS (dr_info_b->dr), 0)
+	  && operand_equal_p (DR_OFFSET (dr_info_a->dr),
+			      DR_OFFSET (dr_info_b->dr), 0)
 	  && TREE_CODE (DR_STEP (dr_info_a->dr)) == INTEGER_CST
 	  && TREE_CODE (DR_STEP (dr_info_b->dr)) == INTEGER_CST
 	  && poly_int_tree_p (segment_length_a)
@@ -3640,10 +3636,6 @@ vect_prune_runtime_alias_test_list (loop
 	 dr_with_seg_len (dr_info_b->dr, segment_length_b,
 			  access_size_b, align_b));
 
-      /* Canonicalize pairs by sorting the two DR members.  */
-      if (comp_res > 0)
-	std::swap (dr_with_seg_len_pair.first, dr_with_seg_len_pair.second);
-
       comp_alias_ddrs.safe_push (dr_with_seg_len_pair);
     }
 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [3/8] Add flags to dr_with_seg_len_pair_t
  2019-11-11 18:46 [0/8] Improve vector alias checks for WAR and WAW dependencies Richard Sandiford
  2019-11-11 18:47 ` [1/8] Move canonicalisation of dr_with_seg_len_pair_ts Richard Sandiford
  2019-11-11 18:47 ` [2/8] Delay swapping data refs in prune_runtime_alias_test_list Richard Sandiford
@ 2019-11-11 18:48 ` Richard Sandiford
  2019-11-15 11:07   ` Richard Biener
  2019-11-11 18:49 ` [4/8] Record whether a dr_with_seg_len contains mixed steps Richard Sandiford
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 19+ messages in thread
From: Richard Sandiford @ 2019-11-11 18:48 UTC (permalink / raw)
  To: gcc-patches

This patch adds a bunch of flags to dr_with_seg_len_pair_t,
for use by later patches.  The update to tree-loop-distribution.c
is conservatively correct, but might be tweakable later.


2019-11-11  Richard Sandiford  <richard.sandiford@arm.com>

gcc/
	* tree-data-ref.h (DR_ALIAS_RAW, DR_ALIAS_WAR, DR_ALIAS_WAW)
	(DR_ALIAS_ARBITRARY, DR_ALIAS_SWAPPED, DR_ALIAS_UNSWAPPED): New flags.
	(dr_with_seg_len_pair_t::sequencing): New enum.
	(dr_with_seg_len_pair_t::flags): New member variable.
	(dr_with_seg_len_pair_t::dr_with_seg_len_pair_t): Take a sequencing
	parameter and initialize the flags member variable.
	* tree-loop-distribution.c (compute_alias_check_pairs): Update
	call accordingly.
	* tree-vect-data-refs.c (vect_prune_runtime_alias_test_list): Likewise.
	Ensure the two data references in an alias pair are in statement
	order, if there is a defined order.
	* tree-data-ref.c (prune_runtime_alias_test_list): Use
	DR_ALIAS_SWAPPED and DR_ALIAS_UNSWAPPED to record whether we've
	swapped the references in a dr_with_seg_len_pair_t.  OR together
	the flags when merging two dr_with_seg_len_pair_ts.  After merging,
	try to restore the original dr_with_seg_len order, updating the
	flags if that fails.

Index: gcc/tree-data-ref.h
===================================================================
--- gcc/tree-data-ref.h	2019-07-10 19:41:26.383898124 +0100
+++ gcc/tree-data-ref.h	2019-11-11 18:30:50.527193443 +0000
@@ -222,20 +222,107 @@ typedef struct data_reference *data_refe
   unsigned int align;
 };
 
+/* Flags that describe a potential alias between two dr_with_seg_lens.
+   In general, each pair of dr_with_seg_lens represents a composite of
+   multiple access pairs P, so testing flags like DR_IS_READ on the DRs
+   does not give meaningful information.
+
+   DR_ALIAS_RAW:
+	There is a pair in P for which the second reference is a read
+	and the first is a write.
+
+   DR_ALIAS_WAR:
+	There is a pair in P for which the second reference is a write
+	and the first is a read.
+
+   DR_ALIAS_WAW:
+	There is a pair in P for which both references are writes.
+
+   DR_ALIAS_ARBITRARY:
+	Either
+	(a) it isn't possible to classify one pair in P as RAW, WAW or WAR; or
+	(b) there is a pair in P that breaks the ordering assumption below.
+
+	This flag overrides the RAW, WAR and WAW flags above.
+
+   DR_ALIAS_UNSWAPPED:
+   DR_ALIAS_SWAPPED:
+	Temporary flags that indicate whether there is a pair P whose
+	DRs have or haven't been swapped around.
+
+   The ordering assumption mentioned above is that for every pair
+   (DR_A, DR_B) in P:
+
+   (1) The original code accesses n elements for DR_A and n elements for DR_B,
+       interleaved as follows:
+
+	 one access of size DR_A.access_size at DR_A.dr
+	 one access of size DR_B.access_size at DR_B.dr
+	 one access of size DR_A.access_size at DR_A.dr + STEP_A
+	 one access of size DR_B.access_size at DR_B.dr + STEP_B
+	 one access of size DR_A.access_size at DR_A.dr + STEP_A * 2
+	 one access of size DR_B.access_size at DR_B.dr + STEP_B * 2
+	 ...
+
+   (2) The new code accesses the same data in exactly two chunks:
+
+	 one group of accesses spanning |DR_A.seg_len| + DR_A.access_size
+	 one group of accesses spanning |DR_B.seg_len| + DR_B.access_size
+
+   A pair might break this assumption if the DR_A and DR_B accesses
+   in the original or the new code are mingled in some way.  For example,
+   if DR_A.access_size represents the effect of two individual writes
+   to nearby locations, the pair breaks the assumption if those writes
+   occur either side of the access for DR_B.
+
+   Note that DR_ALIAS_ARBITRARY describes whether the ordering assumption
+   fails to hold for any individual pair in P.  If the assumption *does*
+   hold for every pair in P, it doesn't matter whether it holds for the
+   composite pair or not.  In other words, P should represent the complete
+   set of pairs that the composite pair is testing, so only the ordering
+   of two accesses in the same member of P matters.  */
+const unsigned int DR_ALIAS_RAW = 1U << 0;
+const unsigned int DR_ALIAS_WAR = 1U << 1;
+const unsigned int DR_ALIAS_WAW = 1U << 2;
+const unsigned int DR_ALIAS_ARBITRARY = 1U << 3;
+const unsigned int DR_ALIAS_SWAPPED = 1U << 4;
+const unsigned int DR_ALIAS_UNSWAPPED = 1U << 5;
+
 /* This struct contains two dr_with_seg_len objects with aliasing data
    refs.  Two comparisons are generated from them.  */
 
 class dr_with_seg_len_pair_t
 {
 public:
-  dr_with_seg_len_pair_t (const dr_with_seg_len& d1,
-			       const dr_with_seg_len& d2)
-    : first (d1), second (d2) {}
+  /* WELL_ORDERED indicates that the ordering assumption described above
+     DR_ALIAS_ARBITRARY holds.  REORDERED indicates that it doesn't.  */
+  enum sequencing { WELL_ORDERED, REORDERED };
+
+  dr_with_seg_len_pair_t (const dr_with_seg_len &,
+			  const dr_with_seg_len &, sequencing);
 
   dr_with_seg_len first;
   dr_with_seg_len second;
+  unsigned int flags;
 };
 
+inline dr_with_seg_len_pair_t::
+dr_with_seg_len_pair_t (const dr_with_seg_len &d1, const dr_with_seg_len &d2,
+			sequencing seq)
+  : first (d1), second (d2), flags (0)
+{
+  if (DR_IS_READ (d1.dr) && DR_IS_WRITE (d2.dr))
+    flags |= DR_ALIAS_WAR;
+  else if (DR_IS_WRITE (d1.dr) && DR_IS_READ (d2.dr))
+    flags |= DR_ALIAS_RAW;
+  else if (DR_IS_WRITE (d1.dr) && DR_IS_WRITE (d2.dr))
+    flags |= DR_ALIAS_WAW;
+  else
+    gcc_unreachable ();
+  if (seq == REORDERED)
+    flags |= DR_ALIAS_ARBITRARY;
+}
+
 enum data_dependence_direction {
   dir_positive,
   dir_negative,
Index: gcc/tree-loop-distribution.c
===================================================================
--- gcc/tree-loop-distribution.c	2019-11-11 18:30:43.207244530 +0000
+++ gcc/tree-loop-distribution.c	2019-11-11 18:30:50.527193443 +0000
@@ -2477,7 +2477,9 @@ compute_alias_check_pairs (class loop *l
 
       dr_with_seg_len_pair_t dr_with_seg_len_pair
 	(dr_with_seg_len (dr_a, seg_length_a, access_size_a, align_a),
-	 dr_with_seg_len (dr_b, seg_length_b, access_size_b, align_b));
+	 dr_with_seg_len (dr_b, seg_length_b, access_size_b, align_b),
+	 /* ??? Would WELL_ORDERED be safe?  */
+	 dr_with_seg_len_pair_t::REORDERED);
 
       comp_alias_pairs->safe_push (dr_with_seg_len_pair);
     }
Index: gcc/tree-vect-data-refs.c
===================================================================
--- gcc/tree-vect-data-refs.c	2019-11-11 18:30:43.207244530 +0000
+++ gcc/tree-vect-data-refs.c	2019-11-11 18:30:50.531193415 +0000
@@ -3509,10 +3509,13 @@ vect_prune_runtime_alias_test_list (loop
       dr_vec_info *dr_info_b = loop_vinfo->lookup_dr (DDR_B (ddr));
       stmt_vec_info stmt_info_b = dr_info_b->stmt;
 
+      bool preserves_scalar_order_p
+	= vect_preserves_scalar_order_p (dr_info_a, dr_info_b);
+
       /* Skip the pair if inter-iteration dependencies are irrelevant
 	 and intra-iteration dependencies are guaranteed to be honored.  */
       if (ignore_step_p
-	  && (vect_preserves_scalar_order_p (dr_info_a, dr_info_b)
+	  && (preserves_scalar_order_p
 	      || vectorizable_with_step_bound_p (dr_info_a, dr_info_b,
 						 &lower_bound)))
 	{
@@ -3630,11 +3633,21 @@ vect_prune_runtime_alias_test_list (loop
 					   stmt_info_b->stmt);
 	}
 
+      dr_with_seg_len dr_a (dr_info_a->dr, segment_length_a,
+			    access_size_a, align_a);
+      dr_with_seg_len dr_b (dr_info_b->dr, segment_length_b,
+			    access_size_b, align_b);
+      /* Canonicalize the order to be the one that's needed for accurate
+	 RAW, WAR and WAW flags, in cases where the data references are
+	 well-ordered.  The order doesn't really matter otherwise,
+	 but we might as well be consistent.  */
+      if (get_later_stmt (stmt_info_a, stmt_info_b) == stmt_info_a)
+	std::swap (dr_a, dr_b);
+
       dr_with_seg_len_pair_t dr_with_seg_len_pair
-	(dr_with_seg_len (dr_info_a->dr, segment_length_a,
-			  access_size_a, align_a),
-	 dr_with_seg_len (dr_info_b->dr, segment_length_b,
-			  access_size_b, align_b));
+	(dr_a, dr_b, (preserves_scalar_order_p
+		      ? dr_with_seg_len_pair_t::WELL_ORDERED
+		      : dr_with_seg_len_pair_t::REORDERED));
 
       comp_alias_ddrs.safe_push (dr_with_seg_len_pair);
     }
Index: gcc/tree-data-ref.c
===================================================================
--- gcc/tree-data-ref.c	2019-11-11 18:30:47.199216669 +0000
+++ gcc/tree-data-ref.c	2019-11-11 18:30:50.527193443 +0000
@@ -1503,7 +1503,12 @@ prune_runtime_alias_test_list (vec<dr_wi
       if (comp_res == 0)
 	comp_res = data_ref_compare_tree (DR_INIT (dr_a), DR_INIT (dr_b));
       if (comp_res > 0)
-	std::swap (alias_pair->first, alias_pair->second);
+	{
+	  std::swap (alias_pair->first, alias_pair->second);
+	  alias_pair->flags |= DR_ALIAS_SWAPPED;
+	}
+      else
+	alias_pair->flags |= DR_ALIAS_UNSWAPPED;
     }
 
   /* Sort the collected data ref pairs so that we can scan them once to
@@ -1515,10 +1520,13 @@ prune_runtime_alias_test_list (vec<dr_wi
   for (i = 1; i < alias_pairs->length (); ++i)
     {
       /* Deal with two ddrs (dr_a1, dr_b1) and (dr_a2, dr_b2).  */
-      dr_with_seg_len *dr_a1 = &(*alias_pairs)[i-1].first,
-		      *dr_b1 = &(*alias_pairs)[i-1].second,
-		      *dr_a2 = &(*alias_pairs)[i].first,
-		      *dr_b2 = &(*alias_pairs)[i].second;
+      dr_with_seg_len_pair_t *alias_pair1 = &(*alias_pairs)[i - 1];
+      dr_with_seg_len_pair_t *alias_pair2 = &(*alias_pairs)[i];
+
+      dr_with_seg_len *dr_a1 = &alias_pair1->first;
+      dr_with_seg_len *dr_b1 = &alias_pair1->second;
+      dr_with_seg_len *dr_a2 = &alias_pair2->first;
+      dr_with_seg_len *dr_b2 = &alias_pair2->second;
 
       /* Remove duplicate data ref pairs.  */
       if (*dr_a1 == *dr_a2 && *dr_b1 == *dr_b2)
@@ -1527,6 +1535,7 @@ prune_runtime_alias_test_list (vec<dr_wi
 	    dump_printf (MSG_NOTE, "found equal ranges %T, %T and %T, %T\n",
 			 DR_REF (dr_a1->dr), DR_REF (dr_b1->dr),
 			 DR_REF (dr_a2->dr), DR_REF (dr_b2->dr));
+	  alias_pair1->flags |= alias_pair2->flags;
 	  alias_pairs->ordered_remove (i--);
 	  continue;
 	}
@@ -1632,10 +1641,26 @@ prune_runtime_alias_test_list (vec<dr_wi
 	    dump_printf (MSG_NOTE, "merging ranges for %T, %T and %T, %T\n",
 			 DR_REF (dr_a1->dr), DR_REF (dr_b1->dr),
 			 DR_REF (dr_a2->dr), DR_REF (dr_b2->dr));
+	  alias_pair1->flags |= alias_pair2->flags;
 	  alias_pairs->ordered_remove (i);
 	  i--;
 	}
     }
+
+  /* Try to restore the original dr_with_seg_len order within each
+     dr_with_seg_len_pair_t.  If we ended up combining swapped and
+     unswapped pairs into the same check, we have to invalidate any
+     RAW, WAR and WAW information for it.  */
+  FOR_EACH_VEC_ELT (*alias_pairs, i, alias_pair)
+    {
+      unsigned int swap_mask = (DR_ALIAS_SWAPPED | DR_ALIAS_UNSWAPPED);
+      unsigned int swapped = (alias_pair->flags & swap_mask);
+      if (swapped == DR_ALIAS_SWAPPED)
+	std::swap (alias_pair->first, alias_pair->second);
+      else if (swapped != DR_ALIAS_UNSWAPPED)
+	alias_pair->flags |= DR_ALIAS_ARBITRARY;
+      alias_pair->flags &= ~swap_mask;
+    }
 }
 
 /* Given LOOP's two data references and segment lengths described by DR_A

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [4/8] Record whether a dr_with_seg_len contains mixed steps
  2019-11-11 18:46 [0/8] Improve vector alias checks for WAR and WAW dependencies Richard Sandiford
                   ` (2 preceding siblings ...)
  2019-11-11 18:48 ` [3/8] Add flags to dr_with_seg_len_pair_t Richard Sandiford
@ 2019-11-11 18:49 ` Richard Sandiford
  2019-11-15 11:08   ` Richard Biener
  2019-11-11 18:50 ` [5/8] Dump the list of merged alias pairs Richard Sandiford
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 19+ messages in thread
From: Richard Sandiford @ 2019-11-11 18:49 UTC (permalink / raw)
  To: gcc-patches

prune_runtime_alias_test_list can merge dr_with_seg_len_pair_ts that
have different steps for the first reference or different steps for the
second reference.  This patch adds a flag to record that.

I don't know whether the change to create_intersect_range_checks_index
fixes anything in practice.  It would have to be a corner case if so,
since at present we only merge two alias pairs if either the first or
the second references are identical and only the other references differ.
And the vectoriser uses VF-based segment lengths only if both references
in a pair have the same step.  Either way, it still seems wrong to use
DR_STEP when it doesn't represent all checks that have been merged into
the pair.


2019-11-11  Richard Sandiford  <richard.sandiford@arm.com>

gcc/
	* tree-data-ref.h (DR_ALIAS_MIXED_STEPS): New flag.
	* tree-data-ref.c (prune_runtime_alias_test_list): Set it when
	merging data references with different steps.
	(create_intersect_range_checks_index): Take a
	dr_with_seg_len_pair_t instead of two dr_with_seg_lens.
	Bail out if DR_ALIAS_MIXED_STEPS is set.
	(create_intersect_range_checks): Take a dr_with_seg_len_pair_t
	instead of two dr_with_seg_lens.  Update call to
	create_intersect_range_checks_index.
	(create_runtime_alias_checks): Update call accordingly.

Index: gcc/tree-data-ref.h
===================================================================
--- gcc/tree-data-ref.h	2019-11-11 18:30:50.527193443 +0000
+++ gcc/tree-data-ref.h	2019-11-11 18:30:53.863170161 +0000
@@ -250,6 +250,12 @@ typedef struct data_reference *data_refe
 	Temporary flags that indicate whether there is a pair P whose
 	DRs have or haven't been swapped around.
 
+   DR_ALIAS_MIXED_STEPS:
+	The DR_STEP for one of the data references in the pair does not
+	accurately describe that reference for all members of P.  (Note
+	that the flag does not say anything about whether the DR_STEPs
+	of the two references in the pair are the same.)
+
    The ordering assumption mentioned above is that for every pair
    (DR_A, DR_B) in P:
 
@@ -287,6 +293,7 @@ const unsigned int DR_ALIAS_WAW = 1U <<
 const unsigned int DR_ALIAS_ARBITRARY = 1U << 3;
 const unsigned int DR_ALIAS_SWAPPED = 1U << 4;
 const unsigned int DR_ALIAS_UNSWAPPED = 1U << 5;
+const unsigned int DR_ALIAS_MIXED_STEPS = 1U << 6;
 
 /* This struct contains two dr_with_seg_len objects with aliasing data
    refs.  Two comparisons are generated from them.  */
Index: gcc/tree-data-ref.c
===================================================================
--- gcc/tree-data-ref.c	2019-11-11 18:30:50.527193443 +0000
+++ gcc/tree-data-ref.c	2019-11-11 18:30:53.863170161 +0000
@@ -1618,6 +1618,11 @@ prune_runtime_alias_test_list (vec<dr_wi
 	      std::swap (init_a1, init_a2);
 	    }
 
+	  /* The DR_Bs are equal, so only the DR_As can introduce
+	     mixed steps.  */
+	  if (!operand_equal_p (DR_STEP (dr_a1->dr), DR_STEP (dr_a2->dr), 0))
+	    alias_pair1->flags |= DR_ALIAS_MIXED_STEPS;
+
 	  if (new_seg_len_p)
 	    {
 	      dr_a1->seg_len = build_int_cst (TREE_TYPE (dr_a1->seg_len),
@@ -1663,11 +1668,14 @@ prune_runtime_alias_test_list (vec<dr_wi
     }
 }
 
-/* Given LOOP's two data references and segment lengths described by DR_A
-   and DR_B, create expression checking if the two addresses ranges intersect
-   with each other based on index of the two addresses.  This can only be
-   done if DR_A and DR_B referring to the same (array) object and the index
-   is the only difference.  For example:
+/* Try to generate a runtime condition that is true if ALIAS_PAIR is
+   free of aliases, using a condition based on index values instead
+   of a condition based on addresses.  Return true on success,
+   storing the condition in *COND_EXPR.
+
+   This can only be done if the two data references in ALIAS_PAIR access
+   the same array object and the index is the only difference.  For example,
+   if the two data references are DR_A and DR_B:
 
                        DR_A                           DR_B
       data-ref         arr[i]                         arr[j]
@@ -1690,10 +1698,12 @@ prune_runtime_alias_test_list (vec<dr_wi
 
 static bool
 create_intersect_range_checks_index (class loop *loop, tree *cond_expr,
-				     const dr_with_seg_len& dr_a,
-				     const dr_with_seg_len& dr_b)
+				     const dr_with_seg_len_pair_t &alias_pair)
 {
-  if (integer_zerop (DR_STEP (dr_a.dr))
+  const dr_with_seg_len &dr_a = alias_pair.first;
+  const dr_with_seg_len &dr_b = alias_pair.second;
+  if ((alias_pair.flags & DR_ALIAS_MIXED_STEPS)
+      || integer_zerop (DR_STEP (dr_a.dr))
       || integer_zerop (DR_STEP (dr_b.dr))
       || DR_NUM_DIMENSIONS (dr_a.dr) != DR_NUM_DIMENSIONS (dr_b.dr))
     return false;
@@ -1915,24 +1925,26 @@ get_segment_min_max (const dr_with_seg_l
   *seg_max_out = fold_build_pointer_plus (addr_base, max_reach);
 }
 
-/* Given two data references and segment lengths described by DR_A and DR_B,
-   create expression checking if the two addresses ranges intersect with
-   each other:
+/* Generate a runtime condition that is true if ALIAS_PAIR is free of aliases,
+   storing the condition in *COND_EXPR.  The fallback is to generate a
+   a test that the two accesses do not overlap:
 
-     ((DR_A_addr_0 + DR_A_segment_length_0) <= DR_B_addr_0)
-     || (DR_B_addr_0 + DER_B_segment_length_0) <= DR_A_addr_0))  */
+     end_a <= start_b || end_b <= start_a.  */
 
 static void
 create_intersect_range_checks (class loop *loop, tree *cond_expr,
-			       const dr_with_seg_len& dr_a,
-			       const dr_with_seg_len& dr_b)
+			       const dr_with_seg_len_pair_t &alias_pair)
 {
+  const dr_with_seg_len& dr_a = alias_pair.first;
+  const dr_with_seg_len& dr_b = alias_pair.second;
   *cond_expr = NULL_TREE;
-  if (create_intersect_range_checks_index (loop, cond_expr, dr_a, dr_b))
+  if (create_intersect_range_checks_index (loop, cond_expr, alias_pair))
     return;
 
   unsigned HOST_WIDE_INT min_align;
   tree_code cmp_code;
+  /* We don't have to check DR_ALIAS_MIXED_STEPS here, since both versions
+     are equivalent.  This is just an optimization heuristic.  */
   if (TREE_CODE (DR_STEP (dr_a.dr)) == INTEGER_CST
       && TREE_CODE (DR_STEP (dr_b.dr)) == INTEGER_CST)
     {
@@ -1989,18 +2001,19 @@ create_runtime_alias_checks (class loop
   tree part_cond_expr;
 
   fold_defer_overflow_warnings ();
-  for (size_t i = 0, s = alias_pairs->length (); i < s; ++i)
+  dr_with_seg_len_pair_t *alias_pair;
+  unsigned int i;
+  FOR_EACH_VEC_ELT (*alias_pairs, i, alias_pair)
     {
-      const dr_with_seg_len& dr_a = (*alias_pairs)[i].first;
-      const dr_with_seg_len& dr_b = (*alias_pairs)[i].second;
-
+      gcc_assert (alias_pair->flags);
       if (dump_enabled_p ())
 	dump_printf (MSG_NOTE,
 		     "create runtime check for data references %T and %T\n",
-		     DR_REF (dr_a.dr), DR_REF (dr_b.dr));
+		     DR_REF (alias_pair->first.dr),
+		     DR_REF (alias_pair->second.dr));
 
       /* Create condition expression for each pair data references.  */
-      create_intersect_range_checks (loop, &part_cond_expr, dr_a, dr_b);
+      create_intersect_range_checks (loop, &part_cond_expr, *alias_pair);
       if (*cond_expr)
 	*cond_expr = fold_build2 (TRUTH_AND_EXPR, boolean_type_node,
 				  *cond_expr, part_cond_expr);

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [5/8] Dump the list of merged alias pairs
  2019-11-11 18:46 [0/8] Improve vector alias checks for WAR and WAW dependencies Richard Sandiford
                   ` (3 preceding siblings ...)
  2019-11-11 18:49 ` [4/8] Record whether a dr_with_seg_len contains mixed steps Richard Sandiford
@ 2019-11-11 18:50 ` Richard Sandiford
  2019-11-15 11:08   ` Richard Biener
  2019-11-11 18:51 ` [6/8] Print the type of alias check in a dump message Richard Sandiford
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 19+ messages in thread
From: Richard Sandiford @ 2019-11-11 18:50 UTC (permalink / raw)
  To: gcc-patches

This patch dumps the final (merged) list of alias pairs.  It also adds:

- WAW and RAW versions of vect-alias-check-8.c
- a "well-ordered" version of vect-alias-check-9.c (i.e. all reads
  before any writes)
- a test with mixed steps in the same alias pair

I also tweaked the test value in vect-alias-check-9.c so that the result
was less likely to be accidentally correct if the alias isn't honoured.


2019-11-11  Richard Sandiford  <richard.sandiford@arm.com>

gcc/
	* tree-data-ref.c (dump_alias_pair): New function.
	(prune_runtime_alias_test_list): Use it to dump each merged alias pair.

gcc/testsuite/
	* gcc.dg/vect/vect-alias-check-8.c: Test for the RAW flag.
	* gcc.dg/vect/vect-alias-check-9.c: Test for the ARBITRARY flag.
	(TEST_VALUE): Use a higher value for early iterations.
	* gcc.dg/vect/vect-alias-check-14.c: New test.
	* gcc.dg/vect/vect-alias-check-15.c: Likewise.
	* gcc.dg/vect/vect-alias-check-16.c: Likewise.
	* gcc.dg/vect/vect-alias-check-17.c: Likewise.

Index: gcc/tree-data-ref.c
===================================================================
--- gcc/tree-data-ref.c	2019-11-11 18:30:53.863170161 +0000
+++ gcc/tree-data-ref.c	2019-11-11 18:30:57.167147102 +0000
@@ -1453,6 +1453,54 @@ comp_dr_with_seg_len_pair (const void *p
   return 0;
 }
 
+/* Dump information about ALIAS_PAIR, indenting each line by INDENT.  */
+
+static void
+dump_alias_pair (dr_with_seg_len_pair_t *alias_pair, const char *indent)
+{
+  dump_printf (MSG_NOTE, "%sreference:      %T vs. %T\n", indent,
+	       DR_REF (alias_pair->first.dr),
+	       DR_REF (alias_pair->second.dr));
+
+  dump_printf (MSG_NOTE, "%ssegment length: %T", indent,
+	       alias_pair->first.seg_len);
+  if (!operand_equal_p (alias_pair->first.seg_len,
+			alias_pair->second.seg_len, 0))
+    dump_printf (MSG_NOTE, " vs. %T", alias_pair->second.seg_len);
+
+  dump_printf (MSG_NOTE, "\n%saccess size:    ", indent);
+  dump_dec (MSG_NOTE, alias_pair->first.access_size);
+  if (maybe_ne (alias_pair->first.access_size, alias_pair->second.access_size))
+    {
+      dump_printf (MSG_NOTE, " vs. ");
+      dump_dec (MSG_NOTE, alias_pair->second.access_size);
+    }
+
+  dump_printf (MSG_NOTE, "\n%salignment:      %d", indent,
+	       alias_pair->first.align);
+  if (alias_pair->first.align != alias_pair->second.align)
+    dump_printf (MSG_NOTE, " vs. %d", alias_pair->second.align);
+
+  dump_printf (MSG_NOTE, "\n%sflags:         ", indent);
+  if (alias_pair->flags & DR_ALIAS_RAW)
+    dump_printf (MSG_NOTE, " RAW");
+  if (alias_pair->flags & DR_ALIAS_WAR)
+    dump_printf (MSG_NOTE, " WAR");
+  if (alias_pair->flags & DR_ALIAS_WAW)
+    dump_printf (MSG_NOTE, " WAW");
+  if (alias_pair->flags & DR_ALIAS_ARBITRARY)
+    dump_printf (MSG_NOTE, " ARBITRARY");
+  if (alias_pair->flags & DR_ALIAS_SWAPPED)
+    dump_printf (MSG_NOTE, " SWAPPED");
+  if (alias_pair->flags & DR_ALIAS_UNSWAPPED)
+    dump_printf (MSG_NOTE, " UNSWAPPED");
+  if (alias_pair->flags & DR_ALIAS_MIXED_STEPS)
+    dump_printf (MSG_NOTE, " MIXED_STEPS");
+  if (alias_pair->flags == 0)
+    dump_printf (MSG_NOTE, " <none>");
+  dump_printf (MSG_NOTE, "\n");
+}
+
 /* Merge alias checks recorded in ALIAS_PAIRS and remove redundant ones.
    FACTOR is number of iterations that each data reference is accessed.
 
@@ -1656,6 +1704,8 @@ prune_runtime_alias_test_list (vec<dr_wi
      dr_with_seg_len_pair_t.  If we ended up combining swapped and
      unswapped pairs into the same check, we have to invalidate any
      RAW, WAR and WAW information for it.  */
+  if (dump_enabled_p ())
+    dump_printf (MSG_NOTE, "merged alias checks:\n");
   FOR_EACH_VEC_ELT (*alias_pairs, i, alias_pair)
     {
       unsigned int swap_mask = (DR_ALIAS_SWAPPED | DR_ALIAS_UNSWAPPED);
@@ -1665,6 +1715,8 @@ prune_runtime_alias_test_list (vec<dr_wi
       else if (swapped != DR_ALIAS_UNSWAPPED)
 	alias_pair->flags |= DR_ALIAS_ARBITRARY;
       alias_pair->flags &= ~swap_mask;
+      if (dump_enabled_p ())
+	dump_alias_pair (alias_pair, "  ");
     }
 }
 
Index: gcc/testsuite/gcc.dg/vect/vect-alias-check-8.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-alias-check-8.c	2019-03-08 18:15:02.280871184 +0000
+++ gcc/testsuite/gcc.dg/vect/vect-alias-check-8.c	2019-11-11 18:30:57.167147102 +0000
@@ -58,3 +58,5 @@ main (void)
   FOR_EACH_TYPE (DO_TEST)
   return 0;
 }
+
+/* { dg-final { scan-tree-dump {flags: *WAR\n} "vect" { target vect_int } } } */
Index: gcc/testsuite/gcc.dg/vect/vect-alias-check-9.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-alias-check-9.c	2019-03-08 18:15:02.244871320 +0000
+++ gcc/testsuite/gcc.dg/vect/vect-alias-check-9.c	2019-11-11 18:30:57.167147102 +0000
@@ -17,7 +17,7 @@ #define FOR_EACH_TYPE(M) \
   M (sll) M (ull) \
   M (float) M (double)
 
-#define TEST_VALUE(I) ((I) * 5 / 2)
+#define TEST_VALUE(I) ((I) * 17 / 2)
 
 #define ADD_TEST(TYPE)				\
   void __attribute__((noinline, noclone))	\
@@ -51,3 +51,5 @@ main (void)
   FOR_EACH_TYPE (DO_TEST)
   return 0;
 }
+
+/* { dg-final { scan-tree-dump {flags: [^\n]*ARBITRARY\n} "vect" { target vect_int } } } */
Index: gcc/testsuite/gcc.dg/vect/vect-alias-check-14.c
===================================================================
--- /dev/null	2019-09-17 11:41:18.176664108 +0100
+++ gcc/testsuite/gcc.dg/vect/vect-alias-check-14.c	2019-11-11 18:30:57.167147102 +0000
@@ -0,0 +1,62 @@
+#define N 200
+#define M 4
+
+typedef signed char sc;
+typedef unsigned char uc;
+typedef signed short ss;
+typedef unsigned short us;
+typedef int si;
+typedef unsigned int ui;
+typedef signed long long sll;
+typedef unsigned long long ull;
+
+#define FOR_EACH_TYPE(M) \
+  M (sc) M (uc) \
+  M (ss) M (us) \
+  M (si) M (ui) \
+  M (sll) M (ull) \
+  M (float) M (double)
+
+#define TEST_VALUE(I) ((I) * 17 / 2)
+
+#define ADD_TEST(TYPE)				\
+  void __attribute__((noinline, noclone))	\
+  test_##TYPE (TYPE *a, TYPE *b)		\
+  {						\
+    for (int i = 0; i < N; i += 2)		\
+      {						\
+	TYPE b0 = b[i + 0];			\
+	TYPE b1 = b[i + 1];			\
+	a[i + 0] = b0 + 2;			\
+	a[i + 1] = b1 + 3;			\
+      }						\
+  }
+
+#define DO_TEST(TYPE)						\
+  for (int j = 0; j < M; ++j)					\
+    {								\
+      TYPE a[N + M];						\
+      for (int i = 0; i < N + M; ++i)				\
+	a[i] = TEST_VALUE (i);					\
+      test_##TYPE (a + j, a);					\
+      for (int i = 0; i < N; i += 2)				\
+	{							\
+	  TYPE base1 = j == 0 ? TEST_VALUE (i) : a[i];		\
+	  TYPE base2 = j <= 1 ? TEST_VALUE (i + 1) : a[i + 1];	\
+	  if (a[i + j] != (TYPE) (base1 + 2)			\
+	      || a[i + j + 1] != (TYPE) (base2 + 3))		\
+	    __builtin_abort ();					\
+	}							\
+    }
+
+FOR_EACH_TYPE (ADD_TEST)
+
+int
+main (void)
+{
+  FOR_EACH_TYPE (DO_TEST)
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump {flags: *WAR\n} "vect" { target vect_int } } } */
+/* { dg-final { scan-tree-dump-not {flags: [^\n]*ARBITRARY\n} "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-alias-check-15.c
===================================================================
--- /dev/null	2019-09-17 11:41:18.176664108 +0100
+++ gcc/testsuite/gcc.dg/vect/vect-alias-check-15.c	2019-11-11 18:30:57.167147102 +0000
@@ -0,0 +1,59 @@
+#define N 200
+#define DIST 32
+
+typedef signed char sc;
+typedef unsigned char uc;
+typedef signed short ss;
+typedef unsigned short us;
+typedef int si;
+typedef unsigned int ui;
+typedef signed long long sll;
+typedef unsigned long long ull;
+
+#define FOR_EACH_TYPE(M) \
+  M (sc) M (uc) \
+  M (ss) M (us) \
+  M (si) M (ui) \
+  M (sll) M (ull) \
+  M (float) M (double)
+
+#define ADD_TEST(TYPE)				\
+  void __attribute__((noinline, noclone))	\
+  test_##TYPE (TYPE *x, TYPE *y)		\
+  {						\
+    for (int i = 0; i < N; ++i)			\
+      {						\
+	x[i] = i;				\
+	y[i] = 42 - i * 2;			\
+      }						\
+  }
+
+#define DO_TEST(TYPE)						\
+  for (int i = 0; i < DIST * 2; ++i)				\
+    {								\
+      TYPE a[N + DIST * 2] = {};				\
+      test_##TYPE (a + DIST, a + i);				\
+      for (int j = 0; j < N + DIST * 2; ++j)			\
+	{							\
+	  TYPE expected = 0;					\
+	  if (i > DIST && j >= i && j < i + N)			\
+	    expected = 42 - (j - i) * 2;			\
+	  if (j >= DIST && j < DIST + N)			\
+	    expected = j - DIST;				\
+	  if (i <= DIST && j >= i && j < i + N)			\
+	    expected = 42 - (j - i) * 2;			\
+	  if (expected != a[j])					\
+	    __builtin_abort ();					\
+	}							\
+    }
+
+FOR_EACH_TYPE (ADD_TEST)
+
+int
+main (void)
+{
+  FOR_EACH_TYPE (DO_TEST)
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump {flags: *WAW\n} "vect" { target vect_int } } } */
Index: gcc/testsuite/gcc.dg/vect/vect-alias-check-16.c
===================================================================
--- /dev/null	2019-09-17 11:41:18.176664108 +0100
+++ gcc/testsuite/gcc.dg/vect/vect-alias-check-16.c	2019-11-11 18:30:57.167147102 +0000
@@ -0,0 +1,64 @@
+#define N 200
+#define DIST 32
+
+typedef signed char sc;
+typedef unsigned char uc;
+typedef signed short ss;
+typedef unsigned short us;
+typedef int si;
+typedef unsigned int ui;
+typedef signed long long sll;
+typedef unsigned long long ull;
+
+#define FOR_EACH_TYPE(M) \
+  M (sc) M (uc) \
+  M (ss) M (us) \
+  M (si) M (ui) \
+  M (sll) M (ull) \
+  M (float) M (double)
+
+#define TEST_VALUE(I) ((I) * 13 / 2)
+
+#define ADD_TEST(TYPE)				\
+  TYPE __attribute__((noinline, noclone))	\
+  test_##TYPE (TYPE *x, TYPE *y)		\
+  {						\
+    TYPE res = 0;				\
+    for (int i = 0; i < N; ++i)			\
+      {						\
+	x[i] = i;				\
+	res += y[i];				\
+      }						\
+    return res;					\
+  }
+
+#define DO_TEST(TYPE)						\
+  for (int i = 0; i < DIST * 2; ++i)				\
+    {								\
+      TYPE a[N + DIST * 2];					\
+      for (int j = 0; j < N + DIST * 2; ++j)			\
+	a[j] = TEST_VALUE (j);					\
+      TYPE res = test_##TYPE (a + DIST, a + i);			\
+      for (int j = 0; j < N; ++j)				\
+	if (a[j + DIST] != (TYPE) j)				\
+	  __builtin_abort ();					\
+      TYPE expected_res = 0;					\
+      for (int j = i; j < i + N; ++j)				\
+	if (i <= DIST && j >= DIST && j < DIST + N)		\
+	  expected_res += j - DIST;				\
+	else							\
+	  expected_res += TEST_VALUE (j);			\
+      if (expected_res != res)					\
+	__builtin_abort ();					\
+    }
+
+FOR_EACH_TYPE (ADD_TEST)
+
+int
+main (void)
+{
+  FOR_EACH_TYPE (DO_TEST)
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump {flags: *RAW\n} "vect" { target vect_int } } } */
Index: gcc/testsuite/gcc.dg/vect/vect-alias-check-17.c
===================================================================
--- /dev/null	2019-09-17 11:41:18.176664108 +0100
+++ gcc/testsuite/gcc.dg/vect/vect-alias-check-17.c	2019-11-11 18:30:57.167147102 +0000
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target vect_load_lanes } */
+
+struct s { int x[100]; };
+
+void
+f (struct s *s1, int a, int b)
+{
+  for (int i = 0; i < 32; ++i)
+    s1->x[a + i] = s1->x[b + i * 2] + s1->x[b + i * 3];
+}
+
+/* { dg-final { scan-tree-dump {flags: *[^\n]*MIXED_STEPS} "vect" } } */

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [6/8] Print the type of alias check in a dump message
  2019-11-11 18:46 [0/8] Improve vector alias checks for WAR and WAW dependencies Richard Sandiford
                   ` (4 preceding siblings ...)
  2019-11-11 18:50 ` [5/8] Dump the list of merged alias pairs Richard Sandiford
@ 2019-11-11 18:51 ` Richard Sandiford
  2019-11-15 11:09   ` Richard Biener
  2019-11-11 18:52 ` [7/8] Use a single comparison for index-based alias checks Richard Sandiford
  2019-11-11 19:02 ` [8/8] Optimise WAR and WAW " Richard Sandiford
  7 siblings, 1 reply; 19+ messages in thread
From: Richard Sandiford @ 2019-11-11 18:51 UTC (permalink / raw)
  To: gcc-patches

This patch prints a message to say how an alias check is being
implemented.


2019-11-11  Richard Sandiford  <richard.sandiford@arm.com>

gcc/
	* tree-data-ref.c (create_intersect_range_checks_index)
	(create_intersect_range_checks): Print dump messages.

gcc/testsuite/
	* gcc.dg/vect/vect-alias-check-1.c: Test for the type of alias check.
	* gcc.dg/vect/vect-alias-check-8.c: Likewise.
	* gcc.dg/vect/vect-alias-check-9.c: Likewise.
	* gcc.dg/vect/vect-alias-check-10.c: Likewise.
	* gcc.dg/vect/vect-alias-check-11.c: Likewise.
	* gcc.dg/vect/vect-alias-check-12.c: Likewise.
	* gcc.dg/vect/vect-alias-check-13.c: Likewise.
	* gcc.dg/vect/vect-alias-check-14.c: Likewise.
	* gcc.dg/vect/vect-alias-check-15.c: Likewise.
	* gcc.dg/vect/vect-alias-check-16.c: Likewise.
	* gcc.dg/vect/vect-alias-check-17.c: Likewise.

Index: gcc/tree-data-ref.c
===================================================================
--- gcc/tree-data-ref.c	2019-11-11 18:30:57.167147102 +0000
+++ gcc/tree-data-ref.c	2019-11-11 18:31:01.263118514 +0000
@@ -1890,6 +1890,8 @@ create_intersect_range_checks_index (cla
       else
 	*cond_expr = part_cond_expr;
     }
+  if (dump_enabled_p ())
+    dump_printf (MSG_NOTE, "using an index-based overlap test\n");
   return true;
 }
 
@@ -2037,6 +2039,8 @@ create_intersect_range_checks (class loo
     = fold_build2 (TRUTH_OR_EXPR, boolean_type_node,
 	fold_build2 (cmp_code, boolean_type_node, seg_a_max, seg_b_min),
 	fold_build2 (cmp_code, boolean_type_node, seg_b_max, seg_a_min));
+  if (dump_enabled_p ())
+    dump_printf (MSG_NOTE, "using an address-based overlap test\n");
 }
 
 /* Create a conditional expression that represents the run-time checks for
Index: gcc/testsuite/gcc.dg/vect/vect-alias-check-1.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-alias-check-1.c	2019-03-08 18:15:02.304871094 +0000
+++ gcc/testsuite/gcc.dg/vect/vect-alias-check-1.c	2019-11-11 18:31:01.247118627 +0000
@@ -15,3 +15,5 @@ fn1 ()
 }
 
 /* { dg-final { scan-tree-dump "improved number of alias checks from \[0-9\]* to 1" "vect" } } */
+/* { dg-final { scan-tree-dump "using an address-based overlap test" "vect" } } */
+/* { dg-final { scan-tree-dump-not "using an index-based" "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-alias-check-8.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-alias-check-8.c	2019-11-11 18:30:57.167147102 +0000
+++ gcc/testsuite/gcc.dg/vect/vect-alias-check-8.c	2019-11-11 18:31:01.251118599 +0000
@@ -60,3 +60,5 @@ main (void)
 }
 
 /* { dg-final { scan-tree-dump {flags: *WAR\n} "vect" { target vect_int } } } */
+/* { dg-final { scan-tree-dump "using an index-based overlap test" "vect" } } */
+/* { dg-final { scan-tree-dump-not "using an address-based" "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-alias-check-9.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-alias-check-9.c	2019-11-11 18:30:57.167147102 +0000
+++ gcc/testsuite/gcc.dg/vect/vect-alias-check-9.c	2019-11-11 18:31:01.251118599 +0000
@@ -53,3 +53,5 @@ main (void)
 }
 
 /* { dg-final { scan-tree-dump {flags: [^\n]*ARBITRARY\n} "vect" { target vect_int } } } */
+/* { dg-final { scan-tree-dump "using an address-based overlap test" "vect" } } */
+/* { dg-final { scan-tree-dump-not "using an index-based" "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-alias-check-10.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-alias-check-10.c	2019-03-08 18:15:02.248871307 +0000
+++ gcc/testsuite/gcc.dg/vect/vect-alias-check-10.c	2019-11-11 18:31:01.251118599 +0000
@@ -65,3 +65,6 @@ main (void)
   FOR_EACH_TYPE (DO_TEST)
   return 0;
 }
+
+/* { dg-final { scan-tree-dump-not "using an address-based" "vect" } } */
+/* { dg-final { scan-tree-dump-not "using an index-based" "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-alias-check-11.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-alias-check-11.c	2019-03-08 18:15:02.292871138 +0000
+++ gcc/testsuite/gcc.dg/vect/vect-alias-check-11.c	2019-11-11 18:31:01.251118599 +0000
@@ -95,3 +95,6 @@ main (void)
 /* { dg-final { scan-tree-dump {no alias between [^\n]* when [^\n]* step[^ ]* \* 8[)]* is outside \(-24, 24\)} "vect" { target vect_double } } } */
 /* { dg-final { scan-tree-dump {no alias between [^\n]* when [^\n]* step[^ ]* \* 8[)]* is outside \(-32, 32\)} "vect" { target vect_double } } } */
 /* { dg-final { scan-tree-dump {run-time check [^\n]* abs \([^*]* \* 8[)]* >= 32} "vect" { target vect_double } } } */
+
+/* { dg-final { scan-tree-dump-not "using an address-based" "vect" } } */
+/* { dg-final { scan-tree-dump-not "using an index-based" "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-alias-check-12.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-alias-check-12.c	2019-03-08 18:15:02.252871290 +0000
+++ gcc/testsuite/gcc.dg/vect/vect-alias-check-12.c	2019-11-11 18:31:01.251118599 +0000
@@ -95,3 +95,6 @@ main (void)
 /* { dg-final { scan-tree-dump {no alias between [^\n]* when [^\n]* [_a-z][^ ]* \* 8[)]* is outside \[0, 24\)} "vect" { target vect_double } } } */
 /* { dg-final { scan-tree-dump {no alias between [^\n]* when [^\n]* [_a-z][^ ]* \* 8[)]* is outside \[0, 32\)} "vect" { target vect_double } } } */
 /* { dg-final { scan-tree-dump {run-time check [^\n]* unsigned \([^*]* \* 8[)]* >= 32} "vect" { target vect_double } } } */
+
+/* { dg-final { scan-tree-dump-not "using an address-based" "vect" } } */
+/* { dg-final { scan-tree-dump-not "using an index-based" "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-alias-check-13.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-alias-check-13.c	2019-03-08 18:15:02.296871124 +0000
+++ gcc/testsuite/gcc.dg/vect/vect-alias-check-13.c	2019-11-11 18:31:01.251118599 +0000
@@ -18,4 +18,6 @@ f2 (int *x, long step2, int n)
 
 /* { dg-final { scan-tree-dump {need run-time check that [^\n]*step1[^\n]* is nonzero} "vect" } } */
 /* { dg-final { scan-tree-dump-not {need run-time check that [^\n]*step2[^\n]* is nonzero} "vect" } } */
+/* { dg-final { scan-tree-dump-not "using an address-based" "vect" } } */
+/* { dg-final { scan-tree-dump-not "using an index-based" "vect" } } */
 /* { dg-final { scan-tree-dump-times {LOOP VECTORIZED} 2 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-alias-check-14.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-alias-check-14.c	2019-11-11 18:30:57.167147102 +0000
+++ gcc/testsuite/gcc.dg/vect/vect-alias-check-14.c	2019-11-11 18:31:01.251118599 +0000
@@ -60,3 +60,5 @@ main (void)
 
 /* { dg-final { scan-tree-dump {flags: *WAR\n} "vect" { target vect_int } } } */
 /* { dg-final { scan-tree-dump-not {flags: [^\n]*ARBITRARY\n} "vect" } } */
+/* { dg-final { scan-tree-dump "using an address-based overlap test" "vect" } } */
+/* { dg-final { scan-tree-dump-not "using an index-based" "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-alias-check-15.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-alias-check-15.c	2019-11-11 18:30:57.167147102 +0000
+++ gcc/testsuite/gcc.dg/vect/vect-alias-check-15.c	2019-11-11 18:31:01.251118599 +0000
@@ -57,3 +57,5 @@ main (void)
 }
 
 /* { dg-final { scan-tree-dump {flags: *WAW\n} "vect" { target vect_int } } } */
+/* { dg-final { scan-tree-dump "using an address-based overlap test" "vect" } } */
+/* { dg-final { scan-tree-dump-not "using an index-based" "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-alias-check-16.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-alias-check-16.c	2019-11-11 18:30:57.167147102 +0000
+++ gcc/testsuite/gcc.dg/vect/vect-alias-check-16.c	2019-11-11 18:31:01.251118599 +0000
@@ -62,3 +62,5 @@ main (void)
 }
 
 /* { dg-final { scan-tree-dump {flags: *RAW\n} "vect" { target vect_int } } } */
+/* { dg-final { scan-tree-dump "using an address-based overlap test" "vect" } } */
+/* { dg-final { scan-tree-dump-not "using an index-based" "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-alias-check-17.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-alias-check-17.c	2019-11-11 18:30:57.167147102 +0000
+++ gcc/testsuite/gcc.dg/vect/vect-alias-check-17.c	2019-11-11 18:31:01.251118599 +0000
@@ -11,3 +11,5 @@ f (struct s *s1, int a, int b)
 }
 
 /* { dg-final { scan-tree-dump {flags: *[^\n]*MIXED_STEPS} "vect" } } */
+/* { dg-final { scan-tree-dump "using an address-based overlap test" "vect" } } */
+/* { dg-final { scan-tree-dump-not "using an index-based" "vect" } } */

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [7/8] Use a single comparison for index-based alias checks
  2019-11-11 18:46 [0/8] Improve vector alias checks for WAR and WAW dependencies Richard Sandiford
                   ` (5 preceding siblings ...)
  2019-11-11 18:51 ` [6/8] Print the type of alias check in a dump message Richard Sandiford
@ 2019-11-11 18:52 ` Richard Sandiford
  2019-11-15 11:12   ` Richard Biener
  2019-11-11 19:02 ` [8/8] Optimise WAR and WAW " Richard Sandiford
  7 siblings, 1 reply; 19+ messages in thread
From: Richard Sandiford @ 2019-11-11 18:52 UTC (permalink / raw)
  To: gcc-patches

This patch rewrites the index-based alias checks to use conditions
of the form:

  (unsigned T) (a - b + bias) <= limit

E.g. before the patch:

  struct s { int x[100]; };

  void
  f1 (struct s *s1, int a, int b)
  {
    for (int i = 0; i < 32; ++i)
      s1->x[i + a] += s1->x[i + b];
  }

used:

        add     w3, w1, 3
        cmp     w3, w2
        add     w3, w2, 3
        ccmp    w1, w3, 0, ge
        ble     .L2

whereas after the patch it uses:

        sub     w3, w1, w2
        add     w3, w3, 3
        cmp     w3, 6
        bls     .L2

The patch also fixes the seg_len1 and seg_len2 negation for cases in
which seg_len is a "negative unsigned" value narrower than 64 bits,
like it is for 32-bit targets.  Previously we'd end up with values
like 0xffffffff000000001 instead of 1.


2019-11-11  Richard Sandiford  <richard.sandiford@arm.com>

gcc/
	* tree-data-ref.c (create_intersect_range_checks_index): Rewrite
	the index tests to have the form (unsigned T) (B - A + bias) <= limit.

gcc/testsuite/
	* gcc.dg/vect/vect-alias-check-18.c: New test.
	* gcc.dg/vect/vect-alias-check-19.c: Likewise.
	* gcc.dg/vect/vect-alias-check-20.c: Likewise.

Index: gcc/tree-data-ref.c
===================================================================
--- gcc/tree-data-ref.c	2019-11-11 18:31:01.263118514 +0000
+++ gcc/tree-data-ref.c	2019-11-11 18:31:05.099091743 +0000
@@ -1744,7 +1744,9 @@ prune_runtime_alias_test_list (vec<dr_wi
 
    We can create expression based on index rather than address:
 
-     (i_0 + 4 < j_0 || j_0 + 4 < i_0)
+     (unsigned) (i_0 - j_0 + 3) <= 6
+
+   i.e. the indices are less than 4 apart.
 
    Note evolution step of index needs to be considered in comparison.  */
 
@@ -1781,15 +1783,8 @@ create_intersect_range_checks_index (cla
   if (neg_step)
     {
       abs_step = -abs_step;
-      seg_len1 = -seg_len1;
-      seg_len2 = -seg_len2;
-    }
-  else
-    {
-      /* Include the access size in the length, so that we only have one
-	 tree addition below.  */
-      seg_len1 += dr_a.access_size;
-      seg_len2 += dr_b.access_size;
+      seg_len1 = (-wi::to_poly_wide (dr_a.seg_len)).force_uhwi ();
+      seg_len2 = (-wi::to_poly_wide (dr_b.seg_len)).force_uhwi ();
     }
 
   /* Infer the number of iterations with which the memory segment is accessed
@@ -1803,16 +1798,13 @@ create_intersect_range_checks_index (cla
       || !can_div_trunc_p (seg_len2 + abs_step - 1, abs_step, &niter_len2))
     return false;
 
-  poly_uint64 niter_access1 = 0, niter_access2 = 0;
-  if (neg_step)
-    {
-      /* Divide each access size by the byte step, rounding up.  */
-      if (!can_div_trunc_p (dr_a.access_size - abs_step - 1,
-			    abs_step, &niter_access1)
-	  || !can_div_trunc_p (dr_b.access_size + abs_step - 1,
-			       abs_step, &niter_access2))
-	return false;
-    }
+  /* Divide each access size by the byte step, rounding up.  */
+  poly_uint64 niter_access1, niter_access2;
+  if (!can_div_trunc_p (dr_a.access_size + abs_step - 1,
+			abs_step, &niter_access1)
+      || !can_div_trunc_p (dr_b.access_size + abs_step - 1,
+			   abs_step, &niter_access2))
+    return false;
 
   unsigned int i;
   for (i = 0; i < DR_NUM_DIMENSIONS (dr_a.dr); i++)
@@ -1852,38 +1844,87 @@ create_intersect_range_checks_index (cla
 	 index of data reference.  Like segment length, index length is
 	 linear function of the number of iterations with index_step as
 	 the coefficient, i.e, niter_len * idx_step.  */
-      tree idx_len1 = fold_build2 (MULT_EXPR, TREE_TYPE (min1), idx_step,
-				   build_int_cst (TREE_TYPE (min1),
-						  niter_len1));
-      tree idx_len2 = fold_build2 (MULT_EXPR, TREE_TYPE (min2), idx_step,
-				   build_int_cst (TREE_TYPE (min2),
-						  niter_len2));
-      tree max1 = fold_build2 (PLUS_EXPR, TREE_TYPE (min1), min1, idx_len1);
-      tree max2 = fold_build2 (PLUS_EXPR, TREE_TYPE (min2), min2, idx_len2);
-      /* Adjust ranges for negative step.  */
+      offset_int abs_idx_step = offset_int::from (wi::to_wide (idx_step),
+						  SIGNED);
       if (neg_step)
-	{
-	  /* IDX_LEN1 and IDX_LEN2 are negative in this case.  */
-	  std::swap (min1, max1);
-	  std::swap (min2, max2);
-
-	  /* As with the lengths just calculated, we've measured the access
-	     sizes in iterations, so multiply them by the index step.  */
-	  tree idx_access1
-	    = fold_build2 (MULT_EXPR, TREE_TYPE (min1), idx_step,
-			   build_int_cst (TREE_TYPE (min1), niter_access1));
-	  tree idx_access2
-	    = fold_build2 (MULT_EXPR, TREE_TYPE (min2), idx_step,
-			   build_int_cst (TREE_TYPE (min2), niter_access2));
-
-	  /* MINUS_EXPR because the above values are negative.  */
-	  max1 = fold_build2 (MINUS_EXPR, TREE_TYPE (max1), max1, idx_access1);
-	  max2 = fold_build2 (MINUS_EXPR, TREE_TYPE (max2), max2, idx_access2);
-	}
-      tree part_cond_expr
-	= fold_build2 (TRUTH_OR_EXPR, boolean_type_node,
-	    fold_build2 (LE_EXPR, boolean_type_node, max1, min2),
-	    fold_build2 (LE_EXPR, boolean_type_node, max2, min1));
+	abs_idx_step = -abs_idx_step;
+      poly_offset_int idx_len1 = abs_idx_step * niter_len1;
+      poly_offset_int idx_len2 = abs_idx_step * niter_len2;
+      poly_offset_int idx_access1 = abs_idx_step * niter_access1;
+      poly_offset_int idx_access2 = abs_idx_step * niter_access2;
+
+      gcc_assert (known_ge (idx_len1, 0)
+		  && known_ge (idx_len2, 0)
+		  && known_ge (idx_access1, 0)
+		  && known_ge (idx_access2, 0));
+
+      /* Each access has the following pattern, with lengths measured
+	 in units of INDEX:
+
+	      <-- idx_len -->
+	      <--- A: -ve step --->
+	      +-----+-------+-----+-------+-----+
+	      | n-1 | ..... |  0  | ..... | n-1 |
+	      +-----+-------+-----+-------+-----+
+			    <--- B: +ve step --->
+			    <-- idx_len -->
+			    |
+			   min
+
+	 where "n" is the number of scalar iterations covered by the segment
+	 and where each access spans idx_access units.
+
+	 A is the range of bytes accessed when the step is negative,
+	 B is the range when the step is positive.
+
+	 When checking for general overlap, we need to test whether
+	 the range:
+
+	   [min1 + low_offset1, min2 + high_offset1 + idx_access1 - 1]
+
+	 overlaps:
+
+	   [min2 + low_offset2, min2 + high_offset2 + idx_access2 - 1]
+
+	 where:
+
+	    low_offsetN = +ve step ? 0 : -idx_lenN;
+	   high_offsetN = +ve step ? idx_lenN : 0;
+
+	 This is equivalent to testing whether:
+
+	   min1 + low_offset1 <= min2 + high_offset2 + idx_access2 - 1
+	   && min2 + low_offset2 <= min1 + high_offset1 + idx_access1 - 1
+
+	 Converting this into a single test, there is an overlap if:
+
+	   0 <= min2 - min1 + bias <= limit
+
+	 where  bias = high_offset2 + idx_access2 - 1 - low_offset1
+	       limit = (high_offset1 - low_offset1 + idx_access1 - 1)
+		     + (high_offset2 - low_offset2 + idx_access2 - 1)
+	  i.e. limit = idx_len1 + idx_access1 - 1 + idx_len2 + idx_access2 - 1
+
+	 Combining the tests requires limit to be computable in an unsigned
+	 form of the index type; if it isn't, we fall back to the usual
+	 pointer-based checks.  */
+      poly_offset_int limit = (idx_len1 + idx_access1 - 1
+			       + idx_len2 + idx_access2 - 1);
+      tree utype = unsigned_type_for (TREE_TYPE (min1));
+      if (!wi::fits_to_tree_p (limit, utype))
+	return false;
+
+      poly_offset_int low_offset1 = neg_step ? -idx_len1 : 0;
+      poly_offset_int high_offset2 = neg_step ? 0 : idx_len2;
+      poly_offset_int bias = high_offset2 + idx_access2 - 1 - low_offset1;
+
+      tree subject = fold_build2 (MINUS_EXPR, utype,
+				  fold_convert (utype, min2),
+				  fold_convert (utype, min1));
+      subject = fold_build2 (PLUS_EXPR, utype, subject,
+			     wide_int_to_tree (utype, bias));
+      tree part_cond_expr = fold_build2 (GT_EXPR, boolean_type_node, subject,
+					 wide_int_to_tree (utype, limit));
       if (*cond_expr)
 	*cond_expr = fold_build2 (TRUTH_AND_EXPR, boolean_type_node,
 				  *cond_expr, part_cond_expr);
Index: gcc/testsuite/gcc.dg/vect/vect-alias-check-18.c
===================================================================
--- /dev/null	2019-09-17 11:41:18.176664108 +0100
+++ gcc/testsuite/gcc.dg/vect/vect-alias-check-18.c	2019-11-11 18:31:05.099091743 +0000
@@ -0,0 +1,64 @@
+#define N 200
+#define DIST 32
+
+typedef signed char sc;
+typedef unsigned char uc;
+typedef signed short ss;
+typedef unsigned short us;
+typedef int si;
+typedef unsigned int ui;
+typedef signed long long sll;
+typedef unsigned long long ull;
+
+#define FOR_EACH_TYPE(M) \
+  M (sc) M (uc) \
+  M (ss) M (us) \
+  M (si) M (ui) \
+  M (sll) M (ull) \
+  M (float) M (double)
+
+#define TEST_VALUE(I) ((I) * 11 / 2)
+
+#define ADD_TEST(TYPE)				\
+  TYPE a_##TYPE[N * 2];				\
+  void __attribute__((noinline, noclone))	\
+  test_##TYPE (int x, int y)			\
+  {						\
+    for (int i = 0; i < N; ++i)			\
+      a_##TYPE[x - i] += a_##TYPE[y - i];	\
+  }
+
+#define DO_TEST(TYPE)						\
+  for (int i = 0; i < DIST * 2; ++i)				\
+    {								\
+      for (int j = 0; j < N + DIST * 2; ++j)			\
+	a_##TYPE[j] = TEST_VALUE (j);				\
+      test_##TYPE (i + N - 1, DIST + N - 1);			\
+      for (int j = 0; j < N + DIST * 2; ++j)			\
+	{							\
+	  TYPE expected;					\
+	  if (j < i || j >= i + N)				\
+	    expected = TEST_VALUE (j);				\
+	  else if (i >= DIST)					\
+	    expected = ((TYPE) TEST_VALUE (j)			\
+			+ (TYPE) TEST_VALUE (j + DIST - i));	\
+	  else							\
+	    expected = ((TYPE) TEST_VALUE (j)			\
+			+ a_##TYPE[j + DIST - i]);		\
+	  if (expected != a_##TYPE[j])				\
+	    __builtin_abort ();					\
+	}							\
+    }
+
+FOR_EACH_TYPE (ADD_TEST)
+
+int
+main (void)
+{
+  FOR_EACH_TYPE (DO_TEST)
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump {flags: *WAR\n} "vect" { target vect_int } } } */
+/* { dg-final { scan-tree-dump "using an index-based overlap test" "vect" } } */
+/* { dg-final { scan-tree-dump-not "using an address-based" "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-alias-check-19.c
===================================================================
--- /dev/null	2019-09-17 11:41:18.176664108 +0100
+++ gcc/testsuite/gcc.dg/vect/vect-alias-check-19.c	2019-11-11 18:31:05.099091743 +0000
@@ -0,0 +1,62 @@
+#define N 200
+#define DIST 32
+
+typedef signed char sc;
+typedef unsigned char uc;
+typedef signed short ss;
+typedef unsigned short us;
+typedef int si;
+typedef unsigned int ui;
+typedef signed long long sll;
+typedef unsigned long long ull;
+
+#define FOR_EACH_TYPE(M) \
+  M (sc) M (uc) \
+  M (ss) M (us) \
+  M (si) M (ui) \
+  M (sll) M (ull) \
+  M (float) M (double)
+
+#define ADD_TEST(TYPE)				\
+  TYPE a_##TYPE[N * 2];				\
+  void __attribute__((noinline, noclone))	\
+  test_##TYPE (int x, int y)			\
+  {						\
+    for (int i = 0; i < N; ++i)			\
+      {						\
+	a_##TYPE[i + x] = i;			\
+	a_##TYPE[i + y] = 42 - i * 2;		\
+      }						\
+  }
+
+#define DO_TEST(TYPE)						\
+  for (int i = 0; i < DIST * 2; ++i)				\
+    {								\
+      __builtin_memset (a_##TYPE, 0, sizeof (a_##TYPE));	\
+      test_##TYPE (DIST, i);					\
+      for (int j = 0; j < N + DIST * 2; ++j)			\
+	{							\
+	  TYPE expected = 0;					\
+	  if (i > DIST && j >= i && j < i + N)			\
+	    expected = 42 - (j - i) * 2;			\
+	  if (j >= DIST && j < DIST + N)			\
+	    expected = j - DIST;				\
+	  if (i <= DIST && j >= i && j < i + N)			\
+	    expected = 42 - (j - i) * 2;			\
+	  if (expected != a_##TYPE[j])				\
+	    __builtin_abort ();					\
+	}							\
+    }
+
+FOR_EACH_TYPE (ADD_TEST)
+
+int
+main (void)
+{
+  FOR_EACH_TYPE (DO_TEST)
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump {flags: *WAW\n} "vect" { target vect_int } } } */
+/* { dg-final { scan-tree-dump "using an index-based overlap test" "vect" } } */
+/* { dg-final { scan-tree-dump-not "using an address-based" "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-alias-check-20.c
===================================================================
--- /dev/null	2019-09-17 11:41:18.176664108 +0100
+++ gcc/testsuite/gcc.dg/vect/vect-alias-check-20.c	2019-11-11 18:31:05.099091743 +0000
@@ -0,0 +1,66 @@
+#define N 200
+#define DIST 32
+
+typedef signed char sc;
+typedef unsigned char uc;
+typedef signed short ss;
+typedef unsigned short us;
+typedef int si;
+typedef unsigned int ui;
+typedef signed long long sll;
+typedef unsigned long long ull;
+
+#define FOR_EACH_TYPE(M) \
+  M (sc) M (uc) \
+  M (ss) M (us) \
+  M (si) M (ui) \
+  M (sll) M (ull) \
+  M (float) M (double)
+
+#define TEST_VALUE(I) ((I) * 11 / 2)
+
+#define ADD_TEST(TYPE)				\
+  TYPE a_##TYPE[N * 2];				\
+  TYPE __attribute__((noinline, noclone))	\
+  test_##TYPE (int x, int y)			\
+  {						\
+    TYPE res = 0;				\
+    for (int i = 0; i < N; ++i)			\
+      {						\
+	a_##TYPE[i + x] = i;			\
+	res += a_##TYPE[i + y];			\
+      }						\
+    return res;					\
+  }
+
+#define DO_TEST(TYPE)						\
+  for (int i = 0; i < DIST * 2; ++i)				\
+    {								\
+      for (int j = 0; j < N + DIST * 2; ++j)			\
+	a_##TYPE[j] = TEST_VALUE (j);				\
+      TYPE res = test_##TYPE (DIST, i);				\
+      for (int j = 0; j < N; ++j)				\
+	if (a_##TYPE[j + DIST] != (TYPE) j)			\
+	  __builtin_abort ();					\
+      TYPE expected_res = 0;					\
+      for (int j = i; j < i + N; ++j)				\
+	if (i <= DIST && j >= DIST && j < DIST + N)		\
+	  expected_res += j - DIST;				\
+	else							\
+	  expected_res += TEST_VALUE (j);			\
+      if (expected_res != res)					\
+	__builtin_abort ();					\
+    }
+
+FOR_EACH_TYPE (ADD_TEST)
+
+int
+main (void)
+{
+  FOR_EACH_TYPE (DO_TEST)
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump {flags: *RAW\n} "vect" { target vect_int } } } */
+/* { dg-final { scan-tree-dump "using an index-based overlap test" "vect" } } */
+/* { dg-final { scan-tree-dump-not "using an address-based" "vect" } } */

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [8/8] Optimise WAR and WAW alias checks
  2019-11-11 18:46 [0/8] Improve vector alias checks for WAR and WAW dependencies Richard Sandiford
                   ` (6 preceding siblings ...)
  2019-11-11 18:52 ` [7/8] Use a single comparison for index-based alias checks Richard Sandiford
@ 2019-11-11 19:02 ` Richard Sandiford
  2019-11-18 11:05   ` Richard Biener
  7 siblings, 1 reply; 19+ messages in thread
From: Richard Sandiford @ 2019-11-11 19:02 UTC (permalink / raw)
  To: gcc-patches

For:

  void
  f1 (int *x, int *y)
  {
    for (int i = 0; i < 32; ++i)
      x[i] += y[i];
  }

we checked at runtime whether one vector at x would overlap one vector
at y.  But in cases like this, the vector code would handle x <= y just
fine, since any write to address A still happens after any read from
address A.  The only problem is if x is ahead of y by less than a
vector.

The same is true for two writes:

  void
  f2 (int *x, int *y)
  {
    for (int i = 0; i < 32; ++i)
      {
        x[i] = i;
        y[i] = 2;
      }
  }

if y <= x then a vector write at y after a vector write at x would
have the same net effect as the original scalar writes.

This patch optimises the alias checks for these two cases.  E.g.,
before the patch, f1 used:

        add     x2, x0, 15
        sub     x2, x2, x1
        cmp     x2, 30
        bls     .L2

whereas after the patch it uses:

        add     x2, x1, 4
        sub     x2, x0, x2
        cmp     x2, 8
        bls     .L2

Read-after-write cases like:

  int
  f3 (int *x, int *y)
  {
    int res = 0;
    for (int i = 0; i < 32; ++i)
      {
        x[i] = i;
        res += y[i];
      }
    return res;
  }

can cope with x == y, but otherwise don't allow overlap in either
direction.  Since checking for x == y at runtime would require extra
code, we're probably better off sticking with the current overlap test.

An overlap test is also needed if the scalar or vector accesses covered
by the alias check are mixed together, rather than all statements for
the second access following all statements for the first access.

The new code for gcc.target/aarch64/sve/var_strict_[135].c is slightly
better than before.


2019-11-11  Richard Sandiford  <richard.sandiford@arm.com>

gcc/
	* tree-data-ref.c (create_intersect_range_checks_index): If the
	alias pair describes simple WAW and WAR dependencies, just check
	whether the first B access overlaps later A accesses.
	(create_waw_or_war_checks): New function that performs the same
	optimization on addresses.
	(create_intersect_range_checks): Call it.

gcc/testsuite/
	* gcc.dg/vect/vect-alias-check-8.c: Expect WAR/WAW checks to be used.
	* gcc.dg/vect/vect-alias-check-14.c: Likewise.
	* gcc.dg/vect/vect-alias-check-15.c: Likewise.
	* gcc.dg/vect/vect-alias-check-18.c: Likewise.
	* gcc.dg/vect/vect-alias-check-19.c: Likewise.
	* gcc.target/aarch64/sve/var_stride_1.c: Update expected sequence.
	* gcc.target/aarch64/sve/var_stride_2.c: Likewise.
	* gcc.target/aarch64/sve/var_stride_3.c: Likewise.
	* gcc.target/aarch64/sve/var_stride_5.c: Likewise.

Index: gcc/tree-data-ref.c
===================================================================
--- gcc/tree-data-ref.c	2019-11-11 18:32:12.000000000 +0000
+++ gcc/tree-data-ref.c	2019-11-11 18:32:13.186616541 +0000
@@ -1806,6 +1806,8 @@ create_intersect_range_checks_index (cla
 			   abs_step, &niter_access2))
     return false;
 
+  bool waw_or_war_p = (alias_pair.flags & ~(DR_ALIAS_WAR | DR_ALIAS_WAW)) == 0;
+
   unsigned int i;
   for (i = 0; i < DR_NUM_DIMENSIONS (dr_a.dr); i++)
     {
@@ -1907,16 +1909,57 @@ create_intersect_range_checks_index (cla
 
 	 Combining the tests requires limit to be computable in an unsigned
 	 form of the index type; if it isn't, we fall back to the usual
-	 pointer-based checks.  */
-      poly_offset_int limit = (idx_len1 + idx_access1 - 1
-			       + idx_len2 + idx_access2 - 1);
+	 pointer-based checks.
+
+	 We can do better if DR_B is a write and if DR_A and DR_B are
+	 well-ordered in both the original and the new code (see the
+	 comment above the DR_ALIAS_* flags for details).  In this case
+	 we know that for each i in [0, n-1], the write performed by
+	 access i of DR_B occurs after access numbers j<=i of DR_A in
+	 both the original and the new code.  Any write or anti
+	 dependencies wrt those DR_A accesses are therefore maintained.
+
+	 We just need to make sure that each individual write in DR_B does not
+	 overlap any higher-indexed access in DR_A; such DR_A accesses happen
+	 after the DR_B access in the original code but happen before it in
+	 the new code.
+
+	 We know the steps for both accesses are equal, so by induction, we
+	 just need to test whether the first write of DR_B overlaps a later
+	 access of DR_A.  In other words, we need to move min1 along by
+	 one iteration:
+
+	   min1' = min1 + idx_step
+
+	 and use the ranges:
+
+	   [min1' + low_offset1', min1' + high_offset1' + idx_access1 - 1]
+
+	 and:
+
+	   [min2, min2 + idx_access2 - 1]
+
+	 where:
+
+	    low_offset1' = +ve step ? 0 : -(idx_len1 - |idx_step|)
+	   high_offset1' = +ve_step ? idx_len1 - |idx_step| : 0.  */
+      if (waw_or_war_p)
+	idx_len1 -= abs_idx_step;
+
+      poly_offset_int limit = idx_len1 + idx_access1 - 1 + idx_access2 - 1;
+      if (!waw_or_war_p)
+	limit += idx_len2;
+
       tree utype = unsigned_type_for (TREE_TYPE (min1));
       if (!wi::fits_to_tree_p (limit, utype))
 	return false;
 
       poly_offset_int low_offset1 = neg_step ? -idx_len1 : 0;
-      poly_offset_int high_offset2 = neg_step ? 0 : idx_len2;
+      poly_offset_int high_offset2 = neg_step || waw_or_war_p ? 0 : idx_len2;
       poly_offset_int bias = high_offset2 + idx_access2 - 1 - low_offset1;
+      /* Equivalent to adding IDX_STEP to MIN1.  */
+      if (waw_or_war_p)
+	bias -= wi::to_offset (idx_step);
 
       tree subject = fold_build2 (MINUS_EXPR, utype,
 				  fold_convert (utype, min2),
@@ -1932,7 +1975,169 @@ create_intersect_range_checks_index (cla
 	*cond_expr = part_cond_expr;
     }
   if (dump_enabled_p ())
-    dump_printf (MSG_NOTE, "using an index-based overlap test\n");
+    {
+      if (waw_or_war_p)
+	dump_printf (MSG_NOTE, "using an index-based WAR/WAW test\n");
+      else
+	dump_printf (MSG_NOTE, "using an index-based overlap test\n");
+    }
+  return true;
+}
+
+/* A subroutine of create_intersect_range_checks, with a subset of the
+   same arguments.  Try to optimize cases in which the second access
+   is a write and in which some overlap is valid.  */
+
+static bool
+create_waw_or_war_checks (tree *cond_expr,
+			  const dr_with_seg_len_pair_t &alias_pair)
+{
+  const dr_with_seg_len& dr_a = alias_pair.first;
+  const dr_with_seg_len& dr_b = alias_pair.second;
+
+  /* Check for cases in which:
+
+     (a) DR_B is always a write;
+     (b) the accesses are well-ordered in both the original and new code
+	 (see the comment above the DR_ALIAS_* flags for details); and
+     (c) the DR_STEPs describe all access pairs covered by ALIAS_PAIR.  */
+  if (alias_pair.flags & ~(DR_ALIAS_WAR | DR_ALIAS_WAW))
+    return false;
+
+  /* Check for equal (but possibly variable) steps.  */
+  tree step = DR_STEP (dr_a.dr);
+  if (!operand_equal_p (step, DR_STEP (dr_b.dr)))
+    return false;
+
+  /* Make sure that we can operate on sizetype without loss of precision.  */
+  tree addr_type = TREE_TYPE (DR_BASE_ADDRESS (dr_a.dr));
+  if (TYPE_PRECISION (addr_type) != TYPE_PRECISION (sizetype))
+    return false;
+
+  /* All addresses involved are known to have a common alignment ALIGN.
+     We can therefore subtract ALIGN from an exclusive endpoint to get
+     an inclusive endpoint.  In the best (and common) case, ALIGN is the
+     same as the access sizes of both DRs, and so subtracting ALIGN
+     cancels out the addition of an access size.  */
+  unsigned int align = MIN (dr_a.align, dr_b.align);
+  poly_uint64 last_chunk_a = dr_a.access_size - align;
+  poly_uint64 last_chunk_b = dr_b.access_size - align;
+
+  /* Get a boolean expression that is true when the step is negative.  */
+  tree indicator = dr_direction_indicator (dr_a.dr);
+  tree neg_step = fold_build2 (LT_EXPR, boolean_type_node,
+			       fold_convert (ssizetype, indicator),
+			       ssize_int (0));
+
+  /* Get lengths in sizetype.  */
+  tree seg_len_a
+    = fold_convert (sizetype, rewrite_to_non_trapping_overflow (dr_a.seg_len));
+  step = fold_convert (sizetype, rewrite_to_non_trapping_overflow (step));
+
+  /* Each access has the following pattern:
+
+	  <- |seg_len| ->
+	  <--- A: -ve step --->
+	  +-----+-------+-----+-------+-----+
+	  | n-1 | ..... |  0  | ..... | n-1 |
+	  +-----+-------+-----+-------+-----+
+			<--- B: +ve step --->
+			<- |seg_len| ->
+			|
+		   base address
+
+     where "n" is the number of scalar iterations covered by the segment.
+
+     A is the range of bytes accessed when the step is negative,
+     B is the range when the step is positive.
+
+     We know that DR_B is a write.  We also know (from checking that
+     DR_A and DR_B are well-ordered) that for each i in [0, n-1],
+     the write performed by access i of DR_B occurs after access numbers
+     j<=i of DR_A in both the original and the new code.  Any write or
+     anti dependencies wrt those DR_A accesses are therefore maintained.
+
+     We just need to make sure that each individual write in DR_B does not
+     overlap any higher-indexed access in DR_A; such DR_A accesses happen
+     after the DR_B access in the original code but happen before it in
+     the new code.
+
+     We know the steps for both accesses are equal, so by induction, we
+     just need to test whether the first write of DR_B overlaps a later
+     access of DR_A.  In other words, we need to move addr_a along by
+     one iteration:
+
+       addr_a' = addr_a + step
+
+     and check whether:
+
+       [addr_b, addr_b + last_chunk_b]
+
+     overlaps:
+
+       [addr_a' + low_offset_a, addr_a' + high_offset_a + last_chunk_a]
+
+     where [low_offset_a, high_offset_a] spans accesses [1, n-1].  I.e.:
+
+	low_offset_a = +ve step ? 0 : seg_len_a - step
+       high_offset_a = +ve step ? seg_len_a - step : 0
+
+     This is equivalent to testing whether:
+
+       addr_a' + low_offset_a <= addr_b + last_chunk_b
+       && addr_b <= addr_a' + high_offset_a + last_chunk_a
+
+     Converting this into a single test, there is an overlap if:
+
+       0 <= addr_b + last_chunk_b - addr_a' - low_offset_a <= limit
+
+     where limit = high_offset_a - low_offset_a + last_chunk_a + last_chunk_b
+
+     If DR_A is performed, limit + |step| - last_chunk_b is known to be
+     less than the size of the object underlying DR_A.  We also know
+     that last_chunk_b <= |step|; this is checked elsewhere if it isn't
+     guaranteed at compile time.  There can therefore be no overflow if
+     "limit" is calculated in an unsigned type with pointer precision.  */
+  tree addr_a = fold_build_pointer_plus (DR_BASE_ADDRESS (dr_a.dr),
+					 DR_OFFSET (dr_a.dr));
+  addr_a = fold_build_pointer_plus (addr_a, DR_INIT (dr_a.dr));
+
+  tree addr_b = fold_build_pointer_plus (DR_BASE_ADDRESS (dr_b.dr),
+					 DR_OFFSET (dr_b.dr));
+  addr_b = fold_build_pointer_plus (addr_b, DR_INIT (dr_b.dr));
+
+  /* Advance ADDR_A by one iteration and adjust the length to compensate.  */
+  addr_a = fold_build_pointer_plus (addr_a, step);
+  tree seg_len_a_minus_step = fold_build2 (MINUS_EXPR, sizetype,
+					   seg_len_a, step);
+  if (!CONSTANT_CLASS_P (seg_len_a_minus_step))
+    seg_len_a_minus_step = build1 (SAVE_EXPR, sizetype, seg_len_a_minus_step);
+
+  tree low_offset_a = fold_build3 (COND_EXPR, sizetype, neg_step,
+				   seg_len_a_minus_step, size_zero_node);
+  if (!CONSTANT_CLASS_P (low_offset_a))
+    low_offset_a = build1 (SAVE_EXPR, sizetype, low_offset_a);
+
+  /* We could use COND_EXPR <neg_step, size_zero_node, seg_len_a_minus_step>,
+     but it's usually more efficient to reuse the LOW_OFFSET_A result.  */
+  tree high_offset_a = fold_build2 (MINUS_EXPR, sizetype, seg_len_a_minus_step,
+				    low_offset_a);
+
+  /* The amount added to addr_b - addr_a'.  */
+  tree bias = fold_build2 (MINUS_EXPR, sizetype,
+			   size_int (last_chunk_b), low_offset_a);
+
+  tree limit = fold_build2 (MINUS_EXPR, sizetype, high_offset_a, low_offset_a);
+  limit = fold_build2 (PLUS_EXPR, sizetype, limit,
+		       size_int (last_chunk_a + last_chunk_b));
+
+  tree subject = fold_build2 (POINTER_DIFF_EXPR, ssizetype, addr_b, addr_a);
+  subject = fold_build2 (PLUS_EXPR, sizetype,
+			 fold_convert (sizetype, subject), bias);
+
+  *cond_expr = fold_build2 (GT_EXPR, boolean_type_node, subject, limit);
+  if (dump_enabled_p ())
+    dump_printf (MSG_NOTE, "using an address-based WAR/WAW test\n");
   return true;
 }
 
@@ -2036,6 +2241,9 @@ create_intersect_range_checks (class loo
   if (create_intersect_range_checks_index (loop, cond_expr, alias_pair))
     return;
 
+  if (create_waw_or_war_checks (cond_expr, alias_pair))
+    return;
+
   unsigned HOST_WIDE_INT min_align;
   tree_code cmp_code;
   /* We don't have to check DR_ALIAS_MIXED_STEPS here, since both versions
Index: gcc/testsuite/gcc.dg/vect/vect-alias-check-8.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-alias-check-8.c	2019-11-11 18:32:12.000000000 +0000
+++ gcc/testsuite/gcc.dg/vect/vect-alias-check-8.c	2019-11-11 18:32:13.186616541 +0000
@@ -60,5 +60,5 @@ main (void)
 }
 
 /* { dg-final { scan-tree-dump {flags: *WAR\n} "vect" { target vect_int } } } */
-/* { dg-final { scan-tree-dump "using an index-based overlap test" "vect" } } */
+/* { dg-final { scan-tree-dump "using an index-based WAR/WAW test" "vect" } } */
 /* { dg-final { scan-tree-dump-not "using an address-based" "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-alias-check-14.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-alias-check-14.c	2019-11-11 18:32:12.000000000 +0000
+++ gcc/testsuite/gcc.dg/vect/vect-alias-check-14.c	2019-11-11 18:32:13.186616541 +0000
@@ -60,5 +60,5 @@ main (void)
 
 /* { dg-final { scan-tree-dump {flags: *WAR\n} "vect" { target vect_int } } } */
 /* { dg-final { scan-tree-dump-not {flags: [^\n]*ARBITRARY\n} "vect" } } */
-/* { dg-final { scan-tree-dump "using an address-based overlap test" "vect" } } */
+/* { dg-final { scan-tree-dump "using an address-based WAR/WAW test" "vect" } } */
 /* { dg-final { scan-tree-dump-not "using an index-based" "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-alias-check-15.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-alias-check-15.c	2019-11-11 18:32:12.000000000 +0000
+++ gcc/testsuite/gcc.dg/vect/vect-alias-check-15.c	2019-11-11 18:32:13.186616541 +0000
@@ -57,5 +57,5 @@ main (void)
 }
 
 /* { dg-final { scan-tree-dump {flags: *WAW\n} "vect" { target vect_int } } } */
-/* { dg-final { scan-tree-dump "using an address-based overlap test" "vect" } } */
+/* { dg-final { scan-tree-dump "using an address-based WAR/WAW test" "vect" } } */
 /* { dg-final { scan-tree-dump-not "using an index-based" "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-alias-check-18.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-alias-check-18.c	2019-11-11 18:32:12.000000000 +0000
+++ gcc/testsuite/gcc.dg/vect/vect-alias-check-18.c	2019-11-11 18:32:13.186616541 +0000
@@ -60,5 +60,5 @@ main (void)
 }
 
 /* { dg-final { scan-tree-dump {flags: *WAR\n} "vect" { target vect_int } } } */
-/* { dg-final { scan-tree-dump "using an index-based overlap test" "vect" } } */
+/* { dg-final { scan-tree-dump "using an index-based WAR/WAW test" "vect" } } */
 /* { dg-final { scan-tree-dump-not "using an address-based" "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/vect-alias-check-19.c
===================================================================
--- gcc/testsuite/gcc.dg/vect/vect-alias-check-19.c	2019-11-11 18:32:12.000000000 +0000
+++ gcc/testsuite/gcc.dg/vect/vect-alias-check-19.c	2019-11-11 18:32:13.186616541 +0000
@@ -58,5 +58,5 @@ main (void)
 }
 
 /* { dg-final { scan-tree-dump {flags: *WAW\n} "vect" { target vect_int } } } */
-/* { dg-final { scan-tree-dump "using an index-based overlap test" "vect" } } */
+/* { dg-final { scan-tree-dump "using an index-based WAR/WAW test" "vect" } } */
 /* { dg-final { scan-tree-dump-not "using an address-based" "vect" } } */
Index: gcc/testsuite/gcc.target/aarch64/sve/var_stride_1.c
===================================================================
--- gcc/testsuite/gcc.target/aarch64/sve/var_stride_1.c	2019-11-11 18:32:12.000000000 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve/var_stride_1.c	2019-11-11 18:32:13.186616541 +0000
@@ -15,13 +15,9 @@ f (TYPE *x, TYPE *y, unsigned short n, l
 /* { dg-final { scan-assembler {\tst1w\tz[0-9]+} } } */
 /* { dg-final { scan-assembler {\tldr\tw[0-9]+} } } */
 /* { dg-final { scan-assembler {\tstr\tw[0-9]+} } } */
-/* Should multiply by (VF-1)*4 rather than (257-1)*4.  */
-/* { dg-final { scan-assembler-not {, 1024} } } */
-/* { dg-final { scan-assembler-not {\t.bfiz\t} } } */
-/* { dg-final { scan-assembler-not {lsl[^\n]*[, ]10} } } */
-/* { dg-final { scan-assembler-not {\tcmp\tx[0-9]+, 0} } } */
-/* { dg-final { scan-assembler-not {\tcmp\tw[0-9]+, 0} } } */
-/* { dg-final { scan-assembler-not {\tcsel\tx[0-9]+} } } */
-/* Two range checks and a check for n being zero.  */
-/* { dg-final { scan-assembler-times {\tcmp\t} 1 } } */
-/* { dg-final { scan-assembler-times {\tccmp\t} 2 } } */
+/* Should use a WAR check that multiplies by (VF-2)*4 rather than
+   an overlap check that multiplies by (257-1)*4.  */
+/* { dg-final { scan-assembler {\tcntb\t(x[0-9]+)\n.*\tsub\tx[0-9]+, \1, #8\n.*\tmul\tx[0-9]+,[^\n]*\1} } } */
+/* One range check and a check for n being zero.  */
+/* { dg-final { scan-assembler-times {\t(?:cmp|tst)\t} 1 } } */
+/* { dg-final { scan-assembler-times {\tccmp\t} 1 } } */
Index: gcc/testsuite/gcc.target/aarch64/sve/var_stride_2.c
===================================================================
--- gcc/testsuite/gcc.target/aarch64/sve/var_stride_2.c	2019-11-11 18:32:12.000000000 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve/var_stride_2.c	2019-11-11 18:32:13.186616541 +0000
@@ -15,7 +15,7 @@ f (TYPE *x, TYPE *y, unsigned short n, u
 /* { dg-final { scan-assembler {\tst1w\tz[0-9]+} } } */
 /* { dg-final { scan-assembler {\tldr\tw[0-9]+} } } */
 /* { dg-final { scan-assembler {\tstr\tw[0-9]+} } } */
-/* Should multiply by (257-1)*4 rather than (VF-1)*4.  */
+/* Should multiply by (257-1)*4 rather than (VF-1)*4 or (VF-2)*4.  */
 /* { dg-final { scan-assembler-times {\tubfiz\tx[0-9]+, x2, 10, 16\n} 1 } } */
 /* { dg-final { scan-assembler-times {\tubfiz\tx[0-9]+, x3, 10, 16\n} 1 } } */
 /* { dg-final { scan-assembler-not {\tcmp\tx[0-9]+, 0} } } */
Index: gcc/testsuite/gcc.target/aarch64/sve/var_stride_3.c
===================================================================
--- gcc/testsuite/gcc.target/aarch64/sve/var_stride_3.c	2019-11-11 18:32:12.000000000 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve/var_stride_3.c	2019-11-11 18:32:13.186616541 +0000
@@ -15,13 +15,10 @@ f (TYPE *x, TYPE *y, int n, long m __att
 /* { dg-final { scan-assembler {\tst1w\tz[0-9]+} } } */
 /* { dg-final { scan-assembler {\tldr\tw[0-9]+} } } */
 /* { dg-final { scan-assembler {\tstr\tw[0-9]+} } } */
-/* Should multiply by (VF-1)*4 rather than (257-1)*4.  */
-/* { dg-final { scan-assembler-not {, 1024} } } */
-/* { dg-final { scan-assembler-not {\t.bfiz\t} } } */
-/* { dg-final { scan-assembler-not {lsl[^\n]*[, ]10} } } */
-/* { dg-final { scan-assembler-not {\tcmp\tx[0-9]+, 0} } } */
-/* { dg-final { scan-assembler {\tcmp\tw2, 0} } } */
-/* { dg-final { scan-assembler-times {\tcsel\tx[0-9]+} 2 } } */
-/* Two range checks and a check for n being zero.  */
-/* { dg-final { scan-assembler {\tcmp\t} } } */
-/* { dg-final { scan-assembler-times {\tccmp\t} 2 } } */
+/* Should use a WAR check that multiplies by (VF-2)*4 rather than
+   an overlap check that multiplies by (257-1)*4.  */
+/* { dg-final { scan-assembler {\tcntb\t(x[0-9]+)\n.*\tsub\tx[0-9]+, \1, #8\n.*\tmul\tx[0-9]+,[^\n]*\1} } } */
+/* { dg-final { scan-assembler-times {\tcsel\tx[0-9]+[^\n]*xzr} 1 } } */
+/* One range check and a check for n being zero.  */
+/* { dg-final { scan-assembler-times {\tcmp\t} 1 } } */
+/* { dg-final { scan-assembler-times {\tccmp\t} 1 } } */
Index: gcc/testsuite/gcc.target/aarch64/sve/var_stride_5.c
===================================================================
--- gcc/testsuite/gcc.target/aarch64/sve/var_stride_5.c	2019-11-11 18:32:12.000000000 +0000
+++ gcc/testsuite/gcc.target/aarch64/sve/var_stride_5.c	2019-11-11 18:32:13.186616541 +0000
@@ -15,13 +15,10 @@ f (TYPE *x, TYPE *y, long n, long m __at
 /* { dg-final { scan-assembler {\tst1d\tz[0-9]+} } } */
 /* { dg-final { scan-assembler {\tldr\td[0-9]+} } } */
 /* { dg-final { scan-assembler {\tstr\td[0-9]+} } } */
-/* Should multiply by (VF-1)*8 rather than (257-1)*8.  */
-/* { dg-final { scan-assembler-not {, 2048} } } */
-/* { dg-final { scan-assembler-not {\t.bfiz\t} } } */
-/* { dg-final { scan-assembler-not {lsl[^\n]*[, ]11} } } */
-/* { dg-final { scan-assembler {\tcmp\tx[0-9]+, 0} } } */
-/* { dg-final { scan-assembler-not {\tcmp\tw[0-9]+, 0} } } */
-/* { dg-final { scan-assembler-times {\tcsel\tx[0-9]+} 2 } } */
-/* Two range checks and a check for n being zero.  */
-/* { dg-final { scan-assembler {\tcmp\t} } } */
-/* { dg-final { scan-assembler-times {\tccmp\t} 2 } } */
+/* Should use a WAR check that multiplies by (VF-2)*8 rather than
+   an overlap check that multiplies by (257-1)*4.  */
+/* { dg-final { scan-assembler {\tcntb\t(x[0-9]+)\n.*\tsub\tx[0-9]+, \1, #16\n.*\tmul\tx[0-9]+,[^\n]*\1} } } */
+/* { dg-final { scan-assembler-times {\tcsel\tx[0-9]+[^\n]*xzr} 1 } } */
+/* One range check and a check for n being zero.  */
+/* { dg-final { scan-assembler-times {\tcmp\t} 1 } } */
+/* { dg-final { scan-assembler-times {\tccmp\t} 1 } } */

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [1/8] Move canonicalisation of dr_with_seg_len_pair_ts
  2019-11-11 18:47 ` [1/8] Move canonicalisation of dr_with_seg_len_pair_ts Richard Sandiford
@ 2019-11-15 11:01   ` Richard Biener
  0 siblings, 0 replies; 19+ messages in thread
From: Richard Biener @ 2019-11-15 11:01 UTC (permalink / raw)
  To: GCC Patches, Richard Sandiford

On Mon, Nov 11, 2019 at 7:46 PM Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> The two users of tree-data-ref's runtime alias checks both canonicalise
> the order of the dr_with_seg_lens in a pair before passing them to
> prune_runtime_alias_test_list.  It's more convenient for later patches
> if prune_runtime_alias_test_list does that itself.


OK.
>
> 2019-11-11  Richard Sandiford  <richard.sandiford@arm.com>
>
> gcc/
>         * tree-data-ref.c (prune_runtime_alias_test_list): Sort the
>         two accesses in each dr_with_seg_len_pair_t before trying to
>         combine separate dr_with_seg_len_pair_ts.
>         * tree-loop-distribution.c (compute_alias_check_pairs): Don't do
>         that here.
>         * tree-vect-data-refs.c (vect_prune_runtime_alias_test_list): Likewise.
>
> Index: gcc/tree-data-ref.c
> ===================================================================
> --- gcc/tree-data-ref.c 2019-07-18 09:22:13.893767915 +0100
> +++ gcc/tree-data-ref.c 2019-11-11 18:30:43.203244558 +0000
> @@ -1487,13 +1487,32 @@ comp_dr_with_seg_len_pair (const void *p
>  prune_runtime_alias_test_list (vec<dr_with_seg_len_pair_t> *alias_pairs,
>                                poly_uint64)
>  {
> +  /* Canonicalize each pair so that the base components are ordered wrt
> +     data_ref_compare_tree.  This allows the loop below to merge more
> +     cases.  */
> +  unsigned int i;
> +  dr_with_seg_len_pair_t *alias_pair;
> +  FOR_EACH_VEC_ELT (*alias_pairs, i, alias_pair)
> +    {
> +      data_reference_p dr_a = alias_pair->first.dr;
> +      data_reference_p dr_b = alias_pair->second.dr;
> +      int comp_res = data_ref_compare_tree (DR_BASE_ADDRESS (dr_a),
> +                                           DR_BASE_ADDRESS (dr_b));
> +      if (comp_res == 0)
> +       comp_res = data_ref_compare_tree (DR_OFFSET (dr_a), DR_OFFSET (dr_b));
> +      if (comp_res == 0)
> +       comp_res = data_ref_compare_tree (DR_INIT (dr_a), DR_INIT (dr_b));
> +      if (comp_res > 0)
> +       std::swap (alias_pair->first, alias_pair->second);
> +    }
> +
>    /* Sort the collected data ref pairs so that we can scan them once to
>       combine all possible aliasing checks.  */
>    alias_pairs->qsort (comp_dr_with_seg_len_pair);
>
>    /* Scan the sorted dr pairs and check if we can combine alias checks
>       of two neighboring dr pairs.  */
> -  for (size_t i = 1; i < alias_pairs->length (); ++i)
> +  for (i = 1; i < alias_pairs->length (); ++i)
>      {
>        /* Deal with two ddrs (dr_a1, dr_b1) and (dr_a2, dr_b2).  */
>        dr_with_seg_len *dr_a1 = &(*alias_pairs)[i-1].first,
> Index: gcc/tree-loop-distribution.c
> ===================================================================
> --- gcc/tree-loop-distribution.c        2019-07-10 19:41:20.539944929 +0100
> +++ gcc/tree-loop-distribution.c        2019-11-11 18:30:43.207244530 +0000
> @@ -2457,12 +2457,6 @@ compute_alias_check_pairs (class loop *l
>        struct data_reference *dr_a = DDR_A (ddr);
>        struct data_reference *dr_b = DDR_B (ddr);
>        tree seg_length_a, seg_length_b;
> -      int comp_res = data_ref_compare_tree (DR_BASE_ADDRESS (dr_a),
> -                                           DR_BASE_ADDRESS (dr_b));
> -
> -      if (comp_res == 0)
> -       comp_res = data_ref_compare_tree (DR_OFFSET (dr_a), DR_OFFSET (dr_b));
> -      gcc_assert (comp_res != 0);
>
>        if (latch_dominated_by_data_ref (loop, dr_a))
>         seg_length_a = data_ref_segment_size (dr_a, niters_plus_one);
> @@ -2485,10 +2479,6 @@ compute_alias_check_pairs (class loop *l
>         (dr_with_seg_len (dr_a, seg_length_a, access_size_a, align_a),
>          dr_with_seg_len (dr_b, seg_length_b, access_size_b, align_b));
>
> -      /* Canonicalize pairs by sorting the two DR members.  */
> -      if (comp_res > 0)
> -       std::swap (dr_with_seg_len_pair.first, dr_with_seg_len_pair.second);
> -
>        comp_alias_pairs->safe_push (dr_with_seg_len_pair);
>      }
>
> Index: gcc/tree-vect-data-refs.c
> ===================================================================
> --- gcc/tree-vect-data-refs.c   2019-11-08 16:06:18.948980179 +0000
> +++ gcc/tree-vect-data-refs.c   2019-11-11 18:30:43.207244530 +0000
> @@ -3478,7 +3478,6 @@ vect_prune_runtime_alias_test_list (loop
>    /* First, we collect all data ref pairs for aliasing checks.  */
>    FOR_EACH_VEC_ELT (may_alias_ddrs, i, ddr)
>      {
> -      int comp_res;
>        poly_uint64 lower_bound;
>        tree segment_length_a, segment_length_b;
>        unsigned HOST_WIDE_INT access_size_a, access_size_b;
> @@ -3594,14 +3593,11 @@ vect_prune_runtime_alias_test_list (loop
>        align_a = vect_vfa_align (dr_info_a);
>        align_b = vect_vfa_align (dr_info_b);
>
> -      comp_res = data_ref_compare_tree (DR_BASE_ADDRESS (dr_info_a->dr),
> -                                       DR_BASE_ADDRESS (dr_info_b->dr));
> -      if (comp_res == 0)
> -       comp_res = data_ref_compare_tree (DR_OFFSET (dr_info_a->dr),
> -                                         DR_OFFSET (dr_info_b->dr));
> -
>        /* See whether the alias is known at compilation time.  */
> -      if (comp_res == 0
> +      if (operand_equal_p (DR_BASE_ADDRESS (dr_info_a->dr),
> +                          DR_BASE_ADDRESS (dr_info_b->dr), 0)
> +         && operand_equal_p (DR_OFFSET (dr_info_a->dr),
> +                             DR_OFFSET (dr_info_b->dr), 0)
>           && TREE_CODE (DR_STEP (dr_info_a->dr)) == INTEGER_CST
>           && TREE_CODE (DR_STEP (dr_info_b->dr)) == INTEGER_CST
>           && poly_int_tree_p (segment_length_a)
> @@ -3640,10 +3636,6 @@ vect_prune_runtime_alias_test_list (loop
>          dr_with_seg_len (dr_info_b->dr, segment_length_b,
>                           access_size_b, align_b));
>
> -      /* Canonicalize pairs by sorting the two DR members.  */
> -      if (comp_res > 0)
> -       std::swap (dr_with_seg_len_pair.first, dr_with_seg_len_pair.second);
> -
>        comp_alias_ddrs.safe_push (dr_with_seg_len_pair);
>      }
>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [2/8] Delay swapping data refs in prune_runtime_alias_test_list
  2019-11-11 18:47 ` [2/8] Delay swapping data refs in prune_runtime_alias_test_list Richard Sandiford
@ 2019-11-15 11:06   ` Richard Biener
  0 siblings, 0 replies; 19+ messages in thread
From: Richard Biener @ 2019-11-15 11:06 UTC (permalink / raw)
  To: GCC Patches, Richard Sandiford

On Mon, Nov 11, 2019 at 7:47 PM Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> prune_runtime_alias_test_list swapped dr_as between two dr_with_seg_len
> pairs before finally deciding whether to merge them.  Bailing out later
> would therefore leave the pairs in an incorrect state.
>
> IMO a better fix would be to split this out into a subroutine that
> produces a temporary dr_with_seg_len on success, rather than changing
> an existing one in-place.  It would then be easy to merge both the dr_as
> and dr_bs if we wanted to, rather than requiring one of them to be equal.
> But here I tried to do something that could be backported if necessary.

OK.

>
> 2019-11-11  Richard Sandiford  <richard.sandiford@arm.com>
>
> gcc/
>         * tree-data-ref.c (prune_runtime_alias_test_list): Delay
>         swapping the dr_as based on init values until we've decided
>         whether to merge them.
>
> Index: gcc/tree-data-ref.c
> ===================================================================
> --- gcc/tree-data-ref.c 2019-11-11 18:30:43.203244558 +0000
> +++ gcc/tree-data-ref.c 2019-11-11 18:30:47.199216669 +0000
> @@ -1556,13 +1556,6 @@ prune_runtime_alias_test_list (vec<dr_wi
>           if (!ordered_p (init_a1, init_a2))
>             continue;
>
> -         /* Make sure dr_a1 starts left of dr_a2.  */
> -         if (maybe_gt (init_a1, init_a2))
> -           {
> -             std::swap (*dr_a1, *dr_a2);
> -             std::swap (init_a1, init_a2);
> -           }
> -
>           /* Work out what the segment length would be if we did combine
>              DR_A1 and DR_A2:
>
> @@ -1579,7 +1572,10 @@ prune_runtime_alias_test_list (vec<dr_wi
>
>              The lengths both have sizetype, so the sign is taken from
>              the step instead.  */
> -         if (!operand_equal_p (dr_a1->seg_len, dr_a2->seg_len, 0))
> +         poly_uint64 new_seg_len = 0;
> +         bool new_seg_len_p = !operand_equal_p (dr_a1->seg_len,
> +                                                dr_a2->seg_len, 0);
> +         if (new_seg_len_p)
>             {
>               poly_uint64 seg_len_a1, seg_len_a2;
>               if (!poly_int_tree_p (dr_a1->seg_len, &seg_len_a1)
> @@ -1597,14 +1593,24 @@ prune_runtime_alias_test_list (vec<dr_wi
>               int sign_a = tree_int_cst_sgn (indicator_a);
>               int sign_b = tree_int_cst_sgn (indicator_b);
>
> -             poly_uint64 new_seg_len;
>               if (sign_a <= 0 && sign_b <= 0)
>                 new_seg_len = lower_bound (seg_len_a1, seg_len_a2);
>               else if (sign_a >= 0 && sign_b >= 0)
>                 new_seg_len = upper_bound (seg_len_a1, seg_len_a2);
>               else
>                 continue;
> +           }
> +         /* At this point we're committed to merging the refs.  */
>
> +         /* Make sure dr_a1 starts left of dr_a2.  */
> +         if (maybe_gt (init_a1, init_a2))
> +           {
> +             std::swap (*dr_a1, *dr_a2);
> +             std::swap (init_a1, init_a2);
> +           }
> +
> +         if (new_seg_len_p)
> +           {
>               dr_a1->seg_len = build_int_cst (TREE_TYPE (dr_a1->seg_len),
>                                               new_seg_len);
>               dr_a1->align = MIN (dr_a1->align, known_alignment (new_seg_len));

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [3/8] Add flags to dr_with_seg_len_pair_t
  2019-11-11 18:48 ` [3/8] Add flags to dr_with_seg_len_pair_t Richard Sandiford
@ 2019-11-15 11:07   ` Richard Biener
  2019-11-15 11:37     ` Richard Sandiford
  0 siblings, 1 reply; 19+ messages in thread
From: Richard Biener @ 2019-11-15 11:07 UTC (permalink / raw)
  To: GCC Patches, Richard Sandiford

On Mon, Nov 11, 2019 at 7:47 PM Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> This patch adds a bunch of flags to dr_with_seg_len_pair_t,
> for use by later patches.  The update to tree-loop-distribution.c
> is conservatively correct, but might be tweakable later.

Does this all work with interleaved SLP loads/stores like

  a[i] = b[i];
  tem1 = b[i+1];
  tem2 = b[i+2];
  a[i+2] = tem2;
  a[i+1] = tem1;

where we don't preserve the scalar order but emit code at the
latest stmt of the grouped access?  That is, what does
"preserve scalar oder" actually mean?


> 2019-11-11  Richard Sandiford  <richard.sandiford@arm.com>
>
> gcc/
>         * tree-data-ref.h (DR_ALIAS_RAW, DR_ALIAS_WAR, DR_ALIAS_WAW)
>         (DR_ALIAS_ARBITRARY, DR_ALIAS_SWAPPED, DR_ALIAS_UNSWAPPED): New flags.
>         (dr_with_seg_len_pair_t::sequencing): New enum.
>         (dr_with_seg_len_pair_t::flags): New member variable.
>         (dr_with_seg_len_pair_t::dr_with_seg_len_pair_t): Take a sequencing
>         parameter and initialize the flags member variable.
>         * tree-loop-distribution.c (compute_alias_check_pairs): Update
>         call accordingly.
>         * tree-vect-data-refs.c (vect_prune_runtime_alias_test_list): Likewise.
>         Ensure the two data references in an alias pair are in statement
>         order, if there is a defined order.
>         * tree-data-ref.c (prune_runtime_alias_test_list): Use
>         DR_ALIAS_SWAPPED and DR_ALIAS_UNSWAPPED to record whether we've
>         swapped the references in a dr_with_seg_len_pair_t.  OR together
>         the flags when merging two dr_with_seg_len_pair_ts.  After merging,
>         try to restore the original dr_with_seg_len order, updating the
>         flags if that fails.
>
> Index: gcc/tree-data-ref.h
> ===================================================================
> --- gcc/tree-data-ref.h 2019-07-10 19:41:26.383898124 +0100
> +++ gcc/tree-data-ref.h 2019-11-11 18:30:50.527193443 +0000
> @@ -222,20 +222,107 @@ typedef struct data_reference *data_refe
>    unsigned int align;
>  };
>
> +/* Flags that describe a potential alias between two dr_with_seg_lens.
> +   In general, each pair of dr_with_seg_lens represents a composite of
> +   multiple access pairs P, so testing flags like DR_IS_READ on the DRs
> +   does not give meaningful information.
> +
> +   DR_ALIAS_RAW:
> +       There is a pair in P for which the second reference is a read
> +       and the first is a write.
> +
> +   DR_ALIAS_WAR:
> +       There is a pair in P for which the second reference is a write
> +       and the first is a read.
> +
> +   DR_ALIAS_WAW:
> +       There is a pair in P for which both references are writes.
> +
> +   DR_ALIAS_ARBITRARY:
> +       Either
> +       (a) it isn't possible to classify one pair in P as RAW, WAW or WAR; or
> +       (b) there is a pair in P that breaks the ordering assumption below.
> +
> +       This flag overrides the RAW, WAR and WAW flags above.
> +
> +   DR_ALIAS_UNSWAPPED:
> +   DR_ALIAS_SWAPPED:
> +       Temporary flags that indicate whether there is a pair P whose
> +       DRs have or haven't been swapped around.
> +
> +   The ordering assumption mentioned above is that for every pair
> +   (DR_A, DR_B) in P:
> +
> +   (1) The original code accesses n elements for DR_A and n elements for DR_B,
> +       interleaved as follows:
> +
> +        one access of size DR_A.access_size at DR_A.dr
> +        one access of size DR_B.access_size at DR_B.dr
> +        one access of size DR_A.access_size at DR_A.dr + STEP_A
> +        one access of size DR_B.access_size at DR_B.dr + STEP_B
> +        one access of size DR_A.access_size at DR_A.dr + STEP_A * 2
> +        one access of size DR_B.access_size at DR_B.dr + STEP_B * 2
> +        ...
> +
> +   (2) The new code accesses the same data in exactly two chunks:
> +
> +        one group of accesses spanning |DR_A.seg_len| + DR_A.access_size
> +        one group of accesses spanning |DR_B.seg_len| + DR_B.access_size
> +
> +   A pair might break this assumption if the DR_A and DR_B accesses
> +   in the original or the new code are mingled in some way.  For example,
> +   if DR_A.access_size represents the effect of two individual writes
> +   to nearby locations, the pair breaks the assumption if those writes
> +   occur either side of the access for DR_B.
> +
> +   Note that DR_ALIAS_ARBITRARY describes whether the ordering assumption
> +   fails to hold for any individual pair in P.  If the assumption *does*
> +   hold for every pair in P, it doesn't matter whether it holds for the
> +   composite pair or not.  In other words, P should represent the complete
> +   set of pairs that the composite pair is testing, so only the ordering
> +   of two accesses in the same member of P matters.  */
> +const unsigned int DR_ALIAS_RAW = 1U << 0;
> +const unsigned int DR_ALIAS_WAR = 1U << 1;
> +const unsigned int DR_ALIAS_WAW = 1U << 2;
> +const unsigned int DR_ALIAS_ARBITRARY = 1U << 3;
> +const unsigned int DR_ALIAS_SWAPPED = 1U << 4;
> +const unsigned int DR_ALIAS_UNSWAPPED = 1U << 5;
> +
>  /* This struct contains two dr_with_seg_len objects with aliasing data
>     refs.  Two comparisons are generated from them.  */
>
>  class dr_with_seg_len_pair_t
>  {
>  public:
> -  dr_with_seg_len_pair_t (const dr_with_seg_len& d1,
> -                              const dr_with_seg_len& d2)
> -    : first (d1), second (d2) {}
> +  /* WELL_ORDERED indicates that the ordering assumption described above
> +     DR_ALIAS_ARBITRARY holds.  REORDERED indicates that it doesn't.  */
> +  enum sequencing { WELL_ORDERED, REORDERED };
> +
> +  dr_with_seg_len_pair_t (const dr_with_seg_len &,
> +                         const dr_with_seg_len &, sequencing);
>
>    dr_with_seg_len first;
>    dr_with_seg_len second;
> +  unsigned int flags;
>  };
>
> +inline dr_with_seg_len_pair_t::
> +dr_with_seg_len_pair_t (const dr_with_seg_len &d1, const dr_with_seg_len &d2,
> +                       sequencing seq)
> +  : first (d1), second (d2), flags (0)
> +{
> +  if (DR_IS_READ (d1.dr) && DR_IS_WRITE (d2.dr))
> +    flags |= DR_ALIAS_WAR;
> +  else if (DR_IS_WRITE (d1.dr) && DR_IS_READ (d2.dr))
> +    flags |= DR_ALIAS_RAW;
> +  else if (DR_IS_WRITE (d1.dr) && DR_IS_WRITE (d2.dr))
> +    flags |= DR_ALIAS_WAW;
> +  else
> +    gcc_unreachable ();
> +  if (seq == REORDERED)
> +    flags |= DR_ALIAS_ARBITRARY;
> +}
> +
>  enum data_dependence_direction {
>    dir_positive,
>    dir_negative,
> Index: gcc/tree-loop-distribution.c
> ===================================================================
> --- gcc/tree-loop-distribution.c        2019-11-11 18:30:43.207244530 +0000
> +++ gcc/tree-loop-distribution.c        2019-11-11 18:30:50.527193443 +0000
> @@ -2477,7 +2477,9 @@ compute_alias_check_pairs (class loop *l
>
>        dr_with_seg_len_pair_t dr_with_seg_len_pair
>         (dr_with_seg_len (dr_a, seg_length_a, access_size_a, align_a),
> -        dr_with_seg_len (dr_b, seg_length_b, access_size_b, align_b));
> +        dr_with_seg_len (dr_b, seg_length_b, access_size_b, align_b),
> +        /* ??? Would WELL_ORDERED be safe?  */
> +        dr_with_seg_len_pair_t::REORDERED);
>
>        comp_alias_pairs->safe_push (dr_with_seg_len_pair);
>      }
> Index: gcc/tree-vect-data-refs.c
> ===================================================================
> --- gcc/tree-vect-data-refs.c   2019-11-11 18:30:43.207244530 +0000
> +++ gcc/tree-vect-data-refs.c   2019-11-11 18:30:50.531193415 +0000
> @@ -3509,10 +3509,13 @@ vect_prune_runtime_alias_test_list (loop
>        dr_vec_info *dr_info_b = loop_vinfo->lookup_dr (DDR_B (ddr));
>        stmt_vec_info stmt_info_b = dr_info_b->stmt;
>
> +      bool preserves_scalar_order_p
> +       = vect_preserves_scalar_order_p (dr_info_a, dr_info_b);
> +
>        /* Skip the pair if inter-iteration dependencies are irrelevant
>          and intra-iteration dependencies are guaranteed to be honored.  */
>        if (ignore_step_p
> -         && (vect_preserves_scalar_order_p (dr_info_a, dr_info_b)
> +         && (preserves_scalar_order_p
>               || vectorizable_with_step_bound_p (dr_info_a, dr_info_b,
>                                                  &lower_bound)))
>         {
> @@ -3630,11 +3633,21 @@ vect_prune_runtime_alias_test_list (loop
>                                            stmt_info_b->stmt);
>         }
>
> +      dr_with_seg_len dr_a (dr_info_a->dr, segment_length_a,
> +                           access_size_a, align_a);
> +      dr_with_seg_len dr_b (dr_info_b->dr, segment_length_b,
> +                           access_size_b, align_b);
> +      /* Canonicalize the order to be the one that's needed for accurate
> +        RAW, WAR and WAW flags, in cases where the data references are
> +        well-ordered.  The order doesn't really matter otherwise,
> +        but we might as well be consistent.  */
> +      if (get_later_stmt (stmt_info_a, stmt_info_b) == stmt_info_a)
> +       std::swap (dr_a, dr_b);
> +
>        dr_with_seg_len_pair_t dr_with_seg_len_pair
> -       (dr_with_seg_len (dr_info_a->dr, segment_length_a,
> -                         access_size_a, align_a),
> -        dr_with_seg_len (dr_info_b->dr, segment_length_b,
> -                         access_size_b, align_b));
> +       (dr_a, dr_b, (preserves_scalar_order_p
> +                     ? dr_with_seg_len_pair_t::WELL_ORDERED
> +                     : dr_with_seg_len_pair_t::REORDERED));
>
>        comp_alias_ddrs.safe_push (dr_with_seg_len_pair);
>      }
> Index: gcc/tree-data-ref.c
> ===================================================================
> --- gcc/tree-data-ref.c 2019-11-11 18:30:47.199216669 +0000
> +++ gcc/tree-data-ref.c 2019-11-11 18:30:50.527193443 +0000
> @@ -1503,7 +1503,12 @@ prune_runtime_alias_test_list (vec<dr_wi
>        if (comp_res == 0)
>         comp_res = data_ref_compare_tree (DR_INIT (dr_a), DR_INIT (dr_b));
>        if (comp_res > 0)
> -       std::swap (alias_pair->first, alias_pair->second);
> +       {
> +         std::swap (alias_pair->first, alias_pair->second);
> +         alias_pair->flags |= DR_ALIAS_SWAPPED;
> +       }
> +      else
> +       alias_pair->flags |= DR_ALIAS_UNSWAPPED;
>      }
>
>    /* Sort the collected data ref pairs so that we can scan them once to
> @@ -1515,10 +1520,13 @@ prune_runtime_alias_test_list (vec<dr_wi
>    for (i = 1; i < alias_pairs->length (); ++i)
>      {
>        /* Deal with two ddrs (dr_a1, dr_b1) and (dr_a2, dr_b2).  */
> -      dr_with_seg_len *dr_a1 = &(*alias_pairs)[i-1].first,
> -                     *dr_b1 = &(*alias_pairs)[i-1].second,
> -                     *dr_a2 = &(*alias_pairs)[i].first,
> -                     *dr_b2 = &(*alias_pairs)[i].second;
> +      dr_with_seg_len_pair_t *alias_pair1 = &(*alias_pairs)[i - 1];
> +      dr_with_seg_len_pair_t *alias_pair2 = &(*alias_pairs)[i];
> +
> +      dr_with_seg_len *dr_a1 = &alias_pair1->first;
> +      dr_with_seg_len *dr_b1 = &alias_pair1->second;
> +      dr_with_seg_len *dr_a2 = &alias_pair2->first;
> +      dr_with_seg_len *dr_b2 = &alias_pair2->second;
>
>        /* Remove duplicate data ref pairs.  */
>        if (*dr_a1 == *dr_a2 && *dr_b1 == *dr_b2)
> @@ -1527,6 +1535,7 @@ prune_runtime_alias_test_list (vec<dr_wi
>             dump_printf (MSG_NOTE, "found equal ranges %T, %T and %T, %T\n",
>                          DR_REF (dr_a1->dr), DR_REF (dr_b1->dr),
>                          DR_REF (dr_a2->dr), DR_REF (dr_b2->dr));
> +         alias_pair1->flags |= alias_pair2->flags;
>           alias_pairs->ordered_remove (i--);
>           continue;
>         }
> @@ -1632,10 +1641,26 @@ prune_runtime_alias_test_list (vec<dr_wi
>             dump_printf (MSG_NOTE, "merging ranges for %T, %T and %T, %T\n",
>                          DR_REF (dr_a1->dr), DR_REF (dr_b1->dr),
>                          DR_REF (dr_a2->dr), DR_REF (dr_b2->dr));
> +         alias_pair1->flags |= alias_pair2->flags;
>           alias_pairs->ordered_remove (i);
>           i--;
>         }
>      }
> +
> +  /* Try to restore the original dr_with_seg_len order within each
> +     dr_with_seg_len_pair_t.  If we ended up combining swapped and
> +     unswapped pairs into the same check, we have to invalidate any
> +     RAW, WAR and WAW information for it.  */
> +  FOR_EACH_VEC_ELT (*alias_pairs, i, alias_pair)
> +    {
> +      unsigned int swap_mask = (DR_ALIAS_SWAPPED | DR_ALIAS_UNSWAPPED);
> +      unsigned int swapped = (alias_pair->flags & swap_mask);
> +      if (swapped == DR_ALIAS_SWAPPED)
> +       std::swap (alias_pair->first, alias_pair->second);
> +      else if (swapped != DR_ALIAS_UNSWAPPED)
> +       alias_pair->flags |= DR_ALIAS_ARBITRARY;
> +      alias_pair->flags &= ~swap_mask;
> +    }
>  }
>
>  /* Given LOOP's two data references and segment lengths described by DR_A

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [5/8] Dump the list of merged alias pairs
  2019-11-11 18:50 ` [5/8] Dump the list of merged alias pairs Richard Sandiford
@ 2019-11-15 11:08   ` Richard Biener
  0 siblings, 0 replies; 19+ messages in thread
From: Richard Biener @ 2019-11-15 11:08 UTC (permalink / raw)
  To: GCC Patches, Richard Sandiford

On Mon, Nov 11, 2019 at 7:49 PM Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> This patch dumps the final (merged) list of alias pairs.  It also adds:
>
> - WAW and RAW versions of vect-alias-check-8.c
> - a "well-ordered" version of vect-alias-check-9.c (i.e. all reads
>   before any writes)
> - a test with mixed steps in the same alias pair
>
> I also tweaked the test value in vect-alias-check-9.c so that the result
> was less likely to be accidentally correct if the alias isn't honoured.

OK.

>
> 2019-11-11  Richard Sandiford  <richard.sandiford@arm.com>
>
> gcc/
>         * tree-data-ref.c (dump_alias_pair): New function.
>         (prune_runtime_alias_test_list): Use it to dump each merged alias pair.
>
> gcc/testsuite/
>         * gcc.dg/vect/vect-alias-check-8.c: Test for the RAW flag.
>         * gcc.dg/vect/vect-alias-check-9.c: Test for the ARBITRARY flag.
>         (TEST_VALUE): Use a higher value for early iterations.
>         * gcc.dg/vect/vect-alias-check-14.c: New test.
>         * gcc.dg/vect/vect-alias-check-15.c: Likewise.
>         * gcc.dg/vect/vect-alias-check-16.c: Likewise.
>         * gcc.dg/vect/vect-alias-check-17.c: Likewise.
>
> Index: gcc/tree-data-ref.c
> ===================================================================
> --- gcc/tree-data-ref.c 2019-11-11 18:30:53.863170161 +0000
> +++ gcc/tree-data-ref.c 2019-11-11 18:30:57.167147102 +0000
> @@ -1453,6 +1453,54 @@ comp_dr_with_seg_len_pair (const void *p
>    return 0;
>  }
>
> +/* Dump information about ALIAS_PAIR, indenting each line by INDENT.  */
> +
> +static void
> +dump_alias_pair (dr_with_seg_len_pair_t *alias_pair, const char *indent)
> +{
> +  dump_printf (MSG_NOTE, "%sreference:      %T vs. %T\n", indent,
> +              DR_REF (alias_pair->first.dr),
> +              DR_REF (alias_pair->second.dr));
> +
> +  dump_printf (MSG_NOTE, "%ssegment length: %T", indent,
> +              alias_pair->first.seg_len);
> +  if (!operand_equal_p (alias_pair->first.seg_len,
> +                       alias_pair->second.seg_len, 0))
> +    dump_printf (MSG_NOTE, " vs. %T", alias_pair->second.seg_len);
> +
> +  dump_printf (MSG_NOTE, "\n%saccess size:    ", indent);
> +  dump_dec (MSG_NOTE, alias_pair->first.access_size);
> +  if (maybe_ne (alias_pair->first.access_size, alias_pair->second.access_size))
> +    {
> +      dump_printf (MSG_NOTE, " vs. ");
> +      dump_dec (MSG_NOTE, alias_pair->second.access_size);
> +    }
> +
> +  dump_printf (MSG_NOTE, "\n%salignment:      %d", indent,
> +              alias_pair->first.align);
> +  if (alias_pair->first.align != alias_pair->second.align)
> +    dump_printf (MSG_NOTE, " vs. %d", alias_pair->second.align);
> +
> +  dump_printf (MSG_NOTE, "\n%sflags:         ", indent);
> +  if (alias_pair->flags & DR_ALIAS_RAW)
> +    dump_printf (MSG_NOTE, " RAW");
> +  if (alias_pair->flags & DR_ALIAS_WAR)
> +    dump_printf (MSG_NOTE, " WAR");
> +  if (alias_pair->flags & DR_ALIAS_WAW)
> +    dump_printf (MSG_NOTE, " WAW");
> +  if (alias_pair->flags & DR_ALIAS_ARBITRARY)
> +    dump_printf (MSG_NOTE, " ARBITRARY");
> +  if (alias_pair->flags & DR_ALIAS_SWAPPED)
> +    dump_printf (MSG_NOTE, " SWAPPED");
> +  if (alias_pair->flags & DR_ALIAS_UNSWAPPED)
> +    dump_printf (MSG_NOTE, " UNSWAPPED");
> +  if (alias_pair->flags & DR_ALIAS_MIXED_STEPS)
> +    dump_printf (MSG_NOTE, " MIXED_STEPS");
> +  if (alias_pair->flags == 0)
> +    dump_printf (MSG_NOTE, " <none>");
> +  dump_printf (MSG_NOTE, "\n");
> +}
> +
>  /* Merge alias checks recorded in ALIAS_PAIRS and remove redundant ones.
>     FACTOR is number of iterations that each data reference is accessed.
>
> @@ -1656,6 +1704,8 @@ prune_runtime_alias_test_list (vec<dr_wi
>       dr_with_seg_len_pair_t.  If we ended up combining swapped and
>       unswapped pairs into the same check, we have to invalidate any
>       RAW, WAR and WAW information for it.  */
> +  if (dump_enabled_p ())
> +    dump_printf (MSG_NOTE, "merged alias checks:\n");
>    FOR_EACH_VEC_ELT (*alias_pairs, i, alias_pair)
>      {
>        unsigned int swap_mask = (DR_ALIAS_SWAPPED | DR_ALIAS_UNSWAPPED);
> @@ -1665,6 +1715,8 @@ prune_runtime_alias_test_list (vec<dr_wi
>        else if (swapped != DR_ALIAS_UNSWAPPED)
>         alias_pair->flags |= DR_ALIAS_ARBITRARY;
>        alias_pair->flags &= ~swap_mask;
> +      if (dump_enabled_p ())
> +       dump_alias_pair (alias_pair, "  ");
>      }
>  }
>
> Index: gcc/testsuite/gcc.dg/vect/vect-alias-check-8.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/vect/vect-alias-check-8.c      2019-03-08 18:15:02.280871184 +0000
> +++ gcc/testsuite/gcc.dg/vect/vect-alias-check-8.c      2019-11-11 18:30:57.167147102 +0000
> @@ -58,3 +58,5 @@ main (void)
>    FOR_EACH_TYPE (DO_TEST)
>    return 0;
>  }
> +
> +/* { dg-final { scan-tree-dump {flags: *WAR\n} "vect" { target vect_int } } } */
> Index: gcc/testsuite/gcc.dg/vect/vect-alias-check-9.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/vect/vect-alias-check-9.c      2019-03-08 18:15:02.244871320 +0000
> +++ gcc/testsuite/gcc.dg/vect/vect-alias-check-9.c      2019-11-11 18:30:57.167147102 +0000
> @@ -17,7 +17,7 @@ #define FOR_EACH_TYPE(M) \
>    M (sll) M (ull) \
>    M (float) M (double)
>
> -#define TEST_VALUE(I) ((I) * 5 / 2)
> +#define TEST_VALUE(I) ((I) * 17 / 2)
>
>  #define ADD_TEST(TYPE)                         \
>    void __attribute__((noinline, noclone))      \
> @@ -51,3 +51,5 @@ main (void)
>    FOR_EACH_TYPE (DO_TEST)
>    return 0;
>  }
> +
> +/* { dg-final { scan-tree-dump {flags: [^\n]*ARBITRARY\n} "vect" { target vect_int } } } */
> Index: gcc/testsuite/gcc.dg/vect/vect-alias-check-14.c
> ===================================================================
> --- /dev/null   2019-09-17 11:41:18.176664108 +0100
> +++ gcc/testsuite/gcc.dg/vect/vect-alias-check-14.c     2019-11-11 18:30:57.167147102 +0000
> @@ -0,0 +1,62 @@
> +#define N 200
> +#define M 4
> +
> +typedef signed char sc;
> +typedef unsigned char uc;
> +typedef signed short ss;
> +typedef unsigned short us;
> +typedef int si;
> +typedef unsigned int ui;
> +typedef signed long long sll;
> +typedef unsigned long long ull;
> +
> +#define FOR_EACH_TYPE(M) \
> +  M (sc) M (uc) \
> +  M (ss) M (us) \
> +  M (si) M (ui) \
> +  M (sll) M (ull) \
> +  M (float) M (double)
> +
> +#define TEST_VALUE(I) ((I) * 17 / 2)
> +
> +#define ADD_TEST(TYPE)                         \
> +  void __attribute__((noinline, noclone))      \
> +  test_##TYPE (TYPE *a, TYPE *b)               \
> +  {                                            \
> +    for (int i = 0; i < N; i += 2)             \
> +      {                                                \
> +       TYPE b0 = b[i + 0];                     \
> +       TYPE b1 = b[i + 1];                     \
> +       a[i + 0] = b0 + 2;                      \
> +       a[i + 1] = b1 + 3;                      \
> +      }                                                \
> +  }
> +
> +#define DO_TEST(TYPE)                                          \
> +  for (int j = 0; j < M; ++j)                                  \
> +    {                                                          \
> +      TYPE a[N + M];                                           \
> +      for (int i = 0; i < N + M; ++i)                          \
> +       a[i] = TEST_VALUE (i);                                  \
> +      test_##TYPE (a + j, a);                                  \
> +      for (int i = 0; i < N; i += 2)                           \
> +       {                                                       \
> +         TYPE base1 = j == 0 ? TEST_VALUE (i) : a[i];          \
> +         TYPE base2 = j <= 1 ? TEST_VALUE (i + 1) : a[i + 1];  \
> +         if (a[i + j] != (TYPE) (base1 + 2)                    \
> +             || a[i + j + 1] != (TYPE) (base2 + 3))            \
> +           __builtin_abort ();                                 \
> +       }                                                       \
> +    }
> +
> +FOR_EACH_TYPE (ADD_TEST)
> +
> +int
> +main (void)
> +{
> +  FOR_EACH_TYPE (DO_TEST)
> +  return 0;
> +}
> +
> +/* { dg-final { scan-tree-dump {flags: *WAR\n} "vect" { target vect_int } } } */
> +/* { dg-final { scan-tree-dump-not {flags: [^\n]*ARBITRARY\n} "vect" } } */
> Index: gcc/testsuite/gcc.dg/vect/vect-alias-check-15.c
> ===================================================================
> --- /dev/null   2019-09-17 11:41:18.176664108 +0100
> +++ gcc/testsuite/gcc.dg/vect/vect-alias-check-15.c     2019-11-11 18:30:57.167147102 +0000
> @@ -0,0 +1,59 @@
> +#define N 200
> +#define DIST 32
> +
> +typedef signed char sc;
> +typedef unsigned char uc;
> +typedef signed short ss;
> +typedef unsigned short us;
> +typedef int si;
> +typedef unsigned int ui;
> +typedef signed long long sll;
> +typedef unsigned long long ull;
> +
> +#define FOR_EACH_TYPE(M) \
> +  M (sc) M (uc) \
> +  M (ss) M (us) \
> +  M (si) M (ui) \
> +  M (sll) M (ull) \
> +  M (float) M (double)
> +
> +#define ADD_TEST(TYPE)                         \
> +  void __attribute__((noinline, noclone))      \
> +  test_##TYPE (TYPE *x, TYPE *y)               \
> +  {                                            \
> +    for (int i = 0; i < N; ++i)                        \
> +      {                                                \
> +       x[i] = i;                               \
> +       y[i] = 42 - i * 2;                      \
> +      }                                                \
> +  }
> +
> +#define DO_TEST(TYPE)                                          \
> +  for (int i = 0; i < DIST * 2; ++i)                           \
> +    {                                                          \
> +      TYPE a[N + DIST * 2] = {};                               \
> +      test_##TYPE (a + DIST, a + i);                           \
> +      for (int j = 0; j < N + DIST * 2; ++j)                   \
> +       {                                                       \
> +         TYPE expected = 0;                                    \
> +         if (i > DIST && j >= i && j < i + N)                  \
> +           expected = 42 - (j - i) * 2;                        \
> +         if (j >= DIST && j < DIST + N)                        \
> +           expected = j - DIST;                                \
> +         if (i <= DIST && j >= i && j < i + N)                 \
> +           expected = 42 - (j - i) * 2;                        \
> +         if (expected != a[j])                                 \
> +           __builtin_abort ();                                 \
> +       }                                                       \
> +    }
> +
> +FOR_EACH_TYPE (ADD_TEST)
> +
> +int
> +main (void)
> +{
> +  FOR_EACH_TYPE (DO_TEST)
> +  return 0;
> +}
> +
> +/* { dg-final { scan-tree-dump {flags: *WAW\n} "vect" { target vect_int } } } */
> Index: gcc/testsuite/gcc.dg/vect/vect-alias-check-16.c
> ===================================================================
> --- /dev/null   2019-09-17 11:41:18.176664108 +0100
> +++ gcc/testsuite/gcc.dg/vect/vect-alias-check-16.c     2019-11-11 18:30:57.167147102 +0000
> @@ -0,0 +1,64 @@
> +#define N 200
> +#define DIST 32
> +
> +typedef signed char sc;
> +typedef unsigned char uc;
> +typedef signed short ss;
> +typedef unsigned short us;
> +typedef int si;
> +typedef unsigned int ui;
> +typedef signed long long sll;
> +typedef unsigned long long ull;
> +
> +#define FOR_EACH_TYPE(M) \
> +  M (sc) M (uc) \
> +  M (ss) M (us) \
> +  M (si) M (ui) \
> +  M (sll) M (ull) \
> +  M (float) M (double)
> +
> +#define TEST_VALUE(I) ((I) * 13 / 2)
> +
> +#define ADD_TEST(TYPE)                         \
> +  TYPE __attribute__((noinline, noclone))      \
> +  test_##TYPE (TYPE *x, TYPE *y)               \
> +  {                                            \
> +    TYPE res = 0;                              \
> +    for (int i = 0; i < N; ++i)                        \
> +      {                                                \
> +       x[i] = i;                               \
> +       res += y[i];                            \
> +      }                                                \
> +    return res;                                        \
> +  }
> +
> +#define DO_TEST(TYPE)                                          \
> +  for (int i = 0; i < DIST * 2; ++i)                           \
> +    {                                                          \
> +      TYPE a[N + DIST * 2];                                    \
> +      for (int j = 0; j < N + DIST * 2; ++j)                   \
> +       a[j] = TEST_VALUE (j);                                  \
> +      TYPE res = test_##TYPE (a + DIST, a + i);                        \
> +      for (int j = 0; j < N; ++j)                              \
> +       if (a[j + DIST] != (TYPE) j)                            \
> +         __builtin_abort ();                                   \
> +      TYPE expected_res = 0;                                   \
> +      for (int j = i; j < i + N; ++j)                          \
> +       if (i <= DIST && j >= DIST && j < DIST + N)             \
> +         expected_res += j - DIST;                             \
> +       else                                                    \
> +         expected_res += TEST_VALUE (j);                       \
> +      if (expected_res != res)                                 \
> +       __builtin_abort ();                                     \
> +    }
> +
> +FOR_EACH_TYPE (ADD_TEST)
> +
> +int
> +main (void)
> +{
> +  FOR_EACH_TYPE (DO_TEST)
> +  return 0;
> +}
> +
> +/* { dg-final { scan-tree-dump {flags: *RAW\n} "vect" { target vect_int } } } */
> Index: gcc/testsuite/gcc.dg/vect/vect-alias-check-17.c
> ===================================================================
> --- /dev/null   2019-09-17 11:41:18.176664108 +0100
> +++ gcc/testsuite/gcc.dg/vect/vect-alias-check-17.c     2019-11-11 18:30:57.167147102 +0000
> @@ -0,0 +1,13 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target vect_load_lanes } */
> +
> +struct s { int x[100]; };
> +
> +void
> +f (struct s *s1, int a, int b)
> +{
> +  for (int i = 0; i < 32; ++i)
> +    s1->x[a + i] = s1->x[b + i * 2] + s1->x[b + i * 3];
> +}
> +
> +/* { dg-final { scan-tree-dump {flags: *[^\n]*MIXED_STEPS} "vect" } } */

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [4/8] Record whether a dr_with_seg_len contains mixed steps
  2019-11-11 18:49 ` [4/8] Record whether a dr_with_seg_len contains mixed steps Richard Sandiford
@ 2019-11-15 11:08   ` Richard Biener
  0 siblings, 0 replies; 19+ messages in thread
From: Richard Biener @ 2019-11-15 11:08 UTC (permalink / raw)
  To: GCC Patches, Richard Sandiford

On Mon, Nov 11, 2019 at 7:48 PM Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> prune_runtime_alias_test_list can merge dr_with_seg_len_pair_ts that
> have different steps for the first reference or different steps for the
> second reference.  This patch adds a flag to record that.
>
> I don't know whether the change to create_intersect_range_checks_index
> fixes anything in practice.  It would have to be a corner case if so,
> since at present we only merge two alias pairs if either the first or
> the second references are identical and only the other references differ.
> And the vectoriser uses VF-based segment lengths only if both references
> in a pair have the same step.  Either way, it still seems wrong to use
> DR_STEP when it doesn't represent all checks that have been merged into
> the pair.

OK.

>
> 2019-11-11  Richard Sandiford  <richard.sandiford@arm.com>
>
> gcc/
>         * tree-data-ref.h (DR_ALIAS_MIXED_STEPS): New flag.
>         * tree-data-ref.c (prune_runtime_alias_test_list): Set it when
>         merging data references with different steps.
>         (create_intersect_range_checks_index): Take a
>         dr_with_seg_len_pair_t instead of two dr_with_seg_lens.
>         Bail out if DR_ALIAS_MIXED_STEPS is set.
>         (create_intersect_range_checks): Take a dr_with_seg_len_pair_t
>         instead of two dr_with_seg_lens.  Update call to
>         create_intersect_range_checks_index.
>         (create_runtime_alias_checks): Update call accordingly.
>
> Index: gcc/tree-data-ref.h
> ===================================================================
> --- gcc/tree-data-ref.h 2019-11-11 18:30:50.527193443 +0000
> +++ gcc/tree-data-ref.h 2019-11-11 18:30:53.863170161 +0000
> @@ -250,6 +250,12 @@ typedef struct data_reference *data_refe
>         Temporary flags that indicate whether there is a pair P whose
>         DRs have or haven't been swapped around.
>
> +   DR_ALIAS_MIXED_STEPS:
> +       The DR_STEP for one of the data references in the pair does not
> +       accurately describe that reference for all members of P.  (Note
> +       that the flag does not say anything about whether the DR_STEPs
> +       of the two references in the pair are the same.)
> +
>     The ordering assumption mentioned above is that for every pair
>     (DR_A, DR_B) in P:
>
> @@ -287,6 +293,7 @@ const unsigned int DR_ALIAS_WAW = 1U <<
>  const unsigned int DR_ALIAS_ARBITRARY = 1U << 3;
>  const unsigned int DR_ALIAS_SWAPPED = 1U << 4;
>  const unsigned int DR_ALIAS_UNSWAPPED = 1U << 5;
> +const unsigned int DR_ALIAS_MIXED_STEPS = 1U << 6;
>
>  /* This struct contains two dr_with_seg_len objects with aliasing data
>     refs.  Two comparisons are generated from them.  */
> Index: gcc/tree-data-ref.c
> ===================================================================
> --- gcc/tree-data-ref.c 2019-11-11 18:30:50.527193443 +0000
> +++ gcc/tree-data-ref.c 2019-11-11 18:30:53.863170161 +0000
> @@ -1618,6 +1618,11 @@ prune_runtime_alias_test_list (vec<dr_wi
>               std::swap (init_a1, init_a2);
>             }
>
> +         /* The DR_Bs are equal, so only the DR_As can introduce
> +            mixed steps.  */
> +         if (!operand_equal_p (DR_STEP (dr_a1->dr), DR_STEP (dr_a2->dr), 0))
> +           alias_pair1->flags |= DR_ALIAS_MIXED_STEPS;
> +
>           if (new_seg_len_p)
>             {
>               dr_a1->seg_len = build_int_cst (TREE_TYPE (dr_a1->seg_len),
> @@ -1663,11 +1668,14 @@ prune_runtime_alias_test_list (vec<dr_wi
>      }
>  }
>
> -/* Given LOOP's two data references and segment lengths described by DR_A
> -   and DR_B, create expression checking if the two addresses ranges intersect
> -   with each other based on index of the two addresses.  This can only be
> -   done if DR_A and DR_B referring to the same (array) object and the index
> -   is the only difference.  For example:
> +/* Try to generate a runtime condition that is true if ALIAS_PAIR is
> +   free of aliases, using a condition based on index values instead
> +   of a condition based on addresses.  Return true on success,
> +   storing the condition in *COND_EXPR.
> +
> +   This can only be done if the two data references in ALIAS_PAIR access
> +   the same array object and the index is the only difference.  For example,
> +   if the two data references are DR_A and DR_B:
>
>                         DR_A                           DR_B
>        data-ref         arr[i]                         arr[j]
> @@ -1690,10 +1698,12 @@ prune_runtime_alias_test_list (vec<dr_wi
>
>  static bool
>  create_intersect_range_checks_index (class loop *loop, tree *cond_expr,
> -                                    const dr_with_seg_len& dr_a,
> -                                    const dr_with_seg_len& dr_b)
> +                                    const dr_with_seg_len_pair_t &alias_pair)
>  {
> -  if (integer_zerop (DR_STEP (dr_a.dr))
> +  const dr_with_seg_len &dr_a = alias_pair.first;
> +  const dr_with_seg_len &dr_b = alias_pair.second;
> +  if ((alias_pair.flags & DR_ALIAS_MIXED_STEPS)
> +      || integer_zerop (DR_STEP (dr_a.dr))
>        || integer_zerop (DR_STEP (dr_b.dr))
>        || DR_NUM_DIMENSIONS (dr_a.dr) != DR_NUM_DIMENSIONS (dr_b.dr))
>      return false;
> @@ -1915,24 +1925,26 @@ get_segment_min_max (const dr_with_seg_l
>    *seg_max_out = fold_build_pointer_plus (addr_base, max_reach);
>  }
>
> -/* Given two data references and segment lengths described by DR_A and DR_B,
> -   create expression checking if the two addresses ranges intersect with
> -   each other:
> +/* Generate a runtime condition that is true if ALIAS_PAIR is free of aliases,
> +   storing the condition in *COND_EXPR.  The fallback is to generate a
> +   a test that the two accesses do not overlap:
>
> -     ((DR_A_addr_0 + DR_A_segment_length_0) <= DR_B_addr_0)
> -     || (DR_B_addr_0 + DER_B_segment_length_0) <= DR_A_addr_0))  */
> +     end_a <= start_b || end_b <= start_a.  */
>
>  static void
>  create_intersect_range_checks (class loop *loop, tree *cond_expr,
> -                              const dr_with_seg_len& dr_a,
> -                              const dr_with_seg_len& dr_b)
> +                              const dr_with_seg_len_pair_t &alias_pair)
>  {
> +  const dr_with_seg_len& dr_a = alias_pair.first;
> +  const dr_with_seg_len& dr_b = alias_pair.second;
>    *cond_expr = NULL_TREE;
> -  if (create_intersect_range_checks_index (loop, cond_expr, dr_a, dr_b))
> +  if (create_intersect_range_checks_index (loop, cond_expr, alias_pair))
>      return;
>
>    unsigned HOST_WIDE_INT min_align;
>    tree_code cmp_code;
> +  /* We don't have to check DR_ALIAS_MIXED_STEPS here, since both versions
> +     are equivalent.  This is just an optimization heuristic.  */
>    if (TREE_CODE (DR_STEP (dr_a.dr)) == INTEGER_CST
>        && TREE_CODE (DR_STEP (dr_b.dr)) == INTEGER_CST)
>      {
> @@ -1989,18 +2001,19 @@ create_runtime_alias_checks (class loop
>    tree part_cond_expr;
>
>    fold_defer_overflow_warnings ();
> -  for (size_t i = 0, s = alias_pairs->length (); i < s; ++i)
> +  dr_with_seg_len_pair_t *alias_pair;
> +  unsigned int i;
> +  FOR_EACH_VEC_ELT (*alias_pairs, i, alias_pair)
>      {
> -      const dr_with_seg_len& dr_a = (*alias_pairs)[i].first;
> -      const dr_with_seg_len& dr_b = (*alias_pairs)[i].second;
> -
> +      gcc_assert (alias_pair->flags);
>        if (dump_enabled_p ())
>         dump_printf (MSG_NOTE,
>                      "create runtime check for data references %T and %T\n",
> -                    DR_REF (dr_a.dr), DR_REF (dr_b.dr));
> +                    DR_REF (alias_pair->first.dr),
> +                    DR_REF (alias_pair->second.dr));
>
>        /* Create condition expression for each pair data references.  */
> -      create_intersect_range_checks (loop, &part_cond_expr, dr_a, dr_b);
> +      create_intersect_range_checks (loop, &part_cond_expr, *alias_pair);
>        if (*cond_expr)
>         *cond_expr = fold_build2 (TRUTH_AND_EXPR, boolean_type_node,
>                                   *cond_expr, part_cond_expr);

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [6/8] Print the type of alias check in a dump message
  2019-11-11 18:51 ` [6/8] Print the type of alias check in a dump message Richard Sandiford
@ 2019-11-15 11:09   ` Richard Biener
  0 siblings, 0 replies; 19+ messages in thread
From: Richard Biener @ 2019-11-15 11:09 UTC (permalink / raw)
  To: GCC Patches, Richard Sandiford

On Mon, Nov 11, 2019 at 7:50 PM Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> This patch prints a message to say how an alias check is being
> implemented.

OK.

>
> 2019-11-11  Richard Sandiford  <richard.sandiford@arm.com>
>
> gcc/
>         * tree-data-ref.c (create_intersect_range_checks_index)
>         (create_intersect_range_checks): Print dump messages.
>
> gcc/testsuite/
>         * gcc.dg/vect/vect-alias-check-1.c: Test for the type of alias check.
>         * gcc.dg/vect/vect-alias-check-8.c: Likewise.
>         * gcc.dg/vect/vect-alias-check-9.c: Likewise.
>         * gcc.dg/vect/vect-alias-check-10.c: Likewise.
>         * gcc.dg/vect/vect-alias-check-11.c: Likewise.
>         * gcc.dg/vect/vect-alias-check-12.c: Likewise.
>         * gcc.dg/vect/vect-alias-check-13.c: Likewise.
>         * gcc.dg/vect/vect-alias-check-14.c: Likewise.
>         * gcc.dg/vect/vect-alias-check-15.c: Likewise.
>         * gcc.dg/vect/vect-alias-check-16.c: Likewise.
>         * gcc.dg/vect/vect-alias-check-17.c: Likewise.
>
> Index: gcc/tree-data-ref.c
> ===================================================================
> --- gcc/tree-data-ref.c 2019-11-11 18:30:57.167147102 +0000
> +++ gcc/tree-data-ref.c 2019-11-11 18:31:01.263118514 +0000
> @@ -1890,6 +1890,8 @@ create_intersect_range_checks_index (cla
>        else
>         *cond_expr = part_cond_expr;
>      }
> +  if (dump_enabled_p ())
> +    dump_printf (MSG_NOTE, "using an index-based overlap test\n");
>    return true;
>  }
>
> @@ -2037,6 +2039,8 @@ create_intersect_range_checks (class loo
>      = fold_build2 (TRUTH_OR_EXPR, boolean_type_node,
>         fold_build2 (cmp_code, boolean_type_node, seg_a_max, seg_b_min),
>         fold_build2 (cmp_code, boolean_type_node, seg_b_max, seg_a_min));
> +  if (dump_enabled_p ())
> +    dump_printf (MSG_NOTE, "using an address-based overlap test\n");
>  }
>
>  /* Create a conditional expression that represents the run-time checks for
> Index: gcc/testsuite/gcc.dg/vect/vect-alias-check-1.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/vect/vect-alias-check-1.c      2019-03-08 18:15:02.304871094 +0000
> +++ gcc/testsuite/gcc.dg/vect/vect-alias-check-1.c      2019-11-11 18:31:01.247118627 +0000
> @@ -15,3 +15,5 @@ fn1 ()
>  }
>
>  /* { dg-final { scan-tree-dump "improved number of alias checks from \[0-9\]* to 1" "vect" } } */
> +/* { dg-final { scan-tree-dump "using an address-based overlap test" "vect" } } */
> +/* { dg-final { scan-tree-dump-not "using an index-based" "vect" } } */
> Index: gcc/testsuite/gcc.dg/vect/vect-alias-check-8.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/vect/vect-alias-check-8.c      2019-11-11 18:30:57.167147102 +0000
> +++ gcc/testsuite/gcc.dg/vect/vect-alias-check-8.c      2019-11-11 18:31:01.251118599 +0000
> @@ -60,3 +60,5 @@ main (void)
>  }
>
>  /* { dg-final { scan-tree-dump {flags: *WAR\n} "vect" { target vect_int } } } */
> +/* { dg-final { scan-tree-dump "using an index-based overlap test" "vect" } } */
> +/* { dg-final { scan-tree-dump-not "using an address-based" "vect" } } */
> Index: gcc/testsuite/gcc.dg/vect/vect-alias-check-9.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/vect/vect-alias-check-9.c      2019-11-11 18:30:57.167147102 +0000
> +++ gcc/testsuite/gcc.dg/vect/vect-alias-check-9.c      2019-11-11 18:31:01.251118599 +0000
> @@ -53,3 +53,5 @@ main (void)
>  }
>
>  /* { dg-final { scan-tree-dump {flags: [^\n]*ARBITRARY\n} "vect" { target vect_int } } } */
> +/* { dg-final { scan-tree-dump "using an address-based overlap test" "vect" } } */
> +/* { dg-final { scan-tree-dump-not "using an index-based" "vect" } } */
> Index: gcc/testsuite/gcc.dg/vect/vect-alias-check-10.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/vect/vect-alias-check-10.c     2019-03-08 18:15:02.248871307 +0000
> +++ gcc/testsuite/gcc.dg/vect/vect-alias-check-10.c     2019-11-11 18:31:01.251118599 +0000
> @@ -65,3 +65,6 @@ main (void)
>    FOR_EACH_TYPE (DO_TEST)
>    return 0;
>  }
> +
> +/* { dg-final { scan-tree-dump-not "using an address-based" "vect" } } */
> +/* { dg-final { scan-tree-dump-not "using an index-based" "vect" } } */
> Index: gcc/testsuite/gcc.dg/vect/vect-alias-check-11.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/vect/vect-alias-check-11.c     2019-03-08 18:15:02.292871138 +0000
> +++ gcc/testsuite/gcc.dg/vect/vect-alias-check-11.c     2019-11-11 18:31:01.251118599 +0000
> @@ -95,3 +95,6 @@ main (void)
>  /* { dg-final { scan-tree-dump {no alias between [^\n]* when [^\n]* step[^ ]* \* 8[)]* is outside \(-24, 24\)} "vect" { target vect_double } } } */
>  /* { dg-final { scan-tree-dump {no alias between [^\n]* when [^\n]* step[^ ]* \* 8[)]* is outside \(-32, 32\)} "vect" { target vect_double } } } */
>  /* { dg-final { scan-tree-dump {run-time check [^\n]* abs \([^*]* \* 8[)]* >= 32} "vect" { target vect_double } } } */
> +
> +/* { dg-final { scan-tree-dump-not "using an address-based" "vect" } } */
> +/* { dg-final { scan-tree-dump-not "using an index-based" "vect" } } */
> Index: gcc/testsuite/gcc.dg/vect/vect-alias-check-12.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/vect/vect-alias-check-12.c     2019-03-08 18:15:02.252871290 +0000
> +++ gcc/testsuite/gcc.dg/vect/vect-alias-check-12.c     2019-11-11 18:31:01.251118599 +0000
> @@ -95,3 +95,6 @@ main (void)
>  /* { dg-final { scan-tree-dump {no alias between [^\n]* when [^\n]* [_a-z][^ ]* \* 8[)]* is outside \[0, 24\)} "vect" { target vect_double } } } */
>  /* { dg-final { scan-tree-dump {no alias between [^\n]* when [^\n]* [_a-z][^ ]* \* 8[)]* is outside \[0, 32\)} "vect" { target vect_double } } } */
>  /* { dg-final { scan-tree-dump {run-time check [^\n]* unsigned \([^*]* \* 8[)]* >= 32} "vect" { target vect_double } } } */
> +
> +/* { dg-final { scan-tree-dump-not "using an address-based" "vect" } } */
> +/* { dg-final { scan-tree-dump-not "using an index-based" "vect" } } */
> Index: gcc/testsuite/gcc.dg/vect/vect-alias-check-13.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/vect/vect-alias-check-13.c     2019-03-08 18:15:02.296871124 +0000
> +++ gcc/testsuite/gcc.dg/vect/vect-alias-check-13.c     2019-11-11 18:31:01.251118599 +0000
> @@ -18,4 +18,6 @@ f2 (int *x, long step2, int n)
>
>  /* { dg-final { scan-tree-dump {need run-time check that [^\n]*step1[^\n]* is nonzero} "vect" } } */
>  /* { dg-final { scan-tree-dump-not {need run-time check that [^\n]*step2[^\n]* is nonzero} "vect" } } */
> +/* { dg-final { scan-tree-dump-not "using an address-based" "vect" } } */
> +/* { dg-final { scan-tree-dump-not "using an index-based" "vect" } } */
>  /* { dg-final { scan-tree-dump-times {LOOP VECTORIZED} 2 "vect" } } */
> Index: gcc/testsuite/gcc.dg/vect/vect-alias-check-14.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/vect/vect-alias-check-14.c     2019-11-11 18:30:57.167147102 +0000
> +++ gcc/testsuite/gcc.dg/vect/vect-alias-check-14.c     2019-11-11 18:31:01.251118599 +0000
> @@ -60,3 +60,5 @@ main (void)
>
>  /* { dg-final { scan-tree-dump {flags: *WAR\n} "vect" { target vect_int } } } */
>  /* { dg-final { scan-tree-dump-not {flags: [^\n]*ARBITRARY\n} "vect" } } */
> +/* { dg-final { scan-tree-dump "using an address-based overlap test" "vect" } } */
> +/* { dg-final { scan-tree-dump-not "using an index-based" "vect" } } */
> Index: gcc/testsuite/gcc.dg/vect/vect-alias-check-15.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/vect/vect-alias-check-15.c     2019-11-11 18:30:57.167147102 +0000
> +++ gcc/testsuite/gcc.dg/vect/vect-alias-check-15.c     2019-11-11 18:31:01.251118599 +0000
> @@ -57,3 +57,5 @@ main (void)
>  }
>
>  /* { dg-final { scan-tree-dump {flags: *WAW\n} "vect" { target vect_int } } } */
> +/* { dg-final { scan-tree-dump "using an address-based overlap test" "vect" } } */
> +/* { dg-final { scan-tree-dump-not "using an index-based" "vect" } } */
> Index: gcc/testsuite/gcc.dg/vect/vect-alias-check-16.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/vect/vect-alias-check-16.c     2019-11-11 18:30:57.167147102 +0000
> +++ gcc/testsuite/gcc.dg/vect/vect-alias-check-16.c     2019-11-11 18:31:01.251118599 +0000
> @@ -62,3 +62,5 @@ main (void)
>  }
>
>  /* { dg-final { scan-tree-dump {flags: *RAW\n} "vect" { target vect_int } } } */
> +/* { dg-final { scan-tree-dump "using an address-based overlap test" "vect" } } */
> +/* { dg-final { scan-tree-dump-not "using an index-based" "vect" } } */
> Index: gcc/testsuite/gcc.dg/vect/vect-alias-check-17.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/vect/vect-alias-check-17.c     2019-11-11 18:30:57.167147102 +0000
> +++ gcc/testsuite/gcc.dg/vect/vect-alias-check-17.c     2019-11-11 18:31:01.251118599 +0000
> @@ -11,3 +11,5 @@ f (struct s *s1, int a, int b)
>  }
>
>  /* { dg-final { scan-tree-dump {flags: *[^\n]*MIXED_STEPS} "vect" } } */
> +/* { dg-final { scan-tree-dump "using an address-based overlap test" "vect" } } */
> +/* { dg-final { scan-tree-dump-not "using an index-based" "vect" } } */

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [7/8] Use a single comparison for index-based alias checks
  2019-11-11 18:52 ` [7/8] Use a single comparison for index-based alias checks Richard Sandiford
@ 2019-11-15 11:12   ` Richard Biener
  0 siblings, 0 replies; 19+ messages in thread
From: Richard Biener @ 2019-11-15 11:12 UTC (permalink / raw)
  To: GCC Patches, Richard Sandiford

On Mon, Nov 11, 2019 at 7:51 PM Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> This patch rewrites the index-based alias checks to use conditions
> of the form:
>
>   (unsigned T) (a - b + bias) <= limit
>
> E.g. before the patch:
>
>   struct s { int x[100]; };
>
>   void
>   f1 (struct s *s1, int a, int b)
>   {
>     for (int i = 0; i < 32; ++i)
>       s1->x[i + a] += s1->x[i + b];
>   }
>
> used:
>
>         add     w3, w1, 3
>         cmp     w3, w2
>         add     w3, w2, 3
>         ccmp    w1, w3, 0, ge
>         ble     .L2
>
> whereas after the patch it uses:
>
>         sub     w3, w1, w2
>         add     w3, w3, 3
>         cmp     w3, 6
>         bls     .L2
>
> The patch also fixes the seg_len1 and seg_len2 negation for cases in
> which seg_len is a "negative unsigned" value narrower than 64 bits,
> like it is for 32-bit targets.  Previously we'd end up with values
> like 0xffffffff000000001 instead of 1.

OK.

>
> 2019-11-11  Richard Sandiford  <richard.sandiford@arm.com>
>
> gcc/
>         * tree-data-ref.c (create_intersect_range_checks_index): Rewrite
>         the index tests to have the form (unsigned T) (B - A + bias) <= limit.
>
> gcc/testsuite/
>         * gcc.dg/vect/vect-alias-check-18.c: New test.
>         * gcc.dg/vect/vect-alias-check-19.c: Likewise.
>         * gcc.dg/vect/vect-alias-check-20.c: Likewise.
>
> Index: gcc/tree-data-ref.c
> ===================================================================
> --- gcc/tree-data-ref.c 2019-11-11 18:31:01.263118514 +0000
> +++ gcc/tree-data-ref.c 2019-11-11 18:31:05.099091743 +0000
> @@ -1744,7 +1744,9 @@ prune_runtime_alias_test_list (vec<dr_wi
>
>     We can create expression based on index rather than address:
>
> -     (i_0 + 4 < j_0 || j_0 + 4 < i_0)
> +     (unsigned) (i_0 - j_0 + 3) <= 6
> +
> +   i.e. the indices are less than 4 apart.
>
>     Note evolution step of index needs to be considered in comparison.  */
>
> @@ -1781,15 +1783,8 @@ create_intersect_range_checks_index (cla
>    if (neg_step)
>      {
>        abs_step = -abs_step;
> -      seg_len1 = -seg_len1;
> -      seg_len2 = -seg_len2;
> -    }
> -  else
> -    {
> -      /* Include the access size in the length, so that we only have one
> -        tree addition below.  */
> -      seg_len1 += dr_a.access_size;
> -      seg_len2 += dr_b.access_size;
> +      seg_len1 = (-wi::to_poly_wide (dr_a.seg_len)).force_uhwi ();
> +      seg_len2 = (-wi::to_poly_wide (dr_b.seg_len)).force_uhwi ();
>      }
>
>    /* Infer the number of iterations with which the memory segment is accessed
> @@ -1803,16 +1798,13 @@ create_intersect_range_checks_index (cla
>        || !can_div_trunc_p (seg_len2 + abs_step - 1, abs_step, &niter_len2))
>      return false;
>
> -  poly_uint64 niter_access1 = 0, niter_access2 = 0;
> -  if (neg_step)
> -    {
> -      /* Divide each access size by the byte step, rounding up.  */
> -      if (!can_div_trunc_p (dr_a.access_size - abs_step - 1,
> -                           abs_step, &niter_access1)
> -         || !can_div_trunc_p (dr_b.access_size + abs_step - 1,
> -                              abs_step, &niter_access2))
> -       return false;
> -    }
> +  /* Divide each access size by the byte step, rounding up.  */
> +  poly_uint64 niter_access1, niter_access2;
> +  if (!can_div_trunc_p (dr_a.access_size + abs_step - 1,
> +                       abs_step, &niter_access1)
> +      || !can_div_trunc_p (dr_b.access_size + abs_step - 1,
> +                          abs_step, &niter_access2))
> +    return false;
>
>    unsigned int i;
>    for (i = 0; i < DR_NUM_DIMENSIONS (dr_a.dr); i++)
> @@ -1852,38 +1844,87 @@ create_intersect_range_checks_index (cla
>          index of data reference.  Like segment length, index length is
>          linear function of the number of iterations with index_step as
>          the coefficient, i.e, niter_len * idx_step.  */
> -      tree idx_len1 = fold_build2 (MULT_EXPR, TREE_TYPE (min1), idx_step,
> -                                  build_int_cst (TREE_TYPE (min1),
> -                                                 niter_len1));
> -      tree idx_len2 = fold_build2 (MULT_EXPR, TREE_TYPE (min2), idx_step,
> -                                  build_int_cst (TREE_TYPE (min2),
> -                                                 niter_len2));
> -      tree max1 = fold_build2 (PLUS_EXPR, TREE_TYPE (min1), min1, idx_len1);
> -      tree max2 = fold_build2 (PLUS_EXPR, TREE_TYPE (min2), min2, idx_len2);
> -      /* Adjust ranges for negative step.  */
> +      offset_int abs_idx_step = offset_int::from (wi::to_wide (idx_step),
> +                                                 SIGNED);
>        if (neg_step)
> -       {
> -         /* IDX_LEN1 and IDX_LEN2 are negative in this case.  */
> -         std::swap (min1, max1);
> -         std::swap (min2, max2);
> -
> -         /* As with the lengths just calculated, we've measured the access
> -            sizes in iterations, so multiply them by the index step.  */
> -         tree idx_access1
> -           = fold_build2 (MULT_EXPR, TREE_TYPE (min1), idx_step,
> -                          build_int_cst (TREE_TYPE (min1), niter_access1));
> -         tree idx_access2
> -           = fold_build2 (MULT_EXPR, TREE_TYPE (min2), idx_step,
> -                          build_int_cst (TREE_TYPE (min2), niter_access2));
> -
> -         /* MINUS_EXPR because the above values are negative.  */
> -         max1 = fold_build2 (MINUS_EXPR, TREE_TYPE (max1), max1, idx_access1);
> -         max2 = fold_build2 (MINUS_EXPR, TREE_TYPE (max2), max2, idx_access2);
> -       }
> -      tree part_cond_expr
> -       = fold_build2 (TRUTH_OR_EXPR, boolean_type_node,
> -           fold_build2 (LE_EXPR, boolean_type_node, max1, min2),
> -           fold_build2 (LE_EXPR, boolean_type_node, max2, min1));
> +       abs_idx_step = -abs_idx_step;
> +      poly_offset_int idx_len1 = abs_idx_step * niter_len1;
> +      poly_offset_int idx_len2 = abs_idx_step * niter_len2;
> +      poly_offset_int idx_access1 = abs_idx_step * niter_access1;
> +      poly_offset_int idx_access2 = abs_idx_step * niter_access2;
> +
> +      gcc_assert (known_ge (idx_len1, 0)
> +                 && known_ge (idx_len2, 0)
> +                 && known_ge (idx_access1, 0)
> +                 && known_ge (idx_access2, 0));
> +
> +      /* Each access has the following pattern, with lengths measured
> +        in units of INDEX:
> +
> +             <-- idx_len -->
> +             <--- A: -ve step --->
> +             +-----+-------+-----+-------+-----+
> +             | n-1 | ..... |  0  | ..... | n-1 |
> +             +-----+-------+-----+-------+-----+
> +                           <--- B: +ve step --->
> +                           <-- idx_len -->
> +                           |
> +                          min
> +
> +        where "n" is the number of scalar iterations covered by the segment
> +        and where each access spans idx_access units.
> +
> +        A is the range of bytes accessed when the step is negative,
> +        B is the range when the step is positive.
> +
> +        When checking for general overlap, we need to test whether
> +        the range:
> +
> +          [min1 + low_offset1, min2 + high_offset1 + idx_access1 - 1]
> +
> +        overlaps:
> +
> +          [min2 + low_offset2, min2 + high_offset2 + idx_access2 - 1]
> +
> +        where:
> +
> +           low_offsetN = +ve step ? 0 : -idx_lenN;
> +          high_offsetN = +ve step ? idx_lenN : 0;
> +
> +        This is equivalent to testing whether:
> +
> +          min1 + low_offset1 <= min2 + high_offset2 + idx_access2 - 1
> +          && min2 + low_offset2 <= min1 + high_offset1 + idx_access1 - 1
> +
> +        Converting this into a single test, there is an overlap if:
> +
> +          0 <= min2 - min1 + bias <= limit
> +
> +        where  bias = high_offset2 + idx_access2 - 1 - low_offset1
> +              limit = (high_offset1 - low_offset1 + idx_access1 - 1)
> +                    + (high_offset2 - low_offset2 + idx_access2 - 1)
> +         i.e. limit = idx_len1 + idx_access1 - 1 + idx_len2 + idx_access2 - 1
> +
> +        Combining the tests requires limit to be computable in an unsigned
> +        form of the index type; if it isn't, we fall back to the usual
> +        pointer-based checks.  */
> +      poly_offset_int limit = (idx_len1 + idx_access1 - 1
> +                              + idx_len2 + idx_access2 - 1);
> +      tree utype = unsigned_type_for (TREE_TYPE (min1));
> +      if (!wi::fits_to_tree_p (limit, utype))
> +       return false;
> +
> +      poly_offset_int low_offset1 = neg_step ? -idx_len1 : 0;
> +      poly_offset_int high_offset2 = neg_step ? 0 : idx_len2;
> +      poly_offset_int bias = high_offset2 + idx_access2 - 1 - low_offset1;
> +
> +      tree subject = fold_build2 (MINUS_EXPR, utype,
> +                                 fold_convert (utype, min2),
> +                                 fold_convert (utype, min1));
> +      subject = fold_build2 (PLUS_EXPR, utype, subject,
> +                            wide_int_to_tree (utype, bias));
> +      tree part_cond_expr = fold_build2 (GT_EXPR, boolean_type_node, subject,
> +                                        wide_int_to_tree (utype, limit));
>        if (*cond_expr)
>         *cond_expr = fold_build2 (TRUTH_AND_EXPR, boolean_type_node,
>                                   *cond_expr, part_cond_expr);
> Index: gcc/testsuite/gcc.dg/vect/vect-alias-check-18.c
> ===================================================================
> --- /dev/null   2019-09-17 11:41:18.176664108 +0100
> +++ gcc/testsuite/gcc.dg/vect/vect-alias-check-18.c     2019-11-11 18:31:05.099091743 +0000
> @@ -0,0 +1,64 @@
> +#define N 200
> +#define DIST 32
> +
> +typedef signed char sc;
> +typedef unsigned char uc;
> +typedef signed short ss;
> +typedef unsigned short us;
> +typedef int si;
> +typedef unsigned int ui;
> +typedef signed long long sll;
> +typedef unsigned long long ull;
> +
> +#define FOR_EACH_TYPE(M) \
> +  M (sc) M (uc) \
> +  M (ss) M (us) \
> +  M (si) M (ui) \
> +  M (sll) M (ull) \
> +  M (float) M (double)
> +
> +#define TEST_VALUE(I) ((I) * 11 / 2)
> +
> +#define ADD_TEST(TYPE)                         \
> +  TYPE a_##TYPE[N * 2];                                \
> +  void __attribute__((noinline, noclone))      \
> +  test_##TYPE (int x, int y)                   \
> +  {                                            \
> +    for (int i = 0; i < N; ++i)                        \
> +      a_##TYPE[x - i] += a_##TYPE[y - i];      \
> +  }
> +
> +#define DO_TEST(TYPE)                                          \
> +  for (int i = 0; i < DIST * 2; ++i)                           \
> +    {                                                          \
> +      for (int j = 0; j < N + DIST * 2; ++j)                   \
> +       a_##TYPE[j] = TEST_VALUE (j);                           \
> +      test_##TYPE (i + N - 1, DIST + N - 1);                   \
> +      for (int j = 0; j < N + DIST * 2; ++j)                   \
> +       {                                                       \
> +         TYPE expected;                                        \
> +         if (j < i || j >= i + N)                              \
> +           expected = TEST_VALUE (j);                          \
> +         else if (i >= DIST)                                   \
> +           expected = ((TYPE) TEST_VALUE (j)                   \
> +                       + (TYPE) TEST_VALUE (j + DIST - i));    \
> +         else                                                  \
> +           expected = ((TYPE) TEST_VALUE (j)                   \
> +                       + a_##TYPE[j + DIST - i]);              \
> +         if (expected != a_##TYPE[j])                          \
> +           __builtin_abort ();                                 \
> +       }                                                       \
> +    }
> +
> +FOR_EACH_TYPE (ADD_TEST)
> +
> +int
> +main (void)
> +{
> +  FOR_EACH_TYPE (DO_TEST)
> +  return 0;
> +}
> +
> +/* { dg-final { scan-tree-dump {flags: *WAR\n} "vect" { target vect_int } } } */
> +/* { dg-final { scan-tree-dump "using an index-based overlap test" "vect" } } */
> +/* { dg-final { scan-tree-dump-not "using an address-based" "vect" } } */
> Index: gcc/testsuite/gcc.dg/vect/vect-alias-check-19.c
> ===================================================================
> --- /dev/null   2019-09-17 11:41:18.176664108 +0100
> +++ gcc/testsuite/gcc.dg/vect/vect-alias-check-19.c     2019-11-11 18:31:05.099091743 +0000
> @@ -0,0 +1,62 @@
> +#define N 200
> +#define DIST 32
> +
> +typedef signed char sc;
> +typedef unsigned char uc;
> +typedef signed short ss;
> +typedef unsigned short us;
> +typedef int si;
> +typedef unsigned int ui;
> +typedef signed long long sll;
> +typedef unsigned long long ull;
> +
> +#define FOR_EACH_TYPE(M) \
> +  M (sc) M (uc) \
> +  M (ss) M (us) \
> +  M (si) M (ui) \
> +  M (sll) M (ull) \
> +  M (float) M (double)
> +
> +#define ADD_TEST(TYPE)                         \
> +  TYPE a_##TYPE[N * 2];                                \
> +  void __attribute__((noinline, noclone))      \
> +  test_##TYPE (int x, int y)                   \
> +  {                                            \
> +    for (int i = 0; i < N; ++i)                        \
> +      {                                                \
> +       a_##TYPE[i + x] = i;                    \
> +       a_##TYPE[i + y] = 42 - i * 2;           \
> +      }                                                \
> +  }
> +
> +#define DO_TEST(TYPE)                                          \
> +  for (int i = 0; i < DIST * 2; ++i)                           \
> +    {                                                          \
> +      __builtin_memset (a_##TYPE, 0, sizeof (a_##TYPE));       \
> +      test_##TYPE (DIST, i);                                   \
> +      for (int j = 0; j < N + DIST * 2; ++j)                   \
> +       {                                                       \
> +         TYPE expected = 0;                                    \
> +         if (i > DIST && j >= i && j < i + N)                  \
> +           expected = 42 - (j - i) * 2;                        \
> +         if (j >= DIST && j < DIST + N)                        \
> +           expected = j - DIST;                                \
> +         if (i <= DIST && j >= i && j < i + N)                 \
> +           expected = 42 - (j - i) * 2;                        \
> +         if (expected != a_##TYPE[j])                          \
> +           __builtin_abort ();                                 \
> +       }                                                       \
> +    }
> +
> +FOR_EACH_TYPE (ADD_TEST)
> +
> +int
> +main (void)
> +{
> +  FOR_EACH_TYPE (DO_TEST)
> +  return 0;
> +}
> +
> +/* { dg-final { scan-tree-dump {flags: *WAW\n} "vect" { target vect_int } } } */
> +/* { dg-final { scan-tree-dump "using an index-based overlap test" "vect" } } */
> +/* { dg-final { scan-tree-dump-not "using an address-based" "vect" } } */
> Index: gcc/testsuite/gcc.dg/vect/vect-alias-check-20.c
> ===================================================================
> --- /dev/null   2019-09-17 11:41:18.176664108 +0100
> +++ gcc/testsuite/gcc.dg/vect/vect-alias-check-20.c     2019-11-11 18:31:05.099091743 +0000
> @@ -0,0 +1,66 @@
> +#define N 200
> +#define DIST 32
> +
> +typedef signed char sc;
> +typedef unsigned char uc;
> +typedef signed short ss;
> +typedef unsigned short us;
> +typedef int si;
> +typedef unsigned int ui;
> +typedef signed long long sll;
> +typedef unsigned long long ull;
> +
> +#define FOR_EACH_TYPE(M) \
> +  M (sc) M (uc) \
> +  M (ss) M (us) \
> +  M (si) M (ui) \
> +  M (sll) M (ull) \
> +  M (float) M (double)
> +
> +#define TEST_VALUE(I) ((I) * 11 / 2)
> +
> +#define ADD_TEST(TYPE)                         \
> +  TYPE a_##TYPE[N * 2];                                \
> +  TYPE __attribute__((noinline, noclone))      \
> +  test_##TYPE (int x, int y)                   \
> +  {                                            \
> +    TYPE res = 0;                              \
> +    for (int i = 0; i < N; ++i)                        \
> +      {                                                \
> +       a_##TYPE[i + x] = i;                    \
> +       res += a_##TYPE[i + y];                 \
> +      }                                                \
> +    return res;                                        \
> +  }
> +
> +#define DO_TEST(TYPE)                                          \
> +  for (int i = 0; i < DIST * 2; ++i)                           \
> +    {                                                          \
> +      for (int j = 0; j < N + DIST * 2; ++j)                   \
> +       a_##TYPE[j] = TEST_VALUE (j);                           \
> +      TYPE res = test_##TYPE (DIST, i);                                \
> +      for (int j = 0; j < N; ++j)                              \
> +       if (a_##TYPE[j + DIST] != (TYPE) j)                     \
> +         __builtin_abort ();                                   \
> +      TYPE expected_res = 0;                                   \
> +      for (int j = i; j < i + N; ++j)                          \
> +       if (i <= DIST && j >= DIST && j < DIST + N)             \
> +         expected_res += j - DIST;                             \
> +       else                                                    \
> +         expected_res += TEST_VALUE (j);                       \
> +      if (expected_res != res)                                 \
> +       __builtin_abort ();                                     \
> +    }
> +
> +FOR_EACH_TYPE (ADD_TEST)
> +
> +int
> +main (void)
> +{
> +  FOR_EACH_TYPE (DO_TEST)
> +  return 0;
> +}
> +
> +/* { dg-final { scan-tree-dump {flags: *RAW\n} "vect" { target vect_int } } } */
> +/* { dg-final { scan-tree-dump "using an index-based overlap test" "vect" } } */
> +/* { dg-final { scan-tree-dump-not "using an address-based" "vect" } } */

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [3/8] Add flags to dr_with_seg_len_pair_t
  2019-11-15 11:07   ` Richard Biener
@ 2019-11-15 11:37     ` Richard Sandiford
  2019-11-15 12:00       ` Richard Biener
  0 siblings, 1 reply; 19+ messages in thread
From: Richard Sandiford @ 2019-11-15 11:37 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Patches

Richard Biener <richard.guenther@gmail.com> writes:
> On Mon, Nov 11, 2019 at 7:47 PM Richard Sandiford
> <richard.sandiford@arm.com> wrote:
>>
>> This patch adds a bunch of flags to dr_with_seg_len_pair_t,
>> for use by later patches.  The update to tree-loop-distribution.c
>> is conservatively correct, but might be tweakable later.
>
> Does this all work with interleaved SLP loads/stores like
>
>   a[i] = b[i];
>   tem1 = b[i+1];
>   tem2 = b[i+2];
>   a[i+2] = tem2;
>   a[i+1] = tem1;
>
> where we don't preserve the scalar order but emit code at the
> latest stmt of the grouped access?

Yeah.  vect-alias-check-9.c is a less sophisticated example of that.
In both cases vect_preserves_scalar_order_p is false.

> That is, what does "preserve scalar oder" actually mean?

It's supposed to mean that if a memory access A from the first
group and a memory access B from the second group occur within
the same scalar iteration, the order of the accesses in the
vector loop body will be the same as it was in the scalar loop body.
In other words, the order of the vector stmts honours any dependencies
between the accesses that occur within one iteration of the scalar loop.

Thanks,
Richard

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [3/8] Add flags to dr_with_seg_len_pair_t
  2019-11-15 11:37     ` Richard Sandiford
@ 2019-11-15 12:00       ` Richard Biener
  0 siblings, 0 replies; 19+ messages in thread
From: Richard Biener @ 2019-11-15 12:00 UTC (permalink / raw)
  To: Richard Biener, GCC Patches, Richard Sandiford

On Fri, Nov 15, 2019 at 12:33 PM Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> Richard Biener <richard.guenther@gmail.com> writes:
> > On Mon, Nov 11, 2019 at 7:47 PM Richard Sandiford
> > <richard.sandiford@arm.com> wrote:
> >>
> >> This patch adds a bunch of flags to dr_with_seg_len_pair_t,
> >> for use by later patches.  The update to tree-loop-distribution.c
> >> is conservatively correct, but might be tweakable later.
> >
> > Does this all work with interleaved SLP loads/stores like
> >
> >   a[i] = b[i];
> >   tem1 = b[i+1];
> >   tem2 = b[i+2];
> >   a[i+2] = tem2;
> >   a[i+1] = tem1;
> >
> > where we don't preserve the scalar order but emit code at the
> > latest stmt of the grouped access?
>
> Yeah.  vect-alias-check-9.c is a less sophisticated example of that.
> In both cases vect_preserves_scalar_order_p is false.
>
> > That is, what does "preserve scalar oder" actually mean?
>
> It's supposed to mean that if a memory access A from the first
> group and a memory access B from the second group occur within
> the same scalar iteration, the order of the accesses in the
> vector loop body will be the same as it was in the scalar loop body.
> In other words, the order of the vector stmts honours any dependencies
> between the accesses that occur within one iteration of the scalar loop.

OK, I see.  The patch is OK then.

Thanks,
Richard.

> Thanks,
> Richard

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [8/8] Optimise WAR and WAW alias checks
  2019-11-11 19:02 ` [8/8] Optimise WAR and WAW " Richard Sandiford
@ 2019-11-18 11:05   ` Richard Biener
  0 siblings, 0 replies; 19+ messages in thread
From: Richard Biener @ 2019-11-18 11:05 UTC (permalink / raw)
  To: GCC Patches, Richard Sandiford

On Mon, Nov 11, 2019 at 7:52 PM Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> For:
>
>   void
>   f1 (int *x, int *y)
>   {
>     for (int i = 0; i < 32; ++i)
>       x[i] += y[i];
>   }
>
> we checked at runtime whether one vector at x would overlap one vector
> at y.  But in cases like this, the vector code would handle x <= y just
> fine, since any write to address A still happens after any read from
> address A.  The only problem is if x is ahead of y by less than a
> vector.
>
> The same is true for two writes:
>
>   void
>   f2 (int *x, int *y)
>   {
>     for (int i = 0; i < 32; ++i)
>       {
>         x[i] = i;
>         y[i] = 2;
>       }
>   }
>
> if y <= x then a vector write at y after a vector write at x would
> have the same net effect as the original scalar writes.
>
> This patch optimises the alias checks for these two cases.  E.g.,
> before the patch, f1 used:
>
>         add     x2, x0, 15
>         sub     x2, x2, x1
>         cmp     x2, 30
>         bls     .L2
>
> whereas after the patch it uses:
>
>         add     x2, x1, 4
>         sub     x2, x0, x2
>         cmp     x2, 8
>         bls     .L2
>
> Read-after-write cases like:
>
>   int
>   f3 (int *x, int *y)
>   {
>     int res = 0;
>     for (int i = 0; i < 32; ++i)
>       {
>         x[i] = i;
>         res += y[i];
>       }
>     return res;
>   }
>
> can cope with x == y, but otherwise don't allow overlap in either
> direction.  Since checking for x == y at runtime would require extra
> code, we're probably better off sticking with the current overlap test.
>
> An overlap test is also needed if the scalar or vector accesses covered
> by the alias check are mixed together, rather than all statements for
> the second access following all statements for the first access.
>
> The new code for gcc.target/aarch64/sve/var_strict_[135].c is slightly
> better than before.

OK.

Thanks,
Richard.

>
> 2019-11-11  Richard Sandiford  <richard.sandiford@arm.com>
>
> gcc/
>         * tree-data-ref.c (create_intersect_range_checks_index): If the
>         alias pair describes simple WAW and WAR dependencies, just check
>         whether the first B access overlaps later A accesses.
>         (create_waw_or_war_checks): New function that performs the same
>         optimization on addresses.
>         (create_intersect_range_checks): Call it.
>
> gcc/testsuite/
>         * gcc.dg/vect/vect-alias-check-8.c: Expect WAR/WAW checks to be used.
>         * gcc.dg/vect/vect-alias-check-14.c: Likewise.
>         * gcc.dg/vect/vect-alias-check-15.c: Likewise.
>         * gcc.dg/vect/vect-alias-check-18.c: Likewise.
>         * gcc.dg/vect/vect-alias-check-19.c: Likewise.
>         * gcc.target/aarch64/sve/var_stride_1.c: Update expected sequence.
>         * gcc.target/aarch64/sve/var_stride_2.c: Likewise.
>         * gcc.target/aarch64/sve/var_stride_3.c: Likewise.
>         * gcc.target/aarch64/sve/var_stride_5.c: Likewise.
>
> Index: gcc/tree-data-ref.c
> ===================================================================
> --- gcc/tree-data-ref.c 2019-11-11 18:32:12.000000000 +0000
> +++ gcc/tree-data-ref.c 2019-11-11 18:32:13.186616541 +0000
> @@ -1806,6 +1806,8 @@ create_intersect_range_checks_index (cla
>                            abs_step, &niter_access2))
>      return false;
>
> +  bool waw_or_war_p = (alias_pair.flags & ~(DR_ALIAS_WAR | DR_ALIAS_WAW)) == 0;
> +
>    unsigned int i;
>    for (i = 0; i < DR_NUM_DIMENSIONS (dr_a.dr); i++)
>      {
> @@ -1907,16 +1909,57 @@ create_intersect_range_checks_index (cla
>
>          Combining the tests requires limit to be computable in an unsigned
>          form of the index type; if it isn't, we fall back to the usual
> -        pointer-based checks.  */
> -      poly_offset_int limit = (idx_len1 + idx_access1 - 1
> -                              + idx_len2 + idx_access2 - 1);
> +        pointer-based checks.
> +
> +        We can do better if DR_B is a write and if DR_A and DR_B are
> +        well-ordered in both the original and the new code (see the
> +        comment above the DR_ALIAS_* flags for details).  In this case
> +        we know that for each i in [0, n-1], the write performed by
> +        access i of DR_B occurs after access numbers j<=i of DR_A in
> +        both the original and the new code.  Any write or anti
> +        dependencies wrt those DR_A accesses are therefore maintained.
> +
> +        We just need to make sure that each individual write in DR_B does not
> +        overlap any higher-indexed access in DR_A; such DR_A accesses happen
> +        after the DR_B access in the original code but happen before it in
> +        the new code.
> +
> +        We know the steps for both accesses are equal, so by induction, we
> +        just need to test whether the first write of DR_B overlaps a later
> +        access of DR_A.  In other words, we need to move min1 along by
> +        one iteration:
> +
> +          min1' = min1 + idx_step
> +
> +        and use the ranges:
> +
> +          [min1' + low_offset1', min1' + high_offset1' + idx_access1 - 1]
> +
> +        and:
> +
> +          [min2, min2 + idx_access2 - 1]
> +
> +        where:
> +
> +           low_offset1' = +ve step ? 0 : -(idx_len1 - |idx_step|)
> +          high_offset1' = +ve_step ? idx_len1 - |idx_step| : 0.  */
> +      if (waw_or_war_p)
> +       idx_len1 -= abs_idx_step;
> +
> +      poly_offset_int limit = idx_len1 + idx_access1 - 1 + idx_access2 - 1;
> +      if (!waw_or_war_p)
> +       limit += idx_len2;
> +
>        tree utype = unsigned_type_for (TREE_TYPE (min1));
>        if (!wi::fits_to_tree_p (limit, utype))
>         return false;
>
>        poly_offset_int low_offset1 = neg_step ? -idx_len1 : 0;
> -      poly_offset_int high_offset2 = neg_step ? 0 : idx_len2;
> +      poly_offset_int high_offset2 = neg_step || waw_or_war_p ? 0 : idx_len2;
>        poly_offset_int bias = high_offset2 + idx_access2 - 1 - low_offset1;
> +      /* Equivalent to adding IDX_STEP to MIN1.  */
> +      if (waw_or_war_p)
> +       bias -= wi::to_offset (idx_step);
>
>        tree subject = fold_build2 (MINUS_EXPR, utype,
>                                   fold_convert (utype, min2),
> @@ -1932,7 +1975,169 @@ create_intersect_range_checks_index (cla
>         *cond_expr = part_cond_expr;
>      }
>    if (dump_enabled_p ())
> -    dump_printf (MSG_NOTE, "using an index-based overlap test\n");
> +    {
> +      if (waw_or_war_p)
> +       dump_printf (MSG_NOTE, "using an index-based WAR/WAW test\n");
> +      else
> +       dump_printf (MSG_NOTE, "using an index-based overlap test\n");
> +    }
> +  return true;
> +}
> +
> +/* A subroutine of create_intersect_range_checks, with a subset of the
> +   same arguments.  Try to optimize cases in which the second access
> +   is a write and in which some overlap is valid.  */
> +
> +static bool
> +create_waw_or_war_checks (tree *cond_expr,
> +                         const dr_with_seg_len_pair_t &alias_pair)
> +{
> +  const dr_with_seg_len& dr_a = alias_pair.first;
> +  const dr_with_seg_len& dr_b = alias_pair.second;
> +
> +  /* Check for cases in which:
> +
> +     (a) DR_B is always a write;
> +     (b) the accesses are well-ordered in both the original and new code
> +        (see the comment above the DR_ALIAS_* flags for details); and
> +     (c) the DR_STEPs describe all access pairs covered by ALIAS_PAIR.  */
> +  if (alias_pair.flags & ~(DR_ALIAS_WAR | DR_ALIAS_WAW))
> +    return false;
> +
> +  /* Check for equal (but possibly variable) steps.  */
> +  tree step = DR_STEP (dr_a.dr);
> +  if (!operand_equal_p (step, DR_STEP (dr_b.dr)))
> +    return false;
> +
> +  /* Make sure that we can operate on sizetype without loss of precision.  */
> +  tree addr_type = TREE_TYPE (DR_BASE_ADDRESS (dr_a.dr));
> +  if (TYPE_PRECISION (addr_type) != TYPE_PRECISION (sizetype))
> +    return false;
> +
> +  /* All addresses involved are known to have a common alignment ALIGN.
> +     We can therefore subtract ALIGN from an exclusive endpoint to get
> +     an inclusive endpoint.  In the best (and common) case, ALIGN is the
> +     same as the access sizes of both DRs, and so subtracting ALIGN
> +     cancels out the addition of an access size.  */
> +  unsigned int align = MIN (dr_a.align, dr_b.align);
> +  poly_uint64 last_chunk_a = dr_a.access_size - align;
> +  poly_uint64 last_chunk_b = dr_b.access_size - align;
> +
> +  /* Get a boolean expression that is true when the step is negative.  */
> +  tree indicator = dr_direction_indicator (dr_a.dr);
> +  tree neg_step = fold_build2 (LT_EXPR, boolean_type_node,
> +                              fold_convert (ssizetype, indicator),
> +                              ssize_int (0));
> +
> +  /* Get lengths in sizetype.  */
> +  tree seg_len_a
> +    = fold_convert (sizetype, rewrite_to_non_trapping_overflow (dr_a.seg_len));
> +  step = fold_convert (sizetype, rewrite_to_non_trapping_overflow (step));
> +
> +  /* Each access has the following pattern:
> +
> +         <- |seg_len| ->
> +         <--- A: -ve step --->
> +         +-----+-------+-----+-------+-----+
> +         | n-1 | ..... |  0  | ..... | n-1 |
> +         +-----+-------+-----+-------+-----+
> +                       <--- B: +ve step --->
> +                       <- |seg_len| ->
> +                       |
> +                  base address
> +
> +     where "n" is the number of scalar iterations covered by the segment.
> +
> +     A is the range of bytes accessed when the step is negative,
> +     B is the range when the step is positive.
> +
> +     We know that DR_B is a write.  We also know (from checking that
> +     DR_A and DR_B are well-ordered) that for each i in [0, n-1],
> +     the write performed by access i of DR_B occurs after access numbers
> +     j<=i of DR_A in both the original and the new code.  Any write or
> +     anti dependencies wrt those DR_A accesses are therefore maintained.
> +
> +     We just need to make sure that each individual write in DR_B does not
> +     overlap any higher-indexed access in DR_A; such DR_A accesses happen
> +     after the DR_B access in the original code but happen before it in
> +     the new code.
> +
> +     We know the steps for both accesses are equal, so by induction, we
> +     just need to test whether the first write of DR_B overlaps a later
> +     access of DR_A.  In other words, we need to move addr_a along by
> +     one iteration:
> +
> +       addr_a' = addr_a + step
> +
> +     and check whether:
> +
> +       [addr_b, addr_b + last_chunk_b]
> +
> +     overlaps:
> +
> +       [addr_a' + low_offset_a, addr_a' + high_offset_a + last_chunk_a]
> +
> +     where [low_offset_a, high_offset_a] spans accesses [1, n-1].  I.e.:
> +
> +       low_offset_a = +ve step ? 0 : seg_len_a - step
> +       high_offset_a = +ve step ? seg_len_a - step : 0
> +
> +     This is equivalent to testing whether:
> +
> +       addr_a' + low_offset_a <= addr_b + last_chunk_b
> +       && addr_b <= addr_a' + high_offset_a + last_chunk_a
> +
> +     Converting this into a single test, there is an overlap if:
> +
> +       0 <= addr_b + last_chunk_b - addr_a' - low_offset_a <= limit
> +
> +     where limit = high_offset_a - low_offset_a + last_chunk_a + last_chunk_b
> +
> +     If DR_A is performed, limit + |step| - last_chunk_b is known to be
> +     less than the size of the object underlying DR_A.  We also know
> +     that last_chunk_b <= |step|; this is checked elsewhere if it isn't
> +     guaranteed at compile time.  There can therefore be no overflow if
> +     "limit" is calculated in an unsigned type with pointer precision.  */
> +  tree addr_a = fold_build_pointer_plus (DR_BASE_ADDRESS (dr_a.dr),
> +                                        DR_OFFSET (dr_a.dr));
> +  addr_a = fold_build_pointer_plus (addr_a, DR_INIT (dr_a.dr));
> +
> +  tree addr_b = fold_build_pointer_plus (DR_BASE_ADDRESS (dr_b.dr),
> +                                        DR_OFFSET (dr_b.dr));
> +  addr_b = fold_build_pointer_plus (addr_b, DR_INIT (dr_b.dr));
> +
> +  /* Advance ADDR_A by one iteration and adjust the length to compensate.  */
> +  addr_a = fold_build_pointer_plus (addr_a, step);
> +  tree seg_len_a_minus_step = fold_build2 (MINUS_EXPR, sizetype,
> +                                          seg_len_a, step);
> +  if (!CONSTANT_CLASS_P (seg_len_a_minus_step))
> +    seg_len_a_minus_step = build1 (SAVE_EXPR, sizetype, seg_len_a_minus_step);
> +
> +  tree low_offset_a = fold_build3 (COND_EXPR, sizetype, neg_step,
> +                                  seg_len_a_minus_step, size_zero_node);
> +  if (!CONSTANT_CLASS_P (low_offset_a))
> +    low_offset_a = build1 (SAVE_EXPR, sizetype, low_offset_a);
> +
> +  /* We could use COND_EXPR <neg_step, size_zero_node, seg_len_a_minus_step>,
> +     but it's usually more efficient to reuse the LOW_OFFSET_A result.  */
> +  tree high_offset_a = fold_build2 (MINUS_EXPR, sizetype, seg_len_a_minus_step,
> +                                   low_offset_a);
> +
> +  /* The amount added to addr_b - addr_a'.  */
> +  tree bias = fold_build2 (MINUS_EXPR, sizetype,
> +                          size_int (last_chunk_b), low_offset_a);
> +
> +  tree limit = fold_build2 (MINUS_EXPR, sizetype, high_offset_a, low_offset_a);
> +  limit = fold_build2 (PLUS_EXPR, sizetype, limit,
> +                      size_int (last_chunk_a + last_chunk_b));
> +
> +  tree subject = fold_build2 (POINTER_DIFF_EXPR, ssizetype, addr_b, addr_a);
> +  subject = fold_build2 (PLUS_EXPR, sizetype,
> +                        fold_convert (sizetype, subject), bias);
> +
> +  *cond_expr = fold_build2 (GT_EXPR, boolean_type_node, subject, limit);
> +  if (dump_enabled_p ())
> +    dump_printf (MSG_NOTE, "using an address-based WAR/WAW test\n");
>    return true;
>  }
>
> @@ -2036,6 +2241,9 @@ create_intersect_range_checks (class loo
>    if (create_intersect_range_checks_index (loop, cond_expr, alias_pair))
>      return;
>
> +  if (create_waw_or_war_checks (cond_expr, alias_pair))
> +    return;
> +
>    unsigned HOST_WIDE_INT min_align;
>    tree_code cmp_code;
>    /* We don't have to check DR_ALIAS_MIXED_STEPS here, since both versions
> Index: gcc/testsuite/gcc.dg/vect/vect-alias-check-8.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/vect/vect-alias-check-8.c      2019-11-11 18:32:12.000000000 +0000
> +++ gcc/testsuite/gcc.dg/vect/vect-alias-check-8.c      2019-11-11 18:32:13.186616541 +0000
> @@ -60,5 +60,5 @@ main (void)
>  }
>
>  /* { dg-final { scan-tree-dump {flags: *WAR\n} "vect" { target vect_int } } } */
> -/* { dg-final { scan-tree-dump "using an index-based overlap test" "vect" } } */
> +/* { dg-final { scan-tree-dump "using an index-based WAR/WAW test" "vect" } } */
>  /* { dg-final { scan-tree-dump-not "using an address-based" "vect" } } */
> Index: gcc/testsuite/gcc.dg/vect/vect-alias-check-14.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/vect/vect-alias-check-14.c     2019-11-11 18:32:12.000000000 +0000
> +++ gcc/testsuite/gcc.dg/vect/vect-alias-check-14.c     2019-11-11 18:32:13.186616541 +0000
> @@ -60,5 +60,5 @@ main (void)
>
>  /* { dg-final { scan-tree-dump {flags: *WAR\n} "vect" { target vect_int } } } */
>  /* { dg-final { scan-tree-dump-not {flags: [^\n]*ARBITRARY\n} "vect" } } */
> -/* { dg-final { scan-tree-dump "using an address-based overlap test" "vect" } } */
> +/* { dg-final { scan-tree-dump "using an address-based WAR/WAW test" "vect" } } */
>  /* { dg-final { scan-tree-dump-not "using an index-based" "vect" } } */
> Index: gcc/testsuite/gcc.dg/vect/vect-alias-check-15.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/vect/vect-alias-check-15.c     2019-11-11 18:32:12.000000000 +0000
> +++ gcc/testsuite/gcc.dg/vect/vect-alias-check-15.c     2019-11-11 18:32:13.186616541 +0000
> @@ -57,5 +57,5 @@ main (void)
>  }
>
>  /* { dg-final { scan-tree-dump {flags: *WAW\n} "vect" { target vect_int } } } */
> -/* { dg-final { scan-tree-dump "using an address-based overlap test" "vect" } } */
> +/* { dg-final { scan-tree-dump "using an address-based WAR/WAW test" "vect" } } */
>  /* { dg-final { scan-tree-dump-not "using an index-based" "vect" } } */
> Index: gcc/testsuite/gcc.dg/vect/vect-alias-check-18.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/vect/vect-alias-check-18.c     2019-11-11 18:32:12.000000000 +0000
> +++ gcc/testsuite/gcc.dg/vect/vect-alias-check-18.c     2019-11-11 18:32:13.186616541 +0000
> @@ -60,5 +60,5 @@ main (void)
>  }
>
>  /* { dg-final { scan-tree-dump {flags: *WAR\n} "vect" { target vect_int } } } */
> -/* { dg-final { scan-tree-dump "using an index-based overlap test" "vect" } } */
> +/* { dg-final { scan-tree-dump "using an index-based WAR/WAW test" "vect" } } */
>  /* { dg-final { scan-tree-dump-not "using an address-based" "vect" } } */
> Index: gcc/testsuite/gcc.dg/vect/vect-alias-check-19.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/vect/vect-alias-check-19.c     2019-11-11 18:32:12.000000000 +0000
> +++ gcc/testsuite/gcc.dg/vect/vect-alias-check-19.c     2019-11-11 18:32:13.186616541 +0000
> @@ -58,5 +58,5 @@ main (void)
>  }
>
>  /* { dg-final { scan-tree-dump {flags: *WAW\n} "vect" { target vect_int } } } */
> -/* { dg-final { scan-tree-dump "using an index-based overlap test" "vect" } } */
> +/* { dg-final { scan-tree-dump "using an index-based WAR/WAW test" "vect" } } */
>  /* { dg-final { scan-tree-dump-not "using an address-based" "vect" } } */
> Index: gcc/testsuite/gcc.target/aarch64/sve/var_stride_1.c
> ===================================================================
> --- gcc/testsuite/gcc.target/aarch64/sve/var_stride_1.c 2019-11-11 18:32:12.000000000 +0000
> +++ gcc/testsuite/gcc.target/aarch64/sve/var_stride_1.c 2019-11-11 18:32:13.186616541 +0000
> @@ -15,13 +15,9 @@ f (TYPE *x, TYPE *y, unsigned short n, l
>  /* { dg-final { scan-assembler {\tst1w\tz[0-9]+} } } */
>  /* { dg-final { scan-assembler {\tldr\tw[0-9]+} } } */
>  /* { dg-final { scan-assembler {\tstr\tw[0-9]+} } } */
> -/* Should multiply by (VF-1)*4 rather than (257-1)*4.  */
> -/* { dg-final { scan-assembler-not {, 1024} } } */
> -/* { dg-final { scan-assembler-not {\t.bfiz\t} } } */
> -/* { dg-final { scan-assembler-not {lsl[^\n]*[, ]10} } } */
> -/* { dg-final { scan-assembler-not {\tcmp\tx[0-9]+, 0} } } */
> -/* { dg-final { scan-assembler-not {\tcmp\tw[0-9]+, 0} } } */
> -/* { dg-final { scan-assembler-not {\tcsel\tx[0-9]+} } } */
> -/* Two range checks and a check for n being zero.  */
> -/* { dg-final { scan-assembler-times {\tcmp\t} 1 } } */
> -/* { dg-final { scan-assembler-times {\tccmp\t} 2 } } */
> +/* Should use a WAR check that multiplies by (VF-2)*4 rather than
> +   an overlap check that multiplies by (257-1)*4.  */
> +/* { dg-final { scan-assembler {\tcntb\t(x[0-9]+)\n.*\tsub\tx[0-9]+, \1, #8\n.*\tmul\tx[0-9]+,[^\n]*\1} } } */
> +/* One range check and a check for n being zero.  */
> +/* { dg-final { scan-assembler-times {\t(?:cmp|tst)\t} 1 } } */
> +/* { dg-final { scan-assembler-times {\tccmp\t} 1 } } */
> Index: gcc/testsuite/gcc.target/aarch64/sve/var_stride_2.c
> ===================================================================
> --- gcc/testsuite/gcc.target/aarch64/sve/var_stride_2.c 2019-11-11 18:32:12.000000000 +0000
> +++ gcc/testsuite/gcc.target/aarch64/sve/var_stride_2.c 2019-11-11 18:32:13.186616541 +0000
> @@ -15,7 +15,7 @@ f (TYPE *x, TYPE *y, unsigned short n, u
>  /* { dg-final { scan-assembler {\tst1w\tz[0-9]+} } } */
>  /* { dg-final { scan-assembler {\tldr\tw[0-9]+} } } */
>  /* { dg-final { scan-assembler {\tstr\tw[0-9]+} } } */
> -/* Should multiply by (257-1)*4 rather than (VF-1)*4.  */
> +/* Should multiply by (257-1)*4 rather than (VF-1)*4 or (VF-2)*4.  */
>  /* { dg-final { scan-assembler-times {\tubfiz\tx[0-9]+, x2, 10, 16\n} 1 } } */
>  /* { dg-final { scan-assembler-times {\tubfiz\tx[0-9]+, x3, 10, 16\n} 1 } } */
>  /* { dg-final { scan-assembler-not {\tcmp\tx[0-9]+, 0} } } */
> Index: gcc/testsuite/gcc.target/aarch64/sve/var_stride_3.c
> ===================================================================
> --- gcc/testsuite/gcc.target/aarch64/sve/var_stride_3.c 2019-11-11 18:32:12.000000000 +0000
> +++ gcc/testsuite/gcc.target/aarch64/sve/var_stride_3.c 2019-11-11 18:32:13.186616541 +0000
> @@ -15,13 +15,10 @@ f (TYPE *x, TYPE *y, int n, long m __att
>  /* { dg-final { scan-assembler {\tst1w\tz[0-9]+} } } */
>  /* { dg-final { scan-assembler {\tldr\tw[0-9]+} } } */
>  /* { dg-final { scan-assembler {\tstr\tw[0-9]+} } } */
> -/* Should multiply by (VF-1)*4 rather than (257-1)*4.  */
> -/* { dg-final { scan-assembler-not {, 1024} } } */
> -/* { dg-final { scan-assembler-not {\t.bfiz\t} } } */
> -/* { dg-final { scan-assembler-not {lsl[^\n]*[, ]10} } } */
> -/* { dg-final { scan-assembler-not {\tcmp\tx[0-9]+, 0} } } */
> -/* { dg-final { scan-assembler {\tcmp\tw2, 0} } } */
> -/* { dg-final { scan-assembler-times {\tcsel\tx[0-9]+} 2 } } */
> -/* Two range checks and a check for n being zero.  */
> -/* { dg-final { scan-assembler {\tcmp\t} } } */
> -/* { dg-final { scan-assembler-times {\tccmp\t} 2 } } */
> +/* Should use a WAR check that multiplies by (VF-2)*4 rather than
> +   an overlap check that multiplies by (257-1)*4.  */
> +/* { dg-final { scan-assembler {\tcntb\t(x[0-9]+)\n.*\tsub\tx[0-9]+, \1, #8\n.*\tmul\tx[0-9]+,[^\n]*\1} } } */
> +/* { dg-final { scan-assembler-times {\tcsel\tx[0-9]+[^\n]*xzr} 1 } } */
> +/* One range check and a check for n being zero.  */
> +/* { dg-final { scan-assembler-times {\tcmp\t} 1 } } */
> +/* { dg-final { scan-assembler-times {\tccmp\t} 1 } } */
> Index: gcc/testsuite/gcc.target/aarch64/sve/var_stride_5.c
> ===================================================================
> --- gcc/testsuite/gcc.target/aarch64/sve/var_stride_5.c 2019-11-11 18:32:12.000000000 +0000
> +++ gcc/testsuite/gcc.target/aarch64/sve/var_stride_5.c 2019-11-11 18:32:13.186616541 +0000
> @@ -15,13 +15,10 @@ f (TYPE *x, TYPE *y, long n, long m __at
>  /* { dg-final { scan-assembler {\tst1d\tz[0-9]+} } } */
>  /* { dg-final { scan-assembler {\tldr\td[0-9]+} } } */
>  /* { dg-final { scan-assembler {\tstr\td[0-9]+} } } */
> -/* Should multiply by (VF-1)*8 rather than (257-1)*8.  */
> -/* { dg-final { scan-assembler-not {, 2048} } } */
> -/* { dg-final { scan-assembler-not {\t.bfiz\t} } } */
> -/* { dg-final { scan-assembler-not {lsl[^\n]*[, ]11} } } */
> -/* { dg-final { scan-assembler {\tcmp\tx[0-9]+, 0} } } */
> -/* { dg-final { scan-assembler-not {\tcmp\tw[0-9]+, 0} } } */
> -/* { dg-final { scan-assembler-times {\tcsel\tx[0-9]+} 2 } } */
> -/* Two range checks and a check for n being zero.  */
> -/* { dg-final { scan-assembler {\tcmp\t} } } */
> -/* { dg-final { scan-assembler-times {\tccmp\t} 2 } } */
> +/* Should use a WAR check that multiplies by (VF-2)*8 rather than
> +   an overlap check that multiplies by (257-1)*4.  */
> +/* { dg-final { scan-assembler {\tcntb\t(x[0-9]+)\n.*\tsub\tx[0-9]+, \1, #16\n.*\tmul\tx[0-9]+,[^\n]*\1} } } */
> +/* { dg-final { scan-assembler-times {\tcsel\tx[0-9]+[^\n]*xzr} 1 } } */
> +/* One range check and a check for n being zero.  */
> +/* { dg-final { scan-assembler-times {\tcmp\t} 1 } } */
> +/* { dg-final { scan-assembler-times {\tccmp\t} 1 } } */

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2019-11-18 11:04 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-11-11 18:46 [0/8] Improve vector alias checks for WAR and WAW dependencies Richard Sandiford
2019-11-11 18:47 ` [1/8] Move canonicalisation of dr_with_seg_len_pair_ts Richard Sandiford
2019-11-15 11:01   ` Richard Biener
2019-11-11 18:47 ` [2/8] Delay swapping data refs in prune_runtime_alias_test_list Richard Sandiford
2019-11-15 11:06   ` Richard Biener
2019-11-11 18:48 ` [3/8] Add flags to dr_with_seg_len_pair_t Richard Sandiford
2019-11-15 11:07   ` Richard Biener
2019-11-15 11:37     ` Richard Sandiford
2019-11-15 12:00       ` Richard Biener
2019-11-11 18:49 ` [4/8] Record whether a dr_with_seg_len contains mixed steps Richard Sandiford
2019-11-15 11:08   ` Richard Biener
2019-11-11 18:50 ` [5/8] Dump the list of merged alias pairs Richard Sandiford
2019-11-15 11:08   ` Richard Biener
2019-11-11 18:51 ` [6/8] Print the type of alias check in a dump message Richard Sandiford
2019-11-15 11:09   ` Richard Biener
2019-11-11 18:52 ` [7/8] Use a single comparison for index-based alias checks Richard Sandiford
2019-11-15 11:12   ` Richard Biener
2019-11-11 19:02 ` [8/8] Optimise WAR and WAW " Richard Sandiford
2019-11-18 11:05   ` Richard Biener

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).