public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH][ivopts]: use affine_tree when comparing IVs during candidate selection [PR114932]
@ 2024-06-14 11:49 Tamar Christina
  2024-06-19 11:54 ` Richard Biener
  0 siblings, 1 reply; 8+ messages in thread
From: Tamar Christina @ 2024-06-14 11:49 UTC (permalink / raw)
  To: gcc-patches; +Cc: nd, rguenther, bin.cheng

[-- Attachment #1: Type: text/plain, Size: 10623 bytes --]

Hi All,

IVOPTS normally uses affine trees to perform comparisons between different IVs,
but these seem to have been missing in two key spots and instead normal tree
equivalencies used.

In some cases where we have a structural equivalence but not a signedness
equivalencies we end up generating both a signed and unsigned IV for the same
candidate.

This happens quite a lot with fortran but can also happen in C because this came
code is unable to figure out when one expression is a multiple of another.

As an example in the attached testcase we get:

Initial set of candidates:
  cost: 24 (complexity 3)
  reg_cost: 9
  cand_cost: 15
  cand_group_cost: 0 (complexity 3)
  candidates: 1, 6, 8
   group:0 --> iv_cand:6, cost=(0,1)
   group:1 --> iv_cand:1, cost=(0,0)
   group:2 --> iv_cand:8, cost=(0,1)
   group:3 --> iv_cand:8, cost=(0,1)
  invariant variables: 6
  invariant expressions: 1, 2

<Invariant Expressions>:
inv_expr 1:     stride.3_27 * 4
inv_expr 2:     (unsigned long) stride.3_27 * 4

These end up being used in the same group:

Group 1:
cand  cost    compl.  inv.expr.       inv.vars
1     0       0       NIL;    6
2     0       0       NIL;    6
3     0       0       NIL;    6

which ends up with IV opts picking the signed and unsigned IVs:

Improved to:
  cost: 24 (complexity 3)
  reg_cost: 9
  cand_cost: 15
  cand_group_cost: 0 (complexity 3)
  candidates: 1, 6, 8
   group:0 --> iv_cand:6, cost=(0,1)
   group:1 --> iv_cand:1, cost=(0,0)
   group:2 --> iv_cand:8, cost=(0,1)
   group:3 --> iv_cand:8, cost=(0,1)
  invariant variables: 6
  invariant expressions: 1, 2

and so generates the same IV as both signed and unsigned:

;;   basic block 21, loop depth 3, count 214748368 (estimated locally, freq 58.2545), maybe hot
;;    prev block 28, next block 31, flags: (NEW, REACHABLE, VISITED)
;;    pred:       28 [always]  count:23622320 (estimated locally, freq 6.4080) (FALLTHRU,EXECUTABLE)
;;                25 [always]  count:191126046 (estimated locally, freq 51.8465) (FALLTHRU,DFS_BACK,EXECUTABLE)
  # .MEM_66 = PHI <.MEM_34(28), .MEM_22(25)>
  # ivtmp.22_41 = PHI <0(28), ivtmp.22_82(25)>
  # ivtmp.26_51 = PHI <ivtmp.26_55(28), ivtmp.26_72(25)>
  # ivtmp.28_90 = PHI <ivtmp.28_99(28), ivtmp.28_98(25)>

...

;;   basic block 24, loop depth 3, count 214748366 (estimated locally, freq 58.2545), maybe hot
;;    prev block 22, next block 25, flags: (NEW, REACHABLE, VISITED)'
;;    pred:       22 [always]  count:95443719 (estimated locally, freq 25.8909) (FALLTHRU)                                                                                                                                                                                   ;;                21 [33.3% (guessed)]  count:71582790 (estimated locally, freq 19.4182) (TRUE_VALUE,EXECUTABLE)                                                                                                                                                             ;;                31 [33.3% (guessed)]  count:47721860 (estimated locally, freq 12.9455) (TRUE_VALUE,EXECUTABLE)                                                                                                                                                               # .MEM_22 = PHI <.MEM_44(22), .MEM_31(21), .MEM_79(31)>                                                                                                                                                                                                                      ivtmp.22_82 = ivtmp.22_41 + 1;                                                                                                                                                                                                                                               ivtmp.26_72 = ivtmp.26_51 + _80;                                                                                                                                                                                                                                             ivtmp.28_98 = ivtmp.28_90 + _39;  

These two IVs are always used as unsigned, so IV ops generates:

  _73 = stride.3_27 * 4;
  _80 = (unsigned long) _73;
  _54 = (unsigned long) stride.3_27;
  _39 = _54 * 4;

Which means that in e.g. exchange2 we generate a lot of duplicate code.

This is because candidate 6 and 8 are structurally equivalent but have different
signs.

This patch changes it so that if you have two IVs that are affine equivalent to
just pick one over the other.  IV already has code for this, so the patch just
uses affine trees instead of tree for the check.

With it we get:

<Invariant Expressions>:
inv_expr 1:     stride.3_27 * 4

<Group-candidate Costs>:
Group 0:
  cand  cost    compl.  inv.expr.       inv.vars
  5     0       2       NIL;    NIL;
  6     0       3       NIL;    NIL;

Group 1:
  cand  cost    compl.  inv.expr.       inv.vars
  1     0       0       NIL;    6
  2     0       0       NIL;    6
  3     0       0       NIL;    6
  4     0       0       NIL;    6

Initial set of candidates:
  cost: 16 (complexity 3)
  reg_cost: 6
  cand_cost: 10
  cand_group_cost: 0 (complexity 3)
  candidates: 1, 6
   group:0 --> iv_cand:6, cost=(0,3)
   group:1 --> iv_cand:1, cost=(0,0)
  invariant variables: 6
  invariant expressions: 1

The two patches together results in a 10% performance increase in exchange2 in
SPECCPU 2017 and a 4% reduction in binary size and a 5% improvement in compile
time. There's also a 5% performance improvement in fotonik3d and similar
reduction in binary size.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	PR tree-optimization/114932
	* tree-affine.cc (aff_combination_constant_multiple_p): Take zero
	offsets into account.
	* tree-ssa-loop-ivopts.cc (affine_compare_eq): New.
	(record_group_use): Use it.
	(constant_multiple_of): Also check equality under
	aff_combination_constant_multiple_p.

gcc/testsuite/ChangeLog:

	PR tree-optimization/114932
	* gfortran.dg/addressing-modes_2.f90: New test.

---
diff --git a/gcc/testsuite/gfortran.dg/addressing-modes_2.f90 b/gcc/testsuite/gfortran.dg/addressing-modes_2.f90
new file mode 100644
index 0000000000000000000000000000000000000000..8eee4be3dc4d69fecfacd4c2e24a4973c8539fae
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/addressing-modes_2.f90
@@ -0,0 +1,20 @@
+! { dg-do compile { target aarch64-*-* } }
+! { dg-additional-options "-w -Ofast -fdump-tree-ivopts-all" }
+
+module a
+integer, parameter :: b=3, c=b
+contains
+subroutine d(block)
+integer f, col   , block(:, :, :), e
+do f = 1, c
+   do col = 1, c
+             block(:f,                          :, e()) = do
+     end do
+  end do
+  end
+  end
+
+! { dg-final { scan-tree-dump-not {Selected IV set for loop .+ niters, 3 IVs:} ivopts } }
+! { dg-final { scan-tree-dump-times {Selected IV set for loop .+ niters, 2 IVs:} 1 ivopts } }
+! { dg-final { scan-tree-dump-times {Selected IV set for loop .+ niters, 1 IVs:} 1 ivopts } }
+
diff --git a/gcc/tree-affine.cc b/gcc/tree-affine.cc
index d6309c4390362b680f0aa97a41fac3281ade66fd..b141bf23c1bbea1001b1bb286346890ddeab4096 100644
--- a/gcc/tree-affine.cc
+++ b/gcc/tree-affine.cc
@@ -941,6 +941,13 @@ aff_combination_constant_multiple_p (aff_tree *val, aff_tree *div,
 				     &mult_set, mult))
     return false;
 
+  /* Everything is a multiple of 0, which means we shoudn't enforce that
+     mult_set is checked, since that forced the only valid multiple of
+     val and div to be 0 whereas 1 is also possible.  */
+  if (known_eq (val->offset, 0)
+      && known_eq (div->offset, 0))
+    mult_set = false;
+
   for (i = 0; i < div->n; i++)
     {
       class aff_comb_elt *elt
diff --git a/gcc/tree-ssa-loop-ivopts.cc b/gcc/tree-ssa-loop-ivopts.cc
index 7a277aaf18a9e0a32b8ac0d23332b7cd9945ef98..b11bd62a86092ba972a648764cd2facd9ddb4914 100644
--- a/gcc/tree-ssa-loop-ivopts.cc
+++ b/gcc/tree-ssa-loop-ivopts.cc
@@ -757,6 +757,19 @@ single_dom_exit (class loop *loop)
   return exit;
 }
 
+/* Compares the given affine tree LEFT with the tree expression RIGHT and return
+   whether they are the same under affine equality.  */
+
+static bool
+affine_compare_eq (aff_tree &left, tree right)
+{
+  aff_tree aff_right;
+  tree_to_aff_combination (right, TREE_TYPE (right), &aff_right);
+  aff_combination_scale (&aff_right, -1);
+  aff_combination_add (&aff_right, &left);
+  return aff_combination_zero_p (&aff_right);
+}
+
 
 /* Given a nested expression in ARG consisting of PLUS or MULT try to see if one
    of the arguments of each expression is a constant and that the type of the
@@ -1673,6 +1686,9 @@ record_group_use (struct ivopts_data *data, tree *use_p,
   tree addr_base = NULL;
   struct iv_group *group = NULL;
   poly_uint64 addr_offset = 0;
+  aff_tree iv_step, iv_addr_base;
+
+  tree_to_aff_combination (iv->step, TREE_TYPE (iv->step), &iv_step);
 
   /* Record non address type use in a new group.  */
   if (address_p (type))
@@ -1683,6 +1699,7 @@ record_group_use (struct ivopts_data *data, tree *use_p,
       tree addr_toffset;
       split_constant_offset (iv->base, &addr_base, &addr_toffset);
       addr_offset = int_cst_value (addr_toffset);
+      tree_to_aff_combination (addr_base, TREE_TYPE (addr_base), &iv_addr_base);
       for (i = 0; i < data->vgroups.length (); i++)
 	{
 	  struct iv_use *use;
@@ -1694,8 +1711,8 @@ record_group_use (struct ivopts_data *data, tree *use_p,
 
 	  /* Check if it has the same stripped base and step.  */
 	  if (operand_equal_p (iv->base_object, use->iv->base_object, 0)
-	      && operand_equal_p (iv->step, use->iv->step, 0)
-	      && operand_equal_p (addr_base, use->addr_base, 0))
+	      && affine_compare_eq (iv_step, use->iv->step)
+	      && affine_compare_eq (iv_addr_base, use->addr_base))
 	    break;
 	}
       if (i == data->vgroups.length ())
@@ -2231,6 +2248,14 @@ constant_multiple_of (tree top, tree bot, widest_int *mul)
       return true;
     }
 
+  aff_tree aff_top, aff_bot;
+  tree_to_aff_combination (top, TREE_TYPE (top), &aff_top);
+  tree_to_aff_combination (bot, TREE_TYPE (bot), &aff_bot);
+  poly_widest_int poly_mul;
+  if (aff_combination_constant_multiple_p (&aff_top, &aff_bot, &poly_mul)
+      && poly_mul.is_constant (mul))
+    return true;
+
   code = TREE_CODE (top);
   switch (code)
     {




-- 

[-- Attachment #2: rb18566.patch --]
[-- Type: text/x-diff, Size: 4281 bytes --]

diff --git a/gcc/testsuite/gfortran.dg/addressing-modes_2.f90 b/gcc/testsuite/gfortran.dg/addressing-modes_2.f90
new file mode 100644
index 0000000000000000000000000000000000000000..8eee4be3dc4d69fecfacd4c2e24a4973c8539fae
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/addressing-modes_2.f90
@@ -0,0 +1,20 @@
+! { dg-do compile { target aarch64-*-* } }
+! { dg-additional-options "-w -Ofast -fdump-tree-ivopts-all" }
+
+module a
+integer, parameter :: b=3, c=b
+contains
+subroutine d(block)
+integer f, col   , block(:, :, :), e
+do f = 1, c
+   do col = 1, c
+             block(:f,                          :, e()) = do
+     end do
+  end do
+  end
+  end
+
+! { dg-final { scan-tree-dump-not {Selected IV set for loop .+ niters, 3 IVs:} ivopts } }
+! { dg-final { scan-tree-dump-times {Selected IV set for loop .+ niters, 2 IVs:} 1 ivopts } }
+! { dg-final { scan-tree-dump-times {Selected IV set for loop .+ niters, 1 IVs:} 1 ivopts } }
+
diff --git a/gcc/tree-affine.cc b/gcc/tree-affine.cc
index d6309c4390362b680f0aa97a41fac3281ade66fd..b141bf23c1bbea1001b1bb286346890ddeab4096 100644
--- a/gcc/tree-affine.cc
+++ b/gcc/tree-affine.cc
@@ -941,6 +941,13 @@ aff_combination_constant_multiple_p (aff_tree *val, aff_tree *div,
 				     &mult_set, mult))
     return false;
 
+  /* Everything is a multiple of 0, which means we shoudn't enforce that
+     mult_set is checked, since that forced the only valid multiple of
+     val and div to be 0 whereas 1 is also possible.  */
+  if (known_eq (val->offset, 0)
+      && known_eq (div->offset, 0))
+    mult_set = false;
+
   for (i = 0; i < div->n; i++)
     {
       class aff_comb_elt *elt
diff --git a/gcc/tree-ssa-loop-ivopts.cc b/gcc/tree-ssa-loop-ivopts.cc
index 7a277aaf18a9e0a32b8ac0d23332b7cd9945ef98..b11bd62a86092ba972a648764cd2facd9ddb4914 100644
--- a/gcc/tree-ssa-loop-ivopts.cc
+++ b/gcc/tree-ssa-loop-ivopts.cc
@@ -757,6 +757,19 @@ single_dom_exit (class loop *loop)
   return exit;
 }
 
+/* Compares the given affine tree LEFT with the tree expression RIGHT and return
+   whether they are the same under affine equality.  */
+
+static bool
+affine_compare_eq (aff_tree &left, tree right)
+{
+  aff_tree aff_right;
+  tree_to_aff_combination (right, TREE_TYPE (right), &aff_right);
+  aff_combination_scale (&aff_right, -1);
+  aff_combination_add (&aff_right, &left);
+  return aff_combination_zero_p (&aff_right);
+}
+
 
 /* Given a nested expression in ARG consisting of PLUS or MULT try to see if one
    of the arguments of each expression is a constant and that the type of the
@@ -1673,6 +1686,9 @@ record_group_use (struct ivopts_data *data, tree *use_p,
   tree addr_base = NULL;
   struct iv_group *group = NULL;
   poly_uint64 addr_offset = 0;
+  aff_tree iv_step, iv_addr_base;
+
+  tree_to_aff_combination (iv->step, TREE_TYPE (iv->step), &iv_step);
 
   /* Record non address type use in a new group.  */
   if (address_p (type))
@@ -1683,6 +1699,7 @@ record_group_use (struct ivopts_data *data, tree *use_p,
       tree addr_toffset;
       split_constant_offset (iv->base, &addr_base, &addr_toffset);
       addr_offset = int_cst_value (addr_toffset);
+      tree_to_aff_combination (addr_base, TREE_TYPE (addr_base), &iv_addr_base);
       for (i = 0; i < data->vgroups.length (); i++)
 	{
 	  struct iv_use *use;
@@ -1694,8 +1711,8 @@ record_group_use (struct ivopts_data *data, tree *use_p,
 
 	  /* Check if it has the same stripped base and step.  */
 	  if (operand_equal_p (iv->base_object, use->iv->base_object, 0)
-	      && operand_equal_p (iv->step, use->iv->step, 0)
-	      && operand_equal_p (addr_base, use->addr_base, 0))
+	      && affine_compare_eq (iv_step, use->iv->step)
+	      && affine_compare_eq (iv_addr_base, use->addr_base))
 	    break;
 	}
       if (i == data->vgroups.length ())
@@ -2231,6 +2248,14 @@ constant_multiple_of (tree top, tree bot, widest_int *mul)
       return true;
     }
 
+  aff_tree aff_top, aff_bot;
+  tree_to_aff_combination (top, TREE_TYPE (top), &aff_top);
+  tree_to_aff_combination (bot, TREE_TYPE (bot), &aff_bot);
+  poly_widest_int poly_mul;
+  if (aff_combination_constant_multiple_p (&aff_top, &aff_bot, &poly_mul)
+      && poly_mul.is_constant (mul))
+    return true;
+
   code = TREE_CODE (top);
   switch (code)
     {




^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2024-06-25 13:21 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-06-14 11:49 [PATCH][ivopts]: use affine_tree when comparing IVs during candidate selection [PR114932] Tamar Christina
2024-06-19 11:54 ` Richard Biener
2024-06-19 14:26   ` Tamar Christina
2024-06-19 14:46     ` Michael Matz
2024-06-19 14:58       ` Tamar Christina
2024-06-20  7:48     ` Richard Biener
2024-06-24 13:30       ` Tamar Christina
2024-06-25 13:21         ` Richard Biener

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).