[PATCH][ivopts]: use affine_tree when comparing IVs during candidate selection [PR114932]

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

* [PATCH][ivopts]: use affine_tree when comparing IVs during candidate selection [PR114932]
@ 2024-06-14 11:49 Tamar Christina
  2024-06-19 11:54 ` Richard Biener
  0 siblings, 1 reply; 8+ messages in thread
From: Tamar Christina @ 2024-06-14 11:49 UTC (permalink / raw)
  To: gcc-patches; +Cc: nd, rguenther, bin.cheng

[-- Attachment #1: Type: text/plain, Size: 10623 bytes --]

Hi All,

IVOPTS normally uses affine trees to perform comparisons between different IVs,
but these seem to have been missing in two key spots and instead normal tree
equivalencies used.

In some cases where we have a structural equivalence but not a signedness
equivalencies we end up generating both a signed and unsigned IV for the same
candidate.

This happens quite a lot with fortran but can also happen in C because this came
code is unable to figure out when one expression is a multiple of another.

As an example in the attached testcase we get:

Initial set of candidates:
  cost: 24 (complexity 3)
  reg_cost: 9
  cand_cost: 15
  cand_group_cost: 0 (complexity 3)
  candidates: 1, 6, 8
   group:0 --> iv_cand:6, cost=(0,1)
   group:1 --> iv_cand:1, cost=(0,0)
   group:2 --> iv_cand:8, cost=(0,1)
   group:3 --> iv_cand:8, cost=(0,1)
  invariant variables: 6
  invariant expressions: 1, 2

<Invariant Expressions>:
inv_expr 1:     stride.3_27 * 4
inv_expr 2:     (unsigned long) stride.3_27 * 4

These end up being used in the same group:

Group 1:
cand  cost    compl.  inv.expr.       inv.vars
1     0       0       NIL;    6
2     0       0       NIL;    6
3     0       0       NIL;    6

which ends up with IV opts picking the signed and unsigned IVs:

Improved to:
  cost: 24 (complexity 3)
  reg_cost: 9
  cand_cost: 15
  cand_group_cost: 0 (complexity 3)
  candidates: 1, 6, 8
   group:0 --> iv_cand:6, cost=(0,1)
   group:1 --> iv_cand:1, cost=(0,0)
   group:2 --> iv_cand:8, cost=(0,1)
   group:3 --> iv_cand:8, cost=(0,1)
  invariant variables: 6
  invariant expressions: 1, 2

and so generates the same IV as both signed and unsigned:

;;   basic block 21, loop depth 3, count 214748368 (estimated locally, freq 58.2545), maybe hot
;;    prev block 28, next block 31, flags: (NEW, REACHABLE, VISITED)
;;    pred:       28 [always]  count:23622320 (estimated locally, freq 6.4080) (FALLTHRU,EXECUTABLE)
;;                25 [always]  count:191126046 (estimated locally, freq 51.8465) (FALLTHRU,DFS_BACK,EXECUTABLE)
  # .MEM_66 = PHI <.MEM_34(28), .MEM_22(25)>
  # ivtmp.22_41 = PHI <0(28), ivtmp.22_82(25)>
  # ivtmp.26_51 = PHI <ivtmp.26_55(28), ivtmp.26_72(25)>
  # ivtmp.28_90 = PHI <ivtmp.28_99(28), ivtmp.28_98(25)>

...

;;   basic block 24, loop depth 3, count 214748366 (estimated locally, freq 58.2545), maybe hot
;;    prev block 22, next block 25, flags: (NEW, REACHABLE, VISITED)'
;;    pred:       22 [always]  count:95443719 (estimated locally, freq 25.8909) (FALLTHRU)                                                                                                                                                                                   ;;                21 [33.3% (guessed)]  count:71582790 (estimated locally, freq 19.4182) (TRUE_VALUE,EXECUTABLE)                                                                                                                                                             ;;                31 [33.3% (guessed)]  count:47721860 (estimated locally, freq 12.9455) (TRUE_VALUE,EXECUTABLE)                                                                                                                                                               # .MEM_22 = PHI <.MEM_44(22), .MEM_31(21), .MEM_79(31)>                                                                                                                                                                                                                      ivtmp.22_82 = ivtmp.22_41 + 1;                                                                                                                                                                                                                                               ivtmp.26_72 = ivtmp.26_51 + _80;                                                                                                                                                                                                                                             ivtmp.28_98 = ivtmp.28_90 + _39;  

These two IVs are always used as unsigned, so IV ops generates:

  _73 = stride.3_27 * 4;
  _80 = (unsigned long) _73;
  _54 = (unsigned long) stride.3_27;
  _39 = _54 * 4;

Which means that in e.g. exchange2 we generate a lot of duplicate code.

This is because candidate 6 and 8 are structurally equivalent but have different
signs.

This patch changes it so that if you have two IVs that are affine equivalent to
just pick one over the other.  IV already has code for this, so the patch just
uses affine trees instead of tree for the check.

With it we get:

<Invariant Expressions>:
inv_expr 1:     stride.3_27 * 4

<Group-candidate Costs>:
Group 0:
  cand  cost    compl.  inv.expr.       inv.vars
  5     0       2       NIL;    NIL;
  6     0       3       NIL;    NIL;

Group 1:
  cand  cost    compl.  inv.expr.       inv.vars
  1     0       0       NIL;    6
  2     0       0       NIL;    6
  3     0       0       NIL;    6
  4     0       0       NIL;    6

Initial set of candidates:
  cost: 16 (complexity 3)
  reg_cost: 6
  cand_cost: 10
  cand_group_cost: 0 (complexity 3)
  candidates: 1, 6
   group:0 --> iv_cand:6, cost=(0,3)
   group:1 --> iv_cand:1, cost=(0,0)
  invariant variables: 6
  invariant expressions: 1

The two patches together results in a 10% performance increase in exchange2 in
SPECCPU 2017 and a 4% reduction in binary size and a 5% improvement in compile
time. There's also a 5% performance improvement in fotonik3d and similar
reduction in binary size.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

	PR tree-optimization/114932
	* tree-affine.cc (aff_combination_constant_multiple_p): Take zero
	offsets into account.
	* tree-ssa-loop-ivopts.cc (affine_compare_eq): New.
	(record_group_use): Use it.
	(constant_multiple_of): Also check equality under
	aff_combination_constant_multiple_p.

gcc/testsuite/ChangeLog:

	PR tree-optimization/114932
	* gfortran.dg/addressing-modes_2.f90: New test.

---
diff --git a/gcc/testsuite/gfortran.dg/addressing-modes_2.f90 b/gcc/testsuite/gfortran.dg/addressing-modes_2.f90
new file mode 100644
index 0000000000000000000000000000000000000000..8eee4be3dc4d69fecfacd4c2e24a4973c8539fae
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/addressing-modes_2.f90
@@ -0,0 +1,20 @@
+! { dg-do compile { target aarch64-*-* } }
+! { dg-additional-options "-w -Ofast -fdump-tree-ivopts-all" }
+
+module a
+integer, parameter :: b=3, c=b
+contains
+subroutine d(block)
+integer f, col   , block(:, :, :), e
+do f = 1, c
+   do col = 1, c
+             block(:f,                          :, e()) = do
+     end do
+  end do
+  end
+  end
+
+! { dg-final { scan-tree-dump-not {Selected IV set for loop .+ niters, 3 IVs:} ivopts } }
+! { dg-final { scan-tree-dump-times {Selected IV set for loop .+ niters, 2 IVs:} 1 ivopts } }
+! { dg-final { scan-tree-dump-times {Selected IV set for loop .+ niters, 1 IVs:} 1 ivopts } }
+
diff --git a/gcc/tree-affine.cc b/gcc/tree-affine.cc
index d6309c4390362b680f0aa97a41fac3281ade66fd..b141bf23c1bbea1001b1bb286346890ddeab4096 100644
--- a/gcc/tree-affine.cc
+++ b/gcc/tree-affine.cc
@@ -941,6 +941,13 @@ aff_combination_constant_multiple_p (aff_tree *val, aff_tree *div,
 				     &mult_set, mult))
     return false;
 
+  /* Everything is a multiple of 0, which means we shoudn't enforce that
+     mult_set is checked, since that forced the only valid multiple of
+     val and div to be 0 whereas 1 is also possible.  */
+  if (known_eq (val->offset, 0)
+      && known_eq (div->offset, 0))
+    mult_set = false;
+
   for (i = 0; i < div->n; i++)
     {
       class aff_comb_elt *elt
diff --git a/gcc/tree-ssa-loop-ivopts.cc b/gcc/tree-ssa-loop-ivopts.cc
index 7a277aaf18a9e0a32b8ac0d23332b7cd9945ef98..b11bd62a86092ba972a648764cd2facd9ddb4914 100644
--- a/gcc/tree-ssa-loop-ivopts.cc
+++ b/gcc/tree-ssa-loop-ivopts.cc
@@ -757,6 +757,19 @@ single_dom_exit (class loop *loop)
   return exit;
 }
 
+/* Compares the given affine tree LEFT with the tree expression RIGHT and return
+   whether they are the same under affine equality.  */
+
+static bool
+affine_compare_eq (aff_tree &left, tree right)
+{
+  aff_tree aff_right;
+  tree_to_aff_combination (right, TREE_TYPE (right), &aff_right);
+  aff_combination_scale (&aff_right, -1);
+  aff_combination_add (&aff_right, &left);
+  return aff_combination_zero_p (&aff_right);
+}
+
 
 /* Given a nested expression in ARG consisting of PLUS or MULT try to see if one
    of the arguments of each expression is a constant and that the type of the
@@ -1673,6 +1686,9 @@ record_group_use (struct ivopts_data *data, tree *use_p,
   tree addr_base = NULL;
   struct iv_group *group = NULL;
   poly_uint64 addr_offset = 0;
+  aff_tree iv_step, iv_addr_base;
+
+  tree_to_aff_combination (iv->step, TREE_TYPE (iv->step), &iv_step);
 
   /* Record non address type use in a new group.  */
   if (address_p (type))
@@ -1683,6 +1699,7 @@ record_group_use (struct ivopts_data *data, tree *use_p,
       tree addr_toffset;
       split_constant_offset (iv->base, &addr_base, &addr_toffset);
       addr_offset = int_cst_value (addr_toffset);
+      tree_to_aff_combination (addr_base, TREE_TYPE (addr_base), &iv_addr_base);
       for (i = 0; i < data->vgroups.length (); i++)
 	{
 	  struct iv_use *use;
@@ -1694,8 +1711,8 @@ record_group_use (struct ivopts_data *data, tree *use_p,
 
 	  /* Check if it has the same stripped base and step.  */
 	  if (operand_equal_p (iv->base_object, use->iv->base_object, 0)
-	      && operand_equal_p (iv->step, use->iv->step, 0)
-	      && operand_equal_p (addr_base, use->addr_base, 0))
+	      && affine_compare_eq (iv_step, use->iv->step)
+	      && affine_compare_eq (iv_addr_base, use->addr_base))
 	    break;
 	}
       if (i == data->vgroups.length ())
@@ -2231,6 +2248,14 @@ constant_multiple_of (tree top, tree bot, widest_int *mul)
       return true;
     }
 
+  aff_tree aff_top, aff_bot;
+  tree_to_aff_combination (top, TREE_TYPE (top), &aff_top);
+  tree_to_aff_combination (bot, TREE_TYPE (bot), &aff_bot);
+  poly_widest_int poly_mul;
+  if (aff_combination_constant_multiple_p (&aff_top, &aff_bot, &poly_mul)
+      && poly_mul.is_constant (mul))
+    return true;
+
   code = TREE_CODE (top);
   switch (code)
     {




-- 

[-- Attachment #2: rb18566.patch --]
[-- Type: text/x-diff, Size: 4281 bytes --]

diff --git a/gcc/testsuite/gfortran.dg/addressing-modes_2.f90 b/gcc/testsuite/gfortran.dg/addressing-modes_2.f90
new file mode 100644
index 0000000000000000000000000000000000000000..8eee4be3dc4d69fecfacd4c2e24a4973c8539fae
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/addressing-modes_2.f90
@@ -0,0 +1,20 @@
+! { dg-do compile { target aarch64-*-* } }
+! { dg-additional-options "-w -Ofast -fdump-tree-ivopts-all" }
+
+module a
+integer, parameter :: b=3, c=b
+contains
+subroutine d(block)
+integer f, col   , block(:, :, :), e
+do f = 1, c
+   do col = 1, c
+             block(:f,                          :, e()) = do
+     end do
+  end do
+  end
+  end
+
+! { dg-final { scan-tree-dump-not {Selected IV set for loop .+ niters, 3 IVs:} ivopts } }
+! { dg-final { scan-tree-dump-times {Selected IV set for loop .+ niters, 2 IVs:} 1 ivopts } }
+! { dg-final { scan-tree-dump-times {Selected IV set for loop .+ niters, 1 IVs:} 1 ivopts } }
+
diff --git a/gcc/tree-affine.cc b/gcc/tree-affine.cc
index d6309c4390362b680f0aa97a41fac3281ade66fd..b141bf23c1bbea1001b1bb286346890ddeab4096 100644
--- a/gcc/tree-affine.cc
+++ b/gcc/tree-affine.cc
@@ -941,6 +941,13 @@ aff_combination_constant_multiple_p (aff_tree *val, aff_tree *div,
 				     &mult_set, mult))
     return false;
 
+  /* Everything is a multiple of 0, which means we shoudn't enforce that
+     mult_set is checked, since that forced the only valid multiple of
+     val and div to be 0 whereas 1 is also possible.  */
+  if (known_eq (val->offset, 0)
+      && known_eq (div->offset, 0))
+    mult_set = false;
+
   for (i = 0; i < div->n; i++)
     {
       class aff_comb_elt *elt
diff --git a/gcc/tree-ssa-loop-ivopts.cc b/gcc/tree-ssa-loop-ivopts.cc
index 7a277aaf18a9e0a32b8ac0d23332b7cd9945ef98..b11bd62a86092ba972a648764cd2facd9ddb4914 100644
--- a/gcc/tree-ssa-loop-ivopts.cc
+++ b/gcc/tree-ssa-loop-ivopts.cc
@@ -757,6 +757,19 @@ single_dom_exit (class loop *loop)
   return exit;
 }
 
+/* Compares the given affine tree LEFT with the tree expression RIGHT and return
+   whether they are the same under affine equality.  */
+
+static bool
+affine_compare_eq (aff_tree &left, tree right)
+{
+  aff_tree aff_right;
+  tree_to_aff_combination (right, TREE_TYPE (right), &aff_right);
+  aff_combination_scale (&aff_right, -1);
+  aff_combination_add (&aff_right, &left);
+  return aff_combination_zero_p (&aff_right);
+}
+
 
 /* Given a nested expression in ARG consisting of PLUS or MULT try to see if one
    of the arguments of each expression is a constant and that the type of the
@@ -1673,6 +1686,9 @@ record_group_use (struct ivopts_data *data, tree *use_p,
   tree addr_base = NULL;
   struct iv_group *group = NULL;
   poly_uint64 addr_offset = 0;
+  aff_tree iv_step, iv_addr_base;
+
+  tree_to_aff_combination (iv->step, TREE_TYPE (iv->step), &iv_step);
 
   /* Record non address type use in a new group.  */
   if (address_p (type))
@@ -1683,6 +1699,7 @@ record_group_use (struct ivopts_data *data, tree *use_p,
       tree addr_toffset;
       split_constant_offset (iv->base, &addr_base, &addr_toffset);
       addr_offset = int_cst_value (addr_toffset);
+      tree_to_aff_combination (addr_base, TREE_TYPE (addr_base), &iv_addr_base);
       for (i = 0; i < data->vgroups.length (); i++)
 	{
 	  struct iv_use *use;
@@ -1694,8 +1711,8 @@ record_group_use (struct ivopts_data *data, tree *use_p,
 
 	  /* Check if it has the same stripped base and step.  */
 	  if (operand_equal_p (iv->base_object, use->iv->base_object, 0)
-	      && operand_equal_p (iv->step, use->iv->step, 0)
-	      && operand_equal_p (addr_base, use->addr_base, 0))
+	      && affine_compare_eq (iv_step, use->iv->step)
+	      && affine_compare_eq (iv_addr_base, use->addr_base))
 	    break;
 	}
       if (i == data->vgroups.length ())
@@ -2231,6 +2248,14 @@ constant_multiple_of (tree top, tree bot, widest_int *mul)
       return true;
     }
 
+  aff_tree aff_top, aff_bot;
+  tree_to_aff_combination (top, TREE_TYPE (top), &aff_top);
+  tree_to_aff_combination (bot, TREE_TYPE (bot), &aff_bot);
+  poly_widest_int poly_mul;
+  if (aff_combination_constant_multiple_p (&aff_top, &aff_bot, &poly_mul)
+      && poly_mul.is_constant (mul))
+    return true;
+
   code = TREE_CODE (top);
   switch (code)
     {




^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH][ivopts]: use affine_tree when comparing IVs during candidate selection [PR114932]
  2024-06-14 11:49 [PATCH][ivopts]: use affine_tree when comparing IVs during candidate selection [PR114932] Tamar Christina
@ 2024-06-19 11:54 ` Richard Biener
  2024-06-19 14:26   ` Tamar Christina
  0 siblings, 1 reply; 8+ messages in thread
From: Richard Biener @ 2024-06-19 11:54 UTC (permalink / raw)
  To: Tamar Christina; +Cc: gcc-patches, nd, bin.cheng

[-- Attachment #1: Type: text/plain, Size: 11881 bytes --]

On Fri, 14 Jun 2024, Tamar Christina wrote:

> Hi All,
> 
> IVOPTS normally uses affine trees to perform comparisons between different IVs,
> but these seem to have been missing in two key spots and instead normal tree
> equivalencies used.
> 
> In some cases where we have a structural equivalence but not a signedness
> equivalencies we end up generating both a signed and unsigned IV for the same
> candidate.
> 
> This happens quite a lot with fortran but can also happen in C because this came
> code is unable to figure out when one expression is a multiple of another.
> 
> As an example in the attached testcase we get:
> 
> Initial set of candidates:
>   cost: 24 (complexity 3)
>   reg_cost: 9
>   cand_cost: 15
>   cand_group_cost: 0 (complexity 3)
>   candidates: 1, 6, 8
>    group:0 --> iv_cand:6, cost=(0,1)
>    group:1 --> iv_cand:1, cost=(0,0)
>    group:2 --> iv_cand:8, cost=(0,1)
>    group:3 --> iv_cand:8, cost=(0,1)
>   invariant variables: 6
>   invariant expressions: 1, 2
> 
> <Invariant Expressions>:
> inv_expr 1:     stride.3_27 * 4
> inv_expr 2:     (unsigned long) stride.3_27 * 4
> 
> These end up being used in the same group:
> 
> Group 1:
> cand  cost    compl.  inv.expr.       inv.vars
> 1     0       0       NIL;    6
> 2     0       0       NIL;    6
> 3     0       0       NIL;    6
> 
> which ends up with IV opts picking the signed and unsigned IVs:
> 
> Improved to:
>   cost: 24 (complexity 3)
>   reg_cost: 9
>   cand_cost: 15
>   cand_group_cost: 0 (complexity 3)
>   candidates: 1, 6, 8
>    group:0 --> iv_cand:6, cost=(0,1)
>    group:1 --> iv_cand:1, cost=(0,0)
>    group:2 --> iv_cand:8, cost=(0,1)
>    group:3 --> iv_cand:8, cost=(0,1)
>   invariant variables: 6
>   invariant expressions: 1, 2
> 
> and so generates the same IV as both signed and unsigned:
> 
> ;;   basic block 21, loop depth 3, count 214748368 (estimated locally, freq 58.2545), maybe hot
> ;;    prev block 28, next block 31, flags: (NEW, REACHABLE, VISITED)
> ;;    pred:       28 [always]  count:23622320 (estimated locally, freq 6.4080) (FALLTHRU,EXECUTABLE)
> ;;                25 [always]  count:191126046 (estimated locally, freq 51.8465) (FALLTHRU,DFS_BACK,EXECUTABLE)
>   # .MEM_66 = PHI <.MEM_34(28), .MEM_22(25)>
>   # ivtmp.22_41 = PHI <0(28), ivtmp.22_82(25)>
>   # ivtmp.26_51 = PHI <ivtmp.26_55(28), ivtmp.26_72(25)>
>   # ivtmp.28_90 = PHI <ivtmp.28_99(28), ivtmp.28_98(25)>
> 
> ...
> 
> ;;   basic block 24, loop depth 3, count 214748366 (estimated locally, freq 58.2545), maybe hot
> ;;    prev block 22, next block 25, flags: (NEW, REACHABLE, VISITED)'
> ;;    pred:       22 [always]  count:95443719 (estimated locally, freq 25.8909) (FALLTHRU)                                                                                                                                                                                   ;;                21 [33.3% (guessed)]  count:71582790 (estimated locally, freq 19.4182) (TRUE_VALUE,EXECUTABLE)                                                                                                                                                             ;;                31 [33.3% (guessed)]  count:47721860 (estimated locally, freq 12.9455) (TRUE_VALUE,EXECUTABLE)                                                                                                                                                               # .MEM_22 = PHI <.MEM_44(22), .MEM_31(21), .MEM_79(31)>                                                                                                                                    
                                                                                   ivtmp.22_82 = ivtmp.22_41 + 1;                                                                                                                                                                                                                                               ivtmp.26_72 = ivtmp.26_51 + _80;                                                                                                                                                                                                                                             ivtmp.28_98 = ivtmp.28_90 + _39;  
> 
> These two IVs are always used as unsigned, so IV ops generates:
> 
>   _73 = stride.3_27 * 4;
>   _80 = (unsigned long) _73;
>   _54 = (unsigned long) stride.3_27;
>   _39 = _54 * 4;
> 
> Which means that in e.g. exchange2 we generate a lot of duplicate code.
> 
> This is because candidate 6 and 8 are structurally equivalent but have different
> signs.
> 
> This patch changes it so that if you have two IVs that are affine equivalent to
> just pick one over the other.  IV already has code for this, so the patch just
> uses affine trees instead of tree for the check.
> 
> With it we get:
> 
> <Invariant Expressions>:
> inv_expr 1:     stride.3_27 * 4
> 
> <Group-candidate Costs>:
> Group 0:
>   cand  cost    compl.  inv.expr.       inv.vars
>   5     0       2       NIL;    NIL;
>   6     0       3       NIL;    NIL;
> 
> Group 1:
>   cand  cost    compl.  inv.expr.       inv.vars
>   1     0       0       NIL;    6
>   2     0       0       NIL;    6
>   3     0       0       NIL;    6
>   4     0       0       NIL;    6
> 
> Initial set of candidates:
>   cost: 16 (complexity 3)
>   reg_cost: 6
>   cand_cost: 10
>   cand_group_cost: 0 (complexity 3)
>   candidates: 1, 6
>    group:0 --> iv_cand:6, cost=(0,3)
>    group:1 --> iv_cand:1, cost=(0,0)
>   invariant variables: 6
>   invariant expressions: 1
> 
> The two patches together results in a 10% performance increase in exchange2 in
> SPECCPU 2017 and a 4% reduction in binary size and a 5% improvement in compile
> time. There's also a 5% performance improvement in fotonik3d and similar
> reduction in binary size.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> 
> Ok for master?
> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
> 	PR tree-optimization/114932
> 	* tree-affine.cc (aff_combination_constant_multiple_p): Take zero
> 	offsets into account.
> 	* tree-ssa-loop-ivopts.cc (affine_compare_eq): New.
> 	(record_group_use): Use it.
> 	(constant_multiple_of): Also check equality under
> 	aff_combination_constant_multiple_p.
> 
> gcc/testsuite/ChangeLog:
> 
> 	PR tree-optimization/114932
> 	* gfortran.dg/addressing-modes_2.f90: New test.
> 
> ---
> diff --git a/gcc/testsuite/gfortran.dg/addressing-modes_2.f90 b/gcc/testsuite/gfortran.dg/addressing-modes_2.f90
> new file mode 100644
> index 0000000000000000000000000000000000000000..8eee4be3dc4d69fecfacd4c2e24a4973c8539fae
> --- /dev/null
> +++ b/gcc/testsuite/gfortran.dg/addressing-modes_2.f90
> @@ -0,0 +1,20 @@
> +! { dg-do compile { target aarch64-*-* } }
> +! { dg-additional-options "-w -Ofast -fdump-tree-ivopts-all" }
> +
> +module a
> +integer, parameter :: b=3, c=b
> +contains
> +subroutine d(block)
> +integer f, col   , block(:, :, :), e
> +do f = 1, c
> +   do col = 1, c
> +             block(:f,                          :, e()) = do
> +     end do
> +  end do
> +  end
> +  end
> +
> +! { dg-final { scan-tree-dump-not {Selected IV set for loop .+ niters, 3 IVs:} ivopts } }
> +! { dg-final { scan-tree-dump-times {Selected IV set for loop .+ niters, 2 IVs:} 1 ivopts } }
> +! { dg-final { scan-tree-dump-times {Selected IV set for loop .+ niters, 1 IVs:} 1 ivopts } }
> +
> diff --git a/gcc/tree-affine.cc b/gcc/tree-affine.cc
> index d6309c4390362b680f0aa97a41fac3281ade66fd..b141bf23c1bbea1001b1bb286346890ddeab4096 100644
> --- a/gcc/tree-affine.cc
> +++ b/gcc/tree-affine.cc
> @@ -941,6 +941,13 @@ aff_combination_constant_multiple_p (aff_tree *val, aff_tree *div,
>  				     &mult_set, mult))
>      return false;
>  
> +  /* Everything is a multiple of 0, which means we shoudn't enforce that
> +     mult_set is checked, since that forced the only valid multiple of
> +     val and div to be 0 whereas 1 is also possible.  */
> +  if (known_eq (val->offset, 0)
> +      && known_eq (div->offset, 0))
> +    mult_set = false;
> +

In fact all numbers are possible?  Shouldn't this be better handled
in wide_int_constant_multiple_p by special-casing
known_eq (div, 0) in the known_eq (val, 0) condition by simply
returning 'true' without checking or setting *mult_set?

The docs of wide_int_constant_multiple_p is odd:

/* If VAL != CST * DIV for any constant CST, returns false.

should that be 'If VAL == CST * DIV for no constant CST, returns false.'?
Or s/any/all/?

>    for (i = 0; i < div->n; i++)
>      {
>        class aff_comb_elt *elt
> diff --git a/gcc/tree-ssa-loop-ivopts.cc b/gcc/tree-ssa-loop-ivopts.cc
> index 7a277aaf18a9e0a32b8ac0d23332b7cd9945ef98..b11bd62a86092ba972a648764cd2facd9ddb4914 100644
> --- a/gcc/tree-ssa-loop-ivopts.cc
> +++ b/gcc/tree-ssa-loop-ivopts.cc
> @@ -757,6 +757,19 @@ single_dom_exit (class loop *loop)
>    return exit;
>  }
>  
> +/* Compares the given affine tree LEFT with the tree expression RIGHT and return
> +   whether they are the same under affine equality.  */
> +
> +static bool
> +affine_compare_eq (aff_tree &left, tree right)
> +{
> +  aff_tree aff_right;
> +  tree_to_aff_combination (right, TREE_TYPE (right), &aff_right);
> +  aff_combination_scale (&aff_right, -1);
> +  aff_combination_add (&aff_right, &left);
> +  return aff_combination_zero_p (&aff_right);
> +}
> +
>  
>  /* Given a nested expression in ARG consisting of PLUS or MULT try to see if one
>     of the arguments of each expression is a constant and that the type of the
> @@ -1673,6 +1686,9 @@ record_group_use (struct ivopts_data *data, tree *use_p,
>    tree addr_base = NULL;
>    struct iv_group *group = NULL;
>    poly_uint64 addr_offset = 0;
> +  aff_tree iv_step, iv_addr_base;
> +
> +  tree_to_aff_combination (iv->step, TREE_TYPE (iv->step), &iv_step);
>  
>    /* Record non address type use in a new group.  */
>    if (address_p (type))
> @@ -1683,6 +1699,7 @@ record_group_use (struct ivopts_data *data, tree *use_p,
>        tree addr_toffset;
>        split_constant_offset (iv->base, &addr_base, &addr_toffset);
>        addr_offset = int_cst_value (addr_toffset);
> +      tree_to_aff_combination (addr_base, TREE_TYPE (addr_base), &iv_addr_base);
>        for (i = 0; i < data->vgroups.length (); i++)
>  	{
>  	  struct iv_use *use;
> @@ -1694,8 +1711,8 @@ record_group_use (struct ivopts_data *data, tree *use_p,
>  
>  	  /* Check if it has the same stripped base and step.  */
>  	  if (operand_equal_p (iv->base_object, use->iv->base_object, 0)
> -	      && operand_equal_p (iv->step, use->iv->step, 0)
> -	      && operand_equal_p (addr_base, use->addr_base, 0))
> +	      && affine_compare_eq (iv_step, use->iv->step)
> +	      && affine_compare_eq (iv_addr_base, use->addr_base))

There's only this use of addr_base so I think the opportunity is to
turn iv_use->addr_base into aff_tree (even though that's a quite big
representation).

For the testcase, what are the two IVs we are comparing?  I wonder
why you need the affine compare for iv->step?

>  	    break;
>  	}
>        if (i == data->vgroups.length ())
> @@ -2231,6 +2248,14 @@ constant_multiple_of (tree top, tree bot, widest_int *mul)
>        return true;
>      }
>  
> +  aff_tree aff_top, aff_bot;
> +  tree_to_aff_combination (top, TREE_TYPE (top), &aff_top);
> +  tree_to_aff_combination (bot, TREE_TYPE (bot), &aff_bot);
> +  poly_widest_int poly_mul;
> +  if (aff_combination_constant_multiple_p (&aff_top, &aff_bot, &poly_mul)
> +      && poly_mul.is_constant (mul))
> +    return true;
> +

So why does stripping nops not work here?

>    code = TREE_CODE (top);
>    switch (code)
>      {
> 
> 
> 
> 
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

^ permalink raw reply	[flat|nested] 8+ messages in thread

* RE: [PATCH][ivopts]: use affine_tree when comparing IVs during candidate selection [PR114932]
  2024-06-19 11:54 ` Richard Biener
@ 2024-06-19 14:26   ` Tamar Christina
  2024-06-19 14:46     ` Michael Matz
  2024-06-20  7:48     ` Richard Biener
  0 siblings, 2 replies; 8+ messages in thread
From: Tamar Christina @ 2024-06-19 14:26 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, nd, bin.cheng

> -----Original Message-----
> From: Richard Biener <rguenther@suse.de>
> Sent: Wednesday, June 19, 2024 12:55 PM
> To: Tamar Christina <Tamar.Christina@arm.com>
> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; bin.cheng@linux.alibaba.com
> Subject: Re: [PATCH][ivopts]: use affine_tree when comparing IVs during candidate
> selection [PR114932]
> 
> On Fri, 14 Jun 2024, Tamar Christina wrote:
> 
> > Hi All,
> >
> > IVOPTS normally uses affine trees to perform comparisons between different IVs,
> > but these seem to have been missing in two key spots and instead normal tree
> > equivalencies used.
> >
> > In some cases where we have a structural equivalence but not a signedness
> > equivalencies we end up generating both a signed and unsigned IV for the same
> > candidate.
> >
> > This happens quite a lot with fortran but can also happen in C because this came
> > code is unable to figure out when one expression is a multiple of another.
> >
> > As an example in the attached testcase we get:
> >
> > Initial set of candidates:
> >   cost: 24 (complexity 3)
> >   reg_cost: 9
> >   cand_cost: 15
> >   cand_group_cost: 0 (complexity 3)
> >   candidates: 1, 6, 8
> >    group:0 --> iv_cand:6, cost=(0,1)
> >    group:1 --> iv_cand:1, cost=(0,0)
> >    group:2 --> iv_cand:8, cost=(0,1)
> >    group:3 --> iv_cand:8, cost=(0,1)
> >   invariant variables: 6
> >   invariant expressions: 1, 2
> >
> > <Invariant Expressions>:
> > inv_expr 1:     stride.3_27 * 4
> > inv_expr 2:     (unsigned long) stride.3_27 * 4
> >
> > These end up being used in the same group:
> >
> > Group 1:
> > cand  cost    compl.  inv.expr.       inv.vars
> > 1     0       0       NIL;    6
> > 2     0       0       NIL;    6
> > 3     0       0       NIL;    6
> >
> > which ends up with IV opts picking the signed and unsigned IVs:
> >
> > Improved to:
> >   cost: 24 (complexity 3)
> >   reg_cost: 9
> >   cand_cost: 15
> >   cand_group_cost: 0 (complexity 3)
> >   candidates: 1, 6, 8
> >    group:0 --> iv_cand:6, cost=(0,1)
> >    group:1 --> iv_cand:1, cost=(0,0)
> >    group:2 --> iv_cand:8, cost=(0,1)
> >    group:3 --> iv_cand:8, cost=(0,1)
> >   invariant variables: 6
> >   invariant expressions: 1, 2
> >
> > and so generates the same IV as both signed and unsigned:
> >
> > ;;   basic block 21, loop depth 3, count 214748368 (estimated locally, freq
> 58.2545), maybe hot
> > ;;    prev block 28, next block 31, flags: (NEW, REACHABLE, VISITED)
> > ;;    pred:       28 [always]  count:23622320 (estimated locally, freq 6.4080)
> (FALLTHRU,EXECUTABLE)
> > ;;                25 [always]  count:191126046 (estimated locally, freq 51.8465)
> (FALLTHRU,DFS_BACK,EXECUTABLE)
> >   # .MEM_66 = PHI <.MEM_34(28), .MEM_22(25)>
> >   # ivtmp.22_41 = PHI <0(28), ivtmp.22_82(25)>
> >   # ivtmp.26_51 = PHI <ivtmp.26_55(28), ivtmp.26_72(25)>
> >   # ivtmp.28_90 = PHI <ivtmp.28_99(28), ivtmp.28_98(25)>
> >
> > ...
> >
> > ;;   basic block 24, loop depth 3, count 214748366 (estimated locally, freq
> 58.2545), maybe hot
> > ;;    prev block 22, next block 25, flags: (NEW, REACHABLE, VISITED)'
> > ;;    pred:       22 [always]  count:95443719 (estimated locally, freq 25.8909)
> (FALLTHRU)
> ;;                21 [33.3% (guessed)]  count:71582790 (estimated locally, freq 19.4182)
> (TRUE_VALUE,EXECUTABLE)
> ;;                31 [33.3% (guessed)]  count:47721860 (estimated locally, freq 12.9455)
> (TRUE_VALUE,EXECUTABLE)
> # .MEM_22 = PHI <.MEM_44(22), .MEM_31(21), .MEM_79(31)>
> 
>                                                                                    ivtmp.22_82 = ivtmp.22_41 + 1;
> ivtmp.26_72 = ivtmp.26_51 + _80;
> ivtmp.28_98 = ivtmp.28_90 + _39;
> >
> > These two IVs are always used as unsigned, so IV ops generates:
> >
> >   _73 = stride.3_27 * 4;
> >   _80 = (unsigned long) _73;
> >   _54 = (unsigned long) stride.3_27;
> >   _39 = _54 * 4;
> >
> > Which means that in e.g. exchange2 we generate a lot of duplicate code.
> >
> > This is because candidate 6 and 8 are structurally equivalent but have different
> > signs.
> >
> > This patch changes it so that if you have two IVs that are affine equivalent to
> > just pick one over the other.  IV already has code for this, so the patch just
> > uses affine trees instead of tree for the check.
> >
> > With it we get:
> >
> > <Invariant Expressions>:
> > inv_expr 1:     stride.3_27 * 4
> >
> > <Group-candidate Costs>:
> > Group 0:
> >   cand  cost    compl.  inv.expr.       inv.vars
> >   5     0       2       NIL;    NIL;
> >   6     0       3       NIL;    NIL;
> >
> > Group 1:
> >   cand  cost    compl.  inv.expr.       inv.vars
> >   1     0       0       NIL;    6
> >   2     0       0       NIL;    6
> >   3     0       0       NIL;    6
> >   4     0       0       NIL;    6
> >
> > Initial set of candidates:
> >   cost: 16 (complexity 3)
> >   reg_cost: 6
> >   cand_cost: 10
> >   cand_group_cost: 0 (complexity 3)
> >   candidates: 1, 6
> >    group:0 --> iv_cand:6, cost=(0,3)
> >    group:1 --> iv_cand:1, cost=(0,0)
> >   invariant variables: 6
> >   invariant expressions: 1
> >
> > The two patches together results in a 10% performance increase in exchange2 in
> > SPECCPU 2017 and a 4% reduction in binary size and a 5% improvement in
> compile
> > time. There's also a 5% performance improvement in fotonik3d and similar
> > reduction in binary size.
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> >
> > Ok for master?
> >
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > 	PR tree-optimization/114932
> > 	* tree-affine.cc (aff_combination_constant_multiple_p): Take zero
> > 	offsets into account.
> > 	* tree-ssa-loop-ivopts.cc (affine_compare_eq): New.
> > 	(record_group_use): Use it.
> > 	(constant_multiple_of): Also check equality under
> > 	aff_combination_constant_multiple_p.
> >
> > gcc/testsuite/ChangeLog:
> >
> > 	PR tree-optimization/114932
> > 	* gfortran.dg/addressing-modes_2.f90: New test.
> >
> > ---
> > diff --git a/gcc/testsuite/gfortran.dg/addressing-modes_2.f90
> b/gcc/testsuite/gfortran.dg/addressing-modes_2.f90
> > new file mode 100644
> > index
> 0000000000000000000000000000000000000000..8eee4be3dc4d69fecfacd4c
> 2e24a4973c8539fae
> > --- /dev/null
> > +++ b/gcc/testsuite/gfortran.dg/addressing-modes_2.f90
> > @@ -0,0 +1,20 @@
> > +! { dg-do compile { target aarch64-*-* } }
> > +! { dg-additional-options "-w -Ofast -fdump-tree-ivopts-all" }
> > +
> > +module a
> > +integer, parameter :: b=3, c=b
> > +contains
> > +subroutine d(block)
> > +integer f, col   , block(:, :, :), e
> > +do f = 1, c
> > +   do col = 1, c
> > +             block(:f,                          :, e()) = do
> > +     end do
> > +  end do
> > +  end
> > +  end
> > +
> > +! { dg-final { scan-tree-dump-not {Selected IV set for loop .+ niters, 3 IVs:} ivopts
> } }
> > +! { dg-final { scan-tree-dump-times {Selected IV set for loop .+ niters, 2 IVs:} 1
> ivopts } }
> > +! { dg-final { scan-tree-dump-times {Selected IV set for loop .+ niters, 1 IVs:} 1
> ivopts } }
> > +
> > diff --git a/gcc/tree-affine.cc b/gcc/tree-affine.cc
> > index
> d6309c4390362b680f0aa97a41fac3281ade66fd..b141bf23c1bbea1001b1bb28
> 6346890ddeab4096 100644
> > --- a/gcc/tree-affine.cc
> > +++ b/gcc/tree-affine.cc
> > @@ -941,6 +941,13 @@ aff_combination_constant_multiple_p (aff_tree *val,
> aff_tree *div,
> >  				     &mult_set, mult))
> >      return false;
> >
> > +  /* Everything is a multiple of 0, which means we shoudn't enforce that
> > +     mult_set is checked, since that forced the only valid multiple of
> > +     val and div to be 0 whereas 1 is also possible.  */
> > +  if (known_eq (val->offset, 0)
> > +      && known_eq (div->offset, 0))
> > +    mult_set = false;
> > +
> 
> In fact all numbers are possible?  Shouldn't this be better handled
> in wide_int_constant_multiple_p by special-casing
> known_eq (div, 0) in the known_eq (val, 0) condition by simply
> returning 'true' without checking or setting *mult_set?
> 
> The docs of wide_int_constant_multiple_p is odd:
> 
> /* If VAL != CST * DIV for any constant CST, returns false.
> 
> should that be 'If VAL == CST * DIV for no constant CST, returns false.'?
> Or s/any/all/?
> 

Yeah, that condition would always be false.

> >    for (i = 0; i < div->n; i++)
> >      {
> >        class aff_comb_elt *elt
> > diff --git a/gcc/tree-ssa-loop-ivopts.cc b/gcc/tree-ssa-loop-ivopts.cc
> > index
> 7a277aaf18a9e0a32b8ac0d23332b7cd9945ef98..b11bd62a86092ba972a6487
> 64cd2facd9ddb4914 100644
> > --- a/gcc/tree-ssa-loop-ivopts.cc
> > +++ b/gcc/tree-ssa-loop-ivopts.cc
> > @@ -757,6 +757,19 @@ single_dom_exit (class loop *loop)
> >    return exit;
> >  }
> >
> > +/* Compares the given affine tree LEFT with the tree expression RIGHT and
> return
> > +   whether they are the same under affine equality.  */
> > +
> > +static bool
> > +affine_compare_eq (aff_tree &left, tree right)
> > +{
> > +  aff_tree aff_right;
> > +  tree_to_aff_combination (right, TREE_TYPE (right), &aff_right);
> > +  aff_combination_scale (&aff_right, -1);
> > +  aff_combination_add (&aff_right, &left);
> > +  return aff_combination_zero_p (&aff_right);
> > +}
> > +
> >
> >  /* Given a nested expression in ARG consisting of PLUS or MULT try to see if one
> >     of the arguments of each expression is a constant and that the type of the
> > @@ -1673,6 +1686,9 @@ record_group_use (struct ivopts_data *data, tree
> *use_p,
> >    tree addr_base = NULL;
> >    struct iv_group *group = NULL;
> >    poly_uint64 addr_offset = 0;
> > +  aff_tree iv_step, iv_addr_base;
> > +
> > +  tree_to_aff_combination (iv->step, TREE_TYPE (iv->step), &iv_step);
> >
> >    /* Record non address type use in a new group.  */
> >    if (address_p (type))
> > @@ -1683,6 +1699,7 @@ record_group_use (struct ivopts_data *data, tree
> *use_p,
> >        tree addr_toffset;
> >        split_constant_offset (iv->base, &addr_base, &addr_toffset);
> >        addr_offset = int_cst_value (addr_toffset);
> > +      tree_to_aff_combination (addr_base, TREE_TYPE (addr_base),
> &iv_addr_base);
> >        for (i = 0; i < data->vgroups.length (); i++)
> >  	{
> >  	  struct iv_use *use;
> > @@ -1694,8 +1711,8 @@ record_group_use (struct ivopts_data *data, tree
> *use_p,
> >
> >  	  /* Check if it has the same stripped base and step.  */
> >  	  if (operand_equal_p (iv->base_object, use->iv->base_object, 0)
> > -	      && operand_equal_p (iv->step, use->iv->step, 0)
> > -	      && operand_equal_p (addr_base, use->addr_base, 0))
> > +	      && affine_compare_eq (iv_step, use->iv->step)
> > +	      && affine_compare_eq (iv_addr_base, use->addr_base))
> 
> There's only this use of addr_base so I think the opportunity is to
> turn iv_use->addr_base into aff_tree (even though that's a quite big
> representation).

Ah, true.  Will do.

> 
> For the testcase, what are the two IVs we are comparing?  I wonder
> why you need the affine compare for iv->step?
> 

Because step builds up the IV expressions and can also be an expression.

In this case:

>>> p debug (iv->step)
((unsigned long) stride.3_27) * 4
>>> p debug (use->iv->step)
(sizetype) (stride.3_27 * 4)

Note that the original expressions were both unsigned, but the STRIP_NOPS
made one signed.   They still wouldn't have compared equal though due to
the different cast locations.

For completeness the base here is

>>> p debug (use->addr_base)
(integer(kind=4) *) ((((((sizetype) prephitmp_64 - (sizetype) stride.3_27) - (sizetype) stride.5_29) + (sizetype) _7) + (sizetype) stride.3_27) * 4 + (sizetype) block.0_26)
$9 = void

>>> p debug (addr_base)
(integer(kind=4) *) ((((((sizetype) prephitmp_64 - (sizetype) stride.3_27) - (sizetype) stride.5_29) + (sizetype) _7) + (sizetype) stride.3_27) * 4 + (sizetype) block.0_26)
$10 = void
>

> >  	    break;
> >  	}
> >        if (i == data->vgroups.length ())
> > @@ -2231,6 +2248,14 @@ constant_multiple_of (tree top, tree bot, widest_int
> *mul)
> >        return true;
> >      }
> >
> > +  aff_tree aff_top, aff_bot;
> > +  tree_to_aff_combination (top, TREE_TYPE (top), &aff_top);
> > +  tree_to_aff_combination (bot, TREE_TYPE (bot), &aff_bot);
> > +  poly_widest_int poly_mul;
> > +  if (aff_combination_constant_multiple_p (&aff_top, &aff_bot, &poly_mul)
> > +      && poly_mul.is_constant (mul))
> > +    return true;
> > +
> 
> So why does stripping nops not work here?

So this is where we compare different IV expressions to determine which
IVs compute the same thing and thus can be in the same group.

The STRIP_NOPS don't work because while the incoming types are the same
the casts are different.  So:

>>> p debug (ustep)
(unsigned long) stride.3_27 * 4
$3 = void
>>> p debug (cstep)
(unsigned long) (stride.3_27 * 4)
$4 = void

Which is of course stripped to:

>>> p debug (top)
(unsigned long) stride.3_27 * 4
$1 = void
>>> p debug (bot)
stride.3_27 * 4

Both of these compute the same thing so by doing the affine compare they
end up in the same IV groups and can be costed.  Later during candidate selection
these are the steps we're comparing to see if the candidate is the same invariant.

The full list is:

<IV Groups>:
Group 0:
  Type: REFERENCE ADDRESS
  Use 0.0:
    At stmt:    *_4 = _33;
    At pos:     *_4
    IV struct:
      Type:     integer(kind=4) *
      Base:     (integer(kind=4) *) ((((unsigned long) stride.3_27 + (unsigned long) _36) * 4 + (unsigned long) block.0_26) + 4)
      Step:     (sizetype) (stride.3_27 * 4)
      Object:   (void *) block.0_26
      Biv:      N
      Overflowness wrto loop niter:     Overflow
  Use 0.1:
    At stmt:    *_78 = _33;
    At pos:     *_78
    IV struct:
      Type:     integer(kind=4) *
      Base:     (integer(kind=4) *) ((((unsigned long) stride.3_27 + (unsigned long) _36) * 4 + (unsigned long) block.0_26) + 8)
      Step:     (unsigned long) stride.3_27 * 4
      Object:   (void *) block.0_26
      Biv:      N
      Overflowness wrto loop niter:     Overflow
  Use 0.2:
    At stmt:    *_45 = _33;
    At pos:     *_45
    IV struct:
      Type:     integer(kind=4) *
      Base:     (integer(kind=4) *) ((((unsigned long) stride.3_27 + (unsigned long) _36) * 4 + (unsigned long) block.0_26) + 12)
      Step:     (unsigned long) stride.3_27 * 4
      Object:   (void *) block.0_26
      Biv:      N
      Overflowness wrto loop niter:     Overflow

And all these IVs are the same but with a slightly different base offset.  By doing the affine compare here IV ops sees that
The best candidate is:

Candidate 6:
  Depend on inv.exprs: 1
  Var befor: ivtmp.26_51
  Var after: ivtmp.26_72
  Incr POS: before exit test
  IV struct:
    Type:       unsigned long
    Base:       ((unsigned long) stride.3_27 + (unsigned long) _36) * 4 + (unsigned long) block.0_26
    Step:       (unsigned long) (stride.3_27 * 4)
    Object:     (void *) block.0_26
    Biv:        N
    Overflowness wrto loop niter:       Overflow

And just builds the three IVs from the same candidate.

Does this cover what you wanted?

Cheers,
Tamar

> 
> >    code = TREE_CODE (top);
> >    switch (code)
> >      {
> >
> >
> >
> >
> >
> 
> --
> Richard Biener <rguenther@suse.de>
> SUSE Software Solutions Germany GmbH,
> Frankenstrasse 146, 90461 Nuernberg, Germany;
> GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

^ permalink raw reply	[flat|nested] 8+ messages in thread

* RE: [PATCH][ivopts]: use affine_tree when comparing IVs during candidate selection [PR114932]
  2024-06-19 14:26   ` Tamar Christina
@ 2024-06-19 14:46     ` Michael Matz
  2024-06-19 14:58       ` Tamar Christina
  2024-06-20  7:48     ` Richard Biener
  1 sibling, 1 reply; 8+ messages in thread
From: Michael Matz @ 2024-06-19 14:46 UTC (permalink / raw)
  To: Tamar Christina; +Cc: Richard Biener, gcc-patches, nd, bin.cheng

Hello,

On Wed, 19 Jun 2024, Tamar Christina wrote:

> So this is where we compare different IV expressions to determine which
> IVs compute the same thing and thus can be in the same group.
> 
> The STRIP_NOPS don't work because while the incoming types are the same
> the casts are different.  So:
> 
> >>> p debug (ustep)
> (unsigned long) stride.3_27 * 4
> $3 = void
> >>> p debug (cstep)
> (unsigned long) (stride.3_27 * 4)
> $4 = void
> 
> Which is of course stripped to:
> 
> >>> p debug (top)
> (unsigned long) stride.3_27 * 4
> $1 = void
> >>> p debug (bot)
> stride.3_27 * 4
> 
> Both of these compute the same thing

In isolation these are _not_ computing the same when strides type is 
smaller than ulong, namely when stride is either negative or larger than 
its max-value/4.  I.e. when comparing IVs not only the overflow behaviour 
for the whole {base,+,step} revolution matters, but also the behaviour on 
the constituent expressions.  (It's possible that stride is known to be 
non-problematic here, I haven't checked.  I was just triggered by the 
claim of same-ness :) )

Ciao,
Michael.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* RE: [PATCH][ivopts]: use affine_tree when comparing IVs during candidate selection [PR114932]
  2024-06-19 14:46     ` Michael Matz
@ 2024-06-19 14:58       ` Tamar Christina
  0 siblings, 0 replies; 8+ messages in thread
From: Tamar Christina @ 2024-06-19 14:58 UTC (permalink / raw)
  To: Michael Matz; +Cc: Richard Biener, gcc-patches, nd, bin.cheng

> -----Original Message-----
> From: Michael Matz <matz@suse.de>
> Sent: Wednesday, June 19, 2024 3:46 PM
> To: Tamar Christina <Tamar.Christina@arm.com>
> Cc: Richard Biener <rguenther@suse.de>; gcc-patches@gcc.gnu.org; nd
> <nd@arm.com>; bin.cheng@linux.alibaba.com
> Subject: RE: [PATCH][ivopts]: use affine_tree when comparing IVs during candidate
> selection [PR114932]
> 
> Hello,
> 
> On Wed, 19 Jun 2024, Tamar Christina wrote:
> 
> > So this is where we compare different IV expressions to determine which
> > IVs compute the same thing and thus can be in the same group.
> >
> > The STRIP_NOPS don't work because while the incoming types are the same
> > the casts are different.  So:
> >
> > >>> p debug (ustep)
> > (unsigned long) stride.3_27 * 4
> > $3 = void
> > >>> p debug (cstep)
> > (unsigned long) (stride.3_27 * 4)
> > $4 = void
> >
> > Which is of course stripped to:
> >
> > >>> p debug (top)
> > (unsigned long) stride.3_27 * 4
> > $1 = void
> > >>> p debug (bot)
> > stride.3_27 * 4
> >
> > Both of these compute the same thing
> 
> In isolation these are _not_ computing the same when strides type is
> smaller than ulong, namely when stride is either negative or larger than
> its max-value/4.  I.e. when comparing IVs not only the overflow behaviour
> for the whole {base,+,step} revolution matters, but also the behaviour on
> the constituent expressions.  (It's possible that stride is known to be
> non-problematic here, I haven't checked.  I was just triggered by the
> claim of same-ness :) )

The only use of this method is to determine whether the two expressions
can possibly be the same.  After this IVops forcibly converts them to an
unsigned type though an affine fold in get_computation_aff_1.

So in the end it doesn't care about the sign and uses them all as unsigned.

Tamar
> 
> 
> Ciao,
> Michael.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* RE: [PATCH][ivopts]: use affine_tree when comparing IVs during candidate selection [PR114932]
  2024-06-19 14:26   ` Tamar Christina
  2024-06-19 14:46     ` Michael Matz
@ 2024-06-20  7:48     ` Richard Biener
  2024-06-24 13:30       ` Tamar Christina
  1 sibling, 1 reply; 8+ messages in thread
From: Richard Biener @ 2024-06-20  7:48 UTC (permalink / raw)
  To: Tamar Christina; +Cc: gcc-patches, nd, bin.cheng

On Wed, 19 Jun 2024, Tamar Christina wrote:

> > -----Original Message-----
> > From: Richard Biener <rguenther@suse.de>
> > Sent: Wednesday, June 19, 2024 12:55 PM
> > To: Tamar Christina <Tamar.Christina@arm.com>
> > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; bin.cheng@linux.alibaba.com
> > Subject: Re: [PATCH][ivopts]: use affine_tree when comparing IVs during candidate
> > selection [PR114932]
> > 
> > On Fri, 14 Jun 2024, Tamar Christina wrote:
> > 
> > > Hi All,
> > >
> > > IVOPTS normally uses affine trees to perform comparisons between different IVs,
> > > but these seem to have been missing in two key spots and instead normal tree
> > > equivalencies used.
> > >
> > > In some cases where we have a structural equivalence but not a signedness
> > > equivalencies we end up generating both a signed and unsigned IV for the same
> > > candidate.
> > >
> > > This happens quite a lot with fortran but can also happen in C because this came
> > > code is unable to figure out when one expression is a multiple of another.
> > >
> > > As an example in the attached testcase we get:
> > >
> > > Initial set of candidates:
> > >   cost: 24 (complexity 3)
> > >   reg_cost: 9
> > >   cand_cost: 15
> > >   cand_group_cost: 0 (complexity 3)
> > >   candidates: 1, 6, 8
> > >    group:0 --> iv_cand:6, cost=(0,1)
> > >    group:1 --> iv_cand:1, cost=(0,0)
> > >    group:2 --> iv_cand:8, cost=(0,1)
> > >    group:3 --> iv_cand:8, cost=(0,1)
> > >   invariant variables: 6
> > >   invariant expressions: 1, 2
> > >
> > > <Invariant Expressions>:
> > > inv_expr 1:     stride.3_27 * 4
> > > inv_expr 2:     (unsigned long) stride.3_27 * 4
> > >
> > > These end up being used in the same group:
> > >
> > > Group 1:
> > > cand  cost    compl.  inv.expr.       inv.vars
> > > 1     0       0       NIL;    6
> > > 2     0       0       NIL;    6
> > > 3     0       0       NIL;    6
> > >
> > > which ends up with IV opts picking the signed and unsigned IVs:
> > >
> > > Improved to:
> > >   cost: 24 (complexity 3)
> > >   reg_cost: 9
> > >   cand_cost: 15
> > >   cand_group_cost: 0 (complexity 3)
> > >   candidates: 1, 6, 8
> > >    group:0 --> iv_cand:6, cost=(0,1)
> > >    group:1 --> iv_cand:1, cost=(0,0)
> > >    group:2 --> iv_cand:8, cost=(0,1)
> > >    group:3 --> iv_cand:8, cost=(0,1)
> > >   invariant variables: 6
> > >   invariant expressions: 1, 2
> > >
> > > and so generates the same IV as both signed and unsigned:
> > >
> > > ;;   basic block 21, loop depth 3, count 214748368 (estimated locally, freq
> > 58.2545), maybe hot
> > > ;;    prev block 28, next block 31, flags: (NEW, REACHABLE, VISITED)
> > > ;;    pred:       28 [always]  count:23622320 (estimated locally, freq 6.4080)
> > (FALLTHRU,EXECUTABLE)
> > > ;;                25 [always]  count:191126046 (estimated locally, freq 51.8465)
> > (FALLTHRU,DFS_BACK,EXECUTABLE)
> > >   # .MEM_66 = PHI <.MEM_34(28), .MEM_22(25)>
> > >   # ivtmp.22_41 = PHI <0(28), ivtmp.22_82(25)>
> > >   # ivtmp.26_51 = PHI <ivtmp.26_55(28), ivtmp.26_72(25)>
> > >   # ivtmp.28_90 = PHI <ivtmp.28_99(28), ivtmp.28_98(25)>
> > >
> > > ...
> > >
> > > ;;   basic block 24, loop depth 3, count 214748366 (estimated locally, freq
> > 58.2545), maybe hot
> > > ;;    prev block 22, next block 25, flags: (NEW, REACHABLE, VISITED)'
> > > ;;    pred:       22 [always]  count:95443719 (estimated locally, freq 25.8909)
> > (FALLTHRU)
> > ;;                21 [33.3% (guessed)]  count:71582790 (estimated locally, freq 19.4182)
> > (TRUE_VALUE,EXECUTABLE)
> > ;;                31 [33.3% (guessed)]  count:47721860 (estimated locally, freq 12.9455)
> > (TRUE_VALUE,EXECUTABLE)
> > # .MEM_22 = PHI <.MEM_44(22), .MEM_31(21), .MEM_79(31)>
> > 
> >                                                                                    ivtmp.22_82 = ivtmp.22_41 + 1;
> > ivtmp.26_72 = ivtmp.26_51 + _80;
> > ivtmp.28_98 = ivtmp.28_90 + _39;
> > >
> > > These two IVs are always used as unsigned, so IV ops generates:
> > >
> > >   _73 = stride.3_27 * 4;
> > >   _80 = (unsigned long) _73;
> > >   _54 = (unsigned long) stride.3_27;
> > >   _39 = _54 * 4;
> > >
> > > Which means that in e.g. exchange2 we generate a lot of duplicate code.
> > >
> > > This is because candidate 6 and 8 are structurally equivalent but have different
> > > signs.
> > >
> > > This patch changes it so that if you have two IVs that are affine equivalent to
> > > just pick one over the other.  IV already has code for this, so the patch just
> > > uses affine trees instead of tree for the check.
> > >
> > > With it we get:
> > >
> > > <Invariant Expressions>:
> > > inv_expr 1:     stride.3_27 * 4
> > >
> > > <Group-candidate Costs>:
> > > Group 0:
> > >   cand  cost    compl.  inv.expr.       inv.vars
> > >   5     0       2       NIL;    NIL;
> > >   6     0       3       NIL;    NIL;
> > >
> > > Group 1:
> > >   cand  cost    compl.  inv.expr.       inv.vars
> > >   1     0       0       NIL;    6
> > >   2     0       0       NIL;    6
> > >   3     0       0       NIL;    6
> > >   4     0       0       NIL;    6
> > >
> > > Initial set of candidates:
> > >   cost: 16 (complexity 3)
> > >   reg_cost: 6
> > >   cand_cost: 10
> > >   cand_group_cost: 0 (complexity 3)
> > >   candidates: 1, 6
> > >    group:0 --> iv_cand:6, cost=(0,3)
> > >    group:1 --> iv_cand:1, cost=(0,0)
> > >   invariant variables: 6
> > >   invariant expressions: 1
> > >
> > > The two patches together results in a 10% performance increase in exchange2 in
> > > SPECCPU 2017 and a 4% reduction in binary size and a 5% improvement in
> > compile
> > > time. There's also a 5% performance improvement in fotonik3d and similar
> > > reduction in binary size.
> > >
> > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> > >
> > > Ok for master?
> > >
> > > Thanks,
> > > Tamar
> > >
> > > gcc/ChangeLog:
> > >
> > > 	PR tree-optimization/114932
> > > 	* tree-affine.cc (aff_combination_constant_multiple_p): Take zero
> > > 	offsets into account.
> > > 	* tree-ssa-loop-ivopts.cc (affine_compare_eq): New.
> > > 	(record_group_use): Use it.
> > > 	(constant_multiple_of): Also check equality under
> > > 	aff_combination_constant_multiple_p.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > > 	PR tree-optimization/114932
> > > 	* gfortran.dg/addressing-modes_2.f90: New test.
> > >
> > > ---
> > > diff --git a/gcc/testsuite/gfortran.dg/addressing-modes_2.f90
> > b/gcc/testsuite/gfortran.dg/addressing-modes_2.f90
> > > new file mode 100644
> > > index
> > 0000000000000000000000000000000000000000..8eee4be3dc4d69fecfacd4c
> > 2e24a4973c8539fae
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gfortran.dg/addressing-modes_2.f90
> > > @@ -0,0 +1,20 @@
> > > +! { dg-do compile { target aarch64-*-* } }
> > > +! { dg-additional-options "-w -Ofast -fdump-tree-ivopts-all" }
> > > +
> > > +module a
> > > +integer, parameter :: b=3, c=b
> > > +contains
> > > +subroutine d(block)
> > > +integer f, col   , block(:, :, :), e
> > > +do f = 1, c
> > > +   do col = 1, c
> > > +             block(:f,                          :, e()) = do
> > > +     end do
> > > +  end do
> > > +  end
> > > +  end
> > > +
> > > +! { dg-final { scan-tree-dump-not {Selected IV set for loop .+ niters, 3 IVs:} ivopts
> > } }
> > > +! { dg-final { scan-tree-dump-times {Selected IV set for loop .+ niters, 2 IVs:} 1
> > ivopts } }
> > > +! { dg-final { scan-tree-dump-times {Selected IV set for loop .+ niters, 1 IVs:} 1
> > ivopts } }
> > > +
> > > diff --git a/gcc/tree-affine.cc b/gcc/tree-affine.cc
> > > index
> > d6309c4390362b680f0aa97a41fac3281ade66fd..b141bf23c1bbea1001b1bb28
> > 6346890ddeab4096 100644
> > > --- a/gcc/tree-affine.cc
> > > +++ b/gcc/tree-affine.cc
> > > @@ -941,6 +941,13 @@ aff_combination_constant_multiple_p (aff_tree *val,
> > aff_tree *div,
> > >  				     &mult_set, mult))
> > >      return false;
> > >
> > > +  /* Everything is a multiple of 0, which means we shoudn't enforce that
> > > +     mult_set is checked, since that forced the only valid multiple of
> > > +     val and div to be 0 whereas 1 is also possible.  */
> > > +  if (known_eq (val->offset, 0)
> > > +      && known_eq (div->offset, 0))
> > > +    mult_set = false;
> > > +
> > 
> > In fact all numbers are possible?  Shouldn't this be better handled
> > in wide_int_constant_multiple_p by special-casing
> > known_eq (div, 0) in the known_eq (val, 0) condition by simply
> > returning 'true' without checking or setting *mult_set?
> > 
> > The docs of wide_int_constant_multiple_p is odd:
> > 
> > /* If VAL != CST * DIV for any constant CST, returns false.
> > 
> > should that be 'If VAL == CST * DIV for no constant CST, returns false.'?
> > Or s/any/all/?
> > 
> 
> Yeah, that condition would always be false.

Can you split out a patch to fix this?

> > >    for (i = 0; i < div->n; i++)
> > >      {
> > >        class aff_comb_elt *elt
> > > diff --git a/gcc/tree-ssa-loop-ivopts.cc b/gcc/tree-ssa-loop-ivopts.cc
> > > index
> > 7a277aaf18a9e0a32b8ac0d23332b7cd9945ef98..b11bd62a86092ba972a6487
> > 64cd2facd9ddb4914 100644
> > > --- a/gcc/tree-ssa-loop-ivopts.cc
> > > +++ b/gcc/tree-ssa-loop-ivopts.cc
> > > @@ -757,6 +757,19 @@ single_dom_exit (class loop *loop)
> > >    return exit;
> > >  }
> > >
> > > +/* Compares the given affine tree LEFT with the tree expression RIGHT and
> > return
> > > +   whether they are the same under affine equality.  */
> > > +
> > > +static bool
> > > +affine_compare_eq (aff_tree &left, tree right)
> > > +{
> > > +  aff_tree aff_right;
> > > +  tree_to_aff_combination (right, TREE_TYPE (right), &aff_right);
> > > +  aff_combination_scale (&aff_right, -1);
> > > +  aff_combination_add (&aff_right, &left);
> > > +  return aff_combination_zero_p (&aff_right);
> > > +}
> > > +
> > >
> > >  /* Given a nested expression in ARG consisting of PLUS or MULT try to see if one
> > >     of the arguments of each expression is a constant and that the type of the
> > > @@ -1673,6 +1686,9 @@ record_group_use (struct ivopts_data *data, tree
> > *use_p,
> > >    tree addr_base = NULL;
> > >    struct iv_group *group = NULL;
> > >    poly_uint64 addr_offset = 0;
> > > +  aff_tree iv_step, iv_addr_base;
> > > +
> > > +  tree_to_aff_combination (iv->step, TREE_TYPE (iv->step), &iv_step);
> > >
> > >    /* Record non address type use in a new group.  */
> > >    if (address_p (type))
> > > @@ -1683,6 +1699,7 @@ record_group_use (struct ivopts_data *data, tree
> > *use_p,
> > >        tree addr_toffset;
> > >        split_constant_offset (iv->base, &addr_base, &addr_toffset);
> > >        addr_offset = int_cst_value (addr_toffset);
> > > +      tree_to_aff_combination (addr_base, TREE_TYPE (addr_base),
> > &iv_addr_base);
> > >        for (i = 0; i < data->vgroups.length (); i++)
> > >  	{
> > >  	  struct iv_use *use;
> > > @@ -1694,8 +1711,8 @@ record_group_use (struct ivopts_data *data, tree
> > *use_p,
> > >
> > >  	  /* Check if it has the same stripped base and step.  */
> > >  	  if (operand_equal_p (iv->base_object, use->iv->base_object, 0)
> > > -	      && operand_equal_p (iv->step, use->iv->step, 0)
> > > -	      && operand_equal_p (addr_base, use->addr_base, 0))
> > > +	      && affine_compare_eq (iv_step, use->iv->step)
> > > +	      && affine_compare_eq (iv_addr_base, use->addr_base))
> > 
> > There's only this use of addr_base so I think the opportunity is to
> > turn iv_use->addr_base into aff_tree (even though that's a quite big
> > representation).
> 
> Ah, true.  Will do.
> 
> > 
> > For the testcase, what are the two IVs we are comparing?  I wonder
> > why you need the affine compare for iv->step?
> > 
> 
> Because step builds up the IV expressions and can also be an expression.
> 
> In this case:
> 
> >>> p debug (iv->step)
> ((unsigned long) stride.3_27) * 4
> >>> p debug (use->iv->step)
> (sizetype) (stride.3_27 * 4)
> 
> Note that the original expressions were both unsigned, but the STRIP_NOPS
> made one signed.   They still wouldn't have compared equal though due to
> the different cast locations.
> 
> For completeness the base here is
> 
> >>> p debug (use->addr_base)
> (integer(kind=4) *) ((((((sizetype) prephitmp_64 - (sizetype) stride.3_27) - (sizetype) stride.5_29) + (sizetype) _7) + (sizetype) stride.3_27) * 4 + (sizetype) block.0_26)
> $9 = void
> 
> >>> p debug (addr_base)
> (integer(kind=4) *) ((((((sizetype) prephitmp_64 - (sizetype) stride.3_27) - (sizetype) stride.5_29) + (sizetype) _7) + (sizetype) stride.3_27) * 4 + (sizetype) block.0_26)
> $10 = void
> >
> 
> > >  	    break;
> > >  	}
> > >        if (i == data->vgroups.length ())
> > > @@ -2231,6 +2248,14 @@ constant_multiple_of (tree top, tree bot, widest_int
> > *mul)
> > >        return true;
> > >      }
> > >
> > > +  aff_tree aff_top, aff_bot;
> > > +  tree_to_aff_combination (top, TREE_TYPE (top), &aff_top);
> > > +  tree_to_aff_combination (bot, TREE_TYPE (bot), &aff_bot);
> > > +  poly_widest_int poly_mul;
> > > +  if (aff_combination_constant_multiple_p (&aff_top, &aff_bot, &poly_mul)
> > > +      && poly_mul.is_constant (mul))
> > > +    return true;
> > > +
> > 
> > So why does stripping nops not work here?
> 
> So this is where we compare different IV expressions to determine which
> IVs compute the same thing and thus can be in the same group.
> 
> The STRIP_NOPS don't work because while the incoming types are the same
> the casts are different.  So:
> 
> >>> p debug (ustep)
> (unsigned long) stride.3_27 * 4
> $3 = void
> >>> p debug (cstep)
> (unsigned long) (stride.3_27 * 4)
> $4 = void
> 
> Which is of course stripped to:
> 
> >>> p debug (top)
> (unsigned long) stride.3_27 * 4
> $1 = void
> >>> p debug (bot)
> stride.3_27 * 4

So we're expecting constant_multiple_of to compute *mul == 1, proving
equality?

The function seems to try stripping ops from TOP until it reaches
an expression equal to BOT and that's what fails to trigger here.

What I originally wondered was iff we compute the affine combinations
why not use only aff_combination_constant_multiple_p?

I might also point back to the idea I threw in somewhere, adding
OEP_VALUE (or a better name) to the set of flags accepted by
operand_equal_p.  You mentioned hashing IIRC but I don't see the patches
touching hashing?

> Both of these compute the same thing so by doing the affine compare they
> end up in the same IV groups and can be costed.  Later during candidate selection
> these are the steps we're comparing to see if the candidate is the same invariant.
> 
> The full list is:
> 
> <IV Groups>:
> Group 0:
>   Type: REFERENCE ADDRESS
>   Use 0.0:
>     At stmt:    *_4 = _33;
>     At pos:     *_4
>     IV struct:
>       Type:     integer(kind=4) *
>       Base:     (integer(kind=4) *) ((((unsigned long) stride.3_27 + (unsigned long) _36) * 4 + (unsigned long) block.0_26) + 4)
>       Step:     (sizetype) (stride.3_27 * 4)
>       Object:   (void *) block.0_26
>       Biv:      N
>       Overflowness wrto loop niter:     Overflow
>   Use 0.1:
>     At stmt:    *_78 = _33;
>     At pos:     *_78
>     IV struct:
>       Type:     integer(kind=4) *
>       Base:     (integer(kind=4) *) ((((unsigned long) stride.3_27 + (unsigned long) _36) * 4 + (unsigned long) block.0_26) + 8)
>       Step:     (unsigned long) stride.3_27 * 4
>       Object:   (void *) block.0_26
>       Biv:      N
>       Overflowness wrto loop niter:     Overflow
>   Use 0.2:
>     At stmt:    *_45 = _33;
>     At pos:     *_45
>     IV struct:
>       Type:     integer(kind=4) *
>       Base:     (integer(kind=4) *) ((((unsigned long) stride.3_27 + (unsigned long) _36) * 4 + (unsigned long) block.0_26) + 12)
>       Step:     (unsigned long) stride.3_27 * 4
>       Object:   (void *) block.0_26
>       Biv:      N
>       Overflowness wrto loop niter:     Overflow
> 
> And all these IVs are the same but with a slightly different base offset.  By doing the affine compare here IV ops sees that
> The best candidate is:
> 
> Candidate 6:
>   Depend on inv.exprs: 1
>   Var befor: ivtmp.26_51
>   Var after: ivtmp.26_72
>   Incr POS: before exit test
>   IV struct:
>     Type:       unsigned long
>     Base:       ((unsigned long) stride.3_27 + (unsigned long) _36) * 4 + (unsigned long) block.0_26
>     Step:       (unsigned long) (stride.3_27 * 4)
>     Object:     (void *) block.0_26
>     Biv:        N
>     Overflowness wrto loop niter:       Overflow
> 
> And just builds the three IVs from the same candidate.
> 
> Does this cover what you wanted?
> 
> Cheers,
> Tamar
> 
> > 
> > >    code = TREE_CODE (top);
> > >    switch (code)
> > >      {
> > >
> > >
> > >
> > >
> > >
> > 
> > --
> > Richard Biener <rguenther@suse.de>
> > SUSE Software Solutions Germany GmbH,
> > Frankenstrasse 146, 90461 Nuernberg, Germany;
> > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

^ permalink raw reply	[flat|nested] 8+ messages in thread

* RE: [PATCH][ivopts]: use affine_tree when comparing IVs during candidate selection [PR114932]
  2024-06-20  7:48     ` Richard Biener
@ 2024-06-24 13:30       ` Tamar Christina
  2024-06-25 13:21         ` Richard Biener
  0 siblings, 1 reply; 8+ messages in thread
From: Tamar Christina @ 2024-06-24 13:30 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, nd, bin.cheng



> -----Original Message-----
> From: Richard Biener <rguenther@suse.de>
> Sent: Thursday, June 20, 2024 8:49 AM
> To: Tamar Christina <Tamar.Christina@arm.com>
> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; bin.cheng@linux.alibaba.com
> Subject: RE: [PATCH][ivopts]: use affine_tree when comparing IVs during candidate
> selection [PR114932]
> 
> On Wed, 19 Jun 2024, Tamar Christina wrote:
> 
> > > -----Original Message-----
> > > From: Richard Biener <rguenther@suse.de>
> > > Sent: Wednesday, June 19, 2024 12:55 PM
> > > To: Tamar Christina <Tamar.Christina@arm.com>
> > > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>;
> bin.cheng@linux.alibaba.com
> > > Subject: Re: [PATCH][ivopts]: use affine_tree when comparing IVs during
> candidate
> > > selection [PR114932]
> > >
> > > On Fri, 14 Jun 2024, Tamar Christina wrote:
> > >
> > > > Hi All,
> > > >
> > > > IVOPTS normally uses affine trees to perform comparisons between different
> IVs,
> > > > but these seem to have been missing in two key spots and instead normal
> tree
> > > > equivalencies used.
> > > >
> > > > In some cases where we have a structural equivalence but not a signedness
> > > > equivalencies we end up generating both a signed and unsigned IV for the
> same
> > > > candidate.
> > > >
> > > > This happens quite a lot with fortran but can also happen in C because this
> came
> > > > code is unable to figure out when one expression is a multiple of another.
> > > >
> > > > As an example in the attached testcase we get:
> > > >
> > > > Initial set of candidates:
> > > >   cost: 24 (complexity 3)
> > > >   reg_cost: 9
> > > >   cand_cost: 15
> > > >   cand_group_cost: 0 (complexity 3)
> > > >   candidates: 1, 6, 8
> > > >    group:0 --> iv_cand:6, cost=(0,1)
> > > >    group:1 --> iv_cand:1, cost=(0,0)
> > > >    group:2 --> iv_cand:8, cost=(0,1)
> > > >    group:3 --> iv_cand:8, cost=(0,1)
> > > >   invariant variables: 6
> > > >   invariant expressions: 1, 2
> > > >
> > > > <Invariant Expressions>:
> > > > inv_expr 1:     stride.3_27 * 4
> > > > inv_expr 2:     (unsigned long) stride.3_27 * 4
> > > >
> > > > These end up being used in the same group:
> > > >
> > > > Group 1:
> > > > cand  cost    compl.  inv.expr.       inv.vars
> > > > 1     0       0       NIL;    6
> > > > 2     0       0       NIL;    6
> > > > 3     0       0       NIL;    6
> > > >
> > > > which ends up with IV opts picking the signed and unsigned IVs:
> > > >
> > > > Improved to:
> > > >   cost: 24 (complexity 3)
> > > >   reg_cost: 9
> > > >   cand_cost: 15
> > > >   cand_group_cost: 0 (complexity 3)
> > > >   candidates: 1, 6, 8
> > > >    group:0 --> iv_cand:6, cost=(0,1)
> > > >    group:1 --> iv_cand:1, cost=(0,0)
> > > >    group:2 --> iv_cand:8, cost=(0,1)
> > > >    group:3 --> iv_cand:8, cost=(0,1)
> > > >   invariant variables: 6
> > > >   invariant expressions: 1, 2
> > > >
> > > > and so generates the same IV as both signed and unsigned:
> > > >
> > > > ;;   basic block 21, loop depth 3, count 214748368 (estimated locally, freq
> > > 58.2545), maybe hot
> > > > ;;    prev block 28, next block 31, flags: (NEW, REACHABLE, VISITED)
> > > > ;;    pred:       28 [always]  count:23622320 (estimated locally, freq 6.4080)
> > > (FALLTHRU,EXECUTABLE)
> > > > ;;                25 [always]  count:191126046 (estimated locally, freq 51.8465)
> > > (FALLTHRU,DFS_BACK,EXECUTABLE)
> > > >   # .MEM_66 = PHI <.MEM_34(28), .MEM_22(25)>
> > > >   # ivtmp.22_41 = PHI <0(28), ivtmp.22_82(25)>
> > > >   # ivtmp.26_51 = PHI <ivtmp.26_55(28), ivtmp.26_72(25)>
> > > >   # ivtmp.28_90 = PHI <ivtmp.28_99(28), ivtmp.28_98(25)>
> > > >
> > > > ...
> > > >
> > > > ;;   basic block 24, loop depth 3, count 214748366 (estimated locally, freq
> > > 58.2545), maybe hot
> > > > ;;    prev block 22, next block 25, flags: (NEW, REACHABLE, VISITED)'
> > > > ;;    pred:       22 [always]  count:95443719 (estimated locally, freq 25.8909)
> > > (FALLTHRU)
> > > ;;                21 [33.3% (guessed)]  count:71582790 (estimated locally, freq
> 19.4182)
> > > (TRUE_VALUE,EXECUTABLE)
> > > ;;                31 [33.3% (guessed)]  count:47721860 (estimated locally, freq
> 12.9455)
> > > (TRUE_VALUE,EXECUTABLE)
> > > # .MEM_22 = PHI <.MEM_44(22), .MEM_31(21), .MEM_79(31)>
> > >
> > >                                                                                    ivtmp.22_82 = ivtmp.22_41 + 1;
> > > ivtmp.26_72 = ivtmp.26_51 + _80;
> > > ivtmp.28_98 = ivtmp.28_90 + _39;
> > > >
> > > > These two IVs are always used as unsigned, so IV ops generates:
> > > >
> > > >   _73 = stride.3_27 * 4;
> > > >   _80 = (unsigned long) _73;
> > > >   _54 = (unsigned long) stride.3_27;
> > > >   _39 = _54 * 4;
> > > >
> > > > Which means that in e.g. exchange2 we generate a lot of duplicate code.
> > > >
> > > > This is because candidate 6 and 8 are structurally equivalent but have
> different
> > > > signs.
> > > >
> > > > This patch changes it so that if you have two IVs that are affine equivalent to
> > > > just pick one over the other.  IV already has code for this, so the patch just
> > > > uses affine trees instead of tree for the check.
> > > >
> > > > With it we get:
> > > >
> > > > <Invariant Expressions>:
> > > > inv_expr 1:     stride.3_27 * 4
> > > >
> > > > <Group-candidate Costs>:
> > > > Group 0:
> > > >   cand  cost    compl.  inv.expr.       inv.vars
> > > >   5     0       2       NIL;    NIL;
> > > >   6     0       3       NIL;    NIL;
> > > >
> > > > Group 1:
> > > >   cand  cost    compl.  inv.expr.       inv.vars
> > > >   1     0       0       NIL;    6
> > > >   2     0       0       NIL;    6
> > > >   3     0       0       NIL;    6
> > > >   4     0       0       NIL;    6
> > > >
> > > > Initial set of candidates:
> > > >   cost: 16 (complexity 3)
> > > >   reg_cost: 6
> > > >   cand_cost: 10
> > > >   cand_group_cost: 0 (complexity 3)
> > > >   candidates: 1, 6
> > > >    group:0 --> iv_cand:6, cost=(0,3)
> > > >    group:1 --> iv_cand:1, cost=(0,0)
> > > >   invariant variables: 6
> > > >   invariant expressions: 1
> > > >
> > > > The two patches together results in a 10% performance increase in exchange2
> in
> > > > SPECCPU 2017 and a 4% reduction in binary size and a 5% improvement in
> > > compile
> > > > time. There's also a 5% performance improvement in fotonik3d and similar
> > > > reduction in binary size.
> > > >
> > > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> > > >
> > > > Ok for master?
> > > >
> > > > Thanks,
> > > > Tamar
> > > >
> > > > gcc/ChangeLog:
> > > >
> > > > 	PR tree-optimization/114932
> > > > 	* tree-affine.cc (aff_combination_constant_multiple_p): Take zero
> > > > 	offsets into account.
> > > > 	* tree-ssa-loop-ivopts.cc (affine_compare_eq): New.
> > > > 	(record_group_use): Use it.
> > > > 	(constant_multiple_of): Also check equality under
> > > > 	aff_combination_constant_multiple_p.
> > > >
> > > > gcc/testsuite/ChangeLog:
> > > >
> > > > 	PR tree-optimization/114932
> > > > 	* gfortran.dg/addressing-modes_2.f90: New test.
> > > >
> > > > ---
> > > > diff --git a/gcc/testsuite/gfortran.dg/addressing-modes_2.f90
> > > b/gcc/testsuite/gfortran.dg/addressing-modes_2.f90
> > > > new file mode 100644
> > > > index
> > >
> 0000000000000000000000000000000000000000..8eee4be3dc4d69fecfacd4c
> > > 2e24a4973c8539fae
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gfortran.dg/addressing-modes_2.f90
> > > > @@ -0,0 +1,20 @@
> > > > +! { dg-do compile { target aarch64-*-* } }
> > > > +! { dg-additional-options "-w -Ofast -fdump-tree-ivopts-all" }
> > > > +
> > > > +module a
> > > > +integer, parameter :: b=3, c=b
> > > > +contains
> > > > +subroutine d(block)
> > > > +integer f, col   , block(:, :, :), e
> > > > +do f = 1, c
> > > > +   do col = 1, c
> > > > +             block(:f,                          :, e()) = do
> > > > +     end do
> > > > +  end do
> > > > +  end
> > > > +  end
> > > > +
> > > > +! { dg-final { scan-tree-dump-not {Selected IV set for loop .+ niters, 3 IVs:}
> ivopts
> > > } }
> > > > +! { dg-final { scan-tree-dump-times {Selected IV set for loop .+ niters, 2 IVs:}
> 1
> > > ivopts } }
> > > > +! { dg-final { scan-tree-dump-times {Selected IV set for loop .+ niters, 1 IVs:}
> 1
> > > ivopts } }
> > > > +
> > > > diff --git a/gcc/tree-affine.cc b/gcc/tree-affine.cc
> > > > index
> > >
> d6309c4390362b680f0aa97a41fac3281ade66fd..b141bf23c1bbea1001b1bb28
> > > 6346890ddeab4096 100644
> > > > --- a/gcc/tree-affine.cc
> > > > +++ b/gcc/tree-affine.cc
> > > > @@ -941,6 +941,13 @@ aff_combination_constant_multiple_p (aff_tree
> *val,
> > > aff_tree *div,
> > > >  				     &mult_set, mult))
> > > >      return false;
> > > >
> > > > +  /* Everything is a multiple of 0, which means we shoudn't enforce that
> > > > +     mult_set is checked, since that forced the only valid multiple of
> > > > +     val and div to be 0 whereas 1 is also possible.  */
> > > > +  if (known_eq (val->offset, 0)
> > > > +      && known_eq (div->offset, 0))
> > > > +    mult_set = false;
> > > > +
> > >
> > > In fact all numbers are possible?  Shouldn't this be better handled
> > > in wide_int_constant_multiple_p by special-casing
> > > known_eq (div, 0) in the known_eq (val, 0) condition by simply
> > > returning 'true' without checking or setting *mult_set?
> > >
> > > The docs of wide_int_constant_multiple_p is odd:
> > >
> > > /* If VAL != CST * DIV for any constant CST, returns false.
> > >
> > > should that be 'If VAL == CST * DIV for no constant CST, returns false.'?
> > > Or s/any/all/?
> > >
> >
> > Yeah, that condition would always be false.
> 
> Can you split out a patch to fix this?

Doing now.

> 
> > > >    for (i = 0; i < div->n; i++)
> > > >      {
> > > >        class aff_comb_elt *elt
> > > > diff --git a/gcc/tree-ssa-loop-ivopts.cc b/gcc/tree-ssa-loop-ivopts.cc
> > > > index
> > >
> 7a277aaf18a9e0a32b8ac0d23332b7cd9945ef98..b11bd62a86092ba972a6487
> > > 64cd2facd9ddb4914 100644
> > > > --- a/gcc/tree-ssa-loop-ivopts.cc
> > > > +++ b/gcc/tree-ssa-loop-ivopts.cc
> > > > @@ -757,6 +757,19 @@ single_dom_exit (class loop *loop)
> > > >    return exit;
> > > >  }
> > > >
> > > > +/* Compares the given affine tree LEFT with the tree expression RIGHT and
> > > return
> > > > +   whether they are the same under affine equality.  */
> > > > +
> > > > +static bool
> > > > +affine_compare_eq (aff_tree &left, tree right)
> > > > +{
> > > > +  aff_tree aff_right;
> > > > +  tree_to_aff_combination (right, TREE_TYPE (right), &aff_right);
> > > > +  aff_combination_scale (&aff_right, -1);
> > > > +  aff_combination_add (&aff_right, &left);
> > > > +  return aff_combination_zero_p (&aff_right);
> > > > +}
> > > > +
> > > >
> > > >  /* Given a nested expression in ARG consisting of PLUS or MULT try to see if
> one
> > > >     of the arguments of each expression is a constant and that the type of the
> > > > @@ -1673,6 +1686,9 @@ record_group_use (struct ivopts_data *data, tree
> > > *use_p,
> > > >    tree addr_base = NULL;
> > > >    struct iv_group *group = NULL;
> > > >    poly_uint64 addr_offset = 0;
> > > > +  aff_tree iv_step, iv_addr_base;
> > > > +
> > > > +  tree_to_aff_combination (iv->step, TREE_TYPE (iv->step), &iv_step);
> > > >
> > > >    /* Record non address type use in a new group.  */
> > > >    if (address_p (type))
> > > > @@ -1683,6 +1699,7 @@ record_group_use (struct ivopts_data *data, tree
> > > *use_p,
> > > >        tree addr_toffset;
> > > >        split_constant_offset (iv->base, &addr_base, &addr_toffset);
> > > >        addr_offset = int_cst_value (addr_toffset);
> > > > +      tree_to_aff_combination (addr_base, TREE_TYPE (addr_base),
> > > &iv_addr_base);
> > > >        for (i = 0; i < data->vgroups.length (); i++)
> > > >  	{
> > > >  	  struct iv_use *use;
> > > > @@ -1694,8 +1711,8 @@ record_group_use (struct ivopts_data *data, tree
> > > *use_p,
> > > >
> > > >  	  /* Check if it has the same stripped base and step.  */
> > > >  	  if (operand_equal_p (iv->base_object, use->iv->base_object, 0)
> > > > -	      && operand_equal_p (iv->step, use->iv->step, 0)
> > > > -	      && operand_equal_p (addr_base, use->addr_base, 0))
> > > > +	      && affine_compare_eq (iv_step, use->iv->step)
> > > > +	      && affine_compare_eq (iv_addr_base, use->addr_base))
> > >
> > > There's only this use of addr_base so I think the opportunity is to
> > > turn iv_use->addr_base into aff_tree (even though that's a quite big
> > > representation).
> >
> > Ah, true.  Will do.
> >
> > >
> > > For the testcase, what are the two IVs we are comparing?  I wonder
> > > why you need the affine compare for iv->step?
> > >
> >
> > Because step builds up the IV expressions and can also be an expression.
> >
> > In this case:
> >
> > >>> p debug (iv->step)
> > ((unsigned long) stride.3_27) * 4
> > >>> p debug (use->iv->step)
> > (sizetype) (stride.3_27 * 4)
> >
> > Note that the original expressions were both unsigned, but the STRIP_NOPS
> > made one signed.   They still wouldn't have compared equal though due to
> > the different cast locations.
> >
> > For completeness the base here is
> >
> > >>> p debug (use->addr_base)
> > (integer(kind=4) *) ((((((sizetype) prephitmp_64 - (sizetype) stride.3_27) -
> (sizetype) stride.5_29) + (sizetype) _7) + (sizetype) stride.3_27) * 4 + (sizetype)
> block.0_26)
> > $9 = void
> >
> > >>> p debug (addr_base)
> > (integer(kind=4) *) ((((((sizetype) prephitmp_64 - (sizetype) stride.3_27) -
> (sizetype) stride.5_29) + (sizetype) _7) + (sizetype) stride.3_27) * 4 + (sizetype)
> block.0_26)
> > $10 = void
> > >
> >
> > > >  	    break;
> > > >  	}
> > > >        if (i == data->vgroups.length ())
> > > > @@ -2231,6 +2248,14 @@ constant_multiple_of (tree top, tree bot,
> widest_int
> > > *mul)
> > > >        return true;
> > > >      }
> > > >
> > > > +  aff_tree aff_top, aff_bot;
> > > > +  tree_to_aff_combination (top, TREE_TYPE (top), &aff_top);
> > > > +  tree_to_aff_combination (bot, TREE_TYPE (bot), &aff_bot);
> > > > +  poly_widest_int poly_mul;
> > > > +  if (aff_combination_constant_multiple_p (&aff_top, &aff_bot, &poly_mul)
> > > > +      && poly_mul.is_constant (mul))
> > > > +    return true;
> > > > +
> > >
> > > So why does stripping nops not work here?
> >
> > So this is where we compare different IV expressions to determine which
> > IVs compute the same thing and thus can be in the same group.
> >
> > The STRIP_NOPS don't work because while the incoming types are the same
> > the casts are different.  So:
> >
> > >>> p debug (ustep)
> > (unsigned long) stride.3_27 * 4
> > $3 = void
> > >>> p debug (cstep)
> > (unsigned long) (stride.3_27 * 4)
> > $4 = void
> >
> > Which is of course stripped to:
> >
> > >>> p debug (top)
> > (unsigned long) stride.3_27 * 4
> > $1 = void
> > >>> p debug (bot)
> > stride.3_27 * 4
> 
> So we're expecting constant_multiple_of to compute *mul == 1, proving
> equality?
> 

Indeed

> The function seems to try stripping ops from TOP until it reaches
> an expression equal to BOT and that's what fails to trigger here.
> 
> What I originally wondered was iff we compute the affine combinations
> why not use only aff_combination_constant_multiple_p?
> 

I think that's probably easier, The rest of the code seems to indeed be
repeating the work of aff_combination_constant_multiple_p.

I can try replacing the whole thing with aff_combination_constant_multiple_p
and see?

> I might also point back to the idea I threw in somewhere, adding
> OEP_VALUE (or a better name) to the set of flags accepted by
> operand_equal_p.  You mentioned hashing IIRC but I don't see the patches
> touching hashing?
> 

Yes, That can indeed be done with this approach.  The hashing was that before I
was trying to prevent the "duplicate" IV expressions from being created in the
first place by modifying get_loop_invariant_expr.

This function looks up if we have already seen a particular IV expression and if
we have it just returns that expression.  However after reading more of the code
I realized this wasn't the right approach, as without also dealing with the candidates
we'd end up creating IV expression that can't be handled by any candidate.

IVops would just give up then.   Reading the code it seems that get_loop_invariant_expr
is just there to prevent blatant duplicates.  i.e. it treats `(signed) a` and `a` as the same.

This is also why I think that everywhere else *has* to continue stripping the expression.

On a note from Richard S that he thought IVopts already had some code to deal with
expressions that differ only in signs led me to take a different approach.

The goal wasn't to remove the different sign/unsigned IV expressions, but instead get
Then to be servable by the same candidates. i.e. we want them in the same candidate
groups and then candidate optimization will just do its thing.

That seemed a more natural fit to how it worked.

Do you want me to try the operand_equal_p approach? Though in this case the issue
is we not only need to know they're equal, but also need to know the scale factor.

get_computation_aff_1 scales the common type IV by the scale we determined,
so I think operand_equal_p would not be very useful here.  But it does look like
constant_multiple_of can just be implemented with aff_combination_constant_multiple_p.

Should I try?

Thanks,
Tamar


> > Both of these compute the same thing so by doing the affine compare they
> > end up in the same IV groups and can be costed.  Later during candidate selection
> > these are the steps we're comparing to see if the candidate is the same invariant.
> >
> > The full list is:
> >
> > <IV Groups>:
> > Group 0:
> >   Type: REFERENCE ADDRESS
> >   Use 0.0:
> >     At stmt:    *_4 = _33;
> >     At pos:     *_4
> >     IV struct:
> >       Type:     integer(kind=4) *
> >       Base:     (integer(kind=4) *) ((((unsigned long) stride.3_27 + (unsigned long)
> _36) * 4 + (unsigned long) block.0_26) + 4)
> >       Step:     (sizetype) (stride.3_27 * 4)
> >       Object:   (void *) block.0_26
> >       Biv:      N
> >       Overflowness wrto loop niter:     Overflow
> >   Use 0.1:
> >     At stmt:    *_78 = _33;
> >     At pos:     *_78
> >     IV struct:
> >       Type:     integer(kind=4) *
> >       Base:     (integer(kind=4) *) ((((unsigned long) stride.3_27 + (unsigned long)
> _36) * 4 + (unsigned long) block.0_26) + 8)
> >       Step:     (unsigned long) stride.3_27 * 4
> >       Object:   (void *) block.0_26
> >       Biv:      N
> >       Overflowness wrto loop niter:     Overflow
> >   Use 0.2:
> >     At stmt:    *_45 = _33;
> >     At pos:     *_45
> >     IV struct:
> >       Type:     integer(kind=4) *
> >       Base:     (integer(kind=4) *) ((((unsigned long) stride.3_27 + (unsigned long)
> _36) * 4 + (unsigned long) block.0_26) + 12)
> >       Step:     (unsigned long) stride.3_27 * 4
> >       Object:   (void *) block.0_26
> >       Biv:      N
> >       Overflowness wrto loop niter:     Overflow
> >
> > And all these IVs are the same but with a slightly different base offset.  By doing
> the affine compare here IV ops sees that
> > The best candidate is:
> >
> > Candidate 6:
> >   Depend on inv.exprs: 1
> >   Var befor: ivtmp.26_51
> >   Var after: ivtmp.26_72
> >   Incr POS: before exit test
> >   IV struct:
> >     Type:       unsigned long
> >     Base:       ((unsigned long) stride.3_27 + (unsigned long) _36) * 4 + (unsigned
> long) block.0_26
> >     Step:       (unsigned long) (stride.3_27 * 4)
> >     Object:     (void *) block.0_26
> >     Biv:        N
> >     Overflowness wrto loop niter:       Overflow
> >
> > And just builds the three IVs from the same candidate.
> >
> > Does this cover what you wanted?
> >
> > Cheers,
> > Tamar
> >
> > >
> > > >    code = TREE_CODE (top);
> > > >    switch (code)
> > > >      {
> > > >
> > > >
> > > >
> > > >
> > > >
> > >
> > > --
> > > Richard Biener <rguenther@suse.de>
> > > SUSE Software Solutions Germany GmbH,
> > > Frankenstrasse 146, 90461 Nuernberg, Germany;
> > > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG
> Nuernberg)
> >
> 
> --
> Richard Biener <rguenther@suse.de>
> SUSE Software Solutions Germany GmbH,
> Frankenstrasse 146, 90461 Nuernberg, Germany;
> GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

^ permalink raw reply	[flat|nested] 8+ messages in thread

* RE: [PATCH][ivopts]: use affine_tree when comparing IVs during candidate selection [PR114932]
  2024-06-24 13:30       ` Tamar Christina
@ 2024-06-25 13:21         ` Richard Biener
  0 siblings, 0 replies; 8+ messages in thread
From: Richard Biener @ 2024-06-25 13:21 UTC (permalink / raw)
  To: Tamar Christina; +Cc: gcc-patches, nd, bin.cheng

On Mon, 24 Jun 2024, Tamar Christina wrote:

> 
> 
> > -----Original Message-----
> > From: Richard Biener <rguenther@suse.de>
> > Sent: Thursday, June 20, 2024 8:49 AM
> > To: Tamar Christina <Tamar.Christina@arm.com>
> > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; bin.cheng@linux.alibaba.com
> > Subject: RE: [PATCH][ivopts]: use affine_tree when comparing IVs during candidate
> > selection [PR114932]
> > 
> > On Wed, 19 Jun 2024, Tamar Christina wrote:
> > 
> > > > -----Original Message-----
> > > > From: Richard Biener <rguenther@suse.de>
> > > > Sent: Wednesday, June 19, 2024 12:55 PM
> > > > To: Tamar Christina <Tamar.Christina@arm.com>
> > > > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>;
> > bin.cheng@linux.alibaba.com
> > > > Subject: Re: [PATCH][ivopts]: use affine_tree when comparing IVs during
> > candidate
> > > > selection [PR114932]
> > > >
> > > > On Fri, 14 Jun 2024, Tamar Christina wrote:
> > > >
> > > > > Hi All,
> > > > >
> > > > > IVOPTS normally uses affine trees to perform comparisons between different
> > IVs,
> > > > > but these seem to have been missing in two key spots and instead normal
> > tree
> > > > > equivalencies used.
> > > > >
> > > > > In some cases where we have a structural equivalence but not a signedness
> > > > > equivalencies we end up generating both a signed and unsigned IV for the
> > same
> > > > > candidate.
> > > > >
> > > > > This happens quite a lot with fortran but can also happen in C because this
> > came
> > > > > code is unable to figure out when one expression is a multiple of another.
> > > > >
> > > > > As an example in the attached testcase we get:
> > > > >
> > > > > Initial set of candidates:
> > > > >   cost: 24 (complexity 3)
> > > > >   reg_cost: 9
> > > > >   cand_cost: 15
> > > > >   cand_group_cost: 0 (complexity 3)
> > > > >   candidates: 1, 6, 8
> > > > >    group:0 --> iv_cand:6, cost=(0,1)
> > > > >    group:1 --> iv_cand:1, cost=(0,0)
> > > > >    group:2 --> iv_cand:8, cost=(0,1)
> > > > >    group:3 --> iv_cand:8, cost=(0,1)
> > > > >   invariant variables: 6
> > > > >   invariant expressions: 1, 2
> > > > >
> > > > > <Invariant Expressions>:
> > > > > inv_expr 1:     stride.3_27 * 4
> > > > > inv_expr 2:     (unsigned long) stride.3_27 * 4
> > > > >
> > > > > These end up being used in the same group:
> > > > >
> > > > > Group 1:
> > > > > cand  cost    compl.  inv.expr.       inv.vars
> > > > > 1     0       0       NIL;    6
> > > > > 2     0       0       NIL;    6
> > > > > 3     0       0       NIL;    6
> > > > >
> > > > > which ends up with IV opts picking the signed and unsigned IVs:
> > > > >
> > > > > Improved to:
> > > > >   cost: 24 (complexity 3)
> > > > >   reg_cost: 9
> > > > >   cand_cost: 15
> > > > >   cand_group_cost: 0 (complexity 3)
> > > > >   candidates: 1, 6, 8
> > > > >    group:0 --> iv_cand:6, cost=(0,1)
> > > > >    group:1 --> iv_cand:1, cost=(0,0)
> > > > >    group:2 --> iv_cand:8, cost=(0,1)
> > > > >    group:3 --> iv_cand:8, cost=(0,1)
> > > > >   invariant variables: 6
> > > > >   invariant expressions: 1, 2
> > > > >
> > > > > and so generates the same IV as both signed and unsigned:
> > > > >
> > > > > ;;   basic block 21, loop depth 3, count 214748368 (estimated locally, freq
> > > > 58.2545), maybe hot
> > > > > ;;    prev block 28, next block 31, flags: (NEW, REACHABLE, VISITED)
> > > > > ;;    pred:       28 [always]  count:23622320 (estimated locally, freq 6.4080)
> > > > (FALLTHRU,EXECUTABLE)
> > > > > ;;                25 [always]  count:191126046 (estimated locally, freq 51.8465)
> > > > (FALLTHRU,DFS_BACK,EXECUTABLE)
> > > > >   # .MEM_66 = PHI <.MEM_34(28), .MEM_22(25)>
> > > > >   # ivtmp.22_41 = PHI <0(28), ivtmp.22_82(25)>
> > > > >   # ivtmp.26_51 = PHI <ivtmp.26_55(28), ivtmp.26_72(25)>
> > > > >   # ivtmp.28_90 = PHI <ivtmp.28_99(28), ivtmp.28_98(25)>
> > > > >
> > > > > ...
> > > > >
> > > > > ;;   basic block 24, loop depth 3, count 214748366 (estimated locally, freq
> > > > 58.2545), maybe hot
> > > > > ;;    prev block 22, next block 25, flags: (NEW, REACHABLE, VISITED)'
> > > > > ;;    pred:       22 [always]  count:95443719 (estimated locally, freq 25.8909)
> > > > (FALLTHRU)
> > > > ;;                21 [33.3% (guessed)]  count:71582790 (estimated locally, freq
> > 19.4182)
> > > > (TRUE_VALUE,EXECUTABLE)
> > > > ;;                31 [33.3% (guessed)]  count:47721860 (estimated locally, freq
> > 12.9455)
> > > > (TRUE_VALUE,EXECUTABLE)
> > > > # .MEM_22 = PHI <.MEM_44(22), .MEM_31(21), .MEM_79(31)>
> > > >
> > > >                                                                                    ivtmp.22_82 = ivtmp.22_41 + 1;
> > > > ivtmp.26_72 = ivtmp.26_51 + _80;
> > > > ivtmp.28_98 = ivtmp.28_90 + _39;
> > > > >
> > > > > These two IVs are always used as unsigned, so IV ops generates:
> > > > >
> > > > >   _73 = stride.3_27 * 4;
> > > > >   _80 = (unsigned long) _73;
> > > > >   _54 = (unsigned long) stride.3_27;
> > > > >   _39 = _54 * 4;
> > > > >
> > > > > Which means that in e.g. exchange2 we generate a lot of duplicate code.
> > > > >
> > > > > This is because candidate 6 and 8 are structurally equivalent but have
> > different
> > > > > signs.
> > > > >
> > > > > This patch changes it so that if you have two IVs that are affine equivalent to
> > > > > just pick one over the other.  IV already has code for this, so the patch just
> > > > > uses affine trees instead of tree for the check.
> > > > >
> > > > > With it we get:
> > > > >
> > > > > <Invariant Expressions>:
> > > > > inv_expr 1:     stride.3_27 * 4
> > > > >
> > > > > <Group-candidate Costs>:
> > > > > Group 0:
> > > > >   cand  cost    compl.  inv.expr.       inv.vars
> > > > >   5     0       2       NIL;    NIL;
> > > > >   6     0       3       NIL;    NIL;
> > > > >
> > > > > Group 1:
> > > > >   cand  cost    compl.  inv.expr.       inv.vars
> > > > >   1     0       0       NIL;    6
> > > > >   2     0       0       NIL;    6
> > > > >   3     0       0       NIL;    6
> > > > >   4     0       0       NIL;    6
> > > > >
> > > > > Initial set of candidates:
> > > > >   cost: 16 (complexity 3)
> > > > >   reg_cost: 6
> > > > >   cand_cost: 10
> > > > >   cand_group_cost: 0 (complexity 3)
> > > > >   candidates: 1, 6
> > > > >    group:0 --> iv_cand:6, cost=(0,3)
> > > > >    group:1 --> iv_cand:1, cost=(0,0)
> > > > >   invariant variables: 6
> > > > >   invariant expressions: 1
> > > > >
> > > > > The two patches together results in a 10% performance increase in exchange2
> > in
> > > > > SPECCPU 2017 and a 4% reduction in binary size and a 5% improvement in
> > > > compile
> > > > > time. There's also a 5% performance improvement in fotonik3d and similar
> > > > > reduction in binary size.
> > > > >
> > > > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> > > > >
> > > > > Ok for master?
> > > > >
> > > > > Thanks,
> > > > > Tamar
> > > > >
> > > > > gcc/ChangeLog:
> > > > >
> > > > > 	PR tree-optimization/114932
> > > > > 	* tree-affine.cc (aff_combination_constant_multiple_p): Take zero
> > > > > 	offsets into account.
> > > > > 	* tree-ssa-loop-ivopts.cc (affine_compare_eq): New.
> > > > > 	(record_group_use): Use it.
> > > > > 	(constant_multiple_of): Also check equality under
> > > > > 	aff_combination_constant_multiple_p.
> > > > >
> > > > > gcc/testsuite/ChangeLog:
> > > > >
> > > > > 	PR tree-optimization/114932
> > > > > 	* gfortran.dg/addressing-modes_2.f90: New test.
> > > > >
> > > > > ---
> > > > > diff --git a/gcc/testsuite/gfortran.dg/addressing-modes_2.f90
> > > > b/gcc/testsuite/gfortran.dg/addressing-modes_2.f90
> > > > > new file mode 100644
> > > > > index
> > > >
> > 0000000000000000000000000000000000000000..8eee4be3dc4d69fecfacd4c
> > > > 2e24a4973c8539fae
> > > > > --- /dev/null
> > > > > +++ b/gcc/testsuite/gfortran.dg/addressing-modes_2.f90
> > > > > @@ -0,0 +1,20 @@
> > > > > +! { dg-do compile { target aarch64-*-* } }
> > > > > +! { dg-additional-options "-w -Ofast -fdump-tree-ivopts-all" }
> > > > > +
> > > > > +module a
> > > > > +integer, parameter :: b=3, c=b
> > > > > +contains
> > > > > +subroutine d(block)
> > > > > +integer f, col   , block(:, :, :), e
> > > > > +do f = 1, c
> > > > > +   do col = 1, c
> > > > > +             block(:f,                          :, e()) = do
> > > > > +     end do
> > > > > +  end do
> > > > > +  end
> > > > > +  end
> > > > > +
> > > > > +! { dg-final { scan-tree-dump-not {Selected IV set for loop .+ niters, 3 IVs:}
> > ivopts
> > > > } }
> > > > > +! { dg-final { scan-tree-dump-times {Selected IV set for loop .+ niters, 2 IVs:}
> > 1
> > > > ivopts } }
> > > > > +! { dg-final { scan-tree-dump-times {Selected IV set for loop .+ niters, 1 IVs:}
> > 1
> > > > ivopts } }
> > > > > +
> > > > > diff --git a/gcc/tree-affine.cc b/gcc/tree-affine.cc
> > > > > index
> > > >
> > d6309c4390362b680f0aa97a41fac3281ade66fd..b141bf23c1bbea1001b1bb28
> > > > 6346890ddeab4096 100644
> > > > > --- a/gcc/tree-affine.cc
> > > > > +++ b/gcc/tree-affine.cc
> > > > > @@ -941,6 +941,13 @@ aff_combination_constant_multiple_p (aff_tree
> > *val,
> > > > aff_tree *div,
> > > > >  				     &mult_set, mult))
> > > > >      return false;
> > > > >
> > > > > +  /* Everything is a multiple of 0, which means we shoudn't enforce that
> > > > > +     mult_set is checked, since that forced the only valid multiple of
> > > > > +     val and div to be 0 whereas 1 is also possible.  */
> > > > > +  if (known_eq (val->offset, 0)
> > > > > +      && known_eq (div->offset, 0))
> > > > > +    mult_set = false;
> > > > > +
> > > >
> > > > In fact all numbers are possible?  Shouldn't this be better handled
> > > > in wide_int_constant_multiple_p by special-casing
> > > > known_eq (div, 0) in the known_eq (val, 0) condition by simply
> > > > returning 'true' without checking or setting *mult_set?
> > > >
> > > > The docs of wide_int_constant_multiple_p is odd:
> > > >
> > > > /* If VAL != CST * DIV for any constant CST, returns false.
> > > >
> > > > should that be 'If VAL == CST * DIV for no constant CST, returns false.'?
> > > > Or s/any/all/?
> > > >
> > >
> > > Yeah, that condition would always be false.
> > 
> > Can you split out a patch to fix this?
> 
> Doing now.
> 
> > 
> > > > >    for (i = 0; i < div->n; i++)
> > > > >      {
> > > > >        class aff_comb_elt *elt
> > > > > diff --git a/gcc/tree-ssa-loop-ivopts.cc b/gcc/tree-ssa-loop-ivopts.cc
> > > > > index
> > > >
> > 7a277aaf18a9e0a32b8ac0d23332b7cd9945ef98..b11bd62a86092ba972a6487
> > > > 64cd2facd9ddb4914 100644
> > > > > --- a/gcc/tree-ssa-loop-ivopts.cc
> > > > > +++ b/gcc/tree-ssa-loop-ivopts.cc
> > > > > @@ -757,6 +757,19 @@ single_dom_exit (class loop *loop)
> > > > >    return exit;
> > > > >  }
> > > > >
> > > > > +/* Compares the given affine tree LEFT with the tree expression RIGHT and
> > > > return
> > > > > +   whether they are the same under affine equality.  */
> > > > > +
> > > > > +static bool
> > > > > +affine_compare_eq (aff_tree &left, tree right)
> > > > > +{
> > > > > +  aff_tree aff_right;
> > > > > +  tree_to_aff_combination (right, TREE_TYPE (right), &aff_right);
> > > > > +  aff_combination_scale (&aff_right, -1);
> > > > > +  aff_combination_add (&aff_right, &left);
> > > > > +  return aff_combination_zero_p (&aff_right);
> > > > > +}
> > > > > +
> > > > >
> > > > >  /* Given a nested expression in ARG consisting of PLUS or MULT try to see if
> > one
> > > > >     of the arguments of each expression is a constant and that the type of the
> > > > > @@ -1673,6 +1686,9 @@ record_group_use (struct ivopts_data *data, tree
> > > > *use_p,
> > > > >    tree addr_base = NULL;
> > > > >    struct iv_group *group = NULL;
> > > > >    poly_uint64 addr_offset = 0;
> > > > > +  aff_tree iv_step, iv_addr_base;
> > > > > +
> > > > > +  tree_to_aff_combination (iv->step, TREE_TYPE (iv->step), &iv_step);
> > > > >
> > > > >    /* Record non address type use in a new group.  */
> > > > >    if (address_p (type))
> > > > > @@ -1683,6 +1699,7 @@ record_group_use (struct ivopts_data *data, tree
> > > > *use_p,
> > > > >        tree addr_toffset;
> > > > >        split_constant_offset (iv->base, &addr_base, &addr_toffset);
> > > > >        addr_offset = int_cst_value (addr_toffset);
> > > > > +      tree_to_aff_combination (addr_base, TREE_TYPE (addr_base),
> > > > &iv_addr_base);
> > > > >        for (i = 0; i < data->vgroups.length (); i++)
> > > > >  	{
> > > > >  	  struct iv_use *use;
> > > > > @@ -1694,8 +1711,8 @@ record_group_use (struct ivopts_data *data, tree
> > > > *use_p,
> > > > >
> > > > >  	  /* Check if it has the same stripped base and step.  */
> > > > >  	  if (operand_equal_p (iv->base_object, use->iv->base_object, 0)
> > > > > -	      && operand_equal_p (iv->step, use->iv->step, 0)
> > > > > -	      && operand_equal_p (addr_base, use->addr_base, 0))
> > > > > +	      && affine_compare_eq (iv_step, use->iv->step)
> > > > > +	      && affine_compare_eq (iv_addr_base, use->addr_base))
> > > >
> > > > There's only this use of addr_base so I think the opportunity is to
> > > > turn iv_use->addr_base into aff_tree (even though that's a quite big
> > > > representation).
> > >
> > > Ah, true.  Will do.
> > >
> > > >
> > > > For the testcase, what are the two IVs we are comparing?  I wonder
> > > > why you need the affine compare for iv->step?
> > > >
> > >
> > > Because step builds up the IV expressions and can also be an expression.
> > >
> > > In this case:
> > >
> > > >>> p debug (iv->step)
> > > ((unsigned long) stride.3_27) * 4
> > > >>> p debug (use->iv->step)
> > > (sizetype) (stride.3_27 * 4)
> > >
> > > Note that the original expressions were both unsigned, but the STRIP_NOPS
> > > made one signed.   They still wouldn't have compared equal though due to
> > > the different cast locations.
> > >
> > > For completeness the base here is
> > >
> > > >>> p debug (use->addr_base)
> > > (integer(kind=4) *) ((((((sizetype) prephitmp_64 - (sizetype) stride.3_27) -
> > (sizetype) stride.5_29) + (sizetype) _7) + (sizetype) stride.3_27) * 4 + (sizetype)
> > block.0_26)
> > > $9 = void
> > >
> > > >>> p debug (addr_base)
> > > (integer(kind=4) *) ((((((sizetype) prephitmp_64 - (sizetype) stride.3_27) -
> > (sizetype) stride.5_29) + (sizetype) _7) + (sizetype) stride.3_27) * 4 + (sizetype)
> > block.0_26)
> > > $10 = void
> > > >
> > >
> > > > >  	    break;
> > > > >  	}
> > > > >        if (i == data->vgroups.length ())
> > > > > @@ -2231,6 +2248,14 @@ constant_multiple_of (tree top, tree bot,
> > widest_int
> > > > *mul)
> > > > >        return true;
> > > > >      }
> > > > >
> > > > > +  aff_tree aff_top, aff_bot;
> > > > > +  tree_to_aff_combination (top, TREE_TYPE (top), &aff_top);
> > > > > +  tree_to_aff_combination (bot, TREE_TYPE (bot), &aff_bot);
> > > > > +  poly_widest_int poly_mul;
> > > > > +  if (aff_combination_constant_multiple_p (&aff_top, &aff_bot, &poly_mul)
> > > > > +      && poly_mul.is_constant (mul))
> > > > > +    return true;
> > > > > +
> > > >
> > > > So why does stripping nops not work here?
> > >
> > > So this is where we compare different IV expressions to determine which
> > > IVs compute the same thing and thus can be in the same group.
> > >
> > > The STRIP_NOPS don't work because while the incoming types are the same
> > > the casts are different.  So:
> > >
> > > >>> p debug (ustep)
> > > (unsigned long) stride.3_27 * 4
> > > $3 = void
> > > >>> p debug (cstep)
> > > (unsigned long) (stride.3_27 * 4)
> > > $4 = void
> > >
> > > Which is of course stripped to:
> > >
> > > >>> p debug (top)
> > > (unsigned long) stride.3_27 * 4
> > > $1 = void
> > > >>> p debug (bot)
> > > stride.3_27 * 4
> > 
> > So we're expecting constant_multiple_of to compute *mul == 1, proving
> > equality?
> > 
> 
> Indeed
> 
> > The function seems to try stripping ops from TOP until it reaches
> > an expression equal to BOT and that's what fails to trigger here.
> > 
> > What I originally wondered was iff we compute the affine combinations
> > why not use only aff_combination_constant_multiple_p?
> > 
> 
> I think that's probably easier, The rest of the code seems to indeed be
> repeating the work of aff_combination_constant_multiple_p.
> 
> I can try replacing the whole thing with aff_combination_constant_multiple_p
> and see?

Yes.

> > I might also point back to the idea I threw in somewhere, adding
> > OEP_VALUE (or a better name) to the set of flags accepted by
> > operand_equal_p.  You mentioned hashing IIRC but I don't see the patches
> > touching hashing?
> > 
> 
> Yes, That can indeed be done with this approach.  The hashing was that before I
> was trying to prevent the "duplicate" IV expressions from being created in the
> first place by modifying get_loop_invariant_expr.
> 
> This function looks up if we have already seen a particular IV expression and if
> we have it just returns that expression.  However after reading more of the code
> I realized this wasn't the right approach, as without also dealing with the candidates
> we'd end up creating IV expression that can't be handled by any candidate.
> 
> IVops would just give up then.   Reading the code it seems that get_loop_invariant_expr
> is just there to prevent blatant duplicates.  i.e. it treats `(signed) a` and `a` as the same.
> 
> This is also why I think that everywhere else *has* to continue stripping the expression.
> 
> On a note from Richard S that he thought IVopts already had some code to deal with
> expressions that differ only in signs led me to take a different approach.
> 
> The goal wasn't to remove the different sign/unsigned IV expressions, but instead get
> Then to be servable by the same candidates. i.e. we want them in the same candidate
> groups and then candidate optimization will just do its thing.
> 
> That seemed a more natural fit to how it worked.

Yeah, I agree that sounds like the better strathegy.

> Do you want me to try the operand_equal_p approach? Though in this case the issue
> is we not only need to know they're equal, but also need to know the scale factor.

For this case yes, but if you'd keep the code as-is, the equal with scale
factor one case would be fixed.  Not a case with different scale factors
though - but conversions "elsewhere" should be handled via the stripping.
So it would work to simply adjust the operand_equal_p check here?

> get_computation_aff_1 scales the common type IV by the scale we determined,
> so I think operand_equal_p would not be very useful here.  But it does look like
> constant_multiple_of can just be implemented with aff_combination_constant_multiple_p.
> 
> Should I try?

You've had the other places where you replace operand_equal_p with
affine-compute and compare.  As said that has some associated cost
as well as a limit on the number of elements after which it resorts
back to operand_equal_p.  So for strict equality tests implementing
a weaker operand_equal_p might be a better solution.

Richard.

> Thanks,
> Tamar
> 
> 
> > > Both of these compute the same thing so by doing the affine compare they
> > > end up in the same IV groups and can be costed.  Later during candidate selection
> > > these are the steps we're comparing to see if the candidate is the same invariant.
> > >
> > > The full list is:
> > >
> > > <IV Groups>:
> > > Group 0:
> > >   Type: REFERENCE ADDRESS
> > >   Use 0.0:
> > >     At stmt:    *_4 = _33;
> > >     At pos:     *_4
> > >     IV struct:
> > >       Type:     integer(kind=4) *
> > >       Base:     (integer(kind=4) *) ((((unsigned long) stride.3_27 + (unsigned long)
> > _36) * 4 + (unsigned long) block.0_26) + 4)
> > >       Step:     (sizetype) (stride.3_27 * 4)
> > >       Object:   (void *) block.0_26
> > >       Biv:      N
> > >       Overflowness wrto loop niter:     Overflow
> > >   Use 0.1:
> > >     At stmt:    *_78 = _33;
> > >     At pos:     *_78
> > >     IV struct:
> > >       Type:     integer(kind=4) *
> > >       Base:     (integer(kind=4) *) ((((unsigned long) stride.3_27 + (unsigned long)
> > _36) * 4 + (unsigned long) block.0_26) + 8)
> > >       Step:     (unsigned long) stride.3_27 * 4
> > >       Object:   (void *) block.0_26
> > >       Biv:      N
> > >       Overflowness wrto loop niter:     Overflow
> > >   Use 0.2:
> > >     At stmt:    *_45 = _33;
> > >     At pos:     *_45
> > >     IV struct:
> > >       Type:     integer(kind=4) *
> > >       Base:     (integer(kind=4) *) ((((unsigned long) stride.3_27 + (unsigned long)
> > _36) * 4 + (unsigned long) block.0_26) + 12)
> > >       Step:     (unsigned long) stride.3_27 * 4
> > >       Object:   (void *) block.0_26
> > >       Biv:      N
> > >       Overflowness wrto loop niter:     Overflow
> > >
> > > And all these IVs are the same but with a slightly different base offset.  By doing
> > the affine compare here IV ops sees that
> > > The best candidate is:
> > >
> > > Candidate 6:
> > >   Depend on inv.exprs: 1
> > >   Var befor: ivtmp.26_51
> > >   Var after: ivtmp.26_72
> > >   Incr POS: before exit test
> > >   IV struct:
> > >     Type:       unsigned long
> > >     Base:       ((unsigned long) stride.3_27 + (unsigned long) _36) * 4 + (unsigned
> > long) block.0_26
> > >     Step:       (unsigned long) (stride.3_27 * 4)
> > >     Object:     (void *) block.0_26
> > >     Biv:        N
> > >     Overflowness wrto loop niter:       Overflow
> > >
> > > And just builds the three IVs from the same candidate.
> > >
> > > Does this cover what you wanted?
> > >
> > > Cheers,
> > > Tamar
> > >
> > > >
> > > > >    code = TREE_CODE (top);
> > > > >    switch (code)
> > > > >      {
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > >
> > > > --
> > > > Richard Biener <rguenther@suse.de>
> > > > SUSE Software Solutions Germany GmbH,
> > > > Frankenstrasse 146, 90461 Nuernberg, Germany;
> > > > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG
> > Nuernberg)
> > >
> > 
> > --
> > Richard Biener <rguenther@suse.de>
> > SUSE Software Solutions Germany GmbH,
> > Frankenstrasse 146, 90461 Nuernberg, Germany;
> > GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
> 

-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2024-06-25 13:21 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-06-14 11:49 [PATCH][ivopts]: use affine_tree when comparing IVs during candidate selection [PR114932] Tamar Christina
2024-06-19 11:54 ` Richard Biener
2024-06-19 14:26   ` Tamar Christina
2024-06-19 14:46     ` Michael Matz
2024-06-19 14:58       ` Tamar Christina
2024-06-20  7:48     ` Richard Biener
2024-06-24 13:30       ` Tamar Christina
2024-06-25 13:21         ` Richard Biener

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).