public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
* [PATCH] Fix PR46556 (poor address generation)
@ 2011-10-05 16:16 William J. Schmidt
  2011-10-05 16:24 ` Paolo Bonzini
                   ` (2 more replies)
  0 siblings, 3 replies; 40+ messages in thread
From: William J. Schmidt @ 2011-10-05 16:16 UTC (permalink / raw)
  To: gcc-patches; +Cc: bergner, rguenth

This patch addresses the poor code generation in PR46556 for the
following code:

struct x
{
  int a[16];
  int b[16];
  int c[16];
};

extern void foo (int, int, int);

void
f (struct x *p, unsigned int n)
{
  foo (p->a[n], p->c[n], p->b[n]);
}

Prior to the fix for PR32698, gcc calculated the offset for accessing
the array elements as:  n*4; 64+n*4; 128+n*4.

Following that fix, the offsets are calculated as:  n*4; (n+16)*4; (n
+32)*4.  This led to poor code generation on powerpc64 targets, among
others.

The poor code generation was observed to not occur in loops, as the
IVOPTS code does a good job of lowering these expressions to MEM_REFs.
It was previously suggested that perhaps a general pass to lower memory
accesses to MEM_REFs in GIMPLE would solve not only this, but other
similar problems.  I spent some time looking into various approaches to
this, and reviewing some previous attempts to do similar things.  In the
end, I've concluded that this is a bad idea in practice because of the
loss of useful aliasing information.  In particular, early lowering of
component references causes us to lose the ability to disambiguate
non-overlapping references in the same structure, and there is no simple
way to carry the necessary aliasing information along with the
replacement MEM_REFs to avoid this.  While some performance gains are
available with GIMPLE lowering of memory accesses, there are also
offsetting performance losses, and I suspect this would just be a
continuous source of bug reports into the future.

Therefore the current patch is a much simpler approach to solve the
specific problem noted in the PR.  There are two pieces to the patch:

 * The offending addressing pattern is matched in GIMPLE and transformed
into a restructured MEM_REF that distributes the multiply, so that (n
+32)*4 becomes 4*n+128 as before.  This is done during the reassociation
pass, for reasons described below.  The transformation only occurs in
non-loop blocks, since IVOPTS does a good job on such things within
loops.
 * A tweak is added to the RTL forward-propagator to avoid propagating
into memory references based on a single base register with no offset,
under certain circumstances.  This improves sharing of base registers
for accesses within the same structure and slightly lowers register
pressure.

It would be possible to separate these into two patches if that's
preferred.  I chose to combine them because together they provide the
ideal code generation that the new test cases test for.

I initially implemented the pattern matcher during expand, but I found
that the expanded code for two accesses to the same structure was often
not being CSEd well.  So I moved it back into the GIMPLE phases prior to
DOM to take advantage of its CSE.  To avoid an additional complete pass
over the IL, I chose to piggyback on the reassociation pass.  This
transformation is not technically a reassociation, but it is related
enough to not be a complete wart.

One noob question about this:  It would probably be preferable to have
this transformation only take place during the second reassociation
pass, so the ARRAY_REFs are seen by earlier optimization phases.  Is
there an easy way to detect that it's the second pass without having to
generate a separate pass entry point?

One other general question about the pattern-match transformation:  Is
this an appropriate transformation for all targets, or should it be
somehow gated on available addressing modes on the target processor?

Bootstrapped and regression tested on powerpc64-linux-gnu.  Verified no
performance degradations on that target for SPEC CPU2000 and CPU2006.

I'm looking for eventual approval for trunk after any comments are
resolved.  Thanks!

Bill


2011-10-05  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>

gcc:
	
	PR rtl-optimization/46556
	* fwprop.c (fwprop_bb_aux_d): New struct.
	(MEM_PLUS_REGS): New macro.
	(record_mem_plus_reg): New function.
	(record_mem_plus_regs): Likewise.
	(single_def_use_enter_block): Record mem-plus-reg patterns.
	(build_single_def_use_links): Allocate aux storage.
	(locally_poor_mem_replacement): New function.
	(forward_propagate_and_simplify): Call
	locally_poor_mem_replacement.
	(fwprop_init): Free storage.
	* tree.h (copy_ref_info): Expose existing function.
	* tree-ssa-loop-ivopts.c (copy_ref_info): Remove static token.
	* tree-ssa-reassoc.c (restructure_base_and_offset): New function.
	(restructure_mem_ref): Likewise.
	(reassociate_bb): Look for opportunities to call
	restructure_mem_ref; clean up immediate use lists.

gcc/testsuite:
	
	PR rtl-optimization/46556
	* gcc.target/powerpc/ppc-pr46556-1.c: New testcase.
	* gcc.target/powerpc/ppc-pr46556-2.c: Likewise.
	* gcc.target/powerpc/ppc-pr46556-3.c: Likewise.
	* gcc.target/powerpc/ppc-pr46556-4.c: Likewise.
	* gcc.dg/tree-ssa/pr46556-1.c: Likewise.
	* gcc.dg/tree-ssa/pr46556-2.c: Likewise.
	* gcc.dg/tree-ssa/pr46556-3.c: Likewise.


Index: gcc/fwprop.c
===================================================================
--- gcc/fwprop.c	(revision 179317)
+++ gcc/fwprop.c	(working copy)
@@ -131,6 +131,15 @@ static VEC(df_ref,heap) *reg_defs_stack;
 static bitmap local_md;
 static bitmap local_lr;
 
+/* Auxiliary information for each block for this pass.  */
+typedef struct GTY(()) fwprop_bb_aux_d
+{
+  /* Registers appearing in (mem (plus (reg ... patterns in this block.  */
+  bitmap mem_plus_regs;
+} fwprop_bb_aux, *fwprop_bb_aux_t;
+
+#define MEM_PLUS_REGS(BB) ((fwprop_bb_aux_t) ((BB)->aux))->mem_plus_regs
+
 /* Return the only def in USE's use-def chain, or NULL if there is
    more than one def in the chain.  */
 
@@ -212,7 +221,43 @@ process_uses (df_ref *use_rec, int top_flag)
 }
 
 
+/* Record whether EXPR in block BB is of the form (mem (plus (reg ...  */
 static void
+record_mem_plus_reg (basic_block bb, rtx expr)
+{
+  rtx addr, reg;
+
+  if (GET_CODE (expr) != MEM)
+    return;
+
+  addr = XEXP (expr, 0);
+  if (GET_CODE (addr) != PLUS)
+    return;
+
+  reg = XEXP (addr, 0);
+  if (!REG_P (reg))
+    return;
+
+  (void)bitmap_set_bit (MEM_PLUS_REGS (bb), REGNO (reg));
+}
+
+
+/* Record whether INSN in block BB contains any patterns of the form
+   (mem (plus (reg ...  */
+static void
+record_mem_plus_regs (basic_block bb, rtx insn)
+{
+  rtx insn_set = single_set (insn);
+
+  if (insn_set)
+    {
+      record_mem_plus_reg (bb, SET_DEST (insn_set));
+      record_mem_plus_reg (bb, SET_SRC (insn_set));
+    }
+}
+
+
+static void
 single_def_use_enter_block (struct dom_walk_data *walk_data ATTRIBUTE_UNUSED,
 			    basic_block bb)
 {
@@ -230,6 +275,8 @@ single_def_use_enter_block (struct dom_walk_data *
   process_uses (df_get_artificial_uses (bb_index), DF_REF_AT_TOP);
   process_defs (df_get_artificial_defs (bb_index), DF_REF_AT_TOP);
 
+  MEM_PLUS_REGS (bb) = BITMAP_ALLOC (NULL);
+
   /* We don't call df_simulate_initialize_forwards, as it may overestimate
      the live registers if there are unused artificial defs.  We prefer
      liveness to be underestimated.  */
@@ -242,8 +289,15 @@ single_def_use_enter_block (struct dom_walk_data *
         process_uses (DF_INSN_UID_EQ_USES (uid), 0);
         process_defs (DF_INSN_UID_DEFS (uid), 0);
 	df_simulate_one_insn_forwards (bb, insn, local_lr);
+	record_mem_plus_regs (bb, insn);
       }
 
+  if (dump_file)
+    {
+      fprintf (dump_file, "mem_plus_regs (%d): ", bb->index);
+      dump_bitmap (dump_file, MEM_PLUS_REGS (bb));
+    }
+
   process_uses (df_get_artificial_uses (bb_index), 0);
   process_defs (df_get_artificial_defs (bb_index), 0);
 }
@@ -295,6 +349,8 @@ build_single_def_use_links (void)
   local_md = BITMAP_ALLOC (NULL);
   local_lr = BITMAP_ALLOC (NULL);
 
+  alloc_aux_for_blocks (sizeof (fwprop_bb_aux));
+
   /* Walk the dominator tree looking for single reaching definitions
      dominating the uses.  This is similar to how SSA form is built.  */
   walk_data.dom_direction = CDI_DOMINATORS;
@@ -1207,6 +1263,69 @@ forward_propagate_asm (df_ref use, rtx def_insn, r
   return true;
 }
 
+/* We are proposing to replace a USE of REG in USE_SET.  Determine
+   whether we have a situation where two storage uses of REG occur
+   in the same block as follows.  Each use can be either a memory
+   store or a memory load.
+
+     ... (mem (reg REG))
+
+     ... (mem (plus (reg REG)
+                    (...)))
+
+   The problem is that it will look profitable to propagate something
+   like
+
+     (set (reg REG)
+          (plus (reg X)
+                (reg Y)))
+
+   into the first use, but not into the second one.  This breaks a 
+   CSE opportunity and raises register pressure by extending the
+   lifetimes of X and Y.  To avoid this, don't propagate into the
+   first use when this very specific situation arises.  */
+static bool
+locally_poor_mem_replacement (df_ref use, rtx use_set, rtx reg, rtx def_set)
+{
+  rtx rhs, addend, mem, base;
+  unsigned i;
+
+  /* We're only concerned if the RHS of def_set is the sum of two
+     registers.  */
+  rhs = SET_SRC (def_set);
+
+  if (GET_CODE (rhs) != PLUS)
+    return false;
+
+  for (i = 0; i < 2; i++)
+    {
+      addend = XEXP (rhs, i);
+      if (!REG_P (addend))
+	return false;
+    }
+
+  /* The use must have a single SET and be used in addressing.  */
+  if (!use_set || !DF_REF_REG_MEM_P (use))
+    return false;
+
+  if (DF_REF_REG_MEM_STORE_P (use))
+    mem = SET_DEST (use_set);
+  else
+    mem = SET_SRC (use_set);
+
+  if (GET_CODE (mem) != MEM)
+    return false;
+
+  /* We are specifically interested in the base address.  */
+  base = XEXP (mem, 0);
+  if (reg != base)
+    return false;
+
+  /* We've found a use of the first form.  Check whether uses of the
+     second form exist in the same block.  If found, don't replace.  */
+  return bitmap_bit_p (MEM_PLUS_REGS (DF_REF_BB (use)), DF_REF_REGNO (use));
+}
+
 /* Try to replace USE with SRC (defined in DEF_INSN) and simplify the
    result.  */
 
@@ -1273,6 +1392,11 @@ forward_propagate_and_simplify (df_ref use, rtx de
       return false;
     }
 
+  /* Check for a specific unprofitable propagation into a memory
+     reference, where the propagation will break a CSE opportunity.  */
+  if (locally_poor_mem_replacement (use, use_set, reg, def_set))
+    return false;
+
   if (asm_use >= 0)
     return forward_propagate_asm (use, def_insn, def_set, reg);
 
@@ -1403,6 +1527,12 @@ fwprop_init (void)
 static void
 fwprop_done (void)
 {
+  basic_block bb;
+
+  FOR_EACH_BB (bb)
+    BITMAP_FREE (MEM_PLUS_REGS (bb));
+  free_aux_for_blocks ();
+
   loop_optimizer_finalize ();
 
   VEC_free (df_ref, heap, use_def_ref);
Index: gcc/tree.h
===================================================================
--- gcc/tree.h	(revision 179317)
+++ gcc/tree.h	(working copy)
@@ -5775,6 +5775,9 @@ tree target_for_debug_bind (tree);
 extern tree tree_mem_ref_addr (tree, tree);
 extern void copy_mem_ref_info (tree, tree);
 
+/* In tree-ssa-loop-ivopts.c.  */
+extern void copy_ref_info (tree, tree);
+
 /* In tree-vrp.c */
 extern bool ssa_name_nonnegative_p (const_tree);
 
Index: gcc/testsuite/gcc.target/powerpc/ppc-pr46556-1.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/ppc-pr46556-1.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/ppc-pr46556-1.c	(revision 0)
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+struct x
+{
+  int a[16];
+  int b[16];
+  int c[16];
+};
+
+extern void foo (int, int, int);
+
+void
+f (struct x *p, unsigned int n)
+{
+  foo (p->a[n], p->c[n], p->b[n]);
+}
+
+/* { dg-final { scan-assembler-times "lw.*3," 1 { target powerpc*-*-* } } } */
+/* { dg-final { scan-assembler-times "lw.*4,128\\(.*\\)" 1 { target powerpc*-*-* } } } */
+/* { dg-final { scan-assembler-times "lw.*5,64\\(.*\\)" 1 { target powerpc*-*-* } } } */
Index: gcc/testsuite/gcc.target/powerpc/ppc-pr46556-2.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/ppc-pr46556-2.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/ppc-pr46556-2.c	(revision 0)
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+struct x
+{
+  int a[16];
+  int b[16];
+  int c[16];
+};
+
+extern void foo (int, int, int);
+
+void
+f (struct x *p, unsigned int n)
+{
+  foo (p->a[n], p->c[n], p->b[n]);
+  if (n > 12)
+    foo (p->a[n], p->c[n], p->b[n]);
+  else if (n > 3)
+    foo (p->b[n], p->a[n], p->c[n]);
+}
+
+/* { dg-final { scan-assembler-times "lw.*4,0\\(.*\\)" 1 { target powerpc*-*-* } } } */
+/* { dg-final { scan-assembler-times "lw.*3,0\\(.*\\)" 1 { target powerpc*-*-* } } } */
Index: gcc/testsuite/gcc.target/powerpc/ppc-pr46556-3.c
===================================================================
--- gcc/testsuite/gcc.target/powerpc/ppc-pr46556-3.c	(revision 0)
+++ gcc/testsuite/gcc.target/powerpc/ppc-pr46556-3.c	(revision 0)
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+struct x
+{
+  int a[16];
+  int b[16];
+  int c[16];
+};
+
+extern void foo (int, int, int);
+
+void
+f (struct x *p, unsigned int n)
+{
+  foo (p->a[n], p->c[n], p->b[n]);
+  if (n > 3)
+    {
+      foo (p->a[n], p->c[n], p->b[n]);
+      if (n > 12)
+	foo (p->b[n], p->a[n], p->c[n]);
+    }
+}
+
+/* { dg-final { scan-assembler-times "lw.*3,0\\(.*\\)" 1 { target powerpc*-*-* } } } */
+/* { dg-final { scan-assembler-times "lw.*4,0\\(.*\\)" 1 { target powerpc*-*-* } } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/pr46556-1.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/pr46556-1.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/pr46556-1.c	(revision 0)
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-reassoc2" } */
+
+struct x
+{
+  int a[16];
+  int b[16];
+  int c[16];
+};
+
+extern void foo (int, int, int);
+
+void
+f (struct x *p, unsigned int n)
+{
+  foo (p->a[n], p->c[n], p->b[n]);
+}
+
+/* { dg-final { scan-tree-dump-times "\\* 4;" 1 "reassoc2" } } */
+/* { dg-final { scan-tree-dump-times "p_1\\(D\\) \\+ D" 1 "reassoc2" } } */
+/* { dg-final { scan-tree-dump-times "MEM\\\[\\(struct x \\*\\)D" 2 "reassoc2" } } */
+/* { dg-final { cleanup-tree-dump "reassoc2" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/pr46556-2.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/pr46556-2.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/pr46556-2.c	(revision 0)
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-reassoc2" } */
+
+struct x
+{
+  int a[16];
+  int b[16];
+  int c[16];
+};
+
+extern void foo (int, int, int);
+
+void
+f (struct x *p, unsigned int n)
+{
+  foo (p->a[n], p->c[n], p->b[n]);
+  if (n > 12)
+    foo (p->a[n], p->c[n], p->b[n]);
+  else if (n > 3)
+    foo (p->b[n], p->a[n], p->c[n]);
+}
+
+/* { dg-final { scan-tree-dump-times "\\* 4;" 1 "reassoc2" } } */
+/* { dg-final { scan-tree-dump-times "p_1\\(D\\) \\+ D" 1 "reassoc2" } } */
+/* { dg-final { scan-tree-dump-times "MEM\\\[\\(struct x \\*\\)D" 6 "reassoc2" } } */
+/* { dg-final { cleanup-tree-dump "reassoc2" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/pr46556-3.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/pr46556-3.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/pr46556-3.c	(revision 0)
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-reassoc2" } */
+
+struct x
+{
+  int a[16];
+  int b[16];
+  int c[16];
+};
+
+extern void foo (int, int, int);
+
+void
+f (struct x *p, unsigned int n)
+{
+  foo (p->a[n], p->c[n], p->b[n]);
+  if (n > 3)
+    {
+      foo (p->a[n], p->c[n], p->b[n]);
+      if (n > 12)
+	foo (p->b[n], p->a[n], p->c[n]);
+    }
+}
+
+/* { dg-final { scan-tree-dump-times "\\* 4;" 1 "reassoc2" } } */
+/* { dg-final { scan-tree-dump-times "p_1\\(D\\) \\+ D" 1 "reassoc2" } } */
+/* { dg-final { scan-tree-dump-times "MEM\\\[\\(struct x \\*\\)D" 6 "reassoc2" } } */
+/* { dg-final { cleanup-tree-dump "reassoc2" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/pr46556-4.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/pr46556-4.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/pr46556-4.c	(revision 0)
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-rtl-fwprop1" } */
+
+struct x
+{
+  int a[16];
+  int b[16];
+  int c[16];
+};
+
+extern void foo (int, int, int);
+
+void
+f (struct x *p, unsigned int n)
+{
+  foo (p->a[n], p->c[n], p->b[n]);
+}
+
+/* { dg-final { scan-rtl-dump-times "\\(mem.\*:.\* \\(reg.\*p_1\\(D\\)->a" 1 "fwprop1" } } */
+/* { dg-final { cleanup-rtl-dump "fwprop1" } } */
Index: gcc/tree-ssa-loop-ivopts.c
===================================================================
--- gcc/tree-ssa-loop-ivopts.c	(revision 179317)
+++ gcc/tree-ssa-loop-ivopts.c	(working copy)
@@ -6280,7 +6280,7 @@ rewrite_use_nonlinear_expr (struct ivopts_data *da
 
 /* Copies the reference information from OLD_REF to NEW_REF.  */
 
-static void
+void
 copy_ref_info (tree new_ref, tree old_ref)
 {
   tree new_ptr_base = NULL_TREE;
Index: gcc/tree-ssa-reassoc.c
===================================================================
--- gcc/tree-ssa-reassoc.c	(revision 179317)
+++ gcc/tree-ssa-reassoc.c	(working copy)
@@ -2361,6 +2361,116 @@ break_up_subtract_bb (basic_block bb)
     break_up_subtract_bb (son);
 }
 
+/* Given BASE and OFFSET derived from *EXPR occurring in stmt *GSI,
+   determine whether there is a profitable opportunity to restructure
+   address arithmetic within BASE and OFFSET.  If so, produce such
+   a restructuring in *MEM_REF and return true; else return false.  */
+
+static tree
+restructure_base_and_offset (tree *expr, gimple_stmt_iterator *gsi,
+			     tree base, tree offset, HOST_WIDE_INT bitpos)
+{
+  tree mult_op0, mult_op1, t1, t2, offset_type, mult_expr, add_expr, mem_ref;
+  HOST_WIDE_INT c, c1, c2, c3, c4;
+
+  /* Look for the following pattern:
+
+       base   = MEM_REF (T1, C1);
+       offset = MULT_EXPR (PLUS_EXPR (T2, C2), C3)
+       bitpos = C4 * BITS_PER_UNIT
+
+     If found, convert it to:
+
+       MEM_REF (POINTER_PLUS_EXPR (T1, MULT_EXPR (T2, C3)),
+                C1 + (C2 * C3) + C4)
+   */
+  if (!base
+      || !offset
+      || TREE_CODE (base) != MEM_REF
+      || TREE_CODE (TREE_OPERAND (base, 1)) != INTEGER_CST
+      || TREE_CODE (offset) != MULT_EXPR)
+    return NULL_TREE;
+
+  mult_op0 = TREE_OPERAND (offset, 0);
+  mult_op1 = TREE_OPERAND (offset, 1);
+
+  if (TREE_CODE (mult_op0) != PLUS_EXPR
+      || TREE_CODE (mult_op1) != INTEGER_CST
+      || TREE_CODE (TREE_OPERAND (mult_op0, 1)) != INTEGER_CST)
+    return NULL_TREE;
+
+  t1 = TREE_OPERAND (base, 0);
+  t2 = TREE_OPERAND (mult_op0, 0);
+  c1 = TREE_INT_CST_LOW (TREE_OPERAND (base, 1));
+  c2 = TREE_INT_CST_LOW (TREE_OPERAND (mult_op0, 1));
+  c3 = TREE_INT_CST_LOW (mult_op1);
+  c4 = bitpos / BITS_PER_UNIT;
+  c = c1 + c2 * c3 + c4;
+
+  offset_type = TREE_TYPE (TREE_OPERAND (base, 1));
+  
+  mult_expr = fold_build2 (MULT_EXPR, sizetype, t2,
+			   build_int_cst (sizetype, c3));
+  mult_expr = force_gimple_operand_gsi (gsi, mult_expr, true, NULL,
+					true, GSI_SAME_STMT);
+  add_expr = fold_build2 (POINTER_PLUS_EXPR, TREE_TYPE (t1), t1, mult_expr);
+  add_expr = force_gimple_operand_gsi (gsi, add_expr, true, NULL,
+				       true, GSI_SAME_STMT);
+  mem_ref = fold_build2 (MEM_REF, TREE_TYPE (*expr), add_expr,
+			 build_int_cst (offset_type, c));
+  return mem_ref;
+}
+
+/* If *EXPR contains a memory reference, determine whether there is an
+   opportunity to restructure the base and offset expressions for
+   improved performance.  */
+
+static bool
+restructure_mem_ref (tree *expr, gimple_stmt_iterator *gsi)
+{
+  enum tree_code code = TREE_CODE (*expr);
+  tree base, offset, mem_ref;
+  HOST_WIDE_INT bitsize, bitpos;
+  enum machine_mode mode;
+  int unsignedp, volatilep;
+
+  /* Only expressions that reference memory are of interest.  Bitfield
+     references should eventually be lowered prior to this point and
+     are not dealt with.  */
+  if (! handled_component_p (*expr)
+      || code == BIT_FIELD_REF
+      || (code == COMPONENT_REF && DECL_BIT_FIELD (TREE_OPERAND (*expr, 1))))
+    return false;
+
+  /* Determine whether the expression can be represented with base and
+     offset components.  */
+  base = get_inner_reference (*expr, &bitsize, &bitpos, &offset, &mode,
+			      &unsignedp, &volatilep, false);
+  if (!base || !offset)
+    return false;
+
+  /* Look for a restructuring opportunity.  */
+  if ((mem_ref = restructure_base_and_offset (expr, gsi, base,
+					      offset, bitpos)) == NULL_TREE)
+    return false;
+
+  /* Found one.  Record the replacement in the dump file and complete
+     the replacement.  */
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    {
+      fprintf (dump_file, "\nOriginal_expr = ");
+      print_generic_expr (dump_file, *expr, TDF_VOPS|TDF_MEMSYMS);
+      fprintf (dump_file, "\nmem_ref = ");
+      print_generic_expr (dump_file, mem_ref, TDF_VOPS|TDF_MEMSYMS);
+      fprintf (dump_file, "\n\n");
+    }
+
+  copy_ref_info (mem_ref, *expr);
+  *expr = mem_ref;
+
+  return true;
+}
+
 /* Reassociate expressions in basic block BB and its post-dominator as
    children.  */
 
@@ -2369,14 +2479,30 @@ reassociate_bb (basic_block bb)
 {
   gimple_stmt_iterator gsi;
   basic_block son;
+  bool chgd_mem_ref = false;
+  bool in_loop = loop_depth (bb->loop_father) != 0;
 
   for (gsi = gsi_last_bb (bb); !gsi_end_p (gsi); gsi_prev (&gsi))
     {
       gimple stmt = gsi_stmt (gsi);
 
-      if (is_gimple_assign (stmt)
-	  && !stmt_could_throw_p (stmt))
+      /* Look for restructuring opportunities within an expression
+	 that references memory.  We only do this for blocks not
+         contained in loops, since the ivopts machinery does a 
+         good job on loop expressions, and we don't want to interfere
+	 with other loop optimizations.  */
+      if (!in_loop && gimple_vuse (stmt) && gimple_assign_single_p (stmt))
 	{
+	  tree *lhs, *rhs;
+	  lhs = gimple_assign_lhs_ptr (stmt);
+	  chgd_mem_ref = restructure_mem_ref (lhs, &gsi) || chgd_mem_ref;
+	  rhs = gimple_assign_rhs1_ptr (stmt);
+	  chgd_mem_ref = restructure_mem_ref (rhs, &gsi) || chgd_mem_ref;
+	}
+
+      else if (is_gimple_assign (stmt)
+	       && !stmt_could_throw_p (stmt))
+	{
 	  tree lhs, rhs1, rhs2;
 	  enum tree_code rhs_code = gimple_assign_rhs_code (stmt);
 
@@ -2489,6 +2615,12 @@ reassociate_bb (basic_block bb)
 	    }
 	}
     }
+  /* If memory references have been restructured, immediate uses need
+     to be cleaned up.  */
+  if (chgd_mem_ref)
+    for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+      update_stmt (gsi_stmt (gsi));
+
   for (son = first_dom_son (CDI_POST_DOMINATORS, bb);
        son;
        son = next_dom_son (CDI_POST_DOMINATORS, son))


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] Fix PR46556 (poor address generation)
  2011-10-05 16:16 [PATCH] Fix PR46556 (poor address generation) William J. Schmidt
@ 2011-10-05 16:24 ` Paolo Bonzini
  2011-10-05 16:28   ` Paolo Bonzini
  2011-10-05 17:48   ` William J. Schmidt
  2011-10-05 16:41 ` Steven Bosscher
  2011-10-06 10:21 ` Richard Guenther
  2 siblings, 2 replies; 40+ messages in thread
From: Paolo Bonzini @ 2011-10-05 16:24 UTC (permalink / raw)
  To: William J. Schmidt; +Cc: gcc-patches, bergner, rguenth

On 10/05/2011 06:13 PM, William J. Schmidt wrote:
> One other general question about the pattern-match transformation:  Is
> this an appropriate transformation for all targets, or should it be
> somehow gated on available addressing modes on the target processor?
>
> Bootstrapped and regression tested on powerpc64-linux-gnu.  Verified no
> performance degradations on that target for SPEC CPU2000 and CPU2006.
>
> I'm looking for eventual approval for trunk after any comments are
> resolved.  Thanks!

How do the costs look like for the two transforms you mention in the 
head comment of locally_poor_mem_replacement?

Paolo

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] Fix PR46556 (poor address generation)
  2011-10-05 16:24 ` Paolo Bonzini
@ 2011-10-05 16:28   ` Paolo Bonzini
  2011-10-05 17:48   ` William J. Schmidt
  1 sibling, 0 replies; 40+ messages in thread
From: Paolo Bonzini @ 2011-10-05 16:28 UTC (permalink / raw)
  To: gcc-patches; +Cc: gcc-patches, bergner, rguenth

On 10/05/2011 06:13 PM, William J. Schmidt wrote:
> One other general question about the pattern-match transformation:  Is
> this an appropriate transformation for all targets, or should it be
> somehow gated on available addressing modes on the target processor?
>
> Bootstrapped and regression tested on powerpc64-linux-gnu.  Verified no
> performance degradations on that target for SPEC CPU2000 and CPU2006.
>
> I'm looking for eventual approval for trunk after any comments are
> resolved.  Thanks!

How do the costs look like for the two transforms you mention in the 
head comment of locally_poor_mem_replacement?

Paolo

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] Fix PR46556 (poor address generation)
  2011-10-05 16:16 [PATCH] Fix PR46556 (poor address generation) William J. Schmidt
  2011-10-05 16:24 ` Paolo Bonzini
@ 2011-10-05 16:41 ` Steven Bosscher
  2011-10-05 17:19   ` William J. Schmidt
  2011-10-06 10:21 ` Richard Guenther
  2 siblings, 1 reply; 40+ messages in thread
From: Steven Bosscher @ 2011-10-05 16:41 UTC (permalink / raw)
  To: William J. Schmidt; +Cc: gcc-patches, bergner, rguenth

On Wed, Oct 5, 2011 at 6:13 PM, William J. Schmidt
<wschmidt@linux.vnet.ibm.com> wrote:
>        * tree-ssa-loop-ivopts.c (copy_ref_info): Remove static token.

Rather than this, why not move the function to common code somewhere?

Ciao!
Steven

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] Fix PR46556 (poor address generation)
  2011-10-05 16:41 ` Steven Bosscher
@ 2011-10-05 17:19   ` William J. Schmidt
  0 siblings, 0 replies; 40+ messages in thread
From: William J. Schmidt @ 2011-10-05 17:19 UTC (permalink / raw)
  To: Steven Bosscher; +Cc: gcc-patches, bergner, rguenth

On Wed, 2011-10-05 at 18:29 +0200, Steven Bosscher wrote:
> On Wed, Oct 5, 2011 at 6:13 PM, William J. Schmidt
> <wschmidt@linux.vnet.ibm.com> wrote:
> >        * tree-ssa-loop-ivopts.c (copy_ref_info): Remove static token.
> 
> Rather than this, why not move the function to common code somewhere?
> 
> Ciao!
> Steven

An alternative would be to move it into tree-ssa-address.c, where there
is already a simpler version called copy_mem_ref_info.  I'm open to that
if it's preferable.

Bill

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] Fix PR46556 (poor address generation)
  2011-10-05 16:24 ` Paolo Bonzini
  2011-10-05 16:28   ` Paolo Bonzini
@ 2011-10-05 17:48   ` William J. Schmidt
  2011-10-05 19:50     ` Paolo Bonzini
  1 sibling, 1 reply; 40+ messages in thread
From: William J. Schmidt @ 2011-10-05 17:48 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: gcc-patches, bergner, rguenth

On Wed, 2011-10-05 at 18:21 +0200, Paolo Bonzini wrote:
> On 10/05/2011 06:13 PM, William J. Schmidt wrote:
> > One other general question about the pattern-match transformation:  Is
> > this an appropriate transformation for all targets, or should it be
> > somehow gated on available addressing modes on the target processor?
> >
> > Bootstrapped and regression tested on powerpc64-linux-gnu.  Verified no
> > performance degradations on that target for SPEC CPU2000 and CPU2006.
> >
> > I'm looking for eventual approval for trunk after any comments are
> > resolved.  Thanks!
> 
> How do the costs look like for the two transforms you mention in the 
> head comment of locally_poor_mem_replacement?
> 
> Paolo
> 

I don't know off the top of my head -- I'll have to gather that
information.  The issue is that the profitability is really
context-sensitive, so just the isolated costs of insns aren't enough.
The forward propagation of the add into (mem (reg REG)) looks like a
slam dunk in the absence of other information, but if there are other
nearby references using nonzero offsets from REG, this just extends the
lifetimes of X and Y without eliminating the need for REG.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] Fix PR46556 (poor address generation)
  2011-10-05 17:48   ` William J. Schmidt
@ 2011-10-05 19:50     ` Paolo Bonzini
  2011-10-05 21:04       ` William J. Schmidt
  0 siblings, 1 reply; 40+ messages in thread
From: Paolo Bonzini @ 2011-10-05 19:50 UTC (permalink / raw)
  To: William J. Schmidt; +Cc: gcc-patches, bergner, rguenth

On 10/05/2011 07:22 PM, William J. Schmidt wrote:
> I don't know off the top of my head -- I'll have to gather that
> information.  The issue is that the profitability is really
> context-sensitive, so just the isolated costs of insns aren't enough.
> The forward propagation of the add into (mem (reg REG)) looks like a
> slam dunk in the absence of other information, but if there are other
> nearby references using nonzero offsets from REG, this just extends the
> lifetimes of X and Y without eliminating the need for REG.

True, however there are other passes that do this kind of un-CSE and 
lifetime reduction.

Paolo

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] Fix PR46556 (poor address generation)
  2011-10-05 19:50     ` Paolo Bonzini
@ 2011-10-05 21:04       ` William J. Schmidt
  2011-10-06  8:57         ` Paolo Bonzini
  0 siblings, 1 reply; 40+ messages in thread
From: William J. Schmidt @ 2011-10-05 21:04 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: gcc-patches, bergner, rguenth

On Wed, 2011-10-05 at 21:01 +0200, Paolo Bonzini wrote:
> On 10/05/2011 07:22 PM, William J. Schmidt wrote:
> > I don't know off the top of my head -- I'll have to gather that
> > information.  The issue is that the profitability is really
> > context-sensitive, so just the isolated costs of insns aren't enough.
> > The forward propagation of the add into (mem (reg REG)) looks like a
> > slam dunk in the absence of other information, but if there are other
> > nearby references using nonzero offsets from REG, this just extends the
> > lifetimes of X and Y without eliminating the need for REG.
> 
> True, however there are other passes that do this kind of un-CSE and 
> lifetime reduction.
> 
> Paolo

OK, I see.  If there's a better place downstream to make a swizzle, I'm
certainly fine with that.

I disabled locally_poor_mem_replacement and added some dump information
in should_replace_address to show the costs for the replacement I'm
trying to avoid:

In should_replace_address:
  old_rtx = (reg/f:DI 125 [ D.2036 ])
  new_rtx = (plus:DI (reg/v/f:DI 126 [ p ])
        (reg:DI 128))
  address_cost (old_rtx) = 0
  address_cost (new_rtx) = 0
  set_src_cost (old_rtx) = 0
  set_src_cost (new_rtx) = 4

In insn 11, replacing
 (mem/s:SI (reg/f:DI 125 [ D.2036 ]) [2 p_1(D)->a S4 A32])
 with (mem/s:SI (plus:DI (reg/v/f:DI 126 [ p ])
            (reg:DI 128)) [2 p_1(D)->a S4 A32])
Changed insn 11
deferring rescan insn with uid = 11.
deferring rescan insn with uid = 11.

Hope this helps,
Bill

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] Fix PR46556 (poor address generation)
  2011-10-05 21:04       ` William J. Schmidt
@ 2011-10-06  8:57         ` Paolo Bonzini
  2011-10-06 13:23           ` William J. Schmidt
  0 siblings, 1 reply; 40+ messages in thread
From: Paolo Bonzini @ 2011-10-06  8:57 UTC (permalink / raw)
  To: William J. Schmidt; +Cc: gcc-patches, bergner, rguenth

[-- Attachment #1: Type: text/plain, Size: 1499 bytes --]

On 10/05/2011 10:16 PM, William J. Schmidt wrote:
> OK, I see.  If there's a better place downstream to make a swizzle, I'm
> certainly fine with that.
>
> I disabled locally_poor_mem_replacement and added some dump information
> in should_replace_address to show the costs for the replacement I'm
> trying to avoid:
>
> In should_replace_address:
>    old_rtx = (reg/f:DI 125 [ D.2036 ])
>    new_rtx = (plus:DI (reg/v/f:DI 126 [ p ])
>          (reg:DI 128))
>    address_cost (old_rtx) = 0
>    address_cost (new_rtx) = 0
>    set_src_cost (old_rtx) = 0
>    set_src_cost (new_rtx) = 4
>
> In insn 11, replacing
>   (mem/s:SI (reg/f:DI 125 [ D.2036 ]) [2 p_1(D)->a S4 A32])
>   with (mem/s:SI (plus:DI (reg/v/f:DI 126 [ p ])
>              (reg:DI 128)) [2 p_1(D)->a S4 A32])
> Changed insn 11
> deferring rescan insn with uid = 11.
> deferring rescan insn with uid = 11.

And IIUC the other address is based on pseudo 125 as well, but the 
combination is (plus (plus (reg 126) (reg 128)) (const_int X)) and 
cannot be represented on ppc.  I think _this_ is the problem, so I'm 
afraid your patch could cause pessimizations on x86 for example.  On 
x86, which has a cheap REG+REG+CONST addressing mode, it is much better 
to propagate pseudo 125 so that you can delete the set altogether.

However, indeed there is no downstream pass that undoes the 
transformation.  Perhaps we can do it in CSE, since this _is_ CSE after 
all. :)  The attached untested (uncompiled) patch is an attempt.

Paolo

[-- Attachment #2: cse-addrs.patch --]
[-- Type: text/x-patch, Size: 3215 bytes --]

Index: cse.c
===================================================================
--- cse.c	(revision 177688)
+++ cse.c	(working copy)
@@ -3136,6 +3136,75 @@ find_comparison_args (enum rtx_code code
   return code;
 }
 \f
+static rtx
+lookup_addr (rtx insn, rtx *loc, enum machine_mode mode)
+{
+  struct table_elt *elt, *p;
+  int regno;
+  int hash;
+  int base_cost;
+  rtx addr = *loc;
+  rtx exp;
+
+  /* Try to reuse existing registers for addresses, in hope of shortening
+     live ranges for the registers that compose the addresses.  This happens
+     when you have
+
+         (set (reg C) (plus (reg A) (reg B))
+         (set (reg D) (mem (reg C)))
+         (set (reg E) (mem (plus (reg C) (const_int X))))
+
+     In this case fwprop will try to propagate into the addresses, but
+     if propagation into reg E fails, the only result will have been to
+     uselessly lengthen the live range of A and B.  */
+
+  if (!REG_P (addr))
+    return;
+
+  regno = REGNO (addr);
+  if (regno == FRAME_POINTER_REGNUM
+      || regno == HARD_FRAME_POINTER_REGNUM
+      || regno == ARG_POINTER_REGNUM)
+    return;
+
+   /* If this address is not in the hash table, we can't look for equivalences
+      of the whole address.  Also, ignore if volatile.  */
+
+  {
+    int save_do_not_record = do_not_record;
+    int save_hash_arg_in_memory = hash_arg_in_memory;
+    int addr_volatile;
+
+    do_not_record = 0;
+    hash = HASH (addr, Pmode);
+    addr_volatile = do_not_record;
+    do_not_record = save_do_not_record;
+    hash_arg_in_memory = save_hash_arg_in_memory;
+
+    if (addr_volatile)
+      return;
+  }
+
+  /* Try to find a REG that holds the same address.  */
+
+  elt = lookup (addr, hash, Pmode);
+  if (!elt)
+    return;
+
+  base_cost = address_cost (*loc, mode);
+  for (p = elt->first_same_value; p; p = p->next_same_value)
+    {
+      exp = p->exp;
+      if (REG_P (exp)
+          && exp_equiv_p (exp, exp, 1, false)
+          && address_cost (exp, mode) > base_cost)
+        break;
+    }
+
+  if (p)
+    validate_change (insn, loc, canon_reg (copy_rtx (exp), NULL_RTX), 0));
+}
+
 /* If X is a nontrivial arithmetic operation on an argument for which
    a constant value can be determined, return the result of operating
    on that value, as a constant.  Otherwise, return X, possibly with
@@ -3180,6 +3249,12 @@ fold_rtx (rtx x, rtx insn)
   switch (code)
     {
     case MEM:
+      if ((new_rtx = equiv_constant (x)) != NULL_RTX)
+        return new_rtx;
+      if (insn)
+        lookup_addr (insn, &XEXP (x, 0), GET_MODE (x));
+      return x;
+
     case SUBREG:
       if ((new_rtx = equiv_constant (x)) != NULL_RTX)
         return new_rtx;
Index: passes.c
===================================================================
--- passes.c	(revision 177688)
+++ passes.c	(working copy)
@@ -1448,9 +1448,9 @@ init_optimization_passes (void)
 	}
       NEXT_PASS (pass_web);
       NEXT_PASS (pass_rtl_cprop);
+      NEXT_PASS (pass_rtl_fwprop_addr);
       NEXT_PASS (pass_cse2);
       NEXT_PASS (pass_rtl_dse1);
-      NEXT_PASS (pass_rtl_fwprop_addr);
       NEXT_PASS (pass_inc_dec);
       NEXT_PASS (pass_initialize_regs);
       NEXT_PASS (pass_ud_rtl_dce);

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] Fix PR46556 (poor address generation)
  2011-10-05 16:16 [PATCH] Fix PR46556 (poor address generation) William J. Schmidt
  2011-10-05 16:24 ` Paolo Bonzini
  2011-10-05 16:41 ` Steven Bosscher
@ 2011-10-06 10:21 ` Richard Guenther
  2011-10-06 13:49   ` William J. Schmidt
  2011-10-06 17:47   ` Jeff Law
  2 siblings, 2 replies; 40+ messages in thread
From: Richard Guenther @ 2011-10-06 10:21 UTC (permalink / raw)
  To: William J. Schmidt; +Cc: gcc-patches, bergner, rguenth

On Wed, 5 Oct 2011, William J. Schmidt wrote:

> This patch addresses the poor code generation in PR46556 for the
> following code:
> 
> struct x
> {
>   int a[16];
>   int b[16];
>   int c[16];
> };
> 
> extern void foo (int, int, int);
> 
> void
> f (struct x *p, unsigned int n)
> {
>   foo (p->a[n], p->c[n], p->b[n]);
> }
> 
> Prior to the fix for PR32698, gcc calculated the offset for accessing
> the array elements as:  n*4; 64+n*4; 128+n*4.
> 
> Following that fix, the offsets are calculated as:  n*4; (n+16)*4; (n
> +32)*4.  This led to poor code generation on powerpc64 targets, among
> others.
> 
> The poor code generation was observed to not occur in loops, as the
> IVOPTS code does a good job of lowering these expressions to MEM_REFs.
> It was previously suggested that perhaps a general pass to lower memory
> accesses to MEM_REFs in GIMPLE would solve not only this, but other
> similar problems.  I spent some time looking into various approaches to
> this, and reviewing some previous attempts to do similar things.  In the
> end, I've concluded that this is a bad idea in practice because of the
> loss of useful aliasing information.  In particular, early lowering of
> component references causes us to lose the ability to disambiguate
> non-overlapping references in the same structure, and there is no simple
> way to carry the necessary aliasing information along with the
> replacement MEM_REFs to avoid this.  While some performance gains are
> available with GIMPLE lowering of memory accesses, there are also
> offsetting performance losses, and I suspect this would just be a
> continuous source of bug reports into the future.
> 
> Therefore the current patch is a much simpler approach to solve the
> specific problem noted in the PR.  There are two pieces to the patch:
> 
>  * The offending addressing pattern is matched in GIMPLE and transformed
> into a restructured MEM_REF that distributes the multiply, so that (n
> +32)*4 becomes 4*n+128 as before.  This is done during the reassociation
> pass, for reasons described below.  The transformation only occurs in
> non-loop blocks, since IVOPTS does a good job on such things within
> loops.
>  * A tweak is added to the RTL forward-propagator to avoid propagating
> into memory references based on a single base register with no offset,
> under certain circumstances.  This improves sharing of base registers
> for accesses within the same structure and slightly lowers register
> pressure.
> 
> It would be possible to separate these into two patches if that's
> preferred.  I chose to combine them because together they provide the
> ideal code generation that the new test cases test for.
> 
> I initially implemented the pattern matcher during expand, but I found
> that the expanded code for two accesses to the same structure was often
> not being CSEd well.  So I moved it back into the GIMPLE phases prior to
> DOM to take advantage of its CSE.  To avoid an additional complete pass
> over the IL, I chose to piggyback on the reassociation pass.  This
> transformation is not technically a reassociation, but it is related
> enough to not be a complete wart.
> 
> One noob question about this:  It would probably be preferable to have
> this transformation only take place during the second reassociation
> pass, so the ARRAY_REFs are seen by earlier optimization phases.  Is
> there an easy way to detect that it's the second pass without having to
> generate a separate pass entry point?
> 
> One other general question about the pattern-match transformation:  Is
> this an appropriate transformation for all targets, or should it be
> somehow gated on available addressing modes on the target processor?
> 
> Bootstrapped and regression tested on powerpc64-linux-gnu.  Verified no
> performance degradations on that target for SPEC CPU2000 and CPU2006.
> 
> I'm looking for eventual approval for trunk after any comments are
> resolved.  Thanks!

People have already commented on pieces, so I'm looking only
at the tree-ssa-reassoc.c pieces (did you consider piggy-backing
on IVOPTs instead?  The idea is to expose additional CSE
opportunities, right?  So it's sort-of a strength-reduction
optimization on scalar code (classically strength reduction
in loops transforms for (i) { z = i*x; } to z = 0; for (i) { z += x }).
That might be worth in general, even for non-address cases.
So - if you rename that thing to tree-ssa-strength-reduce.c you
can get away without piggy-backing on anything ;)  If you
structure it to detect a strength reduction opportunity
(thus, you'd need to match two/multiple of the patterns at the same time)
that would be a bonus ... generalizing it a little bit would be
another.

Now some comments on the patch ...

> Bill
> 
> 
> 2011-10-05  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
> 
> gcc:
> 	
> 	PR rtl-optimization/46556
> 	* fwprop.c (fwprop_bb_aux_d): New struct.
> 	(MEM_PLUS_REGS): New macro.
> 	(record_mem_plus_reg): New function.
> 	(record_mem_plus_regs): Likewise.
> 	(single_def_use_enter_block): Record mem-plus-reg patterns.
> 	(build_single_def_use_links): Allocate aux storage.
> 	(locally_poor_mem_replacement): New function.
> 	(forward_propagate_and_simplify): Call
> 	locally_poor_mem_replacement.
> 	(fwprop_init): Free storage.
> 	* tree.h (copy_ref_info): Expose existing function.
> 	* tree-ssa-loop-ivopts.c (copy_ref_info): Remove static token.
> 	* tree-ssa-reassoc.c (restructure_base_and_offset): New function.
> 	(restructure_mem_ref): Likewise.
> 	(reassociate_bb): Look for opportunities to call
> 	restructure_mem_ref; clean up immediate use lists.
> 
> gcc/testsuite:
> 	
> 	PR rtl-optimization/46556
> 	* gcc.target/powerpc/ppc-pr46556-1.c: New testcase.
> 	* gcc.target/powerpc/ppc-pr46556-2.c: Likewise.
> 	* gcc.target/powerpc/ppc-pr46556-3.c: Likewise.
> 	* gcc.target/powerpc/ppc-pr46556-4.c: Likewise.
> 	* gcc.dg/tree-ssa/pr46556-1.c: Likewise.
> 	* gcc.dg/tree-ssa/pr46556-2.c: Likewise.
> 	* gcc.dg/tree-ssa/pr46556-3.c: Likewise.

> Index: gcc/tree-ssa-reassoc.c
> ===================================================================
> --- gcc/tree-ssa-reassoc.c	(revision 179317)
> +++ gcc/tree-ssa-reassoc.c	(working copy)
> @@ -2361,6 +2361,116 @@ break_up_subtract_bb (basic_block bb)
>      break_up_subtract_bb (son);
>  }
>  
> +/* Given BASE and OFFSET derived from *EXPR occurring in stmt *GSI,
> +   determine whether there is a profitable opportunity to restructure
> +   address arithmetic within BASE and OFFSET.  If so, produce such
> +   a restructuring in *MEM_REF and return true; else return false.  */
> +
> +static tree
> +restructure_base_and_offset (tree *expr, gimple_stmt_iterator *gsi,
> +			     tree base, tree offset, HOST_WIDE_INT bitpos)
> +{
> +  tree mult_op0, mult_op1, t1, t2, offset_type, mult_expr, add_expr, mem_ref;
> +  HOST_WIDE_INT c, c1, c2, c3, c4;
> +
> +  /* Look for the following pattern:
> +
> +       base   = MEM_REF (T1, C1);
> +       offset = MULT_EXPR (PLUS_EXPR (T2, C2), C3)
> +       bitpos = C4 * BITS_PER_UNIT
> +
> +     If found, convert it to:
> +
> +       MEM_REF (POINTER_PLUS_EXPR (T1, MULT_EXPR (T2, C3)),
> +                C1 + (C2 * C3) + C4)
> +   */
> +  if (!base
> +      || !offset
> +      || TREE_CODE (base) != MEM_REF
> +      || TREE_CODE (TREE_OPERAND (base, 1)) != INTEGER_CST
> +      || TREE_CODE (offset) != MULT_EXPR)
> +    return NULL_TREE;
> +
> +  mult_op0 = TREE_OPERAND (offset, 0);
> +  mult_op1 = TREE_OPERAND (offset, 1);
> +
> +  if (TREE_CODE (mult_op0) != PLUS_EXPR
> +      || TREE_CODE (mult_op1) != INTEGER_CST
> +      || TREE_CODE (TREE_OPERAND (mult_op0, 1)) != INTEGER_CST)
> +    return NULL_TREE;
> +
> +  t1 = TREE_OPERAND (base, 0);
> +  t2 = TREE_OPERAND (mult_op0, 0);
> +  c1 = TREE_INT_CST_LOW (TREE_OPERAND (base, 1));
> +  c2 = TREE_INT_CST_LOW (TREE_OPERAND (mult_op0, 1));
> +  c3 = TREE_INT_CST_LOW (mult_op1);

Before accessing TREE_INT_CST_LOW you need to make sure the
constants fit into a HWI using host_integerp () (which
conveniently includes the check for INTEGER_CST).

Note that you need to sign-extend the MEM_REF offset,
thus use mem_ref_offset (base).low instead of
TREE_INT_CST_LOW (TREE_OPERAND (base, 1)).  Might be worth
to add a testcase with negative offset ;)

> +  c4 = bitpos / BITS_PER_UNIT;
> +  c = c1 + c2 * c3 + c4;

And you don't know whether this operation overflows.  Thus it's
probably easiest to use double_ints instead of HOST_WIDE_INTs
in all of the code.

> +
> +  offset_type = TREE_TYPE (TREE_OPERAND (base, 1));
> +  
> +  mult_expr = fold_build2 (MULT_EXPR, sizetype, t2,
> +			   build_int_cst (sizetype, c3));
> +  mult_expr = force_gimple_operand_gsi (gsi, mult_expr, true, NULL,
> +					true, GSI_SAME_STMT);
> +  add_expr = fold_build2 (POINTER_PLUS_EXPR, TREE_TYPE (t1), t1, mult_expr);
> +  add_expr = force_gimple_operand_gsi (gsi, add_expr, true, NULL,
> +				       true, GSI_SAME_STMT);
> +  mem_ref = fold_build2 (MEM_REF, TREE_TYPE (*expr), add_expr,
> +			 build_int_cst (offset_type, c));
> +  return mem_ref;
> +}
> +
> +/* If *EXPR contains a memory reference, determine whether there is an
> +   opportunity to restructure the base and offset expressions for
> +   improved performance.  */
> +
> +static bool
> +restructure_mem_ref (tree *expr, gimple_stmt_iterator *gsi)
> +{
> +  enum tree_code code = TREE_CODE (*expr);
> +  tree base, offset, mem_ref;
> +  HOST_WIDE_INT bitsize, bitpos;
> +  enum machine_mode mode;
> +  int unsignedp, volatilep;
> +
> +  /* Only expressions that reference memory are of interest.  Bitfield
> +     references should eventually be lowered prior to this point and
> +     are not dealt with.  */
> +  if (! handled_component_p (*expr)
> +      || code == BIT_FIELD_REF
> +      || (code == COMPONENT_REF && DECL_BIT_FIELD (TREE_OPERAND (*expr, 1))))
> +    return false;

I'd say you want to handle references with a non-BLKmode only here,
thus || TYPE_MODE (TREE_TYPE (*expr)) == BLKmode

> +  /* Determine whether the expression can be represented with base and
> +     offset components.  */
> +  base = get_inner_reference (*expr, &bitsize, &bitpos, &offset, &mode,
> +			      &unsignedp, &volatilep, false);
> +  if (!base || !offset)
> +    return false;
> +
> +  /* Look for a restructuring opportunity.  */
> +  if ((mem_ref = restructure_base_and_offset (expr, gsi, base,
> +					      offset, bitpos)) == NULL_TREE)
> +    return false;

What I'm missing is a check whether the old address computation stmts
will be dead after the transform.

> +  /* Found one.  Record the replacement in the dump file and complete
> +     the replacement.  */
> +  if (dump_file && (dump_flags & TDF_DETAILS))
> +    {
> +      fprintf (dump_file, "\nOriginal_expr = ");
> +      print_generic_expr (dump_file, *expr, TDF_VOPS|TDF_MEMSYMS);
> +      fprintf (dump_file, "\nmem_ref = ");
> +      print_generic_expr (dump_file, mem_ref, TDF_VOPS|TDF_MEMSYMS);
> +      fprintf (dump_file, "\n\n");
> +    }
> +
> +  copy_ref_info (mem_ref, *expr);
> +  *expr = mem_ref;
> +
> +  return true;
> +}
> +
>  /* Reassociate expressions in basic block BB and its post-dominator as
>     children.  */
>  
> @@ -2369,14 +2479,30 @@ reassociate_bb (basic_block bb)
>  {
>    gimple_stmt_iterator gsi;
>    basic_block son;
> +  bool chgd_mem_ref = false;
> +  bool in_loop = loop_depth (bb->loop_father) != 0;
>  
>    for (gsi = gsi_last_bb (bb); !gsi_end_p (gsi); gsi_prev (&gsi))
>      {
>        gimple stmt = gsi_stmt (gsi);
>  
> -      if (is_gimple_assign (stmt)
> -	  && !stmt_could_throw_p (stmt))
> +      /* Look for restructuring opportunities within an expression
> +	 that references memory.  We only do this for blocks not
> +         contained in loops, since the ivopts machinery does a 
> +         good job on loop expressions, and we don't want to interfere
> +	 with other loop optimizations.  */
> +      if (!in_loop && gimple_vuse (stmt) && gimple_assign_single_p (stmt))
>  	{
> +	  tree *lhs, *rhs;
> +	  lhs = gimple_assign_lhs_ptr (stmt);
> +	  chgd_mem_ref = restructure_mem_ref (lhs, &gsi) || chgd_mem_ref;
> +	  rhs = gimple_assign_rhs1_ptr (stmt);
> +	  chgd_mem_ref = restructure_mem_ref (rhs, &gsi) || chgd_mem_ref;

It will either be a store or a load, but never both (unless it's an
aggregate copy which I think we should not handle).  So ...

  if (gimple_vdef (stmt))
    ... lhs
  else if (gimple_vuse (stmt))
    ... rhs

> +	}
> +
> +      else if (is_gimple_assign (stmt)
> +	       && !stmt_could_throw_p (stmt))
> +	{
>  	  tree lhs, rhs1, rhs2;
>  	  enum tree_code rhs_code = gimple_assign_rhs_code (stmt);
>  
> @@ -2489,6 +2615,12 @@ reassociate_bb (basic_block bb)
>  	    }
>  	}
>      }
> +  /* If memory references have been restructured, immediate uses need
> +     to be cleaned up.  */
> +  if (chgd_mem_ref)
> +    for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
> +      update_stmt (gsi_stmt (gsi));

ICK.  Definitely a no ;)

Why does a update_stmt () after the restructure_mem_ref call not work?

Richard.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] Fix PR46556 (poor address generation)
  2011-10-06  8:57         ` Paolo Bonzini
@ 2011-10-06 13:23           ` William J. Schmidt
  0 siblings, 0 replies; 40+ messages in thread
From: William J. Schmidt @ 2011-10-06 13:23 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: gcc-patches, bergner, rguenth

On Thu, 2011-10-06 at 09:47 +0200, Paolo Bonzini wrote:
> And IIUC the other address is based on pseudo 125 as well, but the 
> combination is (plus (plus (reg 126) (reg 128)) (const_int X)) and 
> cannot be represented on ppc.  I think _this_ is the problem, so I'm 
> afraid your patch could cause pessimizations on x86 for example.  On 
> x86, which has a cheap REG+REG+CONST addressing mode, it is much better 
> to propagate pseudo 125 so that you can delete the set altogether.
> 
> However, indeed there is no downstream pass that undoes the 
> transformation.  Perhaps we can do it in CSE, since this _is_ CSE after 
> all. :)  The attached untested (uncompiled) patch is an attempt.
> 
> Paolo

Thanks, Paolo!  This makes good sense.  I will play with your (second :)
patch and let you know how it goes.

Bill

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] Fix PR46556 (poor address generation)
  2011-10-06 10:21 ` Richard Guenther
@ 2011-10-06 13:49   ` William J. Schmidt
  2011-10-06 14:24     ` Richard Guenther
  2011-10-06 17:47   ` Jeff Law
  1 sibling, 1 reply; 40+ messages in thread
From: William J. Schmidt @ 2011-10-06 13:49 UTC (permalink / raw)
  To: Richard Guenther; +Cc: gcc-patches, bergner, rguenth

On Thu, 2011-10-06 at 12:13 +0200, Richard Guenther wrote:
> People have already commented on pieces, so I'm looking only
> at the tree-ssa-reassoc.c pieces (did you consider piggy-backing
> on IVOPTs instead?  The idea is to expose additional CSE
> opportunities, right?  So it's sort-of a strength-reduction
> optimization on scalar code (classically strength reduction
> in loops transforms for (i) { z = i*x; } to z = 0; for (i) { z += x }).
> That might be worth in general, even for non-address cases.
> So - if you rename that thing to tree-ssa-strength-reduce.c you
> can get away without piggy-backing on anything ;)  If you
> structure it to detect a strength reduction opportunity
> (thus, you'd need to match two/multiple of the patterns at the same time)
> that would be a bonus ... generalizing it a little bit would be
> another.

These are all good ideas.  I will think about casting this as a more
general strength reduction over extended basic blocks outside of loops.
First I'll put together some simple tests to see whether we're currently
missing some non-address opportunities.

<snip>

> > +  mult_op0 = TREE_OPERAND (offset, 0);
> > +  mult_op1 = TREE_OPERAND (offset, 1);
> > +
> > +  if (TREE_CODE (mult_op0) != PLUS_EXPR
> > +      || TREE_CODE (mult_op1) != INTEGER_CST
> > +      || TREE_CODE (TREE_OPERAND (mult_op0, 1)) != INTEGER_CST)
> > +    return NULL_TREE;
> > +
> > +  t1 = TREE_OPERAND (base, 0);
> > +  t2 = TREE_OPERAND (mult_op0, 0);
> > +  c1 = TREE_INT_CST_LOW (TREE_OPERAND (base, 1));
> > +  c2 = TREE_INT_CST_LOW (TREE_OPERAND (mult_op0, 1));
> > +  c3 = TREE_INT_CST_LOW (mult_op1);
> 
> Before accessing TREE_INT_CST_LOW you need to make sure the
> constants fit into a HWI using host_integerp () (which
> conveniently includes the check for INTEGER_CST).
> 
> Note that you need to sign-extend the MEM_REF offset,
> thus use mem_ref_offset (base).low instead of
> TREE_INT_CST_LOW (TREE_OPERAND (base, 1)).  Might be worth
> to add a testcase with negative offset ;)

D'oh! >.<

> 
> > +  c4 = bitpos / BITS_PER_UNIT;
> > +  c = c1 + c2 * c3 + c4;
> 
> And you don't know whether this operation overflows.  Thus it's
> probably easiest to use double_ints instead of HOST_WIDE_INTs
> in all of the code.

OK, thanks, will do.

<snip>

> 
> > +  /* Determine whether the expression can be represented with base and
> > +     offset components.  */
> > +  base = get_inner_reference (*expr, &bitsize, &bitpos, &offset, &mode,
> > +			      &unsignedp, &volatilep, false);
> > +  if (!base || !offset)
> > +    return false;
> > +
> > +  /* Look for a restructuring opportunity.  */
> > +  if ((mem_ref = restructure_base_and_offset (expr, gsi, base,
> > +					      offset, bitpos)) == NULL_TREE)
> > +    return false;
> 
> What I'm missing is a check whether the old address computation stmts
> will be dead after the transform.

Hm, not quite sure what to do here.  Prior to the transformation I'll
have an assignment with something like:

ARRAY_REF (COMPONENT_REF (MEM_REF (Ta, Cb), FIELD_DECL c), Td)

on LHS or RHS.  Ta and Td will be part of the replacement.  What should
I be checking for?

<snip>

> >  
> > -      if (is_gimple_assign (stmt)
> > -	  && !stmt_could_throw_p (stmt))
> > +      /* Look for restructuring opportunities within an expression
> > +	 that references memory.  We only do this for blocks not
> > +         contained in loops, since the ivopts machinery does a 
> > +         good job on loop expressions, and we don't want to interfere
> > +	 with other loop optimizations.  */
> > +      if (!in_loop && gimple_vuse (stmt) && gimple_assign_single_p (stmt))
> >  	{
> > +	  tree *lhs, *rhs;
> > +	  lhs = gimple_assign_lhs_ptr (stmt);
> > +	  chgd_mem_ref = restructure_mem_ref (lhs, &gsi) || chgd_mem_ref;
> > +	  rhs = gimple_assign_rhs1_ptr (stmt);
> > +	  chgd_mem_ref = restructure_mem_ref (rhs, &gsi) || chgd_mem_ref;
> 
> It will either be a store or a load, but never both (unless it's an
> aggregate copy which I think we should not handle).  So ...
> 
>   if (gimple_vdef (stmt))
>     ... lhs
>   else if (gimple_vuse (stmt))
>     ... rhs

OK, with your suggested gating on non-BLKmode I agree.

> > +	}
> > +
> > +      else if (is_gimple_assign (stmt)
> > +	       && !stmt_could_throw_p (stmt))
> > +	{
> >  	  tree lhs, rhs1, rhs2;
> >  	  enum tree_code rhs_code = gimple_assign_rhs_code (stmt);
> >  
> > @@ -2489,6 +2615,12 @@ reassociate_bb (basic_block bb)
> >  	    }
> >  	}
> >      }
> > +  /* If memory references have been restructured, immediate uses need
> > +     to be cleaned up.  */
> > +  if (chgd_mem_ref)
> > +    for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
> > +      update_stmt (gsi_stmt (gsi));
> 
> ICK.  Definitely a no ;)
> 
> Why does a update_stmt () after the restructure_mem_ref call not work?

Ah, yeah, I meant to check again on that before submitting.  >.< 

IIRC, at some point the update_stmt () following restructure_mem_ref was
still giving me verify errors.  I thought perhaps the statements created
by force_gimple_operand_gsi might be giving me trouble -- threw this in
to work around it and missed it on my pre-submission review.

I'll re-check this.  If for some reason it is the inserted statements, I
can more carefully update just those -- but I just need to go back and
sort it out.  Sorry for letting that slip through.

Thanks for the review!

Bill

> 
> Richard.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] Fix PR46556 (poor address generation)
  2011-10-06 13:49   ` William J. Schmidt
@ 2011-10-06 14:24     ` Richard Guenther
  2011-10-06 17:27       ` William J. Schmidt
  0 siblings, 1 reply; 40+ messages in thread
From: Richard Guenther @ 2011-10-06 14:24 UTC (permalink / raw)
  To: William J. Schmidt; +Cc: gcc-patches, bergner, rguenth

On Thu, 6 Oct 2011, William J. Schmidt wrote:

> On Thu, 2011-10-06 at 12:13 +0200, Richard Guenther wrote:
> > People have already commented on pieces, so I'm looking only
> > at the tree-ssa-reassoc.c pieces (did you consider piggy-backing
> > on IVOPTs instead?  The idea is to expose additional CSE
> > opportunities, right?  So it's sort-of a strength-reduction
> > optimization on scalar code (classically strength reduction
> > in loops transforms for (i) { z = i*x; } to z = 0; for (i) { z += x }).
> > That might be worth in general, even for non-address cases.
> > So - if you rename that thing to tree-ssa-strength-reduce.c you
> > can get away without piggy-backing on anything ;)  If you
> > structure it to detect a strength reduction opportunity
> > (thus, you'd need to match two/multiple of the patterns at the same time)
> > that would be a bonus ... generalizing it a little bit would be
> > another.
> 
> These are all good ideas.  I will think about casting this as a more
> general strength reduction over extended basic blocks outside of loops.
> First I'll put together some simple tests to see whether we're currently
> missing some non-address opportunities.
> 
> <snip>
> 
> > > +  mult_op0 = TREE_OPERAND (offset, 0);
> > > +  mult_op1 = TREE_OPERAND (offset, 1);
> > > +
> > > +  if (TREE_CODE (mult_op0) != PLUS_EXPR
> > > +      || TREE_CODE (mult_op1) != INTEGER_CST
> > > +      || TREE_CODE (TREE_OPERAND (mult_op0, 1)) != INTEGER_CST)
> > > +    return NULL_TREE;
> > > +
> > > +  t1 = TREE_OPERAND (base, 0);
> > > +  t2 = TREE_OPERAND (mult_op0, 0);
> > > +  c1 = TREE_INT_CST_LOW (TREE_OPERAND (base, 1));
> > > +  c2 = TREE_INT_CST_LOW (TREE_OPERAND (mult_op0, 1));
> > > +  c3 = TREE_INT_CST_LOW (mult_op1);
> > 
> > Before accessing TREE_INT_CST_LOW you need to make sure the
> > constants fit into a HWI using host_integerp () (which
> > conveniently includes the check for INTEGER_CST).
> > 
> > Note that you need to sign-extend the MEM_REF offset,
> > thus use mem_ref_offset (base).low instead of
> > TREE_INT_CST_LOW (TREE_OPERAND (base, 1)).  Might be worth
> > to add a testcase with negative offset ;)
> 
> D'oh! >.<
> 
> > 
> > > +  c4 = bitpos / BITS_PER_UNIT;
> > > +  c = c1 + c2 * c3 + c4;
> > 
> > And you don't know whether this operation overflows.  Thus it's
> > probably easiest to use double_ints instead of HOST_WIDE_INTs
> > in all of the code.
> 
> OK, thanks, will do.
> 
> <snip>
> 
> > 
> > > +  /* Determine whether the expression can be represented with base and
> > > +     offset components.  */
> > > +  base = get_inner_reference (*expr, &bitsize, &bitpos, &offset, &mode,
> > > +			      &unsignedp, &volatilep, false);
> > > +  if (!base || !offset)
> > > +    return false;
> > > +
> > > +  /* Look for a restructuring opportunity.  */
> > > +  if ((mem_ref = restructure_base_and_offset (expr, gsi, base,
> > > +					      offset, bitpos)) == NULL_TREE)
> > > +    return false;
> > 
> > What I'm missing is a check whether the old address computation stmts
> > will be dead after the transform.
> 
> Hm, not quite sure what to do here.  Prior to the transformation I'll
> have an assignment with something like:
> 
> ARRAY_REF (COMPONENT_REF (MEM_REF (Ta, Cb), FIELD_DECL c), Td)
> 
> on LHS or RHS.  Ta and Td will be part of the replacement.  What should
> I be checking for?

Doh, I thought you were matching gimple stmts that do the address
computation.  But now I see you are matching the tree returned from
get_inner_reference.  So no need to check anything for that case.

But that keeps me wondering what you'll do if the accesses were
all pointer arithmetic, not arrays.  Thus,

extern void foo (int, int, int);

void
f (int *p, unsigned int n)
{
 foo (p[n], p[n+64], p[n+128]);
}

wouldn't that have the same issue and you wouldn't handle it?

Richard.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] Fix PR46556 (poor address generation)
  2011-10-06 14:24     ` Richard Guenther
@ 2011-10-06 17:27       ` William J. Schmidt
  2011-10-07  8:28         ` Richard Guenther
  0 siblings, 1 reply; 40+ messages in thread
From: William J. Schmidt @ 2011-10-06 17:27 UTC (permalink / raw)
  To: Richard Guenther; +Cc: gcc-patches, bergner, rguenth

On Thu, 2011-10-06 at 16:16 +0200, Richard Guenther wrote:

<snip>

> 
> Doh, I thought you were matching gimple stmts that do the address
> computation.  But now I see you are matching the tree returned from
> get_inner_reference.  So no need to check anything for that case.
> 
> But that keeps me wondering what you'll do if the accesses were
> all pointer arithmetic, not arrays.  Thus,
> 
> extern void foo (int, int, int);
> 
> void
> f (int *p, unsigned int n)
> {
>  foo (p[n], p[n+64], p[n+128]);
> }
> 
> wouldn't that have the same issue and you wouldn't handle it?
> 
> Richard.
> 

Good point.  This indeed gets missed here, and that's more fuel for
doing a generalized strength reduction along with the special cases like
p->a[n] that are only exposed with get_inner_reference.

(The pointer arithmetic cases were picked up in my earlier "big-hammer"
approach using the aff-comb machinery, but that had too many problems in
the end, as you know.)

So for the long term I will look into a full strength reducer for
non-loop code.  For the short term, what do you think about keeping this
single transformation in reassoc to make sure it gets into 4.7?  I would
plan to strip it back out and fold it into the strength reducer
thereafter, which might or might not make 4.7 depending on my other
responsibilities and how the 4.7 schedule goes.  I haven't seen anything
official, but I'm guessing we're getting towards the end of 4.7 stage 1?

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] Fix PR46556 (poor address generation)
  2011-10-06 10:21 ` Richard Guenther
  2011-10-06 13:49   ` William J. Schmidt
@ 2011-10-06 17:47   ` Jeff Law
  2011-10-06 18:04     ` William J. Schmidt
  2011-10-07  8:53     ` Richard Guenther
  1 sibling, 2 replies; 40+ messages in thread
From: Jeff Law @ 2011-10-06 17:47 UTC (permalink / raw)
  To: Richard Guenther; +Cc: William J. Schmidt, gcc-patches, bergner, rguenth

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 10/06/11 04:13, Richard Guenther wrote:

> 
> People have already commented on pieces, so I'm looking only at the
> tree-ssa-reassoc.c pieces (did you consider piggy-backing on IVOPTs
> instead?  The idea is to expose additional CSE opportunities,
> right?  So it's sort-of a strength-reduction optimization on scalar
> code (classically strength reduction in loops transforms for (i) {
> z = i*x; } to z = 0; for (i) { z += x }). That might be worth in
> general, even for non-address cases. So - if you rename that thing
> to tree-ssa-strength-reduce.c you can get away without
> piggy-backing on anything ;)  If you structure it to detect a
> strength reduction opportunity (thus, you'd need to match
> two/multiple of the patterns at the same time) that would be a
> bonus ... generalizing it a little bit would be another.
There's a variety of literature that uses PRE to detect and optimize
straightline code strength reduction.  I poked at it at one time (RTL
gcse framework) and it looked reasonably promising.  Never pushed it
all the way through.

jeff
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iQEcBAEBAgAGBQJOjebJAAoJEBRtltQi2kC71ogH/AkMNzXpYK1GXp2EhoS+3Dhn
T1mWDKdHT5+ozpuAxRFzuCSQ8HmkbLJk8fGpOyUuLr15zEnT1isE7cU3i4ZzY3o0
lduo9Ck23rMWNroYgxbV+zPvArW5MG9qrGO6XSBynfipmlpznEo8zQPiaoaASlHz
8G7gd9P2la1QHha9OVtiCMKs0zgckU55RqiwV7d8DMi5tgoq5wkN+qcKCoSI7+b0
jxAukIcp6O8QZ6ADcHyAdav+zZzGDBycEhgakam71WifjFlysah2TG05SsK75Dxi
h3S13yPpx/A8zBuex5osL0qOGn0H7L93uAsTxcv4dTEpUl4Jx7Y5FoPOEp5D1Z4=
=LcZy
-----END PGP SIGNATURE-----

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] Fix PR46556 (poor address generation)
  2011-10-06 17:47   ` Jeff Law
@ 2011-10-06 18:04     ` William J. Schmidt
  2011-10-06 18:15       ` Jeff Law
  2011-10-07  8:53     ` Richard Guenther
  1 sibling, 1 reply; 40+ messages in thread
From: William J. Schmidt @ 2011-10-06 18:04 UTC (permalink / raw)
  To: Jeff Law; +Cc: Richard Guenther, gcc-patches, bergner, rguenth

On Thu, 2011-10-06 at 11:35 -0600, Jeff Law wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> On 10/06/11 04:13, Richard Guenther wrote:
> 
> > 
> > People have already commented on pieces, so I'm looking only at the
> > tree-ssa-reassoc.c pieces (did you consider piggy-backing on IVOPTs
> > instead?  The idea is to expose additional CSE opportunities,
> > right?  So it's sort-of a strength-reduction optimization on scalar
> > code (classically strength reduction in loops transforms for (i) {
> > z = i*x; } to z = 0; for (i) { z += x }). That might be worth in
> > general, even for non-address cases. So - if you rename that thing
> > to tree-ssa-strength-reduce.c you can get away without
> > piggy-backing on anything ;)  If you structure it to detect a
> > strength reduction opportunity (thus, you'd need to match
> > two/multiple of the patterns at the same time) that would be a
> > bonus ... generalizing it a little bit would be another.
> There's a variety of literature that uses PRE to detect and optimize
> straightline code strength reduction.  I poked at it at one time (RTL
> gcse framework) and it looked reasonably promising.  Never pushed it
> all the way through.
> 
> jeff

I ran across http://gcc.gnu.org/bugzilla/show_bug.cgi?id=22586 and
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35308 that show this
question has come up before.  The former also suggested a PRE-based
approach.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] Fix PR46556 (poor address generation)
  2011-10-06 18:04     ` William J. Schmidt
@ 2011-10-06 18:15       ` Jeff Law
  0 siblings, 0 replies; 40+ messages in thread
From: Jeff Law @ 2011-10-06 18:15 UTC (permalink / raw)
  To: William J. Schmidt; +Cc: Richard Guenther, gcc-patches, bergner, rguenth

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 10/06/11 12:02, William J. Schmidt wrote:
> On Thu, 2011-10-06 at 11:35 -0600, Jeff Law wrote:
>> -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1
>> 
>> On 10/06/11 04:13, Richard Guenther wrote:
>> 
>>> 
>>> People have already commented on pieces, so I'm looking only at
>>> the tree-ssa-reassoc.c pieces (did you consider piggy-backing
>>> on IVOPTs instead?  The idea is to expose additional CSE
>>> opportunities, right?  So it's sort-of a strength-reduction
>>> optimization on scalar code (classically strength reduction in
>>> loops transforms for (i) { z = i*x; } to z = 0; for (i) { z +=
>>> x }). That might be worth in general, even for non-address
>>> cases. So - if you rename that thing to
>>> tree-ssa-strength-reduce.c you can get away without 
>>> piggy-backing on anything ;)  If you structure it to detect a 
>>> strength reduction opportunity (thus, you'd need to match 
>>> two/multiple of the patterns at the same time) that would be a 
>>> bonus ... generalizing it a little bit would be another.
>> There's a variety of literature that uses PRE to detect and
>> optimize straightline code strength reduction.  I poked at it at
>> one time (RTL gcse framework) and it looked reasonably promising.
>> Never pushed it all the way through.
>> 
>> jeff
> 
> I ran across http://gcc.gnu.org/bugzilla/show_bug.cgi?id=22586 and 
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35308 that show this 
> question has come up before.  The former also suggested a
> PRE-based approach.
Yea.  We've kicked it around several times over the last 15 or so
years.

When I briefly looked at it, I was doing so more in the context of
eliminating all the optimize_related_values crap, purely as a cleanup
and utlimately couldn't justify the time.

IIRC Morgan & Muchnick both had write-ups on the basic concepts.
There's probably other literature as well.

jeff
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iQEcBAEBAgAGBQJOje4wAAoJEBRtltQi2kC7q9UIAIXdiGG5Reu75PBkMPO9KKhn
RRGKQMMNSinGDyyORGqxfqwrirtFuQqzn+ITRsfjegHydUTwwDDaAtTyqEqFmpt0
2phGYBOS5pN4VImKzrd2fxlwbuW0jUlOpDuWFK+K10W8jU3SlJSIZhfaSMh1PwC5
IQm6FLDiuRuNSgyYattUnI5KZ2chN2QEkfUBQgDvbxHXfPDqjNQymIfv1K9iymrG
j3Wq7i47fBkYbnPYtAQ9GCYsmT6Wo2v+2/ZeFWE417FYYhgCdBeYu2iZPE6Nm8Pb
SypPDyi1AQ3QRfg+LPiN1bdQk40MhfPlMhHZtnVtq8nEa9+fLTgO/ERzCD0G+r8=
=XWLf
-----END PGP SIGNATURE-----

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] Fix PR46556 (poor address generation)
  2011-10-06 17:27       ` William J. Schmidt
@ 2011-10-07  8:28         ` Richard Guenther
  2011-10-07  9:45           ` Paolo Bonzini
  2011-10-08 15:44           ` William J. Schmidt
  0 siblings, 2 replies; 40+ messages in thread
From: Richard Guenther @ 2011-10-07  8:28 UTC (permalink / raw)
  To: William J. Schmidt; +Cc: gcc-patches, bergner, rguenth

On Thu, 6 Oct 2011, William J. Schmidt wrote:

> On Thu, 2011-10-06 at 16:16 +0200, Richard Guenther wrote:
> 
> <snip>
> 
> > 
> > Doh, I thought you were matching gimple stmts that do the address
> > computation.  But now I see you are matching the tree returned from
> > get_inner_reference.  So no need to check anything for that case.
> > 
> > But that keeps me wondering what you'll do if the accesses were
> > all pointer arithmetic, not arrays.  Thus,
> > 
> > extern void foo (int, int, int);
> > 
> > void
> > f (int *p, unsigned int n)
> > {
> >  foo (p[n], p[n+64], p[n+128]);
> > }
> > 
> > wouldn't that have the same issue and you wouldn't handle it?
> > 
> > Richard.
> > 
> 
> Good point.  This indeed gets missed here, and that's more fuel for
> doing a generalized strength reduction along with the special cases like
> p->a[n] that are only exposed with get_inner_reference.
> 
> (The pointer arithmetic cases were picked up in my earlier "big-hammer"
> approach using the aff-comb machinery, but that had too many problems in
> the end, as you know.)

Yeah, I know.  It's a very tricky area ;)

> So for the long term I will look into a full strength reducer for
> non-loop code.  For the short term, what do you think about keeping this
> single transformation in reassoc to make sure it gets into 4.7?  I would
> plan to strip it back out and fold it into the strength reducer
> thereafter, which might or might not make 4.7 depending on my other
> responsibilities and how the 4.7 schedule goes.  I haven't seen anything
> official, but I'm guessing we're getting towards the end of 4.7 stage 1?

It's a reasonable plan - you'd have to introduce a late reassoc
pass though.  Can you separate out the RTL fwprop changes?  So
we can iterate over the tree parts separately.

Thanks,
Richard.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] Fix PR46556 (poor address generation)
  2011-10-06 17:47   ` Jeff Law
  2011-10-06 18:04     ` William J. Schmidt
@ 2011-10-07  8:53     ` Richard Guenther
  1 sibling, 0 replies; 40+ messages in thread
From: Richard Guenther @ 2011-10-07  8:53 UTC (permalink / raw)
  To: Jeff Law; +Cc: William J. Schmidt, gcc-patches, bergner

On Thu, 6 Oct 2011, Jeff Law wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> On 10/06/11 04:13, Richard Guenther wrote:
> 
> > 
> > People have already commented on pieces, so I'm looking only at the
> > tree-ssa-reassoc.c pieces (did you consider piggy-backing on IVOPTs
> > instead?  The idea is to expose additional CSE opportunities,
> > right?  So it's sort-of a strength-reduction optimization on scalar
> > code (classically strength reduction in loops transforms for (i) {
> > z = i*x; } to z = 0; for (i) { z += x }). That might be worth in
> > general, even for non-address cases. So - if you rename that thing
> > to tree-ssa-strength-reduce.c you can get away without
> > piggy-backing on anything ;)  If you structure it to detect a
> > strength reduction opportunity (thus, you'd need to match
> > two/multiple of the patterns at the same time) that would be a
> > bonus ... generalizing it a little bit would be another.
> There's a variety of literature that uses PRE to detect and optimize
> straightline code strength reduction.  I poked at it at one time (RTL
> gcse framework) and it looked reasonably promising.  Never pushed it
> all the way through.

I poked at it in the tree PRE framework at some point but never
pushed it through either.  At least it didn't feel like it "fits" PRE
(and PRE should also be able to handle the loop case).

Richard.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] Fix PR46556 (poor address generation)
  2011-10-07  8:28         ` Richard Guenther
@ 2011-10-07  9:45           ` Paolo Bonzini
  2011-10-07 12:38             ` William J. Schmidt
  2011-10-08 15:44           ` William J. Schmidt
  1 sibling, 1 reply; 40+ messages in thread
From: Paolo Bonzini @ 2011-10-07  9:45 UTC (permalink / raw)
  To: Richard Guenther; +Cc: William J. Schmidt, gcc-patches, bergner, rguenth

On 10/07/2011 10:00 AM, Richard Guenther wrote:
> It's a reasonable plan - you'd have to introduce a late reassoc
> pass though.  Can you separate out the RTL fwprop changes?  So
> we can iterate over the tree parts separately.

That's indeed better, also because they became CSE changes.

Paolo

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] Fix PR46556 (poor address generation)
  2011-10-07  9:45           ` Paolo Bonzini
@ 2011-10-07 12:38             ` William J. Schmidt
  0 siblings, 0 replies; 40+ messages in thread
From: William J. Schmidt @ 2011-10-07 12:38 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Richard Guenther, gcc-patches, bergner, rguenth

On Fri, 2011-10-07 at 11:17 +0200, Paolo Bonzini wrote:
> On 10/07/2011 10:00 AM, Richard Guenther wrote:
> > It's a reasonable plan - you'd have to introduce a late reassoc
> > pass though.  Can you separate out the RTL fwprop changes?  So
> > we can iterate over the tree parts separately.
> 
> That's indeed better, also because they became CSE changes.
> 
> Paolo
> 

Yes, I'll plan to work these two separately.

Paolo, I should be able to shepherd your suggested patch through.
Normally company policy frowns on committing other people's patches, but
in a case like this where you're no longer actively doing GCC
development, I should be able to overcome that.

BTW, Richard, whatever my former issues with update_stmt were are gone,
so I'll be able to clean up that wart.

Thanks,
Bill

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] Fix PR46556 (poor address generation)
  2011-10-07  8:28         ` Richard Guenther
  2011-10-07  9:45           ` Paolo Bonzini
@ 2011-10-08 15:44           ` William J. Schmidt
  2011-10-11 12:15             ` Richard Guenther
  1 sibling, 1 reply; 40+ messages in thread
From: William J. Schmidt @ 2011-10-08 15:44 UTC (permalink / raw)
  To: Richard Guenther; +Cc: gcc-patches, bergner, rguenth

Greetings,

Here are the revised changes for the tree portions of the patch.  I've
attempted to resolve all comments to date on those portions.  Per
Steven's comment, I moved copy_ref_info into tree-ssa-address.c; let me
know if there's a better place, or whether you'd prefer to leave it
where it was.

I looked into changing the second reassoc pass to use a different
pass_late_reassoc entry, but this impacted the test suite.  There are
about 20 tests that rely on -fdump-tree-reassoc being associated with
two dump files named reassoc1 and reassoc2.  Rather than change all
these test cases for a temporary solution, I chose to use the deprecated
first_pass_instance boolean to distinguish between the two passes.  I
marked this as a Bad Thing and it will be removed once I have time to
work on the straight-line strength reducer.

I looked into adding a test case with a negative offset, but was unable
to come up with a construct that would have a negative offset on the
base MEM_REF and still be recognized by this particular pattern matcher.
In any case, the use of double_ints throughout should remove that
concern.

Thanks,
Bill


2011-10-08  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>

gcc:

	PR rtl-optimization/46556
	* tree.h (copy_ref_info): Expose existing function.
	* tree-ssa-loop-ivopts.c (copy_ref_info): Move this function to...
	* tree-ssa-address.c (copy_ref_info): ...here, and remove static token.
	* tree-ssa-reassoc.c (restructure_base_and_offset): New function.
	(restructure_mem_ref): Likewise.
	(reassociate_bb): Look for opportunities to call restructure_mem_ref.

gcc/testsuite:

	PR rtl-optimization/46556
	* gcc.dg/tree-ssa/pr46556-1.c: New testcase.
	* gcc.dg/tree-ssa/pr46556-2.c: Likewise.
	* gcc.dg/tree-ssa/pr46556-3.c: Likewise.


Index: gcc/tree.h
===================================================================
--- gcc/tree.h	(revision 179708)
+++ gcc/tree.h	(working copy)
@@ -5777,6 +5777,7 @@ tree target_for_debug_bind (tree);
 /* In tree-ssa-address.c.  */
 extern tree tree_mem_ref_addr (tree, tree);
 extern void copy_mem_ref_info (tree, tree);
+extern void copy_ref_info (tree, tree);
 
 /* In tree-vrp.c */
 extern bool ssa_name_nonnegative_p (const_tree);
Index: gcc/testsuite/gcc.dg/tree-ssa/pr46556-1.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/pr46556-1.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/pr46556-1.c	(revision 0)
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-dom2" } */
+
+struct x
+{
+  int a[16];
+  int b[16];
+  int c[16];
+};
+
+extern void foo (int, int, int);
+
+void
+f (struct x *p, unsigned int n)
+{
+  foo (p->a[n], p->c[n], p->b[n]);
+}
+
+/* { dg-final { scan-tree-dump-times "\\* 4;" 1 "dom2" } } */
+/* { dg-final { scan-tree-dump-times "p_1\\(D\\) \\+ D" 1 "dom2" } } */
+/* { dg-final { scan-tree-dump-times "MEM\\\[\\(struct x \\*\\)D" 2 "dom2" } } */
+/* { dg-final { cleanup-tree-dump "dom2" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/pr46556-2.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/pr46556-2.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/pr46556-2.c	(revision 0)
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-dom2" } */
+
+struct x
+{
+  int a[16];
+  int b[16];
+  int c[16];
+};
+
+extern void foo (int, int, int);
+
+void
+f (struct x *p, unsigned int n)
+{
+  foo (p->a[n], p->c[n], p->b[n]);
+  if (n > 12)
+    foo (p->a[n], p->c[n], p->b[n]);
+  else if (n > 3)
+    foo (p->b[n], p->a[n], p->c[n]);
+}
+
+/* { dg-final { scan-tree-dump-times "\\* 4;" 1 "dom2" } } */
+/* { dg-final { scan-tree-dump-times "p_1\\(D\\) \\+ D" 1 "dom2" } } */
+/* { dg-final { scan-tree-dump-times "MEM\\\[\\(struct x \\*\\)D" 6 "dom2" } } */
+/* { dg-final { cleanup-tree-dump "dom2" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/pr46556-3.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/pr46556-3.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/pr46556-3.c	(revision 0)
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-dom2" } */
+
+struct x
+{
+  int a[16];
+  int b[16];
+  int c[16];
+};
+
+extern void foo (int, int, int);
+
+void
+f (struct x *p, unsigned int n)
+{
+  foo (p->a[n], p->c[n], p->b[n]);
+  if (n > 3)
+    {
+      foo (p->a[n], p->c[n], p->b[n]);
+      if (n > 12)
+	foo (p->b[n], p->a[n], p->c[n]);
+    }
+}
+
+/* { dg-final { scan-tree-dump-times "\\* 4;" 1 "dom2" } } */
+/* { dg-final { scan-tree-dump-times "p_1\\(D\\) \\+ D" 1 "dom2" } } */
+/* { dg-final { scan-tree-dump-times "MEM\\\[\\(struct x \\*\\)D" 6 "dom2" } } */
+/* { dg-final { cleanup-tree-dump "dom2" } } */
Index: gcc/tree-ssa-loop-ivopts.c
===================================================================
--- gcc/tree-ssa-loop-ivopts.c	(revision 179708)
+++ gcc/tree-ssa-loop-ivopts.c	(working copy)
@@ -6278,64 +6278,6 @@ rewrite_use_nonlinear_expr (struct ivopts_data *da
     }
 }
 
-/* Copies the reference information from OLD_REF to NEW_REF.  */
-
-static void
-copy_ref_info (tree new_ref, tree old_ref)
-{
-  tree new_ptr_base = NULL_TREE;
-
-  TREE_SIDE_EFFECTS (new_ref) = TREE_SIDE_EFFECTS (old_ref);
-  TREE_THIS_VOLATILE (new_ref) = TREE_THIS_VOLATILE (old_ref);
-
-  new_ptr_base = TREE_OPERAND (new_ref, 0);
-
-  /* We can transfer points-to information from an old pointer
-     or decl base to the new one.  */
-  if (new_ptr_base
-      && TREE_CODE (new_ptr_base) == SSA_NAME
-      && !SSA_NAME_PTR_INFO (new_ptr_base))
-    {
-      tree base = get_base_address (old_ref);
-      if (!base)
-	;
-      else if ((TREE_CODE (base) == MEM_REF
-		|| TREE_CODE (base) == TARGET_MEM_REF)
-	       && TREE_CODE (TREE_OPERAND (base, 0)) == SSA_NAME
-	       && SSA_NAME_PTR_INFO (TREE_OPERAND (base, 0)))
-	{
-	  struct ptr_info_def *new_pi;
-	  duplicate_ssa_name_ptr_info
-	    (new_ptr_base, SSA_NAME_PTR_INFO (TREE_OPERAND (base, 0)));
-	  new_pi = SSA_NAME_PTR_INFO (new_ptr_base);
-	  /* We have to be careful about transfering alignment information.  */
-	  if (TREE_CODE (old_ref) == MEM_REF
-	      && !(TREE_CODE (new_ref) == TARGET_MEM_REF
-		   && (TMR_INDEX2 (new_ref)
-		       || (TMR_STEP (new_ref)
-			   && (TREE_INT_CST_LOW (TMR_STEP (new_ref))
-			       < new_pi->align)))))
-	    {
-	      new_pi->misalign += double_int_sub (mem_ref_offset (old_ref),
-						  mem_ref_offset (new_ref)).low;
-	      new_pi->misalign &= (new_pi->align - 1);
-	    }
-	  else
-	    {
-	      new_pi->align = 1;
-	      new_pi->misalign = 0;
-	    }
-	}
-      else if (TREE_CODE (base) == VAR_DECL
-	       || TREE_CODE (base) == PARM_DECL
-	       || TREE_CODE (base) == RESULT_DECL)
-	{
-	  struct ptr_info_def *pi = get_ptr_info (new_ptr_base);
-	  pt_solution_set_var (&pi->pt, base);
-	}
-    }
-}
-
 /* Performs a peephole optimization to reorder the iv update statement with
    a mem ref to enable instruction combining in later phases. The mem ref uses
    the iv value before the update, so the reordering transformation requires
Index: gcc/tree-ssa-address.c
===================================================================
--- gcc/tree-ssa-address.c	(revision 179708)
+++ gcc/tree-ssa-address.c	(working copy)
@@ -832,6 +832,64 @@ copy_mem_ref_info (tree to, tree from)
   TREE_THIS_VOLATILE (to) = TREE_THIS_VOLATILE (from);
 }
 
+/* Copies the reference information from OLD_REF to NEW_REF.  */
+
+void
+copy_ref_info (tree new_ref, tree old_ref)
+{
+  tree new_ptr_base = NULL_TREE;
+
+  TREE_SIDE_EFFECTS (new_ref) = TREE_SIDE_EFFECTS (old_ref);
+  TREE_THIS_VOLATILE (new_ref) = TREE_THIS_VOLATILE (old_ref);
+
+  new_ptr_base = TREE_OPERAND (new_ref, 0);
+
+  /* We can transfer points-to information from an old pointer
+     or decl base to the new one.  */
+  if (new_ptr_base
+      && TREE_CODE (new_ptr_base) == SSA_NAME
+      && !SSA_NAME_PTR_INFO (new_ptr_base))
+    {
+      tree base = get_base_address (old_ref);
+      if (!base)
+	;
+      else if ((TREE_CODE (base) == MEM_REF
+		|| TREE_CODE (base) == TARGET_MEM_REF)
+	       && TREE_CODE (TREE_OPERAND (base, 0)) == SSA_NAME
+	       && SSA_NAME_PTR_INFO (TREE_OPERAND (base, 0)))
+	{
+	  struct ptr_info_def *new_pi;
+	  duplicate_ssa_name_ptr_info
+	    (new_ptr_base, SSA_NAME_PTR_INFO (TREE_OPERAND (base, 0)));
+	  new_pi = SSA_NAME_PTR_INFO (new_ptr_base);
+	  /* We have to be careful about transfering alignment information.  */
+	  if (TREE_CODE (old_ref) == MEM_REF
+	      && !(TREE_CODE (new_ref) == TARGET_MEM_REF
+		   && (TMR_INDEX2 (new_ref)
+		       || (TMR_STEP (new_ref)
+			   && (TREE_INT_CST_LOW (TMR_STEP (new_ref))
+			       < new_pi->align)))))
+	    {
+	      new_pi->misalign += double_int_sub (mem_ref_offset (old_ref),
+						  mem_ref_offset (new_ref)).low;
+	      new_pi->misalign &= (new_pi->align - 1);
+	    }
+	  else
+	    {
+	      new_pi->align = 1;
+	      new_pi->misalign = 0;
+	    }
+	}
+      else if (TREE_CODE (base) == VAR_DECL
+	       || TREE_CODE (base) == PARM_DECL
+	       || TREE_CODE (base) == RESULT_DECL)
+	{
+	  struct ptr_info_def *pi = get_ptr_info (new_ptr_base);
+	  pt_solution_set_var (&pi->pt, base);
+	}
+    }
+}
+
 /* Move constants in target_mem_ref REF to offset.  Returns the new target
    mem ref if anything changes, NULL_TREE otherwise.  */
 
Index: gcc/tree-ssa-reassoc.c
===================================================================
--- gcc/tree-ssa-reassoc.c	(revision 179708)
+++ gcc/tree-ssa-reassoc.c	(working copy)
@@ -2815,6 +2815,121 @@ break_up_subtract_bb (basic_block bb)
     break_up_subtract_bb (son);
 }
 
+/* Given BASE and OFFSET derived from *EXPR occurring in stmt *GSI,
+   determine whether there is a profitable opportunity to restructure
+   address arithmetic within BASE and OFFSET.  If so, produce such
+   a restructuring and return it.  */
+
+static tree
+restructure_base_and_offset (tree *expr, gimple_stmt_iterator *gsi,
+			     tree base, tree offset, HOST_WIDE_INT bitpos)
+{
+  tree mult_op0, mult_op1, t1, t2, offset_type, mult_expr, add_expr, mem_ref;
+  double_int c, c1, c2, c3, c4;
+
+  /* Look for the following pattern:
+
+       base   = MEM_REF (T1, C1);
+       offset = MULT_EXPR (PLUS_EXPR (T2, C2), C3)
+       bitpos = C4 * BITS_PER_UNIT
+
+     If found, convert it to:
+
+       MEM_REF (POINTER_PLUS_EXPR (T1, MULT_EXPR (T2, C3)),
+                C1 + (C2 * C3) + C4)
+   */
+  if (!base
+      || !offset
+      || TREE_CODE (base) != MEM_REF
+      || TREE_CODE (TREE_OPERAND (base, 1)) != INTEGER_CST
+      || TREE_CODE (offset) != MULT_EXPR)
+    return NULL_TREE;
+
+  mult_op0 = TREE_OPERAND (offset, 0);
+  mult_op1 = TREE_OPERAND (offset, 1);
+
+  if (TREE_CODE (mult_op0) != PLUS_EXPR
+      || TREE_CODE (mult_op1) != INTEGER_CST
+      || TREE_CODE (TREE_OPERAND (mult_op0, 1)) != INTEGER_CST)
+    return NULL_TREE;
+
+  t1 = TREE_OPERAND (base, 0);
+  t2 = TREE_OPERAND (mult_op0, 0);
+  c1 = TREE_INT_CST (TREE_OPERAND (base, 1));
+  c2 = TREE_INT_CST (TREE_OPERAND (mult_op0, 1));
+  c3 = TREE_INT_CST (mult_op1);
+  c4 = uhwi_to_double_int (bitpos / BITS_PER_UNIT);
+  c = double_int_add (double_int_add (c1, double_int_mul (c2, c3)), c4);
+
+  if (!double_int_fits_in_shwi_p (c)
+      || !double_int_fits_in_shwi_p (c3))
+    return NULL_TREE;
+
+  offset_type = TREE_TYPE (TREE_OPERAND (base, 1));
+
+  mult_expr = fold_build2 (MULT_EXPR, sizetype, t2, 
+			   build_int_cst (sizetype, double_int_to_shwi (c3)));
+  mult_expr = force_gimple_operand_gsi (gsi, mult_expr, true, NULL,
+					true, GSI_SAME_STMT);
+  add_expr = fold_build2 (POINTER_PLUS_EXPR, TREE_TYPE (t1), t1, mult_expr);
+  add_expr = force_gimple_operand_gsi (gsi, add_expr, true, NULL,
+				       true, GSI_SAME_STMT);
+  mem_ref = fold_build2 (MEM_REF, TREE_TYPE (*expr), add_expr,
+			 build_int_cst (offset_type, double_int_to_shwi (c)));
+  return mem_ref;
+}
+
+/* If *EXPR contains a memory reference, determine whether there is an
+   opportunity to restructure the base and offset expressions for
+   improved performance.  */
+
+static bool
+restructure_mem_ref (tree *expr, gimple_stmt_iterator *gsi)
+{
+  enum tree_code code = TREE_CODE (*expr);
+  tree base, offset, mem_ref;
+  HOST_WIDE_INT bitsize, bitpos;
+  enum machine_mode mode;
+  int unsignedp, volatilep;
+
+  /* Only expressions that reference memory are of interest.  Bitfield
+     references should eventually be lowered prior to this point and
+     are not dealt with.  Skip BLKmode aggregates as well.  */
+  if (! handled_component_p (*expr)
+      || code == BIT_FIELD_REF
+      || (code == COMPONENT_REF && DECL_BIT_FIELD (TREE_OPERAND (*expr, 1)))
+      || TYPE_MODE (TREE_TYPE (*expr)) == BLKmode)
+    return false;
+
+  /* Determine whether the expression can be represented with base and
+     offset components.  */
+  base = get_inner_reference (*expr, &bitsize, &bitpos, &offset, &mode,
+			      &unsignedp, &volatilep, false);
+  if (!base || !offset)
+    return false;
+
+  /* Look for a restructuring opportunity.  */
+  if ((mem_ref = restructure_base_and_offset (expr, gsi, base,
+					      offset, bitpos)) == NULL_TREE)
+    return false;
+
+  /* Found one.  Record the replacement in the dump file and complete
+     the replacement.  */
+  if (dump_file && (dump_flags & TDF_DETAILS))
+    {
+      fprintf (dump_file, "\nOriginal_expr = ");
+      print_generic_expr (dump_file, *expr, TDF_VOPS|TDF_MEMSYMS);
+      fprintf (dump_file, "\nmem_ref = ");
+      print_generic_expr (dump_file, mem_ref, TDF_VOPS|TDF_MEMSYMS);
+      fprintf (dump_file, "\n\n");
+    }
+
+  copy_ref_info (mem_ref, *expr);
+  *expr = mem_ref;
+
+  return true;
+}
+
 /* Reassociate expressions in basic block BB and its post-dominator as
    children.  */
 
@@ -2823,14 +2938,43 @@ reassociate_bb (basic_block bb)
 {
   gimple_stmt_iterator gsi;
   basic_block son;
+  bool in_loop = loop_depth (bb->loop_father) != 0;
 
   for (gsi = gsi_last_bb (bb); !gsi_end_p (gsi); gsi_prev (&gsi))
     {
       gimple stmt = gsi_stmt (gsi);
 
-      if (is_gimple_assign (stmt)
-	  && !stmt_could_throw_p (stmt))
+      /* During late reassociation only, look for restructuring
+	 opportunities within an expression that references memory.
+	 We only do this for blocks not contained in loops, since the
+	 ivopts machinery does a good job on loop expressions, and we
+	 don't want to interfere with other loop optimizations.  */
+      /* TODO: This belongs more properly in a separate pass that
+	 performs general strength reduction on straight-line code.
+	 Eventually move this there.  */
+      if (!first_pass_instance /* TODO: deprecated  */
+	  && !in_loop
+	  && gimple_vuse (stmt)
+	  && gimple_assign_single_p (stmt))
 	{
+	  tree *lhs, *rhs;
+	  if (gimple_vdef (stmt))
+	    {
+	      lhs = gimple_assign_lhs_ptr (stmt);
+	      if (restructure_mem_ref (lhs, &gsi))
+		update_stmt (stmt);
+	    }
+	  else if (gimple_vuse (stmt))
+	    {
+	      rhs = gimple_assign_rhs1_ptr (stmt);
+	      if (restructure_mem_ref (rhs, &gsi))
+		update_stmt (stmt);
+	    }
+	}
+
+      else if (is_gimple_assign (stmt)
+	       && !stmt_could_throw_p (stmt))
+	{
 	  tree lhs, rhs1, rhs2;
 	  enum tree_code rhs_code = gimple_assign_rhs_code (stmt);
 


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] Fix PR46556 (poor address generation)
  2011-10-08 15:44           ` William J. Schmidt
@ 2011-10-11 12:15             ` Richard Guenther
  2011-10-11 13:57               ` Paolo Bonzini
                                 ` (2 more replies)
  0 siblings, 3 replies; 40+ messages in thread
From: Richard Guenther @ 2011-10-11 12:15 UTC (permalink / raw)
  To: William J. Schmidt; +Cc: gcc-patches, bergner

On Sat, 8 Oct 2011, William J. Schmidt wrote:

> Greetings,
> 
> Here are the revised changes for the tree portions of the patch.  I've
> attempted to resolve all comments to date on those portions.  Per
> Steven's comment, I moved copy_ref_info into tree-ssa-address.c; let me
> know if there's a better place, or whether you'd prefer to leave it
> where it was.
> 
> I looked into changing the second reassoc pass to use a different
> pass_late_reassoc entry, but this impacted the test suite.  There are
> about 20 tests that rely on -fdump-tree-reassoc being associated with
> two dump files named reassoc1 and reassoc2.  Rather than change all
> these test cases for a temporary solution, I chose to use the deprecated
> first_pass_instance boolean to distinguish between the two passes.  I
> marked this as a Bad Thing and it will be removed once I have time to
> work on the straight-line strength reducer.
> 
> I looked into adding a test case with a negative offset, but was unable
> to come up with a construct that would have a negative offset on the
> base MEM_REF and still be recognized by this particular pattern matcher.
> In any case, the use of double_ints throughout should remove that
> concern.

Comments below.

> Thanks,
> Bill
> 
> 
> 2011-10-08  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
> 
> gcc:
> 
> 	PR rtl-optimization/46556
> 	* tree.h (copy_ref_info): Expose existing function.
> 	* tree-ssa-loop-ivopts.c (copy_ref_info): Move this function to...
> 	* tree-ssa-address.c (copy_ref_info): ...here, and remove static token.
> 	* tree-ssa-reassoc.c (restructure_base_and_offset): New function.
> 	(restructure_mem_ref): Likewise.
> 	(reassociate_bb): Look for opportunities to call restructure_mem_ref.
> 
> gcc/testsuite:
> 
> 	PR rtl-optimization/46556
> 	* gcc.dg/tree-ssa/pr46556-1.c: New testcase.
> 	* gcc.dg/tree-ssa/pr46556-2.c: Likewise.
> 	* gcc.dg/tree-ssa/pr46556-3.c: Likewise.
> 
> 
> Index: gcc/tree.h
> ===================================================================
> --- gcc/tree.h	(revision 179708)
> +++ gcc/tree.h	(working copy)
> @@ -5777,6 +5777,7 @@ tree target_for_debug_bind (tree);
>  /* In tree-ssa-address.c.  */
>  extern tree tree_mem_ref_addr (tree, tree);
>  extern void copy_mem_ref_info (tree, tree);
> +extern void copy_ref_info (tree, tree);
>  
>  /* In tree-vrp.c */
>  extern bool ssa_name_nonnegative_p (const_tree);
> Index: gcc/testsuite/gcc.dg/tree-ssa/pr46556-1.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/tree-ssa/pr46556-1.c	(revision 0)
> +++ gcc/testsuite/gcc.dg/tree-ssa/pr46556-1.c	(revision 0)
> @@ -0,0 +1,22 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-dom2" } */
> +
> +struct x
> +{
> +  int a[16];
> +  int b[16];
> +  int c[16];
> +};
> +
> +extern void foo (int, int, int);
> +
> +void
> +f (struct x *p, unsigned int n)
> +{
> +  foo (p->a[n], p->c[n], p->b[n]);
> +}
> +
> +/* { dg-final { scan-tree-dump-times "\\* 4;" 1 "dom2" } } */
> +/* { dg-final { scan-tree-dump-times "p_1\\(D\\) \\+ D" 1 "dom2" } } */
> +/* { dg-final { scan-tree-dump-times "MEM\\\[\\(struct x \\*\\)D" 2 "dom2" } } */
> +/* { dg-final { cleanup-tree-dump "dom2" } } */
> Index: gcc/testsuite/gcc.dg/tree-ssa/pr46556-2.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/tree-ssa/pr46556-2.c	(revision 0)
> +++ gcc/testsuite/gcc.dg/tree-ssa/pr46556-2.c	(revision 0)
> @@ -0,0 +1,26 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-dom2" } */
> +
> +struct x
> +{
> +  int a[16];
> +  int b[16];
> +  int c[16];
> +};
> +
> +extern void foo (int, int, int);
> +
> +void
> +f (struct x *p, unsigned int n)
> +{
> +  foo (p->a[n], p->c[n], p->b[n]);
> +  if (n > 12)
> +    foo (p->a[n], p->c[n], p->b[n]);
> +  else if (n > 3)
> +    foo (p->b[n], p->a[n], p->c[n]);
> +}
> +
> +/* { dg-final { scan-tree-dump-times "\\* 4;" 1 "dom2" } } */
> +/* { dg-final { scan-tree-dump-times "p_1\\(D\\) \\+ D" 1 "dom2" } } */
> +/* { dg-final { scan-tree-dump-times "MEM\\\[\\(struct x \\*\\)D" 6 "dom2" } } */
> +/* { dg-final { cleanup-tree-dump "dom2" } } */
> Index: gcc/testsuite/gcc.dg/tree-ssa/pr46556-3.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/tree-ssa/pr46556-3.c	(revision 0)
> +++ gcc/testsuite/gcc.dg/tree-ssa/pr46556-3.c	(revision 0)
> @@ -0,0 +1,28 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-dom2" } */
> +
> +struct x
> +{
> +  int a[16];
> +  int b[16];
> +  int c[16];
> +};
> +
> +extern void foo (int, int, int);
> +
> +void
> +f (struct x *p, unsigned int n)
> +{
> +  foo (p->a[n], p->c[n], p->b[n]);
> +  if (n > 3)
> +    {
> +      foo (p->a[n], p->c[n], p->b[n]);
> +      if (n > 12)
> +	foo (p->b[n], p->a[n], p->c[n]);
> +    }
> +}
> +
> +/* { dg-final { scan-tree-dump-times "\\* 4;" 1 "dom2" } } */
> +/* { dg-final { scan-tree-dump-times "p_1\\(D\\) \\+ D" 1 "dom2" } } */
> +/* { dg-final { scan-tree-dump-times "MEM\\\[\\(struct x \\*\\)D" 6 "dom2" } } */
> +/* { dg-final { cleanup-tree-dump "dom2" } } */
> Index: gcc/tree-ssa-loop-ivopts.c
> ===================================================================
> --- gcc/tree-ssa-loop-ivopts.c	(revision 179708)
> +++ gcc/tree-ssa-loop-ivopts.c	(working copy)
> @@ -6278,64 +6278,6 @@ rewrite_use_nonlinear_expr (struct ivopts_data *da
>      }
>  }
>  
> -/* Copies the reference information from OLD_REF to NEW_REF.  */
> -
> -static void
> -copy_ref_info (tree new_ref, tree old_ref)
> -{
> -  tree new_ptr_base = NULL_TREE;
> -
> -  TREE_SIDE_EFFECTS (new_ref) = TREE_SIDE_EFFECTS (old_ref);
> -  TREE_THIS_VOLATILE (new_ref) = TREE_THIS_VOLATILE (old_ref);
> -
> -  new_ptr_base = TREE_OPERAND (new_ref, 0);
> -
> -  /* We can transfer points-to information from an old pointer
> -     or decl base to the new one.  */
> -  if (new_ptr_base
> -      && TREE_CODE (new_ptr_base) == SSA_NAME
> -      && !SSA_NAME_PTR_INFO (new_ptr_base))
> -    {
> -      tree base = get_base_address (old_ref);
> -      if (!base)
> -	;
> -      else if ((TREE_CODE (base) == MEM_REF
> -		|| TREE_CODE (base) == TARGET_MEM_REF)
> -	       && TREE_CODE (TREE_OPERAND (base, 0)) == SSA_NAME
> -	       && SSA_NAME_PTR_INFO (TREE_OPERAND (base, 0)))
> -	{
> -	  struct ptr_info_def *new_pi;
> -	  duplicate_ssa_name_ptr_info
> -	    (new_ptr_base, SSA_NAME_PTR_INFO (TREE_OPERAND (base, 0)));
> -	  new_pi = SSA_NAME_PTR_INFO (new_ptr_base);
> -	  /* We have to be careful about transfering alignment information.  */
> -	  if (TREE_CODE (old_ref) == MEM_REF
> -	      && !(TREE_CODE (new_ref) == TARGET_MEM_REF
> -		   && (TMR_INDEX2 (new_ref)
> -		       || (TMR_STEP (new_ref)
> -			   && (TREE_INT_CST_LOW (TMR_STEP (new_ref))
> -			       < new_pi->align)))))
> -	    {
> -	      new_pi->misalign += double_int_sub (mem_ref_offset (old_ref),
> -						  mem_ref_offset (new_ref)).low;
> -	      new_pi->misalign &= (new_pi->align - 1);
> -	    }
> -	  else
> -	    {
> -	      new_pi->align = 1;
> -	      new_pi->misalign = 0;
> -	    }
> -	}
> -      else if (TREE_CODE (base) == VAR_DECL
> -	       || TREE_CODE (base) == PARM_DECL
> -	       || TREE_CODE (base) == RESULT_DECL)
> -	{
> -	  struct ptr_info_def *pi = get_ptr_info (new_ptr_base);
> -	  pt_solution_set_var (&pi->pt, base);
> -	}
> -    }
> -}
> -
>  /* Performs a peephole optimization to reorder the iv update statement with
>     a mem ref to enable instruction combining in later phases. The mem ref uses
>     the iv value before the update, so the reordering transformation requires
> Index: gcc/tree-ssa-address.c
> ===================================================================
> --- gcc/tree-ssa-address.c	(revision 179708)
> +++ gcc/tree-ssa-address.c	(working copy)
> @@ -832,6 +832,64 @@ copy_mem_ref_info (tree to, tree from)
>    TREE_THIS_VOLATILE (to) = TREE_THIS_VOLATILE (from);
>  }
>  
> +/* Copies the reference information from OLD_REF to NEW_REF.  */

Please add something like "NEW_REF is supposed to be either a MEM_REF
or a TARGET_MEM_REF.".  You can checkin the patch moving this
function separately.

> +
> +void
> +copy_ref_info (tree new_ref, tree old_ref)
> +{
> +  tree new_ptr_base = NULL_TREE;
> +
> +  TREE_SIDE_EFFECTS (new_ref) = TREE_SIDE_EFFECTS (old_ref);
> +  TREE_THIS_VOLATILE (new_ref) = TREE_THIS_VOLATILE (old_ref);

this function misses to transfer TREE_THIS_NOTRAP which is supposed
to be set on the base of old_ref or any contained ARRAY[_RANGE]_REF.
If you make the function generic please adjust it to at least do ...

> +  new_ptr_base = TREE_OPERAND (new_ref, 0);
> +
> +  /* We can transfer points-to information from an old pointer
> +     or decl base to the new one.  */
> +  if (new_ptr_base
> +      && TREE_CODE (new_ptr_base) == SSA_NAME
> +      && !SSA_NAME_PTR_INFO (new_ptr_base))
> +    {
> +      tree base = get_base_address (old_ref);
> +      if (!base)
> +	;
> +      else if ((TREE_CODE (base) == MEM_REF
> +		|| TREE_CODE (base) == TARGET_MEM_REF)
> +	       && TREE_CODE (TREE_OPERAND (base, 0)) == SSA_NAME
> +	       && SSA_NAME_PTR_INFO (TREE_OPERAND (base, 0)))
> +	{
> +	  struct ptr_info_def *new_pi;
> +	  duplicate_ssa_name_ptr_info
> +	    (new_ptr_base, SSA_NAME_PTR_INFO (TREE_OPERAND (base, 0)));
> +	  new_pi = SSA_NAME_PTR_INFO (new_ptr_base);
> +	  /* We have to be careful about transfering alignment information.  */
> +	  if (TREE_CODE (old_ref) == MEM_REF
> +	      && !(TREE_CODE (new_ref) == TARGET_MEM_REF
> +		   && (TMR_INDEX2 (new_ref)
> +		       || (TMR_STEP (new_ref)
> +			   && (TREE_INT_CST_LOW (TMR_STEP (new_ref))
> +			       < new_pi->align)))))
> +	    {
> +	      new_pi->misalign += double_int_sub (mem_ref_offset (old_ref),
> +						  mem_ref_offset (new_ref)).low;
> +	      new_pi->misalign &= (new_pi->align - 1);
> +	    }
> +	  else
> +	    {
> +	      new_pi->align = 1;
> +	      new_pi->misalign = 0;
> +	    }

...

          TREE_THIS_NOTRAP (new_ref) = TREE_THIS_NOTRAP (base);

> +	}
> +      else if (TREE_CODE (base) == VAR_DECL
> +	       || TREE_CODE (base) == PARM_DECL
> +	       || TREE_CODE (base) == RESULT_DECL)
> +	{
> +	  struct ptr_info_def *pi = get_ptr_info (new_ptr_base);
> +	  pt_solution_set_var (&pi->pt, base);
> +	}
> +    }
> +}
> +
>  /* Move constants in target_mem_ref REF to offset.  Returns the new target
>     mem ref if anything changes, NULL_TREE otherwise.  */
>  
> Index: gcc/tree-ssa-reassoc.c
> ===================================================================
> --- gcc/tree-ssa-reassoc.c	(revision 179708)
> +++ gcc/tree-ssa-reassoc.c	(working copy)
> @@ -2815,6 +2815,121 @@ break_up_subtract_bb (basic_block bb)
>      break_up_subtract_bb (son);
>  }
>  
> +/* Given BASE and OFFSET derived from *EXPR occurring in stmt *GSI,
> +   determine whether there is a profitable opportunity to restructure
> +   address arithmetic within BASE and OFFSET.  If so, produce such
> +   a restructuring and return it.  */
> +
> +static tree
> +restructure_base_and_offset (tree *expr, gimple_stmt_iterator *gsi,

You are not changing *expr, so don't pass it as reference.

> +			     tree base, tree offset, HOST_WIDE_INT bitpos)
> +{
> +  tree mult_op0, mult_op1, t1, t2, offset_type, mult_expr, add_expr, mem_ref;
> +  double_int c, c1, c2, c3, c4;
> +
> +  /* Look for the following pattern:
> +
> +       base   = MEM_REF (T1, C1);
> +       offset = MULT_EXPR (PLUS_EXPR (T2, C2), C3)
> +       bitpos = C4 * BITS_PER_UNIT
> +
> +     If found, convert it to:
> +
> +       MEM_REF (POINTER_PLUS_EXPR (T1, MULT_EXPR (T2, C3)),
> +                C1 + (C2 * C3) + C4)
> +   */
> +  if (!base
> +      || !offset
> +      || TREE_CODE (base) != MEM_REF
> +      || TREE_CODE (TREE_OPERAND (base, 1)) != INTEGER_CST

operand 1 of a MEM_REF is always an INTEGER_CST.

> +      || TREE_CODE (offset) != MULT_EXPR)
> +    return NULL_TREE;
> +
> +  mult_op0 = TREE_OPERAND (offset, 0);
> +  mult_op1 = TREE_OPERAND (offset, 1);
> +
> +  if (TREE_CODE (mult_op0) != PLUS_EXPR
> +      || TREE_CODE (mult_op1) != INTEGER_CST
> +      || TREE_CODE (TREE_OPERAND (mult_op0, 1)) != INTEGER_CST)
> +    return NULL_TREE;
> +
> +  t1 = TREE_OPERAND (base, 0);
> +  t2 = TREE_OPERAND (mult_op0, 0);
> +  c1 = TREE_INT_CST (TREE_OPERAND (base, 1));

mem_ref_offset (base)

> +  c2 = TREE_INT_CST (TREE_OPERAND (mult_op0, 1));
> +  c3 = TREE_INT_CST (mult_op1);

tree_to_double_int ()

> +  c4 = uhwi_to_double_int (bitpos / BITS_PER_UNIT);

You don't verify that bitpos % BITS_PER_UNIT is zero anywhere.

> +  c = double_int_add (double_int_add (c1, double_int_mul (c2, c3)), c4);
> +
> +  if (!double_int_fits_in_shwi_p (c)
> +      || !double_int_fits_in_shwi_p (c3))
> +    return NULL_TREE;

I think those tests are not necessary if you use ...

> +  offset_type = TREE_TYPE (TREE_OPERAND (base, 1));
> +
> +  mult_expr = fold_build2 (MULT_EXPR, sizetype, t2, 
> +			   build_int_cst (sizetype, double_int_to_shwi (c3)));

double_int_to_tree (sizetype, c3)

> +  mult_expr = force_gimple_operand_gsi (gsi, mult_expr, true, NULL,
> +					true, GSI_SAME_STMT);
> +  add_expr = fold_build2 (POINTER_PLUS_EXPR, TREE_TYPE (t1), t1, mult_expr);
> +  add_expr = force_gimple_operand_gsi (gsi, add_expr, true, NULL,
> +				       true, GSI_SAME_STMT);
> +  mem_ref = fold_build2 (MEM_REF, TREE_TYPE (*expr), add_expr,
> +			 build_int_cst (offset_type, double_int_to_shwi (c)));

double_int_to_tree (offset_type, c)

Please delay gimplification to the caller, that way this function
solely operates on the trees returned from get_inner_reference.
Or are you concerned that fold might undo your association?

> +  return mem_ref;
> +}
> +
> +/* If *EXPR contains a memory reference, determine whether there is an
> +   opportunity to restructure the base and offset expressions for
> +   improved performance.  */
> +
> +static bool
> +restructure_mem_ref (tree *expr, gimple_stmt_iterator *gsi)
> +{
> +  enum tree_code code = TREE_CODE (*expr);
> +  tree base, offset, mem_ref;
> +  HOST_WIDE_INT bitsize, bitpos;
> +  enum machine_mode mode;
> +  int unsignedp, volatilep;
> +
> +  /* Only expressions that reference memory are of interest.  Bitfield
> +     references should eventually be lowered prior to this point and
> +     are not dealt with.  Skip BLKmode aggregates as well.  */
> +  if (! handled_component_p (*expr)
> +      || code == BIT_FIELD_REF
> +      || (code == COMPONENT_REF && DECL_BIT_FIELD (TREE_OPERAND (*expr, 1)))
> +      || TYPE_MODE (TREE_TYPE (*expr)) == BLKmode)
> +    return false;
> +
> +  /* Determine whether the expression can be represented with base and
> +     offset components.  */
> +  base = get_inner_reference (*expr, &bitsize, &bitpos, &offset, &mode,
> +			      &unsignedp, &volatilep, false);
> +  if (!base || !offset)
> +    return false;
> +
> +  /* Look for a restructuring opportunity.  */
> +  if ((mem_ref = restructure_base_and_offset (expr, gsi, base,
> +					      offset, bitpos)) == NULL_TREE)
> +    return false;
> +
> +  /* Found one.  Record the replacement in the dump file and complete
> +     the replacement.  */
> +  if (dump_file && (dump_flags & TDF_DETAILS))
> +    {
> +      fprintf (dump_file, "\nOriginal_expr = ");
> +      print_generic_expr (dump_file, *expr, TDF_VOPS|TDF_MEMSYMS);
> +      fprintf (dump_file, "\nmem_ref = ");
> +      print_generic_expr (dump_file, mem_ref, TDF_VOPS|TDF_MEMSYMS);
> +      fprintf (dump_file, "\n\n");
> +    }

Thus gimplify and add the statements here.

> +  copy_ref_info (mem_ref, *expr);
> +  *expr = mem_ref;
> +
> +  return true;
> +}
> +
>  /* Reassociate expressions in basic block BB and its post-dominator as
>     children.  */
>  
> @@ -2823,14 +2938,43 @@ reassociate_bb (basic_block bb)
>  {
>    gimple_stmt_iterator gsi;
>    basic_block son;
> +  bool in_loop = loop_depth (bb->loop_father) != 0;
>  
>    for (gsi = gsi_last_bb (bb); !gsi_end_p (gsi); gsi_prev (&gsi))
>      {
>        gimple stmt = gsi_stmt (gsi);
>  
> -      if (is_gimple_assign (stmt)
> -	  && !stmt_could_throw_p (stmt))
> +      /* During late reassociation only, look for restructuring
> +	 opportunities within an expression that references memory.
> +	 We only do this for blocks not contained in loops, since the
> +	 ivopts machinery does a good job on loop expressions, and we
> +	 don't want to interfere with other loop optimizations.  */

I'm not sure I buy this.  IVOPTs would have produced [TARGET_]MEM_REFs
which you don't handle.  Did you do any measurements what happens if
you enable it generally?

> +      /* TODO: This belongs more properly in a separate pass that
> +	 performs general strength reduction on straight-line code.
> +	 Eventually move this there.  */
> +      if (!first_pass_instance /* TODO: deprecated  */
> +	  && !in_loop
> +	  && gimple_vuse (stmt)
> +	  && gimple_assign_single_p (stmt))
>  	{
> +	  tree *lhs, *rhs;
> +	  if (gimple_vdef (stmt))
> +	    {
> +	      lhs = gimple_assign_lhs_ptr (stmt);
> +	      if (restructure_mem_ref (lhs, &gsi))
> +		update_stmt (stmt);
> +	    }
> +	  else if (gimple_vuse (stmt))

That will be always true.


> +	    {
> +	      rhs = gimple_assign_rhs1_ptr (stmt);
> +	      if (restructure_mem_ref (rhs, &gsi))
> +		update_stmt (stmt);
> +	    }
> +	}
> +
> +      else if (is_gimple_assign (stmt)
> +	       && !stmt_could_throw_p (stmt))
> +	{
>  	  tree lhs, rhs1, rhs2;
>  	  enum tree_code rhs_code = gimple_assign_rhs_code (stmt);

You verified the patch has no performance degradations on ppc64
for SPEC CPU, did you see any improvements?

The pattern matching is still very ad-hoc and doesn't consider
statements that feed the base address.  There is conceptually
no difference between p->a[n] and *(p + n * 4).  You placed this
lowering in reassoc to catch CSE opportunities with DOM, right?
Does RTL CSE not do it's job or is the transform undone by
fwprop before it gets a chance to do it?

Thanks,
Richard.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] Fix PR46556 (poor address generation)
  2011-10-11 12:15             ` Richard Guenther
@ 2011-10-11 13:57               ` Paolo Bonzini
  2011-10-11 14:15                 ` Paolo Bonzini
  2011-10-11 14:58               ` William J. Schmidt
  2011-10-11 21:34               ` Ian Lance Taylor
  2 siblings, 1 reply; 40+ messages in thread
From: Paolo Bonzini @ 2011-10-11 13:57 UTC (permalink / raw)
  To: Richard Guenther; +Cc: William J. Schmidt, gcc-patches, bergner

On 10/11/2011 01:40 PM, Richard Guenther wrote:
> The pattern matching is still very ad-hoc and doesn't consider
> statements that feed the base address.  There is conceptually
> no difference between p->a[n] and *(p + n * 4).  You placed this
> lowering in reassoc to catch CSE opportunities with DOM, right?
> Does RTL CSE not do it's job or is the transform undone by
> fwprop before it gets a chance to do it?

The idea (with the patch CSE I sent William) is that fwprop will try to 
be as greedy as possible, within the bounds of addresses allowed by the 
target.  Then CSE will fix the cases where fwprop could only do half of 
the job (and hopefully, scheduling will produce good code even when CSE 
chooses to go back to simple addressing modes).

If I understand correctly, these reassociations are all working on 
addressing modes that the target does not support, hence they fall 
outside the scope of both fwprop and CSE.

Paolo

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] Fix PR46556 (poor address generation)
  2011-10-11 13:57               ` Paolo Bonzini
@ 2011-10-11 14:15                 ` Paolo Bonzini
  0 siblings, 0 replies; 40+ messages in thread
From: Paolo Bonzini @ 2011-10-11 14:15 UTC (permalink / raw)
  To: gcc-patches; +Cc: William J. Schmidt, gcc-patches, bergner

On 10/11/2011 01:40 PM, Richard Guenther wrote:
> The pattern matching is still very ad-hoc and doesn't consider
> statements that feed the base address.  There is conceptually
> no difference between p->a[n] and *(p + n * 4).  You placed this
> lowering in reassoc to catch CSE opportunities with DOM, right?
> Does RTL CSE not do it's job or is the transform undone by
> fwprop before it gets a chance to do it?

The idea (with the patch CSE I sent William) is that fwprop will try to 
be as greedy as possible, within the bounds of addresses allowed by the 
target.  Then CSE will fix the cases where fwprop could only do half of 
the job (and hopefully, scheduling will produce good code even when CSE 
chooses to go back to simple addressing modes).

If I understand correctly, these reassociations are all working on 
addressing modes that the target does not support, hence they fall 
outside the scope of both fwprop and CSE.

Paolo

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] Fix PR46556 (poor address generation)
  2011-10-11 12:15             ` Richard Guenther
  2011-10-11 13:57               ` Paolo Bonzini
@ 2011-10-11 14:58               ` William J. Schmidt
  2011-10-11 15:33                 ` William J. Schmidt
  2011-10-11 21:34               ` Ian Lance Taylor
  2 siblings, 1 reply; 40+ messages in thread
From: William J. Schmidt @ 2011-10-11 14:58 UTC (permalink / raw)
  To: Richard Guenther; +Cc: gcc-patches, bergner

Hi Richard,

Thanks for the comments -- a few responses below.

On Tue, 2011-10-11 at 13:40 +0200, Richard Guenther wrote:
> On Sat, 8 Oct 2011, William J. Schmidt wrote:
> 

<snip>

> 
> > +  c4 = uhwi_to_double_int (bitpos / BITS_PER_UNIT);
> 
> You don't verify that bitpos % BITS_PER_UNIT is zero anywhere.

I'll add a check in the caller.  I was thinking this was unnecessary
since I had excluded bitfield operations, but on reflection that may not
be sufficient.

<snip>

> 
> > +  mult_expr = force_gimple_operand_gsi (gsi, mult_expr, true, NULL,
> > +					true, GSI_SAME_STMT);
> > +  add_expr = fold_build2 (POINTER_PLUS_EXPR, TREE_TYPE (t1), t1, mult_expr);
> > +  add_expr = force_gimple_operand_gsi (gsi, add_expr, true, NULL,
> > +				       true, GSI_SAME_STMT);
> > +  mem_ref = fold_build2 (MEM_REF, TREE_TYPE (*expr), add_expr,
> > +			 build_int_cst (offset_type, double_int_to_shwi (c)));
> 
> double_int_to_tree (offset_type, c)
> 
> Please delay gimplification to the caller, that way this function
> solely operates on the trees returned from get_inner_reference.
> Or are you concerned that fold might undo your association?

I'll try that.  I was just basing this on some suggestions you had made
earlier; I don't believe there is any problem with delaying it.

<snip>

> >  
> >    for (gsi = gsi_last_bb (bb); !gsi_end_p (gsi); gsi_prev (&gsi))
> >      {
> >        gimple stmt = gsi_stmt (gsi);
> >  
> > -      if (is_gimple_assign (stmt)
> > -	  && !stmt_could_throw_p (stmt))
> > +      /* During late reassociation only, look for restructuring
> > +	 opportunities within an expression that references memory.
> > +	 We only do this for blocks not contained in loops, since the
> > +	 ivopts machinery does a good job on loop expressions, and we
> > +	 don't want to interfere with other loop optimizations.  */
> 
> I'm not sure I buy this.  IVOPTs would have produced [TARGET_]MEM_REFs
> which you don't handle.  Did you do any measurements what happens if
> you enable it generally?

Actually I agree with you -- in an earlier iteration this was still
enabled for reassoc1 ahead of loop optimizations and was causing
degradations.  So long as it doesn't occur early it should be fine to do
everywhere now, and catch non-ivar cases in loops.

<snip>

> You verified the patch has no performance degradations on ppc64
> for SPEC CPU, did you see any improvements?
> 

Yes, a few in the 2-3% range.  Nothing stellar.

> The pattern matching is still very ad-hoc and doesn't consider
> statements that feed the base address.  There is conceptually
> no difference between p->a[n] and *(p + n * 4).

That's true.  Since we abandoned the general address-lowering approach,
this was aimed at the specific pattern that comes up frequently in
practice.  I would expect the *(p + n * 4) cases to be handled by the
general straight-line strength reduction, which is the correct long-term
approach.  (Cases like p->a[n], where the multiplication is not yet
explicit, will be a bit of a wart as part of strength reduction, too,
but that's still the right place for it eventually.)

>   You placed this
> lowering in reassoc to catch CSE opportunities with DOM, right?
> Does RTL CSE not do it's job or is the transform undone by
> fwprop before it gets a chance to do it?

I think with Paolo's suggested patch for RTL CSE, this could be moved
back to expand.  I will have to experiment with it again to make sure.
If so, that would certainly be my preference as well.

(Or having the whole problem just disappear might be my preference on
some days... :)

Thanks,
Bill

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] Fix PR46556 (poor address generation)
  2011-10-11 14:58               ` William J. Schmidt
@ 2011-10-11 15:33                 ` William J. Schmidt
  2011-10-18 14:43                   ` William J. Schmidt
  0 siblings, 1 reply; 40+ messages in thread
From: William J. Schmidt @ 2011-10-11 15:33 UTC (permalink / raw)
  To: Richard Guenther; +Cc: gcc-patches, bergner

On Tue, 2011-10-11 at 09:12 -0500, William J. Schmidt wrote:

> > The pattern matching is still very ad-hoc and doesn't consider
> > statements that feed the base address.  There is conceptually
> > no difference between p->a[n] and *(p + n * 4).
> 
> That's true.  Since we abandoned the general address-lowering approach,
> this was aimed at the specific pattern that comes up frequently in
> practice.  I would expect the *(p + n * 4) cases to be handled by the
> general straight-line strength reduction, which is the correct long-term
> approach.  (Cases like p->a[n], where the multiplication is not yet
> explicit, will be a bit of a wart as part of strength reduction, too,
> but that's still the right place for it eventually.)

Going through my notes, I do have some code for the *(p + n * 4) case
lying around from the last time I tried this in expand, so I'll try to
get this back in place (either in reassoc2 or expand, depending on how
the CSE works out).


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] Fix PR46556 (poor address generation)
  2011-10-11 12:15             ` Richard Guenther
  2011-10-11 13:57               ` Paolo Bonzini
  2011-10-11 14:58               ` William J. Schmidt
@ 2011-10-11 21:34               ` Ian Lance Taylor
  2011-10-11 21:38                 ` William J. Schmidt
  2011-10-12 11:28                 ` Richard Guenther
  2 siblings, 2 replies; 40+ messages in thread
From: Ian Lance Taylor @ 2011-10-11 21:34 UTC (permalink / raw)
  To: Richard Guenther; +Cc: William J. Schmidt, gcc-patches, bergner

On Tue, Oct 11, 2011 at 4:40 AM, Richard Guenther <rguenther@suse.de> wrote:

> this function misses to transfer TREE_THIS_NOTRAP which is supposed
> to be set on the base of old_ref or any contained ARRAY[_RANGE]_REF.
> If you make the function generic please adjust it to at least do ...
> ...
>
>          TREE_THIS_NOTRAP (new_ref) = TREE_THIS_NOTRAP (base);

This line was indeed added to the patch as committed.  This appears to have
broken the build of libgo.  I now get this:

../../../gccgo3/libgo/go/image/png/writer.go: In function
‘png.writeIDATs.pN23_libgo_image.png.encoder’:
../../../gccgo3/libgo/go/image/png/writer.go:403:1: error: statement
marked for throw, but doesn’t
# .MEM_775 = VDEF <.MEM_774>
MEM[base: D.8326_1070, offset: 0B] = VIEW_CONVERT_EXPR<struct
{
  uint8 * __values;
  int __count;
  int __capacity;
}>(GOTMP.495);

../../../gccgo3/libgo/go/image/png/writer.go:403:1: error: statement
marked for throw, but doesn’t
# .MEM_776 = VDEF <.MEM_775>
D.7574 = MEM[base: D.8325_1069, offset: 0B];

../../../gccgo3/libgo/go/image/png/writer.go:403:1: internal compiler
error: verify_gimple failed
Please submit a full bug report,
with preprocessed source if appropriate.
See <http://gcc.gnu.org/bugs.html> for instructions.


I have not yet done a full investigation, but it appears that this
function is now marking
a newly created reference as TREE_THIS_NOTRAP, which it did not previously do.
The new instruction is within an exception region, and the tree-cfg
checker insists that
instructions in exception region are permitted to trap.  It may be
that the ivopts pass
now requires TODO_cleanup_cfg, or it may be something more complicated.

You should be able to recreate the problem yourself by using
--enable-languages=go when you run configure.

Ian

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] Fix PR46556 (poor address generation)
  2011-10-11 21:34               ` Ian Lance Taylor
@ 2011-10-11 21:38                 ` William J. Schmidt
  2011-10-12 11:28                 ` Richard Guenther
  1 sibling, 0 replies; 40+ messages in thread
From: William J. Schmidt @ 2011-10-11 21:38 UTC (permalink / raw)
  To: Ian Lance Taylor; +Cc: Richard Guenther, gcc-patches, bergner

Thanks -- I will back that line out for now and investigate.  Regression
test was fine for the languages we normally build, but go and ada aren't
among those.  Sorry for the trouble!

On Tue, 2011-10-11 at 14:00 -0700, Ian Lance Taylor wrote:
> On Tue, Oct 11, 2011 at 4:40 AM, Richard Guenther <rguenther@suse.de> wrote:
> 
> > this function misses to transfer TREE_THIS_NOTRAP which is supposed
> > to be set on the base of old_ref or any contained ARRAY[_RANGE]_REF.
> > If you make the function generic please adjust it to at least do ...
> > ...
> >
> >          TREE_THIS_NOTRAP (new_ref) = TREE_THIS_NOTRAP (base);
> 
> This line was indeed added to the patch as committed.  This appears to have
> broken the build of libgo.  I now get this:
> 
> ../../../gccgo3/libgo/go/image/png/writer.go: In function
> ‘png.writeIDATs.pN23_libgo_image.png.encoder’:
> ../../../gccgo3/libgo/go/image/png/writer.go:403:1: error: statement
> marked for throw, but doesn’t
> # .MEM_775 = VDEF <.MEM_774>
> MEM[base: D.8326_1070, offset: 0B] = VIEW_CONVERT_EXPR<struct
> {
>   uint8 * __values;
>   int __count;
>   int __capacity;
> }>(GOTMP.495);
> 
> ../../../gccgo3/libgo/go/image/png/writer.go:403:1: error: statement
> marked for throw, but doesn’t
> # .MEM_776 = VDEF <.MEM_775>
> D.7574 = MEM[base: D.8325_1069, offset: 0B];
> 
> ../../../gccgo3/libgo/go/image/png/writer.go:403:1: internal compiler
> error: verify_gimple failed
> Please submit a full bug report,
> with preprocessed source if appropriate.
> See <http://gcc.gnu.org/bugs.html> for instructions.
> 
> 
> I have not yet done a full investigation, but it appears that this
> function is now marking
> a newly created reference as TREE_THIS_NOTRAP, which it did not previously do.
> The new instruction is within an exception region, and the tree-cfg
> checker insists that
> instructions in exception region are permitted to trap.  It may be
> that the ivopts pass
> now requires TODO_cleanup_cfg, or it may be something more complicated.
> 
> You should be able to recreate the problem yourself by using
> --enable-languages=go when you run configure.
> 
> Ian
> 

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] Fix PR46556 (poor address generation)
  2011-10-11 21:34               ` Ian Lance Taylor
  2011-10-11 21:38                 ` William J. Schmidt
@ 2011-10-12 11:28                 ` Richard Guenther
  2011-10-13  7:08                   ` Ian Lance Taylor
  1 sibling, 1 reply; 40+ messages in thread
From: Richard Guenther @ 2011-10-12 11:28 UTC (permalink / raw)
  To: Ian Lance Taylor; +Cc: William J. Schmidt, gcc-patches, bergner

[-- Attachment #1: Type: TEXT/PLAIN, Size: 2252 bytes --]

On Tue, 11 Oct 2011, Ian Lance Taylor wrote:

> On Tue, Oct 11, 2011 at 4:40 AM, Richard Guenther <rguenther@suse.de> wrote:
> 
> > this function misses to transfer TREE_THIS_NOTRAP which is supposed
> > to be set on the base of old_ref or any contained ARRAY[_RANGE]_REF.
> > If you make the function generic please adjust it to at least do ...
> > ...
> >
> >          TREE_THIS_NOTRAP (new_ref) = TREE_THIS_NOTRAP (base);
> 
> This line was indeed added to the patch as committed.  This appears to have
> broken the build of libgo.  I now get this:
> 
> ../../../gccgo3/libgo/go/image/png/writer.go: In function
> ‘png.writeIDATs.pN23_libgo_image.png.encoder’:
> ../../../gccgo3/libgo/go/image/png/writer.go:403:1: error: statement
> marked for throw, but doesnÂ’t
> # .MEM_775 = VDEF <.MEM_774>
> MEM[base: D.8326_1070, offset: 0B] = VIEW_CONVERT_EXPR<struct
> {
>   uint8 * __values;
>   int __count;
>   int __capacity;
> }>(GOTMP.495);
> 
> ../../../gccgo3/libgo/go/image/png/writer.go:403:1: error: statement
> marked for throw, but doesnÂ’t
> # .MEM_776 = VDEF <.MEM_775>
> D.7574 = MEM[base: D.8325_1069, offset: 0B];
> 
> ../../../gccgo3/libgo/go/image/png/writer.go:403:1: internal compiler
> error: verify_gimple failed
> Please submit a full bug report,
> with preprocessed source if appropriate.
> See <http://gcc.gnu.org/bugs.html> for instructions.
> 
> 
> I have not yet done a full investigation, but it appears that this
> function is now marking
> a newly created reference as TREE_THIS_NOTRAP, which it did not previously do.
> The new instruction is within an exception region, and the tree-cfg
> checker insists that
> instructions in exception region are permitted to trap.  It may be

No, the new stmt seems to have an associated EH region but doesn't
trap.  Now, the old stmt also didn't trap (TREE_THIS_NOTRAP (base) was
true), so the question is why did the checker not complain on it
or, why did it not have an EH region associated but the new stmt does.

> that the ivopts pass
> now requires TODO_cleanup_cfg, or it may be something more complicated.
>
> You should be able to recreate the problem yourself by using
> --enable-languages=go when you run configure.

I suppose you can create a C testcase?

Richard.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] Fix PR46556 (poor address generation)
  2011-10-12 11:28                 ` Richard Guenther
@ 2011-10-13  7:08                   ` Ian Lance Taylor
  0 siblings, 0 replies; 40+ messages in thread
From: Ian Lance Taylor @ 2011-10-13  7:08 UTC (permalink / raw)
  To: Richard Guenther; +Cc: William J. Schmidt, gcc-patches, bergner

Richard Guenther <rguenther@suse.de> writes:

>> You should be able to recreate the problem yourself by using
>> --enable-languages=go when you run configure.
>
> I suppose you can create a C testcase?

I don't know whether I can, as the Go frontend is generating
VIEW_CONVERT_EXPR here.  I will try to take another look, but not before
next week.

Ian

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] Fix PR46556 (poor address generation)
  2011-10-11 15:33                 ` William J. Schmidt
@ 2011-10-18 14:43                   ` William J. Schmidt
  2011-10-21  9:53                     ` Richard Guenther
  0 siblings, 1 reply; 40+ messages in thread
From: William J. Schmidt @ 2011-10-18 14:43 UTC (permalink / raw)
  To: Richard Guenther; +Cc: gcc-patches, bergner

Greetings,

Here is a new revision of the tree portions of this patch.  I moved the
pattern recognizer to expand, and added additional logic to look for the
same pattern in gimple form.  I added two more tests to verify the new
logic.

I didn't run into any problems with the RTL CSE phases.  I can't recall
for certain what caused me to abandon the expand version previously.
There may not have been good reason; too many versions to keep track of
and too many interruptions, I suppose.  In any case, I'm much happier
having this code in the expander.

Paolo's RTL logic for unpropagating the zero-offset case is not going to
work out as is.  It causes a number of performance degradations, which I
suspect are due to the pass reordering.  That's a separate issue,
though, and not needed for this patch.

Bootstrapped and regression-tested on powerpc64-linux.  SPEC cpu2000
shows a number of small improvements and no significant degradations.
SPEC cpu2006 testing is pending.

Thanks,
Bill


2011-10-18  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>

gcc:

	PR rtl-optimization/46556
	* expr.c (tree-pretty-print.h): New include.
	(restructure_base_and_offset): New function.
	(restructure_mem_ref): New function.
	(expand_expr_real_1): In MEM_REF case, attempt restructure_mem_ref
	first.  In normal_inner_ref case, attempt restructure_base_and_offset
	first.
	* Makefile.in: Update dependences for expr.o.

gcc/testsuite:

	PR rtl-optimization/46556
	* gcc.dg/tree-ssa-pr46556-1.c: New test.
	* gcc.dg/tree-ssa-pr46556-2.c: Likewise.
	* gcc.dg/tree-ssa-pr46556-3.c: Likewise.
	* gcc.dg/tree-ssa-pr46556-4.c: Likewise.
	* gcc.dg/tree-ssa-pr46556-5.c: Likewise.


Index: gcc/testsuite/gcc.dg/tree-ssa/pr46556-1.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/pr46556-1.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/pr46556-1.c	(revision 0)
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-rtl-expand" } */
+
+struct x
+{
+  int a[16];
+  int b[16];
+  int c[16];
+};
+
+extern void foo (int, int, int);
+
+void
+f (struct x *p, unsigned int n)
+{
+  foo (p->a[n], p->c[n], p->b[n]);
+}
+
+/* { dg-final { scan-rtl-dump-times "\\(mem/s:SI \\(plus:" 2 "expand" } } */
+/* { dg-final { scan-rtl-dump-times "const_int 128" 1 "expand" } } */
+/* { dg-final { scan-rtl-dump-times "const_int 64 \\\[0x40\\\]\\)\\) \\\[" 1 "expand" } } */
+/* { dg-final { cleanup-rtl-dump "expand" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/pr46556-2.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/pr46556-2.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/pr46556-2.c	(revision 0)
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-rtl-expand" } */
+
+struct x
+{
+  int a[16];
+  int b[16];
+  int c[16];
+};
+
+extern void foo (int, int, int);
+
+void
+f (struct x *p, unsigned int n)
+{
+  foo (p->a[n], p->c[n], p->b[n]);
+  if (n > 12)
+    foo (p->a[n], p->c[n], p->b[n]);
+  else if (n > 3)
+    foo (p->b[n], p->a[n], p->c[n]);
+}
+
+/* { dg-final { scan-rtl-dump-times "\\(mem/s:SI \\(plus:" 6 "expand" } } */
+/* { dg-final { scan-rtl-dump-times "const_int 128" 3 "expand" } } */
+/* { dg-final { scan-rtl-dump-times "const_int 64 \\\[0x40\\\]\\)\\) \\\[" 3 "expand" } } */
+/* { dg-final { cleanup-rtl-dump "expand" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/pr46556-3.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/pr46556-3.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/pr46556-3.c	(revision 0)
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-rtl-expand" } */
+struct x
+{
+  int a[16];
+  int b[16];
+  int c[16];
+};
+
+extern void foo (int, int, int);
+
+void
+f (struct x *p, unsigned int n)
+{
+  foo (p->a[n], p->c[n], p->b[n]);
+  if (n > 3)
+    {
+      foo (p->a[n], p->c[n], p->b[n]);
+      if (n > 12)
+	foo (p->b[n], p->a[n], p->c[n]);
+    }
+}
+
+/* { dg-final { scan-rtl-dump-times "\\(mem/s:SI \\(plus:" 6 "expand" } } */
+/* { dg-final { scan-rtl-dump-times "const_int 128" 3 "expand" } } */
+/* { dg-final { scan-rtl-dump-times "const_int 64 \\\[0x40\\\]\\)\\) \\\[" 3 "expand" } } */
+/* { dg-final { cleanup-rtl-dump "expand" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/pr46556-4.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/pr46556-4.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/pr46556-4.c	(revision 0)
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-rtl-expand" } */
+
+struct x
+{
+  int a[16];
+  int b[16];
+  int c[16];
+};
+
+extern void foo (int, int, int);
+
+void
+f (struct x *p, unsigned int n)
+{
+  foo (*((int *)p + n*4), *((int *)p + 32 + n*4), *((int *)p + 16 + n*4));
+  if (n > 3)
+    {
+      foo (*((int *)p + n*4), *((int *)p + 32 + n*4), *((int *)p + 16 + n*4));
+      if (n > 12)
+	foo (*((int *)p + 16 + n*4), *((int *)p + n*4), *((int *)p + 32 + n*4));
+    }
+}
+
+/* { dg-final { scan-rtl-dump-times "\\(mem:SI \\(plus:" 6 "expand" } } */
+/* { dg-final { scan-rtl-dump-times "const_int 128" 3 "expand" } } */
+/* { dg-final { scan-rtl-dump-times "const_int 64 \\\[0x40\\\]\\)\\) \\\[" 3 "expand" } } */
+/* { dg-final { cleanup-rtl-dump "expand" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/pr46556-5.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/pr46556-5.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/pr46556-5.c	(revision 0)
@@ -0,0 +1,36 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-rtl-expand" } */
+/* { dg-skip-if "" { lp64 } { "*" } { "-m32" } } */
+/* { dg-skip-if "" { llp64 } { "*" } { "-m32" } } */
+
+struct x
+{
+  int a[16];
+  int b[16];
+  int c[16];
+};
+
+extern void foo (int, int, int);
+
+void
+f (struct x *p, unsigned int n)
+{
+  foo (*((int *)p + n * 4),
+       *((int *)p + (n + 8) * 4),
+       *((int *)p + (n + 4) * 4));
+  if (n > 3)
+    {
+      foo (*((int *)p + n * 4),
+	   *((int *)p + (n + 8) * 4),
+	   *((int *)p + (n + 4) * 4));
+      if (n > 12)
+	foo (*((int *)p + (n + 4) * 4),
+	     *((int *)p + n * 4),
+	     *((int *)p + (n + 8) * 4));
+    }
+}
+
+/* { dg-final { scan-rtl-dump-times "\\(mem:SI \\(plus:" 6 "expand" } } */
+/* { dg-final { scan-rtl-dump-times "const_int 128" 3 "expand" } } */
+/* { dg-final { scan-rtl-dump-times "const_int 64 \\\[0x40\\\]\\)\\) \\\[" 3 "expand" } } */
+/* { dg-final { cleanup-rtl-dump "expand" } } */
Index: gcc/expr.c
===================================================================
--- gcc/expr.c	(revision 180138)
+++ gcc/expr.c	(working copy)
@@ -56,6 +56,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "ssaexpand.h"
 #include "target-globals.h"
 #include "params.h"
+#include "tree-pretty-print.h"
 
 /* Decide whether a function's arguments should be processed
    from first to last or from last to first.
@@ -7648,7 +7649,157 @@ expand_constructor (tree exp, rtx target, enum exp
   return target;
 }
 
+/* Given BASE, OFFSET, and BITPOS derived from EXPR, determine whether
+   there is a profitable opportunity to restructure address arithmetic
+   within BASE and OFFSET.  If so, produce such a restructuring and
+   return it.  */
+/* TODO: This belongs more properly in a separate pass that performs
+   general strength reduction on straight-line code.  Eventually move
+   this there.  */
 
+static tree
+restructure_base_and_offset (tree expr, tree base, tree offset,
+			     HOST_WIDE_INT bitpos)
+{
+  tree mult_op0, mult_op1, t1, t2, offset_type, mult_expr, add_expr, mem_ref;
+  double_int c, c1, c2, c3, c4;
+
+  /* Look for the following pattern:
+
+       base   = MEM_REF (T1, C1);
+       offset = MULT_EXPR (PLUS_EXPR (T2, C2), C3)
+       bitpos = C4 * BITS_PER_UNIT
+
+     If found, convert it to:
+
+       MEM_REF (POINTER_PLUS_EXPR (T1, MULT_EXPR (T2, C3)),
+                C1 + (C2 * C3) + C4)
+   */
+  if (!base
+      || !offset
+      || TREE_CODE (base) != MEM_REF
+      || TREE_CODE (offset) != MULT_EXPR)
+    return NULL_TREE;
+
+  mult_op0 = TREE_OPERAND (offset, 0);
+  mult_op1 = TREE_OPERAND (offset, 1);
+
+  if (TREE_CODE (mult_op0) != PLUS_EXPR
+      || TREE_CODE (mult_op1) != INTEGER_CST
+      || TREE_CODE (TREE_OPERAND (mult_op0, 1)) != INTEGER_CST)
+    return NULL_TREE;
+
+  t1 = TREE_OPERAND (base, 0);
+  t2 = TREE_OPERAND (mult_op0, 0);
+  c1 = mem_ref_offset (base);
+  c2 = tree_to_double_int (TREE_OPERAND (mult_op0, 1));
+  c3 = tree_to_double_int (mult_op1);
+  c4 = uhwi_to_double_int (bitpos / BITS_PER_UNIT);
+  c = double_int_add (double_int_add (c1, double_int_mul (c2, c3)), c4);
+
+  offset_type = TREE_TYPE (TREE_OPERAND (base, 1));
+
+  mult_expr = fold_build2 (MULT_EXPR, sizetype, t2, 
+			   double_int_to_tree (sizetype, c3));
+
+  add_expr = fold_build2 (POINTER_PLUS_EXPR, TREE_TYPE (t1), t1, mult_expr);
+
+  mem_ref = fold_build2 (MEM_REF, TREE_TYPE (expr), add_expr,
+			 double_int_to_tree (offset_type, c));
+  return mem_ref;
+}
+
+/* Look for a hidden array reference in memory reference EXP.  If found,
+   return a revised memory reference that restructures addressability
+   to combine constants.  */
+/* TODO: This belongs more properly in a separate pass that performs
+   general strength reduction on straight-line code.  Eventually move
+   this there.  */
+
+static tree
+restructure_mem_ref (tree exp)
+{
+  tree s2, s3, t1, t2, offset_type, mult_expr, add_expr;
+  tree s2_op0, s2_op1, s3_op0, s3_op1, result;
+  gimple s1_stmt, s2_stmt, s3_stmt;
+  double_int c1, c2, c3, c;
+
+  /* Currently we look for the following pattern, where each Si is an
+     SSA name, each Tj is an arbitrary tree, and each Ck is an integer
+     constant:
+
+       S3 = T2 + C2;
+       S2 = S3 * C3;
+       S1 = T1 ptr+ S2;
+       exp = MEM_REF (S1, C1);
+
+     This is rewritten as:
+
+       exp = MEM_REF (T1 ptr+ (T2 * C3)), C1 + C2 * C3);
+
+  */
+  if (TREE_CODE (TREE_OPERAND (exp, 0)) != SSA_NAME)
+    return NULL_TREE;
+
+  /* We don't use get_def_for_expr for S1 because TER doesn't forward
+     S1 in some situations where this transform is useful, such as
+     when S1 is the base of two MEM_REFs fitting the pattern.  */
+  s1_stmt = SSA_NAME_DEF_STMT (TREE_OPERAND (exp, 0));
+  
+  if (!s1_stmt
+      || !is_gimple_assign (s1_stmt)
+      || gimple_assign_rhs_code (s1_stmt) != POINTER_PLUS_EXPR)
+    return NULL_TREE;
+
+  c1 = mem_ref_offset (exp);
+  t1 = gimple_assign_rhs1 (s1_stmt);
+  s2 = gimple_assign_rhs2 (s1_stmt);
+
+  if (TREE_CODE (s2) != SSA_NAME)
+    return NULL_TREE;
+
+  s2_stmt = get_def_for_expr (s2, MULT_EXPR);
+
+  if (!s2_stmt)
+    return NULL_TREE;
+
+  s2_op0 = gimple_assign_rhs1 (s2_stmt);
+  s2_op1 = gimple_assign_rhs2 (s2_stmt);
+
+  if (TREE_CODE (s2_op1) != INTEGER_CST)
+    return NULL_TREE;
+
+  s3 = s2_op0;
+  c3 = tree_to_double_int (s2_op1);
+  s3_stmt = get_def_for_expr (s3, PLUS_EXPR);
+
+  if (!s3_stmt)
+    return NULL_TREE;
+
+  s3_op0 = gimple_assign_rhs1 (s3_stmt);
+  s3_op1 = gimple_assign_rhs2 (s3_stmt);
+
+  if (TREE_CODE (s3_op1) != INTEGER_CST)
+    return NULL_TREE;
+
+  t2 = s3_op0;
+  c2 = tree_to_double_int (s3_op1);
+  c = double_int_add (c1, double_int_mul (c2, c3));
+
+  offset_type = TREE_TYPE (TREE_OPERAND (exp, 1));
+
+  mult_expr = fold_build2 (MULT_EXPR, sizetype, t2,
+			   double_int_to_tree (sizetype, c3));
+
+  add_expr = fold_build2 (POINTER_PLUS_EXPR, TREE_TYPE (t1), t1, mult_expr);
+
+  result = build2 (MEM_REF, TREE_TYPE (exp), add_expr,
+		   double_int_to_tree (offset_type, c));
+
+  return result;
+}
+
+
 /* expand_expr: generate code for computing expression EXP.
    An rtx for the computed value is returned.  The value is never null.
    In the case of a void EXP, const0_rtx is returned.
@@ -9313,6 +9464,28 @@ expand_expr_real_1 (tree exp, rtx target, enum mac
 	gimple def_stmt;
 	enum insn_code icode;
 	unsigned align;
+	tree tem;
+
+	/* Attempt to restructure a hidden array reference, such
+	   as *(p + (n + k) * s).  This generates dead code, so
+	   require -O2.  */
+	if (optimize > 1
+	    && mode != BLKmode
+	    && ((tem = restructure_mem_ref (exp)) != NULL_TREE))
+	  {
+	    if (dump_file && (dump_flags & TDF_DETAILS))
+	      {
+		fprintf (dump_file, "\nRestructured memory reference:\n");
+		fprintf (dump_file, "  Original reference: ");
+		print_generic_expr (dump_file, exp, TDF_VOPS|TDF_MEMSYMS);
+		fprintf (dump_file, "\n  Replacement: ");
+		print_generic_expr (dump_file, tem, TDF_VOPS|TDF_MEMSYMS);
+	      }
+	    copy_ref_info (tem, exp);
+	    exp = tem;
+	    base = TREE_OPERAND (exp, 0);
+	  }
+
 	/* Handle expansion of non-aliased memory with non-BLKmode.  That
 	   might end up in a register.  */
 	if (TREE_CODE (base) == ADDR_EXPR)
@@ -9584,7 +9757,7 @@ expand_expr_real_1 (tree exp, rtx target, enum mac
       {
 	enum machine_mode mode1, mode2;
 	HOST_WIDE_INT bitsize, bitpos;
-	tree offset;
+	tree offset, tem2;
 	int volatilep = 0, must_force_mem;
 	bool packedp = false;
 	tree tem = get_inner_reference (exp, &bitsize, &bitpos, &offset,
@@ -9601,6 +9774,36 @@ expand_expr_real_1 (tree exp, rtx target, enum mac
 		&& DECL_PACKED (TREE_OPERAND (exp, 1))))
 	  packedp = true;
 
+	/* Attempt to restructure the base and offset to combine constants.
+	   Bitfield references should eventually be lowered prior to this
+	   point and are not dealt with.  Skip BLKmode aggregates as well.
+	   This generates dead code, so require -O2.  */
+	if (optimize > 1
+	    && code != BIT_FIELD_REF
+	    && (code != COMPONENT_REF 
+		|| !DECL_BIT_FIELD (TREE_OPERAND (exp, 1)))
+	    && TYPE_MODE (TREE_TYPE (exp)) != BLKmode
+	    && bitpos % BITS_PER_UNIT == 0
+	    && ((tem2 = restructure_base_and_offset (exp, tem, offset, bitpos))
+		!= NULL_TREE))
+	  {
+	    copy_ref_info (tem2, exp);
+	    tem = tem2;
+	    /* The offset and bitpos were absorbed into the new MEM_REF, so 
+	       make sure we don't add them in again.  */
+	    offset = NULL_TREE;
+	    bitpos = 0;
+
+	    if (dump_file && (dump_flags & TDF_DETAILS))
+	      {
+		fprintf (dump_file, "\nRestructured inner reference:\n");
+		fprintf (dump_file, "  Original reference: ");
+		print_generic_expr (dump_file, exp, TDF_VOPS|TDF_MEMSYMS);
+		fprintf (dump_file, "\n  Replacement: ");
+		print_generic_expr (dump_file, tem, TDF_VOPS|TDF_MEMSYMS);
+	      }
+	  }
+
 	/* If TEM's type is a union of variable size, pass TARGET to the inner
 	   computation, since it will need a temporary and TARGET is known
 	   to have to do.  This occurs in unchecked conversion in Ada.  */
Index: gcc/Makefile.in
===================================================================
--- gcc/Makefile.in	(revision 180138)
+++ gcc/Makefile.in	(working copy)
@@ -2926,7 +2926,7 @@ expr.o : expr.c $(CONFIG_H) $(SYSTEM_H) coretypes.
    reload.h langhooks.h intl.h $(TM_P_H) $(TARGET_H) \
    tree-iterator.h gt-expr.h $(MACHMODE_H) $(TIMEVAR_H) $(TREE_FLOW_H) \
    $(TREE_PASS_H) $(DF_H) $(DIAGNOSTIC_H) vecprim.h $(SSAEXPAND_H) \
-   $(PARAMS_H) $(COMMON_TARGET_H)
+   $(PARAMS_H) $(COMMON_TARGET_H) tree-pretty-print.h
 dojump.o : dojump.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) $(TREE_H) \
    $(FLAGS_H) $(FUNCTION_H) $(EXPR_H) $(OPTABS_H) $(INSN_ATTR_H) insn-config.h \
    langhooks.h $(GGC_H) gt-dojump.h vecprim.h $(BASIC_BLOCK_H) output.h


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] Fix PR46556 (poor address generation)
  2011-10-18 14:43                   ` William J. Schmidt
@ 2011-10-21  9:53                     ` Richard Guenther
  2011-10-21 12:44                       ` William J. Schmidt
  0 siblings, 1 reply; 40+ messages in thread
From: Richard Guenther @ 2011-10-21  9:53 UTC (permalink / raw)
  To: William J. Schmidt; +Cc: Richard Guenther, gcc-patches, bergner

On Tue, Oct 18, 2011 at 4:14 PM, William J. Schmidt
<wschmidt@linux.vnet.ibm.com> wrote:
> Greetings,
>
> Here is a new revision of the tree portions of this patch.  I moved the
> pattern recognizer to expand, and added additional logic to look for the
> same pattern in gimple form.  I added two more tests to verify the new
> logic.
>
> I didn't run into any problems with the RTL CSE phases.  I can't recall
> for certain what caused me to abandon the expand version previously.
> There may not have been good reason; too many versions to keep track of
> and too many interruptions, I suppose.  In any case, I'm much happier
> having this code in the expander.
>
> Paolo's RTL logic for unpropagating the zero-offset case is not going to
> work out as is.  It causes a number of performance degradations, which I
> suspect are due to the pass reordering.  That's a separate issue,
> though, and not needed for this patch.
>
> Bootstrapped and regression-tested on powerpc64-linux.  SPEC cpu2000
> shows a number of small improvements and no significant degradations.
> SPEC cpu2006 testing is pending.
>
> Thanks,
> Bill
>
>
> 2011-10-18  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
>
> gcc:
>
>        PR rtl-optimization/46556
>        * expr.c (tree-pretty-print.h): New include.
>        (restructure_base_and_offset): New function.
>        (restructure_mem_ref): New function.
>        (expand_expr_real_1): In MEM_REF case, attempt restructure_mem_ref
>        first.  In normal_inner_ref case, attempt restructure_base_and_offset
>        first.
>        * Makefile.in: Update dependences for expr.o.
>
> gcc/testsuite:
>
>        PR rtl-optimization/46556
>        * gcc.dg/tree-ssa-pr46556-1.c: New test.
>        * gcc.dg/tree-ssa-pr46556-2.c: Likewise.
>        * gcc.dg/tree-ssa-pr46556-3.c: Likewise.
>        * gcc.dg/tree-ssa-pr46556-4.c: Likewise.
>        * gcc.dg/tree-ssa-pr46556-5.c: Likewise.
>
>
> Index: gcc/testsuite/gcc.dg/tree-ssa/pr46556-1.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/tree-ssa/pr46556-1.c   (revision 0)
> +++ gcc/testsuite/gcc.dg/tree-ssa/pr46556-1.c   (revision 0)
> @@ -0,0 +1,22 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-rtl-expand" } */
> +
> +struct x
> +{
> +  int a[16];
> +  int b[16];
> +  int c[16];
> +};
> +
> +extern void foo (int, int, int);
> +
> +void
> +f (struct x *p, unsigned int n)
> +{
> +  foo (p->a[n], p->c[n], p->b[n]);
> +}
> +
> +/* { dg-final { scan-rtl-dump-times "\\(mem/s:SI \\(plus:" 2 "expand" } } */
> +/* { dg-final { scan-rtl-dump-times "const_int 128" 1 "expand" } } */
> +/* { dg-final { scan-rtl-dump-times "const_int 64 \\\[0x40\\\]\\)\\) \\\[" 1 "expand" } } */
> +/* { dg-final { cleanup-rtl-dump "expand" } } */
> Index: gcc/testsuite/gcc.dg/tree-ssa/pr46556-2.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/tree-ssa/pr46556-2.c   (revision 0)
> +++ gcc/testsuite/gcc.dg/tree-ssa/pr46556-2.c   (revision 0)
> @@ -0,0 +1,26 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-rtl-expand" } */
> +
> +struct x
> +{
> +  int a[16];
> +  int b[16];
> +  int c[16];
> +};
> +
> +extern void foo (int, int, int);
> +
> +void
> +f (struct x *p, unsigned int n)
> +{
> +  foo (p->a[n], p->c[n], p->b[n]);
> +  if (n > 12)
> +    foo (p->a[n], p->c[n], p->b[n]);
> +  else if (n > 3)
> +    foo (p->b[n], p->a[n], p->c[n]);
> +}
> +
> +/* { dg-final { scan-rtl-dump-times "\\(mem/s:SI \\(plus:" 6 "expand" } } */
> +/* { dg-final { scan-rtl-dump-times "const_int 128" 3 "expand" } } */
> +/* { dg-final { scan-rtl-dump-times "const_int 64 \\\[0x40\\\]\\)\\) \\\[" 3 "expand" } } */
> +/* { dg-final { cleanup-rtl-dump "expand" } } */
> Index: gcc/testsuite/gcc.dg/tree-ssa/pr46556-3.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/tree-ssa/pr46556-3.c   (revision 0)
> +++ gcc/testsuite/gcc.dg/tree-ssa/pr46556-3.c   (revision 0)
> @@ -0,0 +1,27 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-rtl-expand" } */
> +struct x
> +{
> +  int a[16];
> +  int b[16];
> +  int c[16];
> +};
> +
> +extern void foo (int, int, int);
> +
> +void
> +f (struct x *p, unsigned int n)
> +{
> +  foo (p->a[n], p->c[n], p->b[n]);
> +  if (n > 3)
> +    {
> +      foo (p->a[n], p->c[n], p->b[n]);
> +      if (n > 12)
> +       foo (p->b[n], p->a[n], p->c[n]);
> +    }
> +}
> +
> +/* { dg-final { scan-rtl-dump-times "\\(mem/s:SI \\(plus:" 6 "expand" } } */
> +/* { dg-final { scan-rtl-dump-times "const_int 128" 3 "expand" } } */
> +/* { dg-final { scan-rtl-dump-times "const_int 64 \\\[0x40\\\]\\)\\) \\\[" 3 "expand" } } */
> +/* { dg-final { cleanup-rtl-dump "expand" } } */
> Index: gcc/testsuite/gcc.dg/tree-ssa/pr46556-4.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/tree-ssa/pr46556-4.c   (revision 0)
> +++ gcc/testsuite/gcc.dg/tree-ssa/pr46556-4.c   (revision 0)
> @@ -0,0 +1,28 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-rtl-expand" } */
> +
> +struct x
> +{
> +  int a[16];
> +  int b[16];
> +  int c[16];
> +};
> +
> +extern void foo (int, int, int);
> +
> +void
> +f (struct x *p, unsigned int n)
> +{
> +  foo (*((int *)p + n*4), *((int *)p + 32 + n*4), *((int *)p + 16 + n*4));
> +  if (n > 3)
> +    {
> +      foo (*((int *)p + n*4), *((int *)p + 32 + n*4), *((int *)p + 16 + n*4));
> +      if (n > 12)
> +       foo (*((int *)p + 16 + n*4), *((int *)p + n*4), *((int *)p + 32 + n*4));
> +    }
> +}
> +
> +/* { dg-final { scan-rtl-dump-times "\\(mem:SI \\(plus:" 6 "expand" } } */
> +/* { dg-final { scan-rtl-dump-times "const_int 128" 3 "expand" } } */
> +/* { dg-final { scan-rtl-dump-times "const_int 64 \\\[0x40\\\]\\)\\) \\\[" 3 "expand" } } */
> +/* { dg-final { cleanup-rtl-dump "expand" } } */
> Index: gcc/testsuite/gcc.dg/tree-ssa/pr46556-5.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/tree-ssa/pr46556-5.c   (revision 0)
> +++ gcc/testsuite/gcc.dg/tree-ssa/pr46556-5.c   (revision 0)
> @@ -0,0 +1,36 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-rtl-expand" } */
> +/* { dg-skip-if "" { lp64 } { "*" } { "-m32" } } */
> +/* { dg-skip-if "" { llp64 } { "*" } { "-m32" } } */
> +
> +struct x
> +{
> +  int a[16];
> +  int b[16];
> +  int c[16];
> +};
> +
> +extern void foo (int, int, int);
> +
> +void
> +f (struct x *p, unsigned int n)
> +{
> +  foo (*((int *)p + n * 4),
> +       *((int *)p + (n + 8) * 4),
> +       *((int *)p + (n + 4) * 4));
> +  if (n > 3)
> +    {
> +      foo (*((int *)p + n * 4),
> +          *((int *)p + (n + 8) * 4),
> +          *((int *)p + (n + 4) * 4));
> +      if (n > 12)
> +       foo (*((int *)p + (n + 4) * 4),
> +            *((int *)p + n * 4),
> +            *((int *)p + (n + 8) * 4));
> +    }
> +}
> +
> +/* { dg-final { scan-rtl-dump-times "\\(mem:SI \\(plus:" 6 "expand" } } */
> +/* { dg-final { scan-rtl-dump-times "const_int 128" 3 "expand" } } */
> +/* { dg-final { scan-rtl-dump-times "const_int 64 \\\[0x40\\\]\\)\\) \\\[" 3 "expand" } } */
> +/* { dg-final { cleanup-rtl-dump "expand" } } */
> Index: gcc/expr.c
> ===================================================================
> --- gcc/expr.c  (revision 180138)
> +++ gcc/expr.c  (working copy)
> @@ -56,6 +56,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "ssaexpand.h"
>  #include "target-globals.h"
>  #include "params.h"
> +#include "tree-pretty-print.h"
>
>  /* Decide whether a function's arguments should be processed
>    from first to last or from last to first.
> @@ -7648,7 +7649,157 @@ expand_constructor (tree exp, rtx target, enum exp
>   return target;
>  }
>
> +/* Given BASE, OFFSET, and BITPOS derived from EXPR, determine whether
> +   there is a profitable opportunity to restructure address arithmetic
> +   within BASE and OFFSET.  If so, produce such a restructuring and
> +   return it.  */
> +/* TODO: This belongs more properly in a separate pass that performs
> +   general strength reduction on straight-line code.  Eventually move
> +   this there.  */
>
> +static tree
> +restructure_base_and_offset (tree expr, tree base, tree offset,
> +                            HOST_WIDE_INT bitpos)
> +{
> +  tree mult_op0, mult_op1, t1, t2, offset_type, mult_expr, add_expr, mem_ref;
> +  double_int c, c1, c2, c3, c4;
> +
> +  /* Look for the following pattern:
> +
> +       base   = MEM_REF (T1, C1);
> +       offset = MULT_EXPR (PLUS_EXPR (T2, C2), C3)
> +       bitpos = C4 * BITS_PER_UNIT
> +
> +     If found, convert it to:
> +
> +       MEM_REF (POINTER_PLUS_EXPR (T1, MULT_EXPR (T2, C3)),
> +                C1 + (C2 * C3) + C4)
> +   */
> +  if (!base
> +      || !offset
> +      || TREE_CODE (base) != MEM_REF
> +      || TREE_CODE (offset) != MULT_EXPR)
> +    return NULL_TREE;
> +
> +  mult_op0 = TREE_OPERAND (offset, 0);
> +  mult_op1 = TREE_OPERAND (offset, 1);
> +
> +  if (TREE_CODE (mult_op0) != PLUS_EXPR
> +      || TREE_CODE (mult_op1) != INTEGER_CST
> +      || TREE_CODE (TREE_OPERAND (mult_op0, 1)) != INTEGER_CST)
> +    return NULL_TREE;
> +
> +  t1 = TREE_OPERAND (base, 0);
> +  t2 = TREE_OPERAND (mult_op0, 0);
> +  c1 = mem_ref_offset (base);
> +  c2 = tree_to_double_int (TREE_OPERAND (mult_op0, 1));
> +  c3 = tree_to_double_int (mult_op1);
> +  c4 = uhwi_to_double_int (bitpos / BITS_PER_UNIT);
> +  c = double_int_add (double_int_add (c1, double_int_mul (c2, c3)), c4);
> +
> +  offset_type = TREE_TYPE (TREE_OPERAND (base, 1));
> +
> +  mult_expr = fold_build2 (MULT_EXPR, sizetype, t2,
> +                          double_int_to_tree (sizetype, c3));
> +
> +  add_expr = fold_build2 (POINTER_PLUS_EXPR, TREE_TYPE (t1), t1, mult_expr);
> +
> +  mem_ref = fold_build2 (MEM_REF, TREE_TYPE (expr), add_expr,
> +                        double_int_to_tree (offset_type, c));
> +  return mem_ref;
> +}
> +
> +/* Look for a hidden array reference in memory reference EXP.  If found,
> +   return a revised memory reference that restructures addressability
> +   to combine constants.  */
> +/* TODO: This belongs more properly in a separate pass that performs
> +   general strength reduction on straight-line code.  Eventually move
> +   this there.  */
> +
> +static tree
> +restructure_mem_ref (tree exp)
> +{
> +  tree s2, s3, t1, t2, offset_type, mult_expr, add_expr;
> +  tree s2_op0, s2_op1, s3_op0, s3_op1, result;
> +  gimple s1_stmt, s2_stmt, s3_stmt;
> +  double_int c1, c2, c3, c;
> +
> +  /* Currently we look for the following pattern, where each Si is an
> +     SSA name, each Tj is an arbitrary tree, and each Ck is an integer
> +     constant:
> +
> +       S3 = T2 + C2;
> +       S2 = S3 * C3;
> +       S1 = T1 ptr+ S2;
> +       exp = MEM_REF (S1, C1);
> +
> +     This is rewritten as:
> +
> +       exp = MEM_REF (T1 ptr+ (T2 * C3)), C1 + C2 * C3);
> +
> +  */
> +  if (TREE_CODE (TREE_OPERAND (exp, 0)) != SSA_NAME)
> +    return NULL_TREE;
> +
> +  /* We don't use get_def_for_expr for S1 because TER doesn't forward
> +     S1 in some situations where this transform is useful, such as
> +     when S1 is the base of two MEM_REFs fitting the pattern.  */
> +  s1_stmt = SSA_NAME_DEF_STMT (TREE_OPERAND (exp, 0));

You can't do this - this will possibly generate wrong code.  You _do_
have to use get_def_for_expr.  Or do it when we are still in "true" SSA form...

Richard.

> +  if (!s1_stmt
> +      || !is_gimple_assign (s1_stmt)
> +      || gimple_assign_rhs_code (s1_stmt) != POINTER_PLUS_EXPR)
> +    return NULL_TREE;
> +
> +  c1 = mem_ref_offset (exp);
> +  t1 = gimple_assign_rhs1 (s1_stmt);
> +  s2 = gimple_assign_rhs2 (s1_stmt);
> +
> +  if (TREE_CODE (s2) != SSA_NAME)
> +    return NULL_TREE;
> +
> +  s2_stmt = get_def_for_expr (s2, MULT_EXPR);
> +
> +  if (!s2_stmt)
> +    return NULL_TREE;
> +
> +  s2_op0 = gimple_assign_rhs1 (s2_stmt);
> +  s2_op1 = gimple_assign_rhs2 (s2_stmt);
> +
> +  if (TREE_CODE (s2_op1) != INTEGER_CST)
> +    return NULL_TREE;
> +
> +  s3 = s2_op0;
> +  c3 = tree_to_double_int (s2_op1);
> +  s3_stmt = get_def_for_expr (s3, PLUS_EXPR);
> +
> +  if (!s3_stmt)
> +    return NULL_TREE;
> +
> +  s3_op0 = gimple_assign_rhs1 (s3_stmt);
> +  s3_op1 = gimple_assign_rhs2 (s3_stmt);
> +
> +  if (TREE_CODE (s3_op1) != INTEGER_CST)
> +    return NULL_TREE;
> +
> +  t2 = s3_op0;
> +  c2 = tree_to_double_int (s3_op1);
> +  c = double_int_add (c1, double_int_mul (c2, c3));
> +
> +  offset_type = TREE_TYPE (TREE_OPERAND (exp, 1));
> +
> +  mult_expr = fold_build2 (MULT_EXPR, sizetype, t2,
> +                          double_int_to_tree (sizetype, c3));
> +
> +  add_expr = fold_build2 (POINTER_PLUS_EXPR, TREE_TYPE (t1), t1, mult_expr);
> +
> +  result = build2 (MEM_REF, TREE_TYPE (exp), add_expr,
> +                  double_int_to_tree (offset_type, c));
> +
> +  return result;
> +}
> +
> +
>  /* expand_expr: generate code for computing expression EXP.
>    An rtx for the computed value is returned.  The value is never null.
>    In the case of a void EXP, const0_rtx is returned.
> @@ -9313,6 +9464,28 @@ expand_expr_real_1 (tree exp, rtx target, enum mac
>        gimple def_stmt;
>        enum insn_code icode;
>        unsigned align;
> +       tree tem;
> +
> +       /* Attempt to restructure a hidden array reference, such
> +          as *(p + (n + k) * s).  This generates dead code, so
> +          require -O2.  */
> +       if (optimize > 1
> +           && mode != BLKmode
> +           && ((tem = restructure_mem_ref (exp)) != NULL_TREE))
> +         {
> +           if (dump_file && (dump_flags & TDF_DETAILS))
> +             {
> +               fprintf (dump_file, "\nRestructured memory reference:\n");
> +               fprintf (dump_file, "  Original reference: ");
> +               print_generic_expr (dump_file, exp, TDF_VOPS|TDF_MEMSYMS);
> +               fprintf (dump_file, "\n  Replacement: ");
> +               print_generic_expr (dump_file, tem, TDF_VOPS|TDF_MEMSYMS);
> +             }
> +           copy_ref_info (tem, exp);
> +           exp = tem;
> +           base = TREE_OPERAND (exp, 0);
> +         }
> +
>        /* Handle expansion of non-aliased memory with non-BLKmode.  That
>           might end up in a register.  */
>        if (TREE_CODE (base) == ADDR_EXPR)
> @@ -9584,7 +9757,7 @@ expand_expr_real_1 (tree exp, rtx target, enum mac
>       {
>        enum machine_mode mode1, mode2;
>        HOST_WIDE_INT bitsize, bitpos;
> -       tree offset;
> +       tree offset, tem2;
>        int volatilep = 0, must_force_mem;
>        bool packedp = false;
>        tree tem = get_inner_reference (exp, &bitsize, &bitpos, &offset,
> @@ -9601,6 +9774,36 @@ expand_expr_real_1 (tree exp, rtx target, enum mac
>                && DECL_PACKED (TREE_OPERAND (exp, 1))))
>          packedp = true;
>
> +       /* Attempt to restructure the base and offset to combine constants.
> +          Bitfield references should eventually be lowered prior to this
> +          point and are not dealt with.  Skip BLKmode aggregates as well.
> +          This generates dead code, so require -O2.  */
> +       if (optimize > 1
> +           && code != BIT_FIELD_REF
> +           && (code != COMPONENT_REF
> +               || !DECL_BIT_FIELD (TREE_OPERAND (exp, 1)))
> +           && TYPE_MODE (TREE_TYPE (exp)) != BLKmode
> +           && bitpos % BITS_PER_UNIT == 0
> +           && ((tem2 = restructure_base_and_offset (exp, tem, offset, bitpos))
> +               != NULL_TREE))
> +         {
> +           copy_ref_info (tem2, exp);
> +           tem = tem2;
> +           /* The offset and bitpos were absorbed into the new MEM_REF, so
> +              make sure we don't add them in again.  */
> +           offset = NULL_TREE;
> +           bitpos = 0;
> +
> +           if (dump_file && (dump_flags & TDF_DETAILS))
> +             {
> +               fprintf (dump_file, "\nRestructured inner reference:\n");
> +               fprintf (dump_file, "  Original reference: ");
> +               print_generic_expr (dump_file, exp, TDF_VOPS|TDF_MEMSYMS);
> +               fprintf (dump_file, "\n  Replacement: ");
> +               print_generic_expr (dump_file, tem, TDF_VOPS|TDF_MEMSYMS);
> +             }
> +         }
> +
>        /* If TEM's type is a union of variable size, pass TARGET to the inner
>           computation, since it will need a temporary and TARGET is known
>           to have to do.  This occurs in unchecked conversion in Ada.  */
> Index: gcc/Makefile.in
> ===================================================================
> --- gcc/Makefile.in     (revision 180138)
> +++ gcc/Makefile.in     (working copy)
> @@ -2926,7 +2926,7 @@ expr.o : expr.c $(CONFIG_H) $(SYSTEM_H) coretypes.
>    reload.h langhooks.h intl.h $(TM_P_H) $(TARGET_H) \
>    tree-iterator.h gt-expr.h $(MACHMODE_H) $(TIMEVAR_H) $(TREE_FLOW_H) \
>    $(TREE_PASS_H) $(DF_H) $(DIAGNOSTIC_H) vecprim.h $(SSAEXPAND_H) \
> -   $(PARAMS_H) $(COMMON_TARGET_H)
> +   $(PARAMS_H) $(COMMON_TARGET_H) tree-pretty-print.h
>  dojump.o : dojump.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) $(TREE_H) \
>    $(FLAGS_H) $(FUNCTION_H) $(EXPR_H) $(OPTABS_H) $(INSN_ATTR_H) insn-config.h \
>    langhooks.h $(GGC_H) gt-dojump.h vecprim.h $(BASIC_BLOCK_H) output.h
>
>
>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] Fix PR46556 (poor address generation)
  2011-10-21  9:53                     ` Richard Guenther
@ 2011-10-21 12:44                       ` William J. Schmidt
  2011-10-23 10:19                         ` Richard Guenther
  0 siblings, 1 reply; 40+ messages in thread
From: William J. Schmidt @ 2011-10-21 12:44 UTC (permalink / raw)
  To: Richard Guenther; +Cc: Richard Guenther, gcc-patches, bergner



On Fri, 2011-10-21 at 11:26 +0200, Richard Guenther wrote:
> On Tue, Oct 18, 2011 at 4:14 PM, William J. Schmidt
> <wschmidt@linux.vnet.ibm.com> wrote:

<snip>

> > +
> > +  /* We don't use get_def_for_expr for S1 because TER doesn't forward
> > +     S1 in some situations where this transform is useful, such as
> > +     when S1 is the base of two MEM_REFs fitting the pattern.  */
> > +  s1_stmt = SSA_NAME_DEF_STMT (TREE_OPERAND (exp, 0));
> 
> You can't do this - this will possibly generate wrong code.  You _do_
> have to use get_def_for_expr.  Or do it when we are still in "true" SSA form...
> 
> Richard.
> 

OK.  get_def_for_expr always returns NULL here for the cases I was
targeting, so doing this in expand isn't going to be helpful.

Rather than cram this in somewhere else upstream, it might be better to
just wait and let this case be handled by the new strength reduction
pass.  This is one of the easy cases with explicit multiplies in the
instruction stream, so it shouldn't require any special handling there.
Seem reasonable?

Bill

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] Fix PR46556 (poor address generation)
  2011-10-21 12:44                       ` William J. Schmidt
@ 2011-10-23 10:19                         ` Richard Guenther
  2011-10-24 14:10                           ` William J. Schmidt
  0 siblings, 1 reply; 40+ messages in thread
From: Richard Guenther @ 2011-10-23 10:19 UTC (permalink / raw)
  To: William J. Schmidt; +Cc: Richard Guenther, gcc-patches, bergner

On Fri, Oct 21, 2011 at 2:22 PM, William J. Schmidt
<wschmidt@linux.vnet.ibm.com> wrote:
>
>
> On Fri, 2011-10-21 at 11:26 +0200, Richard Guenther wrote:
>> On Tue, Oct 18, 2011 at 4:14 PM, William J. Schmidt
>> <wschmidt@linux.vnet.ibm.com> wrote:
>
> <snip>
>
>> > +
>> > +  /* We don't use get_def_for_expr for S1 because TER doesn't forward
>> > +     S1 in some situations where this transform is useful, such as
>> > +     when S1 is the base of two MEM_REFs fitting the pattern.  */
>> > +  s1_stmt = SSA_NAME_DEF_STMT (TREE_OPERAND (exp, 0));
>>
>> You can't do this - this will possibly generate wrong code.  You _do_
>> have to use get_def_for_expr.  Or do it when we are still in "true" SSA form...
>>
>> Richard.
>>
>
> OK.  get_def_for_expr always returns NULL here for the cases I was
> targeting, so doing this in expand isn't going to be helpful.
>
> Rather than cram this in somewhere else upstream, it might be better to
> just wait and let this case be handled by the new strength reduction
> pass.  This is one of the easy cases with explicit multiplies in the
> instruction stream, so it shouldn't require any special handling there.
> Seem reasonable?

Yes, sure.

Richard.

> Bill
>
>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH] Fix PR46556 (poor address generation)
  2011-10-23 10:19                         ` Richard Guenther
@ 2011-10-24 14:10                           ` William J. Schmidt
  2011-10-30 18:44                             ` [PING] " William J. Schmidt
  0 siblings, 1 reply; 40+ messages in thread
From: William J. Schmidt @ 2011-10-24 14:10 UTC (permalink / raw)
  To: Richard Guenther; +Cc: gcc-patches, bergner

OK, I've removed the pointer-arithmetic case from expand, to be handled
later by straight-line strength reduction.  Here's the patch to deal
with just the specific pattern of PR46556 (which will also eventually be
handled by strength reduction, but not as quickly).

(FYI, I've been thinking through the strength reduction pass, and my
plan is to stage in some of the easiest cases first, hopefully for 4.7,
and gradually add the more complex pieces.  Explicit multiplies in the
IL with known constants can be done pretty easily.  More complexity is
added when the multiplier is a variable, when conditional increments are
present, and when multiplies are hidden in addressing expressions.)

The present patch was bootstrapped and regression-tested on
powerpc64-linux.  OK for trunk?

Thanks,
Bill


2011-10-24  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>

gcc:

	PR rtl-optimization/46556
	* expr.c (restructure_base_and_offset): New function.
	(expand_expr_real_1): Replace result of get_inner_reference
	with result of restructure_base_and_offset when applicable.
	* Makefile.in (expr.o): Update dependencies.

gcc/testsuite:

	PR rtl-optimization/46556
	* gcc.dg/tree-ssa-pr46556-1.c: New testcase.
	* gcc.dg/tree-ssa-pr46556-2.c: Likewise.
	* gcc.dg/tree-ssa-pr46556-3.c: Likewise.


Index: gcc/testsuite/gcc.dg/tree-ssa/pr46556-1.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/pr46556-1.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/pr46556-1.c	(revision 0)
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-rtl-expand" } */
+
+struct x
+{
+  int a[16];
+  int b[16];
+  int c[16];
+};
+
+extern void foo (int, int, int);
+
+void
+f (struct x *p, unsigned int n)
+{
+  foo (p->a[n], p->c[n], p->b[n]);
+}
+
+/* { dg-final { scan-rtl-dump-times "\\(mem/s:SI \\(plus:" 2 "expand" } } */
+/* { dg-final { scan-rtl-dump-times "const_int 128" 1 "expand" } } */
+/* { dg-final { scan-rtl-dump-times "const_int 64 \\\[0x40\\\]\\)\\) \\\[" 1 "expand" } } */
+/* { dg-final { cleanup-rtl-dump "expand" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/pr46556-2.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/pr46556-2.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/pr46556-2.c	(revision 0)
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-rtl-expand" } */
+
+struct x
+{
+  int a[16];
+  int b[16];
+  int c[16];
+};
+
+extern void foo (int, int, int);
+
+void
+f (struct x *p, unsigned int n)
+{
+  foo (p->a[n], p->c[n], p->b[n]);
+  if (n > 12)
+    foo (p->a[n], p->c[n], p->b[n]);
+  else if (n > 3)
+    foo (p->b[n], p->a[n], p->c[n]);
+}
+
+/* { dg-final { scan-rtl-dump-times "\\(mem/s:SI \\(plus:" 6 "expand" } } */
+/* { dg-final { scan-rtl-dump-times "const_int 128" 3 "expand" } } */
+/* { dg-final { scan-rtl-dump-times "const_int 64 \\\[0x40\\\]\\)\\) \\\[" 3 "expand" } } */
+/* { dg-final { cleanup-rtl-dump "expand" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/pr46556-3.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/pr46556-3.c	(revision 0)
+++ gcc/testsuite/gcc.dg/tree-ssa/pr46556-3.c	(revision 0)
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-rtl-expand" } */
+struct x
+{
+  int a[16];
+  int b[16];
+  int c[16];
+};
+
+extern void foo (int, int, int);
+
+void
+f (struct x *p, unsigned int n)
+{
+  foo (p->a[n], p->c[n], p->b[n]);
+  if (n > 3)
+    {
+      foo (p->a[n], p->c[n], p->b[n]);
+      if (n > 12)
+	foo (p->b[n], p->a[n], p->c[n]);
+    }
+}
+
+/* { dg-final { scan-rtl-dump-times "\\(mem/s:SI \\(plus:" 6 "expand" } } */
+/* { dg-final { scan-rtl-dump-times "const_int 128" 3 "expand" } } */
+/* { dg-final { scan-rtl-dump-times "const_int 64 \\\[0x40\\\]\\)\\) \\\[" 3 "expand" } } */
+/* { dg-final { cleanup-rtl-dump "expand" } } */
Index: gcc/expr.c
===================================================================
--- gcc/expr.c	(revision 180378)
+++ gcc/expr.c	(working copy)
@@ -56,6 +56,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "ssaexpand.h"
 #include "target-globals.h"
 #include "params.h"
+#include "tree-pretty-print.h"
 
 /* Decide whether a function's arguments should be processed
    from first to last or from last to first.
@@ -7648,7 +7649,66 @@ expand_constructor (tree exp, rtx target, enum exp
   return target;
 }
 
+/* Given BASE, OFFSET, and BITPOS derived from EXPR, determine whether
+   there is a profitable opportunity to restructure address arithmetic
+   within BASE and OFFSET.  If so, produce such a restructuring and
+   return it.  */
+/* TODO: This belongs more properly in a separate pass that performs
+   general strength reduction on straight-line code.  Eventually move
+   this there.  */
 
+static tree
+restructure_base_and_offset (tree expr, tree base, tree offset,
+			     HOST_WIDE_INT bitpos)
+{
+  tree mult_op0, mult_op1, t1, t2, offset_type, mult_expr, add_expr, mem_ref;
+  double_int c, c1, c2, c3, c4;
+
+  /* Look for the following pattern:
+
+       base   = MEM_REF (T1, C1);
+       offset = MULT_EXPR (PLUS_EXPR (T2, C2), C3)
+       bitpos = C4 * BITS_PER_UNIT
+
+     If found, convert it to:
+
+       MEM_REF (POINTER_PLUS_EXPR (T1, MULT_EXPR (T2, C3)),
+                C1 + (C2 * C3) + C4)
+   */
+  if (!base
+      || !offset
+      || TREE_CODE (base) != MEM_REF
+      || TREE_CODE (offset) != MULT_EXPR)
+    return NULL_TREE;
+
+  mult_op0 = TREE_OPERAND (offset, 0);
+  mult_op1 = TREE_OPERAND (offset, 1);
+
+  if (TREE_CODE (mult_op0) != PLUS_EXPR
+      || TREE_CODE (mult_op1) != INTEGER_CST
+      || TREE_CODE (TREE_OPERAND (mult_op0, 1)) != INTEGER_CST)
+    return NULL_TREE;
+
+  t1 = TREE_OPERAND (base, 0);
+  t2 = TREE_OPERAND (mult_op0, 0);
+  c1 = mem_ref_offset (base);
+  c2 = tree_to_double_int (TREE_OPERAND (mult_op0, 1));
+  c3 = tree_to_double_int (mult_op1);
+  c4 = uhwi_to_double_int (bitpos / BITS_PER_UNIT);
+  c = double_int_add (double_int_add (c1, double_int_mul (c2, c3)), c4);
+
+  offset_type = TREE_TYPE (TREE_OPERAND (base, 1));
+
+  mult_expr = fold_build2 (MULT_EXPR, sizetype, t2, 
+			   double_int_to_tree (sizetype, c3));
+
+  add_expr = fold_build2 (POINTER_PLUS_EXPR, TREE_TYPE (t1), t1, mult_expr);
+
+  mem_ref = fold_build2 (MEM_REF, TREE_TYPE (expr), add_expr,
+			 double_int_to_tree (offset_type, c));
+  return mem_ref;
+}
+
 /* expand_expr: generate code for computing expression EXP.
    An rtx for the computed value is returned.  The value is never null.
    In the case of a void EXP, const0_rtx is returned.
@@ -9584,7 +9644,7 @@ expand_expr_real_1 (tree exp, rtx target, enum mac
       {
 	enum machine_mode mode1, mode2;
 	HOST_WIDE_INT bitsize, bitpos;
-	tree offset;
+	tree offset, tem2;
 	int volatilep = 0, must_force_mem;
 	bool packedp = false;
 	tree tem = get_inner_reference (exp, &bitsize, &bitpos, &offset,
@@ -9601,6 +9661,36 @@ expand_expr_real_1 (tree exp, rtx target, enum mac
 		&& DECL_PACKED (TREE_OPERAND (exp, 1))))
 	  packedp = true;
 
+	/* Attempt to restructure the base and offset to combine constants.
+	   Bitfield references should eventually be lowered prior to this
+	   point and are not dealt with.  Skip BLKmode aggregates as well.
+	   This generates dead code, so require -O2.  */
+	if (optimize > 1
+	    && code != BIT_FIELD_REF
+	    && (code != COMPONENT_REF 
+		|| !DECL_BIT_FIELD (TREE_OPERAND (exp, 1)))
+	    && TYPE_MODE (TREE_TYPE (exp)) != BLKmode
+	    && bitpos % BITS_PER_UNIT == 0
+	    && ((tem2 = restructure_base_and_offset (exp, tem, offset, bitpos))
+		!= NULL_TREE))
+	  {
+	    copy_ref_info (tem2, exp);
+	    tem = tem2;
+	    /* The offset and bitpos were absorbed into the new MEM_REF, so 
+	       make sure we don't add them in again.  */
+	    offset = NULL_TREE;
+	    bitpos = 0;
+
+	    if (dump_file && (dump_flags & TDF_DETAILS))
+	      {
+		fprintf (dump_file, "\nRestructured inner reference:\n");
+		fprintf (dump_file, "  Original reference: ");
+		print_generic_expr (dump_file, exp, TDF_VOPS|TDF_MEMSYMS);
+		fprintf (dump_file, "\n  Replacement: ");
+		print_generic_expr (dump_file, tem, TDF_VOPS|TDF_MEMSYMS);
+	      }
+	  }
+
 	/* If TEM's type is a union of variable size, pass TARGET to the inner
 	   computation, since it will need a temporary and TARGET is known
 	   to have to do.  This occurs in unchecked conversion in Ada.  */
Index: gcc/Makefile.in
===================================================================
--- gcc/Makefile.in	(revision 180378)
+++ gcc/Makefile.in	(working copy)
@@ -2926,7 +2926,7 @@ expr.o : expr.c $(CONFIG_H) $(SYSTEM_H) coretypes.
    reload.h langhooks.h intl.h $(TM_P_H) $(TARGET_H) \
    tree-iterator.h gt-expr.h $(MACHMODE_H) $(TIMEVAR_H) $(TREE_FLOW_H) \
    $(TREE_PASS_H) $(DF_H) $(DIAGNOSTIC_H) vecprim.h $(SSAEXPAND_H) \
-   $(PARAMS_H) $(COMMON_TARGET_H)
+   $(PARAMS_H) $(COMMON_TARGET_H) tree-pretty-print.h
 dojump.o : dojump.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) $(TREE_H) \
    $(FLAGS_H) $(FUNCTION_H) $(EXPR_H) $(OPTABS_H) $(INSN_ATTR_H) insn-config.h \
    langhooks.h $(GGC_H) gt-dojump.h vecprim.h $(BASIC_BLOCK_H) output.h


^ permalink raw reply	[flat|nested] 40+ messages in thread

* [PING] Re: [PATCH] Fix PR46556 (poor address generation)
  2011-10-24 14:10                           ` William J. Schmidt
@ 2011-10-30 18:44                             ` William J. Schmidt
  2011-11-02 12:19                               ` Richard Guenther
  0 siblings, 1 reply; 40+ messages in thread
From: William J. Schmidt @ 2011-10-30 18:44 UTC (permalink / raw)
  To: Richard Guenther; +Cc: gcc-patches, bergner

Ping.

On Mon, 2011-10-24 at 08:38 -0500, William J. Schmidt wrote:
> OK, I've removed the pointer-arithmetic case from expand, to be handled
> later by straight-line strength reduction.  Here's the patch to deal
> with just the specific pattern of PR46556 (which will also eventually be
> handled by strength reduction, but not as quickly).
> 
> (FYI, I've been thinking through the strength reduction pass, and my
> plan is to stage in some of the easiest cases first, hopefully for 4.7,
> and gradually add the more complex pieces.  Explicit multiplies in the
> IL with known constants can be done pretty easily.  More complexity is
> added when the multiplier is a variable, when conditional increments are
> present, and when multiplies are hidden in addressing expressions.)
> 
> The present patch was bootstrapped and regression-tested on
> powerpc64-linux.  OK for trunk?
> 
> Thanks,
> Bill
> 
> 
> 2011-10-24  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
> 
> gcc:
> 
> 	PR rtl-optimization/46556
> 	* expr.c (restructure_base_and_offset): New function.
> 	(expand_expr_real_1): Replace result of get_inner_reference
> 	with result of restructure_base_and_offset when applicable.
> 	* Makefile.in (expr.o): Update dependencies.
> 
> gcc/testsuite:
> 
> 	PR rtl-optimization/46556
> 	* gcc.dg/tree-ssa-pr46556-1.c: New testcase.
> 	* gcc.dg/tree-ssa-pr46556-2.c: Likewise.
> 	* gcc.dg/tree-ssa-pr46556-3.c: Likewise.
> 
> 
> Index: gcc/testsuite/gcc.dg/tree-ssa/pr46556-1.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/tree-ssa/pr46556-1.c	(revision 0)
> +++ gcc/testsuite/gcc.dg/tree-ssa/pr46556-1.c	(revision 0)
> @@ -0,0 +1,22 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-rtl-expand" } */
> +
> +struct x
> +{
> +  int a[16];
> +  int b[16];
> +  int c[16];
> +};
> +
> +extern void foo (int, int, int);
> +
> +void
> +f (struct x *p, unsigned int n)
> +{
> +  foo (p->a[n], p->c[n], p->b[n]);
> +}
> +
> +/* { dg-final { scan-rtl-dump-times "\\(mem/s:SI \\(plus:" 2 "expand" } } */
> +/* { dg-final { scan-rtl-dump-times "const_int 128" 1 "expand" } } */
> +/* { dg-final { scan-rtl-dump-times "const_int 64 \\\[0x40\\\]\\)\\) \\\[" 1 "expand" } } */
> +/* { dg-final { cleanup-rtl-dump "expand" } } */
> Index: gcc/testsuite/gcc.dg/tree-ssa/pr46556-2.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/tree-ssa/pr46556-2.c	(revision 0)
> +++ gcc/testsuite/gcc.dg/tree-ssa/pr46556-2.c	(revision 0)
> @@ -0,0 +1,26 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-rtl-expand" } */
> +
> +struct x
> +{
> +  int a[16];
> +  int b[16];
> +  int c[16];
> +};
> +
> +extern void foo (int, int, int);
> +
> +void
> +f (struct x *p, unsigned int n)
> +{
> +  foo (p->a[n], p->c[n], p->b[n]);
> +  if (n > 12)
> +    foo (p->a[n], p->c[n], p->b[n]);
> +  else if (n > 3)
> +    foo (p->b[n], p->a[n], p->c[n]);
> +}
> +
> +/* { dg-final { scan-rtl-dump-times "\\(mem/s:SI \\(plus:" 6 "expand" } } */
> +/* { dg-final { scan-rtl-dump-times "const_int 128" 3 "expand" } } */
> +/* { dg-final { scan-rtl-dump-times "const_int 64 \\\[0x40\\\]\\)\\) \\\[" 3 "expand" } } */
> +/* { dg-final { cleanup-rtl-dump "expand" } } */
> Index: gcc/testsuite/gcc.dg/tree-ssa/pr46556-3.c
> ===================================================================
> --- gcc/testsuite/gcc.dg/tree-ssa/pr46556-3.c	(revision 0)
> +++ gcc/testsuite/gcc.dg/tree-ssa/pr46556-3.c	(revision 0)
> @@ -0,0 +1,27 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-rtl-expand" } */
> +struct x
> +{
> +  int a[16];
> +  int b[16];
> +  int c[16];
> +};
> +
> +extern void foo (int, int, int);
> +
> +void
> +f (struct x *p, unsigned int n)
> +{
> +  foo (p->a[n], p->c[n], p->b[n]);
> +  if (n > 3)
> +    {
> +      foo (p->a[n], p->c[n], p->b[n]);
> +      if (n > 12)
> +	foo (p->b[n], p->a[n], p->c[n]);
> +    }
> +}
> +
> +/* { dg-final { scan-rtl-dump-times "\\(mem/s:SI \\(plus:" 6 "expand" } } */
> +/* { dg-final { scan-rtl-dump-times "const_int 128" 3 "expand" } } */
> +/* { dg-final { scan-rtl-dump-times "const_int 64 \\\[0x40\\\]\\)\\) \\\[" 3 "expand" } } */
> +/* { dg-final { cleanup-rtl-dump "expand" } } */
> Index: gcc/expr.c
> ===================================================================
> --- gcc/expr.c	(revision 180378)
> +++ gcc/expr.c	(working copy)
> @@ -56,6 +56,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "ssaexpand.h"
>  #include "target-globals.h"
>  #include "params.h"
> +#include "tree-pretty-print.h"
> 
>  /* Decide whether a function's arguments should be processed
>     from first to last or from last to first.
> @@ -7648,7 +7649,66 @@ expand_constructor (tree exp, rtx target, enum exp
>    return target;
>  }
> 
> +/* Given BASE, OFFSET, and BITPOS derived from EXPR, determine whether
> +   there is a profitable opportunity to restructure address arithmetic
> +   within BASE and OFFSET.  If so, produce such a restructuring and
> +   return it.  */
> +/* TODO: This belongs more properly in a separate pass that performs
> +   general strength reduction on straight-line code.  Eventually move
> +   this there.  */
> 
> +static tree
> +restructure_base_and_offset (tree expr, tree base, tree offset,
> +			     HOST_WIDE_INT bitpos)
> +{
> +  tree mult_op0, mult_op1, t1, t2, offset_type, mult_expr, add_expr, mem_ref;
> +  double_int c, c1, c2, c3, c4;
> +
> +  /* Look for the following pattern:
> +
> +       base   = MEM_REF (T1, C1);
> +       offset = MULT_EXPR (PLUS_EXPR (T2, C2), C3)
> +       bitpos = C4 * BITS_PER_UNIT
> +
> +     If found, convert it to:
> +
> +       MEM_REF (POINTER_PLUS_EXPR (T1, MULT_EXPR (T2, C3)),
> +                C1 + (C2 * C3) + C4)
> +   */
> +  if (!base
> +      || !offset
> +      || TREE_CODE (base) != MEM_REF
> +      || TREE_CODE (offset) != MULT_EXPR)
> +    return NULL_TREE;
> +
> +  mult_op0 = TREE_OPERAND (offset, 0);
> +  mult_op1 = TREE_OPERAND (offset, 1);
> +
> +  if (TREE_CODE (mult_op0) != PLUS_EXPR
> +      || TREE_CODE (mult_op1) != INTEGER_CST
> +      || TREE_CODE (TREE_OPERAND (mult_op0, 1)) != INTEGER_CST)
> +    return NULL_TREE;
> +
> +  t1 = TREE_OPERAND (base, 0);
> +  t2 = TREE_OPERAND (mult_op0, 0);
> +  c1 = mem_ref_offset (base);
> +  c2 = tree_to_double_int (TREE_OPERAND (mult_op0, 1));
> +  c3 = tree_to_double_int (mult_op1);
> +  c4 = uhwi_to_double_int (bitpos / BITS_PER_UNIT);
> +  c = double_int_add (double_int_add (c1, double_int_mul (c2, c3)), c4);
> +
> +  offset_type = TREE_TYPE (TREE_OPERAND (base, 1));
> +
> +  mult_expr = fold_build2 (MULT_EXPR, sizetype, t2, 
> +			   double_int_to_tree (sizetype, c3));
> +
> +  add_expr = fold_build2 (POINTER_PLUS_EXPR, TREE_TYPE (t1), t1, mult_expr);
> +
> +  mem_ref = fold_build2 (MEM_REF, TREE_TYPE (expr), add_expr,
> +			 double_int_to_tree (offset_type, c));
> +  return mem_ref;
> +}
> +
>  /* expand_expr: generate code for computing expression EXP.
>     An rtx for the computed value is returned.  The value is never null.
>     In the case of a void EXP, const0_rtx is returned.
> @@ -9584,7 +9644,7 @@ expand_expr_real_1 (tree exp, rtx target, enum mac
>        {
>  	enum machine_mode mode1, mode2;
>  	HOST_WIDE_INT bitsize, bitpos;
> -	tree offset;
> +	tree offset, tem2;
>  	int volatilep = 0, must_force_mem;
>  	bool packedp = false;
>  	tree tem = get_inner_reference (exp, &bitsize, &bitpos, &offset,
> @@ -9601,6 +9661,36 @@ expand_expr_real_1 (tree exp, rtx target, enum mac
>  		&& DECL_PACKED (TREE_OPERAND (exp, 1))))
>  	  packedp = true;
> 
> +	/* Attempt to restructure the base and offset to combine constants.
> +	   Bitfield references should eventually be lowered prior to this
> +	   point and are not dealt with.  Skip BLKmode aggregates as well.
> +	   This generates dead code, so require -O2.  */
> +	if (optimize > 1
> +	    && code != BIT_FIELD_REF
> +	    && (code != COMPONENT_REF 
> +		|| !DECL_BIT_FIELD (TREE_OPERAND (exp, 1)))
> +	    && TYPE_MODE (TREE_TYPE (exp)) != BLKmode
> +	    && bitpos % BITS_PER_UNIT == 0
> +	    && ((tem2 = restructure_base_and_offset (exp, tem, offset, bitpos))
> +		!= NULL_TREE))
> +	  {
> +	    copy_ref_info (tem2, exp);
> +	    tem = tem2;
> +	    /* The offset and bitpos were absorbed into the new MEM_REF, so 
> +	       make sure we don't add them in again.  */
> +	    offset = NULL_TREE;
> +	    bitpos = 0;
> +
> +	    if (dump_file && (dump_flags & TDF_DETAILS))
> +	      {
> +		fprintf (dump_file, "\nRestructured inner reference:\n");
> +		fprintf (dump_file, "  Original reference: ");
> +		print_generic_expr (dump_file, exp, TDF_VOPS|TDF_MEMSYMS);
> +		fprintf (dump_file, "\n  Replacement: ");
> +		print_generic_expr (dump_file, tem, TDF_VOPS|TDF_MEMSYMS);
> +	      }
> +	  }
> +
>  	/* If TEM's type is a union of variable size, pass TARGET to the inner
>  	   computation, since it will need a temporary and TARGET is known
>  	   to have to do.  This occurs in unchecked conversion in Ada.  */
> Index: gcc/Makefile.in
> ===================================================================
> --- gcc/Makefile.in	(revision 180378)
> +++ gcc/Makefile.in	(working copy)
> @@ -2926,7 +2926,7 @@ expr.o : expr.c $(CONFIG_H) $(SYSTEM_H) coretypes.
>     reload.h langhooks.h intl.h $(TM_P_H) $(TARGET_H) \
>     tree-iterator.h gt-expr.h $(MACHMODE_H) $(TIMEVAR_H) $(TREE_FLOW_H) \
>     $(TREE_PASS_H) $(DF_H) $(DIAGNOSTIC_H) vecprim.h $(SSAEXPAND_H) \
> -   $(PARAMS_H) $(COMMON_TARGET_H)
> +   $(PARAMS_H) $(COMMON_TARGET_H) tree-pretty-print.h
>  dojump.o : dojump.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) $(TREE_H) \
>     $(FLAGS_H) $(FUNCTION_H) $(EXPR_H) $(OPTABS_H) $(INSN_ATTR_H) insn-config.h \
>     langhooks.h $(GGC_H) gt-dojump.h vecprim.h $(BASIC_BLOCK_H) output.h
> 
> 

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PING] Re: [PATCH] Fix PR46556 (poor address generation)
  2011-10-30 18:44                             ` [PING] " William J. Schmidt
@ 2011-11-02 12:19                               ` Richard Guenther
  2011-11-02 12:31                                 ` William J. Schmidt
  0 siblings, 1 reply; 40+ messages in thread
From: Richard Guenther @ 2011-11-02 12:19 UTC (permalink / raw)
  To: William J. Schmidt; +Cc: gcc-patches, bergner

On Sun, 30 Oct 2011, William J. Schmidt wrote:

> Ping.
> 
> On Mon, 2011-10-24 at 08:38 -0500, William J. Schmidt wrote:
> > OK, I've removed the pointer-arithmetic case from expand, to be handled
> > later by straight-line strength reduction.  Here's the patch to deal
> > with just the specific pattern of PR46556 (which will also eventually be
> > handled by strength reduction, but not as quickly).
> > 
> > (FYI, I've been thinking through the strength reduction pass, and my
> > plan is to stage in some of the easiest cases first, hopefully for 4.7,
> > and gradually add the more complex pieces.  Explicit multiplies in the
> > IL with known constants can be done pretty easily.  More complexity is
> > added when the multiplier is a variable, when conditional increments are
> > present, and when multiplies are hidden in addressing expressions.)
> > 
> > The present patch was bootstrapped and regression-tested on
> > powerpc64-linux.  OK for trunk?

Hmmm ...

> > Thanks,
> > Bill
> > 
> > 
> > 2011-10-24  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
> > 
> > gcc:
> > 
> > 	PR rtl-optimization/46556
> > 	* expr.c (restructure_base_and_offset): New function.
> > 	(expand_expr_real_1): Replace result of get_inner_reference
> > 	with result of restructure_base_and_offset when applicable.
> > 	* Makefile.in (expr.o): Update dependencies.
> > 
> > gcc/testsuite:
> > 
> > 	PR rtl-optimization/46556
> > 	* gcc.dg/tree-ssa-pr46556-1.c: New testcase.
> > 	* gcc.dg/tree-ssa-pr46556-2.c: Likewise.
> > 	* gcc.dg/tree-ssa-pr46556-3.c: Likewise.
> > 
> > 
> > Index: gcc/testsuite/gcc.dg/tree-ssa/pr46556-1.c
> > ===================================================================
> > --- gcc/testsuite/gcc.dg/tree-ssa/pr46556-1.c	(revision 0)
> > +++ gcc/testsuite/gcc.dg/tree-ssa/pr46556-1.c	(revision 0)
> > @@ -0,0 +1,22 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2 -fdump-rtl-expand" } */
> > +
> > +struct x
> > +{
> > +  int a[16];
> > +  int b[16];
> > +  int c[16];
> > +};
> > +
> > +extern void foo (int, int, int);
> > +
> > +void
> > +f (struct x *p, unsigned int n)
> > +{
> > +  foo (p->a[n], p->c[n], p->b[n]);
> > +}
> > +
> > +/* { dg-final { scan-rtl-dump-times "\\(mem/s:SI \\(plus:" 2 "expand" } } */
> > +/* { dg-final { scan-rtl-dump-times "const_int 128" 1 "expand" } } */
> > +/* { dg-final { scan-rtl-dump-times "const_int 64 \\\[0x40\\\]\\)\\) \\\[" 1 "expand" } } */
> > +/* { dg-final { cleanup-rtl-dump "expand" } } */
> > Index: gcc/testsuite/gcc.dg/tree-ssa/pr46556-2.c
> > ===================================================================
> > --- gcc/testsuite/gcc.dg/tree-ssa/pr46556-2.c	(revision 0)
> > +++ gcc/testsuite/gcc.dg/tree-ssa/pr46556-2.c	(revision 0)
> > @@ -0,0 +1,26 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2 -fdump-rtl-expand" } */
> > +
> > +struct x
> > +{
> > +  int a[16];
> > +  int b[16];
> > +  int c[16];
> > +};
> > +
> > +extern void foo (int, int, int);
> > +
> > +void
> > +f (struct x *p, unsigned int n)
> > +{
> > +  foo (p->a[n], p->c[n], p->b[n]);
> > +  if (n > 12)
> > +    foo (p->a[n], p->c[n], p->b[n]);
> > +  else if (n > 3)
> > +    foo (p->b[n], p->a[n], p->c[n]);
> > +}
> > +
> > +/* { dg-final { scan-rtl-dump-times "\\(mem/s:SI \\(plus:" 6 "expand" } } */
> > +/* { dg-final { scan-rtl-dump-times "const_int 128" 3 "expand" } } */
> > +/* { dg-final { scan-rtl-dump-times "const_int 64 \\\[0x40\\\]\\)\\) \\\[" 3 "expand" } } */
> > +/* { dg-final { cleanup-rtl-dump "expand" } } */
> > Index: gcc/testsuite/gcc.dg/tree-ssa/pr46556-3.c
> > ===================================================================
> > --- gcc/testsuite/gcc.dg/tree-ssa/pr46556-3.c	(revision 0)
> > +++ gcc/testsuite/gcc.dg/tree-ssa/pr46556-3.c	(revision 0)
> > @@ -0,0 +1,27 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2 -fdump-rtl-expand" } */
> > +struct x
> > +{
> > +  int a[16];
> > +  int b[16];
> > +  int c[16];
> > +};
> > +
> > +extern void foo (int, int, int);
> > +
> > +void
> > +f (struct x *p, unsigned int n)
> > +{
> > +  foo (p->a[n], p->c[n], p->b[n]);
> > +  if (n > 3)
> > +    {
> > +      foo (p->a[n], p->c[n], p->b[n]);
> > +      if (n > 12)
> > +	foo (p->b[n], p->a[n], p->c[n]);
> > +    }
> > +}
> > +
> > +/* { dg-final { scan-rtl-dump-times "\\(mem/s:SI \\(plus:" 6 "expand" } } */
> > +/* { dg-final { scan-rtl-dump-times "const_int 128" 3 "expand" } } */
> > +/* { dg-final { scan-rtl-dump-times "const_int 64 \\\[0x40\\\]\\)\\) \\\[" 3 "expand" } } */
> > +/* { dg-final { cleanup-rtl-dump "expand" } } */
> > Index: gcc/expr.c
> > ===================================================================
> > --- gcc/expr.c	(revision 180378)
> > +++ gcc/expr.c	(working copy)
> > @@ -56,6 +56,7 @@ along with GCC; see the file COPYING3.  If not see
> >  #include "ssaexpand.h"
> >  #include "target-globals.h"
> >  #include "params.h"
> > +#include "tree-pretty-print.h"
> > 
> >  /* Decide whether a function's arguments should be processed
> >     from first to last or from last to first.
> > @@ -7648,7 +7649,66 @@ expand_constructor (tree exp, rtx target, enum exp
> >    return target;
> >  }
> > 
> > +/* Given BASE, OFFSET, and BITPOS derived from EXPR, determine whether
> > +   there is a profitable opportunity to restructure address arithmetic
> > +   within BASE and OFFSET.  If so, produce such a restructuring and
> > +   return it.  */
> > +/* TODO: This belongs more properly in a separate pass that performs
> > +   general strength reduction on straight-line code.  Eventually move
> > +   this there.  */
> > 
> > +static tree
> > +restructure_base_and_offset (tree expr, tree base, tree offset,
> > +			     HOST_WIDE_INT bitpos)
> > +{
> > +  tree mult_op0, mult_op1, t1, t2, offset_type, mult_expr, add_expr, mem_ref;
> > +  double_int c, c1, c2, c3, c4;
> > +
> > +  /* Look for the following pattern:
> > +
> > +       base   = MEM_REF (T1, C1);
> > +       offset = MULT_EXPR (PLUS_EXPR (T2, C2), C3)
> > +       bitpos = C4 * BITS_PER_UNIT
> > +
> > +     If found, convert it to:
> > +
> > +       MEM_REF (POINTER_PLUS_EXPR (T1, MULT_EXPR (T2, C3)),
> > +                C1 + (C2 * C3) + C4)
> > +   */
> > +  if (!base
> > +      || !offset
> > +      || TREE_CODE (base) != MEM_REF
> > +      || TREE_CODE (offset) != MULT_EXPR)
> > +    return NULL_TREE;
> > +
> > +  mult_op0 = TREE_OPERAND (offset, 0);
> > +  mult_op1 = TREE_OPERAND (offset, 1);
> > +
> > +  if (TREE_CODE (mult_op0) != PLUS_EXPR
> > +      || TREE_CODE (mult_op1) != INTEGER_CST
> > +      || TREE_CODE (TREE_OPERAND (mult_op0, 1)) != INTEGER_CST)
> > +    return NULL_TREE;
> > +
> > +  t1 = TREE_OPERAND (base, 0);
> > +  t2 = TREE_OPERAND (mult_op0, 0);
> > +  c1 = mem_ref_offset (base);
> > +  c2 = tree_to_double_int (TREE_OPERAND (mult_op0, 1));
> > +  c3 = tree_to_double_int (mult_op1);
> > +  c4 = uhwi_to_double_int (bitpos / BITS_PER_UNIT);
> > +  c = double_int_add (double_int_add (c1, double_int_mul (c2, c3)), c4);
> > +
> > +  offset_type = TREE_TYPE (TREE_OPERAND (base, 1));
> > +
> > +  mult_expr = fold_build2 (MULT_EXPR, sizetype, t2, 
> > +			   double_int_to_tree (sizetype, c3));
> > +
> > +  add_expr = fold_build2 (POINTER_PLUS_EXPR, TREE_TYPE (t1), t1, mult_expr);
> > +
> > +  mem_ref = fold_build2 (MEM_REF, TREE_TYPE (expr), add_expr,
> > +			 double_int_to_tree (offset_type, c));
> > +  return mem_ref;
> > +}
> > +
> >  /* expand_expr: generate code for computing expression EXP.
> >     An rtx for the computed value is returned.  The value is never null.
> >     In the case of a void EXP, const0_rtx is returned.
> > @@ -9584,7 +9644,7 @@ expand_expr_real_1 (tree exp, rtx target, enum mac
> >        {
> >  	enum machine_mode mode1, mode2;
> >  	HOST_WIDE_INT bitsize, bitpos;
> > -	tree offset;
> > +	tree offset, tem2;
> >  	int volatilep = 0, must_force_mem;
> >  	bool packedp = false;
> >  	tree tem = get_inner_reference (exp, &bitsize, &bitpos, &offset,
> > @@ -9601,6 +9661,36 @@ expand_expr_real_1 (tree exp, rtx target, enum mac
> >  		&& DECL_PACKED (TREE_OPERAND (exp, 1))))
> >  	  packedp = true;
> > 
> > +	/* Attempt to restructure the base and offset to combine constants.
> > +	   Bitfield references should eventually be lowered prior to this
> > +	   point and are not dealt with.  Skip BLKmode aggregates as well.
> > +	   This generates dead code, so require -O2.  */
> > +	if (optimize > 1
> > +	    && code != BIT_FIELD_REF
> > +	    && (code != COMPONENT_REF 
> > +		|| !DECL_BIT_FIELD (TREE_OPERAND (exp, 1)))
> > +	    && TYPE_MODE (TREE_TYPE (exp)) != BLKmode
> > +	    && bitpos % BITS_PER_UNIT == 0
> > +	    && ((tem2 = restructure_base_and_offset (exp, tem, offset, bitpos))
> > +		!= NULL_TREE))
> > +	  {
> > +	    copy_ref_info (tem2, exp);
> > +	    tem = tem2;
> > +	    /* The offset and bitpos were absorbed into the new MEM_REF, so 
> > +	       make sure we don't add them in again.  */
> > +	    offset = NULL_TREE;
> > +	    bitpos = 0;

I'm not sure it's a good idea to do that.  Why not instead just
transform the offset and bitpos parts?  Otherwise you are going
to lose all subsequent logic that tries to optimize offsetted
base addressing.

Thus, instead of

> > +       base   = MEM_REF (T1, C1);
> > +       offset = MULT_EXPR (PLUS_EXPR (T2, C2), C3)
> > +       bitpos = C4 * BITS_PER_UNIT
> > +  
> > +     If found, convert it to:
> > +  
> > +       MEM_REF (POINTER_PLUS_EXPR (T1, MULT_EXPR (T2, C3)),
> > +                C1 + (C2 * C3) + C4)

convert it to

   base = MEM_REF (T1, 0);
   offset = MULT_EXPR (T2, C3);
   bitpos = C1 + (C2 * C3) + C4;

?

Also note that with only handling this here you fail to handle
stores (see expand_assignment, or just grep for the places that
handle MEM_REF).

If we want to handle this during expansion we probably want to
factor out a routine that handles address-generation for all
of the cases.

Sorry for pushing back again ... this area of GCC is "interesting" ;)
And yes, the current code is a mess - well accumulated 25 years of
"optimization" ... thus changing this area is not to be taken easily.

Thanks,
Richard.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PING] Re: [PATCH] Fix PR46556 (poor address generation)
  2011-11-02 12:19                               ` Richard Guenther
@ 2011-11-02 12:31                                 ` William J. Schmidt
  2011-11-02 14:26                                   ` Richard Guenther
  0 siblings, 1 reply; 40+ messages in thread
From: William J. Schmidt @ 2011-11-02 12:31 UTC (permalink / raw)
  To: Richard Guenther; +Cc: gcc-patches, bergner

On Wed, 2011-11-02 at 12:55 +0100, Richard Guenther wrote:
> On Sun, 30 Oct 2011, William J. Schmidt wrote:
> 
> > Ping.
> > 
> > On Mon, 2011-10-24 at 08:38 -0500, William J. Schmidt wrote:
> > > OK, I've removed the pointer-arithmetic case from expand, to be handled
> > > later by straight-line strength reduction.  Here's the patch to deal
> > > with just the specific pattern of PR46556 (which will also eventually be
> > > handled by strength reduction, but not as quickly).
> > > 
> > > (FYI, I've been thinking through the strength reduction pass, and my
> > > plan is to stage in some of the easiest cases first, hopefully for 4.7,
> > > and gradually add the more complex pieces.  Explicit multiplies in the
> > > IL with known constants can be done pretty easily.  More complexity is
> > > added when the multiplier is a variable, when conditional increments are
> > > present, and when multiplies are hidden in addressing expressions.)
> > > 
> > > The present patch was bootstrapped and regression-tested on
> > > powerpc64-linux.  OK for trunk?
> 
> Hmmm ...
> 
> > > Thanks,
> > > Bill
> > > 
> > > 
> > > 2011-10-24  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
> > > 
> > > gcc:
> > > 
> > > 	PR rtl-optimization/46556
> > > 	* expr.c (restructure_base_and_offset): New function.
> > > 	(expand_expr_real_1): Replace result of get_inner_reference
> > > 	with result of restructure_base_and_offset when applicable.
> > > 	* Makefile.in (expr.o): Update dependencies.
> > > 
> > > gcc/testsuite:
> > > 
> > > 	PR rtl-optimization/46556
> > > 	* gcc.dg/tree-ssa-pr46556-1.c: New testcase.
> > > 	* gcc.dg/tree-ssa-pr46556-2.c: Likewise.
> > > 	* gcc.dg/tree-ssa-pr46556-3.c: Likewise.
> > > 
> > > 
> > > Index: gcc/testsuite/gcc.dg/tree-ssa/pr46556-1.c
> > > ===================================================================
> > > --- gcc/testsuite/gcc.dg/tree-ssa/pr46556-1.c	(revision 0)
> > > +++ gcc/testsuite/gcc.dg/tree-ssa/pr46556-1.c	(revision 0)
> > > @@ -0,0 +1,22 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-options "-O2 -fdump-rtl-expand" } */
> > > +
> > > +struct x
> > > +{
> > > +  int a[16];
> > > +  int b[16];
> > > +  int c[16];
> > > +};
> > > +
> > > +extern void foo (int, int, int);
> > > +
> > > +void
> > > +f (struct x *p, unsigned int n)
> > > +{
> > > +  foo (p->a[n], p->c[n], p->b[n]);
> > > +}
> > > +
> > > +/* { dg-final { scan-rtl-dump-times "\\(mem/s:SI \\(plus:" 2 "expand" } } */
> > > +/* { dg-final { scan-rtl-dump-times "const_int 128" 1 "expand" } } */
> > > +/* { dg-final { scan-rtl-dump-times "const_int 64 \\\[0x40\\\]\\)\\) \\\[" 1 "expand" } } */
> > > +/* { dg-final { cleanup-rtl-dump "expand" } } */
> > > Index: gcc/testsuite/gcc.dg/tree-ssa/pr46556-2.c
> > > ===================================================================
> > > --- gcc/testsuite/gcc.dg/tree-ssa/pr46556-2.c	(revision 0)
> > > +++ gcc/testsuite/gcc.dg/tree-ssa/pr46556-2.c	(revision 0)
> > > @@ -0,0 +1,26 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-options "-O2 -fdump-rtl-expand" } */
> > > +
> > > +struct x
> > > +{
> > > +  int a[16];
> > > +  int b[16];
> > > +  int c[16];
> > > +};
> > > +
> > > +extern void foo (int, int, int);
> > > +
> > > +void
> > > +f (struct x *p, unsigned int n)
> > > +{
> > > +  foo (p->a[n], p->c[n], p->b[n]);
> > > +  if (n > 12)
> > > +    foo (p->a[n], p->c[n], p->b[n]);
> > > +  else if (n > 3)
> > > +    foo (p->b[n], p->a[n], p->c[n]);
> > > +}
> > > +
> > > +/* { dg-final { scan-rtl-dump-times "\\(mem/s:SI \\(plus:" 6 "expand" } } */
> > > +/* { dg-final { scan-rtl-dump-times "const_int 128" 3 "expand" } } */
> > > +/* { dg-final { scan-rtl-dump-times "const_int 64 \\\[0x40\\\]\\)\\) \\\[" 3 "expand" } } */
> > > +/* { dg-final { cleanup-rtl-dump "expand" } } */
> > > Index: gcc/testsuite/gcc.dg/tree-ssa/pr46556-3.c
> > > ===================================================================
> > > --- gcc/testsuite/gcc.dg/tree-ssa/pr46556-3.c	(revision 0)
> > > +++ gcc/testsuite/gcc.dg/tree-ssa/pr46556-3.c	(revision 0)
> > > @@ -0,0 +1,27 @@
> > > +/* { dg-do compile } */
> > > +/* { dg-options "-O2 -fdump-rtl-expand" } */
> > > +struct x
> > > +{
> > > +  int a[16];
> > > +  int b[16];
> > > +  int c[16];
> > > +};
> > > +
> > > +extern void foo (int, int, int);
> > > +
> > > +void
> > > +f (struct x *p, unsigned int n)
> > > +{
> > > +  foo (p->a[n], p->c[n], p->b[n]);
> > > +  if (n > 3)
> > > +    {
> > > +      foo (p->a[n], p->c[n], p->b[n]);
> > > +      if (n > 12)
> > > +	foo (p->b[n], p->a[n], p->c[n]);
> > > +    }
> > > +}
> > > +
> > > +/* { dg-final { scan-rtl-dump-times "\\(mem/s:SI \\(plus:" 6 "expand" } } */
> > > +/* { dg-final { scan-rtl-dump-times "const_int 128" 3 "expand" } } */
> > > +/* { dg-final { scan-rtl-dump-times "const_int 64 \\\[0x40\\\]\\)\\) \\\[" 3 "expand" } } */
> > > +/* { dg-final { cleanup-rtl-dump "expand" } } */
> > > Index: gcc/expr.c
> > > ===================================================================
> > > --- gcc/expr.c	(revision 180378)
> > > +++ gcc/expr.c	(working copy)
> > > @@ -56,6 +56,7 @@ along with GCC; see the file COPYING3.  If not see
> > >  #include "ssaexpand.h"
> > >  #include "target-globals.h"
> > >  #include "params.h"
> > > +#include "tree-pretty-print.h"
> > > 
> > >  /* Decide whether a function's arguments should be processed
> > >     from first to last or from last to first.
> > > @@ -7648,7 +7649,66 @@ expand_constructor (tree exp, rtx target, enum exp
> > >    return target;
> > >  }
> > > 
> > > +/* Given BASE, OFFSET, and BITPOS derived from EXPR, determine whether
> > > +   there is a profitable opportunity to restructure address arithmetic
> > > +   within BASE and OFFSET.  If so, produce such a restructuring and
> > > +   return it.  */
> > > +/* TODO: This belongs more properly in a separate pass that performs
> > > +   general strength reduction on straight-line code.  Eventually move
> > > +   this there.  */
> > > 
> > > +static tree
> > > +restructure_base_and_offset (tree expr, tree base, tree offset,
> > > +			     HOST_WIDE_INT bitpos)
> > > +{
> > > +  tree mult_op0, mult_op1, t1, t2, offset_type, mult_expr, add_expr, mem_ref;
> > > +  double_int c, c1, c2, c3, c4;
> > > +
> > > +  /* Look for the following pattern:
> > > +
> > > +       base   = MEM_REF (T1, C1);
> > > +       offset = MULT_EXPR (PLUS_EXPR (T2, C2), C3)
> > > +       bitpos = C4 * BITS_PER_UNIT
> > > +
> > > +     If found, convert it to:
> > > +
> > > +       MEM_REF (POINTER_PLUS_EXPR (T1, MULT_EXPR (T2, C3)),
> > > +                C1 + (C2 * C3) + C4)
> > > +   */
> > > +  if (!base
> > > +      || !offset
> > > +      || TREE_CODE (base) != MEM_REF
> > > +      || TREE_CODE (offset) != MULT_EXPR)
> > > +    return NULL_TREE;
> > > +
> > > +  mult_op0 = TREE_OPERAND (offset, 0);
> > > +  mult_op1 = TREE_OPERAND (offset, 1);
> > > +
> > > +  if (TREE_CODE (mult_op0) != PLUS_EXPR
> > > +      || TREE_CODE (mult_op1) != INTEGER_CST
> > > +      || TREE_CODE (TREE_OPERAND (mult_op0, 1)) != INTEGER_CST)
> > > +    return NULL_TREE;
> > > +
> > > +  t1 = TREE_OPERAND (base, 0);
> > > +  t2 = TREE_OPERAND (mult_op0, 0);
> > > +  c1 = mem_ref_offset (base);
> > > +  c2 = tree_to_double_int (TREE_OPERAND (mult_op0, 1));
> > > +  c3 = tree_to_double_int (mult_op1);
> > > +  c4 = uhwi_to_double_int (bitpos / BITS_PER_UNIT);
> > > +  c = double_int_add (double_int_add (c1, double_int_mul (c2, c3)), c4);
> > > +
> > > +  offset_type = TREE_TYPE (TREE_OPERAND (base, 1));
> > > +
> > > +  mult_expr = fold_build2 (MULT_EXPR, sizetype, t2, 
> > > +			   double_int_to_tree (sizetype, c3));
> > > +
> > > +  add_expr = fold_build2 (POINTER_PLUS_EXPR, TREE_TYPE (t1), t1, mult_expr);
> > > +
> > > +  mem_ref = fold_build2 (MEM_REF, TREE_TYPE (expr), add_expr,
> > > +			 double_int_to_tree (offset_type, c));
> > > +  return mem_ref;
> > > +}
> > > +
> > >  /* expand_expr: generate code for computing expression EXP.
> > >     An rtx for the computed value is returned.  The value is never null.
> > >     In the case of a void EXP, const0_rtx is returned.
> > > @@ -9584,7 +9644,7 @@ expand_expr_real_1 (tree exp, rtx target, enum mac
> > >        {
> > >  	enum machine_mode mode1, mode2;
> > >  	HOST_WIDE_INT bitsize, bitpos;
> > > -	tree offset;
> > > +	tree offset, tem2;
> > >  	int volatilep = 0, must_force_mem;
> > >  	bool packedp = false;
> > >  	tree tem = get_inner_reference (exp, &bitsize, &bitpos, &offset,
> > > @@ -9601,6 +9661,36 @@ expand_expr_real_1 (tree exp, rtx target, enum mac
> > >  		&& DECL_PACKED (TREE_OPERAND (exp, 1))))
> > >  	  packedp = true;
> > > 
> > > +	/* Attempt to restructure the base and offset to combine constants.
> > > +	   Bitfield references should eventually be lowered prior to this
> > > +	   point and are not dealt with.  Skip BLKmode aggregates as well.
> > > +	   This generates dead code, so require -O2.  */
> > > +	if (optimize > 1
> > > +	    && code != BIT_FIELD_REF
> > > +	    && (code != COMPONENT_REF 
> > > +		|| !DECL_BIT_FIELD (TREE_OPERAND (exp, 1)))
> > > +	    && TYPE_MODE (TREE_TYPE (exp)) != BLKmode
> > > +	    && bitpos % BITS_PER_UNIT == 0
> > > +	    && ((tem2 = restructure_base_and_offset (exp, tem, offset, bitpos))
> > > +		!= NULL_TREE))
> > > +	  {
> > > +	    copy_ref_info (tem2, exp);
> > > +	    tem = tem2;
> > > +	    /* The offset and bitpos were absorbed into the new MEM_REF, so 
> > > +	       make sure we don't add them in again.  */
> > > +	    offset = NULL_TREE;
> > > +	    bitpos = 0;
> 
> I'm not sure it's a good idea to do that.  Why not instead just
> transform the offset and bitpos parts?  Otherwise you are going
> to lose all subsequent logic that tries to optimize offsetted
> base addressing.
> 
> Thus, instead of
> 
> > > +       base   = MEM_REF (T1, C1);
> > > +       offset = MULT_EXPR (PLUS_EXPR (T2, C2), C3)
> > > +       bitpos = C4 * BITS_PER_UNIT
> > > +  
> > > +     If found, convert it to:
> > > +  
> > > +       MEM_REF (POINTER_PLUS_EXPR (T1, MULT_EXPR (T2, C3)),
> > > +                C1 + (C2 * C3) + C4)
> 
> convert it to
> 
>    base = MEM_REF (T1, 0);
>    offset = MULT_EXPR (T2, C3);
>    bitpos = C1 + (C2 * C3) + C4;
> 
> ?

Hm.  I'll have to play with that to be sure it still solves our original
problem after all the downstream logic kicks in...  

> 
> Also note that with only handling this here you fail to handle
> stores (see expand_assignment, or just grep for the places that
> handle MEM_REF).

Bleah.  OK.  Expand is a messy beast -- you had pointed me at this spot
a while back and it looked on the surface like it handled both...sigh.

> 
> If we want to handle this during expansion we probably want to
> factor out a routine that handles address-generation for all
> of the cases.

Well, doing it during expand was intended to be an "easy" temporary
solution for 4.7 until we can do it "right" in the tree phases as part
of strength reduction.  The "easy" part seems to have gone into
hiding. ;)  If you feel the strength reduction pass is generally on the
right track, it may be best to concentrate on getting that completed
rather than keep pulling on loose threads in expand.  Let me know your
preference.

> 
> Sorry for pushing back again ... this area of GCC is "interesting" ;)
> And yes, the current code is a mess - well accumulated 25 years of
> "optimization" ... thus changing this area is not to be taken easily.
> 

Yep, I'm all for getting it right.  Thanks for the review...

Bill

> Thanks,
> Richard.
> 


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PING] Re: [PATCH] Fix PR46556 (poor address generation)
  2011-11-02 12:31                                 ` William J. Schmidt
@ 2011-11-02 14:26                                   ` Richard Guenther
  0 siblings, 0 replies; 40+ messages in thread
From: Richard Guenther @ 2011-11-02 14:26 UTC (permalink / raw)
  To: William J. Schmidt; +Cc: gcc-patches, bergner

On Wed, 2 Nov 2011, William J. Schmidt wrote:

> On Wed, 2011-11-02 at 12:55 +0100, Richard Guenther wrote:
> > On Sun, 30 Oct 2011, William J. Schmidt wrote:
> > 
> > > Ping.
> > > 
> > > On Mon, 2011-10-24 at 08:38 -0500, William J. Schmidt wrote:
> > > > OK, I've removed the pointer-arithmetic case from expand, to be handled
> > > > later by straight-line strength reduction.  Here's the patch to deal
> > > > with just the specific pattern of PR46556 (which will also eventually be
> > > > handled by strength reduction, but not as quickly).
> > > > 
> > > > (FYI, I've been thinking through the strength reduction pass, and my
> > > > plan is to stage in some of the easiest cases first, hopefully for 4.7,
> > > > and gradually add the more complex pieces.  Explicit multiplies in the
> > > > IL with known constants can be done pretty easily.  More complexity is
> > > > added when the multiplier is a variable, when conditional increments are
> > > > present, and when multiplies are hidden in addressing expressions.)
> > > > 
> > > > The present patch was bootstrapped and regression-tested on
> > > > powerpc64-linux.  OK for trunk?
> > 
> > Hmmm ...
> > 
> > > > Thanks,
> > > > Bill
> > > > 
> > > > 
> > > > 2011-10-24  Bill Schmidt  <wschmidt@linux.vnet.ibm.com>
> > > > 
> > > > gcc:
> > > > 
> > > > 	PR rtl-optimization/46556
> > > > 	* expr.c (restructure_base_and_offset): New function.
> > > > 	(expand_expr_real_1): Replace result of get_inner_reference
> > > > 	with result of restructure_base_and_offset when applicable.
> > > > 	* Makefile.in (expr.o): Update dependencies.
> > > > 
> > > > gcc/testsuite:
> > > > 
> > > > 	PR rtl-optimization/46556
> > > > 	* gcc.dg/tree-ssa-pr46556-1.c: New testcase.
> > > > 	* gcc.dg/tree-ssa-pr46556-2.c: Likewise.
> > > > 	* gcc.dg/tree-ssa-pr46556-3.c: Likewise.
> > > > 
> > > > 
> > > > Index: gcc/testsuite/gcc.dg/tree-ssa/pr46556-1.c
> > > > ===================================================================
> > > > --- gcc/testsuite/gcc.dg/tree-ssa/pr46556-1.c	(revision 0)
> > > > +++ gcc/testsuite/gcc.dg/tree-ssa/pr46556-1.c	(revision 0)
> > > > @@ -0,0 +1,22 @@
> > > > +/* { dg-do compile } */
> > > > +/* { dg-options "-O2 -fdump-rtl-expand" } */
> > > > +
> > > > +struct x
> > > > +{
> > > > +  int a[16];
> > > > +  int b[16];
> > > > +  int c[16];
> > > > +};
> > > > +
> > > > +extern void foo (int, int, int);
> > > > +
> > > > +void
> > > > +f (struct x *p, unsigned int n)
> > > > +{
> > > > +  foo (p->a[n], p->c[n], p->b[n]);
> > > > +}
> > > > +
> > > > +/* { dg-final { scan-rtl-dump-times "\\(mem/s:SI \\(plus:" 2 "expand" } } */
> > > > +/* { dg-final { scan-rtl-dump-times "const_int 128" 1 "expand" } } */
> > > > +/* { dg-final { scan-rtl-dump-times "const_int 64 \\\[0x40\\\]\\)\\) \\\[" 1 "expand" } } */
> > > > +/* { dg-final { cleanup-rtl-dump "expand" } } */
> > > > Index: gcc/testsuite/gcc.dg/tree-ssa/pr46556-2.c
> > > > ===================================================================
> > > > --- gcc/testsuite/gcc.dg/tree-ssa/pr46556-2.c	(revision 0)
> > > > +++ gcc/testsuite/gcc.dg/tree-ssa/pr46556-2.c	(revision 0)
> > > > @@ -0,0 +1,26 @@
> > > > +/* { dg-do compile } */
> > > > +/* { dg-options "-O2 -fdump-rtl-expand" } */
> > > > +
> > > > +struct x
> > > > +{
> > > > +  int a[16];
> > > > +  int b[16];
> > > > +  int c[16];
> > > > +};
> > > > +
> > > > +extern void foo (int, int, int);
> > > > +
> > > > +void
> > > > +f (struct x *p, unsigned int n)
> > > > +{
> > > > +  foo (p->a[n], p->c[n], p->b[n]);
> > > > +  if (n > 12)
> > > > +    foo (p->a[n], p->c[n], p->b[n]);
> > > > +  else if (n > 3)
> > > > +    foo (p->b[n], p->a[n], p->c[n]);
> > > > +}
> > > > +
> > > > +/* { dg-final { scan-rtl-dump-times "\\(mem/s:SI \\(plus:" 6 "expand" } } */
> > > > +/* { dg-final { scan-rtl-dump-times "const_int 128" 3 "expand" } } */
> > > > +/* { dg-final { scan-rtl-dump-times "const_int 64 \\\[0x40\\\]\\)\\) \\\[" 3 "expand" } } */
> > > > +/* { dg-final { cleanup-rtl-dump "expand" } } */
> > > > Index: gcc/testsuite/gcc.dg/tree-ssa/pr46556-3.c
> > > > ===================================================================
> > > > --- gcc/testsuite/gcc.dg/tree-ssa/pr46556-3.c	(revision 0)
> > > > +++ gcc/testsuite/gcc.dg/tree-ssa/pr46556-3.c	(revision 0)
> > > > @@ -0,0 +1,27 @@
> > > > +/* { dg-do compile } */
> > > > +/* { dg-options "-O2 -fdump-rtl-expand" } */
> > > > +struct x
> > > > +{
> > > > +  int a[16];
> > > > +  int b[16];
> > > > +  int c[16];
> > > > +};
> > > > +
> > > > +extern void foo (int, int, int);
> > > > +
> > > > +void
> > > > +f (struct x *p, unsigned int n)
> > > > +{
> > > > +  foo (p->a[n], p->c[n], p->b[n]);
> > > > +  if (n > 3)
> > > > +    {
> > > > +      foo (p->a[n], p->c[n], p->b[n]);
> > > > +      if (n > 12)
> > > > +	foo (p->b[n], p->a[n], p->c[n]);
> > > > +    }
> > > > +}
> > > > +
> > > > +/* { dg-final { scan-rtl-dump-times "\\(mem/s:SI \\(plus:" 6 "expand" } } */
> > > > +/* { dg-final { scan-rtl-dump-times "const_int 128" 3 "expand" } } */
> > > > +/* { dg-final { scan-rtl-dump-times "const_int 64 \\\[0x40\\\]\\)\\) \\\[" 3 "expand" } } */
> > > > +/* { dg-final { cleanup-rtl-dump "expand" } } */
> > > > Index: gcc/expr.c
> > > > ===================================================================
> > > > --- gcc/expr.c	(revision 180378)
> > > > +++ gcc/expr.c	(working copy)
> > > > @@ -56,6 +56,7 @@ along with GCC; see the file COPYING3.  If not see
> > > >  #include "ssaexpand.h"
> > > >  #include "target-globals.h"
> > > >  #include "params.h"
> > > > +#include "tree-pretty-print.h"
> > > > 
> > > >  /* Decide whether a function's arguments should be processed
> > > >     from first to last or from last to first.
> > > > @@ -7648,7 +7649,66 @@ expand_constructor (tree exp, rtx target, enum exp
> > > >    return target;
> > > >  }
> > > > 
> > > > +/* Given BASE, OFFSET, and BITPOS derived from EXPR, determine whether
> > > > +   there is a profitable opportunity to restructure address arithmetic
> > > > +   within BASE and OFFSET.  If so, produce such a restructuring and
> > > > +   return it.  */
> > > > +/* TODO: This belongs more properly in a separate pass that performs
> > > > +   general strength reduction on straight-line code.  Eventually move
> > > > +   this there.  */
> > > > 
> > > > +static tree
> > > > +restructure_base_and_offset (tree expr, tree base, tree offset,
> > > > +			     HOST_WIDE_INT bitpos)
> > > > +{
> > > > +  tree mult_op0, mult_op1, t1, t2, offset_type, mult_expr, add_expr, mem_ref;
> > > > +  double_int c, c1, c2, c3, c4;
> > > > +
> > > > +  /* Look for the following pattern:
> > > > +
> > > > +       base   = MEM_REF (T1, C1);
> > > > +       offset = MULT_EXPR (PLUS_EXPR (T2, C2), C3)
> > > > +       bitpos = C4 * BITS_PER_UNIT
> > > > +
> > > > +     If found, convert it to:
> > > > +
> > > > +       MEM_REF (POINTER_PLUS_EXPR (T1, MULT_EXPR (T2, C3)),
> > > > +                C1 + (C2 * C3) + C4)
> > > > +   */
> > > > +  if (!base
> > > > +      || !offset
> > > > +      || TREE_CODE (base) != MEM_REF
> > > > +      || TREE_CODE (offset) != MULT_EXPR)
> > > > +    return NULL_TREE;
> > > > +
> > > > +  mult_op0 = TREE_OPERAND (offset, 0);
> > > > +  mult_op1 = TREE_OPERAND (offset, 1);
> > > > +
> > > > +  if (TREE_CODE (mult_op0) != PLUS_EXPR
> > > > +      || TREE_CODE (mult_op1) != INTEGER_CST
> > > > +      || TREE_CODE (TREE_OPERAND (mult_op0, 1)) != INTEGER_CST)
> > > > +    return NULL_TREE;
> > > > +
> > > > +  t1 = TREE_OPERAND (base, 0);
> > > > +  t2 = TREE_OPERAND (mult_op0, 0);
> > > > +  c1 = mem_ref_offset (base);
> > > > +  c2 = tree_to_double_int (TREE_OPERAND (mult_op0, 1));
> > > > +  c3 = tree_to_double_int (mult_op1);
> > > > +  c4 = uhwi_to_double_int (bitpos / BITS_PER_UNIT);
> > > > +  c = double_int_add (double_int_add (c1, double_int_mul (c2, c3)), c4);
> > > > +
> > > > +  offset_type = TREE_TYPE (TREE_OPERAND (base, 1));
> > > > +
> > > > +  mult_expr = fold_build2 (MULT_EXPR, sizetype, t2, 
> > > > +			   double_int_to_tree (sizetype, c3));
> > > > +
> > > > +  add_expr = fold_build2 (POINTER_PLUS_EXPR, TREE_TYPE (t1), t1, mult_expr);
> > > > +
> > > > +  mem_ref = fold_build2 (MEM_REF, TREE_TYPE (expr), add_expr,
> > > > +			 double_int_to_tree (offset_type, c));
> > > > +  return mem_ref;
> > > > +}
> > > > +
> > > >  /* expand_expr: generate code for computing expression EXP.
> > > >     An rtx for the computed value is returned.  The value is never null.
> > > >     In the case of a void EXP, const0_rtx is returned.
> > > > @@ -9584,7 +9644,7 @@ expand_expr_real_1 (tree exp, rtx target, enum mac
> > > >        {
> > > >  	enum machine_mode mode1, mode2;
> > > >  	HOST_WIDE_INT bitsize, bitpos;
> > > > -	tree offset;
> > > > +	tree offset, tem2;
> > > >  	int volatilep = 0, must_force_mem;
> > > >  	bool packedp = false;
> > > >  	tree tem = get_inner_reference (exp, &bitsize, &bitpos, &offset,
> > > > @@ -9601,6 +9661,36 @@ expand_expr_real_1 (tree exp, rtx target, enum mac
> > > >  		&& DECL_PACKED (TREE_OPERAND (exp, 1))))
> > > >  	  packedp = true;
> > > > 
> > > > +	/* Attempt to restructure the base and offset to combine constants.
> > > > +	   Bitfield references should eventually be lowered prior to this
> > > > +	   point and are not dealt with.  Skip BLKmode aggregates as well.
> > > > +	   This generates dead code, so require -O2.  */
> > > > +	if (optimize > 1
> > > > +	    && code != BIT_FIELD_REF
> > > > +	    && (code != COMPONENT_REF 
> > > > +		|| !DECL_BIT_FIELD (TREE_OPERAND (exp, 1)))
> > > > +	    && TYPE_MODE (TREE_TYPE (exp)) != BLKmode
> > > > +	    && bitpos % BITS_PER_UNIT == 0
> > > > +	    && ((tem2 = restructure_base_and_offset (exp, tem, offset, bitpos))
> > > > +		!= NULL_TREE))
> > > > +	  {
> > > > +	    copy_ref_info (tem2, exp);
> > > > +	    tem = tem2;
> > > > +	    /* The offset and bitpos were absorbed into the new MEM_REF, so 
> > > > +	       make sure we don't add them in again.  */
> > > > +	    offset = NULL_TREE;
> > > > +	    bitpos = 0;
> > 
> > I'm not sure it's a good idea to do that.  Why not instead just
> > transform the offset and bitpos parts?  Otherwise you are going
> > to lose all subsequent logic that tries to optimize offsetted
> > base addressing.
> > 
> > Thus, instead of
> > 
> > > > +       base   = MEM_REF (T1, C1);
> > > > +       offset = MULT_EXPR (PLUS_EXPR (T2, C2), C3)
> > > > +       bitpos = C4 * BITS_PER_UNIT
> > > > +  
> > > > +     If found, convert it to:
> > > > +  
> > > > +       MEM_REF (POINTER_PLUS_EXPR (T1, MULT_EXPR (T2, C3)),
> > > > +                C1 + (C2 * C3) + C4)
> > 
> > convert it to
> > 
> >    base = MEM_REF (T1, 0);
> >    offset = MULT_EXPR (T2, C3);
> >    bitpos = C1 + (C2 * C3) + C4;
> > 
> > ?
> 
> Hm.  I'll have to play with that to be sure it still solves our original
> problem after all the downstream logic kicks in...  
> 
> > 
> > Also note that with only handling this here you fail to handle
> > stores (see expand_assignment, or just grep for the places that
> > handle MEM_REF).
> 
> Bleah.  OK.  Expand is a messy beast -- you had pointed me at this spot
> a while back and it looked on the surface like it handled both...sigh.
> 
> > 
> > If we want to handle this during expansion we probably want to
> > factor out a routine that handles address-generation for all
> > of the cases.
> 
> Well, doing it during expand was intended to be an "easy" temporary
> solution for 4.7 until we can do it "right" in the tree phases as part
> of strength reduction.  The "easy" part seems to have gone into
> hiding. ;)  If you feel the strength reduction pass is generally on the
> right track, it may be best to concentrate on getting that completed
> rather than keep pulling on loose threads in expand.  Let me know your
> preference.

Yeah, strength reduction looks like a better approach to me.  After
all an easy temporary solution is likely going to regress for someone
(esp. touching that area...).

I've been hoping for others to chime in with opinions, but appearantly
folks are too busy... ;)

Richard.

^ permalink raw reply	[flat|nested] 40+ messages in thread

end of thread, other threads:[~2011-11-02 14:14 UTC | newest]

Thread overview: 40+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-10-05 16:16 [PATCH] Fix PR46556 (poor address generation) William J. Schmidt
2011-10-05 16:24 ` Paolo Bonzini
2011-10-05 16:28   ` Paolo Bonzini
2011-10-05 17:48   ` William J. Schmidt
2011-10-05 19:50     ` Paolo Bonzini
2011-10-05 21:04       ` William J. Schmidt
2011-10-06  8:57         ` Paolo Bonzini
2011-10-06 13:23           ` William J. Schmidt
2011-10-05 16:41 ` Steven Bosscher
2011-10-05 17:19   ` William J. Schmidt
2011-10-06 10:21 ` Richard Guenther
2011-10-06 13:49   ` William J. Schmidt
2011-10-06 14:24     ` Richard Guenther
2011-10-06 17:27       ` William J. Schmidt
2011-10-07  8:28         ` Richard Guenther
2011-10-07  9:45           ` Paolo Bonzini
2011-10-07 12:38             ` William J. Schmidt
2011-10-08 15:44           ` William J. Schmidt
2011-10-11 12:15             ` Richard Guenther
2011-10-11 13:57               ` Paolo Bonzini
2011-10-11 14:15                 ` Paolo Bonzini
2011-10-11 14:58               ` William J. Schmidt
2011-10-11 15:33                 ` William J. Schmidt
2011-10-18 14:43                   ` William J. Schmidt
2011-10-21  9:53                     ` Richard Guenther
2011-10-21 12:44                       ` William J. Schmidt
2011-10-23 10:19                         ` Richard Guenther
2011-10-24 14:10                           ` William J. Schmidt
2011-10-30 18:44                             ` [PING] " William J. Schmidt
2011-11-02 12:19                               ` Richard Guenther
2011-11-02 12:31                                 ` William J. Schmidt
2011-11-02 14:26                                   ` Richard Guenther
2011-10-11 21:34               ` Ian Lance Taylor
2011-10-11 21:38                 ` William J. Schmidt
2011-10-12 11:28                 ` Richard Guenther
2011-10-13  7:08                   ` Ian Lance Taylor
2011-10-06 17:47   ` Jeff Law
2011-10-06 18:04     ` William J. Schmidt
2011-10-06 18:15       ` Jeff Law
2011-10-07  8:53     ` Richard Guenther

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).