public inbox for gcc-patches@gcc.gnu.org
* [PATCH: PR/40314] extract common address for memory access to fields of large struct
@ 2009-05-31  9:31 Carrot Wei
  2009-06-01  8:13 ` Steven Bosscher
  0 siblings, 1 reply; 8+ messages in thread
From: Carrot Wei @ 2009-05-31  9:31 UTC (permalink / raw)
  To: gcc-patches

This patch implements the optimization suggested in:

  http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40314

It adds a new pass, extract_common_addr, that detects memory accesses to
fields of a large struct, inserts insns to compute a new base address that is
nearer to these fields, and then rewrites the memory accesses to use the new
base plus a small offset. Details can be found in the comments of
extract-common-addr.c.

In this patch the optimization is enabled for ARM only. It is also
applicable to other architectures that have a (base + offset) addressing
mode in which the offset has a limited value range. A target can enable the
optimization by providing an implementation of
targetm.get_address_offset_range.

ChangeLog:
2009-05-31  Carrot Wei  <carrot@google.com>

        PR/40314
        * config/arm/arm.c: add functions arm_address_offset_range,
        thumb1_address_offset_range, thumb2_address_offset_range,
        arm_address_offset_range_1.
        * config/arm/arm-protos.h: add prototype of arm_address_offset_range.
        * doc/tm.texi: add entry for TARGET_ADDRESS_OFFSET_RANGE.
        * Makefile.in: add rule for extract_common_addr.
        * target.h: add a hook function get_address_offset_range.
        * target-def.h: add a hook function TARGET_ADDRESS_OFFSET_RANGE.
        * tree-pass.h: add a pass_extract_common_addr.
        * passes.c: add a pass_extract_common_addr.
        * extract-common-addr.c: implementation of the new
        pass_extract_common_addr.

Test:
This patch was applied to trunk GCC and tested on the arm emulator with newlib.
No new failures.

thanks
Carrot


Index: tree-pass.h
===================================================================
--- tree-pass.h (revision 148009)
+++ tree-pass.h (working copy)
@@ -433,6 +433,7 @@ extern struct rtl_opt_pass pass_rtl_fwpr
 extern struct rtl_opt_pass pass_rtl_fwprop_addr;
 extern struct rtl_opt_pass pass_jump2;
 extern struct rtl_opt_pass pass_lower_subreg;
+extern struct rtl_opt_pass pass_extract_common_addr;
 extern struct rtl_opt_pass pass_cse;
 extern struct rtl_opt_pass pass_fast_rtl_dce;
 extern struct rtl_opt_pass pass_ud_rtl_dce;

Index: target.h
===================================================================
--- target.h    (revision 148009)
+++ target.h    (working copy)
@@ -619,6 +619,10 @@ struct gcc_target
   /* Given an address RTX, say whether it is valid.  */
   bool (* legitimate_address_p) (enum machine_mode, rtx, bool);

+  /* Given a base reg, return the min/max address offset and alignment.  */
+  void (* get_address_offset_range)(enum machine_mode, const_rtx,
+                                    HOST_WIDE_INT*, HOST_WIDE_INT*, int*);
+
   /* True if the given constant can be put into an object_block.  */
   bool (* use_blocks_for_constant_p) (enum machine_mode, const_rtx);

Index: target-def.h
===================================================================
--- target-def.h        (revision 148009)
+++ target-def.h        (working copy)
@@ -490,6 +490,7 @@
 #define TARGET_LEGITIMIZE_ADDRESS default_legitimize_address
 #define TARGET_DELEGITIMIZE_ADDRESS hook_rtx_rtx_identity
 #define TARGET_LEGITIMATE_ADDRESS_P default_legitimate_address_p
+#define TARGET_ADDRESS_OFFSET_RANGE NULL
 #define TARGET_USE_BLOCKS_FOR_CONSTANT_P hook_bool_mode_const_rtx_false
 #define TARGET_MIN_ANCHOR_OFFSET 0
 #define TARGET_MAX_ANCHOR_OFFSET 0
@@ -879,6 +880,7 @@
   TARGET_LEGITIMIZE_ADDRESS,                   \
   TARGET_DELEGITIMIZE_ADDRESS,                 \
   TARGET_LEGITIMATE_ADDRESS_P,                 \
+  TARGET_ADDRESS_OFFSET_RANGE,                  \
   TARGET_USE_BLOCKS_FOR_CONSTANT_P,            \
   TARGET_MIN_ANCHOR_OFFSET,                    \
   TARGET_MAX_ANCHOR_OFFSET,                    \

Index: passes.c
===================================================================
--- passes.c    (revision 148009)
+++ passes.c    (working copy)
@@ -724,6 +724,7 @@ init_optimization_passes (void)
       NEXT_PASS (pass_into_cfg_layout_mode);
       NEXT_PASS (pass_jump2);
       NEXT_PASS (pass_lower_subreg);
+      NEXT_PASS (pass_extract_common_addr);
       NEXT_PASS (pass_df_initialize_opt);
       NEXT_PASS (pass_cse);
       NEXT_PASS (pass_rtl_fwprop);

Index: Makefile.in
===================================================================
--- Makefile.in (revision 148009)
+++ Makefile.in (working copy)
@@ -1129,6 +1129,7 @@ OBJS-common = \
        explow.o \
        expmed.o \
        expr.o \
+       extract-common-addr.o \
        final.o \
        fixed-value.o \
        fold-const.o \
@@ -2704,6 +2705,9 @@ ipa-struct-reorg.o: ipa-struct-reorg.c i
    $(TREE_PASS_H) opts.h $(IPA_TYPE_ESCAPE_H) $(TREE_DUMP_H) \
    $(GIMPLE_H)

+extract-common-addr.o : extract-common-addr.c $(CONFIG_H) $(SYSTEM_H) \
+   coretypes.h $(TM_H) $(RTL_H) $(FLAGS_H) insn-config.h $(RECOG_H) $(EXPR_H) \
+   $(BASIC_BLOCK_H) alloc-pool.h $(TARGET_H) tree-pass.h
 coverage.o : coverage.c $(GCOV_IO_H) $(CONFIG_H) $(SYSTEM_H) coretypes.h \
    $(TM_H) $(RTL_H) $(TREE_H) $(FLAGS_H) output.h $(REGS_H) $(EXPR_H) \
    $(FUNCTION_H) $(TOPLEV_H) $(GGC_H) langhooks.h $(COVERAGE_H) gt-coverage.h \

Index: extract-common-addr.c
===================================================================
--- extract-common-addr.c       (revision 0)
+++ extract-common-addr.c       (revision 0)
@@ -0,0 +1,584 @@
+/* Extract common address calculation for accesses to
+   fields of large struct.
+   Copyright (C) 2009
+   Free Software Foundation, Inc.
+   Contributed by Carrot Wei <carrot@google.com>.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+/* This file contains optimization for complex address calculation for
+   large structure.
+
+   Suppose we have a struct defined as:
+   typedef struct large_struct
+   {
+     char dummy[400];
+     int  field1;
+     int  field2;
+   } large_struct_t;
+
+   Now we want to access individual fields through a pointer:
+
+   large_struct_t *p = ...;
+   ... = p->field1;
+   ... = p->field2;
+
+   If the load instruction of the target architecture can't support such
+   a large offset, gcc usually generates the following insns to access
+   the struct fields:
+
+   (set r200 400)                     # 400 is offset of field1
+   (set r201 (mem (plus r100 r200)))  # r100 contains pointer p
+   ...
+   (set r300 404)                     # 404 is offset of field2
+   (set r301 (mem (plus r100 r300)))  # r100 contains pointer p
+
+   Sometimes loading the large constant (the offset) needs more than one
+   instruction, which makes things worse.
+
+   One possible optimization for multiple accesses is to first compute a
+   new base that is nearer to the fields we want to access, then use the
+   normal (base + offset) addressing mode to access them.  So ideally the
+   previous insns would be transformed to:
+
+   (set r101 (plus r100 400))
+   (set r201 (mem (plus r101 0)))
+   ...
+   (set r301 (mem (plus r101 4)))
+
+   The actual transformation done by this pass is:
+
+   (set r200 400)                     # keep the original insn
+   (set r250 (plus r100 400))         # r250 is new base
+   (set r201 (mem (plus r250 0)))
+   ...
+   (set r300 404)
+   (set r251 (plus r100 400))         # r251 contains same value as r250
+   (set r301 (mem (plus r251 4)))
+
+   And let dce and cse do the rest.
+*/
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tm.h"
+#include "rtl.h"
+#include "basic-block.h"
+#include "alloc-pool.h"
+#include "flags.h"
+#include "insn-config.h"
+#include "recog.h"
+#include "expr.h"
+#include "target.h"
+#include "tree-pass.h"
+
+/* Memory offset information.  */
+struct offset_info
+{
+  rtx mem_insn;              /* The insn that does the memory access.  */
+  rtx mem;                   /* The mem rtx which uses this offset.  */
+  unsigned int offset_regno; /* The regno of the dest of the set.  */
+  enum machine_mode mode;    /* The dest machine mode.  */
+  HOST_WIDE_INT offset;      /* The offset constant.  */
+  /* Next offset information; these are used with the same base register.  */
+  struct offset_info *next;
+};
+
+/* Memory base address information.  */
+struct base_reg_info
+{
+  /* All offset information used with this base register.  */
+  struct offset_info *offsets;
+  unsigned int base_regno;     /* Base register number.  */
+  rtx base_reg;                /* Base register rtx.  */
+  enum machine_mode mode;      /* Base register machine mode.  */
+};
+
+/* Offset table, indexed by the offset register number.  */
+static struct offset_info **offset_table;
+
+/* Base register table, indexed by the base register number.  */
+static struct base_reg_info *base_table;
+
+/* Memory pool for offset_info elements.  */
+static alloc_pool offset_element_pool;
+
+/* Given a register rtx, look it up in offset_table.  If an entry is
+   found and it has the same machine mode, return a pointer to the
+   offset information.  Otherwise return NULL.  */
+
+static inline struct offset_info *
+find_reg (rtx x)
+{
+  struct offset_info *offset_item;
+
+  gcc_assert (REG_P (x));
+  offset_item = offset_table[REGNO (x)];
+  if (offset_item == NULL)
+    return NULL;
+  if (offset_item->mode != GET_MODE (x))
+    return NULL;
+  return offset_item;
+}
+
+/* If X is a MEM whose address has the form (plus base_reg offset_reg),
+   remove the corresponding item from the offset table and insert it
+   into a base_reg_info object.  Called via for_each_rtx; DATA is the
+   containing insn.  */
+
+static int
+find_mem (rtx x, void *data)
+{
+  HOST_WIDE_INT min_offset, max_offset;
+  int alignment;
+  struct base_reg_info *base_reg_item;
+  struct offset_info *offset_item;
+  rtx reg1, reg2, address;
+
+  if (!MEM_P (x))
+    return 0;
+
+  /* Look for (plus reg1 reg2) pattern.  */
+  address = XEXP (x, 0);
+  if (GET_CODE (address) != PLUS)
+    return 0;
+  reg1 = XEXP (address, 0);
+  reg2 = XEXP (address, 1);
+  if (!REG_P (reg1) || !REG_P (reg2))
+    return 0;
+
+  /* Look for the addressing offset.  If necessary swap reg1 and reg2
+     so that reg1 always contains the base address and reg2 always
+     contains the offset.  */
+  offset_item = find_reg (reg2);
+  if (offset_item == NULL)
+    {
+      rtx t = reg1;
+      reg1 = reg2;
+      reg2 = t;
+      offset_item = find_reg (reg2);
+      if (offset_item == NULL)
+        return -1;
+    }
+
+  /* Currently we only deal with pseudo base registers.  Other register
+     classes are also possible but would need a more complicated
+     implementation of targetm.get_address_offset_range.  */
+  if (HARD_REGISTER_P (reg1))
+    return -1;
+
+  /* If the offset already fits in the legal offset range, ignore it.  */
+  targetm.get_address_offset_range (GET_MODE (x), reg1,
+                                    &min_offset, &max_offset, &alignment);
+  if (offset_item->offset >= min_offset
+      && offset_item->offset <= max_offset)
+    return -1;
+
+  /* If the base register and offset register don't have the same
+     machine mode, ignore this offset.  */
+  if (GET_MODE (reg1) != offset_item->mode)
+    return -1;
+
+  /* Look up the corresponding base register information.
+     If there is none, create it.  */
+  base_reg_item = &base_table[REGNO (reg1)];
+  if (base_reg_item->offsets == NULL)
+    {
+      base_reg_item->mode = GET_MODE (reg1);
+      base_reg_item->base_regno = REGNO (reg1);
+      base_reg_item->base_reg = reg1;
+    }
+
+  /* Associate this offset information with the base information.  */
+  offset_item->mem = x;
+  offset_item->mem_insn = (rtx) data;
+  offset_item->next = base_reg_item->offsets;
+  base_reg_item->offsets = offset_item;
+
+  /* Remove the offset information from the offset table, since we don't
+     expect it to be used more than once.  */
+  offset_table[offset_item->offset_regno] = NULL;
+  return -1;
+}
+
+/* If x is a register and has an entry in the offset table, the offset
+   information should be invalidated and removed from the table.  */
+
+static inline void
+clobber_reg (rtx x)
+{
+  if (GET_CODE (x) == SUBREG)
+    x = SUBREG_REG (x);
+
+  if (REG_P (x))
+    {
+      if (offset_table[REGNO (x)] == NULL)
+        return;
+      pool_free (offset_element_pool, offset_table[REGNO (x)]);
+      offset_table[REGNO (x)] = NULL;
+    }
+}
+
+/* If a reg is clobbered in insn, remove it from offset table.  */
+
+static void
+clobber_insn_reg (rtx insn)
+{
+  rtx x = PATTERN (insn);
+
+  switch (GET_CODE (x))
+    {
+    case SET:
+      clobber_reg (SET_DEST (x));
+      break;
+
+    case CLOBBER:
+      clobber_reg (XEXP (x, 0));
+      break;
+
+    case PARALLEL:
+      {
+        int i;
+        int len = XVECLEN (x, 0);
+        for (i = 0; i < len; i++)
+          {
+            rtx y = XVECEXP (x, 0, i);
+            if (GET_CODE (y) == SET)
+              clobber_reg (SET_DEST (y));
+            else if (GET_CODE (y) == CLOBBER)
+              clobber_reg (XEXP (y, 0));
+          }
+      }
+      break;
+
+    default:
+      break;
+    }
+}
+
+/* If this insn is in the form of
+   (set reg offset)
+   insert this offset info into offset table.  */
+
+static void
+find_offset (rtx insn)
+{
+  rtx src, dest;
+  unsigned int regno;
+  HOST_WIDE_INT offset_val;
+  struct offset_info *offset;
+  rtx x = single_set (insn);
+  if (!x)
+    return;
+
+  dest = SET_DEST (x);
+  if (!REG_P (dest))
+    return;
+  regno = REGNO (dest);
+  if (regno < FIRST_PSEUDO_REGISTER)
+    return;
+
+  src = SET_SRC (x);
+  if (GET_CODE (src) != CONST_INT)
+    return;
+  offset_val = INTVAL (src);
+  if (offset_val <= 0)
+    return;
+
+  offset = (struct offset_info *) pool_alloc (offset_element_pool);
+  offset->mem = NULL;
+  offset->offset_regno = regno;
+  offset->mode = GET_MODE (dest);
+  offset->offset = offset_val;
+  offset->next = NULL;
+  offset_table[regno] = offset;
+}
+
+/* Look for optimization opportunities.  */
+
+static void
+collect_candidates (void)
+{
+  basic_block block;
+
+  FOR_EACH_BB (block)
+    {
+      rtx insn;
+      memset (offset_table, 0, sizeof (struct offset_info *) * max_reg_num ());
+
+      FOR_BB_INSNS (block, insn)
+        {
+          if (!INSN_P (insn))
+            continue;
+
+          for_each_rtx (&PATTERN (insn), find_mem, insn);
+
+          clobber_insn_reg (insn);
+
+          find_offset (insn);
+        }
+    }
+}
+
+/* Compare function used by qsort.  */
+
+static int
+compare_offset (const void *pa, const void *pb)
+{
+  struct offset_info * const *offset1 = (struct offset_info* const *)pa;
+  struct offset_info * const *offset2 = (struct offset_info* const *)pb;
+  if ((*offset1)->offset != (*offset2)->offset)
+    return (*offset1)->offset > (*offset2)->offset ? 1 : -1;
+  return GET_MODE ((*offset1)->mem) - GET_MODE ((*offset2)->mem);
+}
+
+/* Do the actual modification.  */
+
+static void
+rewrite_memory_access (struct base_reg_info *base_reg,
+                       struct offset_info **offset_array,
+                       int count,
+                       HOST_WIDE_INT base_offset)
+{
+  int i;
+  for (i = 0; i < count; i++)
+    {
+      rtx new_base_reg, new_base, new_base_insn, new_offset, x;
+      struct offset_info *offset = offset_array[i];
+
+      /* Insert an insn to compute the new base reg.  */
+      new_base_reg = gen_reg_rtx (offset->mode);
+      new_base = gen_rtx_PLUS (offset->mode,
+                          base_reg->base_reg, GEN_INT (base_offset));
+      new_base_insn = gen_move_insn (new_base_reg, new_base);
+      emit_insn_before (new_base_insn, offset->mem_insn);
+
+      /* Modify the original MEM rtx to use new base and offset address.  */
+      new_offset = GEN_INT (offset->offset - base_offset);
+      x = XEXP (offset->mem, 0);
+      if (REGNO (XEXP (x, 0)) == base_reg->base_regno)
+        {
+          validate_change (offset->mem, &XEXP (x, 0), new_base_reg, 1);
+          validate_change (offset->mem, &XEXP (x, 1), new_offset, 1);
+        }
+      else
+        {
+          validate_change (offset->mem, &XEXP (x, 1), new_base_reg, 1);
+          validate_change (offset->mem, &XEXP (x, 0), new_offset, 1);
+        }
+
+      if (!apply_change_group ())
+        remove_insn (new_base_insn);
+    }
+}
+
+/* Optimize a list of memory addressing relative to one base.  */
+
+static void
+process_one_base (struct base_reg_info *base_reg)
+{
+  HOST_WIDE_INT max_offset, min_offset;
+  HOST_WIDE_INT base_offset, max_base;
+  int alignment, new_alignment;
+
+  struct offset_info **offset_array;
+  struct offset_info *offset = base_reg->offsets;
+  int count = 0;
+  int i, group_count;
+
+  /* Count the number of memory accesses using the same base register. */
+  while (offset)
+    {
+      count++;
+      offset = offset->next;
+    }
+  if (count < 2)
+    return;
+
+  /* Sort the memory accesses according to their offset.  */
+  offset_array = XNEWVEC (struct offset_info *, count);
+  offset = base_reg->offsets;
+  for (i = 0; i < count; i++)
+    {
+      offset_array[i] = offset;
+      offset = offset->next;
+    }
+  qsort (offset_array, count, sizeof(struct offset_info *), compare_offset);
+
+  /* group_count contains the number of MEM accesses that should be
+     relative to the same new base address.
+     (base_offset + old_base) is the new base address of the current group.
+     max_base is the maximum possible value of base_offset.
+     alignment is the offset alignment requirement, not the memory alignment.  */
+  group_count = 1;
+  targetm.get_address_offset_range (GET_MODE (offset_array[0]->mem),
+                                    NULL, &min_offset, &max_offset,
+                                    &new_alignment);
+  base_offset = offset_array[0]->offset - max_offset;
+  max_base = offset_array[0]->offset - min_offset;
+  alignment = new_alignment;
+
+  /* Add the memory accesses to the current group one by one until it is
+     no longer feasible to add the next one.  */
+  for (i = 1; i < count; i++)
+    {
+      HOST_WIDE_INT base2, max_base2;
+      int alignment2;
+
+      targetm.get_address_offset_range (GET_MODE (offset_array[i]->mem),
+              NULL, &min_offset, &max_offset, &new_alignment);
+      base2 = offset_array[i]->offset - max_offset;
+      alignment2 = alignment;
+
+      /* Adjust the new base_offset and alignment.  */
+      if (base2 < base_offset)
+        {
+          if (new_alignment > alignment)
+            {
+              HOST_WIDE_INT delta = offset_array[i]->offset - base_offset;
+              alignment2 = new_alignment;
+              delta = delta & ~(alignment2 - 1);
+              base2 = offset_array[i]->offset - delta;
+            }
+          else
+            base2 = base_offset;
+        }
+      else
+        {
+          if (new_alignment > alignment)
+              alignment2 = new_alignment;
+          else
+            {
+              HOST_WIDE_INT delta = base2 - base_offset;
+              delta = (delta + alignment - 1) & ~(alignment - 1);
+              base2 = base_offset + delta;
+            }
+        }
+
+      /* Adjust max_base.  */
+      max_base2 = offset_array[i]->offset - min_offset;
+      if (new_alignment < alignment)
+        {
+          HOST_WIDE_INT mask = alignment - 1;
+          HOST_WIDE_INT aligned_min = (min_offset + alignment - 1) & ~mask;
+          max_base2 = offset_array[i]->offset - aligned_min;
+          if (max_base2 < max_base)
+            max_base = max_base2;
+        }
+      else
+        {
+          if (max_base2 > max_base)
+            {
+              HOST_WIDE_INT delta = max_base2 - max_base;
+              delta = (delta + new_alignment - 1) & ~(new_alignment - 1);
+              max_base = max_base2 - delta;
+            }
+          else
+            max_base = max_base2;
+        }
+
+      /* If the new base_offset is greater than the new max_base, this
+         memory access can't be put into current group.  */
+      if (base2 > max_base)
+        {
+          /* This optimization can be beneficial only when there are two
+             or more memory accesses in one group.  */
+          if (group_count > 1)
+          {
+            struct offset_info **start = &offset_array[i - group_count];
+            rewrite_memory_access (base_reg, start, group_count, base_offset);
+          }
+
+          /* Start a new group.  */
+          base_offset = offset_array[i]->offset - max_offset;
+          max_base = offset_array[i]->offset - min_offset;
+          alignment = new_alignment;
+          group_count = 0;
+        }
+      else
+        {
+          alignment = alignment2;
+          base_offset = base2;
+        }
+
+      group_count++;
+    }
+
+  /* The final group.  */
+  if (group_count > 1)
+    {
+      struct offset_info **start = &offset_array[i - group_count];
+      rewrite_memory_access (base_reg, start, group_count, base_offset);
+    }
+
+  free (offset_array);
+}
+
+/* This optimization can be enabled by providing an implementation of
+   targetm.get_address_offset_range.  */
+
+static bool
+gate_handle_common_addr (void)
+{
+  return targetm.get_address_offset_range != NULL && optimize > 0;
+}
+
+static unsigned int
+rest_of_handle_extract_common_addr (void)
+{
+  int i;
+  int reg_count = max_reg_num();
+  offset_table = XNEWVEC (struct offset_info *, reg_count);
+  base_table = XNEWVEC (struct base_reg_info, reg_count);
+  memset (offset_table, 0, sizeof (struct offset_info *) * reg_count);
+  memset (base_table, 0, sizeof (struct base_reg_info) * reg_count);
+  offset_element_pool = create_alloc_pool ("offset pool",
+                                           sizeof (struct offset_info), 64);
+
+  collect_candidates ();
+
+  for (i = 0; i < reg_count; i++)
+    {
+      if (base_table[i].offsets != NULL)
+        process_one_base(&base_table[i]);
+    }
+
+  free_alloc_pool (offset_element_pool);
+  free (base_table);
+  free (offset_table);
+  return 0;
+}
+
+struct rtl_opt_pass pass_extract_common_addr =
+{
+ {
+  RTL_PASS,
+  "extract_common_addr",                /* name */
+  gate_handle_common_addr,              /* gate */
+  rest_of_handle_extract_common_addr,   /* execute */
+  NULL,                                 /* sub */
+  NULL,                                 /* next */
+  0,                                    /* static_pass_number */
+  TV_NONE,                              /* tv_id */
+  0,                                    /* properties_required */
+  0,                                    /* properties_provided */
+  0,                                    /* properties_destroyed */
+  0,                                    /* todo_flags_start */
+  TODO_dump_func                        /* todo_flags_finish */
+ }
+};

Index: tm.texi
===================================================================
--- tm.texi     (revision 148009)
+++ tm.texi     (working copy)
@@ -5564,6 +5564,11 @@ holding the constant.  This restriction
 of TLS symbols for various targets.
 @end deftypefn

+@deftypefn {Target Hook} void TARGET_ADDRESS_OFFSET_RANGE (enum machine_mode @var{mode}, const_rtx @var{x}, HOST_WIDE_INT *@var{min_offset}, HOST_WIDE_INT *@var{max_offset}, int *@var{alignment})
+Given a memory access @var{mode} and base register @var{x}, this hook should
+return the offset range and alignment of a legal (base + offset) address.
+@end deftypefn
+
 @deftypefn {Target Hook} bool TARGET_USE_BLOCKS_FOR_CONSTANT_P (enum machine_mode @var{mode}, rtx @var{x})
 This hook should return true if pool entries for constant @var{x} can
 be placed in an @code{object_block} structure.  @var{mode} is the mode

Index: arm-protos.h
===================================================================
--- arm-protos.h        (revision 148009)
+++ arm-protos.h        (working copy)
@@ -58,6 +58,8 @@ extern int arm_legitimate_address_outer_
 extern int thumb_legitimate_offset_p (enum machine_mode, HOST_WIDE_INT);
 extern rtx thumb_legitimize_reload_address (rtx *, enum machine_mode, int, int,
                                            int);
+extern void arm_address_offset_range (enum machine_mode, const_rtx,
+                                      HOST_WIDE_INT *, HOST_WIDE_INT *, int *);
 extern int arm_const_double_rtx (rtx);
 extern int neg_const_double_rtx_ok_for_fpa (rtx);
 extern int vfp3_const_double_rtx (rtx);

Index: arm.c
===================================================================
--- arm.c       (revision 148009)
+++ arm.c       (working copy)
@@ -407,6 +407,9 @@ static bool arm_allocate_stack_slots_for
 #undef TARGET_LEGITIMATE_ADDRESS_P
 #define TARGET_LEGITIMATE_ADDRESS_P    arm_legitimate_address_p

+#undef  TARGET_ADDRESS_OFFSET_RANGE
+#define TARGET_ADDRESS_OFFSET_RANGE arm_address_offset_range
+
 struct gcc_target targetm = TARGET_INITIALIZER;

 /* Obstack for minipool constant handling.  */
@@ -4476,6 +4479,180 @@ arm_legitimate_address_p (enum machine_m
     return thumb1_legitimate_address_p (mode, x, strict_p);
 }

+/* In ARM mode, return the min and max address offsets relative to the
+   specified base register.  */
+
+static void
+arm_address_offset_range_1 (enum machine_mode mode, const_rtx base_reg,
+                            HOST_WIDE_INT *min_offset,
+                            HOST_WIDE_INT *max_offset,
+                            int *alignment)
+{
+  if (TARGET_HARD_FLOAT && (TARGET_FPA || TARGET_MAVERICK)
+      && (GET_MODE_CLASS (mode) == MODE_FLOAT
+          || (TARGET_MAVERICK && mode == DImode)))
+    {
+      *min_offset = -1023;
+      *max_offset = 1023;
+      *alignment = 4;
+      return;
+    }
+
+  if (TARGET_NEON
+      && (VALID_NEON_DREG_MODE (mode) || VALID_NEON_QREG_MODE (mode)))
+    {
+      *min_offset = -1023;
+      *max_offset = 1015;
+      *alignment = 4;
+      return;
+    }
+
+  if (TARGET_REALLY_IWMMXT && VALID_IWMMXT_REG_MODE (mode))
+    {
+      *min_offset = -1023;
+      *max_offset = 1023;
+      *alignment = 4;
+      return;
+    }
+
+  if (mode == DImode || mode == DFmode)
+    {
+      if (TARGET_LDRD)
+        {
+          *min_offset = -255;
+          *max_offset = 255;
+        }
+      else
+        {
+          *min_offset = -4095;
+          *max_offset = 4091;
+        }
+      *alignment = 1;
+      return;
+    }
+
+  if (arm_arch4)
+    {
+      if (mode == HImode || mode == QImode)
+        *max_offset = 255;
+      else
+        *max_offset = 4095;
+    }
+  else
+    *max_offset = (mode == HImode) ? 4094 : 4095;
+  *min_offset = - (*max_offset);
+  *alignment = 1;
+}
+
+/* In Thumb-2 mode, return the min and max address offsets relative to the
+   specified base register.  */
+
+static void
+thumb2_address_offset_range (enum machine_mode mode, const_rtx base_reg,
+                             HOST_WIDE_INT *min_offset,
+                             HOST_WIDE_INT *max_offset,
+                             int *alignment)
+{
+  if (TARGET_HARD_FLOAT && (TARGET_FPA || TARGET_MAVERICK)
+      && (GET_MODE_CLASS (mode) == MODE_FLOAT
+          || (TARGET_MAVERICK && mode == DImode)))
+    {
+      *min_offset = -1023;
+      *max_offset = 1023;
+      *alignment = 4;
+      return;
+    }
+
+  if (TARGET_REALLY_IWMMXT && VALID_IWMMXT_REG_MODE (mode))
+    {
+      if (!TARGET_LDRD || mode != DImode)
+        {
+          *min_offset = -1023;
+          *max_offset = 1023;
+          *alignment = 4;
+          return;
+        }
+    }
+
+  if (TARGET_NEON
+      && (VALID_NEON_DREG_MODE (mode) || VALID_NEON_QREG_MODE (mode)))
+    {
+      *min_offset = -1023;
+      *max_offset = 1015;
+      *alignment = 4;
+      return;
+    }
+
+  if (mode == DImode || mode == DFmode)
+    {
+      *min_offset = -255;
+      *max_offset = 255;
+      *alignment = 4;
+      return;
+    }
+
+  *min_offset = -255;
+  *max_offset = 4095;
+  *alignment = 1;
+}
+
+/* In Thumb-1 mode, return the min and max address offsets relative to the
+   specified base register.  */
+
+static void
+thumb1_address_offset_range (enum machine_mode mode, const_rtx base_reg,
+                             HOST_WIDE_INT *min_offset,
+                             HOST_WIDE_INT *max_offset,
+                             int *alignment)
+{
+  *min_offset = 0;
+  switch (GET_MODE_SIZE (mode))
+    {
+    case 1:
+      *max_offset = 31;
+      *alignment = 1;
+      break;
+
+    case 2:
+      *max_offset = 63;
+      *alignment = 2;
+      break;
+
+    default:
+      *max_offset = 127 - GET_MODE_SIZE (mode);
+      *alignment = 4;
+    }
+}
+
+/* Return the min and max address offsets relative to the specified
+   base register.  */
+
+void
+arm_address_offset_range (enum machine_mode mode, const_rtx base_reg,
+                          HOST_WIDE_INT *min_offset,
+                          HOST_WIDE_INT *max_offset,
+                          int *alignment)
+{
+  HOST_WIDE_INT mask;
+
+  /* Currently we only deal with pseudo base registers.  */
+  gcc_assert (!base_reg || !HARD_REGISTER_P (base_reg));
+
+  if (TARGET_ARM)
+    arm_address_offset_range_1 (mode, base_reg,
+                                min_offset, max_offset, alignment);
+  else if (TARGET_THUMB2)
+    thumb2_address_offset_range (mode, base_reg,
+                                 min_offset, max_offset, alignment);
+  else /* (TARGET_THUMB1) */
+    thumb1_address_offset_range (mode, base_reg,
+                                 min_offset, max_offset, alignment);
+
+  mask = ~(*alignment - 1);
+  *max_offset = *max_offset & mask;
+  *min_offset = (*min_offset + *alignment - 1) & mask;
+}
+
 /* Build the SYMBOL_REF for __tls_get_addr.  */

 static GTY(()) rtx tls_get_addr_libfunc;


* Re: [PATCH: PR/40314] extract common address for memory access to fields of large struct
  2009-05-31  9:31 [PATCH: PR/40314] extract common address for memory access to fields of large struct Carrot Wei
@ 2009-06-01  8:13 ` Steven Bosscher
  2009-06-03  7:00   ` Carrot Wei
  0 siblings, 1 reply; 8+ messages in thread
From: Steven Bosscher @ 2009-06-01  8:13 UTC (permalink / raw)
  To: Carrot Wei; +Cc: gcc-patches

On Sun, May 31, 2009 at 11:12 AM, Carrot Wei <carrot@google.com> wrote:
> ChangeLog:
> 2009-05-31  Carrot Wei  <carrot@google.com>
>
>        PR/40314
>        * config/arm/arm.c: add functions arm_address_offset_range,
>        thumb1_address_offset_range, thumb2_address_offset_range,
>        arm_address_offset_range_1.
>        * config/arm/arm-protos.h: add prototype of arm_address_offset_range.
>        * doc/tm.texi: add entry for TARGET_ADDRESS_OFFSET_RANGE.
>        * Makefile.in: add rule for extract_common_addr.
>        * target.h: add a hook function get_address_offset_range.
>        * target-def.h: add a hook function TARGET_ADDRESS_OFFSET_RANGE.
>        * tree-pass.h: add a pass_extract_common_addr.
>        * passes.c: add a pass_extract_common_addr.
>        * extract-common-addr.c: implementation of the new
>        pass_extract_common_addr.
>
> Test:
> This patch was applied to trunk GCC and tested on the arm emulator with newlib.
> No new failures.

And what does it do for code size?  Compile time?  Do you know CSiBE?
It's a nice benchmark to check those two things, too.


And then some random comments:

First, it would be nice to have some general description at the top of
the file of how the algorithm works.

It looks to me like you're collecting base+offset per basic block (so
the optimization is local) and then process the recorded information
globally.  Why not record-and-process one basic block at a time?

Perhaps you can add some comments in process_one_base in the part with
the fuzzy math?  ;-)  I got all confused by how you calculate the new
base+offset and alignment.


> +/* Given a register rtx, look up the offset_table. If we found it and they
> +   have same machine mode returns the pointer to the offset information.
> +   Otherwise returns NULL.  */
> +
> +static inline struct offset_info *
> +find_reg (rtx x)

A more descriptive function name, perhaps?  Everyone else also calls
their functions find_reg already... :-)


> +/* If x is MEM, and the address is in the form of
> +   (plus base_reg offset_reg)
> +   remove the corresponding item from offset table and
> +   insert it into a base_reg_info object. */

Describe the return value of the function too, please.  Just saying
it's called via for_each_rtx is enough.
And again, a more descriptive function name, perhaps?  Explain what is
passed in DATA.


> +static int
> +find_mem (rtx x, void *data)
> +{
(...)
> +  /* If the offset is not large enough, ignore it.  */
> +  targetm.get_address_offset_range (GET_MODE (x), reg1,
> +                                    &min_offset, &max_offset, &alignment);
> +  if (offset_item->offset >= min_offset
> +      && offset_item->offset <= max_offset)
> +    return -1;
> +
> +  /* If the base register and offset register don't have the same
> +     machine mode, ignore this offset.  */
> +  if (GET_MODE (reg1) != offset_item->mode)
> +    return -1;

Move this second check before the targetm.get_address_offset_range check?


> +  /* Look up the corresponding base register information.
> +     If there is no one, create it.  */
> +  base_reg_item = &base_table[REGNO (reg1)];
> +  if (base_reg_item->offsets == NULL)
> +    {
> +      base_reg_item->mode = GET_MODE (reg1);
> +      base_reg_item->base_regno = REGNO (reg1);
> +      base_reg_item->base_reg = reg1;
> +    }
> +
> +  /* Associate this offset information with the base information.  */
> +  offset_item->mem = x;
> +  offset_item->mem_insn = (rtx) data;
> +  offset_item->next = base_reg_item->offsets;
> +  base_reg_item->offsets = offset_item;

Why collect them in a list here when you later need the offsets in an array?


> +/* If a reg is clobbered in insn, remove it from offset table.  */
> +
> +static void
> +clobber_insn_reg (rtx insn)
> +{

Can write this and clobber_reg by walking over DF_INSN_DEFS looking
for refs with DF_REF_MUST_CLOBBER, and with DF_REF_REAL_REG.


> +/* If this insn is in the form of
> +   (set reg offset)
> +   insert this offset info into offset table.  */
> +
> +static void
> +find_offset (rtx insn)
> +{
> +  rtx src, dest;
> +  unsigned int regno;
> +  HOST_WIDE_INT offset_val;
> +  struct offset_info *offset;
> +  rtx x = single_set (insn);
> +  if (!x)
> +    return;
> +
> +  dest = SET_DEST (x);
> +  if (!REG_P (dest))
> +    return;
> +  regno = REGNO (dest);
> +  if (regno < FIRST_PSEUDO_REGISTER)
> +    return;

if (HARD_REGISTER_NUM_P (regno))
  return;

Same with other FIRST_PSEUDO_REGISTER.  You should use
HARD_REGISTER_NUM_P and HARD_REGISTER_P instead of comparing against
FIRST_PSEUDO_REGISTER.


> +  src = SET_SRC (x);
> +  if (GET_CODE (src) != CONST_INT)
> +    return;
> +  offset_val = INTVAL (src);
> +  if (offset_val <= 0)
> +    return;
> +
> +  offset = (struct offset_info *) pool_alloc (offset_element_pool);
> +  offset->mem = NULL;
> +  offset->offset_regno = regno;
> +  offset->mode = GET_MODE (dest);
> +  offset->offset = offset_val;
> +  offset->next = NULL;
> +  offset_table[regno] = offset;

Maybe pool_free the existing offset_table[regno] entry?  Or are there
still base_reg_items that can point to that entry?  Or are you sure
that offset_table[regno] is always NULL at this point?  If the latter,
maybe gcc_assert that?


> +/* Look for optimization opportunities.  */
> +
> +static void
> +collect_candidates (void)
> +{
> +  basic_block block;
> +
> +  FOR_EACH_BB (block)
> +    {
> +      rtx insn;
> +      memset (offset_table, 0, sizeof (struct offset_info *) * max_reg_num ());

You may want to experiment with something to track the registers for
which you have recorded something in the offset_table.  I think the
number of offsets you find is generally very small compared to
max_reg_num().  It may speed things up a bit if you keep a bitmap of
registers that you recorded something for and only clean those
entries, instead of using memset.
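To illustrate the suggestion, here is a minimal self-contained C sketch of the touched-entry idea (the names `offset_table`, `record_offset` and `reset_offset_table` are illustrative stand-ins, not the identifiers in the actual patch; the real code would track pseudo regnos in a bitmap):

```c
#include <stddef.h>

#define MAX_REGS 10000

/* Hypothetical stand-ins for the pass's per-block table.  */
static void *offset_table[MAX_REGS];
static unsigned touched[MAX_REGS]; /* regnos written this block */
static size_t n_touched;

static void
record_offset (unsigned regno, void *info)
{
  /* Remember each regno the first time we write its entry.  */
  if (offset_table[regno] == NULL)
    touched[n_touched++] = regno;
  offset_table[regno] = info;
}

/* Instead of memset'ing all MAX_REGS entries at the start of every
   basic block, clear only the handful of entries actually written.  */
static void
reset_offset_table (void)
{
  for (size_t i = 0; i < n_touched; i++)
    offset_table[touched[i]] = NULL;
  n_touched = 0;
}
```

The reset cost then scales with the number of recorded offsets rather than with max_reg_num().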


> +}
> +
> +/* Optimize a list of memory addressing relative to one base.  */
> +static void
> +process_one_base (struct base_reg_info *base_reg)
> +{

Describe the function arguments please, always.

> +  HOST_WIDE_INT max_offset, min_offset;
> +  HOST_WIDE_INT base_offset, max_base;
> +  int alignment, new_alignment;
> +
> +  struct offset_info **offset_array;
> +  struct offset_info *offset = base_reg->offsets;
> +  int count = 0;
> +  int i, group_count;
> +
> +  /* Count the number of memory accesses using the same base register. */
> +  while (offset)
> +    {
> +      count++;
> +      offset = offset->next;
> +    }
> +  if (count < 2)
> +    return;
> +
> +  /* Sort the memory accesses according to their offset.  */
> +  offset_array = XNEWVEC (struct offset_info *, count);
> +  offset = base_reg->offsets;
> +  for (i = 0; i < count; i++)
> +    {
> +      offset_array[i] = offset;
> +      offset = offset->next;
> +    }
> +  qsort (offset_array, count, sizeof(struct offset_info *), compare_offset);

So you have recorded the offsets in the list base_reg->offsets.  But
you want to know the number of offsets of a base reg and you want to
sort the list.  So you walk the list to count and the stuff the list
into an array to sort it.

Why not record the offsets into an array straight away, then?  Looks
like you could better use a VEC instead, e.g.:

typedef struct offset_info *offset_info_t
DEF_VEC_P(offset_info_t);
DEF_VEC_ALLOC_P(offset_info_t, heap);

/* Memory base address information.  */
struct base_reg_info
{
  /* All offset information used with this base register.  */
  VEC(offset_info_t, heap) *offsets;
(...)
};

and use VEC_safe_push to record the offset_info entries.  You can use
VEC_length to get the number of memory accesses using the same base
register.  You can qsort a VEC in place, grep for "qsort (VEC_address"
for examples.
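Since compare_offset's body is not shown in the thread, here is an assumed sketch of the comparator shape this advice implies: when qsorting an array of offset_info pointers (whether a plain array or the storage behind VEC_address), the comparator receives pointers to the array elements, so it needs double indirection:

```c
#include <stdlib.h>

/* Minimal stand-in for the pass's offset_info; only the field the
   comparator needs.  */
struct offset_info
{
  long offset;
};

/* qsort comparator over an array of offset_info pointers.  Each
   argument points at an array slot, i.e. at a pointer, hence the
   double dereference.  */
static int
compare_offset (const void *a, const void *b)
{
  const struct offset_info *x = *(const struct offset_info *const *) a;
  const struct offset_info *y = *(const struct offset_info *const *) b;
  if (x->offset < y->offset)
    return -1;
  return x->offset > y->offset;
}
```

Comparing with subtraction (`x->offset - y->offset`) is tempting but can overflow for large HOST_WIDE_INT offsets, so explicit comparisons are safer.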


> +static unsigned int
> +rest_of_handle_extract_common_addr (void)
> +{
> +  int i;
> +  int reg_count = max_reg_num();
> +  offset_table = XNEWVEC (struct offset_info *, reg_count);
> +  base_table = XNEWVEC (struct base_reg_info, reg_count);
> +  memset (offset_table, 0, sizeof (struct offset_info *) * reg_count);
> +  memset (base_table, 0, sizeof (struct base_reg_info) * reg_count);

XCNEWVEC instead of XNEWVEC+memset.


> +struct rtl_opt_pass pass_extract_common_addr =
> +{
> + {
> +  RTL_PASS,
> +  "extract_common_addr",                /* name */
> +  gate_handle_common_addr,              /* gate */
> +  rest_of_handle_extract_common_addr,   /* execute */
> +  NULL,                                 /* sub */
> +  NULL,                                 /* next */
> +  0,                                    /* static_pass_number */
> +  TV_NONE,                              /* tv_id */

New pass needs a new timevar.


Ciao!
Steven


* Re: [PATCH: PR/40314] extract common address for memory access to   fields of large struct
  2009-06-01  8:13 ` Steven Bosscher
@ 2009-06-03  7:00   ` Carrot Wei
  2009-06-12 23:19     ` Steven Bosscher
  0 siblings, 1 reply; 8+ messages in thread
From: Carrot Wei @ 2009-06-03  7:00 UTC (permalink / raw)
  To: Steven Bosscher; +Cc: gcc-patches

On Mon, Jun 1, 2009 at 4:13 PM, Steven Bosscher <stevenb.gcc@gmail.com> wrote:
> On Sun, May 31, 2009 at 11:12 AM, Carrot Wei <carrot@google.com> wrote:
>> ChangeLog:
>> 2009-05-31  Carrot Wei  <carrot@google.com>
>>
>>        PR/40314
>>        * config/arm/arm.c: add functions arm_address_offset_range,
>>        thumb1_address_offset_range, thumb2_address_offset_range,
>>        arm_address_offset_range_1.
>>        * config/arm/arm-protos.h: add prototype of arm_address_offset_range.
>>        * doc/tm.texi: add entry for TARGET_ADDRESS_OFFSET_RANGE.
>>        * Makefile.in: add rule for extract_common_addr.
>>        * target.h: add a hook function get_address_offset_range.
>>        * target-def.h: add a hook function TARGET_ADDRESS_OFFSET_RANGE.
>>        * tree-pass.h: add a pass_extract_common_addr.
>>        * passes.c: add a pass_extract_common_addr.
>>        * extract-common-addr.c: implementation of the new
>>        pass_extract_common_addr.
>>
>> Test:
>> This patch was applied to trunk GCC and tested on the arm emulator with newlib.
>> No new failures.
>
> And what does it do for code size?  Compile time?  Do you know CSiBE?
> It's a nice benchmark to check those two things, too.
>
I've tested CSiBE; there are nearly no changes in code size or compile
time. It seems there is no large structure in CSiBE that triggers this
optimization. For mcf from CPU SPEC 2006, this optimization reduces the
static instruction count by about 7%.

>
> And then some random comments:
>
> First, it would be nice to have some general description at the top of
> the file of how the algorithm works.
>
> It looks to me like you're collecting base+offset per basic block (so
> the optimization is local) and then process the recorded information
> globally.  Why not record-and-process one basic block at a time?
>
Recording and processing one basic block at a time may lose optimization
opportunities. In the following example the two accesses to structure
fields are in different basic blocks, so they won't be optimized if we
limit the scope to a basic block, but we can catch them if we process
the recorded information globally.

if (p->field1)
  return p->field2;

> Perhaps you can add some comments in process_one_base in the part with
> the fuzzy math?  ;-)  I got all confused by how you calculate the new
> base+offset and alignment.
>
done
>
>> +/* Given a register rtx, look up the offset_table. If we found it and they
>> +   have same machine mode returns the pointer to the offset information.
>> +   Otherwise returns NULL.  */
>> +
>> +static inline struct offset_info *
>> +find_reg (rtx x)
>
> A more descriptive function name, perhaps?  Everyone else also calls
> their functions find_reg already... :-)
>
done
>
>> +/* If x is MEM, and the address is in the form of
>> +   (plus base_reg offset_reg)
>> +   remove the corresponding item from offset table and
>> +   insert it into a base_reg_info object. */
>
> Describe the return value of the function too, please.  Just saying
> it's called via for_each_rtx is enough.
> And again, a more descriptive function name, perhaps?  Explain what is
> passed in DATA.
>
done
>
>> +static int
>> +find_mem (rtx x, void *data)
>> +{
> (...)
>> +  /* If the offset is not large enough, ignore it.  */
>> +  targetm.get_address_offset_range (GET_MODE (x), reg1,
>> +                                    &min_offset, &max_offset, &alignment);
>> +  if (offset_item->offset >= min_offset
>> +      && offset_item->offset <= max_offset)
>> +    return -1;
>> +
>> +  /* If the base register and offset register don't have the same
>> +     machine mode, ignore this offset.  */
>> +  if (GET_MODE (reg1) != offset_item->mode)
>> +    return -1;
>
> Move this second check before the targetm.get_address_offset_range check?
>
done
>
>> +  /* Look up the corresponding base register information.
>> +     If there is no one, create it.  */
>> +  base_reg_item = &base_table[REGNO (reg1)];
>> +  if (base_reg_item->offsets == NULL)
>> +    {
>> +      base_reg_item->mode = GET_MODE (reg1);
>> +      base_reg_item->base_regno = REGNO (reg1);
>> +      base_reg_item->base_reg = reg1;
>> +    }
>> +
>> +  /* Associate this offset information with the base information.  */
>> +  offset_item->mem = x;
>> +  offset_item->mem_insn = (rtx) data;
>> +  offset_item->next = base_reg_item->offsets;
>> +  base_reg_item->offsets = offset_item;
>
> Why collect them in a list here when you later need the offsets in an array?
>
>
>> +/* If a reg is clobbered in insn, remove it from offset table.  */
>> +
>> +static void
>> +clobber_insn_reg (rtx insn)
>> +{
>
> Can write this and clobber_reg by walking over DF_INSN_DEFS looking
> for refs with DF_REF_MUST_CLOBBER, and with DF_REF_REAL_REG.
>
done
>
>> +/* If this insn is in the form of
>> +   (set reg offset)
>> +   insert this offset info into offset table.  */
>> +
>> +static void
>> +find_offset (rtx insn)
>> +{
>> +  rtx src, dest;
>> +  unsigned int regno;
>> +  HOST_WIDE_INT offset_val;
>> +  struct offset_info *offset;
>> +  rtx x = single_set (insn);
>> +  if (!x)
>> +    return;
>> +
>> +  dest = SET_DEST (x);
>> +  if (!REG_P (dest))
>> +    return;
>> +  regno = REGNO (dest);
>> +  if (regno < FIRST_PSEUDO_REGISTER)
>> +    return;
>
> if (HARD_REGISTER_NUM_P (regno))
>  return;
>
done

> Same with other FIRST_PSEUDO_REGISTER.  You should use
> HARD_REGISTER_NUM_P and HARD_REGISTER_P instead of comparing against
> FIRST_PSEUDO_REGISTER.
>
>
>> +  src = SET_SRC (x);
>> +  if (GET_CODE (src) != CONST_INT)
>> +    return;
>> +  offset_val = INTVAL (src);
>> +  if (offset_val <= 0)
>> +    return;
>> +
>> +  offset = (struct offset_info *) pool_alloc (offset_element_pool);
>> +  offset->mem = NULL;
>> +  offset->offset_regno = regno;
>> +  offset->mode = GET_MODE (dest);
>> +  offset->offset = offset_val;
>> +  offset->next = NULL;
>> +  offset_table[regno] = offset;
>
> Maybe pool_free the existing offset_table[regno] entry?  Or are there
> still base_reg_items that can point to that entry?  Or are you sure
> that offset_table[regno] is always NULL at this point?  If the latter,
> maybe gcc_assert that?
>
Function clobber_insn_reg clears offset_table[regno] first, so
offset_table[regno] should always be NULL at this point. Added a
gcc_assert.
>
>> +/* Look for optimization opportunities.  */
>> +
>> +static void
>> +collect_candidates (void)
>> +{
>> +  basic_block block;
>> +
>> +  FOR_EACH_BB (block)
>> +    {
>> +      rtx insn;
>> +      memset (offset_table, 0, sizeof (struct offset_info *) * max_reg_num ());
>
> You may want to experiment with something to track the registers for
> which you have recorded something in the offset_table.  I think the
> number of offsets you find is generally very small compared to
> max_reg_num().  It may speed things up a bit if you keep a bitmap of
> registers that you recorded something for and only clean those
> entries, instead of using memset.
>
Added a bitmap to track the non-empty entries and speed up resetting.
>
>> +}
>> +
>> +/* Optimize a list of memory addressing relative to one base.  */
>> +static void
>> +process_one_base (struct base_reg_info *base_reg)
>> +{
>
> Describe the function arguments please, always.
>
done

>> +  HOST_WIDE_INT max_offset, min_offset;
>> +  HOST_WIDE_INT base_offset, max_base;
>> +  int alignment, new_alignment;
>> +
>> +  struct offset_info **offset_array;
>> +  struct offset_info *offset = base_reg->offsets;
>> +  int count = 0;
>> +  int i, group_count;
>> +
>> +  /* Count the number of memory accesses using the same base register. */
>> +  while (offset)
>> +    {
>> +      count++;
>> +      offset = offset->next;
>> +    }
>> +  if (count < 2)
>> +    return;
>> +
>> +  /* Sort the memory accesses according to their offset.  */
>> +  offset_array = XNEWVEC (struct offset_info *, count);
>> +  offset = base_reg->offsets;
>> +  for (i = 0; i < count; i++)
>> +    {
>> +      offset_array[i] = offset;
>> +      offset = offset->next;
>> +    }
>> +  qsort (offset_array, count, sizeof(struct offset_info *), compare_offset);
>
> So you have recorded the offsets in the list base_reg->offsets.  But
> you want to know the number of offsets of a base reg and you want to
> sort the list.  So you walk the list to count and the stuff the list
> into an array to sort it.
>
> Why not record the offsets into an array straight away, then?  Looks
> like you could better use a VEC instead, e.g.:
>
> typedef struct offset_info *offset_info_t
> DEF_VEC_P(offset_info_t);
> DEF_VEC_ALLOC_P(offset_info_t, heap);
>
> /* Memory base address information.  */
> struct base_reg_info
> {
>  /* All offset information used with this base register.  */
>  VEC(offset_info_t, heap) *offsets;
> (...)
> };
>
> and use VEC_safe_push to record the offset_info entries.  You can use
> VEC_length to get the number of memory accesses using the same base
> register.  You can qsort a VEC in place, grep for "qsort (VEC_address"
> for examples.
>
This is a really good method. Thank you.
>
>> +static unsigned int
>> +rest_of_handle_extract_common_addr (void)
>> +{
>> +  int i;
>> +  int reg_count = max_reg_num();
>> +  offset_table = XNEWVEC (struct offset_info *, reg_count);
>> +  base_table = XNEWVEC (struct base_reg_info, reg_count);
>> +  memset (offset_table, 0, sizeof (struct offset_info *) * reg_count);
>> +  memset (base_table, 0, sizeof (struct base_reg_info) * reg_count);
>
> XCNEWVEC instead of XNEWVEC+memset.
>
done
>
>> +struct rtl_opt_pass pass_extract_common_addr =
>> +{
>> + {
>> +  RTL_PASS,
>> +  "extract_common_addr",                /* name */
>> +  gate_handle_common_addr,              /* gate */
>> +  rest_of_handle_extract_common_addr,   /* execute */
>> +  NULL,                                 /* sub */
>> +  NULL,                                 /* next */
>> +  0,                                    /* static_pass_number */
>> +  TV_NONE,                              /* tv_id */
>
> New pass needs a new timevar.
>
>
> Ciao!
> Steven
>

Following is my modified patch.

thanks
Carrot


ChangeLog:
2009-05-31  Carrot Wei  <carrot@google.com>

        PR/40314
        * config/arm/arm.c: add functions arm_address_offset_range,
        thumb1_address_offset_range, thumb2_address_offset_range,
        arm_address_offset_range_1.
        * config/arm/arm-protos.h: add prototype of arm_address_offset_range.
        * doc/tm.texi: add entry for TARGET_ADDRESS_OFFSET_RANGE.
        * Makefile.in: add rule for extract_common_addr.
        * target.h: add a hook function get_address_offset_range.
        * target-def.h: add a hook function TARGET_ADDRESS_OFFSET_RANGE.
        * tree-pass.h: add a pass_extract_common_addr.
        * passes.c: add a pass_extract_common_addr.
        * extract-common-addr.c: implementation of the new
        pass_extract_common_addr.
        * timevar.def: add a new tv_id TV_EXTRACT_COMMON_ADDR.


Index: timevar.def
===================================================================
--- timevar.def (revision 148009)
+++ timevar.def (working copy)
@@ -150,6 +150,7 @@ DEFTIMEVAR (TV_VARCONST              , "
 DEFTIMEVAR (TV_LOWER_SUBREG         , "lower subreg")
 DEFTIMEVAR (TV_JUMP                  , "jump")
 DEFTIMEVAR (TV_FWPROP                , "forward prop")
+DEFTIMEVAR (TV_EXTRACT_COMMON_ADDR   , "extract common address")
 DEFTIMEVAR (TV_CSE                   , "CSE")
 DEFTIMEVAR (TV_DCE                   , "dead code elimination")
 DEFTIMEVAR (TV_DSE1                  , "dead store elim1")

Index: tree-pass.h
===================================================================
--- tree-pass.h (revision 148009)
+++ tree-pass.h (working copy)
@@ -433,6 +433,7 @@ extern struct rtl_opt_pass pass_rtl_fwpr
 extern struct rtl_opt_pass pass_rtl_fwprop_addr;
 extern struct rtl_opt_pass pass_jump2;
 extern struct rtl_opt_pass pass_lower_subreg;
+extern struct rtl_opt_pass pass_extract_common_addr;
 extern struct rtl_opt_pass pass_cse;
 extern struct rtl_opt_pass pass_fast_rtl_dce;
 extern struct rtl_opt_pass pass_ud_rtl_dce;

Index: target.h
===================================================================
--- target.h    (revision 148009)
+++ target.h    (working copy)
@@ -619,6 +619,10 @@ struct gcc_target
   /* Given an address RTX, say whether it is valid.  */
   bool (* legitimate_address_p) (enum machine_mode, rtx, bool);

+  /* Given a base reg, return the min and max address offsets.  */
+  void (* get_address_offset_range)(enum machine_mode, const_rtx,
+                                    HOST_WIDE_INT*, HOST_WIDE_INT*, int*);
+
   /* True if the given constant can be put into an object_block.  */
   bool (* use_blocks_for_constant_p) (enum machine_mode, const_rtx);

Index: target-def.h
===================================================================
--- target-def.h        (revision 148009)
+++ target-def.h        (working copy)
@@ -490,6 +490,7 @@
 #define TARGET_LEGITIMIZE_ADDRESS default_legitimize_address
 #define TARGET_DELEGITIMIZE_ADDRESS hook_rtx_rtx_identity
 #define TARGET_LEGITIMATE_ADDRESS_P default_legitimate_address_p
+#define TARGET_ADDRESS_OFFSET_RANGE NULL
 #define TARGET_USE_BLOCKS_FOR_CONSTANT_P hook_bool_mode_const_rtx_false
 #define TARGET_MIN_ANCHOR_OFFSET 0
 #define TARGET_MAX_ANCHOR_OFFSET 0
@@ -879,6 +880,7 @@
   TARGET_LEGITIMIZE_ADDRESS,                   \
   TARGET_DELEGITIMIZE_ADDRESS,                 \
   TARGET_LEGITIMATE_ADDRESS_P,                 \
+  TARGET_ADDRESS_OFFSET_RANGE,                  \
   TARGET_USE_BLOCKS_FOR_CONSTANT_P,            \
   TARGET_MIN_ANCHOR_OFFSET,                    \
   TARGET_MAX_ANCHOR_OFFSET,                    \

Index: passes.c
===================================================================
--- passes.c    (revision 148009)
+++ passes.c    (working copy)
@@ -725,6 +725,7 @@ init_optimization_passes (void)
       NEXT_PASS (pass_jump2);
       NEXT_PASS (pass_lower_subreg);
       NEXT_PASS (pass_df_initialize_opt);
+      NEXT_PASS (pass_extract_common_addr);
       NEXT_PASS (pass_cse);
       NEXT_PASS (pass_rtl_fwprop);
       NEXT_PASS (pass_rtl_cprop);

Index: Makefile.in
===================================================================
--- Makefile.in (revision 148009)
+++ Makefile.in (working copy)
@@ -1129,6 +1129,7 @@ OBJS-common = \
        explow.o \
        expmed.o \
        expr.o \
+       extract-common-addr.o \
        final.o \
        fixed-value.o \
        fold-const.o \
@@ -2704,6 +2705,9 @@ ipa-struct-reorg.o: ipa-struct-reorg.c i
    $(TREE_PASS_H) opts.h $(IPA_TYPE_ESCAPE_H) $(TREE_DUMP_H) \
    $(GIMPLE_H)

+extract-common-addr.o : extract-common-addr.c $(CONFIG_H) $(SYSTEM_H) \
+   coretypes.h $(BITMAP_H) $(TM_H) $(RTL_H) $(FLAGS_H) insn-config.h $(DF_H) \
+   $(RECOG_H) $(EXPR_H) $(BASIC_BLOCK_H) alloc-pool.h $(TARGET_H) \
+   $(TREE_PASS_H)
 coverage.o : coverage.c $(GCOV_IO_H) $(CONFIG_H) $(SYSTEM_H) coretypes.h \
    $(TM_H) $(RTL_H) $(TREE_H) $(FLAGS_H) output.h $(REGS_H) $(EXPR_H) \
    $(FUNCTION_H) $(TOPLEV_H) $(GGC_H) langhooks.h $(COVERAGE_H) gt-coverage.h \

Index: tm.texi
===================================================================
--- tm.texi     (revision 148009)
+++ tm.texi     (working copy)
@@ -5564,6 +5564,11 @@ holding the constant.  This restriction
 of TLS symbols for various targets.
 @end deftypefn

+@deftypefn {Target Hook} void TARGET_ADDRESS_OFFSET_RANGE (enum machine_mode @var{mode}, const_rtx @var{x}, HOST_WIDE_INT *@var{min_offset}, HOST_WIDE_INT *@var{max_offset}, int *@var{alignment})
+Given a memory access @var{mode} and base register @var{x}, this hook should
+return the offset range and alignment in a legal (base + offset) addressing.
+@end deftypefn
+
 @deftypefn {Target Hook} bool TARGET_USE_BLOCKS_FOR_CONSTANT_P (enum machine_mode @var{mode}, rtx @var{x})
 This hook should return true if pool entries for constant @var{x} can
 be placed in an @code{object_block} structure.  @var{mode} is the mode

Index: arm-protos.h
===================================================================
--- arm-protos.h        (revision 148009)
+++ arm-protos.h        (working copy)
@@ -58,6 +58,8 @@ extern int arm_legitimate_address_outer_
 extern int thumb_legitimate_offset_p (enum machine_mode, HOST_WIDE_INT);
 extern rtx thumb_legitimize_reload_address (rtx *, enum machine_mode, int, int,
                                            int);
+extern void arm_address_offset_range (enum machine_mode, const_rtx,
+                                      HOST_WIDE_INT *, HOST_WIDE_INT *, int *);
 extern int arm_const_double_rtx (rtx);
 extern int neg_const_double_rtx_ok_for_fpa (rtx);
 extern int vfp3_const_double_rtx (rtx);

Index: arm.c
===================================================================
--- arm.c       (revision 148009)
+++ arm.c       (working copy)
@@ -407,6 +407,9 @@ static bool arm_allocate_stack_slots_for
 #undef TARGET_LEGITIMATE_ADDRESS_P
 #define TARGET_LEGITIMATE_ADDRESS_P    arm_legitimate_address_p

+#undef  TARGET_ADDRESS_OFFSET_RANGE
+#define TARGET_ADDRESS_OFFSET_RANGE arm_address_offset_range
+
 struct gcc_target targetm = TARGET_INITIALIZER;

 /* Obstack for minipool constant handling.  */
@@ -4476,6 +4479,180 @@ arm_legitimate_address_p (enum machine_m
     return thumb1_legitimate_address_p (mode, x, strict_p);
 }

+/* In arm mode, returns the min and max address offset relative to the
+   specified base register.  */
+
+static void
+arm_address_offset_range_1 (enum machine_mode mode, const_rtx base_reg,
+                            HOST_WIDE_INT *min_offset,
+                            HOST_WIDE_INT *max_offset,
+                            int *alignment)
+{
+  if (TARGET_HARD_FLOAT && (TARGET_FPA || TARGET_MAVERICK)
+      && (GET_MODE_CLASS (mode) == MODE_FLOAT
+          || (TARGET_MAVERICK && mode == DImode)))
+    {
+      *min_offset = -1023;
+      *max_offset = 1023;
+      *alignment = 4;
+      return;
+    }
+
+  if (TARGET_NEON
+      && (VALID_NEON_DREG_MODE (mode) || VALID_NEON_QREG_MODE (mode)))
+    {
+      *min_offset = -1023;
+      *max_offset = 1015;
+      *alignment = 4;
+      return;
+    }
+
+  if (TARGET_REALLY_IWMMXT && VALID_IWMMXT_REG_MODE (mode))
+    {
+      *min_offset = -1023;
+      *max_offset = 1023;
+      *alignment = 4;
+      return;
+    }
+
+  if (mode == DImode || mode == DFmode)
+    {
+      if (TARGET_LDRD)
+        {
+          *min_offset = -255;
+          *max_offset = 255;
+        }
+      else
+        {
+          *min_offset = -4095;
+          *max_offset = 4091;
+        }
+      *alignment = 1;
+      return;
+    }
+
+  if (arm_arch4)
+    {
+      if (mode == HImode || mode == QImode)
+        *max_offset = 255;
+      else
+        *max_offset = 4095;
+    }
+  else
+    *max_offset = (mode == HImode) ? 4094 : 4095;
+  *min_offset = - (*max_offset);
+  *alignment = 1;
+}
+
+/* In thumb2 mode, returns the min and max address offset relative to the
+   specified base register.  */
+
+static void
+thumb2_address_offset_range (enum machine_mode mode, const_rtx base_reg,
+                             HOST_WIDE_INT *min_offset,
+                             HOST_WIDE_INT *max_offset,
+                             int *alignment)
+{
+  if (TARGET_HARD_FLOAT && (TARGET_FPA || TARGET_MAVERICK)
+      && (GET_MODE_CLASS (mode) == MODE_FLOAT
+          || (TARGET_MAVERICK && mode == DImode)))
+    {
+      *min_offset = -1023;
+      *max_offset = 1023;
+      *alignment = 4;
+      return;
+    }
+
+  if (TARGET_REALLY_IWMMXT && VALID_IWMMXT_REG_MODE (mode))
+    {
+      if (!TARGET_LDRD || mode != DImode)
+        {
+          *min_offset = -1023;
+          *max_offset = 1023;
+          *alignment = 4;
+          return;
+        }
+    }
+
+  if (TARGET_NEON
+      && (VALID_NEON_DREG_MODE (mode) || VALID_NEON_QREG_MODE (mode)))
+    {
+      *min_offset = -1023;
+      *max_offset = 1015;
+      *alignment = 4;
+      return;
+    }
+
+  if (mode == DImode || mode == DFmode)
+    {
+      *min_offset = -255;
+      *max_offset = 255;
+      *alignment = 4;
+      return;
+    }
+
+  *min_offset = -255;
+  *max_offset = 4095;
+  *alignment = 1;
+}
+
+/* In thumb1 mode, returns the min and max address offset relative to the
+   specified base register.  */
+
+static void
+thumb1_address_offset_range (enum machine_mode mode, const_rtx base_reg,
+                             HOST_WIDE_INT *min_offset,
+                             HOST_WIDE_INT *max_offset,
+                             int *alignment)
+{
+  *min_offset = 0;
+  switch (GET_MODE_SIZE (mode))
+    {
+    case 1:
+      *max_offset = 31;
+      *alignment = 1;
+      break;
+
+    case 2:
+      *max_offset = 63;
+      *alignment = 2;
+      break;
+
+    default:
+      *max_offset = 127 - GET_MODE_SIZE (mode);
+      *alignment = 4;
+    }
+}
+
+/* Returns the min and max address offset relative to the specified
+   base register.  */
+
+void
+arm_address_offset_range (enum machine_mode mode, const_rtx base_reg,
+                          HOST_WIDE_INT *min_offset,
+                          HOST_WIDE_INT *max_offset,
+                          int *alignment)
+{
+  HOST_WIDE_INT mask;
+
+  /* Currently only deal with pseudo base register.  */
+  gcc_assert (!base_reg || !HARD_REGISTER_P (base_reg));
+
+  if (TARGET_ARM)
+    arm_address_offset_range_1 (mode, base_reg,
+                                min_offset, max_offset, alignment);
+  else if (TARGET_THUMB2)
+    thumb2_address_offset_range (mode, base_reg,
+                                 min_offset, max_offset, alignment);
+  else /* (TARGET_THUMB1) */
+    thumb1_address_offset_range (mode, base_reg,
+                                 min_offset, max_offset, alignment);
+
+  mask = ~(*alignment - 1);
+  *max_offset = *max_offset & mask;
+  *min_offset = (*min_offset + *alignment - 1) & mask;
+}
+
 /* Build the SYMBOL_REF for __tls_get_addr.  */

 static GTY(()) rtx tls_get_addr_libfunc;

Index: extract-common-addr.c
===================================================================
--- extract-common-addr.c       (revision 0)
+++ extract-common-addr.c       (revision 0)
@@ -0,0 +1,616 @@
+/* Extract common address calculation for accesses to
+   fields of large struct.
+   Copyright (C) 2009
+   Free Software Foundation, Inc.
+   Contributed by Carrot Wei <carrot@google.com>.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+/* This file implements an optimization of complex address calculations
+   for accesses to large structures.
+
+   Suppose we have a struct defined as:
+   typedef struct large_struct
+   {
+     char dummy[400];
+     int  field1;
+     int  field2;
+   } large_struct_t;
+
+   Now we want to access individual fields through a pointer:
+
+   large_struct_t *p = ...;
+   ... = p->field1;
+   ... = p->field2;
+
+   If the load instruction of the target architecture can't support such
+   a large offset, gcc usually generates following insns to access struct
+   fields:
+
+   (set r200 400)                     # 400 is offset of field1
+   (set r201 (mem (plus r100 r200)))  # r100 contains pointer p
+   ...
+   (set r300 404)                     # 404 is offset of field2
+   (set r301 (mem (plus r100 r300)))  # r100 contains pointer p
+
+   Sometimes loading the large constant (offset) needs more than one
+   instruction, which makes things worse.
+
+   One possible optimization for multiple accesses is to first compute a new
+   base that is nearer to the fields that we want to access, then use the
+   normal (base + offset) addressing mode to access them. So ideally the
+   previous insns should be transformed to:
+
+   (set r101 (plus r100 400))
+   (set r201 (mem (plus r101 0)))
+   ...
+   (set r301 (mem (plus r101 4)))
+
+   The actual transformation done by this pass is:
+
+   (set r200 400)                     # keep the original insn
+   (set r250 (plus r100 400))         # r250 is new base
+   (set r201 (mem (plus r250 0)))
+   ...
+   (set r300 404)
+   (set r251 (plus r100 400))         # r251 contains same value as r250
+   (set r301 (mem (plus r251 4)))
+
+   And let dce and cse do the rest.
+
+   The implementation can be divided into 2 parts:
+
+   1. Collect instruction sequences like
+
+   (set r200 400)
+   (set r201 (mem (plus r100 r200)))
+
+   For simplicity the corresponding two insns must reside in the same basic
+   block. For example, the instruction sequence for p->field1 style accesses
+   should always be in the same basic block.
+
+   2. For each base register, split the corresponding memory accesses into
+   groups. Create a new base value for each group and modify the memory
+   accesses to use the new base value and offset.
+*/
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tm.h"
+#include "rtl.h"
+#include "basic-block.h"
+#include "alloc-pool.h"
+#include "flags.h"
+#include "insn-config.h"
+#include "recog.h"
+#include "expr.h"
+#include "target.h"
+#include "tree-pass.h"
+#include "vec.h"
+#include "df.h"
+#include "timevar.h"
+
+/* Memory offset information.  */
+struct offset_info
+{
+  rtx mem_insn;              /* The insn that does the memory access.  */
+  rtx mem;                   /* The mem rtx which uses this offset.  */
+  unsigned int offset_regno; /* The regno of the dest of the set.  */
+  enum machine_mode mode;    /* The dest machine mode.  */
+  HOST_WIDE_INT offset;      /* The offset constant.  */
+};
+
+typedef struct offset_info *offset_info_t;
+DEF_VEC_P(offset_info_t);
+DEF_VEC_ALLOC_P(offset_info_t, heap);
+
+/* Memory base address information.  */
+struct base_reg_info
+{
+  /* All offset information used with this base register.  */
+  VEC(offset_info_t, heap) *offsets;
+  unsigned int base_regno;     /* Base register number.  */
+  rtx base_reg;                /* Base register rtx.  */
+  enum machine_mode mode;      /* Base register machine mode.  */
+};
+
+/* Offset table, indexed by the offset register number.  */
+static struct offset_info **offset_table;
+
+/* Base register table, indexed by the base register number.  */
+static struct base_reg_info *base_table;
+
+/* Memory pool for offset_info elements.  */
+static alloc_pool offset_element_pool;
+
+/* Bitmap of offset_table, used to speed up resetting offset_table.  */
+static bitmap offset_bitmap;
+
+/* Given a register rtx, look up the offset_table.  If we find it and
+   the machine modes match, return a pointer to the offset information.
+   Otherwise return NULL.  */
+
+static inline struct offset_info *
+find_offset_reg (rtx x)
+{
+  struct offset_info *offset_item;
+
+  gcc_assert (REG_P (x));
+  offset_item = offset_table[REGNO (x)];
+  if (offset_item == NULL)
+    return NULL;
+  if (offset_item->mode != GET_MODE (x))
+    return NULL;
+  return offset_item;
+}
+
+/* If x is MEM, and the address is in the form of
+   (plus base_reg offset_reg)
+   remove the corresponding item from offset table and
+   insert it into a base_reg_info object.
+   This function is called via for_each_rtx.
+   data is the insn containing x.  */
+
+static int
+find_mem_with_offset_reg (rtx *x, void *data)
+{
+  HOST_WIDE_INT min_offset, max_offset;
+  int alignment;
+  struct base_reg_info *base_reg_item;
+  struct offset_info *offset_item;
+  rtx reg1, reg2, address;
+
+  if (!MEM_P (*x))
+    return 0;
+
+  /* Look for (plus reg1 reg2) pattern.  */
+  address = XEXP (*x, 0);
+  if (GET_CODE (address) != PLUS)
+    return 0;
+  reg1 = XEXP (address, 0);
+  reg2 = XEXP (address, 1);
+  if (!REG_P (reg1) || !REG_P (reg2))
+    return 0;
+
+  /* Look for addressing offset. If necessary swap reg1 and reg2
+     so that reg1 always contains base address and reg2 always
+     contains offset.  */
+  offset_item = find_offset_reg (reg2);
+  if (offset_item == NULL)
+    {
+      rtx t = reg1;
+      reg1 = reg2;
+      reg2 = t;
+      offset_item = find_offset_reg (reg2);
+      if (offset_item == NULL)
+        return -1;
+    }
+
+  /* Currently we only deal with pseudo base registers.  Other register
+     classes are also possible but would need a more complicated
+     implementation of targetm.get_address_offset_range.  */
+  if (HARD_REGISTER_P (reg1))
+    return -1;
+
+  /* If the base register and offset register don't have the same
+     machine mode, ignore this offset.  */
+  if (GET_MODE (reg1) != offset_item->mode)
+    return -1;
+
+  /* If the offset already fits in the addressing mode, ignore it.  */
+  targetm.get_address_offset_range (GET_MODE (*x), reg1,
+                                    &min_offset, &max_offset, &alignment);
+  if (offset_item->offset >= min_offset
+      && offset_item->offset <= max_offset)
+    return -1;
+
+  /* Look up the corresponding base register information.
+     If there is not one, create it.  */
+  base_reg_item = &base_table[REGNO (reg1)];
+  if (base_reg_item->offsets == NULL)
+    {
+      base_reg_item->mode = GET_MODE (reg1);
+      base_reg_item->base_regno = REGNO (reg1);
+      base_reg_item->base_reg = reg1;
+      base_reg_item->offsets = VEC_alloc (offset_info_t, heap, 10);
+    }
+
+  /* Associate this offset information with the base information.  */
+  offset_item->mem = *x;
+  offset_item->mem_insn = (rtx) data;
+  VEC_safe_push (offset_info_t, heap, base_reg_item->offsets, offset_item);
+
+  /* Remove the offset information from the offset table, since we don't
+     expect it to be used more than once.  */
+  offset_table[offset_item->offset_regno] = NULL;
+  bitmap_clear_bit (offset_bitmap, offset_item->offset_regno);
+  return -1;
+}
+
+/* If a reg is clobbered in insn, remove it from offset table.  */
+
+static void
+clobber_insn_reg (rtx insn)
+{
+  df_ref *def_rec;
+
+  for (def_rec = DF_INSN_DEFS (insn); *def_rec; def_rec++)
+    {
+      df_ref def = *def_rec;
+      rtx reg = DF_REF_REAL_REG (def);
+      unsigned int regno = REGNO (reg);
+      if (offset_table[regno] != NULL)
+        {
+          pool_free (offset_element_pool, offset_table[regno]);
+          offset_table[regno] = NULL;
+          bitmap_clear_bit (offset_bitmap, regno);
+        }
+    }
+}
+
+/* If this insn is in the form of
+   (set reg offset)
+   insert this offset info into offset table.  */
+
+static void
+find_offset (rtx insn)
+{
+  rtx src, dest;
+  unsigned int regno;
+  HOST_WIDE_INT offset_val;
+  struct offset_info *offset;
+  rtx x = single_set (insn);
+  if (!x)
+    return;
+
+  dest = SET_DEST (x);
+  if (!REG_P (dest))
+    return;
+  regno = REGNO (dest);
+  if (HARD_REGISTER_NUM_P (regno))
+    return;
+
+  src = SET_SRC (x);
+  if (GET_CODE (src) != CONST_INT)
+    return;
+  offset_val = INTVAL (src);
+  if (offset_val <= 0)
+    return;
+
+  offset = (struct offset_info *) pool_alloc (offset_element_pool);
+  offset->mem = NULL;
+  offset->offset_regno = regno;
+  offset->mode = GET_MODE (dest);
+  offset->offset = offset_val;
+  gcc_assert (!offset_table[regno]);
+  offset_table[regno] = offset;
+  bitmap_set_bit (offset_bitmap, regno);
+}
+
+/* Look for optimization opportunities.  */
+
+static void
+collect_candidates (void)
+{
+  basic_block block;
+
+  FOR_EACH_BB (block)
+    {
+      rtx insn;
+      bitmap_iterator bi;
+      unsigned int regno;
+
+      /* Clear offset_table.  */
+      EXECUTE_IF_SET_IN_BITMAP (offset_bitmap, 0, regno, bi)
+        {
+          pool_free (offset_element_pool, offset_table[regno]);
+          offset_table[regno] = NULL;
+          bitmap_clear_bit (offset_bitmap, regno);
+        }
+
+      FOR_BB_INSNS (block, insn)
+        {
+          if (!INSN_P (insn))
+            continue;
+
+          for_each_rtx (&PATTERN (insn), find_mem_with_offset_reg, insn);
+
+          clobber_insn_reg (insn);
+
+          find_offset (insn);
+        }
+    }
+}
+
+/* Compare function used by qsort.  */
+
+static int
+compare_offset (const void *pa, const void *pb)
+{
+  struct offset_info * const *offset1 = (struct offset_info* const *)pa;
+  struct offset_info * const *offset2 = (struct offset_info* const *)pb;
+  if ((*offset1)->offset != (*offset2)->offset)
+    return (*offset1)->offset < (*offset2)->offset ? -1 : 1;
+  return GET_MODE ((*offset1)->mem) - GET_MODE ((*offset2)->mem);
+}
+
+/* Do the actual modification to a group of memory accesses.
+   base_reg contains the original base information.
+   base_offset will be added to original base to get new base.
+   count is the number of memory accesses in the group.
+   offset_array is the array of memory accesses.  */
+
+static void
+rewrite_memory_access (struct base_reg_info *base_reg,
+                       struct offset_info **offset_array,
+                       int count,
+                       HOST_WIDE_INT base_offset)
+{
+  int i;
+  for (i = 0; i < count; i++)
+    {
+      rtx new_base_reg, new_base, new_base_insn, new_offset, x;
+      struct offset_info *offset = offset_array[i];
+
+      /* Insert an insn to compute the new base reg.  */
+      new_base_reg = gen_reg_rtx (offset->mode);
+      new_base = gen_rtx_PLUS (offset->mode,
+                          base_reg->base_reg, GEN_INT (base_offset));
+      new_base_insn = gen_move_insn (new_base_reg, new_base);
+      emit_insn_before (new_base_insn, offset->mem_insn);
+
+      /* Modify the original MEM rtx to use new base and offset address.  */
+      new_offset = GEN_INT (offset->offset - base_offset);
+      x = XEXP (offset->mem, 0);
+      if (REGNO (XEXP (x, 0)) == base_reg->base_regno)
+        {
+          validate_change (offset->mem, &XEXP (x, 0), new_base_reg, 1);
+          validate_change (offset->mem, &XEXP (x, 1), new_offset, 1);
+        }
+      else
+        {
+          validate_change (offset->mem, &XEXP (x, 1), new_base_reg, 1);
+          validate_change (offset->mem, &XEXP (x, 0), new_offset, 1);
+        }
+
+      if (!apply_change_group ())
+        remove_insn (new_base_insn);
+
+      df_insn_rescan (offset->mem_insn);
+    }
+}
+
+/* Optimize a list of memory addressing relative to one base register
+   specified in base_reg.
+
+   For each memory access we already know its offset, and by consulting
+   the backend we also know the minimum and maximum offset constants
+   that can be used in the MEM rtx.  So we can compute the valid range
+   of new base values by subtracting max_offset and min_offset from the
+   actual memory offset.
+
+   If two MEMs' new base value ranges overlap, they can be put into the
+   same group, and the group's base value range is the intersection.  This
+   way we can add MEMs to the group one by one until the next MEM's base
+   value range no longer overlaps with the current group's.  Then we can
+   rewrite the group of memory accesses using the low end of the group's
+   value range as the new base value; a new group starts with the next MEM.
+
+   For some architectures the constant offset part of MEM rtx has alignment
+   requirement. So we need to consider it when computing the overlapped base
+   value range.  */
+
+static void
+process_one_base (struct base_reg_info *base_reg)
+{
+  HOST_WIDE_INT max_offset, min_offset;
+  HOST_WIDE_INT base_offset, max_base;
+  int alignment, new_alignment;
+
+  offset_info_t *offset_array = VEC_address (offset_info_t, base_reg->offsets);
+  int count = VEC_length (offset_info_t, base_reg->offsets);
+  int i, group_count;
+
+  if (count < 2)
+    return;
+
+  /* Sort the memory accesses according to their offset.  */
+  qsort (offset_array, count, sizeof (offset_info_t), compare_offset);
+
+  /* group_count is the number of MEM accesses that should be relative to
+     the same new base address.
+     (base_offset + old_base) is the new base address of current group.
+     max_base is the maximum possible value of base_offset.
+     alignment is the alignment of the offset constant, not of the memory.  */
+  group_count = 1;
+  targetm.get_address_offset_range (GET_MODE (offset_array[0]->mem),
+                                    NULL, &min_offset, &max_offset,
+                                    &new_alignment);
+  base_offset = offset_array[0]->offset - max_offset;
+  max_base = offset_array[0]->offset - min_offset;
+  alignment = new_alignment;
+
+  /* Add memory accesses to the current group one by one until it is
+     no longer feasible to add the next one.  */
+  for (i = 1; i < count; i++)
+    {
+      HOST_WIDE_INT base2, max_base2;
+      int alignment2;
+
+      targetm.get_address_offset_range (GET_MODE (offset_array[i]->mem),
+              NULL, &min_offset, &max_offset, &new_alignment);
+      base2 = offset_array[i]->offset - max_offset;
+      alignment2 = alignment;
+
+      /* Adjust the new base_offset and alignment.  */
+      if (base2 < base_offset)
+        {
+          /* New base offset is less than current offset.  */
+          if (new_alignment > alignment)
+            {
+              /* But we have stricter alignment. We should use the new
+                 alignment and the aligned old offset as new base offset.  */
+              HOST_WIDE_INT delta = offset_array[i]->offset - base_offset;
+              alignment2 = new_alignment;
+              delta = delta & ~(alignment2 - 1);
+              base2 = offset_array[i]->offset - delta;
+            }
+          else
+            /* New alignment isn't stricter than old alignment.
+               We can simply use old base offset and alignment.  */
+            base2 = base_offset;
+        }
+      else
+        {
+          /* New base offset is greater than current offset.  */
+          if (new_alignment >= alignment)
+            /* New alignment is at least as strict as the old alignment.
+               We can simply use new base offset and new alignment.  */
+            alignment2 = new_alignment;
+          else
+            {
+              /* Old alignment is stricter than new alignment. We need to
+                 adjust new base value according to old alignment and keep
+                 the old alignment.  */
+              HOST_WIDE_INT delta = max_offset & ~(alignment - 1);
+              base2 = offset_array[i]->offset - delta;
+            }
+        }
+
+      /* Adjust max_base.  */
+      max_base2 = offset_array[i]->offset - min_offset;
+      if (new_alignment < alignment)
+        {
+          /* Old alignment is stricter than new alignment, adjust max_base
+             according to old alignment.  */
+          HOST_WIDE_INT mask = alignment - 1;
+          HOST_WIDE_INT aligned_min = (min_offset + alignment - 1) & ~mask;
+          max_base2 = offset_array[i]->offset - aligned_min;
+          /* If the new max_base is smaller, update max_base.  */
+          if (max_base2 < max_base)
+            max_base = max_base2;
+        }
+      else
+        {
+          /* New alignment is at least as strict as the old alignment.  */
+          if (max_base2 > max_base)
+            {
+              /* New max_base is greater than old max_base, we must use the
+                 old one, and adjust its alignment.  */
+              HOST_WIDE_INT delta = offset_array[i]->offset - max_base;
+              delta = (delta + new_alignment - 1) & ~(new_alignment - 1);
+              max_base = offset_array[i]->offset - delta;
+            }
+          else
+            /* New max_base is less than old max_base, use the new one.  */
+            max_base = max_base2;
+        }
+
+      /* If the new base_offset is greater than the new max_base, this
+         memory access can't be put into current group.  */
+      if (base2 > max_base)
+        {
+          /* This optimization can be beneficial only when there are
+             two or more memory accesses in one group.  */
+          if (group_count > 1)
+          {
+            struct offset_info **start = &offset_array[i - group_count];
+            rewrite_memory_access (base_reg, start, group_count, base_offset);
+          }
+
+          /* Start a new group.  */
+          base_offset = offset_array[i]->offset - max_offset;
+          max_base = offset_array[i]->offset - min_offset;
+          alignment = new_alignment;
+          group_count = 0;
+        }
+      else
+        {
+          alignment = alignment2;
+          base_offset = base2;
+        }
+
+      group_count++;
+    }
+
+  /* The final group.  */
+  if (group_count > 1)
+    {
+      struct offset_info **start = &offset_array[i - group_count];
+      rewrite_memory_access (base_reg, start, group_count, base_offset);
+    }
+}
+
+/* This optimization can be enabled by providing an implementation of
+   targetm.get_address_offset_range.  */
+
+static bool
+gate_handle_common_addr (void)
+{
+  return targetm.get_address_offset_range != NULL && optimize > 0;
+}
+
+static unsigned int
+rest_of_handle_extract_common_addr (void)
+{
+  int i;
+  int reg_count = max_reg_num ();
+  offset_table = XCNEWVEC (struct offset_info *, reg_count);
+  base_table = XCNEWVEC (struct base_reg_info, reg_count);
+  offset_bitmap = BITMAP_ALLOC (NULL);
+  offset_element_pool = create_alloc_pool ("offset pool",
+                                           sizeof (struct offset_info), 64);
+
+  /* Phase 1: collect optimization candidates.  */
+  collect_candidates ();
+
+  /* Phase 2: create new bases and rewrite related memory accesses.  */
+  for (i = 0; i < reg_count; i++)
+    {
+      if (base_table[i].offsets != NULL)
+        {
+          process_one_base (&base_table[i]);
+          VEC_free (offset_info_t, heap, base_table[i].offsets);
+        }
+    }
+
+  free_alloc_pool (offset_element_pool);
+  BITMAP_FREE (offset_bitmap);
+  free (base_table);
+  free (offset_table);
+  return 0;
+}
+
+struct rtl_opt_pass pass_extract_common_addr =
+{
+ {
+  RTL_PASS,
+  "extract_common_addr",                /* name */
+  gate_handle_common_addr,              /* gate */
+  rest_of_handle_extract_common_addr,   /* execute */
+  NULL,                                 /* sub */
+  NULL,                                 /* next */
+  0,                                    /* static_pass_number */
+  TV_EXTRACT_COMMON_ADDR,               /* tv_id */
+  0,                                    /* properties_required */
+  0,                                    /* properties_provided */
+  0,                                    /* properties_destroyed */
+  0,                                    /* todo_flags_start */
+  TODO_dump_func                        /* todo_flags_finish */
+ }
+};

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH: PR/40314] extract common address for memory access to   fields of large struct
  2009-06-03  7:00   ` Carrot Wei
@ 2009-06-12 23:19     ` Steven Bosscher
  2009-06-13  6:11       ` Adam Nemet
  0 siblings, 1 reply; 8+ messages in thread
From: Steven Bosscher @ 2009-06-12 23:19 UTC (permalink / raw)
  To: Carrot Wei; +Cc: gcc-patches

On Wed, Jun 3, 2009 at 8:59 AM, Carrot Wei<carrot@google.com> wrote:
> I've tested CSiBE, nearly no changes to code size and compile time. It
> seems there is no large structure in CSiBE to trigger this
> optimization. For mcf from CPU SPEC 2006, this optimization can reduce
> about 7% static instructions.

Alright, counts as an improvement worth pursuing.

What I was wondering though -- isn't this just a special case of Adam
Nemet's "constant anchor" additions to CSE?

You may want to check out
http://gcc.gnu.org/ml/gcc-patches/2009-05/msg01786.html (esp. the
section about TARGET_CONST_ANCHOR).  Isn't this the same optimization
as what you implemented?

Ciao!
Steven

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH: PR/40314] extract common address for memory access to   fields of large struct
  2009-06-12 23:19     ` Steven Bosscher
@ 2009-06-13  6:11       ` Adam Nemet
  2009-06-15  8:16         ` Carrot Wei
  0 siblings, 1 reply; 8+ messages in thread
From: Adam Nemet @ 2009-06-13  6:11 UTC (permalink / raw)
  To: Steven Bosscher; +Cc: Carrot Wei, gcc-patches

Steven Bosscher <stevenb.gcc@gmail.com> writes:

> On Wed, Jun 3, 2009 at 8:59 AM, Carrot Wei<carrot@google.com> wrote:
>> I've tested CSiBE, nearly no changes to code size and compile time. It
>> seems there is no large structure in CSiBE to trigger this
>> optimization. For mcf from CPU SPEC 2006, this optimization can reduce
>> about 7% static instructions.
>
> Alright, counts as an improvement worth pursuing.
>
> What I was wondering though -- isn't this just a special case of Adam
> Nemet's "constant anchor" additions to CSE?

No, but I think the existing related-value optimization in CSE might be
able to handle this.  You just need to avoid propagating the addition
into the mem.  I.e. instead of:

(set r200 400)                     # 400 is offset of field1
(set r201 (mem (plus r100 r200)))  # r100 contains struct base
...
(set r300 404)                     # 404 is offset of field2
(set r301 (mem (plus r100 r300)))  # r100 contains struct base

start with:

(set r200 400)
(set t1 (plus r100 r200))
(set r201 (mem t1))
...
(set r300 404)
(set t2 (plus r100 r300))
(set r301 (mem t2))

then t2 should be expressed as t1 + 4 as a related value
(hopefully!) and then fwprop can now propagate into both mems.

Adam

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH: PR/40314] extract common address for memory access to   fields of large struct
  2009-06-13  6:11       ` Adam Nemet
@ 2009-06-15  8:16         ` Carrot Wei
  2009-06-15 19:00           ` Adam Nemet
  0 siblings, 1 reply; 8+ messages in thread
From: Carrot Wei @ 2009-06-15  8:16 UTC (permalink / raw)
  To: Adam Nemet; +Cc: Steven Bosscher, gcc-patches

Thank you, Adam. Your method can solve part of the problems just as
you described. But my pass can expose more opportunities to CSE and
GCSE:

1. Due to lack of target-specific information, CSE may perform an
unprofitable optimization. Suppose we have two byte memory accesses at
(base+400) and (base+440), CSE may transform them to
     t1 = base + 400
     // t1 is used as address 1
     ...
     t2 = t1 + 40
     // t2 is used as address 2 to load a byte

   Actually in Thumb mode a byte load can only have an offset in the
range [0, 31], so this transform does not help.

2. In other cases CSE may miss some optimizations because it can only
use already-available values. One simple example is:

   t1 = base + 404
   // t1 is used as address 1
   ...
   t2 = base + 400
   // t2 is used as address 2

  Address 2 is 4 bytes smaller than address 1, and is used later. In
Thumb mode the memory offset can't be negative, so the current CSE
framework can't deal with this situation.

3. CSE works on superblocks only. I hope this optimization can also be
done globally.

thanks
Carrot

On Sat, Jun 13, 2009 at 10:27 AM, Adam Nemet<anemet@caviumnetworks.com> wrote:
> Steven Bosscher <stevenb.gcc@gmail.com> writes:
>
>> On Wed, Jun 3, 2009 at 8:59 AM, Carrot Wei<carrot@google.com> wrote:
>>> I've tested CSiBE, nearly no changes to code size and compile time. It
>>> seems there is no large structure in CSiBE to trigger this
>>> optimization. For mcf from CPU SPEC 2006, this optimization can reduce
>>> about 7% static instructions.
>>
>> Alright, counts as an improvement worth pursuing.
>>
>> What I was wondering though -- isn't this just a special case of Adam
>> Nemet's "constant anchor" additions to CSE?
>
> No, but I think the existing related-value optimization in CSE might be
> able to handle this.  You just need to avoid propagating the addition
> into the mem.  I.e. instead of:
>
> (set r200 400)                     # 400 is offset of field1
> (set r201 (mem (plus r100 r200)))  # r100 contains struct base
> ...
> (set r300 404)                     # 404 is offset of field2
> (set r301 (mem (plus r100 r300)))  # r100 contains struct base
>
> start with:
>
> (set r200 400)
> (set t1 (plus r100 r200))
> (set r201 (mem t1))
> ...
> (set r300 404)
> (set t2 (plus r100 r300))
> (set r301 (mem t2))
>
> then t2 should be expressed as t1 + 4 as a related value
> (hopefully!) and then fwprop can now propagate into both mems.
>
> Adam
>

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH: PR/40314] extract common address for memory access to   fields of large struct
  2009-06-15  8:16         ` Carrot Wei
@ 2009-06-15 19:00           ` Adam Nemet
  2009-06-16  8:58             ` Carrot Wei
  0 siblings, 1 reply; 8+ messages in thread
From: Adam Nemet @ 2009-06-15 19:00 UTC (permalink / raw)
  To: Carrot Wei; +Cc: Steven Bosscher, gcc-patches

Carrot Wei writes:
> Thank you, Adam. Your method can solve part of the problems just as
> you described.

Thanks for confirming.

> But my pass can expose more opportunities to CSE and GCSE:
> 
> 1. Due to lack of target specific information, CSE may do wrong
> optimization. Suppose we have two byte memory accesses with address
> (base+400) and (base+440), CSE may transform them to
>      t1 = base + 400
>      // t1 is used as address 1
>      ...
>      t2 = t1 + 40
>      // t2 is used as address 2 to load a byte
> 
>    Actually in thumb mode byte loading can only have offset range of
> [0, 31]. This transform does not help.

You should be able to describe this in the TARGET_RTX_COSTS hook in your backend.

> 2. In other cases CSE may miss some optimization because it can only
> use the available value. One simple example is:
> 
>    t1 = base + 404
>    // t1 is used as address 1
>    ...
>    t2 = base + 400
>    // t2 is used as address 2
> 
>   Address2 is 4 bytes smaller than address1, and used later than
> address1. In thumb mode memory offset can't be negative. Current CSE
> framework can't deal with this situation.

Yes this case would be missed.  These CSE optimizations are most effective if
there are instructions to get from one expression to the other regardless of
the order, i.e. when the offsetting is symmetric.

I still think however that it would be better to have a new local pass
complementing rather than duplicating the related-value optimizations in CSE.
The pass could rearrange the order of additive expressions to maximize reuse
and would be limited to targets that have this asymmetry.

> 3. CSE works on superblock only. I hope this optimization can also be
> done globally.

This has been discussed a few times already
(e.g. <http://gcc.gnu.org/ml/gcc/2009-03/msg00475.html>).

Adam

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH: PR/40314] extract common address for memory access to   fields of large struct
  2009-06-15 19:00           ` Adam Nemet
@ 2009-06-16  8:58             ` Carrot Wei
  0 siblings, 0 replies; 8+ messages in thread
From: Carrot Wei @ 2009-06-16  8:58 UTC (permalink / raw)
  To: Adam Nemet; +Cc: Steven Bosscher, gcc-patches

On Tue, Jun 16, 2009 at 2:07 AM, Adam Nemet<anemet@caviumnetworks.com> wrote:
> Carrot Wei writes:
>> Thank you, Adam. Your method can solve part of the problems just as
>> you described.
>
> Thanks for confirming.
>
>> But my pass can expose more opportunities to CSE and GCSE:
>>
>> 1. Due to lack of target specific information, CSE may do wrong
>> optimization. Suppose we have two byte memory accesses with address
>> (base+400) and (base+440), CSE may transform them to
>>      t1 = base + 400
>>      // t1 is used as address 1
>>      ...
>>      t2 = t1 + 40
>>      // t2 is used as address 2 to load a byte
>>
>>    Actually in thumb mode byte loading can only have offset range of
>> [0, 31]. This transform does not help.
>
> You should be able to describe this in TARGET_RTX_COSTS hook in your backend.
>
The TARGET_RTX_COSTS hook can't handle this case. To TARGET_RTX_COSTS,
t2 = t1 + 40 is a simple add expression; it doesn't have a higher cost
than t2 = t1 + 20. But when put into the memory access context it's
different: if we are loading a byte, t2 = t1 + 40 costs more than
t2 = t1 + 20, while if we are loading a word they have the same cost.

>> 2. In other cases CSE may miss some optimization because it can only
>> use the available value. One simple example is:
>>
>>    t1 = base + 404
>>    // t1 is used as address 1
>>    ...
>>    t2 = base + 400
>>    // t2 is used as address 2
>>
>>   Address2 is 4 bytes smaller than address1, and used later than
>> address1. In thumb mode memory offset can't be negative. Current CSE
>> framework can't deal with this situation.
>
> Yes this case would be missed.  These CSE optimizations are most effective if
> there are instructions to get from one expression to the other regardless of
> the order, i.e. when the offsetting is symmetric.
>
> I still think however that it would be better to have a new local pass
> complementing rather than duplicating the related-value optimizations in CSE.
> The pass could rearrange the order of additive expression to maximize reuse
> and would be limited to targets that have this asymmetry.
>
This new pass doesn't simply duplicate the related-value
optimizations. It exposes the common address part of memory access
instructions; it still needs help from CSE and GCSE to actually remove
the duplicated address calculations.

Actually most of my code deals with memory access instructions. The
related-value optimization part is only in the function
process_one_base, and there is still a big difference between them:
CSE only uses known values to optimize later expressions, while this
new pass can compute an optimal base value for a whole group of memory
accesses. The new pass also considers memory access characteristics,
such as offset range and alignment, which CSE can't handle easily.

>> 3. CSE works on superblock only. I hope this optimization can also be
>> done globally.
>
> This has been discussed a few times already
> (e.g. <http://gcc.gnu.org/ml/gcc/2009-03/msg00475.html>).
>
> Adam
>

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2009-06-16  8:04 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-05-31  9:31 [PATCH: PR/40314] extract common address for memory access to fields of large struct Carrot Wei
2009-06-01  8:13 ` Steven Bosscher
2009-06-03  7:00   ` Carrot Wei
2009-06-12 23:19     ` Steven Bosscher
2009-06-13  6:11       ` Adam Nemet
2009-06-15  8:16         ` Carrot Wei
2009-06-15 19:00           ` Adam Nemet
2009-06-16  8:58             ` Carrot Wei
