public inbox for gcc-patches@gcc.gnu.org
* new sign/zero extension elimination pass
@ 2010-10-18 15:54 Tom de Vries
  2010-10-18 16:03 ` Andrew Pinski
                   ` (3 more replies)
  0 siblings, 4 replies; 43+ messages in thread
From: Tom de Vries @ 2010-10-18 15:54 UTC (permalink / raw)
  To: gcc-patches; +Cc: Bernd Schmidt

[-- Attachment #1: Type: text/plain, Size: 2096 bytes --]

I created a new sign/zero extension elimination pass.

The motivating example for this pass is:

  void f(unsigned char *p, short s, int c, int *z)
    {
      if (c)
        *z = 0;
      *p ^= (unsigned char)s;
    }

For MIPS, compilation results in the following insns.

  (set (reg/v:SI 199)
       (sign_extend:SI (subreg:HI (reg:SI 200) 2)))

  ...

  (set (reg:QI 203)
       (subreg:QI (reg/v:SI 199) 3))

These insns are the only def and the only use of reg 199, each located in a
different bb.

The sign-extension preserves the lower half of reg 200 and copies it to
reg 199, and the subreg use of reg 199 only reads the least significant byte.
The sign extension is therefore redundant (the extension part, not the copy
part), and can safely be replaced with a regcopy from reg 200 to reg 199:

  (set (reg/v:SI 199)
       (reg:SI 200))
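
The equivalence argued above can be checked with a small standalone model. This is only a sketch in plain C, not code from the patch: the helper names (sign_extend_low_half, low_byte, copy_is_equivalent) are made up for illustration, and 32-bit unsigned integers stand in for the SImode pseudos.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of (sign_extend:SI (subreg:HI (reg:SI 200) 2)):
   take the lower half of the register and sign-extend it to 32 bits.  */
static uint32_t
sign_extend_low_half (uint32_t reg200)
{
  uint32_t low = reg200 & 0xffffu;

  /* Replicate bit 15 into bits 16..31, using unsigned arithmetic only.  */
  return (low ^ 0x8000u) - 0x8000u;
}

/* Hypothetical model of the QImode subreg use of reg 199, which reads
   only the least significant byte.  */
static uint8_t
low_byte (uint32_t reg199)
{
  return (uint8_t) (reg199 & 0xffu);
}

/* Return 1 if a QImode use cannot tell the sign extension apart from a
   plain copy of reg 200 into reg 199.  */
static int
copy_is_equivalent (uint32_t reg200)
{
  return low_byte (sign_extend_low_half (reg200)) == low_byte (reg200);
}
```

Calling copy_is_equivalent with any 32-bit value should return 1, since the extension only rewrites bits 16..31 and the use reads bits 0..7.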


This new pass manages to analyze this pattern and replace the sign_extend with
a regcopy, which results in one fewer instruction in the assembly. The other
passes that eliminate sign/zero extensions do not manage to do that. Combine
doesn't work since the def and the use of reg 199 are in different bbs.
Implicit_zee does not work here since it only combines an extension with the
defs of its src operand, which is not applicable in this case.

The pass does a couple of linear scans over the insns, so it's not expensive in
terms of runtime.
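
To make the idea behind the scans concrete, here is a toy model of the core check. It is a simplified sketch, not the code in ee.c: the fixed-size table, NREGS and reset_uses are assumptions made for illustration, and the real pass additionally handles subregs, skipped regs and propagation.

```c
#include <assert.h>

/* Hypothetical number of pseudos in the toy model.  */
#define NREGS 8

/* Size in bits of the biggest use seen so far for each reg.  */
static int biggest_use[NREGS];

/* Forget all uses, as at the start of a scan.  */
static void
reset_uses (void)
{
  int i;

  for (i = 0; i < NREGS; i++)
    biggest_use[i] = 0;
}

/* Record that SIZE_IN_BITS least significant bits of REGNO are read.  */
static void
register_use (int regno, int size_in_bits)
{
  if (size_in_bits > biggest_use[regno])
    biggest_use[regno] = size_in_bits;
}

/* An extension defining DEST_REGNO that leaves PRESERVED_BITS least
   significant bits unchanged is redundant when no use reads more bits
   than that.  */
static int
extension_redundant_p (int dest_regno, int preserved_bits)
{
  return preserved_bits >= biggest_use[dest_regno];
}
```

In the motivating example the extension preserves 16 bits and the only use reads 8, so the check succeeds; a single 32-bit use of the same reg would make it fail.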

See ee.c for a more detailed writeup covering the motivating example, a
comparison to other passes that do sign/zero extension elimination, the
intended effect, the implementation and the limitations.

The pass has been tested on X86_64, MIPS, ARM. An earlier version of the pass
was tested on PPC as well. All on Linux host.

At the moment there are 2 known issues:
- a bug on ARM, which I'm fixing currently.
- an assert while building on MIPS. I hit the assert
    gcc_assert ((*inner_mode == *outer_mode) != (*extend != UNKNOWN));
  in loop-iv.c:get_biv_step. I disabled it and no tests fell over, but that
  needs more investigation.
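
For readers unfamiliar with that assert, the condition can be read as "exactly one of: the modes are equal, an extension is present". The sketch below only restates that boolean condition in standalone C. The enum and function names are hypothetical, and plain ints stand in for machine modes; it says nothing about why the MIPS build reaches the failing case.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the rtx code stored in *extend.  */
enum ext_kind { EXT_UNKNOWN, EXT_SIGN, EXT_ZERO };

/* Restates the asserted condition: the modes are equal exactly when no
   extension is recorded.  */
static bool
biv_step_invariant_holds (int inner_mode, int outer_mode,
                          enum ext_kind extend)
{
  bool same_mode = inner_mode == outer_mode;
  bool has_extension = extend != EXT_UNKNOWN;

  return same_mode != has_extension;
}
```

The two rejected combinations are "different modes without an extension" and "equal modes with an extension", so the failure on MIPS has to be one of those two.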

I would like review comments on the general idea of the pass.

- Tom de Vries

[-- Attachment #2: ee.submission.1.patch --]
[-- Type: text/x-patch, Size: 31902 bytes --]

Index: gcc/tree-pass.h
===================================================================
--- gcc/tree-pass.h	(revision 165080)
+++ gcc/tree-pass.h	(working copy)
@@ -479,6 +479,7 @@
 extern struct rtl_opt_pass pass_initial_value_sets;
 extern struct rtl_opt_pass pass_unshare_all_rtl;
 extern struct rtl_opt_pass pass_instantiate_virtual_regs;
+extern struct rtl_opt_pass pass_ee;
 extern struct rtl_opt_pass pass_rtl_fwprop;
 extern struct rtl_opt_pass pass_rtl_fwprop_addr;
 extern struct rtl_opt_pass pass_jump2;
Index: gcc/opts.c
===================================================================
--- gcc/opts.c	(revision 165080)
+++ gcc/opts.c	(working copy)
@@ -823,6 +823,7 @@
   flag_tree_switch_conversion = opt2;
   flag_ipa_cp = opt2;
   flag_ipa_sra = opt2;
+  flag_ee = opt2;
 
   /* Track fields in field-sensitive alias analysis.  */
   set_param_value ("max-fields-for-field-sensitive",
Index: gcc/timevar.def
===================================================================
--- gcc/timevar.def	(revision 165080)
+++ gcc/timevar.def	(working copy)
@@ -179,6 +179,7 @@
 DEFTIMEVAR (TV_VARCONST              , "varconst")
 DEFTIMEVAR (TV_LOWER_SUBREG	     , "lower subreg")
 DEFTIMEVAR (TV_JUMP                  , "jump")
+DEFTIMEVAR (TV_EE                    , "extension elimination")
 DEFTIMEVAR (TV_FWPROP                , "forward prop")
 DEFTIMEVAR (TV_CSE                   , "CSE")
 DEFTIMEVAR (TV_DCE                   , "dead code elimination")
Index: gcc/ee.c
===================================================================
--- gcc/ee.c	(revision 0)
+++ gcc/ee.c	(revision 0)
@@ -0,0 +1,907 @@
+/* Redundant extension elimination 
+   Copyright (C) 2010 Free Software Foundation, Inc.
+   Contributed by Tom de Vries (tom@codesourcery.com)
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+/*
+
+  MOTIVATING EXAMPLE
+
+  The motivating example for this pass is:
+
+    void f(unsigned char *p, short s, int c, int *z)
+    {
+      if (c)
+        *z = 0;
+      *p ^= (unsigned char)s;
+    }
+
+  For MIPS, compilation results in the following insns.
+
+    (set (reg/v:SI 199)
+         (sign_extend:SI (subreg:HI (reg:SI 200) 2)))
+
+    ...
+
+    (set (reg:QI 203)
+         (subreg:QI (reg/v:SI 199) 3))
+
+  These insns are the only def and the only use of reg 199, each located in a
+  different bb.
+
+  The sign-extension preserves the lower half of reg 200 and copies it to
+  reg 199, and the subreg use of reg 199 only reads the least significant byte.
+  The sign extension is therefore redundant (the extension part, not the copy 
+  part), and can safely be replaced with a regcopy from reg 200 to reg 199.
+
+
+  OTHER SIGN/ZERO EXTENSION ELIMINATION PASSES
+
+  There are other passes which eliminate sign/zero-extension: combine and
+  implicit_zee. Both attempt to eliminate extensions by combining them with
+  other instructions. The combine pass does this at bb level,
+  implicit_zee works at inter-bb level.
+
+  The combine pass combines an extension with either:
+  - all uses of the extension, or 
+  - all defs of the operand of the extension.
+  The implicit_zee pass only implements the latter.
+
+  For our motivating example, combine doesn't work since the def and the use of
+  reg 199 are in a different bb.
+
+  Implicit_zee does not work since it only combines an extension with the defs
+  of its operand.
+
+
+  INTENDED EFFECT
+
+  This pass works by removing sign/zero-extensions, or replacing them with
+  regcopies. The idea there is that the regcopy might be eliminated by a later
+  pass. In case the regcopy cannot be eliminated, it might at least be cheaper
+  than the extension.
+
+
+  IMPLEMENTATION
+
+  The pass scans a number of times over all instructions.
+  
+  The first scan collects all extensions.
+
+  The second scan registers all uses of a reg in the biggest_use array. After
+  that scan, the biggest_use array contains the size in bits of the
+  biggest use of each reg, which allows us to find redundant extensions.
+
+  A number of backward scans, bounded by --param ee-max-propagate=<n> is done
+  in which information from the previous scan is used. As a runtime improvement,
+  also information from the current scan is used in propagation, if we know that
+  the information from the current scan is final (all uses have been 
+  registered). If there are no more changes in size of biggest use, or there are
+  no more extensions, no new scan is initiated.
+
+  After the last scan, redundant extensions are deleted or replaced.
+
+  In case that the src and dest reg of the replacement are not of the same size,
+  we do not replace with a normal regcopy, but with a truncate or with the copy
+  of a paradoxical subreg instead.
+
+
+  LIMITATIONS
+
+  The scope of the analysis is limited to an extension and its uses. The other
+  type of analysis (related to the defs of the operand of an extension) is not
+  done.
+
+  Furthermore, we do the analysis of biggest use per reg. So when determining
+  whether an extension is redundant, we take all uses of the dest reg into
+  account, also the ones that are not uses of the extension. This could be
+  overcome by calculating the def-use chains and using those for analysis
+  instead.
+
+  Finally, there is propagation of information, but not in its most runtime
+  efficient form. To get that, a textbook backward iterative approach
+  is needed.  */
+
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tm.h"
+#include "rtl.h"
+#include "tree.h"
+#include "tm_p.h"
+#include "flags.h"
+#include "regs.h"
+#include "hard-reg-set.h"
+#include "basic-block.h"
+#include "insn-config.h"
+#include "function.h"
+#include "expr.h"
+#include "insn-attr.h"
+#include "recog.h"
+#include "toplev.h"
+#include "target.h"
+#include "timevar.h"
+#include "optabs.h"
+#include "insn-codes.h"
+#include "rtlhooks-def.h"
+#include "output.h"
+#include "params.h"
+#include "timevar.h"
+#include "tree-pass.h"
+#include "cgraph.h"
+
+#define SKIP_REG (-1)
+
+/* Array to register the biggest use of a reg, in bits.  */
+
+static int *biggest_use;
+
+/* Array to register the number of uses of a reg.  */
+
+static int *nr_uses;
+
+/* Arrays to keep the data from the previous run available.  */
+
+static int *prev_biggest_use;
+static int *prev_nr_uses;
+
+/* Variable that indicates whether the biggest_use info has changed relative to
+   the previous run. */
+
+static bool changed;
+
+/* Vector that contains the extensions in the function.  */
+
+static VEC (rtx,heap) *extensions;
+
+/* Vector that contains the extensions in the function that are going to be
+   removed or replaced.  */
+
+static VEC (rtx,heap) *redundant_extensions;
+
+/* Forward declarations.  */
+
+static void note_use (rtx *x, void *data);
+static bool skip_reg_p (int regno);
+
+/* Convenience macro to swap 2 variables.  */
+
+#define SWAP(T, a, b) do { T tmp; tmp = (a); (a) = (b); (b) = tmp; } while (0)
+
+/* Check whether this is a paradoxical subreg. */
+
+static bool
+paradoxical_subreg_p (rtx subreg)
+{
+  enum machine_mode subreg_mode, reg_mode;
+
+  if (GET_CODE (subreg) != SUBREG)
+    return false;
+
+  subreg_mode = GET_MODE (subreg);
+  reg_mode = GET_MODE (SUBREG_REG (subreg));
+
+  if (GET_MODE_SIZE (subreg_mode) > GET_MODE_SIZE (reg_mode))
+    return true;
+
+  return false;
+}
+
+/* Get the size and reg number of a REG or SUBREG use.  */
+
+static bool
+reg_use_p (rtx use, int *size, unsigned int *regno)
+{
+  rtx reg;
+      
+  if (REG_P (use))
+    {
+      *regno = REGNO (use);
+      *size = GET_MODE_BITSIZE (GET_MODE (use));
+      return true;
+    }
+  else if (GET_CODE (use) == SUBREG)
+    {
+      reg = SUBREG_REG (use);
+
+      if (!REG_P (reg))
+        return false;
+
+      *regno = REGNO (reg);
+
+      if (paradoxical_subreg_p (use))
+        *size = GET_MODE_BITSIZE (GET_MODE (reg));
+      else
+        *size = subreg_lsb (use) + GET_MODE_BITSIZE (GET_MODE (use));
+
+      return true;
+    }
+
+  return false;
+}
+
+/* Register the use of a reg.  */
+
+static void
+register_use (int size, unsigned int regno)
+{
+  int *current;
+  int prev;
+  
+  gcc_assert (size >= 0);
+  gcc_assert (regno < (unsigned int)max_reg_num ());
+
+  current = &biggest_use[regno];  
+
+  if (*current == SKIP_REG)
+    return;
+
+  *current = MAX (*current, size);
+  nr_uses[regno]++;
+
+  if (prev_nr_uses == NULL)
+    return;
+
+  prev = prev_biggest_use[regno];
+  if (nr_uses[regno] == prev_nr_uses[regno] && *current < prev)
+    {
+      if (dump_file)
+        fprintf (dump_file, "reg %d: size %d -> %d\n", regno, prev, *current);
+      changed = true;
+    }
+}
+
+/* Get the biggest use of a reg from a previous scan. In case all uses are
+   already registered in the current pass, use the biggest use from the current
+   pass.  */
+
+static int
+get_prev_biggest_use (unsigned int regno)
+{
+  int current, res;
+  
+  gcc_assert (!skip_reg_p (regno));
+
+  res = prev_biggest_use[regno];
+
+  current = biggest_use[regno];
+  gcc_assert (current <= res);
+  if (nr_uses[regno] == prev_nr_uses[regno] && current < res)
+    res = current;
+
+  gcc_assert (res >= 0);
+  return res;
+}
+
+/* Handle embedded uses.  */
+
+static void
+note_embedded_uses (rtx use, rtx pattern)
+{
+  const char *format_ptr;
+  int i, j;
+
+  format_ptr = GET_RTX_FORMAT (GET_CODE (use));
+  for (i = 0; i < GET_RTX_LENGTH (GET_CODE (use)); i++)
+    if (format_ptr[i] == 'e')
+      note_use (&XEXP (use, i), pattern);
+    else if (format_ptr[i] == 'E')
+      for (j = 0; j < XVECLEN (use, i); j++)
+        note_use (&XVECEXP (use, i, j), pattern);
+}
+
+/* Get the set that has use as its SRC operand.  */
+
+static rtx
+get_set (rtx use, rtx pattern)
+{
+  rtx sub;
+  int i;
+
+  if (GET_CODE (pattern) == SET && SET_SRC (pattern) == use)
+    return pattern;
+
+  if (GET_CODE (pattern) == PARALLEL)
+    for (i = 0; i < XVECLEN (pattern, 0); ++i)
+      {
+        sub = XVECEXP (pattern, 0, i);
+        if (GET_CODE (sub) == SET && SET_SRC (sub) == use)
+          return sub;
+      }
+  
+  return NULL_RTX;
+}
+
+/* Handle a restricted op use. In this context restricted means that a bit in an
+   operand influences only the same bit or more significant bits in the result.
+   The bitwise ops are a subclass, but PLUS is one as well.  */
+
+static void
+note_restricted_op_use (rtx use, unsigned int nr_operands, rtx pattern)
+{
+  unsigned int i, smallest;
+  int operand_size[2];
+  int used_size;
+  unsigned int operand_regno[2];
+  bool operand_reg[2];
+  bool operand_ignore[2];
+  rtx set;
+
+  /* Init operand_reg, operand_size, operand_regno and operand_ignore.  */
+  for (i = 0; i < nr_operands; ++i)
+    {
+      operand_reg[i] = reg_use_p (XEXP (use, i), &operand_size[i],
+                                  &operand_regno[i]);
+      operand_ignore[i] = false;
+    }
+
+  /* Handle case of reg and-masked with const.  */
+  if (GET_CODE (use) == AND && CONST_INT_P (XEXP (use, 1)) && operand_reg[0])
+    {
+      used_size =
+        HOST_BITS_PER_WIDE_INT - clz_hwi (UINTVAL (XEXP (use, 1)));
+      operand_size[0] = MIN (operand_size[0], used_size);
+    }
+
+  /* Handle case of reg or-masked with const.  */
+  if (GET_CODE (use) == IOR && CONST_INT_P (XEXP (use, 1)) && operand_reg[0])
+    {
+      used_size =
+        HOST_BITS_PER_WIDE_INT - clz_hwi (~UINTVAL (XEXP (use, 1)));
+      operand_size[0] = MIN (operand_size[0], used_size);
+    }
+
+  /* Ignore the use of a in 'a = a + b'.  */
+  set = get_set (use, pattern);
+  /* TODO: handle GET_CODE ((SET_DEST (set))) == SUBREG. */
+  if (set != NULL_RTX && REG_P (SET_DEST (set)))
+    for (i = 0; i < nr_operands; ++i)
+      operand_ignore[i] = (operand_reg[i]
+                           && (REGNO (SET_DEST (set)) == operand_regno[i]));
+
+  /* Handle the case a reg is combined with don't care bits.  */
+  if (nr_operands == 2 && operand_reg[0] && operand_reg[1]
+      && operand_size[0] != operand_size[1])
+    {
+      smallest = operand_size[0] > operand_size[1];
+
+      if (paradoxical_subreg_p (XEXP (use, smallest))
+          && !SUBREG_PROMOTED_VAR_P (XEXP (use, smallest)))
+        operand_size[1 - smallest] = operand_size[smallest];
+    }
+
+  /* TODO: handle GET_CODE ((SET_DEST (set))) == SUBREG. */
+  if (prev_biggest_use != NULL && set != NULL_RTX && REG_P (SET_DEST (set))
+      && !skip_reg_p (REGNO (SET_DEST (set))))
+    for (i = 0; i < nr_operands; ++i)
+      operand_size[i] = MIN (operand_size[i],
+                             get_prev_biggest_use (REGNO (SET_DEST (set))));
+
+  /* Register the operand use, if necessary.  */
+  for (i = 0; i < nr_operands; ++i)
+    if (!operand_reg[i])
+      note_use (&XEXP (use, i), pattern);
+    else if (!operand_ignore[i])
+      register_use (operand_size[i], operand_regno[i]);
+}
+
+/* Handle all uses noted by note_uses.  */
+
+static void
+note_use (rtx *x, void *data)
+{
+  rtx use = *x;
+  rtx pattern = (rtx)data;
+  int use_size;
+  unsigned int use_regno;
+  rtx set;
+
+  switch (GET_CODE (use))
+    {
+    case REG:
+    case SUBREG:
+      if (!reg_use_p (use, &use_size, &use_regno))
+        {
+          note_embedded_uses (use, pattern);
+          return;
+        }
+      if (prev_biggest_use != NULL)
+        {
+          set = get_set (use, pattern);
+          /* TODO: handle GET_CODE ((SET_DEST (set))) == SUBREG. */
+          if (set != NULL_RTX && REG_P (SET_DEST (set))
+              && !skip_reg_p (REGNO (SET_DEST (set))))
+            use_size = MIN (use_size,
+                            get_prev_biggest_use (REGNO (SET_DEST (set))));
+        }
+      register_use (use_size, use_regno);
+      return;
+    case IOR:
+    case AND:
+    case XOR:
+    case PLUS:
+    case MINUS:
+      note_restricted_op_use (use, 2, pattern);
+      return;
+    case NOT:
+    case NEG:
+      note_restricted_op_use (use, 1, pattern);
+      return;
+    case ASHIFT:
+      if (!reg_use_p (XEXP (use, 0), &use_size, &use_regno)
+	  || !CONST_INT_P (XEXP (use, 1))
+          || INTVAL (XEXP (use, 1)) <= 0
+          || paradoxical_subreg_p (XEXP (use, 0)))
+        {
+          note_embedded_uses (use, pattern);
+          return;
+        }
+      register_use (use_size - INTVAL (XEXP (use, 1)), use_regno);
+      return;
+    default:
+      note_embedded_uses (use, pattern);
+      return;
+    }
+}
+
+/* Check whether reg is implicitly used.  */
+
+static bool
+implicit_use_p (int regno ATTRIBUTE_UNUSED)
+{
+#ifdef EPILOGUE_USES
+  if (EPILOGUE_USES (regno))
+    return true;
+#endif
+
+#ifdef EH_USES
+  if (EH_USES (regno))
+    return true;
+#endif
+
+  return false;
+}
+
+/* Check whether reg should be skipped in analysis.  */
+
+static bool
+skip_reg_p (int regno)
+{
+  /* TODO: handle hard registers. The problem with hard registers is that 
+     a DI use of r0 can mean a 64bit use of r0 and a 32 bit use of r1.
+     We don't handle that properly.  */
+  return implicit_use_p (regno) || HARD_REGISTER_NUM_P (regno);
+}
+
+/* Note the uses of argument registers in a call.  */
+
+static void
+note_call_uses (rtx insn)
+{
+  rtx link, link_expr;
+
+  if (!CALL_P (insn))
+    return;
+
+  for (link = CALL_INSN_FUNCTION_USAGE (insn); link; link = XEXP (link, 1))
+    {
+      link_expr = XEXP (link, 0);
+
+      if (GET_CODE (link_expr) == USE)
+        note_use (&XEXP (link_expr, 0), link);
+    }
+}
+
+/* Dump the biggest uses found.  */
+
+static void
+dump_biggest_use (void)
+{
+  int i;
+
+  if (!dump_file)
+    return;
+
+  for (i = 0; i < max_reg_num (); i++)
+    if (biggest_use[i] > 0)
+      fprintf (dump_file, "reg %d: size %d\n", i, biggest_use[i]);
+
+  fprintf (dump_file, "\n");
+}
+
+/* Calculate the biggest use mode for all regs. */
+
+static void
+calculate_biggest_use (void)
+{
+  int i;
+  basic_block bb;
+  rtx insn;
+
+  /* Initialize biggest_use for all regs to 0. If a reg is used implicitly, we
+     handle that reg conservatively and set it to SKIP_REG instead.  */
+  for (i = 0; i < max_reg_num (); i++)
+    {
+      biggest_use[i] = skip_reg_p (i) ? SKIP_REG : 0;
+      nr_uses[i] = skip_reg_p (i) ? SKIP_REG : 0;
+    }
+
+  /* For all insns, call note_use for each use in insn.  */
+  FOR_EACH_BB (bb)
+    FOR_BB_INSNS_REVERSE (bb, insn)
+      {
+        if (!NONDEBUG_INSN_P (insn))
+          continue;
+
+        note_uses (&PATTERN (insn), note_use, PATTERN (insn));
+
+        if (CALL_P (insn))
+          note_call_uses (insn);
+      }
+}
+
+/* Check whether this is a sign/zero extension.  */
+
+static bool
+extension_p (rtx insn, rtx *dest, rtx *inner, int *preserved_size)
+{
+  rtx src, op0;
+
+  /* Detect set of reg.  */
+  if (GET_CODE (PATTERN (insn)) != SET)
+    return false;
+
+  src = SET_SRC (PATTERN (insn));
+  *dest = SET_DEST (PATTERN (insn));
+          
+  if (!REG_P (*dest))
+    return false;
+
+  /* Detect sign or zero extension.  */
+  if (GET_CODE (src) == ZERO_EXTEND || GET_CODE (src) == SIGN_EXTEND
+      || (GET_CODE (src) == AND && CONST_INT_P (XEXP (src, 1))))
+    {
+      op0 = XEXP (src, 0);
+
+      /* Determine amount of least significant bits preserved by operation.  */
+      if (GET_CODE (src) == AND)
+        *preserved_size = ctz_hwi (~UINTVAL (XEXP (src, 1)));
+      else
+        *preserved_size = GET_MODE_BITSIZE (GET_MODE (op0));
+
+      if (GET_CODE (op0) == SUBREG)
+        {
+          if (subreg_lsb (op0) != 0)
+            return false;
+      
+          *inner = SUBREG_REG (op0);
+          return true;
+        }
+      else if (REG_P (op0))
+        {
+          *inner = op0;
+          return true;
+        }
+    }
+
+  return false;
+}
+
+/* Find extensions and store them in the extensions vector.  */
+
+static bool
+find_extensions (void)
+{
+  basic_block bb;
+  rtx insn, dest, inner;
+  int preserved_size;
+
+  /* For all insns, check whether the insn is an extension and collect it.  */
+  FOR_EACH_BB (bb)
+    FOR_BB_INSNS (bb, insn)
+      {
+        if (!NONDEBUG_INSN_P (insn))
+          continue;
+
+        if (!extension_p (insn, &dest, &inner, &preserved_size))
+          continue;
+
+        VEC_safe_push (rtx, heap, extensions, insn);
+
+        if (dump_file)
+          fprintf (dump_file,
+                   "found extension %u with preserved size %d defining"
+                   " reg %d\n",
+                   INSN_UID (insn), preserved_size, REGNO (dest));
+      }
+
+  if (dump_file)
+    {
+      if (!VEC_empty (rtx, extensions))
+        fprintf (dump_file, "\n");
+      else
+        fprintf (dump_file, "no extensions found.\n");
+    }
+
+  return !VEC_empty (rtx, extensions);
+}
+
+/* Check whether this is a redundant sign/zero extension.  */
+
+static bool
+redundant_extension_p (rtx insn, rtx *dest, rtx *inner, int *preserved_size)
+{
+  int biggest_dest_use;
+
+  if (!extension_p (insn, dest, inner, preserved_size))
+    return false;
+
+  biggest_dest_use = biggest_use[REGNO (*dest)];
+      
+  if (biggest_dest_use == SKIP_REG)
+    return false;
+
+  if (*preserved_size < biggest_dest_use)
+    return false;
+
+  return true;
+}
+
+/* Find the redundant extensions in the extensions vector and move them to the
+   redundant_extensions vector.  */
+
+static void
+find_redundant_extensions (void)
+{
+  rtx insn, dest, inner;
+  int ix;
+  bool found = false;
+  int preserved_size;
+
+  FOR_EACH_VEC_ELT_REVERSE (rtx, extensions, ix, insn)
+    if (redundant_extension_p (insn, &dest, &inner, &preserved_size))
+      {
+        VEC_safe_push (rtx, heap, redundant_extensions, insn);
+        VEC_unordered_remove (rtx, extensions, ix);
+
+        if (dump_file)
+          fprintf (dump_file,
+                   "found superfluous extension %u with preserved size %d"
+                   " defining reg %d\n",
+                   INSN_UID (insn), preserved_size, REGNO (dest));
+        found = true;
+      }
+
+  if (dump_file && found)
+    fprintf (dump_file, "\n");
+}
+
+/* Run calculate_biggest_use iteratively.  */
+
+static void
+propagate (void)
+{
+  unsigned int nr_propagate = 0;
+  unsigned int max_propagate = PARAM_VALUE (PARAM_EE_MAX_PROPAGATE);
+
+  while (nr_propagate < max_propagate && !VEC_empty (rtx, extensions))
+    {
+      ++nr_propagate;
+
+      if (dump_file)
+        fprintf (dump_file, "propagating(%u)\n", nr_propagate);
+
+      SWAP (int *, biggest_use, prev_biggest_use);
+      SWAP (int *, nr_uses, prev_nr_uses);
+
+      changed = false;
+      calculate_biggest_use ();
+      if (!changed)
+        break;
+
+      if (dump_file)
+        fprintf (dump_file, "\n");
+
+      find_redundant_extensions ();
+    }
+}
+
+/* Try to remove or replace the redundant extension.  */
+
+static void
+try_remove_or_replace_extension (rtx insn, rtx dest, rtx inner)
+{
+  rtx cp_src, cp_dest, seq, one;
+
+  /* Check whether replacement is needed.  */
+  if (dest != inner)
+    {
+      start_sequence ();
+
+      /* Determine the proper replacement operation.  */
+      if (GET_MODE (dest) == GET_MODE (inner))
+        {
+          cp_src = inner;
+          cp_dest = dest;
+        }
+      else if (GET_MODE_SIZE (GET_MODE (dest))
+               > GET_MODE_SIZE (GET_MODE (inner)))
+        {
+          emit_clobber (dest);
+          cp_src = inner;
+          cp_dest = gen_lowpart_SUBREG (GET_MODE (inner), dest);
+        }
+      else 
+        {
+          cp_src = gen_rtx_TRUNCATE (GET_MODE (dest), inner);
+          cp_dest = dest;
+        }
+
+      emit_move_insn (cp_dest, cp_src);
+
+      seq = get_insns ();
+      end_sequence ();
+
+      /* If the replacement is not supported, bail out.  */
+      for (one = seq; one != NULL_RTX; one = NEXT_INSN (one))
+        if (recog_memoized (one) < 0 && GET_CODE (PATTERN (one)) != CLOBBER)
+          return;
+
+      /* Insert the replacement.  */
+      emit_insn_before (seq, insn);
+
+      if (dump_file)
+        {
+          fprintf (dump_file, "superfluous extension %u replaced by\n",
+                   INSN_UID (insn));
+          for (one = seq; one != NULL_RTX; one = NEXT_INSN (one))
+            fprintf (dump_file, " %u", INSN_UID (one));
+          fprintf (dump_file, "\n");
+        }
+    }
+  else
+    if (dump_file)
+      fprintf (dump_file, "superfluous extension %u removed\n", INSN_UID (insn));
+
+  /* Remove the extension.  */
+  delete_insn (insn);  
+}
+
+/* Remove the redundant extensions.  */
+
+static void
+remove_redundant_extensions (void)
+{
+  rtx insn, dest, inner;
+  int preserved_size;
+  int ix;
+
+  FOR_EACH_VEC_ELT (rtx, redundant_extensions, ix, insn)
+    {
+      extension_p (insn, &dest, &inner, &preserved_size);
+      try_remove_or_replace_extension (insn, dest, inner);
+    }
+
+  if (dump_file)
+    fprintf (dump_file, "\n");
+}
+
+/* Setup the variables at the start of the pass.  */
+
+static void
+init_pass (bool for_propagate)
+{
+  if (for_propagate)
+    {
+      prev_biggest_use = XNEWVEC (int, max_reg_num ());
+      prev_nr_uses = XNEWVEC (int, max_reg_num ());
+      return;
+    }
+
+  biggest_use = XNEWVEC (int, max_reg_num ());
+  nr_uses = XNEWVEC (int, max_reg_num ());
+
+  prev_biggest_use = NULL;
+  prev_nr_uses = NULL;
+
+  extensions = VEC_alloc (rtx, heap, 10); 
+  redundant_extensions = VEC_alloc (rtx, heap, 10);
+}
+
+/* Free the variables at the end of the pass.  */
+
+static void
+finish_pass (void)
+{
+  XDELETEVEC (biggest_use);
+  XDELETEVEC (nr_uses);
+  
+  XDELETEVEC (prev_biggest_use);
+  XDELETEVEC (prev_nr_uses);
+
+  VEC_free (rtx, heap, extensions);
+  VEC_free (rtx, heap, redundant_extensions);
+}
+
+/* Find redundant extensions and remove or replace them if possible.  */
+
+static void
+find_and_remove_redundant_extensions (void)
+{
+  init_pass (false);
+
+  if (!find_extensions ())
+    {
+      finish_pass ();
+      return;
+    }
+
+  calculate_biggest_use ();
+  dump_biggest_use ();
+
+  find_redundant_extensions ();
+
+  if (optimize >= 2 && !VEC_empty (rtx, extensions))
+    {
+      init_pass (true);
+      propagate ();
+    }
+
+  remove_redundant_extensions ();
+
+  finish_pass ();
+}
+
+/* Remove redundant extensions.  */
+
+static unsigned int
+rest_of_handle_ee (void)
+{
+  find_and_remove_redundant_extensions ();
+  return 0;
+}
+
+/* Run ee pass when flag_ee is set at optimization level > 0.  */
+
+static bool
+gate_handle_ee (void)
+{
+  return (optimize > 0 && flag_ee);
+}
+
+struct rtl_opt_pass pass_ee =
+{
+ {
+  RTL_PASS,
+  "ee",                                 /* name */
+  gate_handle_ee,                       /* gate */
+  rest_of_handle_ee,                    /* execute */
+  NULL,                                 /* sub */
+  NULL,                                 /* next */
+  0,                                    /* static_pass_number */
+  TV_EE,                                /* tv_id */
+  0,                                    /* properties_required */
+  0,                                    /* properties_provided */
+  0,                                    /* properties_destroyed */
+  0,                                    /* todo_flags_start */
+  TODO_ggc_collect |
+  TODO_dump_func |
+  TODO_verify_rtl_sharing,              /* todo_flags_finish */
+ }
+};
Index: gcc/common.opt
===================================================================
--- gcc/common.opt	(revision 165080)
+++ gcc/common.opt	(working copy)
@@ -761,6 +761,10 @@
 Common Report Var(flag_eliminate_dwarf2_dups)
 Perform DWARF2 duplicate elimination
 
+fextension-elimination
+Common Report Var(flag_ee) Init(0) Optimization
+Perform extension elimination
+
 fipa-sra
 Common Report Var(flag_ipa_sra) Init(0) Optimization
 Perform interprocedural reduction of aggregates
Index: gcc/Makefile.in
===================================================================
--- gcc/Makefile.in	(revision 165080)
+++ gcc/Makefile.in	(working copy)
@@ -1218,6 +1218,7 @@
 	dwarf2asm.o \
 	dwarf2out.o \
 	ebitmap.o \
+	ee.o \
 	emit-rtl.o \
 	et-forest.o \
 	except.o \
@@ -3107,6 +3108,11 @@
 web.o : web.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) \
    hard-reg-set.h $(FLAGS_H) $(BASIC_BLOCK_H) $(FUNCTION_H) output.h $(TOPLEV_H) $(DIAGNOSTIC_CORE_H) \
    insn-config.h $(RECOG_H) $(DF_H) $(OBSTACK_H) $(TIMEVAR_H) $(TREE_PASS_H)
+ee.o : ee.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) \
+   hard-reg-set.h $(FLAGS_H) $(BASIC_BLOCK_H) $(FUNCTION_H) output.h \
+   $(DF_H) $(TIMEVAR_H) tree-pass.h $(RECOG_H) $(EXPR_H) \
+   $(REGS_H) $(TREE_H) $(TM_P_H) insn-config.h $(INSN_ATTR_H) $(TOPLEV_H) $(DIAGNOSTIC_CORE_H) \
+   $(TARGET_H) $(OPTABS_H) insn-codes.h rtlhooks-def.h $(PARAMS_H) $(CGRAPH_H)
 implicit-zee.o : implicit-zee.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) \
    hard-reg-set.h $(FLAGS_H) $(BASIC_BLOCK_H) $(FUNCTION_H) output.h \
    $(DF_H) $(TIMEVAR_H) tree-pass.h $(RECOG_H) $(EXPR_H) \
Index: gcc/passes.c
===================================================================
--- gcc/passes.c	(revision 165080)
+++ gcc/passes.c	(working copy)
@@ -976,6 +976,7 @@
       NEXT_PASS (pass_lower_subreg);
       NEXT_PASS (pass_df_initialize_opt);
       NEXT_PASS (pass_cse);
+      NEXT_PASS (pass_ee);
       NEXT_PASS (pass_rtl_fwprop);
       NEXT_PASS (pass_rtl_cprop);
       NEXT_PASS (pass_rtl_pre);
Index: gcc/params.def
===================================================================
--- gcc/params.def	(revision 165080)
+++ gcc/params.def	(working copy)
@@ -849,6 +849,12 @@
 	  "lto-min-partition",
 	  "Size of minimal paritition for WHOPR (in estimated instructions)",
 	  1000, 0, 0)
+
+DEFPARAM (PARAM_EE_MAX_PROPAGATE,
+	  "ee-max-propagate",
+	  "Maximum number of scans over all insn to do propagation",
+	  3, 0, 0)
+
 /*
 Local variables:
 mode:c
Index: gcc/testsuite/gcc.dg/extend-4.c
===================================================================
--- gcc/testsuite/gcc.dg/extend-4.c	(revision 0)
+++ gcc/testsuite/gcc.dg/extend-4.c	(revision 0)
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-rtl-ee" } */
+
+unsigned char f(unsigned int a)
+{
+  unsigned int b = a & 0x10ff;
+  return b;
+}
+
+/* { dg-final { scan-rtl-dump-times "and:" 0 "ee" { target mips*-*-* } } } */
+/* { dg-final { scan-rtl-dump-times "superfluous extension \[0-9\]+ replaced" 1 "ee" { target mips*-*-* } } } */
+/* { dg-final { cleanup-rtl-dump "ee" } } */
+
Index: gcc/testsuite/gcc.dg/extend-1.c
===================================================================
--- gcc/testsuite/gcc.dg/extend-1.c	(revision 0)
+++ gcc/testsuite/gcc.dg/extend-1.c	(revision 0)
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-rtl-ee" } */
+
+void f(unsigned char * p, short s, int c, int *z)
+{
+  if (c)
+    *z = 0;
+  *p ^= (unsigned char)s;
+}
+
+/* { dg-final { scan-rtl-dump-times "sign_extend:" 0 "ee" { target mips*-*-* } } } */
+/* { dg-final { scan-rtl-dump-times "superfluous extension \[0-9\]+ replaced" 1 "ee" { target mips*-*-* } } } */
+/* { dg-final { cleanup-rtl-dump "ee" } } */
Index: gcc/testsuite/gcc.dg/extend-2.c
===================================================================
--- gcc/testsuite/gcc.dg/extend-2.c	(revision 0)
+++ gcc/testsuite/gcc.dg/extend-2.c	(revision 0)
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-rtl-ee" } */
+/* { dg-require-effective-target ilp32 } */
+
+void f(unsigned char * p, short *s, int c)
+{
+  short or = 0;
+  while (c)
+    {
+      or = or | s[c];
+      c --;
+    }
+  *p = (unsigned char)or;
+}
+
+/* { dg-final { scan-rtl-dump-times "zero_extend" 0 "ee" { target mips*-*-* } } } */
+/* { dg-final { scan-rtl-dump-times "sign_extend" 0 "ee" { target mips*-*-* } } } */
+/* { dg-final { scan-rtl-dump-times "superfluous extension \[0-9\]+ replaced" 2 "ee" { target mips*-*-* } } } */
+/* { dg-final { cleanup-rtl-dump "ee" } } */
+
Index: gcc/testsuite/gcc.dg/extend-2-64.c
===================================================================
--- gcc/testsuite/gcc.dg/extend-2-64.c	(revision 0)
+++ gcc/testsuite/gcc.dg/extend-2-64.c	(revision 0)
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-rtl-ee" } */
+/* { dg-require-effective-target mips64 } */
+
+void f(unsigned char * p, short *s, int c)
+{
+  short or = 0;
+  while (c)
+    {
+      or = or | s[c];
+      c --;
+    }
+  *p = (unsigned char)or;
+}
+
+/* { dg-final { scan-rtl-dump-times "zero_extend:" 1 "ee" { target mips*-*-* } } } */
+/* { dg-final { scan-rtl-dump-times "sign_extend:" 0 "ee" { target mips*-*-* } } } */
+/* { dg-final { scan-rtl-dump-times "superfluous extension \[0-9\]+ replaced" 3 "ee" { target mips*-*-* } } } */
+/* { dg-final { cleanup-rtl-dump "ee" } } */
+
Index: gcc/testsuite/gcc.dg/extend-3.c
===================================================================
--- gcc/testsuite/gcc.dg/extend-3.c	(revision 0)
+++ gcc/testsuite/gcc.dg/extend-3.c	(revision 0)
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-rtl-ee" } */
+
+unsigned int f(unsigned char byte)
+{
+  return byte << 25;
+}
+
+/* { dg-final { scan-rtl-dump-times "zero_extend:" 0 "ee" { target mips*-*-* } } } */
+/* { dg-final { scan-rtl-dump "superfluous extension \[0-9\]+ replaced" "ee" { target mips*-*-* } } } */
+/* { dg-final { cleanup-rtl-dump "ee" } } */
+


* Re: new sign/zero extension elimination pass
  2010-10-18 15:54 new sign/zero extension elimination pass Tom de Vries
@ 2010-10-18 16:03 ` Andrew Pinski
  2010-10-18 16:59   ` Richard Guenther
  2010-10-21 10:44   ` Paolo Bonzini
  2010-10-22  9:15 ` Eric Botcazou
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 43+ messages in thread
From: Andrew Pinski @ 2010-10-18 16:03 UTC (permalink / raw)
  To: Tom de Vries; +Cc: gcc-patches, Bernd Schmidt

On Mon, Oct 18, 2010 at 8:36 AM, Tom de Vries <tom@codesourcery.com> wrote:
> I created a new sign/zero extension elimination pass.
>
> The motivating example for this pass is:
>
>  void f(unsigned char *p, short s, int c, int *z)
>    {
>      if (c)
>        *z = 0;
>      *p ^= (unsigned char)s;
>    }
>
> For MIPS, compilation results in the following insns.
>
>  (set (reg/v:SI 199)
>       (sign_extend:SI (subreg:HI (reg:SI 200) 2)))
>
>  ...
>
>  (set (reg:QI 203)
>       (subreg:QI (reg/v:SI 199) 3))
>
> These insns are the only def and the only use of reg 199, each located in a
> different bb.


This sounds like a job for GCSE to do.

Thanks,
Andrew Pinski


* Re: new sign/zero extension elimination pass
  2010-10-18 16:03 ` Andrew Pinski
@ 2010-10-18 16:59   ` Richard Guenther
  2010-10-21 10:06     ` Tom de Vries
  2010-10-21 10:44   ` Paolo Bonzini
  1 sibling, 1 reply; 43+ messages in thread
From: Richard Guenther @ 2010-10-18 16:59 UTC (permalink / raw)
  To: Andrew Pinski; +Cc: Tom de Vries, gcc-patches, Bernd Schmidt

On Mon, Oct 18, 2010 at 5:42 PM, Andrew Pinski <pinskia@gmail.com> wrote:
> On Mon, Oct 18, 2010 at 8:36 AM, Tom de Vries <tom@codesourcery.com> wrote:
>> I created a new sign/zero extension elimination pass.
>>
>> The motivating example for this pass is:
>>
>>  void f(unsigned char *p, short s, int c, int *z)
>>    {
>>      if (c)
>>        *z = 0;
>>      *p ^= (unsigned char)s;
>>    }
>>
>> For MIPS, compilation results in the following insns.
>>
>>  (set (reg/v:SI 199)
>>       (sign_extend:SI (subreg:HI (reg:SI 200) 2)))
>>
>>  ...
>>
>>  (set (reg:QI 203)
>>       (subreg:QI (reg/v:SI 199) 3))
>>
>> These insns are the only def and the only use of reg 199, each located in a
>> different bb.
>
>
> This sounds like a job for GCSE to do.

The question is why it is expanded the way it is.  On x86 I just get

;; *p_3(D) = D.2690_7;

(insn 16 15 0 (parallel [
            (set (mem:QI (reg/v/f:DI 62 [ p ]) [0 *p_3(D)+0 S1 A8])
                (xor:QI (mem:QI (reg/v/f:DI 62 [ p ]) [0 *p_3(D)+0 S1 A8])
                    (subreg:QI (reg/v:HI 63 [ s ]) 0)))
            (clobber (reg:CC 17 flags))
        ]) t.c:5 -1
     (nil))

note that the tree level already has

  D.2688_4 = *p_3(D);
  D.2689_6 = (unsigned char) s_5(D);
  D.2690_7 = D.2689_6 ^ D.2688_4;
  *p_3(D) = D.2690_7;

so no extension.

Richard.

> Thanks,
> Andrew Pinski
>


* Re: new sign/zero extension elimination pass
  2010-10-18 16:59   ` Richard Guenther
@ 2010-10-21 10:06     ` Tom de Vries
  0 siblings, 0 replies; 43+ messages in thread
From: Tom de Vries @ 2010-10-21 10:06 UTC (permalink / raw)
  To: Richard Guenther; +Cc: Andrew Pinski, gcc-patches, Bernd Schmidt

Richard Guenther wrote:
> On Mon, Oct 18, 2010 at 5:42 PM, Andrew Pinski <pinskia@gmail.com> wrote:
>> On Mon, Oct 18, 2010 at 8:36 AM, Tom de Vries <tom@codesourcery.com> wrote:
>>> I created a new sign/zero extension elimination pass.
>>>
>>> The motivating example for this pass is:
>>>
>>>  void f(unsigned char *p, short s, int c, int *z)
>>>    {
>>>      if (c)
>>>        *z = 0;
>>>      *p ^= (unsigned char)s;
>>>    }
>>>
>>> For MIPS, compilation results in the following insns.
>>>
>>>  (set (reg/v:SI 199)
>>>       (sign_extend:SI (subreg:HI (reg:SI 200) 2)))
>>>
>>>  ...
>>>
>>>  (set (reg:QI 203)
>>>       (subreg:QI (reg/v:SI 199) 3))
>>>
>>> These insns are the only def and the only use of reg 199, each located in a
>>> different bb.
>>
>> This sounds like a job for GCSE to do.
> 
> The question is why it is expanded the way it is.  On x86 I just get
> 
> ;; *p_3(D) = D.2690_7;
> 
> (insn 16 15 0 (parallel [
>             (set (mem:QI (reg/v/f:DI 62 [ p ]) [0 *p_3(D)+0 S1 A8])
>                 (xor:QI (mem:QI (reg/v/f:DI 62 [ p ]) [0 *p_3(D)+0 S1 A8])
>                     (subreg:QI (reg/v:HI 63 [ s ]) 0)))
>             (clobber (reg:CC 17 flags))
>         ]) t.c:5 -1
>      (nil))
> 
> note that the tree level already has
> 
>   D.2688_4 = *p_3(D);
>   D.2689_6 = (unsigned char) s_5(D);
>   D.2690_7 = D.2689_6 ^ D.2688_4;
>   *p_3(D) = D.2690_7;
> 
> so no extension.
> 
> Richard.
> 
>> Thanks,
>> Andrew Pinski
>>

The extension is generated for the 'short int s' function parameter by
assign_parm_setup_reg due to the MIPS setting of PROMOTE_MODE,
TARGET_PROMOTE_FUNCTION_MODE and TARGET_PROMOTE_PROTOTYPES. However, this
behavior is not reproducible for other targets that I tried (ARM, PPC, x86_64),
so this is probably not the best example to discuss between targets.

Maybe a better example is http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40893#c0.

The optimized tree code for MIPS is this:
...
dct2x2dc_dconly (short int[2] * d)
{
  int d1;
  int d0;
  short int D.1992;
  short unsigned int D.1991;
  short int D.1990;
  short unsigned int D.1989;
  short unsigned int D.1988;
  short unsigned int D.1987;
  int D.1986;
  short int D.1985;
  int D.1984;
  short int D.1983;
  short int[2] * D.1982;
  int D.1981;
  short int D.1980;
  int D.1979;
  short int D.1978;

<bb 2>:
  D.1978_2 = (*d_1(D))[0];
  D.1979_3 = (int) D.1978_2;
  D.1980_4 = (*d_1(D))[1];
  D.1981_5 = (int) D.1980_4;
  d0_6 = D.1981_5 + D.1979_3;
  D.1982_7 = d_1(D) + 4;
  D.1983_8 = (*D.1982_7)[0];
  D.1984_9 = (int) D.1983_8;
  D.1985_11 = (*D.1982_7)[1];
  D.1986_12 = (int) D.1985_11;
  d1_13 = D.1986_12 + D.1984_9;
  D.1987_14 = (short unsigned int) d0_6;
  D.1988_15 = (short unsigned int) d1_13;
  D.1989_16 = D.1988_15 + D.1987_14;
  D.1990_17 = (short int) D.1989_16;
  (*d_1(D))[0] = D.1990_17;
  D.1991_20 = D.1987_14 - D.1988_15;
  D.1992_21 = (short int) D.1991_20;
  (*d_1(D))[1] = D.1992_21;
  return;

}
...

The assignments:
...
  D.1987_14 = (short unsigned int) d0_6;
  D.1988_15 = (short unsigned int) d1_13;
...

are expanded into zero_extensions:
...
(insn 10 9 11 3 ext13.c:5 (set (reg:SI 204 [ D.1987+-2 ])
        (zero_extend:SI (subreg:HI (reg:SI 213) 2))) -1 (nil))

(insn 14 13 15 3 ext13.c:5 (set (reg:SI 205 [ D.1988+-2 ])
        (zero_extend:SI (subreg:HI (reg:SI 216) 2))) -1 (nil))
...
The same holds for ARM and PPC.

But for x86_64, the assignments are expanded into subreg copies:
...
(insn 9 8 10 3 (set (reg:HI 69 [ D.2798 ])
        (subreg:HI (reg:SI 78) 0)) test4.c:7 -1
     (nil))

(insn 13 12 14 3 (set (reg:HI 70 [ D.2799 ])
        (subreg:HI (reg:SI 81) 0)) test4.c:7 -1
     (nil))
...

AFAIU, this difference in behaviour is caused by the difference in PROMOTE_MODE,
which causes the short unsigned int D.1987 to live in an SI reg for ARM, MIPS and
PPC, and in an HI reg for x86_64.

For x86_64, the subreg copies are combined by combine with other operations, and
don't result in extra operations.

For ARM, MIPS and PPC, the zero_extensions are not optimized away by any current
pass, as the MIPS assembly shows:
...
	lh	$6,2($2)
	lh	$4,0($2)
	lh	$5,4($2)
	lh	$3,6($2)
	addu	$4,$6,$4
	addu	$3,$3,$5
	andi	$4,$4,0xffff    <---
	andi	$3,$3,0xffff    <---
	addu	$5,$3,$4
	subu	$3,$4,$3
	sh	$5,0($2)
	j	$31
	sh	$3,2($2)
...

The new pass removes the superfluous zero_extensions for ARM, MIPS and PPC.
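
To illustrate why the two andi instructions marked above are superfluous, here
is a minimal C sketch. It is not taken from the pass; the function names are
made up for the illustration. It models the add/sub/store-halfword sequence
with and without the 16-bit masks, assuming 32-bit unsigned arithmetic for the
SI registers:

```c
#include <stdint.h>

/* With the masks, as in the MIPS assembly: both intermediate sums are
   zero-extended from 16 bits before the final add/sub.  */
static uint16_t sum_with_masks(uint32_t d0, uint32_t d1)
{
  uint32_t m0 = d0 & 0xffffu;   /* andi $4,$4,0xffff */
  uint32_t m1 = d1 & 0xffffu;   /* andi $3,$3,0xffff */
  return (uint16_t)(m1 + m0);   /* sh stores only the low 16 bits */
}

static uint16_t diff_with_masks(uint32_t d0, uint32_t d1)
{
  uint32_t m0 = d0 & 0xffffu;
  uint32_t m1 = d1 & 0xffffu;
  return (uint16_t)(m0 - m1);
}

/* Without the masks: the low 16 bits of a sum or difference depend only
   on the low 16 bits of the operands, so the stored value is the same.  */
static uint16_t sum_without_masks(uint32_t d0, uint32_t d1)
{
  return (uint16_t)(d1 + d0);
}

static uint16_t diff_without_masks(uint32_t d0, uint32_t d1)
{
  return (uint16_t)(d0 - d1);
}
```

For any inputs the masked and unmasked variants return the same halfword,
which is the property the pass relies on when it drops the zero_extensions.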

Another example for which the new pass removes the superfluous extensions is
https://bugs.launchpad.net/gcc-linaro/+bug/634682.

- Tom


* Re: new sign/zero extension elimination pass
  2010-10-18 16:03 ` Andrew Pinski
  2010-10-18 16:59   ` Richard Guenther
@ 2010-10-21 10:44   ` Paolo Bonzini
  2010-10-21 11:00     ` Paolo Bonzini
  2010-10-21 17:21     ` Paolo Bonzini
  1 sibling, 2 replies; 43+ messages in thread
From: Paolo Bonzini @ 2010-10-21 10:44 UTC (permalink / raw)
  To: Andrew Pinski; +Cc: Tom de Vries, gcc-patches, Bernd Schmidt

On 10/18/2010 05:42 PM, Andrew Pinski wrote:
>> >  For MIPS, compilation results in the following insns.
>> >
>> >    (set (reg/v:SI 199)
>> >         (sign_extend:SI (subreg:HI (reg:SI 200) 2)))
>> >
>> >    ...
>> >
>> >    (set (reg:QI 203)
>> >         (subreg:QI (reg/v:SI 199) 3))
>> >
>> >  These insns are the only def and the only use of reg 199, each located in a
>> >  different bb.
>
> This sounds like a job for GCSE to do.

Actually, fwprop should _already_ do that, assuming simplify-rtx.c
does the simplification of (subreg:QI (sign_extend:SI (subreg:HI (reg:SI
200) 2)) 3).

Paolo


* Re: new sign/zero extension elimination pass
  2010-10-21 10:44   ` Paolo Bonzini
@ 2010-10-21 11:00     ` Paolo Bonzini
  2010-10-21 17:21     ` Paolo Bonzini
  1 sibling, 0 replies; 43+ messages in thread
From: Paolo Bonzini @ 2010-10-21 11:00 UTC (permalink / raw)
  To: gcc-patches; +Cc: Tom de Vries, gcc-patches, Bernd Schmidt

On 10/18/2010 05:42 PM, Andrew Pinski wrote:
>> >  For MIPS, compilation results in the following insns.
>> >
>> >    (set (reg/v:SI 199)
>> >         (sign_extend:SI (subreg:HI (reg:SI 200) 2)))
>> >
>> >    ...
>> >
>> >    (set (reg:QI 203)
>> >         (subreg:QI (reg/v:SI 199) 3))
>> >
>> >  These insns are the only def and the only use of reg 199, each located in a
>> >  different bb.
>
> This sounds like a job for GCSE to do.

Actually, fwprop should _already_ do that, assuming simplify-rtx.c
does the simplification of (subreg:QI (sign_extend:SI (subreg:HI (reg:SI
200) 2)) 3).

Paolo


* Re: new sign/zero extension elimination pass
  2010-10-21 10:44   ` Paolo Bonzini
  2010-10-21 11:00     ` Paolo Bonzini
@ 2010-10-21 17:21     ` Paolo Bonzini
  2010-10-22  9:05       ` Tom de Vries
  1 sibling, 1 reply; 43+ messages in thread
From: Paolo Bonzini @ 2010-10-21 17:21 UTC (permalink / raw)
  Cc: Andrew Pinski, Tom de Vries, gcc-patches, Bernd Schmidt

On 10/21/2010 12:24 PM, Paolo Bonzini wrote:
> On 10/18/2010 05:42 PM, Andrew Pinski wrote:
>>> > For MIPS, compilation results in the following insns.
>>> >
>>> > (set (reg/v:SI 199)
>>> > (sign_extend:SI (subreg:HI (reg:SI 200) 2)))
>>> >
>>> > ...
>>> >
>>> > (set (reg:QI 203)
>>> > (subreg:QI (reg/v:SI 199) 3))
>>> >
>>> > These insns are the only def and the only use of reg 199, each 
>>> located in a
>>> > different bb.
>>
>> This sounds like a job for GCSE to do.
> 
> Actually, fwprop should _already_ do that if assuming simplify-rtx.c 
> does the simplification of (subreg:QI (sign_extend:SI (subreg:HI (reg:SI 
> 200) 2))) 3).

... which this code should do in simplify_subreg:

  /* Optimize SUBREG truncations of zero and sign extended values.  */
  if ((GET_CODE (op) == ZERO_EXTEND
       || GET_CODE (op) == SIGN_EXTEND)
      && GET_MODE_BITSIZE (outermode) < GET_MODE_BITSIZE (innermode))
    {
      unsigned int bitpos = subreg_lsb_1 (outermode, innermode, byte);

      /* If we're requesting the lowpart of a zero or sign extension,
         there are three possibilities.  If the outermode is the same
         as the origmode, we can omit both the extension and the subreg.
         If the outermode is not larger than the origmode, we can apply
         the truncation without the extension.  Finally, if the outermode
         is larger than the origmode, but both are integer modes, we
         can just extend to the appropriate mode.  */
      if (bitpos == 0)
        {
          enum machine_mode origmode = GET_MODE (XEXP (op, 0));
          if (outermode == origmode)
            return XEXP (op, 0);
          if (GET_MODE_BITSIZE (outermode) <= GET_MODE_BITSIZE (origmode))
            return simplify_gen_subreg (outermode, XEXP (op, 0), origmode,
                                        subreg_lowpart_offset (outermode,
                                                               origmode));
          if (SCALAR_INT_MODE_P (outermode))
            return simplify_gen_unary (GET_CODE (op), outermode,
                                       XEXP (op, 0), origmode);
        }

However, the def of pseudo 200 is "complex enough" that fwprop will not want
to propagate it unless it simplifies to a constant.

It should be enough to tell fwprop that such propagations are always fine
if the destination pseudo has one use only.  In this case register pressure
cannot increase.
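
As a rough illustration of the second case in the comment above (outermode not
larger than origmode), here is a small C model. It is only a sketch: the helper
names are invented, and it assumes that byte offsets 2 and 3 in the RTL denote
the lowpart on a big-endian 32-bit target. Taking the low byte of a value
sign-extended from 16 bits gives the same result as taking the low byte of the
original register:

```c
#include <stdint.h>

/* Model of (sign_extend:SI (subreg:HI x)) with a lowpart subreg:
   keep the low 16 bits of x and sign-extend them to 32 bits.  */
static uint32_t sign_extend_hi_to_si(uint32_t x)
{
  uint32_t low = x & 0xffffu;
  return (low & 0x8000u) ? (low | 0xffff0000u) : low;
}

/* Model of a lowpart (subreg:QI ...): keep only the low 8 bits.  */
static uint8_t lowpart_qi(uint32_t x)
{
  return (uint8_t)(x & 0xffu);
}
```

In this model lowpart_qi (sign_extend_hi_to_si (x)) equals lowpart_qi (x) for
every x, so the extension contributes nothing to the QI use.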

Paolo


* Re: new sign/zero extension elimination pass
  2010-10-21 17:21     ` Paolo Bonzini
@ 2010-10-22  9:05       ` Tom de Vries
  2010-10-22  9:24         ` Paolo Bonzini
  0 siblings, 1 reply; 43+ messages in thread
From: Tom de Vries @ 2010-10-22  9:05 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Andrew Pinski, gcc-patches, Bernd Schmidt

Paolo Bonzini wrote:
> On 10/21/2010 12:24 PM, Paolo Bonzini wrote:
>> On 10/18/2010 05:42 PM, Andrew Pinski wrote:
>>>>> For MIPS, compilation results in the following insns.
>>>>>
>>>>> (set (reg/v:SI 199)
>>>>> (sign_extend:SI (subreg:HI (reg:SI 200) 2)))
>>>>>
>>>>> ...
>>>>>
>>>>> (set (reg:QI 203)
>>>>> (subreg:QI (reg/v:SI 199) 3))
>>>>>
>>>>> These insns are the only def and the only use of reg 199, each 
>>>> located in a
>>>>> different bb.
>>> This sounds like a job for GCSE to do.
>> Actually, fwprop should _already_ do that if assuming simplify-rtx.c 
>> does the simplification of (subreg:QI (sign_extend:SI (subreg:HI (reg:SI 
>> 200) 2))) 3).
> 
> ... which this code should do in simplify_subreg:
> 
>   /* Optimize SUBREG truncations of zero and sign extended values.  */
>   if ((GET_CODE (op) == ZERO_EXTEND
>        || GET_CODE (op) == SIGN_EXTEND)
>       && GET_MODE_BITSIZE (outermode) < GET_MODE_BITSIZE (innermode))
>     {
>       unsigned int bitpos = subreg_lsb_1 (outermode, innermode, byte);
> 
>       /* If we're requesting the lowpart of a zero or sign extension,
>          there are three possibilities.  If the outermode is the same
>          as the origmode, we can omit both the extension and the subreg.
>          If the outermode is not larger than the origmode, we can apply
>          the truncation without the extension.  Finally, if the outermode
>          is larger than the origmode, but both are integer modes, we
>          can just extend to the appropriate mode.  */
>       if (bitpos == 0)
>         {
>           enum machine_mode origmode = GET_MODE (XEXP (op, 0));
>           if (outermode == origmode)
>             return XEXP (op, 0);
>           if (GET_MODE_BITSIZE (outermode) <= GET_MODE_BITSIZE (origmode))
>             return simplify_gen_subreg (outermode, XEXP (op, 0), origmode,
>                                         subreg_lowpart_offset (outermode,
>                                                                origmode));
>           if (SCALAR_INT_MODE_P (outermode))
>             return simplify_gen_unary (GET_CODE (op), outermode,
>                                        XEXP (op, 0), origmode);
>         }
> 
> However, the def of pseudo 200 is "complex enough" that fwprop will not want
> to propagate it unless it simplifies to a constant.
> 
> It should be enough to tell fwprop that such propagations are always fine
> if the destination pseudo has one use only.  In this case register pressure
> cannot increase.
> 
> Paolo

Paolo,

Thanks for the pointer to fwprop. I agree with you that for this example, it
would make sense to have this done in fwprop.

But, I still think we need the new pass. I don't see how fwprop would help in
the example mentioned in
http://gcc.gnu.org/ml/gcc-patches/2010-10/msg01796.html, where a superfluous
zero_extend:
...
(insn 10 9 11 2 ext13.c:5 (set (reg:SI 204 [ D.1987+-2 ])
        (zero_extend:SI (subreg:HI (reg:SI 213) 2))) 186 {*zero_extendhisi2}
(expr_list:REG_DEAD (reg:SI 213)
        (nil)))
...

is used by:
...
(insn 15 14 16 2 ext13.c:5 (set (reg:SI 217)
        (plus:SI (reg:SI 205 [ D.1988+-2 ])
            (reg:SI 204 [ D.1987+-2 ]))) 10 {*addsi3} (nil))

(insn 17 16 18 2 ext13.c:6 (set (reg:SI 218)
        (minus:SI (reg:SI 204 [ D.1987+-2 ])
            (reg:SI 205 [ D.1988+-2 ]))) 23 {subsi3} (expr_list:REG_DEAD (reg:SI
205 [ D.1988+-2 ])
        (expr_list:REG_DEAD (reg:SI 204 [ D.1987+-2 ])
            (nil))))
...

- Tom


* Re: new sign/zero extension elimination pass
  2010-10-18 15:54 new sign/zero extension elimination pass Tom de Vries
  2010-10-18 16:03 ` Andrew Pinski
@ 2010-10-22  9:15 ` Eric Botcazou
  2010-10-28 20:45   ` Tom de Vries
  2010-10-29  1:04   ` Paolo Bonzini
  2010-10-28 20:55 ` Andrew Pinski
  2010-11-08 21:32 ` Andrew Pinski
  3 siblings, 2 replies; 43+ messages in thread
From: Eric Botcazou @ 2010-10-22  9:15 UTC (permalink / raw)
  To: Tom de Vries; +Cc: gcc-patches, Bernd Schmidt

> This new pass manages to analyze this pattern, and replace the sign_extend
> with a regcopy, which results in 1 less instruction in the assembly. The
> other passes that eliminate sign/zero extensions do no manage to do that.
> Combine doesn't work since the def and the use of reg 199 are in a
> different bb. Implicit_zee does not work here since it only combines an
> extension with the defs of its src operand, which is not applicable in this
> case.

Was it originally written for a pre-4.3 compiler?  If not, why does it not use 
the DF framework instead of recomputing DU chains manually?

> The pass does a couple of linear scans over the insns, so it's not
> expensive in terms of runtime.

At -O2 it can do up to 5 full scans over the insns AFAICS; the head comment in 
the file has "a number of times".  This appears to be quite suboptimal and you 
rightfully mentioned it in the head comment.  Couldn't we avoid doing a full 
scan for each new round during the propagation phase, even without using a 
fully-fledged iterative algorithm?

-- 
Eric Botcazou


* Re: new sign/zero extension elimination pass
  2010-10-22  9:05       ` Tom de Vries
@ 2010-10-22  9:24         ` Paolo Bonzini
  0 siblings, 0 replies; 43+ messages in thread
From: Paolo Bonzini @ 2010-10-22  9:24 UTC (permalink / raw)
  To: Tom de Vries; +Cc: Andrew Pinski, gcc-patches, Bernd Schmidt

On 10/22/2010 10:26 AM, Tom de Vries wrote:
> But, I still think we need the new pass. I don't see how fwprop would help in
> the example mentioned in
> http://gcc.gnu.org/ml/gcc-patches/2010-10/msg01796.html, where a superfluous
> zero_extend:
> ...
> (insn 10 9 11 2 ext13.c:5 (set (reg:SI 204 [ D.1987+-2 ])
>          (zero_extend:SI (subreg:HI (reg:SI 213) 2))) 186 {*zero_extendhisi2}
> (expr_list:REG_DEAD (reg:SI 213)
>          (nil)))
> ...
>
> is used by:
> ...
> (insn 15 14 16 2 ext13.c:5 (set (reg:SI 217)
>          (plus:SI (reg:SI 205 [ D.1988+-2 ])
>              (reg:SI 204 [ D.1987+-2 ]))) 10 {*addsi3} (nil))
>
> (insn 17 16 18 2 ext13.c:6 (set (reg:SI 218)
>          (minus:SI (reg:SI 204 [ D.1987+-2 ])
>              (reg:SI 205 [ D.1988+-2 ]))) 23 {subsi3} (expr_list:REG_DEAD (reg:SI
> 205 [ D.1988+-2 ])
>          (expr_list:REG_DEAD (reg:SI 204 [ D.1987+-2 ])
>              (nil))))
> ...

Yes, in this case fwprop is not powerful enough, you need to scan the 
insn stream in two directions.

Paolo


* Re: new sign/zero extension elimination pass
  2010-10-22  9:15 ` Eric Botcazou
@ 2010-10-28 20:45   ` Tom de Vries
  2010-10-29  2:11     ` Paolo Bonzini
  2010-10-29  1:04   ` Paolo Bonzini
  1 sibling, 1 reply; 43+ messages in thread
From: Tom de Vries @ 2010-10-28 20:45 UTC (permalink / raw)
  To: Eric Botcazou; +Cc: gcc-patches, Bernd Schmidt

Eric Botcazou wrote:
>> This new pass manages to analyze this pattern, and replace the sign_extend
>> with a regcopy, which results in 1 less instruction in the assembly. The
>> other passes that eliminate sign/zero extensions do no manage to do that.
>> Combine doesn't work since the def and the use of reg 199 are in a
>> different bb. Implicit_zee does not work here since it only combines an
>> extension with the defs of its src operand, which is not applicable in this
>> case.
> 
> Was it originally written for a pre-4.3 compiler?  

No, it was written on top of a 4.5.1 compiler.

> If not, why does it not use 
> the DF framework instead of recomputing DU chains manually?
> 

The current form of the pass does not compute DU-chains. It treats all defs and
uses of the same reg as if all defs reach all uses. This is imprecise, and I
expect to find an example where this is a limiting factor, but so far I
haven't.
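
A minimal sketch of that approximation, in C. This is not code from ee.c; the
struct, the MAX_REGS bound and the function names are hypothetical. It keeps
one "biggest use" per register number and ignores which def reaches which use:

```c
#include <stddef.h>

#define MAX_REGS 8   /* hypothetical bound, only for this sketch */

struct reg_use
{
  int regno;       /* register that is read */
  int size_bits;   /* how many low bits this use reads */
};

/* Record, per register, the biggest use size over all uses, as if
   every def of the register reached every use (no DU-chains).  */
static void compute_biggest_use(const struct reg_use *uses, size_t n,
                                int biggest[MAX_REGS])
{
  for (int r = 0; r < MAX_REGS; r++)
    biggest[r] = 0;
  for (size_t i = 0; i < n; i++)
    {
      int r = uses[i].regno;
      if (r >= 0 && r < MAX_REGS && uses[i].size_bits > biggest[r])
        biggest[r] = uses[i].size_bits;
    }
}

/* An extension from src_bits into regno is redundant when no use of
   regno reads more than src_bits.  */
static int extension_is_redundant(const int biggest[MAX_REGS],
                                  int regno, int src_bits)
{
  return biggest[regno] <= src_bits;
}
```

The imprecision is visible here: one wide use anywhere in the function blocks
the removal of every extension into that register, even if a particular def
never reaches that use.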

>> The pass does a couple of linear scans over the insns, so it's not
>> expensive in terms of runtime.
> 
> At -O2 it can do up to 5 full scans over the insns AFAICS; the head comment in 
> the file has "a number of times".  This appears to be quite suboptimal and you 
> rightfully mentioned it in the head comment.  Couldn't we avoid doing a full 
> scan for each new round during the propagation phase, even without using a 
> fully-fledged iterative algorithm?
> 

If we save the use size for each use in the second scan, rather than just saving
the biggest use size for each reg, we only have to visit instructions where
propagation can happen in a propagating scan. So we can make a list of
instructions where that can happen, and iterate over those only. The second
example I mentioned has 14 insns, of which 8 would be in such a list. Actually,
currently it's 6, but it will be 8 if we also propagate over extensions, which
we don't do yet. Either way, this is already better than doing a full scan.

Another runtime improvement would be to keep track of the propagated size per
insn, which we can use to avoid processing an insn (applying the transfer
function) when it won't have an effect.

A further runtime improvement would be to move insns out of the list when
processed, and back into the list when necessary, using UD-chains. Essentially
this is already an iterative algorithm with a worklist.

The best version I can think of now is an iterative solution using DU/UD-chains,
like this:
- Calculate and store the size of each use.
- Calculate and store the biggest use of each def (using DU-chains).
- Push the propagating defs where the biggest use size is smaller than word size
  on a stack.
- Pop a def.
- For the def, check if the propagated biggest use is bigger than the current
  biggest use.
  If not, back to pop a def.
  If so, store current biggest use as propagated biggest use.
- For the def, propagate the biggest use of the def to the operands.
- For each operand, if the use size was reduced, find the reaching definitions
  (using UD-chains).
- For each propagating definition, recompute the biggest use. If that one was
  reduced below the propagated biggest use, push the def on the stack.
- back to pop a def.
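
The steps above can be sketched in C roughly as follows. This is a toy model
under stated assumptions, not the proposed implementation: struct insn, the
array bounds and the function names are hypothetical, each insn has at most one
def and two operands, linear searches stand in for the DU/UD-chains, and the
separate "propagated biggest use" bookkeeping is folded into a comparison
against the stored use size.

```c
#define WORD_BITS 32
#define MAX_INSNS 16   /* hypothetical bounds for this sketch; callers */
#define MAX_REGS  16   /* must keep n <= MAX_INSNS, regs < MAX_REGS */

struct insn
{
  int def;          /* register defined, or -1 (e.g. a store) */
  int op[2];        /* operand registers, -1 if unused */
  int use_bits[2];  /* low bits of each operand this insn reads */
  int propagates;   /* nonzero if the low N bits of the result depend only
                       on the low N bits of the operands (plus, minus, ...) */
};

/* Biggest use of REG over all insns; stands in for walking a DU-chain.  */
static int biggest_use(const struct insn *insns, int n, int reg)
{
  int big = 0;
  for (int i = 0; i < n; i++)
    for (int k = 0; k < 2; k++)
      if (insns[i].op[k] == reg && insns[i].use_bits[k] > big)
        big = insns[i].use_bits[k];
  return big;
}

static void propagate_use_sizes(struct insn *insns, int n,
                                int biggest[MAX_REGS])
{
  int stack[MAX_INSNS];
  int on_stack[MAX_INSNS] = { 0 };
  int sp = 0;

  for (int r = 0; r < MAX_REGS; r++)
    biggest[r] = biggest_use(insns, n, r);

  /* Push the propagating defs whose biggest use is below word size.  */
  for (int i = 0; i < n; i++)
    if (insns[i].propagates && insns[i].def >= 0
        && biggest[insns[i].def] < WORD_BITS)
      {
        stack[sp++] = i;
        on_stack[i] = 1;
      }

  while (sp > 0)
    {
      int i = stack[--sp];
      int need = biggest[insns[i].def];
      on_stack[i] = 0;

      for (int k = 0; k < 2; k++)
        {
          int r = insns[i].op[k];
          if (r < 0 || insns[i].use_bits[k] <= need)
            continue;
          /* Propagate the def's biggest use to this operand.  */
          insns[i].use_bits[k] = need;
          int nb = biggest_use(insns, n, r);
          if (nb < biggest[r])
            {
              biggest[r] = nb;
              /* Revisit the propagating defs of R (UD-chain stand-in).  */
              for (int j = 0; j < n; j++)
                if (insns[j].def == r && insns[j].propagates && !on_stack[j])
                  {
                    stack[sp++] = j;
                    on_stack[j] = 1;
                  }
            }
        }
    }
}
```

Use sizes only ever shrink, so the loop terminates. On a model of the
dct2x2dc_dconly example (two extensions feeding a plus and a minus whose
results are stored as halfwords), the biggest use of both extended registers
drops from 32 to 16 bits, which is what makes the zero_extensions removable.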

I think it would make sense to do either the first 2 improvements (which are
relatively easy to do on top of the current version), or to go directly for the
last version (which would mean a rewrite).

- Tom


* Re: new sign/zero extension elimination pass
  2010-10-18 15:54 new sign/zero extension elimination pass Tom de Vries
  2010-10-18 16:03 ` Andrew Pinski
  2010-10-22  9:15 ` Eric Botcazou
@ 2010-10-28 20:55 ` Andrew Pinski
  2010-10-28 21:00   ` Andrew Pinski
  2010-11-08 21:32 ` Andrew Pinski
  3 siblings, 1 reply; 43+ messages in thread
From: Andrew Pinski @ 2010-10-28 20:55 UTC (permalink / raw)
  To: Tom de Vries; +Cc: gcc-patches, Bernd Schmidt

On Mon, Oct 18, 2010 at 8:36 AM, Tom de Vries <tom@codesourcery.com> wrote:
> I created a new sign/zero extension elimination pass.
>
> The motivating example for this pass is:

In the above case fwprop could do the majority of the work.  In fact
it simplifies the (subreg (zero_extend (subreg))) into (subreg) but
does not replace it.  I think you could extend fwprop to do the correct
thing.

Thanks,
Andrew Pinski


* Re: new sign/zero extension elimination pass
  2010-10-28 20:55 ` Andrew Pinski
@ 2010-10-28 21:00   ` Andrew Pinski
  2010-10-28 21:12     ` Tom de Vries
  2010-10-29  0:34     ` Paolo Bonzini
  0 siblings, 2 replies; 43+ messages in thread
From: Andrew Pinski @ 2010-10-28 21:00 UTC (permalink / raw)
  To: Tom de Vries; +Cc: gcc-patches, Bernd Schmidt

[-- Attachment #1: Type: text/plain, Size: 672 bytes --]

On Thu, Oct 28, 2010 at 12:03 PM, Andrew Pinski <pinskia@gmail.com> wrote:
> On Mon, Oct 18, 2010 at 8:36 AM, Tom de Vries <tom@codesourcery.com> wrote:
>> I created a new sign/zero extension elimination pass.
>>
>> The motivating example for this pass is:
>
> In the above case fwprop could do the majority of the work.  In fact
> it simplifies the (subreg (zero_extend (subreg))) into (subreg) but
> does not replace it.  I think you could extend fwprop to the correct
> thing.

Something like the attached patch.  I have not bootstrapped/tested it yet
but it works for your simple example.
This allows us not to add another pass.

Thanks,
Andrew Pinski

[-- Attachment #2: fixsubreg.diff.txt --]
[-- Type: text/plain, Size: 706 bytes --]

Index: fwprop.c
===================================================================
--- fwprop.c	(revision 166031)
+++ fwprop.c	(working copy)
@@ -543,8 +543,14 @@ propagate_rtx_1 (rtx *px, rtx old_rtx, r
 	  valid_ops &= propagate_rtx_1 (&op0, old_rtx, new_rtx, flags);
           if (op0 == XEXP (x, 0))
 	    return true;
-	  tem = simplify_gen_subreg (mode, op0, GET_MODE (SUBREG_REG (x)),
-				     SUBREG_BYTE (x));
+	  tem = simplify_subreg (mode, op0, GET_MODE (SUBREG_REG (x)),
+				 SUBREG_BYTE (x));
+	  if (!tem)
+	    return false;
+	  else if (GET_CODE (tem) == SUBREG && REG_P (SUBREG_REG (tem)))
+	    valid_ops = true;
+	  else if (REG_P (tem))
+	    valid_ops = true;
 	}
       break;
 


* Re: new sign/zero extension elimination pass
  2010-10-28 21:00   ` Andrew Pinski
@ 2010-10-28 21:12     ` Tom de Vries
  2010-10-28 22:58       ` Andrew Pinski
  2010-10-29  0:34     ` Paolo Bonzini
  1 sibling, 1 reply; 43+ messages in thread
From: Tom de Vries @ 2010-10-28 21:12 UTC (permalink / raw)
  To: Andrew Pinski; +Cc: gcc-patches, Bernd Schmidt

Andrew,

Andrew Pinski wrote:
> On Thu, Oct 28, 2010 at 12:03 PM, Andrew Pinski <pinskia@gmail.com> wrote:
>> On Mon, Oct 18, 2010 at 8:36 AM, Tom de Vries <tom@codesourcery.com> wrote:
>>> I created a new sign/zero extension elimination pass.
>>>
>>> The motivating example for this pass is:
>> In the above case fwprop could do the majority of the work.  In fact
>> it simplifies the (subreg (zero_extend (subreg))) into (subreg) but
>> does not replace it.  I think you could extend fwprop to the correct
>> thing.
> 
> Something like the attached patch.  I have not bootstrap/tested it yet
> but it works for your simple example.
> This allows us not to add another pass.
> 
> Thanks,
> Andrew Pinski
> 

thanks for this patch.

I agreed with Paolo in http://gcc.gnu.org/ml/gcc-patches/2010-10/msg01897.html
that for the example with which I submitted the pass initially, it would make
sense to handle it in fwprop. However, I also think that for the example
mentioned in http://gcc.gnu.org/ml/gcc-patches/2010-10/msg01796.html, that
wouldn't work, so we still need the new pass.

Thanks,
- Tom


* Re: new sign/zero extension elimination pass
  2010-10-28 21:12     ` Tom de Vries
@ 2010-10-28 22:58       ` Andrew Pinski
  2010-10-29 15:06         ` Tom de Vries
  0 siblings, 1 reply; 43+ messages in thread
From: Andrew Pinski @ 2010-10-28 22:58 UTC (permalink / raw)
  To: Tom de Vries; +Cc: gcc-patches, Bernd Schmidt

On Thu, Oct 28, 2010 at 12:30 PM, Tom de Vries <tom@codesourcery.com> wrote:
> I agreed with Paolo in http://gcc.gnu.org/ml/gcc-patches/2010-10/msg01897.html
> that for the example with which I submitted the pass initially, it would make
> sense to handle it in fwprop. However, I also think that for the example
> mentioned in http://gcc.gnu.org/ml/gcc-patches/2010-10/msg01796.html, that
> wouldn't work, so we still need the new pass.

Well when I look at the tree dumps I see:
(short unsigned int) ((int) (*D.1255)[1] + (int) (*D.1255)[0])

So we should be optimizing this at the tree level such that we don't
see extra sign extends there.  We would optimize it such that it looks
like:
(short unsigned int)(*D.1255)[1] + (short unsigned int)(*D.1255)[0]

I think we need a tree combiner for that and fold to do the folding.
See PR 14844 for another case of the problem.

-- Pinski


* Re: new sign/zero extension elimination pass
  2010-10-28 21:00   ` Andrew Pinski
  2010-10-28 21:12     ` Tom de Vries
@ 2010-10-29  0:34     ` Paolo Bonzini
  1 sibling, 0 replies; 43+ messages in thread
From: Paolo Bonzini @ 2010-10-29  0:34 UTC (permalink / raw)
  To: Andrew Pinski; +Cc: Tom de Vries, gcc-patches, Bernd Schmidt

On 10/28/2010 09:19 PM, Andrew Pinski wrote:
> On Thu, Oct 28, 2010 at 12:03 PM, Andrew Pinski<pinskia@gmail.com>  wrote:
>> On Mon, Oct 18, 2010 at 8:36 AM, Tom de Vries<tom@codesourcery.com>  wrote:
>>> I created a new sign/zero extension elimination pass.
>>>
>>> The motivating example for this pass is:
>>
>> In the above case fwprop could do the majority of the work.  In fact
>> it simplifies the (subreg (zero_extend (subreg))) into (subreg) but
>> does not replace it.  I think you could extend fwprop to do the correct
>> thing.
>
> Something like the attached patch.

If you can bootstrap it, I think this should go in independent of the 
pass in the thread.

Paolo

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: new sign/zero extension elimination pass
  2010-10-22  9:15 ` Eric Botcazou
  2010-10-28 20:45   ` Tom de Vries
@ 2010-10-29  1:04   ` Paolo Bonzini
  2010-10-29  1:33     ` Paolo Bonzini
                       ` (2 more replies)
  1 sibling, 3 replies; 43+ messages in thread
From: Paolo Bonzini @ 2010-10-29  1:04 UTC (permalink / raw)
  To: Eric Botcazou; +Cc: Tom de Vries, gcc-patches, Bernd Schmidt

On 10/22/2010 10:30 AM, Eric Botcazou wrote:
> If not, why does it not use
> the DF framework instead of recomputing DU chains manually?

Rather than on DU chains (which are a very expensive problem just 
because of its asymptotic complexity, so it's better to use it only on 
small regions such as loops), I'd be picky about usage of note_uses, 
which can be very simply replaced by a loop over the DF_INSN_USES vector.

Paolo

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: new sign/zero extension elimination pass
  2010-10-29  1:04   ` Paolo Bonzini
@ 2010-10-29  1:33     ` Paolo Bonzini
  2010-11-03 18:50     ` Eric Botcazou
  2010-11-08 21:29     ` Tom de Vries
  2 siblings, 0 replies; 43+ messages in thread
From: Paolo Bonzini @ 2010-10-29  1:33 UTC (permalink / raw)
  To: gcc-patches; +Cc: Tom de Vries, gcc-patches, Bernd Schmidt

On 10/22/2010 10:30 AM, Eric Botcazou wrote:
> If not, why does it not use
> the DF framework instead of recomputing DU chains manually?

Rather than on DU chains (which are a very expensive problem just 
because of its asymptotic complexity, so it's better to use it only on 
small regions such as loops), I'd be picky about usage of note_uses, 
which can be very simply replaced by a loop over the DF_INSN_USES vector.

Paolo

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: new sign/zero extension elimination pass
  2010-10-28 20:45   ` Tom de Vries
@ 2010-10-29  2:11     ` Paolo Bonzini
  2010-10-29  2:42       ` Paolo Bonzini
  2010-10-31 19:30       ` Tom de Vries
  0 siblings, 2 replies; 43+ messages in thread
From: Paolo Bonzini @ 2010-10-29  2:11 UTC (permalink / raw)
  To: Tom de Vries; +Cc: Eric Botcazou, gcc-patches, Bernd Schmidt

On 10/28/2010 08:46 PM, Tom de Vries wrote:
> The best version I can think of now is an iterative solution using DU/UD-chains,
> like this:
> - Calculate and store the size of each use.
> - Calculate and store the biggest use of each def (using DU-chains).
> - Push the propagating defs where the biggest use size is smaller than word size
>    on a stack.
> - Pop a def.
> - For the def, check if the propagated biggest use is bigger than the current
>    biggest use.
>    If not, back to pop a def.
>    If so, store current biggest use as propagated biggest use.
> - For the def, propagate the biggest use of the def to the operands.
> - For each operand, if the use size was reduced, find the reaching definitions
>    (using UD-chains).
> - For each propagating definition, recompute the biggest use. If that one was
>    reduced below the propagated biggest use, push the def on the stack.
> - back to pop a def.
>
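
The quoted steps can be approximated by a small worklist sketch. This is a
simplified variant under stated assumptions, not the algorithm from ee.c:
it uses a toy model with one definition per register, grows the needed
width upward from the direct uses instead of shrinking it from word size,
and every name in it (struct def, compute_needs, and so on) is
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define WORD_BITS 32
#define MAX_DEFS 64

enum def_kind {
    DEF_OTHER,      /* operands are read at full word size */
    DEF_PROPAGATE,  /* e.g. plus, minus, ior, copy: the low N bits of the
                       result depend only on the low N bits of the operands */
    DEF_EXTEND      /* sign/zero extension from ext_from bits */
};

struct def {
    enum def_kind kind;
    int src[2];      /* indices of operand defs, -1 if unused */
    int ext_from;    /* DEF_EXTEND only: source width in bits */
    int direct_use;  /* widest use outside the modelled defs, in bits */
    int need;        /* output: widest use that reaches this def */
};

/* Width that definition D requires from each of its operands.  */
static int operand_need(const struct def *d)
{
    switch (d->kind) {
    case DEF_PROPAGATE:
        return d->need;
    case DEF_EXTEND:
        return d->need < d->ext_from ? d->need : d->ext_from;
    default:
        return WORD_BITS;
    }
}

/* Propagate use sizes from each def to its operands until nothing changes.
   Each def is on the stack at most once, and need only ever grows, so the
   loop terminates.  */
static void compute_needs(struct def *defs, int n)
{
    int stack[MAX_DEFS];
    bool on_stack[MAX_DEFS];
    int top = 0;

    assert(n <= MAX_DEFS);
    for (int i = 0; i < n; i++) {
        defs[i].need = defs[i].direct_use;
        stack[top++] = i;
        on_stack[i] = true;
    }
    while (top > 0) {
        int i = stack[--top];
        on_stack[i] = false;
        int w = operand_need(&defs[i]);
        for (int k = 0; k < 2; k++) {
            int s = defs[i].src[k];
            if (s >= 0 && defs[s].need < w) {
                defs[s].need = w;
                if (!on_stack[s]) {
                    stack[top++] = s;
                    on_stack[s] = true;
                }
            }
        }
    }
}

/* The extension part is redundant if no use reads the extended bits.  */
static bool extension_is_redundant(const struct def *defs, int i)
{
    return defs[i].kind == DEF_EXTEND && defs[i].need <= defs[i].ext_from;
}
```

Modelling the motivating example as a 16-bit sign extension whose only use
reads 8 bits, the sketch reports the extension as redundant.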

DU and UD-chains are quadratic, and the reaching definitions problem has 
very big solutions too, so they are rarely the right solution 
(unfortunately).

In addition, I wonder if this pass wouldn't introduce new undefined 
overflows, which would be a correctness problem.  If you're going for a 
rewrite, I'd do it on tree-SSA so that:

1) you can use unsigned math to avoid undefined overflow;

2) SSA will provide you with cheap DU/UD.

Hopefully, only simple redundancies will remain after expand, and fwprop 
can take care of these using Andrew's patch.

Paolo

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: new sign/zero extension elimination pass
  2010-10-29  2:11     ` Paolo Bonzini
@ 2010-10-29  2:42       ` Paolo Bonzini
  2010-10-31 19:30       ` Tom de Vries
  1 sibling, 0 replies; 43+ messages in thread
From: Paolo Bonzini @ 2010-10-29  2:42 UTC (permalink / raw)
  To: gcc-patches; +Cc: Eric Botcazou, gcc-patches, Bernd Schmidt

On 10/28/2010 08:46 PM, Tom de Vries wrote:
> The best version I can think of now is an iterative solution using DU/UD-chains,
> like this:
> - Calculate and store the size of each use.
> - Calculate and store the biggest use of each def (using DU-chains).
> - Push the propagating defs where the biggest use size is smaller than word size
>    on a stack.
> - Pop a def.
> - For the def, check if the propagated biggest use is bigger than the current
>    biggest use.
>    If not, back to pop a def.
>    If so, store current biggest use as propagated biggest use.
> - For the def, propagate the biggest use of the def to the operands.
> - For each operand, if the use size was reduced, find the reaching definitions
>    (using UD-chains).
> - For each propagating definition, recompute the biggest use. If that one was
>    reduced below the propagated biggest use, push the def on the stack.
> - back to pop a def.
>

DU and UD-chains are quadratic, and the reaching definitions problem has 
very big solutions too, so they are rarely the right solution 
(unfortunately).

In addition, I wonder if this pass wouldn't introduce new undefined 
overflows, which would be a correctness problem.  If you're going for a 
rewrite, I'd do it on tree-SSA so that:

1) you can use unsigned math to avoid undefined overflow;

2) SSA will provide you with cheap DU/UD.

Hopefully, only simple redundancies will remain after expand, and fwprop 
can take care of these using Andrew's patch.

Paolo

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: new sign/zero extension elimination pass
  2010-10-28 22:58       ` Andrew Pinski
@ 2010-10-29 15:06         ` Tom de Vries
  0 siblings, 0 replies; 43+ messages in thread
From: Tom de Vries @ 2010-10-29 15:06 UTC (permalink / raw)
  To: Andrew Pinski; +Cc: gcc-patches, Bernd Schmidt

Andrew,

Andrew Pinski wrote:
> On Thu, Oct 28, 2010 at 12:30 PM, Tom de Vries <tom@codesourcery.com> wrote:
>> I agreed with Paolo in http://gcc.gnu.org/ml/gcc-patches/2010-10/msg01897.html
>> that for the example with which I submitted the pass initially, it would make
>> sense to handle it in fwprop. However, I also think that for the example
>> mentioned in http://gcc.gnu.org/ml/gcc-patches/2010-10/msg01796.html, that
>> wouldn't work, so we still need the new pass.
> 
> Well when I look at the tree dumps I see:
> (short unsigned int) ((int) (*D.1255)[1] + (int) (*D.1255)[0])
> 

I'm not able to reproduce that tree. Can you tell me how you got that one?

I compiled the example http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40893#c0 with
MIPS as default target:
...
$ rm -f *.c.* ; gcc -O3 test2.c -S -fdump-rtl-all -fdump-tree-all
...

and grepped for a pattern matching the tree code you mention:
...
$ grep '(short unsigned int)' test2.c.* | grep '+'
test2.c.003t.original:  (*d)[0] = (int16_t) ((short unsigned int) d0 + (short
unsigned int) d1);
...

The only match is 003t.original, which looks like:
...
{
  int d0 = (int) (*d)[0] + (int) (*d)[1];
  int d1 = (int) (*(d + 4))[0] + (int) (*(d + 4))[1];

    int d0 = (int) (*d)[0] + (int) (*d)[1];
    int d1 = (int) (*(d + 4))[0] + (int) (*(d + 4))[1];
  (*d)[0] = (int16_t) ((short unsigned int) d0 + (short unsigned int) d1);
  (*d)[1] = (int16_t) ((short unsigned int) d0 - (short unsigned int) d1);
}
...

I did a clean checkout and build for x86_64 this morning. Same tree as for
MIPS, no other match for the grep.

The tree you mention looks like '(short unsigned int) d0' combined with the def
of d0.

> So we should be optimizing this at the tree level such that we don't
> see extra sign extends there.  We would optimize it such that it looks
> like:
> (short unsigned int)(*D.1255)[1] + (short unsigned int)(*D.1255)[0]
> 
> I think we need a tree combiner for that and fold to do the folding.
> See PR 14844 for another case of the problem.
> 

Thanks for the links to PR 14844 and the tree combiner (PR 15459). Do you know
of an updated (to gimple) version of the tree combiner? I would like to try it
on this example.

If I look at the representation after cselim, the 2 uses of d0 are cse-ed:
...
{
  int d1;
  int d0;
  int16_t D.2096;
  short unsigned int D.2095;
  int16_t D.2094;
  short unsigned int D.2093;
  short unsigned int D.2092;
  short unsigned int D.2091;
  int D.2090;
  int16_t D.2089;
  int D.2088;
  int16_t D.2087;
  int D.2085;
  int16_t D.2084;
  int D.2083;
  int16_t D.2082;

<bb 2>:
  D.2082_2 = *d_1(D)[0];
  D.2083_3 = (int) D.2082_2;
  D.2084_4 = *d_1(D)[1];
  D.2085_5 = (int) D.2084_4;
  d0_6 = D.2083_3 + D.2085_5;
  D.2087_8 = MEM[(int16_t[2] *)d_1(D) + 4B][0];
  D.2088_9 = (int) D.2087_8;
  D.2089_11 = MEM[(int16_t[2] *)d_1(D) + 4B][1];
  D.2090_12 = (int) D.2089_11;
  d1_13 = D.2088_9 + D.2090_12;
  D.2091_14 = (short unsigned int) d0_6;
  D.2092_15 = (short unsigned int) d1_13;
  D.2093_16 = D.2091_14 + D.2092_15;
  D.2094_17 = (int16_t) D.2093_16;
  *d_1(D)[0] = D.2094_17;
  D.2095_20 = D.2091_14 - D.2092_15;
  D.2096_21 = (int16_t) D.2095_20;
  *d_1(D)[1] = D.2096_21;
  return;
}
...

So if I filter out all the statements related to d0 we get:
...
{
  int d0;
  short unsigned int D.2091;
  int D.2085;
  int16_t D.2084;
  int D.2083;
  int16_t D.2082;

<bb 2>:
  D.2082_2 = *d_1(D)[0];
  D.2083_3 = (int) D.2082_2;
  D.2084_4 = *d_1(D)[1];
  D.2085_5 = (int) D.2084_4;
  d0_6 = D.2083_3 + D.2085_5;
  D.2091_14 = (short unsigned int) d0_6;
  ...
}
...

I think a tree combiner would work here like you suggested, although it would
have to combine 6 gimple statements to get there.

But I wonder, the rtl combiner has a weak point: it only combines chains of
instructions that have no other uses of the intermediate results (PR 18395/1).
AFAIU the source code of the tree combiner, it suffers from the same problem. Is
my understanding correct there?

Thanks,
- Tom

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: new sign/zero extension elimination pass
  2010-10-29  2:11     ` Paolo Bonzini
  2010-10-29  2:42       ` Paolo Bonzini
@ 2010-10-31 19:30       ` Tom de Vries
  2010-10-31 20:58         ` Joseph S. Myers
  2010-10-31 21:11         ` Paolo Bonzini
  1 sibling, 2 replies; 43+ messages in thread
From: Tom de Vries @ 2010-10-31 19:30 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Eric Botcazou, gcc-patches, Bernd Schmidt

Paolo Bonzini wrote:
> On 10/28/2010 08:46 PM, Tom de Vries wrote:
>> The best version I can think of now is an iterative solution using
>> DU/UD-chains,
>> like this:
>> - Calculate and store the size of each use.
>> - Calculate and store the biggest use of each def (using DU-chains).
>> - Push the propagating defs where the biggest use size is smaller than
>> word size
>>    on a stack.
>> - Pop a def.
>> - For the def, check if the propagated biggest use is bigger than the
>> current
>>    biggest use.
>>    If not, back to pop a def.
>>    If so, store current biggest use as propagated biggest use.
>> - For the def, propagate the biggest use of the def to the operands.
>> - For each operand, if the use size was reduced, find the reaching
>> definitions
>>    (using UD-chains).
>> - For each propagating definition, recompute the biggest use. If that
>> one was
>>    reduced below the propagated biggest use, push the def on the stack.
>> - back to pop a def.
>>
> 
> DU and UD-chains are quadratic, and the reaching definitions problem has
> very big solutions too, so they are rarely the right solution
> (unfortunately).
> 

thanks, I overlooked that completely.

> In addition, I wonder if this pass wouldn't introduce new undefined
> overflows, which would be a correctness problem.

The pass assumes wraparound overflow for plus and minus. The documentation of
the rtl operator 'plus' states that 'plus wraps round modulo the width of m'
(similar for 'minus'). I interpreted this independently from -fwrapv and
-fstrict-overflow, am I wrong there?

>  If you're going for a
> rewrite, I'd do it on tree-SSA so that:
> 
> 1) you can use unsigned math to avoid undefined overflow;
> 
> 2) SSA will provide you with cheap DU/UD.
> 
> Hopefully, only simple redundancies will remain after expand, and fwprop
> can take care of these using Andrew's patch.
> 

I see your point about the cheap DU/UD. I think the answer whether to do it in
tree-SSA or RTL depends on the answer to my previous question. If going to
tree-SSA allows us to catch more cases, we should do that.

- Tom

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: new sign/zero extension elimination pass
  2010-10-31 19:30       ` Tom de Vries
@ 2010-10-31 20:58         ` Joseph S. Myers
  2010-10-31 21:11         ` Paolo Bonzini
  1 sibling, 0 replies; 43+ messages in thread
From: Joseph S. Myers @ 2010-10-31 20:58 UTC (permalink / raw)
  To: Tom de Vries; +Cc: Paolo Bonzini, Eric Botcazou, gcc-patches, Bernd Schmidt

On Sun, 31 Oct 2010, Tom de Vries wrote:

> The pass assumes wraparound overflow for plus and minus. The documentation of
> the rtl operator 'plus' states that 'plus wraps round modulo the width of m'
> (similar for 'minus'). I interpreted this independently from -fwrapv and
> -fstrict-overflow, am I wrong there?

-fwrapv and -fstrict-overflow relate only to source code, GENERIC and 
GIMPLE semantics, and do not affect RTL which is always modulo.  (This is 
true at least for addition, subtraction, multiplication and absolute 
value.  I make no claims about the semantics for division and modulo 
operations for INT_MIN and -1, but my suggestion in bug 30484 was that the 
target should specify the semantics of its RTL insns, not that they should 
depend on command-line options.  The aim is to stop the options from 
affecting GENERIC and GIMPLE semantics in future; see the 
no-undefined-overflow branch.)
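
As a source-level illustration of the distinction drawn above: signed
overflow is undefined in C unless an option such as -fwrapv is used, while
unsigned arithmetic is always modulo, which is the behaviour RTL plus and
minus have unconditionally. A minimal sketch, with a function name chosen
only for this example:

```c
#include <assert.h>
#include <stdint.h>

/* Addition modulo 2^32.  Written with unsigned operands so that it is
   well defined in C whatever -fwrapv or -fstrict-overflow say.  */
static uint32_t add_modulo_2_32(uint32_t a, uint32_t b)
{
    return a + b;
}

/* The same operation on int32_t operands would overflow, and so be
   undefined at the source level, for the first input pair below.  */
static void check_wraparound(void)
{
    assert(add_modulo_2_32(UINT32_C(0x7fffffff), 1) == UINT32_C(0x80000000));
    assert(add_modulo_2_32(UINT32_C(0xffffffff), 1) == 0);
}
```

This is also why a tree-level version of the pass would do its narrowed
arithmetic in unsigned types.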

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: new sign/zero extension elimination pass
  2010-10-31 19:30       ` Tom de Vries
  2010-10-31 20:58         ` Joseph S. Myers
@ 2010-10-31 21:11         ` Paolo Bonzini
  2010-11-03 18:49           ` Eric Botcazou
  1 sibling, 1 reply; 43+ messages in thread
From: Paolo Bonzini @ 2010-10-31 21:11 UTC (permalink / raw)
  To: Tom de Vries; +Cc: Eric Botcazou, gcc-patches, Bernd Schmidt

On Sun, Oct 31, 2010 at 18:55, Tom de Vries <tom@codesourcery.com> wrote:
>> DU and UD-chains are quadratic, and the reaching definitions problem has
>> very big solutions too, so they are rarely the right solution
>> (unfortunately).
>
> thanks, I overlooked that completely.

No problem, I discovered it the hard way (and had to rewrite fwprop's
dataflow to work around it).

>> In addition, I wonder if this pass wouldn't introduce new undefined
>> overflows, which would be a correctness problem.
>
> The pass assumes wraparound overflow for plus and minus. The documentation of
> the rtl operator 'plus' states that 'plus wraps round modulo the width of m'
> (similar for 'minus'). I interpreted this independently from -fwrapv and
> -fstrict-overflow, am I wrong there?

I think you're right, as Joseph confirmed.  There is one occurrence of
flag_wrapv in simplify-rtx.c and zero in combine.c, so this means that
indeed RTL treats PLUS/MINUS as wrapping.  I remember seeing more but
my memory must be at fault.  Since it's so limited, the one occurrence
that is there should be removed.

>>  If you're going for a rewrite, I'd do it on tree-SSA so that:
>>
>> 1) you can use unsigned math to avoid undefined overflow;
>>
>> 2) SSA will provide you with cheap DU/UD.
>>
>> Hopefully, only simple redundancies will remain after expand, and fwprop
>> can take care of these using Andrew's patch.
>
> I see your point about the cheap DU/UD. I think the answer whether to do it in
> tree-SSA or RTL depends on the answer to my previous question. If going to
> tree-SSA allows us to catch more cases, we should do that.

If you can fix the performance issues that Eric has without a rewrite,
and if it's not quadratic, I would have no problem at all with RTL.

If you need a rewrite, tree-SSA could be a better match if only for
DU/UD and ability to use the propagation engine.  The only thing I'd
be wary of, is vectorization failures because of the more complex
trees.  (But then it may even turn out to _improve_ vectorization.)

Paolo

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: new sign/zero extension elimination pass
  2010-10-31 21:11         ` Paolo Bonzini
@ 2010-11-03 18:49           ` Eric Botcazou
  0 siblings, 0 replies; 43+ messages in thread
From: Eric Botcazou @ 2010-11-03 18:49 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Tom de Vries, gcc-patches, Bernd Schmidt

> No problem, I discovered it the hard way (and had to rewrite fwprop's
> dataflow to work around it).

The other ZEE pass uses UD/DU chains though.  The now removed SEE pass used 
them as well.

> I think you're right, as Joseph confirmed.  There is one occurrence of
> flag_wrapv in simplify-rtx.c and zero in combine.c, so this means that
> indeed RTL treats PLUS/MINUS as wrapping.  I remember seeing more but
> my memory must be at fault.  Since it's so limited, the one occurrence
> that is there should be removed.

They were introduced to fix PR rtl-optimization/23047.

> If you can fix the performance issues that Eric has without a rewrite,
> and if it's not quadratic, I would have no problem at all with RTL.

Yes, I also think that the pass makes sense at the RTL level, as it does 
something that cannot be done (easily) elsewhere.  On the other hand, it's 
unfortunate to have 2 different ZEE passes.

-- 
Eric Botcazou

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: new sign/zero extension elimination pass
  2010-10-29  1:04   ` Paolo Bonzini
  2010-10-29  1:33     ` Paolo Bonzini
@ 2010-11-03 18:50     ` Eric Botcazou
  2010-11-08 21:29     ` Tom de Vries
  2 siblings, 0 replies; 43+ messages in thread
From: Eric Botcazou @ 2010-11-03 18:50 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Tom de Vries, gcc-patches, Bernd Schmidt

> Rather than on DU chains (which are a very expensive problem just
> because of its asymptotic complexity, so it's better to use it only on
> small regions such as loops), I'd be picky about usage of note_uses,
> which can be very simply replaced by a loop over the DF_INSN_USES vector.

Indeed, that would already be better.

-- 
Eric Botcazou

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: new sign/zero extension elimination pass
  2010-10-29  1:04   ` Paolo Bonzini
  2010-10-29  1:33     ` Paolo Bonzini
  2010-11-03 18:50     ` Eric Botcazou
@ 2010-11-08 21:29     ` Tom de Vries
  2010-11-08 22:11       ` Paolo Bonzini
  2 siblings, 1 reply; 43+ messages in thread
From: Tom de Vries @ 2010-11-08 21:29 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Eric Botcazou, gcc-patches, Bernd Schmidt

Paolo,

Paolo Bonzini wrote:
> On 10/22/2010 10:30 AM, Eric Botcazou wrote:
>> If not, why does it not use
>> the DF framework instead of recomputing DU chains manually?
>
> Rather than on DU chains (which are a very expensive problem just
> because of its asymptotic complexity, so it's better to use it only on
> small regions such as loops), I'd be picky about usage of note_uses,
> which can be very simply replaced by a loop over the DF_INSN_USES vector.
>
> Paolo

I just looked into using DF_INSN_USES, and I'm not sure that using that is a
good idea. There is a difference between using note_uses and DF_INSN_USES.

If there is an insn
...
(set
  (reg:SI 1)
  (plus:SI (reg:SI 2)
           (reg:SI 3)))
...

the note_uses helper function will be visited with the plus expression,
something that is used in the pass (see note_use in ee.c). Using DF_INSN_USES
will only yield (reg:SI 2) and (reg:SI 3), and does not provide the context in
which it is used.
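
The difference can be illustrated with a toy expression tree. This is not
the GCC rtx representation or the DF API; the types and functions below are
hypothetical stand-ins, and the 32-bit word size is an assumption. The flat
view returns the same registers for a plus and for a right shift, while the
contextual walk can tell that only the plus lets a narrow use propagate to
its operands.

```c
#include <assert.h>
#include <stddef.h>

enum toy_code { TOY_REG, TOY_PLUS, TOY_LSHIFTRT };

struct toy_rtx {
    enum toy_code code;
    int regno;                        /* TOY_REG only */
    const struct toy_rtx *op0, *op1;  /* operators only */
};

/* Flat view: collect register numbers only, dropping the operator.
   Appends to OUT starting at index N and returns the new count.  */
static size_t flat_uses(const struct toy_rtx *x, int *out, size_t n)
{
    if (x == NULL)
        return n;
    if (x->code == TOY_REG) {
        out[n++] = x->regno;
        return n;
    }
    n = flat_uses(x->op0, out, n);
    return flat_uses(x->op1, out, n);
}

/* Contextual view: how many low bits of register REGNO are needed when
   the low WIDTH bits of X are needed.  */
static int needed_bits(const struct toy_rtx *x, int regno, int width)
{
    int a, b;

    if (x == NULL)
        return 0;
    switch (x->code) {
    case TOY_REG:
        return x->regno == regno ? width : 0;
    case TOY_PLUS:
        /* Low bits of a sum depend only on low bits of the operands.  */
        a = needed_bits(x->op0, regno, width);
        b = needed_bits(x->op1, regno, width);
        return a > b ? a : b;
    default:
        /* A right shift moves high bits down: be conservative.  */
        a = needed_bits(x->op0, regno, 32);
        b = needed_bits(x->op1, regno, 32);
        return a > b ? a : b;
    }
}
```

With a callback that sees the whole expression, as note_uses provides, the
second function is possible; with only the register list, it is not.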


Furthermore, I would like to know whether there is a problem with checking in
the pass into trunk in its current form. My understanding of the discussion up
until now is that the consensus is that the pass is useful, but not fully efficient.
I agree with the fact that it's possible to improve it, but I also think that
the runtime is negligible (it's an O(n) pass currently) and the benefit of
improving the pass will not be worth the effort. I will try to confirm this with
a profiling run on spec2000. If the profiling run confirms that the runtime is
negligible, is the pass (in principle) ok for trunk?
If so, I will send out a new version with a few bug fixes that were missing in
the previous version.

Thanks,
- Tom

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: new sign/zero extension elimination pass
  2010-10-18 15:54 new sign/zero extension elimination pass Tom de Vries
                   ` (2 preceding siblings ...)
  2010-10-28 20:55 ` Andrew Pinski
@ 2010-11-08 21:32 ` Andrew Pinski
  3 siblings, 0 replies; 43+ messages in thread
From: Andrew Pinski @ 2010-11-08 21:32 UTC (permalink / raw)
  To: Tom de Vries; +Cc: gcc-patches, Bernd Schmidt

On Mon, Oct 18, 2010 at 8:36 AM, Tom de Vries <tom@codesourcery.com> wrote:
> I created a new sign/zero extension elimination pass.
>
> The motivating example for this pass is:
>
>  void f(unsigned char *p, short s, int c, int *z)
>    {
>      if (c)
>        *z = 0;
>      *p ^= (unsigned char)s;
>    }
>
> For MIPS, compilation results in the following insns.


One more thing about the above testcase.  When I was working on
removing some more zero extensions, I noticed that
TARGET_PROMOTE_PROTOTYPES was being defined for MIPS.  This macro was
just removed from SPARC for the same reason why I was looking into
those zero extensions.  You might want to check the definition for
MIPS and refine it.

Thanks,
Andrew Pinski

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: new sign/zero extension elimination pass
  2010-11-08 21:29     ` Tom de Vries
@ 2010-11-08 22:11       ` Paolo Bonzini
  2010-11-12  8:29         ` Tom de Vries
  0 siblings, 1 reply; 43+ messages in thread
From: Paolo Bonzini @ 2010-11-08 22:11 UTC (permalink / raw)
  To: Tom de Vries; +Cc: Eric Botcazou, gcc-patches, Bernd Schmidt

On Mon, Nov 8, 2010 at 22:21, Tom de Vries <tom@codesourcery.com> wrote:
> I just looked into using DF_INSN_USES, and I'm not sure that using that is a
> good idea. There is a difference between using note_uses and DF_INSN_USES.
>
> If there is an insn
> ...
> (set
>  (reg:SI 1)
>  (plus:SI (reg:SI 2)
>           (reg:SI 3)))
> ...
>
> the note_uses helper function will be visited with the plus expression,
> something that is used in the pass (see note_use in ee.c).

This is an interesting point, thanks.

> Furthermore, I would like to know whether there is a problem with checking in
> the pass into trunk in its current form. My understanding of the discussion up
> until now is that the consensus is that the pass is useful, but not fully efficient.
> I agree with the fact that it's possible to improve it, but I also think that
> the runtime is negligible (it's an O(n) pass currently) and the benefit of
> improving the pass will not be worth the effort. I will try to confirm this with
> a profiling run on spec2000. If the profiling run confirms that the runtime is
> negligible, is the pass (in principle) ok for trunk?

I am not an RTL reviewer, so my opinion doesn't weigh too much.

Paolo

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: new sign/zero extension elimination pass
  2010-11-08 22:11       ` Paolo Bonzini
@ 2010-11-12  8:29         ` Tom de Vries
  2010-11-13 10:41           ` Eric Botcazou
  0 siblings, 1 reply; 43+ messages in thread
From: Tom de Vries @ 2010-11-12  8:29 UTC (permalink / raw)
  To: Eric Botcazou; +Cc: Paolo Bonzini, gcc-patches, Bernd Schmidt

Eric,

Paolo Bonzini wrote:
> On Mon, Nov 8, 2010 at 22:21, Tom de Vries <tom@codesourcery.com> wrote:
>> I just looked into using DF_INSN_USES, and I'm not sure that using that is a
>> good idea. There is a difference between using note_uses and DF_INSN_USES.
>>
>> If there is an insn
>> ...
>> (set
>>  (reg:SI 1)
>>  (plus:SI (reg:SI 2)
>>           (reg:SI 3)))
>> ...
>>
>> the note_uses helper function will be visited with the plus expression,
>> something that is used in the pass (see note_use in ee.c).
> 
> This is an interesting point, thanks.
> 
>> Furthermore, I would like to know whether there is a problem with checking in
>> the pass into trunk in its current form. My understanding of the discussion up
>> until now is that the consensus is that the pass is useful, but not fully efficient.
>> I agree with the fact that it's possible to improve it, but I also think that
>> the runtime is negligible (it's an O(n) pass currently) and the benefit of
>> improving the pass will not be worth the effort. I will try to confirm this with
>> a profiling run on spec2000. If the profiling run confirms that the runtime is
>> negligible, is the pass (in principle) ok for trunk?
> 
> I am not an RTL reviewer, so my opinion doesn't weigh too much.
> 
> Paolo

I profiled the pass on spec2000:

                    -mabi=32     -mabi=64
ee-pass (usr time):     0.70         1.16
total   (usr time):   919.30       879.26
ee-pass        (%):     0.08         0.13

The pass takes 0.13% or less of the total usr runtime. Is it necessary to
improve the runtime of this pass?

Thanks,
- Tom

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: new sign/zero extension elimination pass
  2010-11-12  8:29         ` Tom de Vries
@ 2010-11-13 10:41           ` Eric Botcazou
  2012-07-11 10:31             ` Tom de Vries
  0 siblings, 1 reply; 43+ messages in thread
From: Eric Botcazou @ 2010-11-13 10:41 UTC (permalink / raw)
  To: Tom de Vries; +Cc: Paolo Bonzini, gcc-patches, Bernd Schmidt

> I profiled the pass on spec2000:
>
>                     -mabi=32     -mabi=64
> ee-pass (usr time):     0.70         1.16
> total   (usr time):   919.30       879.26
> ee-pass        (%):     0.08         0.13
>
> The pass takes 0.13% or less of the total usr runtime.

For how many hits?  What are the numbers with --param ee-max-propagate=0?

> Is it necessary to improve the runtime of this pass?

I've already given my opinion about the implementation.  The other passes in 
the compiler try hard not to rescan everything when a single bit changes; as 
currently written, yours doesn't.

-- 
Eric Botcazou

^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: new sign/zero extension elimination pass
  2010-11-13 10:41           ` Eric Botcazou
@ 2012-07-11 10:31             ` Tom de Vries
  2012-07-11 11:42               ` Jakub Jelinek
                                 ` (3 more replies)
  0 siblings, 4 replies; 43+ messages in thread
From: Tom de Vries @ 2012-07-11 10:31 UTC (permalink / raw)
  To: Eric Botcazou; +Cc: Tom de Vries, Paolo Bonzini, gcc-patches, Bernd Schmidt

[-- Attachment #1: Type: text/plain, Size: 1734 bytes --]

On 13/11/10 10:50, Eric Botcazou wrote:
>> I profiled the pass on spec2000:
>>
>>                     -mabi=32     -mabi=64
>> ee-pass (usr time):     0.70         1.16
>> total   (usr time):   919.30       879.26
>> ee-pass        (%):     0.08         0.13
>>
>> The pass takes 0.13% or less of the total usr runtime.
> 
> For how many hits?  What are the numbers with --param ee-max-propagate=0?
> 
>> Is it necessary to improve the runtime of this pass?
> 
> I've already given my opinion about the implementation.  The other passes in 
> the compiler try hard not to rescan everything when a single bit changes; as 
> currently written, yours doesn't.
> 

Eric,

I've done the following:
- refactored the pass such that it now scans at most twice over all
  instructions.
- updated the patch to be applicable to current trunk
- updated the motivating example to a more applicable one (as discussed in
  this thread), and added that one as test-case.
- added a part in the header comment illustrating the working of the pass
  on the motivating example.

bootstrapped and reg-tested on x86_64 and i686.

built and reg-tested on mips, mips64, and arm.

OK for trunk?

Thanks,
- Tom

2012-07-10  Tom de Vries  <tom@codesourcery.com>

	* ee.c: New file.
	* tree-pass.h (pass_ee): Declare.
	* opts.c (default_options_table): Set flag_ee at -O2.
	* timevar.def (TV_EE): New timevar.
	* common.opt (fextension-elimination): New option.
	* Makefile.in (ee.o): New rule.
	* passes.c (pass_ee): Add it.

	* gcc.dg/extend-1.c: New test.
	* gcc.dg/extend-2.c: Same.
	* gcc.dg/extend-2-64.c: Same.
	* gcc.dg/extend-3.c: Same.
	* gcc.dg/extend-4.c: Same.
	* gcc.dg/extend-5.c: Same.
	* gcc.target/mips/octeon-bbit-2.c: Make test more robust.

[-- Attachment #2: ee.current-ml.7.patch --]
[-- Type: text/x-patch, Size: 37763 bytes --]

Index: gcc/tree-pass.h
===================================================================
--- gcc/tree-pass.h (revision 189409)
+++ gcc/tree-pass.h (working copy)
@@ -483,6 +483,7 @@ extern struct gimple_opt_pass pass_fixup
 
 extern struct rtl_opt_pass pass_expand;
 extern struct rtl_opt_pass pass_instantiate_virtual_regs;
+extern struct rtl_opt_pass pass_ee;
 extern struct rtl_opt_pass pass_rtl_fwprop;
 extern struct rtl_opt_pass pass_rtl_fwprop_addr;
 extern struct rtl_opt_pass pass_jump;
Index: gcc/testsuite/gcc.target/mips/octeon-bbit-2.c
===================================================================
--- gcc/testsuite/gcc.target/mips/octeon-bbit-2.c (revision 189409)
+++ gcc/testsuite/gcc.target/mips/octeon-bbit-2.c (working copy)
@@ -5,19 +5,19 @@
 /* { dg-final { scan-assembler "\tbnel\t" } } */
 /* { dg-final { scan-assembler-not "\tbne\t" } } */
 
-NOMIPS16 int
-f (int n, int i)
+NOMIPS16 long int
+f (long int n, long int i)
 {
-  int s = 0;
+  long int s = 0;
   for (; i & 1; i++)
     s += i;
   return s;
 }
 
-NOMIPS16 int
-g (int n, int i)
+NOMIPS16 long int
+g (long int n, long int i)
 {
-  int s = 0;
+  long int s = 0;
   for (i = 0; i < n; i++)
     s += i;
   return s;
Index: gcc/testsuite/gcc.dg/extend-4.c
===================================================================
--- /dev/null (new file)
+++ gcc/testsuite/gcc.dg/extend-4.c (revision 0)
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-rtl-ee" } */
+
+unsigned char f(unsigned int a, int c)
+{
+  unsigned int b = a;
+  if (c)
+    b = a & 0x10ff;
+  return b;
+}
+
+/* { dg-final { scan-rtl-dump-times "_extend:" 1 "ee" { target mips*-*-* } } } */
+/* { dg-final { scan-rtl-dump-times "and:" 0 "ee" { target mips*-*-* } } } */
+/* { dg-final { scan-rtl-dump "redundant extension \[0-9\]+ removed" "ee" { target mips*-*-* } } } */
+/* { dg-final { cleanup-rtl-dump "ee" } } */
+
Index: gcc/testsuite/gcc.dg/extend-1.c
===================================================================
--- /dev/null (new file)
+++ gcc/testsuite/gcc.dg/extend-1.c (revision 0)
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-rtl-ee" } */
+
+void f(unsigned char * p, short s, int c, int *z)
+{
+  if (c)
+    *z = 0;
+  *p ^= (unsigned char)s;
+}
+
+/* { dg-final { scan-rtl-dump-times "sign_extend:" 0 "ee" { target mips*-*-* } } } */
+/* { dg-final { scan-rtl-dump-times "redundant extension \[0-9\]+ replaced" 1 "ee" { target mips*-*-* } } } */
+/* { dg-final { cleanup-rtl-dump "ee" } } */
Index: gcc/testsuite/gcc.dg/extend-5.c
===================================================================
--- /dev/null (new file)
+++ gcc/testsuite/gcc.dg/extend-5.c (revision 0)
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-rtl-ee" } */
+
+void f (short d[2][2])
+{
+  int d0 = d[0][0] + d[0][1];
+  int d1 = d[1][0] + d[1][1];
+  d[0][0] = d0 + d1;
+  d[0][1] = d0 - d1;
+}
+
+/* { dg-final { scan-rtl-dump-times "redundant extension \[0-9\]+ replaced" 2 "ee" { target mips*-*-* } } } */
+/* { dg-final { cleanup-rtl-dump "ee" } } */
Index: gcc/testsuite/gcc.dg/extend-2.c
===================================================================
--- /dev/null (new file)
+++ gcc/testsuite/gcc.dg/extend-2.c (revision 0)
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-rtl-ee" } */
+/* { dg-require-effective-target ilp32 } */
+
+void f(unsigned char * p, short *s, int c)
+{
+  short or = 0;
+  while (c)
+    {
+      or = or | s[c];
+      c --;
+    }
+  *p = (unsigned char)or;
+}
+
+/* { dg-final { scan-rtl-dump-times "zero_extend" 0 "ee" { target mips*-*-* } } } */
+/* { dg-final { scan-rtl-dump-times "sign_extend" 0 "ee" { target mips*-*-* } } } */
+/* { dg-final { scan-rtl-dump-times "redundant extension \[0-9\]+ replaced" 2 "ee" { target mips*-*-* } } } */
+/* { dg-final { cleanup-rtl-dump "ee" } } */
+
Index: gcc/testsuite/gcc.dg/extend-2-64.c
===================================================================
--- /dev/null (new file)
+++ gcc/testsuite/gcc.dg/extend-2-64.c (revision 0)
@@ -0,0 +1,20 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-rtl-ee -mabi=64" } */
+/* { dg-require-effective-target mips64 } */
+
+void f(unsigned char * p, short *s, int c)
+{
+  short or = 0;
+  while (c)
+    {
+      or = or | s[c];
+      c --;
+    }
+  *p = (unsigned char)or;
+}
+
+/* { dg-final { scan-rtl-dump-times "zero_extend:" 0 "ee" { target mips*-*-* } } } */
+/* { dg-final { scan-rtl-dump-times "sign_extend:" 1 "ee" { target mips*-*-* } } } */
+/* { dg-final { scan-rtl-dump-times "redundant extension \[0-9\]+ replaced" 2 "ee" { target mips*-*-* } } } */
+/* { dg-final { cleanup-rtl-dump "ee" } } */
+
Index: gcc/testsuite/gcc.dg/extend-3.c
===================================================================
--- /dev/null (new file)
+++ gcc/testsuite/gcc.dg/extend-3.c (revision 0)
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-rtl-ee -mabi=64" } */
+/* { dg-require-effective-target mips64 } */
+
+unsigned int f(unsigned char byte)
+{
+  return byte << 25;
+}
+
+/* { dg-final { scan-rtl-dump-times "zero_extend:" 0 "ee" { target mips*-*-* } } } */
+/* { dg-final { scan-rtl-dump "redundant extension \[0-9\]+ replaced" "ee" } } */
+/* { dg-final { cleanup-rtl-dump "ee" } } */
+
Index: gcc/opts.c
===================================================================
--- gcc/opts.c (revision 189409)
+++ gcc/opts.c (working copy)
@@ -490,6 +490,7 @@ static const struct default_options defa
     { OPT_LEVELS_2_PLUS, OPT_ftree_tail_merge, NULL, 1 },
     { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_foptimize_strlen, NULL, 1 },
     { OPT_LEVELS_2_PLUS, OPT_fhoist_adjacent_loads, NULL, 1 },
+    { OPT_LEVELS_2_PLUS, OPT_fextension_elimination, NULL, 1 },
 
     /* -O3 optimizations.  */
     { OPT_LEVELS_3_PLUS, OPT_ftree_loop_distribute_patterns, NULL, 1 },
Index: gcc/timevar.def
===================================================================
--- gcc/timevar.def (revision 189409)
+++ gcc/timevar.def (working copy)
@@ -201,6 +201,7 @@ DEFTIMEVAR (TV_POST_EXPAND	     , "post 
 DEFTIMEVAR (TV_VARCONST              , "varconst")
 DEFTIMEVAR (TV_LOWER_SUBREG	     , "lower subreg")
 DEFTIMEVAR (TV_JUMP                  , "jump")
+DEFTIMEVAR (TV_EE                    , "extension elimination")
 DEFTIMEVAR (TV_FWPROP                , "forward prop")
 DEFTIMEVAR (TV_CSE                   , "CSE")
 DEFTIMEVAR (TV_DCE                   , "dead code elimination")
Index: gcc/ee.c
===================================================================
--- /dev/null (new file)
+++ gcc/ee.c (revision 0)
@@ -0,0 +1,1190 @@
+/* Redundant extension elimination.
+   Copyright (C) 2010, 2011, 2012 Free Software Foundation, Inc.
+   Contributed by Tom de Vries (tom@codesourcery.com)
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+/*
+
+  MOTIVATING EXAMPLE
+
+  The motivating example for this pass is the example from PR 40893:
+
+    void f (short d[2][2])
+    {
+      int d0 = d[0][0] + d[0][1];
+      int d1 = d[1][0] + d[1][1];
+      d[0][0] = d0 + d1;
+      d[0][1] = d0 - d1;
+    }
+
+  For MIPS, compilation results in the following insns.
+
+    (set (reg:SI 204)
+         (zero_extend:SI (subreg:HI (reg:SI 213) 2)))
+
+    (set (reg:SI 205)
+         (zero_extend:SI (subreg:HI (reg:SI 216 [ d1 ]) 2)))
+
+    (set (reg:SI 217)
+         (plus:SI (reg:SI 205)
+                  (reg:SI 204)))
+
+    (set (reg:SI 218)
+         (minus:SI (reg:SI 204)
+                   (reg:SI 205)))
+
+    (set (mem:HI (reg/v/f:SI 210))
+         (subreg:HI (reg:SI 217) 2))
+
+    (set (mem:HI (plus:SI (reg/v/f:SI 210)
+                 (const_int 2 [0x2])))
+         (subreg:HI (reg:SI 218) 2))
+
+
+  The stores only use the lower halves of pseudos 217 and 218, and are their
+  only uses.  The plus and minus operators belong to the class of
+  operators where a bit in the result is only influenced by same-or-less
+  significant bits in the operands, so the plus and minus insns only use the
+  lower halves of pseudos 204 and 205.  Those are also the only uses of pseudos
+  204 and 205, so the zero_extends are redundant.
+
+
+  INTENDED EFFECT
+
+  This pass works by removing sign/zero-extensions, or replacing them with
+  regcopies.  The idea there is that the regcopy might be eliminated by a later
+  pass.  In case the regcopy cannot be eliminated, it might at least be cheaper
+  than the extension.
+
+
+  IMPLEMENTATION
+
+  The pass scans at most two times over all instructions.
+
+  The first scan collects all extensions.  If there are no extensions, we're
+  done.
+
+  The second scan registers all uses of a reg in the biggest_use array.
+  Additionally, it registers how the use size of a pseudo is propagated to the
+  operands of the insns defining the pseudo.
+
+  The biggest_use array now contains the size in bits of the biggest use
+  of each reg, which allows us to find redundant extensions.
+
+  If there are still non-redundant extensions left, we use the propagation
+  information in an iterative fashion to improve the biggest_use array, after
+  which we may find more redundant extensions.
+
+  Finally, redundant extensions are deleted or replaced.
+
+  In case that the src and dest reg of the replacement are not of the same size,
+  we do not replace with a normal regcopy, but with a truncate or with the copy
+  of a paradoxical subreg instead.
+
+
+  ILLUSTRATION OF PASS
+
+  The dump of the pass shows us how the pass works on the motivating example.
+
+  We find the 2 extensions:
+    found extension with preserved size 16 defining reg 204
+    found extension with preserved size 16 defining reg 205
+
+  We calculate the biggest use of each register:
+    biggest_use
+    reg 204: size 32
+    reg 205: size 32
+    reg 217: size 16
+    reg 218: size 16
+
+  We propagate the biggest uses where possible:
+    propagations
+    205: 32 -> 16
+    204: 32 -> 16
+    214: 32 -> 16
+    215: 32 -> 16
+
+  We conclude that the extensions are redundant:
+    found redundant extension with preserved size 16 defining reg 205
+    found redundant extension with preserved size 16 defining reg 204
+
+  And we replace them with regcopies:
+    (set (reg:SI 204)
+        (reg:SI 213))
+
+    (set (reg:SI 205)
+        (reg:SI 216))
+
+
+  LIMITATIONS
+
+  The scope of the analysis is limited to an extension and its uses.  The other
+  type of analysis (related to the defs of the operand of an extension) is not
+  done.
+
+  Furthermore, we do the analysis of biggest use per reg.  So when determining
+  whether an extension is redundant, we take all uses of a dest reg into
+  account, also the ones that are not uses of the extension.
+  The consideration is that using use-def chains will give a more precise
+  analysis, but is much more expensive in terms of runtime.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "tm.h"
+#include "rtl.h"
+#include "tree.h"
+#include "tm_p.h"
+#include "flags.h"
+#include "regs.h"
+#include "hard-reg-set.h"
+#include "basic-block.h"
+#include "insn-config.h"
+#include "function.h"
+#include "expr.h"
+#include "insn-attr.h"
+#include "recog.h"
+#include "toplev.h"
+#include "target.h"
+#include "timevar.h"
+#include "optabs.h"
+#include "insn-codes.h"
+#include "rtlhooks-def.h"
+#include "output.h"
+#include "params.h"
+#include "timevar.h"
+#include "tree-pass.h"
+#include "cgraph.h"
+#include "vec.h"
+
+#define SKIP_REG (-1)
+#define NONE (-1)
+
+/* Number of registers at start of pass.  */
+
+static int n_regs;
+
+/* Array to register the biggest use of a reg, in bits.  */
+
+static int *biggest_use;
+
+/* Array to register the promoted subregs.  */
+
+static VEC (rtx,heap) **promoted_subreg;
+
+/* Array to register for a reg what the last propagated size is.  */
+
+static int *propagated_size;
+
+typedef struct use
+{
+  int regno;
+  int size;
+  int offset;
+  rtx *use;
+} use_type;
+
+DEF_VEC_O(use_type);
+DEF_VEC_ALLOC_O(use_type,heap);
+
+/* Vector to register the uses.  */
+
+static VEC (use_type,heap) **uses;
+
+typedef struct prop
+{
+  rtx set;
+  int uses_regno;
+  int uses_index;
+} prop_type;
+
+DEF_VEC_O(prop_type);
+DEF_VEC_ALLOC_O(prop_type,heap);
+
+/* Vector to register the propagations.  */
+
+static VEC (prop_type,heap) **props;
+
+/* Work list for propagation.  */
+
+static VEC (int,heap) *wl;
+
+/* Array to register what regs are in the work list.  */
+
+static bool *in_wl;
+
+/* Vector that contains the extensions in the function.  */
+
+static VEC (rtx,heap) *extensions;
+
+/* Vector that contains the extensions in the function that are going to be
+   removed or replaced.  */
+
+static VEC (rtx,heap) *redundant_extensions;
+
+/* Forward declaration.  */
+
+static void note_use (rtx *x, void *data);
+static bool skip_reg_p (int regno);
+static void register_prop (rtx set, use_type *use);
+
+/* Check whether SUBREG is a promoted subreg.  */
+
+static bool
+promoted_subreg_p (rtx subreg)
+{
+  return (GET_CODE (subreg) == SUBREG
+	  && SUBREG_PROMOTED_VAR_P (subreg));
+}
+
+/* Check whether SUBREG is a promoted subreg for which we cannot reset the
+   promotion.  */
+
+static bool
+fixed_promoted_subreg_p (rtx subreg)
+{
+  int mre;
+
+  if (!promoted_subreg_p (subreg))
+    return false;
+
+  mre = targetm.mode_rep_extended (GET_MODE (subreg),
+				   GET_MODE (SUBREG_REG (subreg)));
+  return mre != UNKNOWN;
+}
+
+/* Attempt to return the size, reg number and offset of USE in SIZE, REGNO and
+   OFFSET.  Return true if successful.  */
+
+static bool
+reg_use_p (rtx use, int *size, unsigned int *regno, int *offset)
+{
+  rtx reg;
+
+  if (REG_P (use))
+    {
+      *regno = REGNO (use);
+      *offset = 0;
+      *size = GET_MODE_BITSIZE (GET_MODE (use));
+      return true;
+    }
+  else if (GET_CODE (use) == SUBREG)
+    {
+      reg = SUBREG_REG (use);
+
+      if (!REG_P (reg))
+	return false;
+
+      *regno = REGNO (reg);
+
+      if (paradoxical_subreg_p (use) || fixed_promoted_subreg_p (use))
+	{
+	  *offset = 0;
+	  *size = GET_MODE_BITSIZE (GET_MODE (reg));
+	}
+      else
+	{
+	  *offset = subreg_lsb (use);
+	  *size = *offset + GET_MODE_BITSIZE (GET_MODE (use));
+	}
+
+      return true;
+    }
+
+  return false;
+}
+
+/* Create a new empty entry in the uses[REGNO] vector.  */
+
+static use_type *
+new_use (unsigned int regno)
+{
+  if (uses[regno] == NULL)
+    uses[regno] = VEC_alloc (use_type, heap, 4);
+
+  VEC_safe_push (use_type, heap, uses[regno], NULL);
+
+  return VEC_last (use_type, uses[regno]);
+}
+
+/* Register a USE of reg REGNO with SIZE and OFFSET.  */
+
+static use_type *
+register_use (int size, unsigned int regno, int offset, rtx *use)
+{
+  int *current;
+  use_type *p;
+
+  gcc_assert (size >= 0);
+  gcc_assert (regno < (unsigned int)n_regs);
+
+  if (skip_reg_p (regno))
+    return NULL;
+
+  p = new_use (regno);
+  p->regno = regno;
+  p->size = size;
+  p->offset = offset;
+  p->use = use;
+
+  /* Update the biggest use.  */
+  current = &biggest_use[regno];
+  *current = MAX (*current, size);
+
+  return p;
+}
+
+/* Handle embedded uses in USE, which is a part of PATTERN.  */
+
+static void
+note_embedded_uses (rtx use, rtx pattern)
+{
+  const char *format_ptr;
+  int i, j;
+
+  format_ptr = GET_RTX_FORMAT (GET_CODE (use));
+  for (i = 0; i < GET_RTX_LENGTH (GET_CODE (use)); i++)
+    if (format_ptr[i] == 'e')
+      note_use (&XEXP (use, i), pattern);
+    else if (format_ptr[i] == 'E')
+      for (j = 0; j < XVECLEN (use, i); j++)
+	note_use (&XVECEXP (use, i, j), pattern);
+}
+
+/* Get the set in PATTERN that has USE as its src operand.  */
+
+static rtx
+get_set (rtx use, rtx pattern)
+{
+  rtx sub;
+  int i;
+
+  if (GET_CODE (pattern) == SET && SET_SRC (pattern) == use)
+    return pattern;
+
+  if (GET_CODE (pattern) == PARALLEL)
+    for (i = 0; i < XVECLEN (pattern, 0); ++i)
+      {
+	sub = XVECEXP (pattern, 0, i);
+	if (GET_CODE (sub) == SET && SET_SRC (sub) == use)
+	  return sub;
+      }
+
+  return NULL_RTX;
+}
+
+/* Handle a restricted op USE with NR_OPERANDS.  USE is a part of SET, which is
+   a part of PATTERN.  In this context restricted means that a bit in
+   an operand influences only the same bit or more significant bits in the
+   result.  The bitwise ops are a subclass, but PLUS is one as well.  */
+
+static void
+note_restricted_op_use (rtx set, rtx use, unsigned int nr_operands, rtx pattern)
+{
+  unsigned int i, smallest;
+  int operand_size[2];
+  int operand_offset[2];
+  int used_size;
+  unsigned int operand_regno[2];
+  bool operand_reg[2];
+  bool operand_ignore[2];
+  use_type *p;
+
+  /* Init the operand_* arrays for each operand.  */
+  for (i = 0; i < nr_operands; ++i)
+    {
+      operand_reg[i] = reg_use_p (XEXP (use, i), &operand_size[i],
+				  &operand_regno[i], &operand_offset[i]);
+      operand_ignore[i] = false;
+    }
+
+  /* Handle case of reg and-masked with const.  */
+  if (GET_CODE (use) == AND && CONST_INT_P (XEXP (use, 1)) && operand_reg[0])
+    {
+      used_size =
+	HOST_BITS_PER_WIDE_INT - clz_hwi (UINTVAL (XEXP (use, 1)));
+      operand_size[0] = MIN (operand_size[0], used_size);
+    }
+
+  /* Handle case of reg or-masked with const.  */
+  if (GET_CODE (use) == IOR && CONST_INT_P (XEXP (use, 1)) && operand_reg[0])
+    {
+      used_size =
+	HOST_BITS_PER_WIDE_INT - clz_hwi (~UINTVAL (XEXP (use, 1)));
+      operand_size[0] = MIN (operand_size[0], used_size);
+    }
+
+  /* Ignore the use of a in 'a = a + b'.  */
+  /* TODO: handle GET_CODE ((SET_DEST (set))) == SUBREG.  */
+  if (set != NULL_RTX && REG_P (SET_DEST (set)))
+    for (i = 0; i < nr_operands; ++i)
+      operand_ignore[i] = (operand_reg[i]
+			   && (REGNO (SET_DEST (set)) == operand_regno[i]));
+
+  /* Handle the case a reg is combined with don't care bits.  */
+  if (nr_operands == 2 && operand_reg[0] && operand_reg[1]
+      && operand_size[0] != operand_size[1])
+    {
+      smallest = operand_size[0] > operand_size[1];
+
+      if (paradoxical_subreg_p (XEXP (use, smallest)))
+	operand_size[1 - smallest] = operand_size[smallest];
+    }
+
+  /* Register the operand use, if necessary.  */
+  for (i = 0; i < nr_operands; ++i)
+    if (!operand_reg[i])
+      note_use (&XEXP (use, i), pattern);
+    else if (!operand_ignore[i])
+      {
+	p = register_use (operand_size[i], operand_regno[i], operand_offset[i],
+			  &XEXP (use, i));
+	register_prop (set, p);
+      }
+}
+
+/* Register promoted SUBREG in promoted_subreg.  */
+
+static void
+register_promoted_subreg (rtx subreg)
+{
+  int index = REGNO (SUBREG_REG (subreg));
+
+  if (promoted_subreg[index] == NULL)
+    promoted_subreg[index] = VEC_alloc (rtx, heap, 10);
+
+  VEC_safe_push (rtx, heap, promoted_subreg[index], subreg);
+}
+
+/* Note promoted subregs in X.  */
+
+static int
+note_promoted_subreg (rtx *x, void *y ATTRIBUTE_UNUSED)
+{
+  rtx subreg = *x;
+
+  if (promoted_subreg_p (subreg) && !fixed_promoted_subreg_p (subreg)
+      && REG_P (SUBREG_REG (subreg)))
+    register_promoted_subreg (subreg);
+
+  return 0;
+}
+
+/* Handle use X in pattern DATA noted by note_uses.  */
+
+static void
+note_use (rtx *x, void *data)
+{
+  rtx use = *x;
+  rtx pattern = (rtx)data;
+  int use_size, use_offset;
+  unsigned int use_regno;
+  rtx set;
+  use_type *p;
+
+  for_each_rtx (x, note_promoted_subreg, NULL);
+
+  set = get_set (use, pattern);
+
+  switch (GET_CODE (use))
+    {
+    case REG:
+    case SUBREG:
+      if (!reg_use_p (use, &use_size, &use_regno, &use_offset))
+	{
+	  note_embedded_uses (use, pattern);
+	  return;
+	}
+      p = register_use (use_size, use_regno, use_offset, x);
+      register_prop (set, p);
+      return;
+    case SIGN_EXTEND:
+    case ZERO_EXTEND:
+      if (!reg_use_p (XEXP (use, 0), &use_size, &use_regno, &use_offset))
+	{
+	  note_embedded_uses (use, pattern);
+	  return;
+	}
+      p = register_use (use_size, use_regno, use_offset, x);
+      register_prop (set, p);
+      return;
+    case IOR:
+    case AND:
+    case XOR:
+    case PLUS:
+    case MINUS:
+      note_restricted_op_use (set, use, 2, pattern);
+      return;
+    case NOT:
+    case NEG:
+      note_restricted_op_use (set, use, 1, pattern);
+      return;
+    case ASHIFT:
+      if (!reg_use_p (XEXP (use, 0), &use_size, &use_regno, &use_offset)
+	  || !CONST_INT_P (XEXP (use, 1))
+	  || INTVAL (XEXP (use, 1)) <= 0
+	  || paradoxical_subreg_p (XEXP (use, 0)))
+	{
+	  note_embedded_uses (use, pattern);
+	  return;
+	}
+      (void)register_use (use_size - INTVAL (XEXP (use, 1)), use_regno,
+			  use_offset, x);
+      return;
+    default:
+      note_embedded_uses (use, pattern);
+      return;
+    }
+}
+
+/* Check whether reg REGNO is implicitly used.  */
+
+static bool
+implicit_use_p (int regno ATTRIBUTE_UNUSED)
+{
+#ifdef EPILOGUE_USES
+  if (EPILOGUE_USES (regno))
+    return true;
+#endif
+
+#ifdef EH_USES
+  if (EH_USES (regno))
+    return true;
+#endif
+
+  return false;
+}
+
+/* Check whether reg REGNO should be skipped in analysis.  */
+
+static bool
+skip_reg_p (int regno)
+{
+  /* TODO: handle hard registers.  The problem with hard registers is that
+     a DI use of r0 can mean a 64bit use of r0 and a 32 bit use of r1.
+     We don't handle that properly.  */
+  return implicit_use_p (regno) || HARD_REGISTER_NUM_P (regno);
+}
+
+/* Note the uses of argument registers in call INSN.  */
+
+static void
+note_call_uses (rtx insn)
+{
+  rtx link, link_expr;
+
+  if (!CALL_P (insn))
+    return;
+
+  for (link = CALL_INSN_FUNCTION_USAGE (insn); link; link = XEXP (link, 1))
+    {
+      link_expr = XEXP (link, 0);
+
+      if (GET_CODE (link_expr) == USE)
+	note_use (&XEXP (link_expr, 0), link);
+    }
+}
+
+/* Dump the biggest uses found.  */
+
+static void
+dump_biggest_use (void)
+{
+  int i;
+
+  if (!dump_file)
+    return;
+
+  fprintf (dump_file, "biggest_use:\n");
+
+  for (i = 0; i < n_regs; i++)
+    if (biggest_use[i] > 0)
+      fprintf (dump_file, "reg %d: size %d\n", i, biggest_use[i]);
+
+  fprintf (dump_file, "\n");
+}
+
+/* Calculate the size of the biggest use for all regs.  */
+
+static void
+calculate_biggest_use (void)
+{
+  basic_block bb;
+  rtx insn;
+
+  /* For all insns, call note_use for each use in insn.  */
+  FOR_EACH_BB (bb)
+    FOR_BB_INSNS (bb, insn)
+      {
+	if (!NONDEBUG_INSN_P (insn))
+	  continue;
+
+	note_uses (&PATTERN (insn), note_use, PATTERN (insn));
+
+	if (CALL_P (insn))
+	  note_call_uses (insn);
+      }
+
+  dump_biggest_use ();
+}
+
+/* Register a propagation USE in SET in the props vector.  */
+
+static void
+register_prop (rtx set, use_type *use)
+{
+  prop_type *p;
+  int regno;
+
+  if (set == NULL_RTX || use == NULL)
+    return;
+
+  if (!REG_P (SET_DEST (set)))
+    return;
+
+  regno = REGNO (SET_DEST (set));
+
+  if (skip_reg_p (regno))
+    return;
+
+  if (props[regno] == NULL)
+    props[regno] = VEC_alloc (prop_type, heap, 4);
+
+  VEC_safe_push (prop_type, heap, props[regno], NULL);
+  p = VEC_last (prop_type, props[regno]);
+  p->set = set;
+  p->uses_regno = use->regno;
+  p->uses_index = VEC_length (use_type, uses[use->regno]) - 1;
+}
+
+/* Add REGNO to the worklist.  */
+
+static void
+add_to_wl (int regno)
+{
+  if (in_wl[regno])
+    return;
+
+  if (biggest_use[regno] > 0
+      && biggest_use[regno] == GET_MODE_BITSIZE (PSEUDO_REGNO_MODE (regno)))
+    return;
+
+  if (VEC_empty (prop_type, props[regno]))
+    return;
+
+  if (propagated_size[regno] != NONE
+      && propagated_size[regno] == biggest_use[regno])
+    return;
+
+  VEC_safe_push (int, heap, wl, regno);
+  in_wl[regno] = true;
+}
+
+/* Pop a reg from the worklist and return it.  */
+
+static int
+pop_wl (void)
+{
+  int regno = VEC_pop (int, wl);
+  in_wl[regno] = false;
+  return regno;
+}
+
+/* Propagate the use size DEST_SIZE of a reg to use P.  */
+
+static int
+propagate_size (int dest_size, use_type *p)
+{
+  if (dest_size == 0)
+    return 0;
+
+  return p->offset + MIN (p->size - p->offset, dest_size);
+}
+
+/* Get the biggest use of REGNO from the uses vector.  */
+
+static int
+get_biggest_use (unsigned int regno)
+{
+  int ix;
+  use_type *p;
+  int max = 0;
+
+  gcc_assert (uses[regno] != NULL);
+
+  FOR_EACH_VEC_ELT (use_type, uses[regno], ix, p)
+    max = MAX (max, p->size);
+
+  return max;
+}
+
+/* Propagate the use size DEST_SIZE of a reg to use USE.  */
+
+static void
+propagate_to_use (int dest_size, use_type *use)
+{
+  int new_use_size;
+  int prev_biggest_use;
+  int *current;
+
+  new_use_size = propagate_size (dest_size, use);
+
+  if (new_use_size >= use->size)
+    return;
+
+  use->size = new_use_size;
+
+  current = &biggest_use[use->regno];
+
+  prev_biggest_use = *current;
+  *current = get_biggest_use (use->regno);
+
+  if (*current >= prev_biggest_use)
+    return;
+
+  add_to_wl (use->regno);
+
+  if (dump_file)
+    fprintf (dump_file, "%d: %d -> %d\n", use->regno, prev_biggest_use,
+	     *current);
+
+}
+
+/* Propagate the biggest use of a reg REGNO to all its uses, and note
+   propagations in NR_PROPAGATIONS.  */
+
+static void
+propagate_to_uses (int regno, int *nr_propagations)
+{
+  int ix;
+  prop_type *p;
+
+  gcc_assert (!(propagated_size[regno] != NONE
+		&& propagated_size[regno] == biggest_use[regno]));
+
+  FOR_EACH_VEC_ELT (prop_type, props[regno], ix, p)
+    {
+      use_type *use = VEC_index (use_type, uses[p->uses_regno], p->uses_index);
+      propagate_to_use (biggest_use[regno], use);
+      ++(*nr_propagations);
+    }
+
+  propagated_size[regno] = biggest_use[regno];
+}
+
+/* Improve biggest_use array iteratively.  */
+
+static void
+propagate (void)
+{
+  int i;
+  int nr_propagations = 0;
+
+  /* Initialize work list.  */
+
+  for (i = 0; i < n_regs; ++i)
+    add_to_wl (i);
+
+  /* Work the work list.  */
+
+  if (dump_file)
+    fprintf (dump_file, "propagations: \n");
+  while (!VEC_empty (int, wl))
+    propagate_to_uses (pop_wl (), &nr_propagations);
+
+  if (dump_file)
+    fprintf (dump_file, "\nnr_propagations: %d\n\n", nr_propagations);
+}
+
+/* Check whether this is a sign/zero extension.  */
+
+static bool
+extension_p (rtx insn, rtx *dest, rtx *inner, int *preserved_size)
+{
+  rtx src, op0;
+
+  /* Detect set of reg.  */
+  if (GET_CODE (PATTERN (insn)) != SET)
+    return false;
+
+  src = SET_SRC (PATTERN (insn));
+  *dest = SET_DEST (PATTERN (insn));
+
+  if (!REG_P (*dest))
+    return false;
+
+  /* Detect sign or zero extension.  */
+  if (GET_CODE (src) == ZERO_EXTEND || GET_CODE (src) == SIGN_EXTEND
+      || (GET_CODE (src) == AND && CONST_INT_P (XEXP (src, 1))))
+    {
+      op0 = XEXP (src, 0);
+
+      /* Determine amount of least significant bits preserved by operation.  */
+      if (GET_CODE (src) == AND)
+	*preserved_size = ctz_hwi (~UINTVAL (XEXP (src, 1)));
+      else
+	*preserved_size = GET_MODE_BITSIZE (GET_MODE (op0));
+
+      if (GET_CODE (op0) == SUBREG)
+	{
+	  if (subreg_lsb (op0) != 0)
+	    return false;
+
+	  *inner = SUBREG_REG (op0);
+
+	  if (GET_MODE_CLASS (GET_MODE (*dest))
+	      != GET_MODE_CLASS (GET_MODE (*inner)))
+	    return false;
+
+	  return true;
+	}
+      else if (REG_P (op0))
+	{
+	  *inner = op0;
+
+	  if (GET_MODE_CLASS (GET_MODE (*dest))
+	      != GET_MODE_CLASS (GET_MODE (*inner)))
+	    return false;
+
+	  return true;
+	}
+      else if (GET_CODE (op0) == TRUNCATE)
+	{
+	  *inner = XEXP (op0, 0);
+	  return true;
+	}
+    }
+
+  return false;
+}
+
+/* Find extensions and store them in the extensions vector.  */
+
+static bool
+find_extensions (void)
+{
+  basic_block bb;
+  rtx insn, dest, inner;
+  int preserved_size;
+
+  /* For all insns, record the insn if it is an extension.  */
+  FOR_EACH_BB (bb)
+    FOR_BB_INSNS (bb, insn)
+      {
+	if (!NONDEBUG_INSN_P (insn))
+	  continue;
+
+	if (!extension_p (insn, &dest, &inner, &preserved_size))
+	  continue;
+
+	VEC_safe_push (rtx, heap, extensions, insn);
+
+	if (dump_file)
+	  fprintf (dump_file,
+		   "found extension %u with preserved size %d defining"
+		   " reg %d\n",
+		   INSN_UID (insn), preserved_size, REGNO (dest));
+      }
+
+  if (dump_file)
+    {
+      if (!VEC_empty (rtx, extensions))
+	fprintf (dump_file, "\n");
+      else
+	fprintf (dump_file, "no extensions found.\n");
+    }
+
+  return !VEC_empty (rtx, extensions);
+}
+
+/* Check whether this is a redundant sign/zero extension.  */
+
+static bool
+redundant_extension_p (rtx insn, rtx *dest, rtx *inner, int *preserved_size)
+{
+  int biggest_dest_use;
+
+  if (!extension_p (insn, dest, inner, preserved_size))
+    gcc_unreachable ();
+
+  biggest_dest_use = biggest_use[REGNO (*dest)];
+
+  if (biggest_dest_use == SKIP_REG)
+    return false;
+
+  if (*preserved_size < biggest_dest_use)
+    return false;
+
+  return true;
+}
+
+/* Find the redundant extensions in the extensions vector and move them to the
+   redundant_extensions vector.  */
+
+static void
+find_redundant_extensions (void)
+{
+  rtx insn, dest, inner;
+  int ix;
+  bool found = false;
+  int preserved_size;
+
+  FOR_EACH_VEC_ELT_REVERSE (rtx, extensions, ix, insn)
+    if (redundant_extension_p (insn, &dest, &inner, &preserved_size))
+      {
+	VEC_safe_push (rtx, heap, redundant_extensions, insn);
+	VEC_unordered_remove (rtx, extensions, ix);
+
+	if (dump_file)
+	  fprintf (dump_file,
+		   "found redundant extension %u with preserved size %d"
+		   " defining reg %d\n",
+		   INSN_UID (insn), preserved_size, REGNO (dest));
+	found = true;
+      }
+
+  if (dump_file && found)
+    fprintf (dump_file, "\n");
+}
+
+/* Reset promotion of subregs of REG.  */
+
+static void
+reset_promoted_subreg (rtx reg)
+{
+  int ix;
+  rtx subreg;
+
+  if (promoted_subreg[REGNO (reg)] == NULL)
+    return;
+
+  FOR_EACH_VEC_ELT (rtx, promoted_subreg[REGNO (reg)], ix, subreg)
+    {
+      SUBREG_PROMOTED_UNSIGNED_SET (subreg, 0);
+      SUBREG_PROMOTED_VAR_P (subreg) = 0;
+    }
+
+  VEC_free (rtx, heap, promoted_subreg[REGNO (reg)]);
+}
+
+/* Try to remove or replace the redundant extension INSN which extends INNER and
+   writes to DEST.  */
+
+static void
+try_remove_or_replace_extension (rtx insn, rtx dest, rtx inner)
+{
+  rtx cp_src, cp_dest, seq = NULL_RTX, one;
+
+  /* Check whether replacement is needed.  */
+  if (dest != inner)
+    {
+      start_sequence ();
+
+      /* Determine the proper replacement operation.  */
+      if (GET_MODE (dest) == GET_MODE (inner))
+	{
+	  cp_src = inner;
+	  cp_dest = dest;
+	}
+      else if (GET_MODE_SIZE (GET_MODE (dest))
+	       > GET_MODE_SIZE (GET_MODE (inner)))
+	{
+	  emit_clobber (dest);
+	  cp_src = inner;
+	  cp_dest = gen_lowpart_SUBREG (GET_MODE (inner), dest);
+	}
+      else
+	{
+	  cp_src = gen_lowpart_SUBREG (GET_MODE (dest), inner);
+	  cp_dest = dest;
+	}
+
+      emit_move_insn (cp_dest, cp_src);
+
+      seq = get_insns ();
+      end_sequence ();
+
+      /* If the replacement is not supported, bail out.  */
+      for (one = seq; one != NULL_RTX; one = NEXT_INSN (one))
+	if (recog_memoized (one) < 0 && GET_CODE (PATTERN (one)) != CLOBBER)
+	  return;
+
+      /* Insert the replacement.  */
+      emit_insn_before (seq, insn);
+    }
+
+  /* Note replacement/removal in the dump.  */
+  if (dump_file)
+    {
+      fprintf (dump_file, "redundant extension %u ", INSN_UID (insn));
+      if (dest != inner)
+	fprintf (dump_file, "replaced by %u\n", INSN_UID (seq));
+      else
+	fprintf (dump_file, "removed\n");
+    }
+
+  /* Remove the extension.  */
+  delete_insn (insn);
+
+  reset_promoted_subreg (dest);
+}
+
+/* Setup the variables at the start of the pass.  */
+
+static void
+init_pass (void)
+{
+  int i;
+
+  biggest_use = XNEWVEC (int, n_regs);
+  promoted_subreg = XCNEWVEC (VEC (rtx,heap) *, n_regs);
+  propagated_size = XNEWVEC (int, n_regs);
+
+  /* Initialize biggest_use for all regs to 0.  If a reg is used implicitly or
+     is a hard reg, we handle it conservatively and set it to SKIP_REG.  */
+  for (i = 0; i < n_regs; i++)
+    {
+      biggest_use[i] = skip_reg_p (i) ? SKIP_REG : 0;
+      propagated_size[i] = NONE;
+    }
+
+  extensions = VEC_alloc (rtx, heap, 10);
+  redundant_extensions = VEC_alloc (rtx, heap, 10);
+
+  wl = VEC_alloc (int, heap, 50);
+  in_wl = XNEWVEC (bool, n_regs);
+
+  uses = XNEWVEC (typeof (*uses), n_regs);
+  props = XNEWVEC (typeof (*props), n_regs);
+
+  for (i = 0; i < n_regs; ++i)
+    {
+      uses[i] = NULL;
+      props[i] = NULL;
+      in_wl[i] = false;
+    }
+}
+
+/* Find redundant extensions and remove or replace them if possible.  */
+
+static void
+remove_redundant_extensions (void)
+{
+  rtx insn, dest, inner;
+  int preserved_size;
+  int ix;
+
+  if (!find_extensions ())
+    return;
+
+  calculate_biggest_use ();
+
+  find_redundant_extensions ();
+
+  if (!VEC_empty (rtx, extensions))
+    {
+      propagate ();
+
+      find_redundant_extensions ();
+    }
+
+  gcc_checking_assert (n_regs == max_reg_num ());
+
+  FOR_EACH_VEC_ELT (rtx, redundant_extensions, ix, insn)
+    {
+      extension_p (insn, &dest, &inner, &preserved_size);
+      try_remove_or_replace_extension (insn, dest, inner);
+    }
+
+  if (dump_file)
+    fprintf (dump_file, "\n");
+}
+
+/* Free the variables at the end of the pass.  */
+
+static void
+finish_pass (void)
+{
+  int i;
+
+  XDELETEVEC (propagated_size);
+
+  VEC_free (rtx, heap, extensions);
+  VEC_free (rtx, heap, redundant_extensions);
+
+  VEC_free (int, heap, wl);
+
+  for (i = 0; i < n_regs; ++i)
+    {
+      if (uses[i] != NULL)
+	VEC_free (use_type, heap, uses[i]);
+
+      if (props[i] != NULL)
+	VEC_free (prop_type, heap, props[i]);
+    }
+
+  XDELETEVEC (uses);
+  XDELETEVEC (props);
+  XDELETEVEC (biggest_use);
+
+  for (i = 0; i < n_regs; ++i)
+    if (promoted_subreg[i] != NULL)
+      VEC_free (rtx, heap, promoted_subreg[i]);
+  XDELETEVEC (promoted_subreg);
+}
+
+/* Remove redundant extensions.  */
+
+static unsigned int
+rest_of_handle_ee (void)
+{
+  n_regs = max_reg_num ();
+
+  init_pass ();
+  remove_redundant_extensions ();
+  finish_pass ();
+  return 0;
+}
+
+/* Run ee pass when flag_ee is set at optimization level > 0.  */
+
+static bool
+gate_handle_ee (void)
+{
+  return (optimize > 0 && flag_ee);
+}
+
+struct rtl_opt_pass pass_ee =
+{
+ {
+  RTL_PASS,
+  "ee",                                 /* name */
+  gate_handle_ee,                       /* gate */
+  rest_of_handle_ee,                    /* execute */
+  NULL,                                 /* sub */
+  NULL,                                 /* next */
+  0,                                    /* static_pass_number */
+  TV_EE,                                /* tv_id */
+  0,                                    /* properties_required */
+  0,                                    /* properties_provided */
+  0,                                    /* properties_destroyed */
+  0,                                    /* todo_flags_start */
+  TODO_ggc_collect |
+  TODO_verify_rtl_sharing,              /* todo_flags_finish */
+ }
+};
Index: gcc/common.opt
===================================================================
--- gcc/common.opt (revision 189409)
+++ gcc/common.opt (working copy)
@@ -1067,6 +1067,10 @@ feliminate-dwarf2-dups
 Common Report Var(flag_eliminate_dwarf2_dups)
 Perform DWARF2 duplicate elimination
 
+fextension-elimination
+Common Report Var(flag_ee) Init(0) Optimization
+Perform extension elimination
+
 fipa-sra
 Common Report Var(flag_ipa_sra) Init(0) Optimization
 Perform interprocedural reduction of aggregates
Index: gcc/Makefile.in
===================================================================
--- gcc/Makefile.in (revision 189409)
+++ gcc/Makefile.in (working copy)
@@ -1218,6 +1218,7 @@ OBJS = \
 	dwarf2asm.o \
 	dwarf2cfi.o \
 	dwarf2out.o \
+	ee.o \
 	ebitmap.o \
 	emit-rtl.o \
 	et-forest.o \
@@ -2971,6 +2972,12 @@ cprop.o : cprop.c $(CONFIG_H) $(SYSTEM_H
    $(TM_P_H) $(PARAMS_H) cselib.h $(EXCEPT_H) $(TREE_H) $(TIMEVAR_H) \
    intl.h $(OBSTACK_H) $(TREE_PASS_H) $(DF_H) $(DBGCNT_H) $(TARGET_H) \
    $(DF_H) $(CFGLOOP_H)
+ee.o : ee.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) \
+   hard-reg-set.h $(FLAGS_H) $(BASIC_BLOCK_H) $(FUNCTION_H) output.h \
+   $(DF_H) $(TIMEVAR_H) tree-pass.h $(RECOG_H) $(EXPR_H) \
+   $(REGS_H) $(TREE_H) $(TM_P_H) insn-config.h $(INSN_ATTR_H) $(TOPLEV_H) \
+   $(DIAGNOSTIC_CORE_H) $(TARGET_H) $(OPTABS_H) insn-codes.h rtlhooks-def.h \
+   $(PARAMS_H) $(CGRAPH_H)
 gcse.o : gcse.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) \
    $(REGS_H) hard-reg-set.h $(FLAGS_H) insn-config.h $(GGC_H) \
    $(RECOG_H) $(EXPR_H) $(BASIC_BLOCK_H) $(FUNCTION_H) toplev.h $(DIAGNOSTIC_CORE_H) \
Index: gcc/passes.c
===================================================================
--- gcc/passes.c (revision 189409)
+++ gcc/passes.c (working copy)
@@ -1552,6 +1552,7 @@ init_optimization_passes (void)
       NEXT_PASS (pass_initialize_regs);
       NEXT_PASS (pass_ud_rtl_dce);
       NEXT_PASS (pass_combine);
+      NEXT_PASS (pass_ee);
       NEXT_PASS (pass_if_after_combine);
       NEXT_PASS (pass_partition_blocks);
       NEXT_PASS (pass_regmove);

* Re: new sign/zero extension elimination pass
  2012-07-11 10:31             ` Tom de Vries
@ 2012-07-11 11:42               ` Jakub Jelinek
  2012-07-11 13:01                 ` Tom de Vries
  2012-07-12  1:52               ` Kenneth Zadeck
                                 ` (2 subsequent siblings)
  3 siblings, 1 reply; 43+ messages in thread
From: Jakub Jelinek @ 2012-07-11 11:42 UTC (permalink / raw)
  To: Tom de Vries
  Cc: Eric Botcazou, Tom de Vries, Paolo Bonzini, gcc-patches, Bernd Schmidt

On Wed, Jul 11, 2012 at 12:30:12PM +0200, Tom de Vries wrote:
> I've done the following:
> - refactored the pass such that it now scans at most twice over all
>   instructions.
> - updated the patch to be applicable to current trunk
> - updated the motivating example to a more applicable one (as discussed in
>   this thread), and added that one as test-case.
> - added a part in the header comment illustrating the working of the pass
>   on the motivating example.
> 
> bootstrapped and reg-tested on x86_64 and i686.
> 
> build and reg-tested on mips, mips64, and arm.

How does this relate to the ree.c pass we already have?
Why is REE not sufficient for you?  Having two different zero/sign extension
elimination passes would be really weird.

	Jakub

* Re: new sign/zero extension elimination pass
  2012-07-11 11:42               ` Jakub Jelinek
@ 2012-07-11 13:01                 ` Tom de Vries
  0 siblings, 0 replies; 43+ messages in thread
From: Tom de Vries @ 2012-07-11 13:01 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Tom de Vries, Eric Botcazou, Tom de Vries, Paolo Bonzini,
	gcc-patches, Bernd Schmidt

On 11/07/12 13:41, Jakub Jelinek wrote:
> On Wed, Jul 11, 2012 at 12:30:12PM +0200, Tom de Vries wrote:
>> I've done the following:
>> - refactored the pass such that it now scans at most twice over all
>>   instructions.
>> - updated the patch to be applicable to current trunk
>> - updated the motivating example to a more applicable one (as discussed in
>>   this thread), and added that one as test-case.
>> - added a part in the header comment illustrating the working of the pass
>>   on the motivating example.
>>
>> bootstrapped and reg-tested on x86_64 and i686.
>>
>> build and reg-tested on mips, mips64, and arm.
> 
> How does this relate to the ree.c pass we already have?
> Why is REE not sufficient for you?  Having two different zero/sign extension
> elimination passes would be really weird.
> 

Jakub,

pass_ree eliminates extensions by merging an extension with the definitions of
the register it extends. So pass_ree can perhaps be described as an inter-bb
combiner that is targeted at extensions.

AFAIU there is no analysis in pass_ree that concludes that an extension is
redundant. If it manages to combine an extension with all the insns feeding into
the extension, the extension has been made redundant.
Both redundant and non-redundant extensions can be eliminated by pass_ree.

pass_ee does an analysis of what parts of registers are used, and concludes
based on that analysis that an extension is redundant, meaning it can be
replaced by a regcopy without changing the semantics of the program.
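
As a rough illustration of that analysis (a simplified sketch in plain C, not
the code in ee.c; the struct and function names are made up for this
illustration), an extension is treated as redundant when no use of its
destination reads more bits than the extension preserves:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model of a use: register REGNO is read up to
   and including bit SIZE - 1.  */
struct reg_use
{
  int regno;
  int size;
};

/* Return the size in bits of the biggest use of REGNO in USES.  */
static int
biggest_use_of (const struct reg_use *uses, int n_uses, int regno)
{
  int i, biggest = 0;

  for (i = 0; i < n_uses; i++)
    if (uses[i].regno == regno && uses[i].size > biggest)
      biggest = uses[i].size;

  return biggest;
}

/* An extension defining DEST_REGNO that preserves the low PRESERVED_SIZE
   bits of its source is redundant if no use reads beyond those bits.  */
static bool
extension_redundant_p (const struct reg_use *uses, int n_uses,
                       int dest_regno, int preserved_size)
{
  return biggest_use_of (uses, n_uses, dest_regno) <= preserved_size;
}

/* Example: reg 199 is read only through a QImode subreg (8 bits), while
   the made-up reg 300 is read in full SImode (32 bits).  */
static void
example_biggest_use (void)
{
  static const struct reg_use uses[] = { { 199, 8 }, { 300, 32 } };

  assert (extension_redundant_p (uses, 2, 199, 16));
  assert (!extension_redundant_p (uses, 2, 300, 16));
}
```

The real pass also has to deal with subreg offsets, promoted subregs and
implicitly used regs, all of which this sketch ignores.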

We currently describe pass_ree as 'redundant extension elimination',
and pass_ee as 'extension elimination'.
Perhaps 'redundant extension elimination' is a more appropriate name for pass_ee
and pass_ree is better described as 'extension combiner'.

In the motivating example of the pass, there are 2 extensions. Those extensions
cannot be combined into the insns feeding into them (or into the insns they feed
into). They are redundant however, something which is analyzed by pass_ee. It
replaces the 2 extensions with regcopies, and in the resulting assembly the 2
redundant extensions are removed.
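
The reason the regcopies are safe here can be checked with a small
stand-alone C sketch (an illustration only, with made-up function names, not
part of the patch): for plus and minus, the low 16 bits of the result depend
only on the low 16 bits of the operands, so zero-extending the operands first
makes no difference to what the HImode stores write.

```c
#include <assert.h>
#include <stdint.h>

/* Compute a + b (or a - b if SUBTRACT) in 32 bits and return only the low
   16 bits, as the HImode stores do.  If EXTEND, first zero-extend the low
   halves of A and B, as the zero_extend insns do; otherwise use A and B
   unchanged, as plain regcopies would.  */
static uint16_t
low_half_result (uint32_t a, uint32_t b, int extend, int subtract)
{
  uint32_t x = extend ? (a & 0xffffu) : a;
  uint32_t y = extend ? (b & 0xffffu) : b;
  uint32_t r = subtract ? x - y : x + y;

  return (uint16_t) r;
}

/* Check that the extensions make no difference to the stored values.  */
static void
check_low_half (uint32_t a, uint32_t b)
{
  assert (low_half_result (a, b, 1, 0) == low_half_result (a, b, 0, 0));
  assert (low_half_result (a, b, 1, 1) == low_half_result (a, b, 0, 1));
}
```

The same would not hold for an operation such as a right shift, where
higher bits of the operand end up in the low half of the result.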

Thanks,
- Tom

> 	Jakub
> 


* Re: new sign/zero extension elimination pass
  2012-07-11 10:31             ` Tom de Vries
  2012-07-11 11:42               ` Jakub Jelinek
@ 2012-07-12  1:52               ` Kenneth Zadeck
       [not found]               ` <4FFE2ADF.2060806@naturalbridge.com>
  2012-08-20 13:40               ` Tom de Vries
  3 siblings, 0 replies; 43+ messages in thread
From: Kenneth Zadeck @ 2012-07-12  1:52 UTC (permalink / raw)
  To: Tom de Vries; +Cc: Eric Botcazou, tom, gcc-patches, Paolo Bonzini

Tom,

I have a problem with the approach that you have taken here.  I believe
that this could be a very useful addition to gcc, so I am in general very
supportive, but I think you are missing an important case.

My problem is that the pass does not actually look at the target and
make any decisions based on that target.

For instance, we have an LLP64 target.  As with many targets, it has a
rich set of compare and branch instructions.  In particular, it can do
both 32 and 64 bit comparisons.  We see that many of the upstream
optimizations that take int (SI mode) index variables generate extension
operations before doing 64 bit compare and branch instructions, even
though there are 32 bit compare and branch instructions on the machine.
There are a lot of machines that can do more than one size of comparison.

This optimization pass, as it is currently written, will not remove those
extensions because it believes that the length of the destination is the
"final answer" unless it is wrapped in an explicit truncation.  Instead,
it needs to ask the port whether there is a shorter compare and branch
instruction that does not cost more.  In that case, those instructions
should be rewritten to use the shorter compare and branch.

There are many operations other than compare and branch where the pass
should be asking "can I shorten the target for free and therefore get
rid of the extension?"  Right shifts, rotates, and stores are not in
this class, but left shifts are, as are all comparisons, compare and
branches, and conditional moves.  There may even be machines that have
this for divide, but I do not know of any off the top of my head.
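
A minimal sketch of such a check, in plain C, might look as follows.  All
names here are hypothetical: the operation kinds are not GCC rtx codes, and
the cost callback only stands in for whatever query the port would actually
answer.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical operation kinds; these are not GCC rtx codes.  */
enum op_kind
{
  OP_LEFT_SHIFT,
  OP_RIGHT_SHIFT,
  OP_ROTATE,
  OP_STORE,
  OP_COMPARE,
  OP_COMPARE_AND_BRANCH,
  OP_CONDITIONAL_MOVE
};

/* Hypothetical stand-in for asking the port: return the cost of OP at WIDTH
   bits, or -1 if the target has no such instruction.  */
typedef int (*op_cost_fn) (enum op_kind op, int width);

/* Operations whose operands could in principle be narrowed, following the
   classification in the paragraph above.  */
static bool
narrowing_candidate_p (enum op_kind op)
{
  switch (op)
    {
    case OP_LEFT_SHIFT:
    case OP_COMPARE:
    case OP_COMPARE_AND_BRANCH:
    case OP_CONDITIONAL_MOVE:
      return true;
    default:
      return false;
    }
}

/* Return true if OP can be done at NARROW bits instead of WIDE bits without
   costing more, according to COST.  */
static bool
can_narrow_for_free_p (enum op_kind op, int wide, int narrow, op_cost_fn cost)
{
  int narrow_cost;

  if (!narrowing_candidate_p (op))
    return false;

  narrow_cost = cost (op, narrow);
  return narrow_cost >= 0 && narrow_cost <= cost (op, wide);
}

/* Made-up target with equally cheap 32 and 64 bit forms of every op.  */
static int
example_cost (enum op_kind op, int width)
{
  (void) op;
  return (width == 32 || width == 64) ? 1 : -1;
}

static void
example_narrowing (void)
{
  assert (can_narrow_for_free_p (OP_COMPARE_AND_BRANCH, 64, 32, example_cost));
  assert (!can_narrow_for_free_p (OP_RIGHT_SHIFT, 64, 32, example_cost));
  assert (!can_narrow_for_free_p (OP_COMPARE, 64, 16, example_cost));
}
```

On this made-up target a 64 bit compare and branch could be replaced by
the 32 bit form, after which the extension feeding it would no longer be
needed.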

What I am suggesting moves this pass into the target specific set of
optimizations rather than the target independent set, but given where
this pass is to be put, that is completely appropriate.  Any dest
instruction where all of the operands have been extended should be
checked to see if it was really necessary to use the longer form before
doing the propagation pass.

kenny


On 07/11/2012 06:30 AM, Tom de Vries wrote:
> On 13/11/10 10:50, Eric Botcazou wrote:
>>> I profiled the pass on spec2000:
>>>
>>>                     -mabi=32     -mabi=64
>>> ee-pass (usr time):     0.70         1.16
>>> total   (usr time):   919.30       879.26
>>> ee-pass        (%):     0.08         0.13
>>>
>>> The pass takes 0.13% or less of the total usr runtime.
>> For how many hits?  What are the numbers with --param ee-max-propagate=0?
>>
>>> Is it necessary to improve the runtime of this pass?
>> I've already given my opinion about the implementation.  The other passes in
>> the compiler try hard not to rescan everything when a single bit changes; as
>> currently written, yours doesn't.
>>
> Eric,
>
> I've done the following:
> - refactored the pass such that it now scans at most twice over all
>   instructions.
> - updated the patch to be applicable to current trunk
> - updated the motivating example to a more applicable one (as discussed in
>   this thread), and added that one as test-case.
> - added a part in the header comment illustrating the working of the pass
>   on the motivating example.
>
> bootstrapped and reg-tested on x86_64 and i686.
>
> build and reg-tested on mips, mips64, and arm.
>
> OK for trunk?
>
> Thanks,
> - Tom
>
> 2012-07-10  Tom de Vries  <tom@codesourcery.com>
>
> 	* ee.c: New file.
> 	* tree-pass.h (pass_ee): Declare.
> 	* opts.c (default_options_table): Set flag_ee at -O2.
> 	* timevar.def (TV_EE): New timevar.
> 	* common.opt (fextension-elimination): New option.
> 	* Makefile.in (ee.o): New rule.
> 	* passes.c (pass_ee): Add it.
>
> 	* gcc.dg/extend-1.c: New test.
> 	* gcc.dg/extend-2.c: Same.
> 	* gcc.dg/extend-2-64.c: Same.
> 	* gcc.dg/extend-3.c: Same.
> 	* gcc.dg/extend-4.c: Same.
> 	* gcc.dg/extend-5.c: Same.
> 	* gcc.target/mips/octeon-bbit-2.c: Make test more robust.
> Index: gcc/tree-pass.h
> ===================================================================
> --- gcc/tree-pass.h (revision 189409)
> +++ gcc/tree-pass.h (working copy)
> @@ -483,6 +483,7 @@ extern struct gimple_opt_pass pass_fixup
>
> extern struct rtl_opt_pass pass_expand;
> extern struct rtl_opt_pass pass_instantiate_virtual_regs;
> +extern struct rtl_opt_pass pass_ee;
> extern struct rtl_opt_pass pass_rtl_fwprop;
> extern struct rtl_opt_pass pass_rtl_fwprop_addr;
> extern struct rtl_opt_pass pass_jump;
> Index: gcc/testsuite/gcc.target/mips/octeon-bbit-2.c
> ===================================================================
> --- gcc/testsuite/gcc.target/mips/octeon-bbit-2.c (revision 189409)
> +++ gcc/testsuite/gcc.target/mips/octeon-bbit-2.c (working copy)
> @@ -5,19 +5,19 @@
> /* { dg-final { scan-assembler "\tbnel\t" } } */
> /* { dg-final { scan-assembler-not "\tbne\t" } } */
>
> -NOMIPS16 int
> -f (int n, int i)
> +NOMIPS16 long int
> +f (long int n, long int i)
> {
> -  int s = 0;
> +  long int s = 0;
>    for (; i & 1; i++)
>      s += i;
>    return s;
> }
>
> -NOMIPS16 int
> -g (int n, int i)
> +NOMIPS16 long int
> +g (long int n, long int i)
> {
> -  int s = 0;
> +  long int s = 0;
>    for (i = 0; i < n; i++)
>      s += i;
>    return s;
> Index: gcc/testsuite/gcc.dg/extend-4.c
> ===================================================================
> --- /dev/null (new file)
> +++ gcc/testsuite/gcc.dg/extend-4.c (revision 0)
> @@ -0,0 +1,16 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-rtl-ee" } */
> +
> +unsigned char f(unsigned int a, int c)
> +{
> +  unsigned int b = a;
> +  if (c)
> +    b = a & 0x10ff;
> +  return b;
> +}
> +
> +/* { dg-final { scan-rtl-dump-times "_extend:" 1 "ee" { target mips*-*-* } } } */
> +/* { dg-final { scan-rtl-dump-times "and:" 0 "ee" { target mips*-*-* } } } */
> +/* { dg-final { scan-rtl-dump "redundant extension \[0-9\]+ removed" "ee" { target mips*-*-* } } } */
> +/* { dg-final { cleanup-rtl-dump "ee" } } */
> +
> Index: gcc/testsuite/gcc.dg/extend-1.c
> ===================================================================
> --- /dev/null (new file)
> +++ gcc/testsuite/gcc.dg/extend-1.c (revision 0)
> @@ -0,0 +1,13 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-rtl-ee" } */
> +
> +void f(unsigned char * p, short s, int c, int *z)
> +{
> +  if (c)
> +    *z = 0;
> +  *p ^= (unsigned char)s;
> +}
> +
> +/* { dg-final { scan-rtl-dump-times "sign_extend:" 0 "ee" { target mips*-*-* } } } */
> +/* { dg-final { scan-rtl-dump-times "redundant extension \[0-9\]+ replaced" 1 "ee" { target mips*-*-* } } } */
> +/* { dg-final { cleanup-rtl-dump "ee" } } */
> Index: gcc/testsuite/gcc.dg/extend-5.c
> ===================================================================
> --- /dev/null (new file)
> +++ gcc/testsuite/gcc.dg/extend-5.c (revision 0)
> @@ -0,0 +1,13 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-rtl-ee" } */
> +
> +void f (short d[2][2])
> +{
> +  int d0 = d[0][0] + d[0][1];
> +  int d1 = d[1][0] + d[1][1];
> +  d[0][0] = d0 + d1;
> +  d[0][1] = d0 - d1;
> +}
> +
> +/* { dg-final { scan-rtl-dump-times "redundant extension \[0-9\]+ replaced" 2 "ee" { target mips*-*-* } } } */
> +/* { dg-final { cleanup-rtl-dump "ee" } } */
> Index: gcc/testsuite/gcc.dg/extend-2.c
> ===================================================================
> --- /dev/null (new file)
> +++ gcc/testsuite/gcc.dg/extend-2.c (revision 0)
> @@ -0,0 +1,20 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-rtl-ee" } */
> +/* { dg-require-effective-target ilp32 } */
> +
> +void f(unsigned char * p, short *s, int c)
> +{
> +  short or = 0;
> +  while (c)
> +    {
> +      or = or | s[c];
> +      c --;
> +    }
> +  *p = (unsigned char)or;
> +}
> +
> +/* { dg-final { scan-rtl-dump-times "zero_extend" 0 "ee" { target mips*-*-* } } } */
> +/* { dg-final { scan-rtl-dump-times "sign_extend" 0 "ee" { target mips*-*-* } } } */
> +/* { dg-final { scan-rtl-dump-times "redundant extension \[0-9\]+ replaced" 2 "ee" { target mips*-*-* } } } */
> +/* { dg-final { cleanup-rtl-dump "ee" } } */
> +
> Index: gcc/testsuite/gcc.dg/extend-2-64.c
> ===================================================================
> --- /dev/null (new file)
> +++ gcc/testsuite/gcc.dg/extend-2-64.c (revision 0)
> @@ -0,0 +1,20 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-rtl-ee -mabi=64" } */
> +/* { dg-require-effective-target mips64 } */
> +
> +void f(unsigned char * p, short *s, int c)
> +{
> +  short or = 0;
> +  while (c)
> +    {
> +      or = or | s[c];
> +      c --;
> +    }
> +  *p = (unsigned char)or;
> +}
> +
> +/* { dg-final { scan-rtl-dump-times "zero_extend:" 0 "ee" { target mips*-*-* } } } */
> +/* { dg-final { scan-rtl-dump-times "sign_extend:" 1 "ee" { target mips*-*-* } } } */
> +/* { dg-final { scan-rtl-dump-times "redundant extension \[0-9\]+ replaced" 2 "ee" { target mips*-*-* } } } */
> +/* { dg-final { cleanup-rtl-dump "ee" } } */
> +
> Index: gcc/testsuite/gcc.dg/extend-3.c
> ===================================================================
> --- /dev/null (new file)
> +++ gcc/testsuite/gcc.dg/extend-3.c (revision 0)
> @@ -0,0 +1,13 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-rtl-ee -mabi=64" } */
> +/* { dg-require-effective-target mips64 } */
> +
> +unsigned int f(unsigned char byte)
> +{
> +  return byte << 25;
> +}
> +
> +/* { dg-final { scan-rtl-dump-times "zero_extend:" 0 "ee" { target mips*-*-* } } } */
> +/* { dg-final { scan-rtl-dump "redundant extension \[0-9\]+ replaced" "ee" } } */
> +/* { dg-final { cleanup-rtl-dump "ee" } } */
> +
> Index: gcc/opts.c
> ===================================================================
> --- gcc/opts.c (revision 189409)
> +++ gcc/opts.c (working copy)
> @@ -490,6 +490,7 @@ static const struct default_options defa
>      { OPT_LEVELS_2_PLUS, OPT_ftree_tail_merge, NULL, 1 },
>      { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_foptimize_strlen, NULL, 1 },
>      { OPT_LEVELS_2_PLUS, OPT_fhoist_adjacent_loads, NULL, 1 },
> +    { OPT_LEVELS_2_PLUS, OPT_fextension_elimination, NULL, 1 },
>
>      /* -O3 optimizations.  */
>      { OPT_LEVELS_3_PLUS, OPT_ftree_loop_distribute_patterns, NULL, 1 },
> Index: gcc/timevar.def
> ===================================================================
> --- gcc/timevar.def (revision 189409)
> +++ gcc/timevar.def (working copy)
> @@ -201,6 +201,7 @@ DEFTIMEVAR (TV_POST_EXPAND	     , "post
> DEFTIMEVAR (TV_VARCONST              , "varconst")
> DEFTIMEVAR (TV_LOWER_SUBREG	     , "lower subreg")
> DEFTIMEVAR (TV_JUMP                  , "jump")
> +DEFTIMEVAR (TV_EE                    , "extension elimination")
> DEFTIMEVAR (TV_FWPROP                , "forward prop")
> DEFTIMEVAR (TV_CSE                   , "CSE")
> DEFTIMEVAR (TV_DCE                   , "dead code elimination")
> Index: gcc/ee.c
> ===================================================================
> --- /dev/null (new file)
> +++ gcc/ee.c (revision 0)
> @@ -0,0 +1,1190 @@
> +/* Redundant extension elimination.
> +   Copyright (C) 2010, 2011, 2012 Free Software Foundation, Inc.
> +   Contributed by Tom de Vries (tom@codesourcery.com)
> +
> +This file is part of GCC.
> +
> +GCC is free software; you can redistribute it and/or modify it under
> +the terms of the GNU General Public License as published by the Free
> +Software Foundation; either version 3, or (at your option) any later
> +version.
> +
> +GCC is distributed in the hope that it will be useful, but WITHOUT ANY
> +WARRANTY; without even the implied warranty of MERCHANTABILITY or
> +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
> +for more details.
> +
> +You should have received a copy of the GNU General Public License
> +along with GCC; see the file COPYING3.  If not see
> +<http://www.gnu.org/licenses/>.  */
> +
> +/*
> +
> +  MOTIVATING EXAMPLE
> +
> +  The motivating example for this pass is the example from PR 40893:
> +
> +    void f (short d[2][2])
> +    {
> +      int d0 = d[0][0] + d[0][1];
> +      int d1 = d[1][0] + d[1][1];
> +      d[0][0] = d0 + d1;
> +      d[0][1] = d0 - d1;
> +    }
> +
> +  For MIPS, compilation results in the following insns.
> +
> +    (set (reg:SI 204)
> +         (zero_extend:SI (subreg:HI (reg:SI 213) 2)))
> +
> +    (set (reg:SI 205)
> +         (zero_extend:SI (subreg:HI (reg:SI 216 [ d1 ]) 2)))
> +
> +    (set (reg:SI 217)
> +         (plus:SI (reg:SI 205)
> +                  (reg:SI 204)))
> +
> +    (set (reg:SI 218)
> +         (minus:SI (reg:SI 204)
> +                   (reg:SI 205)))
> +
> +    (set (mem:HI (reg/v/f:SI 210))
> +         (subreg:HI (reg:SI 217) 2))
> +
> +    (set (mem:HI (plus:SI (reg/v/f:SI 210)
> +                 (const_int 2 [0x2])))
> +         (subreg:HI (reg:SI 218) 2))
> +
> +
> +  The stores only use the lower halves of pseudos 217 and 218, and are their
> +  only uses.  And the plus and minus operators belong to the class of
> +  operators where a bit in the result is only influenced by same-or-less
> +  significant bits in the operands, so the plus and minus insns only use the
> +  lower halves of pseudos 204 and 205.  Those are also the only uses of pseudos
> +  204 and 205, so the zero_extends are redundant.
> +
> +
> +  INTENDED EFFECT
> +
> +  This pass works by removing sign/zero-extensions, or replacing them with
> +  regcopies.  The idea there is that the regcopy might be eliminated by a later
> +  pass.  In case the regcopy cannot be eliminated, it might at least be cheaper
> +  than the extension.
> +
> +
> +  IMPLEMENTATION
> +
> +  The pass scans at most two times over all instructions.
> +
> +  The first scan collects all extensions.  If there are no extensions, we're
> +  done.
> +
> +  The second scan registers all uses of a reg in the biggest_use array.
> +  Additionally, it registers how the use size of a pseudo is propagated to the
> +  operands of the insns defining the pseudo.
> +
> +  The biggest_use array now contains the size in bits of the biggest use
> +  of each reg, which allows us to find redundant extensions.
> +
> +  If there are still non-redundant extensions left, we use the propagation
> +  information in an iterative fashion to improve the biggest_use array, after
> +  which we may find more redundant extensions.
> +
> +  Finally, redundant extensions are deleted or replaced.
> +
> +  In case that the src and dest reg of the replacement are not of the same size,
> +  we do not replace with a normal regcopy, but with a truncate or with the copy
> +  of a paradoxical subreg instead.
> +
> +
> +  ILLUSTRATION OF PASS
> +
> +  The dump of the pass shows us how the pass works on the motivating example.
> +
> +  We find the 2 extensions:
> +    found extension with preserved size 16 defining reg 204
> +    found extension with preserved size 16 defining reg 205
> +
> +  We calculate the biggest use of each register:
> +    biggest_use
> +    reg 204: size 32
> +    reg 205: size 32
> +    reg 217: size 16
> +    reg 218: size 16
> +
> +  We propagate the biggest uses where possible:
> +    propagations
> +    205: 32 -> 16
> +    204: 32 -> 16
> +    214: 32 -> 16
> +    215: 32 -> 16
> +
> +  We conclude that the extensions are redundant:
> +    found redundant extension with preserved size 16 defining reg 205
> +    found redundant extension with preserved size 16 defining reg 204
> +
> +  And we replace them with regcopies:
> +    (set (reg:SI 204)
> +        (reg:SI 213))
> +
> +    (set (reg:SI 205)
> +        (reg:SI 216))
> +
> +
> +  LIMITATIONS
> +
> +  The scope of the analysis is limited to an extension and its uses.  The other
> +  type of analysis (related to the defs of the operand of an extension) is not
> +  done.
> +
> +  Furthermore, we do the analysis of biggest use per reg.  So when determining
> +  whether an extension is redundant, we take all uses of a dest reg into
> +  account, also the ones that are not uses of the extension.
> +  The consideration is that using use-def chains will give a more precise
> +  analysis, but is much more expensive in terms of runtime.  */
> +
> +#include "config.h"
> +#include "system.h"
> +#include "coretypes.h"
> +#include "tm.h"
> +#include "rtl.h"
> +#include "tree.h"
> +#include "tm_p.h"
> +#include "flags.h"
> +#include "regs.h"
> +#include "hard-reg-set.h"
> +#include "basic-block.h"
> +#include "insn-config.h"
> +#include "function.h"
> +#include "expr.h"
> +#include "insn-attr.h"
> +#include "recog.h"
> +#include "toplev.h"
> +#include "target.h"
> +#include "timevar.h"
> +#include "optabs.h"
> +#include "insn-codes.h"
> +#include "rtlhooks-def.h"
> +#include "output.h"
> +#include "params.h"
> +#include "timevar.h"
> +#include "tree-pass.h"
> +#include "cgraph.h"
> +#include "vec.h"
> +
> +#define SKIP_REG (-1)
> +#define NONE (-1)
> +
> +/* Number of registers at start of pass.  */
> +
> +static int n_regs;
> +
> +/* Array to register the biggest use of a reg, in bits.  */
> +
> +static int *biggest_use;
> +
> +/* Array to register the promoted subregs.  */
> +
> +static VEC (rtx,heap) **promoted_subreg;
> +
> +/* Array to register for a reg what the last propagated size is.  */
> +
> +static int *propagated_size;
> +
> +typedef struct use
> +{
> +  int regno;
> +  int size;
> +  int offset;
> +  rtx *use;
> +} use_type;
> +
> +DEF_VEC_O(use_type);
> +DEF_VEC_ALLOC_O(use_type,heap);
> +
> +/* Vector to register the uses.  */
> +
> +static VEC (use_type,heap) **uses;
> +
> +typedef struct prop
> +{
> +  rtx set;
> +  int uses_regno;
> +  int uses_index;
> +} prop_type;
> +
> +DEF_VEC_O(prop_type);
> +DEF_VEC_ALLOC_O(prop_type,heap);
> +
> +/* Vector to register the propagations.  */
> +
> +static VEC (prop_type,heap) **props;
> +
> +/* Work list for propagation.  */
> +
> +static VEC (int,heap) *wl;
> +
> +/* Array to register what regs are in the work list.  */
> +
> +static bool *in_wl;
> +
> +/* Vector that contains the extensions in the function.  */
> +
> +static VEC (rtx,heap) *extensions;
> +
> +/* Vector that contains the extensions in the function that are going to be
> +   removed or replaced.  */
> +
> +static VEC (rtx,heap) *redundant_extensions;
> +
> +/* Forward declaration.  */
> +
> +static void note_use (rtx *x, void *data);
> +static bool skip_reg_p (int regno);
> +static void register_prop (rtx set, use_type *use);
> +
> +/* Check whether SUBREG is a promoted subreg.  */
> +
> +static bool
> +promoted_subreg_p (rtx subreg)
> +{
> +  return (GET_CODE (subreg) == SUBREG
> +	  && SUBREG_PROMOTED_VAR_P (subreg));
> +}
> +
> +/* Check whether SUBREG is a promoted subreg for which we cannot reset the
> +   promotion.  */
> +
> +static bool
> +fixed_promoted_subreg_p (rtx subreg)
> +{
> +  int mre;
> +
> +  if (!promoted_subreg_p (subreg))
> +    return false;
> +
> +  mre = targetm.mode_rep_extended (GET_MODE (subreg),
> +				   GET_MODE (SUBREG_REG (subreg)));
> +  return mre != UNKNOWN;
> +}
> +
> +/* Attempt to return the size, reg number and offset of USE in SIZE, REGNO and
> +   OFFSET.  Return true if successful.  */
> +
> +static bool
> +reg_use_p (rtx use, int *size, unsigned int *regno, int *offset)
> +{
> +  rtx reg;
> +
> +  if (REG_P (use))
> +    {
> +      *regno = REGNO (use);
> +      *offset = 0;
> +      *size = GET_MODE_BITSIZE (GET_MODE (use));
> +      return true;
> +    }
> +  else if (GET_CODE (use) == SUBREG)
> +    {
> +      reg = SUBREG_REG (use);
> +
> +      if (!REG_P (reg))
> +	return false;
> +
> +      *regno = REGNO (reg);
> +
> +      if (paradoxical_subreg_p (use) || fixed_promoted_subreg_p (use))
> +	{
> +	  *offset = 0;
> +	  *size = GET_MODE_BITSIZE (GET_MODE (reg));
> +	}
> +      else
> +	{
> +	  *offset = subreg_lsb (use);
> +	  *size = *offset + GET_MODE_BITSIZE (GET_MODE (use));
> +	}
> +
> +      return true;
> +    }
> +
> +  return false;
> +}
> +
> +/* Create a new empty entry in the uses[REGNO] vector.  */
> +
> +static use_type *
> +new_use (unsigned int regno)
> +{
> +  if (uses[regno] == NULL)
> +    uses[regno] = VEC_alloc (use_type, heap, 4);
> +
> +  VEC_safe_push (use_type, heap, uses[regno], NULL);
> +
> +  return VEC_last (use_type, uses[regno]);
> +}
> +
> +/* Register a USE of reg REGNO with SIZE and OFFSET.  */
> +
> +static use_type *
> +register_use (int size, unsigned int regno, int offset, rtx *use)
> +{
> +  int *current;
> +  use_type *p;
> +
> +  gcc_assert (size >= 0);
> +  gcc_assert (regno < (unsigned int)n_regs);
> +
> +  if (skip_reg_p (regno))
> +    return NULL;
> +
> +  p = new_use (regno);
> +  p->regno = regno;
> +  p->size = size;
> +  p->offset = offset;
> +  p->use = use;
> +
> +  /* Update the biggest use.  */
> +  current = &biggest_use[regno];
> +  *current = MAX (*current, size);
> +
> +  return p;
> +}
> +
> +/* Handle embedded uses in USE, which is a part of PATTERN.  */
> +
> +static void
> +note_embedded_uses (rtx use, rtx pattern)
> +{
> +  const char *format_ptr;
> +  int i, j;
> +
> +  format_ptr = GET_RTX_FORMAT (GET_CODE (use));
> +  for (i = 0; i < GET_RTX_LENGTH (GET_CODE (use)); i++)
> +    if (format_ptr[i] == 'e')
> +      note_use (&XEXP (use, i), pattern);
> +    else if (format_ptr[i] == 'E')
> +      for (j = 0; j < XVECLEN (use, i); j++)
> +	note_use (&XVECEXP (use, i, j), pattern);
> +}
> +
> +/* Get the set in PATTERN that has USE as its src operand.  */
> +
> +static rtx
> +get_set (rtx use, rtx pattern)
> +{
> +  rtx sub;
> +  int i;
> +
> +  if (GET_CODE (pattern) == SET && SET_SRC (pattern) == use)
> +    return pattern;
> +
> +  if (GET_CODE (pattern) == PARALLEL)
> +    for (i = 0; i < XVECLEN (pattern, 0); ++i)
> +      {
> +	sub = XVECEXP (pattern, 0, i);
> +	if (GET_CODE (sub) == SET && SET_SRC (sub) == use)
> +	  return sub;
> +      }
> +
> +  return NULL_RTX;
> +}
> +
> +/* Handle a restricted op USE with NR_OPERANDS.  USE is a part of SET, which is
> +   a part of PATTERN.  In this context restricted means that a bit in
> +   an operand influences only the same bit or more significant bits in the
> +   result.  The bitwise ops are a subclass, but PLUS is one as well.  */
> +
> +static void
> +note_restricted_op_use (rtx set, rtx use, unsigned int nr_operands, rtx pattern)
> +{
> +  unsigned int i, smallest;
> +  int operand_size[2];
> +  int operand_offset[2];
> +  int used_size;
> +  unsigned int operand_regno[2];
> +  bool operand_reg[2];
> +  bool operand_ignore[2];
> +  use_type *p;
> +
> +  /* Init operand_reg, operand_size, operand_regno and operand_ignore.  */
> +  for (i = 0; i < nr_operands; ++i)
> +    {
> +      operand_reg[i] = reg_use_p (XEXP (use, i), &operand_size[i],
> +				  &operand_regno[i], &operand_offset[i]);
> +      operand_ignore[i] = false;
> +    }
> +
> +  /* Handle case of reg and-masked with const.  */
> +  if (GET_CODE (use) == AND && CONST_INT_P (XEXP (use, 1)) && operand_reg[0])
> +    {
> +      used_size =
> +	HOST_BITS_PER_WIDE_INT - clz_hwi (UINTVAL (XEXP (use, 1)));
> +      operand_size[0] = MIN (operand_size[0], used_size);
> +    }
> +
> +  /* Handle case of reg or-masked with const.  */
> +  if (GET_CODE (use) == IOR && CONST_INT_P (XEXP (use, 1)) && operand_reg[0])
> +    {
> +      used_size =
> +	HOST_BITS_PER_WIDE_INT - clz_hwi (~UINTVAL (XEXP (use, 1)));
> +      operand_size[0] = MIN (operand_size[0], used_size);
> +    }
> +
> +  /* Ignore the use of a in 'a = a + b'.  */
> +  /* TODO: handle GET_CODE ((SET_DEST (set))) == SUBREG.  */
> +  if (set != NULL_RTX && REG_P (SET_DEST (set)))
> +    for (i = 0; i < nr_operands; ++i)
> +      operand_ignore[i] = (operand_reg[i]
> +			   && (REGNO (SET_DEST (set)) == operand_regno[i]));
> +
> +  /* Handle the case a reg is combined with don't care bits.  */
> +  if (nr_operands == 2 && operand_reg[0] && operand_reg[1]
> +      && operand_size[0] != operand_size[1])
> +    {
> +      smallest = operand_size[0] > operand_size[1];
> +
> +      if (paradoxical_subreg_p (XEXP (use, smallest)))
> +	operand_size[1 - smallest] = operand_size[smallest];
> +    }
> +
> +  /* Register the operand use, if necessary.  */
> +  for (i = 0; i < nr_operands; ++i)
> +    if (!operand_reg[i])
> +      note_use (&XEXP (use, i), pattern);
> +    else if (!operand_ignore[i])
> +      {
> +	p = register_use (operand_size[i], operand_regno[i], operand_offset[i],
> +			  &XEXP (use, i));
> +	register_prop (set, p);
> +      }
> +}
> +
> +/* Register promoted SUBREG in promoted_subreg.  */
> +
> +static void
> +register_promoted_subreg (rtx subreg)
> +{
> +  int index = REGNO (SUBREG_REG (subreg));
> +
> +  if (promoted_subreg[index] == NULL)
> +    promoted_subreg[index] = VEC_alloc (rtx, heap, 10);
> +
> +  VEC_safe_push (rtx, heap, promoted_subreg[index], subreg);
> +}
> +
> +/* Note promoted subregs in X.  */
> +
> +static int
> +note_promoted_subreg (rtx *x, void *y ATTRIBUTE_UNUSED)
> +{
> +  rtx subreg = *x;
> +
> +  if (promoted_subreg_p (subreg) && !fixed_promoted_subreg_p (subreg)
> +      && REG_P (SUBREG_REG (subreg)))
> +    register_promoted_subreg (subreg);
> +
> +  return 0;
> +}
> +
> +/* Handle use X in pattern DATA noted by note_uses.  */
> +
> +static void
> +note_use (rtx *x, void *data)
> +{
> +  rtx use = *x;
> +  rtx pattern = (rtx)data;
> +  int use_size, use_offset;
> +  unsigned int use_regno;
> +  rtx set;
> +  use_type *p;
> +
> +  for_each_rtx (x, note_promoted_subreg, NULL);
> +
> +  set = get_set (use, pattern);
> +
> +  switch (GET_CODE (use))
> +    {
> +    case REG:
> +    case SUBREG:
> +      if (!reg_use_p (use, &use_size, &use_regno, &use_offset))
> +	{
> +	  note_embedded_uses (use, pattern);
> +	  return;
> +	}
> +      p = register_use (use_size, use_regno, use_offset, x);
> +      register_prop (set, p);
> +      return;
> +    case SIGN_EXTEND:
> +    case ZERO_EXTEND:
> +      if (!reg_use_p (XEXP (use, 0), &use_size, &use_regno, &use_offset))
> +	{
> +	  note_embedded_uses (use, pattern);
> +	  return;
> +	}
> +      p = register_use (use_size, use_regno, use_offset, x);
> +      register_prop (set, p);
> +      return;
> +    case IOR:
> +    case AND:
> +    case XOR:
> +    case PLUS:
> +    case MINUS:
> +      note_restricted_op_use (set, use, 2, pattern);
> +      return;
> +    case NOT:
> +    case NEG:
> +      note_restricted_op_use (set, use, 1, pattern);
> +      return;
> +    case ASHIFT:
> +      if (!reg_use_p (XEXP (use, 0), &use_size, &use_regno, &use_offset)
> +	  || !CONST_INT_P (XEXP (use, 1))
> +	  || INTVAL (XEXP (use, 1)) <= 0
> +	  || paradoxical_subreg_p (XEXP (use, 0)))
> +	{
> +	  note_embedded_uses (use, pattern);
> +	  return;
> +	}
> +      (void)register_use (use_size - INTVAL (XEXP (use, 1)), use_regno,
> +			  use_offset, x);
> +      return;
> +    default:
> +      note_embedded_uses (use, pattern);
> +      return;
> +    }
> +}
> +
> +/* Check whether reg REGNO is implicitly used.  */
> +
> +static bool
> +implicit_use_p (int regno ATTRIBUTE_UNUSED)
> +{
> +#ifdef EPILOGUE_USES
> +  if (EPILOGUE_USES (regno))
> +    return true;
> +#endif
> +
> +#ifdef EH_USES
> +  if (EH_USES (regno))
> +    return true;
> +#endif
> +
> +  return false;
> +}
> +
> +/* Check whether reg REGNO should be skipped in analysis.  */
> +
> +static bool
> +skip_reg_p (int regno)
> +{
> +  /* TODO: handle hard registers.  The problem with hard registers is that
> +     a DI use of r0 can mean a 64bit use of r0 and a 32 bit use of r1.
> +     We don't handle that properly.  */
> +  return implicit_use_p (regno) || HARD_REGISTER_NUM_P (regno);
> +}
> +
> +/* Note the uses of argument registers in call INSN.  */
> +
> +static void
> +note_call_uses (rtx insn)
> +{
> +  rtx link, link_expr;
> +
> +  if (!CALL_P (insn))
> +    return;
> +
> +  for (link = CALL_INSN_FUNCTION_USAGE (insn); link; link = XEXP (link, 1))
> +    {
> +      link_expr = XEXP (link, 0);
> +
> +      if (GET_CODE (link_expr) == USE)
> +	note_use (&XEXP (link_expr, 0), link);
> +    }
> +}
> +
> +/* Dump the biggest uses found.  */
> +
> +static void
> +dump_biggest_use (void)
> +{
> +  int i;
> +
> +  if (!dump_file)
> +    return;
> +
> +  fprintf (dump_file, "biggest_use:\n");
> +
> +  for (i = 0; i < n_regs; i++)
> +    if (biggest_use[i] > 0)
> +      fprintf (dump_file, "reg %d: size %d\n", i, biggest_use[i]);
> +
> +  fprintf (dump_file, "\n");
> +}
> +
> +/* Calculate the biggest use mode for all regs.  */
> +
> +static void
> +calculate_biggest_use (void)
> +{
> +  basic_block bb;
> +  rtx insn;
> +
> +  /* For all insns, call note_use for each use in insn.  */
> +  FOR_EACH_BB (bb)
> +    FOR_BB_INSNS (bb, insn)
> +      {
> +	if (!NONDEBUG_INSN_P (insn))
> +	  continue;
> +
> +	note_uses (&PATTERN (insn), note_use, PATTERN (insn));
> +
> +	if (CALL_P (insn))
> +	  note_call_uses (insn);
> +      }
> +
> +  dump_biggest_use ();
> +}
> +
> +/* Register a propagation USE in SET in the props vector.  */
> +
> +static void
> +register_prop (rtx set, use_type *use)
> +{
> +  prop_type *p;
> +  int regno;
> +
> +  if (set == NULL_RTX || use == NULL)
> +    return;
> +
> +  if (!REG_P (SET_DEST (set)))
> +    return;
> +
> +  regno = REGNO (SET_DEST (set));
> +
> +  if (skip_reg_p (regno))
> +    return;
> +
> +  if (props[regno] == NULL)
> +    props[regno] = VEC_alloc (prop_type, heap, 4);
> +
> +  VEC_safe_push (prop_type, heap, props[regno], NULL);
> +  p = VEC_last (prop_type, props[regno]);
> +  p->set = set;
> +  p->uses_regno = use->regno;
> +  p->uses_index = VEC_length (use_type, uses[use->regno]) - 1;
> +}
> +
> +/* Add REGNO to the worklist.  */
> +
> +static void
> +add_to_wl (int regno)
> +{
> +  if (in_wl[regno])
> +    return;
> +
> +  if (biggest_use[regno] > 0
> +      && biggest_use[regno] == GET_MODE_BITSIZE (PSEUDO_REGNO_MODE (regno)))
> +    return;
> +
> +  if (VEC_empty (prop_type, props[regno]))
> +    return;
> +
> +  if (propagated_size[regno] != NONE
> +      && propagated_size[regno] == biggest_use[regno])
> +    return;
> +
> +  VEC_safe_push (int, heap, wl, regno);
> +  in_wl[regno] = true;
> +}
> +
> +/* Pop a reg from the worklist and return it.  */
> +
> +static int
> +pop_wl (void)
> +{
> +  int regno = VEC_pop (int, wl);
> +  in_wl[regno] = false;
> +  return regno;
> +}
> +
> +/* Propagate the use size DEST_SIZE of a reg to use P.  */
> +
> +static int
> +propagate_size (int dest_size, use_type *p)
> +{
> +  if (dest_size == 0)
> +    return 0;
> +
> +  return p->offset + MIN (p->size - p->offset, dest_size);
> +}
> +
> +/* Get the biggest use of REGNO from the uses vector.  */
> +
> +static int
> +get_biggest_use (unsigned int regno)
> +{
> +  int ix;
> +  use_type *p;
> +  int max = 0;
> +
> +  gcc_assert (uses[regno] != NULL);
> +
> +  FOR_EACH_VEC_ELT (use_type, uses[regno], ix, p)
> +    max = MAX (max, p->size);
> +
> +  return max;
> +}
> +
> +/* Propagate the use size DEST_SIZE of a reg to the uses in USE.  */
> +
> +static void
> +propagate_to_use (int dest_size, use_type *use)
> +{
> +  int new_use_size;
> +  int prev_biggest_use;
> +  int *current;
> +
> +  new_use_size = propagate_size (dest_size, use);
> +
> +  if (new_use_size >= use->size)
> +    return;
> +
> +  use->size = new_use_size;
> +
> +  current = &biggest_use[use->regno];
> +
> +  prev_biggest_use = *current;
> +  *current = get_biggest_use (use->regno);
> +
> +  if (*current >= prev_biggest_use)
> +    return;
> +
> +  add_to_wl (use->regno);
> +
> +  if (dump_file)
> +    fprintf (dump_file, "%d: %d -> %d\n", use->regno, prev_biggest_use,
> +	     *current);
> +
> +}
> +
> +/* Propagate the biggest use of a reg REGNO to all its uses, and note
> +   propagations in NR_PROPAGATIONS.  */
> +
> +static void
> +propagate_to_uses (int regno, int *nr_propagations)
> +{
> +  int ix;
> +  prop_type *p;
> +
> +  gcc_assert (!(propagated_size[regno] == NONE
> +		&& propagated_size[regno] == biggest_use[regno]));
> +
> +  FOR_EACH_VEC_ELT (prop_type, props[regno], ix, p)
> +    {
> +      use_type *use = VEC_index (use_type, uses[p->uses_regno], p->uses_index);
> +      propagate_to_use (biggest_use[regno], use);
> +      ++(*nr_propagations);
> +    }
> +
> +  propagated_size[regno] = biggest_use[regno];
> +}
> +
> +/* Improve biggest_use array iteratively.  */
> +
> +static void
> +propagate (void)
> +{
> +  int i;
> +  int nr_propagations = 0;
> +
> +  /* Initialize work list.  */
> +
> +  for (i = 0; i < n_regs; ++i)
> +    add_to_wl (i);
> +
> +  /* Work the work list.  */
> +
> +  if (dump_file)
> +    fprintf (dump_file, "propagations: \n");
> +  while (!VEC_empty (int, wl))
> +    propagate_to_uses (pop_wl (), &nr_propagations);
> +
> +  if (dump_file)
> +    fprintf (dump_file, "\nnr_propagations: %d\n\n", nr_propagations);
> +}
> +
> +/* Check whether this is a sign/zero extension.  */
> +
> +static bool
> +extension_p (rtx insn, rtx *dest, rtx *inner, int *preserved_size)
> +{
> +  rtx src, op0;
> +
> +  /* Detect set of reg.  */
> +  if (GET_CODE (PATTERN (insn)) != SET)
> +    return false;
> +
> +  src = SET_SRC (PATTERN (insn));
> +  *dest = SET_DEST (PATTERN (insn));
> +
> +  if (!REG_P (*dest))
> +    return false;
> +
> +  /* Detect sign or zero extension.  */
> +  if (GET_CODE (src) == ZERO_EXTEND || GET_CODE (src) == SIGN_EXTEND
> +      || (GET_CODE (src) == AND && CONST_INT_P (XEXP (src, 1))))
> +    {
> +      op0 = XEXP (src, 0);
> +
> +      /* Determine amount of least significant bits preserved by operation.  */
> +      if (GET_CODE (src) == AND)
> +	*preserved_size = ctz_hwi (~UINTVAL (XEXP (src, 1)));
> +      else
> +	*preserved_size = GET_MODE_BITSIZE (GET_MODE (op0));
> +
> +      if (GET_CODE (op0) == SUBREG)
> +	{
> +	  if (subreg_lsb (op0) != 0)
> +	    return false;
> +
> +	  *inner = SUBREG_REG (op0);
> +
> +	  if (GET_MODE_CLASS (GET_MODE (*dest))
> +	      != GET_MODE_CLASS (GET_MODE (*inner)))
> +	    return false;
> +
> +	  return true;
> +	}
> +      else if (REG_P (op0))
> +	{
> +	  *inner = op0;
> +
> +	  if (GET_MODE_CLASS (GET_MODE (*dest))
> +	      != GET_MODE_CLASS (GET_MODE (*inner)))
> +	    return false;
> +
> +	  return true;
> +	}
> +      else if (GET_CODE (op0) == TRUNCATE)
> +	{
> +	  *inner = XEXP (op0, 0);
> +	  return true;
> +	}
> +    }
> +
> +  return false;
> +}
> +
> +/* Find extensions and store them in the extensions vector.  */
> +
> +static bool
> +find_extensions (void)
> +{
> +  basic_block bb;
> +  rtx insn, dest, inner;
> +  int preserved_size;
> +
> +  /* For all insns, record the insn if it is an extension.  */
> +  FOR_EACH_BB (bb)
> +    FOR_BB_INSNS (bb, insn)
> +      {
> +	if (!NONDEBUG_INSN_P (insn))
> +	  continue;
> +
> +	if (!extension_p (insn, &dest, &inner, &preserved_size))
> +	  continue;
> +
> +	VEC_safe_push (rtx, heap, extensions, insn);
> +
> +	if (dump_file)
> +	  fprintf (dump_file,
> +		   "found extension %u with preserved size %d defining"
> +		   " reg %d\n",
> +		   INSN_UID (insn), preserved_size, REGNO (dest));
> +      }
> +
> +  if (dump_file)
> +    {
> +      if (!VEC_empty (rtx, extensions))
> +	fprintf (dump_file, "\n");
> +      else
> +	fprintf (dump_file, "no extensions found.\n");
> +    }
> +
> +  return !VEC_empty (rtx, extensions);
> +}
> +
> +/* Check whether this is a redundant sign/zero extension.  */
> +
> +static bool
> +redundant_extension_p (rtx insn, rtx *dest, rtx *inner, int *preserved_size)
> +{
> +  int biggest_dest_use;
> +
> +  if (!extension_p (insn, dest, inner, preserved_size))
> +    gcc_unreachable ();
> +
> +  biggest_dest_use = biggest_use[REGNO (*dest)];
> +
> +  if (biggest_dest_use == SKIP_REG)
> +    return false;
> +
> +  if (*preserved_size < biggest_dest_use)
> +    return false;
> +
> +  return true;
> +}
> +
> +/* Find the redundant extensions in the extensions vector and move them to the
> +   redundant_extensions vector.  */
> +
> +static void
> +find_redundant_extensions (void)
> +{
> +  rtx insn, dest, inner;
> +  int ix;
> +  bool found = false;
> +  int preserved_size;
> +
> +  FOR_EACH_VEC_ELT_REVERSE (rtx, extensions, ix, insn)
> +    if (redundant_extension_p (insn, &dest, &inner, &preserved_size))
> +      {
> +	VEC_safe_push (rtx, heap, redundant_extensions, insn);
> +	VEC_unordered_remove (rtx, extensions, ix);
> +
> +	if (dump_file)
> +	  fprintf (dump_file,
> +		   "found redundant extension %u with preserved size %d"
> +		   " defining reg %d\n",
> +		   INSN_UID (insn), preserved_size, REGNO (dest));
> +	found = true;
> +      }
> +
> +  if (dump_file && found)
> +    fprintf (dump_file, "\n");
> +}
> +
> +/* Reset promotion of subregs of REG.  */
> +
> +static void
> +reset_promoted_subreg (rtx reg)
> +{
> +  int ix;
> +  rtx subreg;
> +
> +  if (promoted_subreg[REGNO (reg)] == NULL)
> +    return;
> +
> +  FOR_EACH_VEC_ELT (rtx, promoted_subreg[REGNO (reg)], ix, subreg)
> +    {
> +      SUBREG_PROMOTED_UNSIGNED_SET (subreg, 0);
> +      SUBREG_PROMOTED_VAR_P (subreg) = 0;
> +    }
> +
> +  VEC_free (rtx, heap, promoted_subreg[REGNO (reg)]);
> +}
> +
> +/* Try to remove or replace the redundant extension INSN which extends INNER and
> +   writes to DEST.  */
> +
> +static void
> +try_remove_or_replace_extension (rtx insn, rtx dest, rtx inner)
> +{
> +  rtx cp_src, cp_dest, seq = NULL_RTX, one;
> +
> +  /* Check whether replacement is needed.  */
> +  if (dest != inner)
> +    {
> +      start_sequence ();
> +
> +      /* Determine the proper replacement operation.  */
> +      if (GET_MODE (dest) == GET_MODE (inner))
> +	{
> +	  cp_src = inner;
> +	  cp_dest = dest;
> +	}
> +      else if (GET_MODE_SIZE (GET_MODE (dest))
> +	       > GET_MODE_SIZE (GET_MODE (inner)))
> +	{
> +	  emit_clobber (dest);
> +	  cp_src = inner;
> +	  cp_dest = gen_lowpart_SUBREG (GET_MODE (inner), dest);
> +	}
> +      else
> +	{
> +	  cp_src = gen_lowpart_SUBREG (GET_MODE (dest), inner);
> +	  cp_dest = dest;
> +	}
> +
> +      emit_move_insn (cp_dest, cp_src);
> +
> +      seq = get_insns ();
> +      end_sequence ();
> +
> +      /* If the replacement is not supported, bail out.  */
> +      for (one = seq; one != NULL_RTX; one = NEXT_INSN (one))
> +	if (recog_memoized (one) < 0 && GET_CODE (PATTERN (one)) != CLOBBER)
> +	  return;
> +
> +      /* Insert the replacement.  */
> +      emit_insn_before (seq, insn);
> +    }
> +
> +  /* Note replacement/removal in the dump.  */
> +  if (dump_file)
> +    {
> +      fprintf (dump_file, "redundant extension %u ", INSN_UID (insn));
> +      if (dest != inner)
> +	fprintf (dump_file, "replaced by %u\n", INSN_UID (seq));
> +      else
> +	fprintf (dump_file, "removed\n");
> +    }
> +
> +  /* Remove the extension.  */
> +  delete_insn (insn);
> +
> +  reset_promoted_subreg (dest);
> +}
> +
> +/* Setup the variables at the start of the pass.  */
> +
> +static void
> +init_pass (void)
> +{
> +  int i;
> +
> +  biggest_use = XNEWVEC (int, n_regs);
> +  promoted_subreg = XCNEWVEC (VEC (rtx,heap) *, n_regs);
> +  propagated_size = XNEWVEC (int, n_regs);
> +
> +  /* Initialize biggest_use for all regs to 0.  If a reg is used implicitly, we
> +     handle that reg conservatively and set it to SKIP_REG instead.  */
> +  for (i = 0; i < n_regs; i++)
> +    {
> +      biggest_use[i] = skip_reg_p (i) ? SKIP_REG : 0;
> +      propagated_size[i] = NONE;
> +    }
> +
> +  extensions = VEC_alloc (rtx, heap, 10);
> +  redundant_extensions = VEC_alloc (rtx, heap, 10);
> +
> +  wl = VEC_alloc (int, heap, 50);
> +  in_wl = XNEWVEC (bool, n_regs);
> +
> +  uses = XNEWVEC (typeof (*uses), n_regs);
> +  props = XNEWVEC (typeof (*props), n_regs);
> +
> +  for (i = 0; i < n_regs; ++i)
> +    {
> +      uses[i] = NULL;
> +      props[i] = NULL;
> +      in_wl[i] = false;
> +    }
> +}
> +
> +/* Find redundant extensions and remove or replace them if possible.  */
> +
> +static void
> +remove_redundant_extensions (void)
> +{
> +  rtx insn, dest, inner;
> +  int preserved_size;
> +  int ix;
> +
> +  if (!find_extensions ())
> +    return;
> +
> +  calculate_biggest_use ();
> +
> +  find_redundant_extensions ();
> +
> +  if (!VEC_empty (rtx, extensions))
> +    {
> +      propagate ();
> +
> +      find_redundant_extensions ();
> +    }
> +
> +  gcc_checking_assert (n_regs == max_reg_num ());
> +
> +  FOR_EACH_VEC_ELT (rtx, redundant_extensions, ix, insn)
> +    {
> +      extension_p (insn, &dest, &inner, &preserved_size);
> +      try_remove_or_replace_extension (insn, dest, inner);
> +    }
> +
> +  if (dump_file)
> +    fprintf (dump_file, "\n");
> +}
> +
> +/* Free the variables at the end of the pass.  */
> +
> +static void
> +finish_pass (void)
> +{
> +  int i;
> +
> +  XDELETEVEC (propagated_size);
> +
> +  VEC_free (rtx, heap, extensions);
> +  VEC_free (rtx, heap, redundant_extensions);
> +
> +  VEC_free (int, heap, wl);
> +
> +  for (i = 0; i < n_regs; ++i)
> +    {
> +      if (uses[i] != NULL)
> +	VEC_free (use_type, heap, uses[i]);
> +
> +      if (props[i] != NULL)
> +	VEC_free (prop_type, heap, props[i]);
> +    }
> +
> +  XDELETEVEC (uses);
> +  XDELETEVEC (props);
> +  XDELETEVEC (biggest_use);
> +
> +  for (i = 0; i < n_regs; ++i)
> +    if (promoted_subreg[i] != NULL)
> +      VEC_free (rtx, heap, promoted_subreg[i]);
> +  XDELETEVEC (promoted_subreg);
> +}
> +
> +/* Remove redundant extensions.  */
> +
> +static unsigned int
> +rest_of_handle_ee (void)
> +{
> +  n_regs = max_reg_num ();
> +
> +  init_pass ();
> +  remove_redundant_extensions ();
> +  finish_pass ();
> +  return 0;
> +}
> +
> +/* Run ee pass when flag_ee is set at optimization level > 0.  */
> +
> +static bool
> +gate_handle_ee (void)
> +{
> +  return (optimize > 0 && flag_ee);
> +}
> +
> +struct rtl_opt_pass pass_ee =
> +{
> + {
> +  RTL_PASS,
> +  "ee",                                 /* name */
> +  gate_handle_ee,                       /* gate */
> +  rest_of_handle_ee,                    /* execute */
> +  NULL,                                 /* sub */
> +  NULL,                                 /* next */
> +  0,                                    /* static_pass_number */
> +  TV_EE,                                /* tv_id */
> +  0,                                    /* properties_required */
> +  0,                                    /* properties_provided */
> +  0,                                    /* properties_destroyed */
> +  0,                                    /* todo_flags_start */
> +  TODO_ggc_collect |
> +  TODO_verify_rtl_sharing,              /* todo_flags_finish */
> + }
> +};
> Index: gcc/common.opt
> ===================================================================
> --- gcc/common.opt (revision 189409)
> +++ gcc/common.opt (working copy)
> @@ -1067,6 +1067,10 @@ feliminate-dwarf2-dups
> Common Report Var(flag_eliminate_dwarf2_dups)
> Perform DWARF2 duplicate elimination
>
> +fextension-elimination
> +Common Report Var(flag_ee) Init(0) Optimization
> +Perform extension elimination
> +
> fipa-sra
> Common Report Var(flag_ipa_sra) Init(0) Optimization
> Perform interprocedural reduction of aggregates
> Index: gcc/Makefile.in
> ===================================================================
> --- gcc/Makefile.in (revision 189409)
> +++ gcc/Makefile.in (working copy)
> @@ -1218,6 +1218,7 @@ OBJS = \
> 	dwarf2asm.o \
> 	dwarf2cfi.o \
> 	dwarf2out.o \
> +	ee.o \
> 	ebitmap.o \
> 	emit-rtl.o \
> 	et-forest.o \
> @@ -2971,6 +2972,12 @@ cprop.o : cprop.c $(CONFIG_H) $(SYSTEM_H
>     $(TM_P_H) $(PARAMS_H) cselib.h $(EXCEPT_H) $(TREE_H) $(TIMEVAR_H) \
>     intl.h $(OBSTACK_H) $(TREE_PASS_H) $(DF_H) $(DBGCNT_H) $(TARGET_H) \
>     $(DF_H) $(CFGLOOP_H)
> +ee.o : ee.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) \
> +   hard-reg-set.h $(FLAGS_H) $(BASIC_BLOCK_H) $(FUNCTION_H) output.h \
> +   $(DF_H) $(TIMEVAR_H) tree-pass.h $(RECOG_H) $(EXPR_H) \
> +   $(REGS_H) $(TREE_H) $(TM_P_H) insn-config.h $(INSN_ATTR_H) $(TOPLEV_H) \
> +   $(DIAGNOSTIC_CORE_H) $(TARGET_H) $(OPTABS_H) insn-codes.h rtlhooks-def.h \
> +   $(PARAMS_H) $(CGRAPH_H)
> gcse.o : gcse.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) \
>     $(REGS_H) hard-reg-set.h $(FLAGS_H) insn-config.h $(GGC_H) \
>     $(RECOG_H) $(EXPR_H) $(BASIC_BLOCK_H) $(FUNCTION_H) toplev.h $(DIAGNOSTIC_CORE_H) \
> Index: gcc/passes.c
> ===================================================================
> --- gcc/passes.c (revision 189409)
> +++ gcc/passes.c (working copy)
> @@ -1552,6 +1552,7 @@ init_optimization_passes (void)
>        NEXT_PASS (pass_initialize_regs);
>        NEXT_PASS (pass_ud_rtl_dce);
>        NEXT_PASS (pass_combine);
> +      NEXT_PASS (pass_ee);
>        NEXT_PASS (pass_if_after_combine);
>        NEXT_PASS (pass_partition_blocks);
>        NEXT_PASS (pass_regmove);


* Re: new sign/zero extension elimination pass
       [not found]                 ` <4FFE9346.2070806@mentor.com>
@ 2012-07-12  9:21                   ` Tom de Vries
  2012-07-12 12:05                     ` Kenneth Zadeck
  0 siblings, 1 reply; 43+ messages in thread
From: Tom de Vries @ 2012-07-12  9:21 UTC (permalink / raw)
  To: Kenneth Zadeck
  Cc: Tom de Vries, Eric Botcazou, tom, gcc-patches, Paolo Bonzini

Kenneth,

I see I replied to your original message, which had the wrong CC.  I'm now
CC-ing gcc-patches@gcc.gnu.org.

Thanks,
- Tom

On 12/07/12 11:05, Tom de Vries wrote:
> On 12/07/12 03:39, Kenneth Zadeck wrote:
>> Tom,
>>
>> I have a problem with the approach that you have taken here.   I believe 
>> that this could be a very useful addition to gcc so I am in general very 
>> supportive, but I think you are missing an important case.
>>
>> My problem is that the pass does not actually look at the target and 
>> make any decisions based on that target.
>>
>> For instance, we have an LLP64 target.   As with many targets, the target 
>> has a rich set of compare and branch instructions.  In particular, it 
>> can do both 32 and 64 bit comparisons.    We see that many of the 
>> upstream optimizations that take int (SI mode) index variables generate 
>> extension operations before doing 64 bit compare and branch 
>> instructions, even though there are 32 bit comparison and branches on 
>> the machine.     There are a lot of machines that can do more than one 
>> size of comparison.
>>
>> This optimization pass, as it is currently written, will not remove those
>> extensions because it believes that the length of the destination is the 
>> "final answer" unless it is wrapped in an explicit truncation.
>> Instead it needs to ask the port if there is a shorter compare and 
>> branch instruction that does not cost more.  In that case, those 
>> instructions should be rewritten to use the shorter compare and branch.
>>
>> There are many operations other than compare and branch where the pass 
>> should be asking "can I shorten the target for free and therefore get 
>> rid of the extension?"
> 
> Kenneth,
> 
> I'm not sure I understand the optimization you're talking about, in particular
> I'm confused about whether the branch range of the 32-bit and 64-bit comparison
> is the same.
> 
> Assuming it's the same, my understanding is that you're talking about an example
> like this:
> ...
>   (insn (set (reg:DI 5)
>              (zero_extend:DI (reg:SI 4))))
> 
>   (jump_insn (set (pc)
>                   (if_then_else (eq (reg:DI 5)
>                                     (const_int 0))
>                                 (label_ref:DI 62)
>                                 (pc))))
> 
>   ->
> 
>   (jump_insn (set (pc)
>                   (if_then_else (eq (reg:SI 4)
>                                     (const_int 0))
>                                 (label_ref:DI 62)
>                                 (pc))))
> 
> ...
> I would expect combine to optimize this.
> 
> In case I got the example backwards or it is too simple, please provide an
> rtl example that illustrates the optimization.
> 
> Thanks,
> - Tom
> 
> 
>> Right shifts, rotates, and stores are not in 
>> this class, but left shifts are, as are all comparisons, compare and 
>> branches, and conditional moves.   There may even be machines that have 
>> this for divide, but I do not know of any off the top of my head.
>>
>> What I am suggesting moves this pass into the target-specific set of 
>> optimizations rather than the target-independent set, but given where 
>> this pass is to be placed, this is completely appropriate.    Any dest 
>> instruction where all of the operands have been extended should be 
>> checked to see if it was really necessary to use the longer form before 
>> doing the propagation pass.
>>
>> kenny
>>
>>
>> On 07/11/2012 06:30 AM, Tom de Vries wrote:
>>> On 13/11/10 10:50, Eric Botcazou wrote:
>>>>> I profiled the pass on spec2000:
>>>>>
>>>>>                     -mabi=32     -mabi=64
>>>>> ee-pass (usr time):     0.70         1.16
>>>>> total   (usr time):   919.30       879.26
>>>>> ee-pass        (%):     0.08         0.13
>>>>>
>>>>> The pass takes 0.13% or less of the total usr runtime.
>>>> For how many hits?  What are the numbers with --param ee-max-propagate=0?
>>>>
>>>>> Is it necessary to improve the runtime of this pass?
>>>> I've already given my opinion about the implementation.  The other passes in
>>>> the compiler try hard not to rescan everything when a single bit changes; as
>>>> currently written, yours doesn't.
>>>>
>>> Eric,
>>>
>>> I've done the following:
>>> - refactored the pass such that it now scans at most twice over all
>>>   instructions.
>>> - updated the patch to be applicable to current trunk
>>> - updated the motivating example to a more applicable one (as discussed in
>>>   this thread), and added that one as test-case.
>>> - added a part in the header comment illustrating the working of the pass
>>>   on the motivating example.
>>>
>>> Bootstrapped and reg-tested on x86_64 and i686.
>>>
>>> Built and reg-tested on mips, mips64, and arm.
>>>
>>> OK for trunk?
>>>
>>> Thanks,
>>> - Tom
>>>
>>> 2012-07-10  Tom de Vries  <tom@codesourcery.com>
>>>
>>> 	* ee.c: New file.
>>> 	* tree-pass.h (pass_ee): Declare.
>>> 	* opts.c (default_options_table): Set flag_ee at -O2.
>>> 	* timevar.def (TV_EE): New timevar.
>>> 	* common.opt (fextension-elimination): New option.
>>> 	* Makefile.in (ee.o): New rule.
>>> 	* passes.c (pass_ee): Add it.
>>>
>>> 	* gcc.dg/extend-1.c: New test.
>>> 	* gcc.dg/extend-2.c: Same.
>>> 	* gcc.dg/extend-2-64.c: Same.
>>> 	* gcc.dg/extend-3.c: Same.
>>> 	* gcc.dg/extend-4.c: Same.
>>> 	* gcc.dg/extend-5.c: Same.
>>> 	* gcc.target/mips/octeon-bbit-2.c: Make test more robust.
>>> Index: gcc/tree-pass.h
>>> ===================================================================
>>> --- gcc/tree-pass.h (revision 189409)
>>> +++ gcc/tree-pass.h (working copy)
>>> @@ -483,6 +483,7 @@ extern struct gimple_opt_pass pass_fixup
>>>
>>> extern struct rtl_opt_pass pass_expand;
>>> extern struct rtl_opt_pass pass_instantiate_virtual_regs;
>>> +extern struct rtl_opt_pass pass_ee;
>>> extern struct rtl_opt_pass pass_rtl_fwprop;
>>> extern struct rtl_opt_pass pass_rtl_fwprop_addr;
>>> extern struct rtl_opt_pass pass_jump;
>>> Index: gcc/testsuite/gcc.target/mips/octeon-bbit-2.c
>>> ===================================================================
>>> --- gcc/testsuite/gcc.target/mips/octeon-bbit-2.c (revision 189409)
>>> +++ gcc/testsuite/gcc.target/mips/octeon-bbit-2.c (working copy)
>>> @@ -5,19 +5,19 @@
>>> /* { dg-final { scan-assembler "\tbnel\t" } } */
>>> /* { dg-final { scan-assembler-not "\tbne\t" } } */
>>>
>>> -NOMIPS16 int
>>> -f (int n, int i)
>>> +NOMIPS16 long int
>>> +f (long int n, long int i)
>>> {
>>> -  int s = 0;
>>> +  long int s = 0;
>>>    for (; i & 1; i++)
>>>      s += i;
>>>    return s;
>>> }
>>>
>>> -NOMIPS16 int
>>> -g (int n, int i)
>>> +NOMIPS16 long int
>>> +g (long int n, long int i)
>>> {
>>> -  int s = 0;
>>> +  long int s = 0;
>>>    for (i = 0; i < n; i++)
>>>      s += i;
>>>    return s;
>>> Index: gcc/testsuite/gcc.dg/extend-4.c
>>> ===================================================================
>>> --- /dev/null (new file)
>>> +++ gcc/testsuite/gcc.dg/extend-4.c (revision 0)
>>> @@ -0,0 +1,16 @@
>>> +/* { dg-do compile } */
>>> +/* { dg-options "-O2 -fdump-rtl-ee" } */
>>> +
>>> +unsigned char f(unsigned int a, int c)
>>> +{
>>> +  unsigned int b = a;
>>> +  if (c)
>>> +    b = a & 0x10ff;
>>> +  return b;
>>> +}
>>> +
>>> +/* { dg-final { scan-rtl-dump-times "_extend:" 1 "ee" { target mips*-*-* } } } */
>>> +/* { dg-final { scan-rtl-dump-times "and:" 0 "ee" { target mips*-*-* } } } */
>>> +/* { dg-final { scan-rtl-dump "redundant extension \[0-9\]+ removed" "ee" { target mips*-*-* } } } */
>>> +/* { dg-final { cleanup-rtl-dump "ee" } } */
>>> +
>>> Index: gcc/testsuite/gcc.dg/extend-1.c
>>> ===================================================================
>>> --- /dev/null (new file)
>>> +++ gcc/testsuite/gcc.dg/extend-1.c (revision 0)
>>> @@ -0,0 +1,13 @@
>>> +/* { dg-do compile } */
>>> +/* { dg-options "-O2 -fdump-rtl-ee" } */
>>> +
>>> +void f(unsigned char * p, short s, int c, int *z)
>>> +{
>>> +  if (c)
>>> +    *z = 0;
>>> +  *p ^= (unsigned char)s;
>>> +}
>>> +
>>> +/* { dg-final { scan-rtl-dump-times "sign_extend:" 0 "ee" { target mips*-*-* } } } */
>>> +/* { dg-final { scan-rtl-dump-times "redundant extension \[0-9\]+ replaced" 1 "ee" { target mips*-*-* } } } */
>>> +/* { dg-final { cleanup-rtl-dump "ee" } } */
>>> Index: gcc/testsuite/gcc.dg/extend-5.c
>>> ===================================================================
>>> --- /dev/null (new file)
>>> +++ gcc/testsuite/gcc.dg/extend-5.c (revision 0)
>>> @@ -0,0 +1,13 @@
>>> +/* { dg-do compile } */
>>> +/* { dg-options "-O2 -fdump-rtl-ee" } */
>>> +
>>> +void f (short d[2][2])
>>> +{
>>> +  int d0 = d[0][0] + d[0][1];
>>> +  int d1 = d[1][0] + d[1][1];
>>> +  d[0][0] = d0 + d1;
>>> +  d[0][1] = d0 - d1;
>>> +}
>>> +
>>> +/* { dg-final { scan-rtl-dump-times "redundant extension \[0-9\]+ replaced" 2 "ee" { target mips*-*-* } } } */
>>> +/* { dg-final { cleanup-rtl-dump "ee" } } */
>>> Index: gcc/testsuite/gcc.dg/extend-2.c
>>> ===================================================================
>>> --- /dev/null (new file)
>>> +++ gcc/testsuite/gcc.dg/extend-2.c (revision 0)
>>> @@ -0,0 +1,20 @@
>>> +/* { dg-do compile } */
>>> +/* { dg-options "-O2 -fdump-rtl-ee" } */
>>> +/* { dg-require-effective-target ilp32 } */
>>> +
>>> +void f(unsigned char * p, short *s, int c)
>>> +{
>>> +  short or = 0;
>>> +  while (c)
>>> +    {
>>> +      or = or | s[c];
>>> +      c --;
>>> +    }
>>> +  *p = (unsigned char)or;
>>> +}
>>> +
>>> +/* { dg-final { scan-rtl-dump-times "zero_extend" 0 "ee" { target mips*-*-* } } } */
>>> +/* { dg-final { scan-rtl-dump-times "sign_extend" 0 "ee" { target mips*-*-* } } } */
>>> +/* { dg-final { scan-rtl-dump-times "redundant extension \[0-9\]+ replaced" 2 "ee" { target mips*-*-* } } } */
>>> +/* { dg-final { cleanup-rtl-dump "ee" } } */
>>> +
>>> Index: gcc/testsuite/gcc.dg/extend-2-64.c
>>> ===================================================================
>>> --- /dev/null (new file)
>>> +++ gcc/testsuite/gcc.dg/extend-2-64.c (revision 0)
>>> @@ -0,0 +1,20 @@
>>> +/* { dg-do compile } */
>>> +/* { dg-options "-O2 -fdump-rtl-ee -mabi=64" } */
>>> +/* { dg-require-effective-target mips64 } */
>>> +
>>> +void f(unsigned char * p, short *s, int c)
>>> +{
>>> +  short or = 0;
>>> +  while (c)
>>> +    {
>>> +      or = or | s[c];
>>> +      c --;
>>> +    }
>>> +  *p = (unsigned char)or;
>>> +}
>>> +
>>> +/* { dg-final { scan-rtl-dump-times "zero_extend:" 0 "ee" { target mips*-*-* } } } */
>>> +/* { dg-final { scan-rtl-dump-times "sign_extend:" 1 "ee" { target mips*-*-* } } } */
>>> +/* { dg-final { scan-rtl-dump-times "redundant extension \[0-9\]+ replaced" 2 "ee" { target mips*-*-* } } } */
>>> +/* { dg-final { cleanup-rtl-dump "ee" } } */
>>> +
>>> Index: gcc/testsuite/gcc.dg/extend-3.c
>>> ===================================================================
>>> --- /dev/null (new file)
>>> +++ gcc/testsuite/gcc.dg/extend-3.c (revision 0)
>>> @@ -0,0 +1,13 @@
>>> +/* { dg-do compile } */
>>> +/* { dg-options "-O2 -fdump-rtl-ee -mabi=64" } */
>>> +/* { dg-require-effective-target mips64 } */
>>> +
>>> +unsigned int f(unsigned char byte)
>>> +{
>>> +  return byte << 25;
>>> +}
>>> +
>>> +/* { dg-final { scan-rtl-dump-times "zero_extend:" 0 "ee" { target mips*-*-* } } } */
>>> +/* { dg-final { scan-rtl-dump "redundant extension \[0-9\]+ replaced" "ee" } } */
>>> +/* { dg-final { cleanup-rtl-dump "ee" } } */
>>> +
>>> Index: gcc/opts.c
>>> ===================================================================
>>> --- gcc/opts.c (revision 189409)
>>> +++ gcc/opts.c (working copy)
>>> @@ -490,6 +490,7 @@ static const struct default_options defa
>>>      { OPT_LEVELS_2_PLUS, OPT_ftree_tail_merge, NULL, 1 },
>>>      { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_foptimize_strlen, NULL, 1 },
>>>      { OPT_LEVELS_2_PLUS, OPT_fhoist_adjacent_loads, NULL, 1 },
>>> +    { OPT_LEVELS_2_PLUS, OPT_fextension_elimination, NULL, 1 },
>>>
>>>      /* -O3 optimizations.  */
>>>      { OPT_LEVELS_3_PLUS, OPT_ftree_loop_distribute_patterns, NULL, 1 },
>>> Index: gcc/timevar.def
>>> ===================================================================
>>> --- gcc/timevar.def (revision 189409)
>>> +++ gcc/timevar.def (working copy)
>>> @@ -201,6 +201,7 @@ DEFTIMEVAR (TV_POST_EXPAND	     , "post
>>> DEFTIMEVAR (TV_VARCONST              , "varconst")
>>> DEFTIMEVAR (TV_LOWER_SUBREG	     , "lower subreg")
>>> DEFTIMEVAR (TV_JUMP                  , "jump")
>>> +DEFTIMEVAR (TV_EE                    , "extension elimination")
>>> DEFTIMEVAR (TV_FWPROP                , "forward prop")
>>> DEFTIMEVAR (TV_CSE                   , "CSE")
>>> DEFTIMEVAR (TV_DCE                   , "dead code elimination")
>>> Index: gcc/ee.c
>>> ===================================================================
>>> --- /dev/null (new file)
>>> +++ gcc/ee.c (revision 0)
>>> @@ -0,0 +1,1190 @@
>>> +/* Redundant extension elimination.
>>> +   Copyright (C) 2010, 2011, 2012 Free Software Foundation, Inc.
>>> +   Contributed by Tom de Vries (tom@codesourcery.com)
>>> +
>>> +This file is part of GCC.
>>> +
>>> +GCC is free software; you can redistribute it and/or modify it under
>>> +the terms of the GNU General Public License as published by the Free
>>> +Software Foundation; either version 3, or (at your option) any later
>>> +version.
>>> +
>>> +GCC is distributed in the hope that it will be useful, but WITHOUT ANY
>>> +WARRANTY; without even the implied warranty of MERCHANTABILITY or
>>> +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
>>> +for more details.
>>> +
>>> +You should have received a copy of the GNU General Public License
>>> +along with GCC; see the file COPYING3.  If not see
>>> +<http://www.gnu.org/licenses/>.  */
>>> +
>>> +/*
>>> +
>>> +  MOTIVATING EXAMPLE
>>> +
>>> +  The motivating example for this pass is the example from PR 40893:
>>> +
>>> +    void f (short d[2][2])
>>> +    {
>>> +      int d0 = d[0][0] + d[0][1];
>>> +      int d1 = d[1][0] + d[1][1];
>>> +      d[0][0] = d0 + d1;
>>> +      d[0][1] = d0 - d1;
>>> +    }
>>> +
>>> +  For MIPS, compilation results in the following insns.
>>> +
>>> +    (set (reg:SI 204)
>>> +         (zero_extend:SI (subreg:HI (reg:SI 213) 2)))
>>> +
>>> +    (set (reg:SI 205)
>>> +         (zero_extend:SI (subreg:HI (reg:SI 216 [ d1 ]) 2)))
>>> +
>>> +    (set (reg:SI 217)
>>> +         (plus:SI (reg:SI 205)
>>> +                  (reg:SI 204)))
>>> +
>>> +    (set (reg:SI 218)
>>> +         (minus:SI (reg:SI 204)
>>> +                   (reg:SI 205)))
>>> +
>>> +    (set (mem:HI (reg/v/f:SI 210))
>>> +         (subreg:HI (reg:SI 217) 2))
>>> +
>>> +    (set (mem:HI (plus:SI (reg/v/f:SI 210)
>>> +                 (const_int 2 [0x2])))
>>> +         (subreg:HI (reg:SI 218) 2))
>>> +
>>> +
>>> +  The two stores only use the lower halves of pseudos 217 and 218, and are
>>> +  the only uses of those pseudos.  And the plus and minus operators belong to
>>> +  the class of operators where a bit in the result is only influenced by
>>> +  same-or-less significant bits in the operands, so the plus and minus insns
>>> +  only use the lower halves of pseudos 204 and 205.  Those are also the only
>>> +  uses of pseudos 204 and 205, so the zero_extends are redundant.
>>> +
>>> +
>>> +  INTENDED EFFECT
>>> +
>>> +  This pass works by removing sign/zero-extensions, or replacing them with
>>> +  regcopies.  The idea there is that the regcopy might be eliminated by a later
>>> +  pass.  In case the regcopy cannot be eliminated, it might at least be cheaper
>>> +  than the extension.
>>> +
>>> +
>>> +  IMPLEMENTATION
>>> +
>>> +  The pass scans at most two times over all instructions.
>>> +
>>> +  The first scan collects all extensions.  If there are no extensions, we're
>>> +  done.
>>> +
>>> +  The second scan registers all uses of a reg in the biggest_use array.
>>> +  Additionally, it registers how the use size of a pseudo is propagated to the
>>> +  operands of the insns defining the pseudo.
>>> +
>>> +  The biggest_use array now contains the size in bits of the biggest use
>>> +  of each reg, which allows us to find redundant extensions.
>>> +
>>> +  If there are still non-redundant extensions left, we use the propagation
>>> +  information in an iterative fashion to improve the biggest_use array, after
>>> +  which we may find more redundant extensions.
>>> +
>>> +  Finally, redundant extensions are deleted or replaced.
>>> +
>>> +  If the src and dest reg of the replacement are not of the same size, we do
>>> +  not replace with a normal regcopy, but with a truncate or with the copy of a
>>> +  paradoxical subreg instead.
>>> +
>>> +
>>> +  ILLUSTRATION OF PASS
>>> +
>>> +  The dump of the pass shows us how the pass works on the motivating example.
>>> +
>>> +  We find the 2 extensions:
>>> +    found extension with preserved size 16 defining reg 204
>>> +    found extension with preserved size 16 defining reg 205
>>> +
>>> +  We calculate the biggest use of each register:
>>> +    biggest_use
>>> +    reg 204: size 32
>>> +    reg 205: size 32
>>> +    reg 217: size 16
>>> +    reg 218: size 16
>>> +
>>> +  We propagate the biggest uses where possible:
>>> +    propagations
>>> +    205: 32 -> 16
>>> +    204: 32 -> 16
>>> +    214: 32 -> 16
>>> +    215: 32 -> 16
>>> +
>>> +  We conclude that the extensions are redundant:
>>> +    found redundant extension with preserved size 16 defining reg 205
>>> +    found redundant extension with preserved size 16 defining reg 204
>>> +
>>> +  And we replace them with regcopies:
>>> +    (set (reg:SI 204)
>>> +        (reg:SI 213))
>>> +
>>> +    (set (reg:SI 205)
>>> +        (reg:SI 216))
>>> +
>>> +
>>> +  LIMITATIONS
>>> +
>>> +  The scope of the analysis is limited to an extension and its uses.  The other
>>> +  type of analysis (related to the defs of the operand of an extension) is not
>>> +  done.
>>> +
>>> +  Furthermore, we do the analysis of biggest use per reg.  So when determining
>>> +  whether an extension is redundant, we take all uses of a dest reg into
>>> +  account, also the ones that are not uses of the extension.
>>> +  The consideration is that using use-def chains will give a more precise
>>> +  analysis, but is much more expensive in terms of runtime.  */
>>> +
>>> +#include "config.h"
>>> +#include "system.h"
>>> +#include "coretypes.h"
>>> +#include "tm.h"
>>> +#include "rtl.h"
>>> +#include "tree.h"
>>> +#include "tm_p.h"
>>> +#include "flags.h"
>>> +#include "regs.h"
>>> +#include "hard-reg-set.h"
>>> +#include "basic-block.h"
>>> +#include "insn-config.h"
>>> +#include "function.h"
>>> +#include "expr.h"
>>> +#include "insn-attr.h"
>>> +#include "recog.h"
>>> +#include "toplev.h"
>>> +#include "target.h"
>>> +#include "timevar.h"
>>> +#include "optabs.h"
>>> +#include "insn-codes.h"
>>> +#include "rtlhooks-def.h"
>>> +#include "output.h"
>>> +#include "params.h"
>>> +#include "timevar.h"
>>> +#include "tree-pass.h"
>>> +#include "cgraph.h"
>>> +#include "vec.h"
>>> +
>>> +#define SKIP_REG (-1)
>>> +#define NONE (-1)
>>> +
>>> +/* Number of registers at start of pass.  */
>>> +
>>> +static int n_regs;
>>> +
>>> +/* Array to register the biggest use of a reg, in bits.  */
>>> +
>>> +static int *biggest_use;
>>> +
>>> +/* Array to register the promoted subregs.  */
>>> +
>>> +static VEC (rtx,heap) **promoted_subreg;
>>> +
>>> +/* Array to register for a reg what the last propagated size is.  */
>>> +
>>> +static int *propagated_size;
>>> +
>>> +typedef struct use
>>> +{
>>> +  int regno;
>>> +  int size;
>>> +  int offset;
>>> +  rtx *use;
>>> +} use_type;
>>> +
>>> +DEF_VEC_O(use_type);
>>> +DEF_VEC_ALLOC_O(use_type,heap);
>>> +
>>> +/* Vector to register the uses.  */
>>> +
>>> +static VEC (use_type,heap) **uses;
>>> +
>>> +typedef struct prop
>>> +{
>>> +  rtx set;
>>> +  int uses_regno;
>>> +  int uses_index;
>>> +} prop_type;
>>> +
>>> +DEF_VEC_O(prop_type);
>>> +DEF_VEC_ALLOC_O(prop_type,heap);
>>> +
>>> +/* Vector to register the propagations.  */
>>> +
>>> +static VEC (prop_type,heap) **props;
>>> +
>>> +/* Work list for propagation.  */
>>> +
>>> +static VEC (int,heap) *wl;
>>> +
>>> +/* Array to register what regs are in the work list.  */
>>> +
>>> +static bool *in_wl;
>>> +
>>> +/* Vector that contains the extensions in the function.  */
>>> +
>>> +static VEC (rtx,heap) *extensions;
>>> +
>>> +/* Vector that contains the extensions in the function that are going to be
>>> +   removed or replaced.  */
>>> +
>>> +static VEC (rtx,heap) *redundant_extensions;
>>> +
>>> +/* Forward declarations.  */
>>> +
>>> +static void note_use (rtx *x, void *data);
>>> +static bool skip_reg_p (int regno);
>>> +static void register_prop (rtx set, use_type *use);
>>> +
>>> +/* Check whether SUBREG is a promoted subreg.  */
>>> +
>>> +static bool
>>> +promoted_subreg_p (rtx subreg)
>>> +{
>>> +  return (GET_CODE (subreg) == SUBREG
>>> +	  && SUBREG_PROMOTED_VAR_P (subreg));
>>> +}
>>> +
>>> +/* Check whether SUBREG is a promoted subreg for which we cannot reset the
>>> +   promotion.  */
>>> +
>>> +static bool
>>> +fixed_promoted_subreg_p (rtx subreg)
>>> +{
>>> +  int mre;
>>> +
>>> +  if (!promoted_subreg_p (subreg))
>>> +    return false;
>>> +
>>> +  mre = targetm.mode_rep_extended (GET_MODE (subreg),
>>> +				   GET_MODE (SUBREG_REG (subreg)));
>>> +  return mre != UNKNOWN;
>>> +}
>>> +
>>> +/* Attempt to return the size, reg number and offset of USE in SIZE, REGNO and
>>> +   OFFSET.  Return true if successful.  */
>>> +
>>> +static bool
>>> +reg_use_p (rtx use, int *size, unsigned int *regno, int *offset)
>>> +{
>>> +  rtx reg;
>>> +
>>> +  if (REG_P (use))
>>> +    {
>>> +      *regno = REGNO (use);
>>> +      *offset = 0;
>>> +      *size = GET_MODE_BITSIZE (GET_MODE (use));
>>> +      return true;
>>> +    }
>>> +  else if (GET_CODE (use) == SUBREG)
>>> +    {
>>> +      reg = SUBREG_REG (use);
>>> +
>>> +      if (!REG_P (reg))
>>> +	return false;
>>> +
>>> +      *regno = REGNO (reg);
>>> +
>>> +      if (paradoxical_subreg_p (use) || fixed_promoted_subreg_p (use))
>>> +	{
>>> +	  *offset = 0;
>>> +	  *size = GET_MODE_BITSIZE (GET_MODE (reg));
>>> +	}
>>> +      else
>>> +	{
>>> +	  *offset = subreg_lsb (use);
>>> +	  *size = *offset + GET_MODE_BITSIZE (GET_MODE (use));
>>> +	}
>>> +
>>> +      return true;
>>> +    }
>>> +
>>> +  return false;
>>> +}
>>> +
>>> +/* Create a new empty entry in the uses[REGNO] vector.  */
>>> +
>>> +static use_type *
>>> +new_use (unsigned int regno)
>>> +{
>>> +  if (uses[regno] == NULL)
>>> +    uses[regno] = VEC_alloc (use_type, heap, 4);
>>> +
>>> +  VEC_safe_push (use_type, heap, uses[regno], NULL);
>>> +
>>> +  return VEC_last (use_type, uses[regno]);
>>> +}
>>> +
>>> +/* Register a USE of reg REGNO with SIZE and OFFSET.  */
>>> +
>>> +static use_type *
>>> +register_use (int size, unsigned int regno, int offset, rtx *use)
>>> +{
>>> +  int *current;
>>> +  use_type *p;
>>> +
>>> +  gcc_assert (size >= 0);
>>> +  gcc_assert (regno < (unsigned int)n_regs);
>>> +
>>> +  if (skip_reg_p (regno))
>>> +    return NULL;
>>> +
>>> +  p = new_use (regno);
>>> +  p->regno = regno;
>>> +  p->size = size;
>>> +  p->offset = offset;
>>> +  p->use = use;
>>> +
>>> +  /* Update the biggest use.  */
>>> +  current = &biggest_use[regno];
>>> +  *current = MAX (*current, size);
>>> +
>>> +  return p;
>>> +}
>>> +
>>> +/* Handle embedded uses in USE, which is a part of PATTERN.  */
>>> +
>>> +static void
>>> +note_embedded_uses (rtx use, rtx pattern)
>>> +{
>>> +  const char *format_ptr;
>>> +  int i, j;
>>> +
>>> +  format_ptr = GET_RTX_FORMAT (GET_CODE (use));
>>> +  for (i = 0; i < GET_RTX_LENGTH (GET_CODE (use)); i++)
>>> +    if (format_ptr[i] == 'e')
>>> +      note_use (&XEXP (use, i), pattern);
>>> +    else if (format_ptr[i] == 'E')
>>> +      for (j = 0; j < XVECLEN (use, i); j++)
>>> +	note_use (&XVECEXP (use, i, j), pattern);
>>> +}
>>> +
>>> +/* Get the set in PATTERN that has USE as its src operand.  */
>>> +
>>> +static rtx
>>> +get_set (rtx use, rtx pattern)
>>> +{
>>> +  rtx sub;
>>> +  int i;
>>> +
>>> +  if (GET_CODE (pattern) == SET && SET_SRC (pattern) == use)
>>> +    return pattern;
>>> +
>>> +  if (GET_CODE (pattern) == PARALLEL)
>>> +    for (i = 0; i < XVECLEN (pattern, 0); ++i)
>>> +      {
>>> +	sub = XVECEXP (pattern, 0, i);
>>> +	if (GET_CODE (sub) == SET && SET_SRC (sub) == use)
>>> +	  return sub;
>>> +      }
>>> +
>>> +  return NULL_RTX;
>>> +}
>>> +
>>> +/* Handle a restricted op USE with NR_OPERANDS.  USE is a part of SET, which is
>>> +   a part of PATTERN.  In this context restricted means that a bit in
>>> +   an operand influences only the same bit or more significant bits in the
>>> +   result.  The bitwise ops are a subclass, but PLUS is one as well.  */
>>> +
>>> +static void
>>> +note_restricted_op_use (rtx set, rtx use, unsigned int nr_operands, rtx pattern)
>>> +{
>>> +  unsigned int i, smallest;
>>> +  int operand_size[2];
>>> +  int operand_offset[2];
>>> +  int used_size;
>>> +  unsigned int operand_regno[2];
>>> +  bool operand_reg[2];
>>> +  bool operand_ignore[2];
>>> +  use_type *p;
>>> +
>>> +  /* Init operand_reg, operand_size, operand_regno and operand_ignore.  */
>>> +  for (i = 0; i < nr_operands; ++i)
>>> +    {
>>> +      operand_reg[i] = reg_use_p (XEXP (use, i), &operand_size[i],
>>> +				  &operand_regno[i], &operand_offset[i]);
>>> +      operand_ignore[i] = false;
>>> +    }
>>> +
>>> +  /* Handle case of reg and-masked with const.  */
>>> +  if (GET_CODE (use) == AND && CONST_INT_P (XEXP (use, 1)) && operand_reg[0])
>>> +    {
>>> +      used_size =
>>> +	HOST_BITS_PER_WIDE_INT - clz_hwi (UINTVAL (XEXP (use, 1)));
>>> +      operand_size[0] = MIN (operand_size[0], used_size);
>>> +    }
>>> +
>>> +  /* Handle case of reg or-masked with const.  */
>>> +  if (GET_CODE (use) == IOR && CONST_INT_P (XEXP (use, 1)) && operand_reg[0])
>>> +    {
>>> +      used_size =
>>> +	HOST_BITS_PER_WIDE_INT - clz_hwi (~UINTVAL (XEXP (use, 1)));
>>> +      operand_size[0] = MIN (operand_size[0], used_size);
>>> +    }
>>> +
>>> +  /* Ignore the use of a in 'a = a + b'.  */
>>> +  /* TODO: handle GET_CODE ((SET_DEST (set))) == SUBREG.  */
>>> +  if (set != NULL_RTX && REG_P (SET_DEST (set)))
>>> +    for (i = 0; i < nr_operands; ++i)
>>> +      operand_ignore[i] = (operand_reg[i]
>>> +			   && (REGNO (SET_DEST (set)) == operand_regno[i]));
>>> +
>>> +  /* Handle the case a reg is combined with don't care bits.  */
>>> +  if (nr_operands == 2 && operand_reg[0] && operand_reg[1]
>>> +      && operand_size[0] != operand_size[1])
>>> +    {
>>> +      smallest = operand_size[0] > operand_size[1];
>>> +
>>> +      if (paradoxical_subreg_p (XEXP (use, smallest)))
>>> +	operand_size[1 - smallest] = operand_size[smallest];
>>> +    }
>>> +
>>> +  /* Register the operand use, if necessary.  */
>>> +  for (i = 0; i < nr_operands; ++i)
>>> +    if (!operand_reg[i])
>>> +      note_use (&XEXP (use, i), pattern);
>>> +    else if (!operand_ignore[i])
>>> +      {
>>> +	p = register_use (operand_size[i], operand_regno[i], operand_offset[i],
>>> +			  &XEXP (use, i));
>>> +	register_prop (set, p);
>>> +      }
>>> +}
>>> +
>>> +/* Register promoted SUBREG in promoted_subreg.  */
>>> +
>>> +static void
>>> +register_promoted_subreg (rtx subreg)
>>> +{
>>> +  int index = REGNO (SUBREG_REG (subreg));
>>> +
>>> +  if (promoted_subreg[index] == NULL)
>>> +    promoted_subreg[index] = VEC_alloc (rtx, heap, 10);
>>> +
>>> +  VEC_safe_push (rtx, heap, promoted_subreg[index], subreg);
>>> +}
>>> +
>>> +/* Note promoted subregs in X.  */
>>> +
>>> +static int
>>> +note_promoted_subreg (rtx *x, void *y ATTRIBUTE_UNUSED)
>>> +{
>>> +  rtx subreg = *x;
>>> +
>>> +  if (promoted_subreg_p (subreg) && !fixed_promoted_subreg_p (subreg)
>>> +      && REG_P (SUBREG_REG (subreg)))
>>> +    register_promoted_subreg (subreg);
>>> +
>>> +  return 0;
>>> +}
>>> +
>>> +/* Handle use X in pattern DATA noted by note_uses.  */
>>> +
>>> +static void
>>> +note_use (rtx *x, void *data)
>>> +{
>>> +  rtx use = *x;
>>> +  rtx pattern = (rtx)data;
>>> +  int use_size, use_offset;
>>> +  unsigned int use_regno;
>>> +  rtx set;
>>> +  use_type *p;
>>> +
>>> +  for_each_rtx (x, note_promoted_subreg, NULL);
>>> +
>>> +  set = get_set (use, pattern);
>>> +
>>> +  switch (GET_CODE (use))
>>> +    {
>>> +    case REG:
>>> +    case SUBREG:
>>> +      if (!reg_use_p (use, &use_size, &use_regno, &use_offset))
>>> +	{
>>> +	  note_embedded_uses (use, pattern);
>>> +	  return;
>>> +	}
>>> +      p = register_use (use_size, use_regno, use_offset, x);
>>> +      register_prop (set, p);
>>> +      return;
>>> +    case SIGN_EXTEND:
>>> +    case ZERO_EXTEND:
>>> +      if (!reg_use_p (XEXP (use, 0), &use_size, &use_regno, &use_offset))
>>> +	{
>>> +	  note_embedded_uses (use, pattern);
>>> +	  return;
>>> +	}
>>> +      p = register_use (use_size, use_regno, use_offset, x);
>>> +      register_prop (set, p);
>>> +      return;
>>> +    case IOR:
>>> +    case AND:
>>> +    case XOR:
>>> +    case PLUS:
>>> +    case MINUS:
>>> +      note_restricted_op_use (set, use, 2, pattern);
>>> +      return;
>>> +    case NOT:
>>> +    case NEG:
>>> +      note_restricted_op_use (set, use, 1, pattern);
>>> +      return;
>>> +    case ASHIFT:
>>> +      if (!reg_use_p (XEXP (use, 0), &use_size, &use_regno, &use_offset)
>>> +	  || !CONST_INT_P (XEXP (use, 1))
>>> +	  || INTVAL (XEXP (use, 1)) <= 0
>>> +	  || paradoxical_subreg_p (XEXP (use, 0)))
>>> +	{
>>> +	  note_embedded_uses (use, pattern);
>>> +	  return;
>>> +	}
>>> +      (void)register_use (use_size - INTVAL (XEXP (use, 1)), use_regno,
>>> +			  use_offset, x);
>>> +      return;
>>> +    default:
>>> +      note_embedded_uses (use, pattern);
>>> +      return;
>>> +    }
>>> +}
>>> +
>>> +/* Check whether reg REGNO is implicitly used.  */
>>> +
>>> +static bool
>>> +implicit_use_p (int regno ATTRIBUTE_UNUSED)
>>> +{
>>> +#ifdef EPILOGUE_USES
>>> +  if (EPILOGUE_USES (regno))
>>> +    return true;
>>> +#endif
>>> +
>>> +#ifdef EH_USES
>>> +  if (EH_USES (regno))
>>> +    return true;
>>> +#endif
>>> +
>>> +  return false;
>>> +}
>>> +
>>> +/* Check whether reg REGNO should be skipped in analysis.  */
>>> +
>>> +static bool
>>> +skip_reg_p (int regno)
>>> +{
>>> +  /* TODO: handle hard registers.  The problem with hard registers is that
>>> +     a DI use of r0 can mean a 64bit use of r0 and a 32 bit use of r1.
>>> +     We don't handle that properly.  */
>>> +  return implicit_use_p (regno) || HARD_REGISTER_NUM_P (regno);
>>> +}
>>> +
>>> +/* Note the uses of argument registers in call INSN.  */
>>> +
>>> +static void
>>> +note_call_uses (rtx insn)
>>> +{
>>> +  rtx link, link_expr;
>>> +
>>> +  if (!CALL_P (insn))
>>> +    return;
>>> +
>>> +  for (link = CALL_INSN_FUNCTION_USAGE (insn); link; link = XEXP (link, 1))
>>> +    {
>>> +      link_expr = XEXP (link, 0);
>>> +
>>> +      if (GET_CODE (link_expr) == USE)
>>> +	note_use (&XEXP (link_expr, 0), link);
>>> +    }
>>> +}
>>> +
>>> +/* Dump the biggest uses found.  */
>>> +
>>> +static void
>>> +dump_biggest_use (void)
>>> +{
>>> +  int i;
>>> +
>>> +  if (!dump_file)
>>> +    return;
>>> +
>>> +  fprintf (dump_file, "biggest_use:\n");
>>> +
>>> +  for (i = 0; i < n_regs; i++)
>>> +    if (biggest_use[i] > 0)
>>> +      fprintf (dump_file, "reg %d: size %d\n", i, biggest_use[i]);
>>> +
>>> +  fprintf (dump_file, "\n");
>>> +}
>>> +
>>> +/* Calculate the biggest use size for all regs.  */
>>> +
>>> +static void
>>> +calculate_biggest_use (void)
>>> +{
>>> +  basic_block bb;
>>> +  rtx insn;
>>> +
>>> +  /* For all insns, call note_use for each use in insn.  */
>>> +  FOR_EACH_BB (bb)
>>> +    FOR_BB_INSNS (bb, insn)
>>> +      {
>>> +	if (!NONDEBUG_INSN_P (insn))
>>> +	  continue;
>>> +
>>> +	note_uses (&PATTERN (insn), note_use, PATTERN (insn));
>>> +
>>> +	if (CALL_P (insn))
>>> +	  note_call_uses (insn);
>>> +      }
>>> +
>>> +  dump_biggest_use ();
>>> +}
>>> +
>>> +/* Register a propagation USE in SET in the props vector.  */
>>> +
>>> +static void
>>> +register_prop (rtx set, use_type *use)
>>> +{
>>> +  prop_type *p;
>>> +  int regno;
>>> +
>>> +  if (set == NULL_RTX || use == NULL)
>>> +    return;
>>> +
>>> +  if (!REG_P (SET_DEST (set)))
>>> +    return;
>>> +
>>> +  regno = REGNO (SET_DEST (set));
>>> +
>>> +  if (skip_reg_p (regno))
>>> +    return;
>>> +
>>> +  if (props[regno] == NULL)
>>> +    props[regno] = VEC_alloc (prop_type, heap, 4);
>>> +
>>> +  VEC_safe_push (prop_type, heap, props[regno], NULL);
>>> +  p = VEC_last (prop_type, props[regno]);
>>> +  p->set = set;
>>> +  p->uses_regno = use->regno;
>>> +  p->uses_index = VEC_length (use_type, uses[use->regno]) - 1;
>>> +}
>>> +
>>> +/* Add REGNO to the worklist.  */
>>> +
>>> +static void
>>> +add_to_wl (int regno)
>>> +{
>>> +  if (in_wl[regno])
>>> +    return;
>>> +
>>> +  if (biggest_use[regno] > 0
>>> +      && biggest_use[regno] == GET_MODE_BITSIZE (PSEUDO_REGNO_MODE (regno)))
>>> +    return;
>>> +
>>> +  if (VEC_empty (prop_type, props[regno]))
>>> +    return;
>>> +
>>> +  if (propagated_size[regno] != NONE
>>> +      && propagated_size[regno] == biggest_use[regno])
>>> +    return;
>>> +
>>> +  VEC_safe_push (int, heap, wl, regno);
>>> +  in_wl[regno] = true;
>>> +}
>>> +
>>> +/* Pop a reg from the worklist and return it.  */
>>> +
>>> +static int
>>> +pop_wl (void)
>>> +{
>>> +  int regno = VEC_pop (int, wl);
>>> +  in_wl[regno] = false;
>>> +  return regno;
>>> +}
>>> +
>>> +/* Propagate the use size DEST_SIZE of a reg to use P.  */
>>> +
>>> +static int
>>> +propagate_size (int dest_size, use_type *p)
>>> +{
>>> +  if (dest_size == 0)
>>> +    return 0;
>>> +
>>> +  return p->offset + MIN (p->size - p->offset, dest_size);
>>> +}
>>> +
>>> +/* Get the biggest use of REGNO from the uses vector.  */
>>> +
>>> +static int
>>> +get_biggest_use (unsigned int regno)
>>> +{
>>> +  int ix;
>>> +  use_type *p;
>>> +  int max = 0;
>>> +
>>> +  gcc_assert (uses[regno] != NULL);
>>> +
>>> +  FOR_EACH_VEC_ELT (use_type, uses[regno], ix, p)
>>> +    max = MAX (max, p->size);
>>> +
>>> +  return max;
>>> +}
>>> +
>>> +/* Propagate the use size DEST_SIZE of a reg to the use USE.  */
>>> +
>>> +static void
>>> +propagate_to_use (int dest_size, use_type *use)
>>> +{
>>> +  int new_use_size;
>>> +  int prev_biggest_use;
>>> +  int *current;
>>> +
>>> +  new_use_size = propagate_size (dest_size, use);
>>> +
>>> +  if (new_use_size >= use->size)
>>> +    return;
>>> +
>>> +  use->size = new_use_size;
>>> +
>>> +  current = &biggest_use[use->regno];
>>> +
>>> +  prev_biggest_use = *current;
>>> +  *current = get_biggest_use (use->regno);
>>> +
>>> +  if (*current >= prev_biggest_use)
>>> +    return;
>>> +
>>> +  add_to_wl (use->regno);
>>> +
>>> +  if (dump_file)
>>> +    fprintf (dump_file, "%d: %d -> %d\n", use->regno, prev_biggest_use,
>>> +	     *current);
>>> +
>>> +}
>>> +
>>> +/* Propagate the biggest use of a reg REGNO to all its uses, and note
>>> +   propagations in NR_PROPAGATIONS.  */
>>> +
>>> +static void
>>> +propagate_to_uses (int regno, int *nr_propagations)
>>> +{
>>> +  int ix;
>>> +  prop_type *p;
>>> +
>>> +  gcc_assert (!(propagated_size[regno] != NONE
>>> +		&& propagated_size[regno] == biggest_use[regno]));
>>> +
>>> +  FOR_EACH_VEC_ELT (prop_type, props[regno], ix, p)
>>> +    {
>>> +      use_type *use = VEC_index (use_type, uses[p->uses_regno], p->uses_index);
>>> +      propagate_to_use (biggest_use[regno], use);
>>> +      ++(*nr_propagations);
>>> +    }
>>> +
>>> +  propagated_size[regno] = biggest_use[regno];
>>> +}
>>> +
>>> +/* Improve biggest_use array iteratively.  */
>>> +
>>> +static void
>>> +propagate (void)
>>> +{
>>> +  int i;
>>> +  int nr_propagations = 0;
>>> +
>>> +  /* Initialize work list.  */
>>> +
>>> +  for (i = 0; i < n_regs; ++i)
>>> +    add_to_wl (i);
>>> +
>>> +  /* Work the work list.  */
>>> +
>>> +  if (dump_file)
>>> +    fprintf (dump_file, "propagations: \n");
>>> +  while (!VEC_empty (int, wl))
>>> +    propagate_to_uses (pop_wl (), &nr_propagations);
>>> +
>>> +  if (dump_file)
>>> +    fprintf (dump_file, "\nnr_propagations: %d\n\n", nr_propagations);
>>> +}
>>> +
>>> +/* Check whether this is a sign/zero extension.  */
>>> +
>>> +static bool
>>> +extension_p (rtx insn, rtx *dest, rtx *inner, int *preserved_size)
>>> +{
>>> +  rtx src, op0;
>>> +
>>> +  /* Detect set of reg.  */
>>> +  if (GET_CODE (PATTERN (insn)) != SET)
>>> +    return false;
>>> +
>>> +  src = SET_SRC (PATTERN (insn));
>>> +  *dest = SET_DEST (PATTERN (insn));
>>> +
>>> +  if (!REG_P (*dest))
>>> +    return false;
>>> +
>>> +  /* Detect sign or zero extension.  */
>>> +  if (GET_CODE (src) == ZERO_EXTEND || GET_CODE (src) == SIGN_EXTEND
>>> +      || (GET_CODE (src) == AND && CONST_INT_P (XEXP (src, 1))))
>>> +    {
>>> +      op0 = XEXP (src, 0);
>>> +
>>> +      /* Determine amount of least significant bits preserved by operation.  */
>>> +      if (GET_CODE (src) == AND)
>>> +	*preserved_size = ctz_hwi (~UINTVAL (XEXP (src, 1)));
>>> +      else
>>> +	*preserved_size = GET_MODE_BITSIZE (GET_MODE (op0));
>>> +
>>> +      if (GET_CODE (op0) == SUBREG)
>>> +	{
>>> +	  if (subreg_lsb (op0) != 0)
>>> +	    return false;
>>> +
>>> +	  *inner = SUBREG_REG (op0);
>>> +
>>> +	  if (GET_MODE_CLASS (GET_MODE (*dest))
>>> +	      != GET_MODE_CLASS (GET_MODE (*inner)))
>>> +	    return false;
>>> +
>>> +	  return true;
>>> +	}
>>> +      else if (REG_P (op0))
>>> +	{
>>> +	  *inner = op0;
>>> +
>>> +	  if (GET_MODE_CLASS (GET_MODE (*dest))
>>> +	      != GET_MODE_CLASS (GET_MODE (*inner)))
>>> +	    return false;
>>> +
>>> +	  return true;
>>> +	}
>>> +      else if (GET_CODE (op0) == TRUNCATE)
>>> +	{
>>> +	  *inner = XEXP (op0, 0);
>>> +	  return true;
>>> +	}
>>> +    }
>>> +
>>> +  return false;
>>> +}
>>> +
>>> +/* Find extensions and store them in the extensions vector.  */
>>> +
>>> +static bool
>>> +find_extensions (void)
>>> +{
>>> +  basic_block bb;
>>> +  rtx insn, dest, inner;
>>> +  int preserved_size;
>>> +
>>> +  /* For all insns, record the insn if it is a sign/zero extension.  */
>>> +  FOR_EACH_BB (bb)
>>> +    FOR_BB_INSNS (bb, insn)
>>> +      {
>>> +	if (!NONDEBUG_INSN_P (insn))
>>> +	  continue;
>>> +
>>> +	if (!extension_p (insn, &dest, &inner, &preserved_size))
>>> +	  continue;
>>> +
>>> +	VEC_safe_push (rtx, heap, extensions, insn);
>>> +
>>> +	if (dump_file)
>>> +	  fprintf (dump_file,
>>> +		   "found extension %u with preserved size %d defining"
>>> +		   " reg %d\n",
>>> +		   INSN_UID (insn), preserved_size, REGNO (dest));
>>> +      }
>>> +
>>> +  if (dump_file)
>>> +    {
>>> +      if (!VEC_empty (rtx, extensions))
>>> +	fprintf (dump_file, "\n");
>>> +      else
>>> +	fprintf (dump_file, "no extensions found.\n");
>>> +    }
>>> +
>>> +  return !VEC_empty (rtx, extensions);
>>> +}
>>> +
>>> +/* Check whether this is a redundant sign/zero extension.  */
>>> +
>>> +static bool
>>> +redundant_extension_p (rtx insn, rtx *dest, rtx *inner, int *preserved_size)
>>> +{
>>> +  int biggest_dest_use;
>>> +
>>> +  if (!extension_p (insn, dest, inner, preserved_size))
>>> +    gcc_unreachable ();
>>> +
>>> +  biggest_dest_use = biggest_use[REGNO (*dest)];
>>> +
>>> +  if (biggest_dest_use == SKIP_REG)
>>> +    return false;
>>> +
>>> +  if (*preserved_size < biggest_dest_use)
>>> +    return false;
>>> +
>>> +  return true;
>>> +}
>>> +
>>> +/* Find the redundant extensions in the extensions vector and move them to the
>>> +   redundant_extensions vector.  */
>>> +
>>> +static void
>>> +find_redundant_extensions (void)
>>> +{
>>> +  rtx insn, dest, inner;
>>> +  int ix;
>>> +  bool found = false;
>>> +  int preserved_size;
>>> +
>>> +  FOR_EACH_VEC_ELT_REVERSE (rtx, extensions, ix, insn)
>>> +    if (redundant_extension_p (insn, &dest, &inner, &preserved_size))
>>> +      {
>>> +	VEC_safe_push (rtx, heap, redundant_extensions, insn);
>>> +	VEC_unordered_remove (rtx, extensions, ix);
>>> +
>>> +	if (dump_file)
>>> +	  fprintf (dump_file,
>>> +		   "found redundant extension %u with preserved size %d"
>>> +		   " defining reg %d\n",
>>> +		   INSN_UID (insn), preserved_size, REGNO (dest));
>>> +	found = true;
>>> +      }
>>> +
>>> +  if (dump_file && found)
>>> +    fprintf (dump_file, "\n");
>>> +}
>>> +
>>> +/* Reset promotion of subregs of REG.  */
>>> +
>>> +static void
>>> +reset_promoted_subreg (rtx reg)
>>> +{
>>> +  int ix;
>>> +  rtx subreg;
>>> +
>>> +  if (promoted_subreg[REGNO (reg)] == NULL)
>>> +    return;
>>> +
>>> +  FOR_EACH_VEC_ELT (rtx, promoted_subreg[REGNO (reg)], ix, subreg)
>>> +    {
>>> +      SUBREG_PROMOTED_UNSIGNED_SET (subreg, 0);
>>> +      SUBREG_PROMOTED_VAR_P (subreg) = 0;
>>> +    }
>>> +
>>> +  VEC_free (rtx, heap, promoted_subreg[REGNO (reg)]);
>>> +}
>>> +
>>> +/* Try to remove or replace the redundant extension INSN which extends INNER and
>>> +   writes to DEST.  */
>>> +
>>> +static void
>>> +try_remove_or_replace_extension (rtx insn, rtx dest, rtx inner)
>>> +{
>>> +  rtx cp_src, cp_dest, seq = NULL_RTX, one;
>>> +
>>> +  /* Check whether replacement is needed.  */
>>> +  if (dest != inner)
>>> +    {
>>> +      start_sequence ();
>>> +
>>> +      /* Determine the proper replacement operation.  */
>>> +      if (GET_MODE (dest) == GET_MODE (inner))
>>> +	{
>>> +	  cp_src = inner;
>>> +	  cp_dest = dest;
>>> +	}
>>> +      else if (GET_MODE_SIZE (GET_MODE (dest))
>>> +	       > GET_MODE_SIZE (GET_MODE (inner)))
>>> +	{
>>> +	  emit_clobber (dest);
>>> +	  cp_src = inner;
>>> +	  cp_dest = gen_lowpart_SUBREG (GET_MODE (inner), dest);
>>> +	}
>>> +      else
>>> +	{
>>> +	  cp_src = gen_lowpart_SUBREG (GET_MODE (dest), inner);
>>> +	  cp_dest = dest;
>>> +	}
>>> +
>>> +      emit_move_insn (cp_dest, cp_src);
>>> +
>>> +      seq = get_insns ();
>>> +      end_sequence ();
>>> +
>>> +      /* If the replacement is not supported, bail out.  */
>>> +      for (one = seq; one != NULL_RTX; one = NEXT_INSN (one))
>>> +	if (recog_memoized (one) < 0 && GET_CODE (PATTERN (one)) != CLOBBER)
>>> +	  return;
>>> +
>>> +      /* Insert the replacement.  */
>>> +      emit_insn_before (seq, insn);
>>> +    }
>>> +
>>> +  /* Note replacement/removal in the dump.  */
>>> +  if (dump_file)
>>> +    {
>>> +      fprintf (dump_file, "redundant extension %u ", INSN_UID (insn));
>>> +      if (dest != inner)
>>> +	fprintf (dump_file, "replaced by %u\n", INSN_UID (seq));
>>> +      else
>>> +	fprintf (dump_file, "removed\n");
>>> +    }
>>> +
>>> +  /* Remove the extension.  */
>>> +  delete_insn (insn);
>>> +
>>> +  reset_promoted_subreg (dest);
>>> +}
>>> +
>>> +/* Set up the variables at the start of the pass.  */
>>> +
>>> +static void
>>> +init_pass (void)
>>> +{
>>> +  int i;
>>> +
>>> +  biggest_use = XNEWVEC (int, n_regs);
>>> +  promoted_subreg = XCNEWVEC (VEC (rtx,heap) *, n_regs);
>>> +  propagated_size = XNEWVEC (int, n_regs);
>>> +
>>> +  /* Initialize biggest_use for all regs to 0.  If a reg is used implicitly, we
>>> +     handle that reg conservatively and set it to SKIP_REG instead.  */
>>> +  for (i = 0; i < n_regs; i++)
>>> +    {
>>> +      biggest_use[i] = skip_reg_p (i) ? SKIP_REG : 0;
>>> +      propagated_size[i] = NONE;
>>> +    }
>>> +
>>> +  extensions = VEC_alloc (rtx, heap, 10);
>>> +  redundant_extensions = VEC_alloc (rtx, heap, 10);
>>> +
>>> +  wl = VEC_alloc (int, heap, 50);
>>> +  in_wl = XNEWVEC (bool, n_regs);
>>> +
>>> +  uses = XNEWVEC (typeof (*uses), n_regs);
>>> +  props = XNEWVEC (typeof (*props), n_regs);
>>> +
>>> +  for (i = 0; i < n_regs; ++i)
>>> +    {
>>> +      uses[i] = NULL;
>>> +      props[i] = NULL;
>>> +      in_wl[i] = false;
>>> +    }
>>> +}
>>> +
>>> +/* Find redundant extensions and remove or replace them if possible.  */
>>> +
>>> +static void
>>> +remove_redundant_extensions (void)
>>> +{
>>> +  rtx insn, dest, inner;
>>> +  int preserved_size;
>>> +  int ix;
>>> +
>>> +  if (!find_extensions ())
>>> +    return;
>>> +
>>> +  calculate_biggest_use ();
>>> +
>>> +  find_redundant_extensions ();
>>> +
>>> +  if (!VEC_empty (rtx, extensions))
>>> +    {
>>> +      propagate ();
>>> +
>>> +      find_redundant_extensions ();
>>> +    }
>>> +
>>> +  gcc_checking_assert (n_regs == max_reg_num ());
>>> +
>>> +  FOR_EACH_VEC_ELT (rtx, redundant_extensions, ix, insn)
>>> +    {
>>> +      extension_p (insn, &dest, &inner, &preserved_size);
>>> +      try_remove_or_replace_extension (insn, dest, inner);
>>> +    }
>>> +
>>> +  if (dump_file)
>>> +    fprintf (dump_file, "\n");
>>> +}
>>> +
>>> +/* Free the variables at the end of the pass.  */
>>> +
>>> +static void
>>> +finish_pass (void)
>>> +{
>>> +  int i;
>>> +
>>> +  XDELETEVEC (propagated_size);
>>> +
>>> +  VEC_free (rtx, heap, extensions);
>>> +  VEC_free (rtx, heap, redundant_extensions);
>>> +
>>> +  VEC_free (int, heap, wl);
>>> +
>>> +  for (i = 0; i < n_regs; ++i)
>>> +    {
>>> +      if (uses[i] != NULL)
>>> +	VEC_free (use_type, heap, uses[i]);
>>> +
>>> +      if (props[i] != NULL)
>>> +	VEC_free (prop_type, heap, props[i]);
>>> +    }
>>> +
>>> +  XDELETEVEC (uses);
>>> +  XDELETEVEC (props);
>>> +  XDELETEVEC (biggest_use);
>>> +
>>> +  for (i = 0; i < n_regs; ++i)
>>> +    if (promoted_subreg[i] != NULL)
>>> +      VEC_free (rtx, heap, promoted_subreg[i]);
>>> +  XDELETEVEC (promoted_subreg);
>>> +}
>>> +
>>> +/* Remove redundant extensions.  */
>>> +
>>> +static unsigned int
>>> +rest_of_handle_ee (void)
>>> +{
>>> +  n_regs = max_reg_num ();
>>> +
>>> +  init_pass ();
>>> +  remove_redundant_extensions ();
>>> +  finish_pass ();
>>> +  return 0;
>>> +}
>>> +
>>> +/* Run ee pass when flag_ee is set at optimization level > 0.  */
>>> +
>>> +static bool
>>> +gate_handle_ee (void)
>>> +{
>>> +  return (optimize > 0 && flag_ee);
>>> +}
>>> +
>>> +struct rtl_opt_pass pass_ee =
>>> +{
>>> + {
>>> +  RTL_PASS,
>>> +  "ee",                                 /* name */
>>> +  gate_handle_ee,                       /* gate */
>>> +  rest_of_handle_ee,                    /* execute */
>>> +  NULL,                                 /* sub */
>>> +  NULL,                                 /* next */
>>> +  0,                                    /* static_pass_number */
>>> +  TV_EE,                                /* tv_id */
>>> +  0,                                    /* properties_required */
>>> +  0,                                    /* properties_provided */
>>> +  0,                                    /* properties_destroyed */
>>> +  0,                                    /* todo_flags_start */
>>> +  TODO_ggc_collect |
>>> +  TODO_verify_rtl_sharing,              /* todo_flags_finish */
>>> + }
>>> +};
>>> Index: gcc/common.opt
>>> ===================================================================
>>> --- gcc/common.opt (revision 189409)
>>> +++ gcc/common.opt (working copy)
>>> @@ -1067,6 +1067,10 @@ feliminate-dwarf2-dups
>>> Common Report Var(flag_eliminate_dwarf2_dups)
>>> Perform DWARF2 duplicate elimination
>>>
>>> +fextension-elimination
>>> +Common Report Var(flag_ee) Init(0) Optimization
>>> +Perform extension elimination
>>> +
>>> fipa-sra
>>> Common Report Var(flag_ipa_sra) Init(0) Optimization
>>> Perform interprocedural reduction of aggregates
>>> Index: gcc/Makefile.in
>>> ===================================================================
>>> --- gcc/Makefile.in (revision 189409)
>>> +++ gcc/Makefile.in (working copy)
>>> @@ -1218,6 +1218,7 @@ OBJS = \
>>> 	dwarf2asm.o \
>>> 	dwarf2cfi.o \
>>> 	dwarf2out.o \
>>> +	ee.o \
>>> 	ebitmap.o \
>>> 	emit-rtl.o \
>>> 	et-forest.o \
>>> @@ -2971,6 +2972,12 @@ cprop.o : cprop.c $(CONFIG_H) $(SYSTEM_H
>>>     $(TM_P_H) $(PARAMS_H) cselib.h $(EXCEPT_H) $(TREE_H) $(TIMEVAR_H) \
>>>     intl.h $(OBSTACK_H) $(TREE_PASS_H) $(DF_H) $(DBGCNT_H) $(TARGET_H) \
>>>     $(DF_H) $(CFGLOOP_H)
>>> +ee.o : ee.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) \
>>> +   hard-reg-set.h $(FLAGS_H) $(BASIC_BLOCK_H) $(FUNCTION_H) output.h \
>>> +   $(DF_H) $(TIMEVAR_H) tree-pass.h $(RECOG_H) $(EXPR_H) \
>>> +   $(REGS_H) $(TREE_H) $(TM_P_H) insn-config.h $(INSN_ATTR_H) $(TOPLEV_H) \
>>> +   $(DIAGNOSTIC_CORE_H) $(TARGET_H) $(OPTABS_H) insn-codes.h rtlhooks-def.h \
>>> +   $(PARAMS_H) $(CGRAPH_H)
>>> gcse.o : gcse.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) \
>>>     $(REGS_H) hard-reg-set.h $(FLAGS_H) insn-config.h $(GGC_H) \
>>>     $(RECOG_H) $(EXPR_H) $(BASIC_BLOCK_H) $(FUNCTION_H) toplev.h $(DIAGNOSTIC_CORE_H) \
>>> Index: gcc/passes.c
>>> ===================================================================
>>> --- gcc/passes.c (revision 189409)
>>> +++ gcc/passes.c (working copy)
>>> @@ -1552,6 +1552,7 @@ init_optimization_passes (void)
>>>        NEXT_PASS (pass_initialize_regs);
>>>        NEXT_PASS (pass_ud_rtl_dce);
>>>        NEXT_PASS (pass_combine);
>>> +      NEXT_PASS (pass_ee);
>>>        NEXT_PASS (pass_if_after_combine);
>>>        NEXT_PASS (pass_partition_blocks);
>>>        NEXT_PASS (pass_regmove);
>>
>>
> 
> 


^ permalink raw reply	[flat|nested] 43+ messages in thread

* Re: new sign/zero extension elimination pass
  2012-07-12  9:21                   ` Tom de Vries
@ 2012-07-12 12:05                     ` Kenneth Zadeck
  2012-07-13  7:54                       ` Tom de Vries
  0 siblings, 1 reply; 43+ messages in thread
From: Kenneth Zadeck @ 2012-07-12 12:05 UTC (permalink / raw)
  To: Tom de Vries; +Cc: Tom de Vries, Eric Botcazou, tom, gcc-patches, Paolo Bonzini

sorry about the two messages. i misspelled the gcc-patches address on the first
try.

you are on the right track with the example but combine will not get 
this unless everything is in the same bb.
the whole point of having a separate pass for doing extension 
elimination is that it needs to be done over the entire function.

my example is also a little more complex because, since we are talking 
about induction vars, you have an initial assignment outside of a loop, 
an increment inside the loop and the test you describe at the bottom of 
the loop.

I would point out that with respect to speed optimizations, the case i 
am describing is in fact very important because getting code out of 
loops is where the important gains are.   I believe that the ppc has 
some significant performance issues because of this kind of thing.
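
A minimal sketch of the loop shape described above, in C. The function name
sum_prefix and the array argument are made up for illustration; the assumption
is a target where int is 32 bits and addresses are 64 bits, so the index may
need an extension for the address computation while the compare at the bottom
of the loop only needs 32 bits.

```c
/* Hypothetical example, not taken from the patch or the testsuite.  */
long
sum_prefix (const int *a, int n)
{
  long s = 0;
  int i = 0;		/* initial assignment outside of the loop */

  if (n <= 0)
    return 0;

  do
    {
      s += a[i];	/* i is used as an array index */
      i++;		/* increment inside the loop */
    }
  while (i < n);	/* test at the bottom of the loop */

  return s;
}
```

Whether an extension of i actually ends up inside the loop depends on the
target and on the earlier passes; the sketch only shows the def, increment
and test spread over more than one bb.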

kenny


On 07/12/2012 05:20 AM, Tom de Vries wrote:
> Kenneth,
>
> I see I replied to your original message that had the wrong CC, I'm now CC-ing
> gcc-patches@gcc.gnu.org.
>
> Thanks,
> - Tom
>
> On 12/07/12 11:05, Tom de Vries wrote:
>> On 12/07/12 03:39, Kenneth Zadeck wrote:
>>> Tom,
>>>
>>> I have a problem with the approach that you have taken here.   I believe
>>> that this could be a very useful addition to gcc so I am in general very
>>> supportive, but i think you are missing an important case.
>>>
>>> My problem is that the pass does not actually look at the target and
>>> make any decisions based on that target.
>>>
>>> for instance, we have a llp64 target.   As with many targets, the target
>>> has a rich set of compare and branch instructions.  In particular, it
>>> can do both 32 and 64 bit comparisons.    We see that many of the
>>> upstream optimizations that take int (SI mode) index variables generate
>>> extension operations before doing 64 bit compare and branch
>>> instructions, even though there are 32 bit comparison and branches on
>>> the machine.     There are a lot of machines that can do more than one
>>> size of comparison.
>>>
>>> This optimization pass, as it is currently written, will not remove those
>>> extensions because it believes that the length of the destination is the
>>> "final answer" unless it is wrapped in an explicit truncation.
>>> Instead it needs to ask the port if there is a shorter compare and
>>> branch instruction that does not cost more.  In that case, those
>>> instructions should be rewritten to use the shorter compare and branch.
>>>
>>> There are many operations other than compare and branch where the pass
>>> should be asking "can i shorten the target for free and therefore get
>>> rid of the extension?"
>> Kenneth,
>>
>> I'm not sure I understand the optimization you're talking about, in particular
>> I'm confused about whether the branch range of the 32-bit and 64-bit comparison
>> is the same.
>>
>> Assuming it's the same, my understanding is that you're talking about an example
>> like this:
>> ...
>>    (insn (set (reg:DI 5)
>>               (zero_extend:DI (reg:SI 4))))
>>
>>    (jump_insn (set (pc)
>>                    (if_then_else (eq (reg:DI 5)
>>                                      (const_int 0))
>>                                  (label_ref:DI 62)
>>                                  (pc))))
>>
>>    ->
>>
>>    (jump_insn (set (pc)
>>                    (if_then_else (eq (reg:SI 4)
>>                                      (const_int 0))
>>                                  (label_ref:DI 62)
>>                                  (pc))))
>>
>> ...
>> I would expect combine to optimize this.
>>
>> In case I got the example all backwards or it is too simple, please
>> provide an rtl example that illustrates the optimization.
>>
>> Thanks,
>> - Tom
>>
>>
>>>   right shifts, rotates, and stores are not in
>>> this class, but left shifts are as are all comparisons, compare and
>>> branches, conditional moves.   There may even be machines that have this
>>> for divide, but i do not know of any off the top of my head.
>>>
>>> What i am suggesting moves this pass into the target specific set of
>>> optimizations rather than target independent set, but given where this pass
>>> is to be put, this is completely appropriate.    Any dest instruction
>>> where all of the operands have been extended should be checked to see if
>>> it was really necessary to use the longer form before doing the
>>> propagation pass.
>>>
>>> kenny
>>>
>>>
>>> On 07/11/2012 06:30 AM, Tom de Vries wrote:
>>>> On 13/11/10 10:50, Eric Botcazou wrote:
>>>>>> I profiled the pass on spec2000:
>>>>>>
>>>>>>                      -mabi=32     -mabi=64
>>>>>> ee-pass (usr time):     0.70         1.16
>>>>>> total   (usr time):   919.30       879.26
>>>>>> ee-pass        (%):     0.08         0.13
>>>>>>
>>>>>> The pass takes 0.13% or less of the total usr runtime.
>>>>> For how many hits?  What are the numbers with --param ee-max-propagate=0?
>>>>>
>>>>>> Is it necessary to improve the runtime of this pass?
>>>>> I've already given my opinion about the implementation.  The other passes in
>>>>> the compiler try hard not to rescan everything when a single bit changes; as
>>>>> currently written, yours doesn't.
>>>>>
>>>> Eric,
>>>>
>>>> I've done the following:
>>>> - refactored the pass such that it now scans at most twice over all
>>>>    instructions.
>>>> - updated the patch to be applicable to current trunk
>>>> - updated the motivating example to a more applicable one (as discussed in
>>>>    this thread), and added that one as test-case.
>>>> - added a part in the header comment illustrating the working of the pass
>>>>    on the motivating example.
>>>>
>>>> bootstrapped and reg-tested on x86_64 and i686.
>>>>
>>>> built and reg-tested on mips, mips64, and arm.
>>>>
>>>> OK for trunk?
>>>>
>>>> Thanks,
>>>> - Tom
>>>>
>>>> 2012-07-10  Tom de Vries  <tom@codesourcery.com>
>>>>
>>>> 	* ee.c: New file.
>>>> 	* tree-pass.h (pass_ee): Declare.
>>>> 	* opts.c (default_options_table): Set flag_ee at -O2.
>>>> 	* timevar.def (TV_EE): New timevar.
>>>> 	* common.opt (fextension-elimination): New option.
>>>> 	* Makefile.in (ee.o): New rule.
>>>> 	* passes.c (pass_ee): Add it.
>>>>
>>>> 	* gcc.dg/extend-1.c: New test.
>>>> 	* gcc.dg/extend-2.c: Same.
>>>> 	* gcc.dg/extend-2-64.c: Same.
>>>> 	* gcc.dg/extend-3.c: Same.
>>>> 	* gcc.dg/extend-4.c: Same.
>>>> 	* gcc.dg/extend-5.c: Same.
>>>> 	* gcc.target/mips/octeon-bbit-2.c: Make test more robust.
>>>> Index: gcc/tree-pass.h
>>>> ===================================================================
>>>> --- gcc/tree-pass.h (revision 189409)
>>>> +++ gcc/tree-pass.h (working copy)
>>>> @@ -483,6 +483,7 @@ extern struct gimple_opt_pass pass_fixup
>>>>
>>>> extern struct rtl_opt_pass pass_expand;
>>>> extern struct rtl_opt_pass pass_instantiate_virtual_regs;
>>>> +extern struct rtl_opt_pass pass_ee;
>>>> extern struct rtl_opt_pass pass_rtl_fwprop;
>>>> extern struct rtl_opt_pass pass_rtl_fwprop_addr;
>>>> extern struct rtl_opt_pass pass_jump;
>>>> Index: gcc/testsuite/gcc.target/mips/octeon-bbit-2.c
>>>> ===================================================================
>>>> --- gcc/testsuite/gcc.target/mips/octeon-bbit-2.c (revision 189409)
>>>> +++ gcc/testsuite/gcc.target/mips/octeon-bbit-2.c (working copy)
>>>> @@ -5,19 +5,19 @@
>>>> /* { dg-final { scan-assembler "\tbnel\t" } } */
>>>> /* { dg-final { scan-assembler-not "\tbne\t" } } */
>>>>
>>>> -NOMIPS16 int
>>>> -f (int n, int i)
>>>> +NOMIPS16 long int
>>>> +f (long int n, long int i)
>>>> {
>>>> -  int s = 0;
>>>> +  long int s = 0;
>>>>     for (; i & 1; i++)
>>>>       s += i;
>>>>     return s;
>>>> }
>>>>
>>>> -NOMIPS16 int
>>>> -g (int n, int i)
>>>> +NOMIPS16 long int
>>>> +g (long int n, long int i)
>>>> {
>>>> -  int s = 0;
>>>> +  long int s = 0;
>>>>     for (i = 0; i < n; i++)
>>>>       s += i;
>>>>     return s;
>>>> Index: gcc/testsuite/gcc.dg/extend-4.c
>>>> ===================================================================
>>>> --- /dev/null (new file)
>>>> +++ gcc/testsuite/gcc.dg/extend-4.c (revision 0)
>>>> @@ -0,0 +1,16 @@
>>>> +/* { dg-do compile } */
>>>> +/* { dg-options "-O2 -fdump-rtl-ee" } */
>>>> +
>>>> +unsigned char f(unsigned int a, int c)
>>>> +{
>>>> +  unsigned int b = a;
>>>> +  if (c)
>>>> +    b = a & 0x10ff;
>>>> +  return b;
>>>> +}
>>>> +
>>>> +/* { dg-final { scan-rtl-dump-times "_extend:" 1 "ee" { target mips*-*-* } } } */
>>>> +/* { dg-final { scan-rtl-dump-times "and:" 0 "ee" { target mips*-*-* } } } */
>>>> +/* { dg-final { scan-rtl-dump "redundant extension \[0-9\]+ removed" "ee" { target mips*-*-* } } } */
>>>> +/* { dg-final { cleanup-rtl-dump "ee" } } */
>>>> +
>>>> Index: gcc/testsuite/gcc.dg/extend-1.c
>>>> ===================================================================
>>>> --- /dev/null (new file)
>>>> +++ gcc/testsuite/gcc.dg/extend-1.c (revision 0)
>>>> @@ -0,0 +1,13 @@
>>>> +/* { dg-do compile } */
>>>> +/* { dg-options "-O2 -fdump-rtl-ee" } */
>>>> +
>>>> +void f(unsigned char * p, short s, int c, int *z)
>>>> +{
>>>> +  if (c)
>>>> +    *z = 0;
>>>> +  *p ^= (unsigned char)s;
>>>> +}
>>>> +
>>>> +/* { dg-final { scan-rtl-dump-times "sign_extend:" 0 "ee" { target mips*-*-* } } } */
>>>> +/* { dg-final { scan-rtl-dump-times "redundant extension \[0-9\]+ replaced" 1 "ee" { target mips*-*-* } } } */
>>>> +/* { dg-final { cleanup-rtl-dump "ee" } } */
>>>> Index: gcc/testsuite/gcc.dg/extend-5.c
>>>> ===================================================================
>>>> --- /dev/null (new file)
>>>> +++ gcc/testsuite/gcc.dg/extend-5.c (revision 0)
>>>> @@ -0,0 +1,13 @@
>>>> +/* { dg-do compile } */
>>>> +/* { dg-options "-O2 -fdump-rtl-ee" } */
>>>> +
>>>> +void f (short d[2][2])
>>>> +{
>>>> +  int d0 = d[0][0] + d[0][1];
>>>> +  int d1 = d[1][0] + d[1][1];
>>>> +  d[0][0] = d0 + d1;
>>>> +  d[0][1] = d0 - d1;
>>>> +}
>>>> +
>>>> +/* { dg-final { scan-rtl-dump-times "redundant extension \[0-9\]+ replaced" 2 "ee" { target mips*-*-* } } } */
>>>> +/* { dg-final { cleanup-rtl-dump "ee" } } */
>>>> Index: gcc/testsuite/gcc.dg/extend-2.c
>>>> ===================================================================
>>>> --- /dev/null (new file)
>>>> +++ gcc/testsuite/gcc.dg/extend-2.c (revision 0)
>>>> @@ -0,0 +1,20 @@
>>>> +/* { dg-do compile } */
>>>> +/* { dg-options "-O2 -fdump-rtl-ee" } */
>>>> +/* { dg-require-effective-target ilp32 } */
>>>> +
>>>> +void f(unsigned char * p, short *s, int c)
>>>> +{
>>>> +  short or = 0;
>>>> +  while (c)
>>>> +    {
>>>> +      or = or | s[c];
>>>> +      c --;
>>>> +    }
>>>> +  *p = (unsigned char)or;
>>>> +}
>>>> +
>>>> +/* { dg-final { scan-rtl-dump-times "zero_extend" 0 "ee" { target mips*-*-* } } } */
>>>> +/* { dg-final { scan-rtl-dump-times "sign_extend" 0 "ee" { target mips*-*-* } } } */
>>>> +/* { dg-final { scan-rtl-dump-times "redundant extension \[0-9\]+ replaced" 2 "ee" { target mips*-*-* } } } */
>>>> +/* { dg-final { cleanup-rtl-dump "ee" } } */
>>>> +
>>>> Index: gcc/testsuite/gcc.dg/extend-2-64.c
>>>> ===================================================================
>>>> --- /dev/null (new file)
>>>> +++ gcc/testsuite/gcc.dg/extend-2-64.c (revision 0)
>>>> @@ -0,0 +1,20 @@
>>>> +/* { dg-do compile } */
>>>> +/* { dg-options "-O2 -fdump-rtl-ee -mabi=64" } */
>>>> +/* { dg-require-effective-target mips64 } */
>>>> +
>>>> +void f(unsigned char * p, short *s, int c)
>>>> +{
>>>> +  short or = 0;
>>>> +  while (c)
>>>> +    {
>>>> +      or = or | s[c];
>>>> +      c --;
>>>> +    }
>>>> +  *p = (unsigned char)or;
>>>> +}
>>>> +
>>>> +/* { dg-final { scan-rtl-dump-times "zero_extend:" 0 "ee" { target mips*-*-* } } } */
>>>> +/* { dg-final { scan-rtl-dump-times "sign_extend:" 1 "ee" { target mips*-*-* } } } */
>>>> +/* { dg-final { scan-rtl-dump-times "redundant extension \[0-9\]+ replaced" 2 "ee" { target mips*-*-* } } } */
>>>> +/* { dg-final { cleanup-rtl-dump "ee" } } */
>>>> +
>>>> Index: gcc/testsuite/gcc.dg/extend-3.c
>>>> ===================================================================
>>>> --- /dev/null (new file)
>>>> +++ gcc/testsuite/gcc.dg/extend-3.c (revision 0)
>>>> @@ -0,0 +1,13 @@
>>>> +/* { dg-do compile } */
>>>> +/* { dg-options "-O2 -fdump-rtl-ee -mabi=64" } */
>>>> +/* { dg-require-effective-target mips64 } */
>>>> +
>>>> +unsigned int f(unsigned char byte)
>>>> +{
>>>> +  return byte << 25;
>>>> +}
>>>> +
>>>> +/* { dg-final { scan-rtl-dump-times "zero_extend:" 0 "ee" { target mips*-*-* } } } */
>>>> +/* { dg-final { scan-rtl-dump "redundant extension \[0-9\]+ replaced" "ee" } } */
>>>> +/* { dg-final { cleanup-rtl-dump "ee" } } */
>>>> +
>>>> Index: gcc/opts.c
>>>> ===================================================================
>>>> --- gcc/opts.c (revision 189409)
>>>> +++ gcc/opts.c (working copy)
>>>> @@ -490,6 +490,7 @@ static const struct default_options defa
>>>>       { OPT_LEVELS_2_PLUS, OPT_ftree_tail_merge, NULL, 1 },
>>>>       { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_foptimize_strlen, NULL, 1 },
>>>>       { OPT_LEVELS_2_PLUS, OPT_fhoist_adjacent_loads, NULL, 1 },
>>>> +    { OPT_LEVELS_2_PLUS, OPT_fextension_elimination, NULL, 1 },
>>>>
>>>>       /* -O3 optimizations.  */
>>>>       { OPT_LEVELS_3_PLUS, OPT_ftree_loop_distribute_patterns, NULL, 1 },
>>>> Index: gcc/timevar.def
>>>> ===================================================================
>>>> --- gcc/timevar.def (revision 189409)
>>>> +++ gcc/timevar.def (working copy)
>>>> @@ -201,6 +201,7 @@ DEFTIMEVAR (TV_POST_EXPAND	     , "post
>>>> DEFTIMEVAR (TV_VARCONST              , "varconst")
>>>> DEFTIMEVAR (TV_LOWER_SUBREG	     , "lower subreg")
>>>> DEFTIMEVAR (TV_JUMP                  , "jump")
>>>> +DEFTIMEVAR (TV_EE                    , "extension elimination")
>>>> DEFTIMEVAR (TV_FWPROP                , "forward prop")
>>>> DEFTIMEVAR (TV_CSE                   , "CSE")
>>>> DEFTIMEVAR (TV_DCE                   , "dead code elimination")
>>>> Index: gcc/ee.c
>>>> ===================================================================
>>>> --- /dev/null (new file)
>>>> +++ gcc/ee.c (revision 0)
>>>> @@ -0,0 +1,1190 @@
>>>> +/* Redundant extension elimination.
>>>> +   Copyright (C) 2010, 2011, 2012 Free Software Foundation, Inc.
>>>> +   Contributed by Tom de Vries (tom@codesourcery.com)
>>>> +
>>>> +This file is part of GCC.
>>>> +
>>>> +GCC is free software; you can redistribute it and/or modify it under
>>>> +the terms of the GNU General Public License as published by the Free
>>>> +Software Foundation; either version 3, or (at your option) any later
>>>> +version.
>>>> +
>>>> +GCC is distributed in the hope that it will be useful, but WITHOUT ANY
>>>> +WARRANTY; without even the implied warranty of MERCHANTABILITY or
>>>> +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
>>>> +for more details.
>>>> +
>>>> +You should have received a copy of the GNU General Public License
>>>> +along with GCC; see the file COPYING3.  If not see
>>>> +<http://www.gnu.org/licenses/>.  */
>>>> +
>>>> +/*
>>>> +
>>>> +  MOTIVATING EXAMPLE
>>>> +
>>>> +  The motivating example for this pass is the example from PR 40893:
>>>> +
>>>> +    void f (short d[2][2])
>>>> +    {
>>>> +      int d0 = d[0][0] + d[0][1];
>>>> +      int d1 = d[1][0] + d[1][1];
>>>> +      d[0][0] = d0 + d1;
>>>> +      d[0][1] = d0 - d1;
>>>> +    }
>>>> +
>>>> +  For MIPS, compilation results in the following insns.
>>>> +
>>>> +    (set (reg:SI 204)
>>>> +         (zero_extend:SI (subreg:HI (reg:SI 213) 2)))
>>>> +
>>>> +    (set (reg:SI 205)
>>>> +         (zero_extend:SI (subreg:HI (reg:SI 216 [ d1 ]) 2)))
>>>> +
>>>> +    (set (reg:SI 217)
>>>> +         (plus:SI (reg:SI 205)
>>>> +                  (reg:SI 204)))
>>>> +
>>>> +    (set (reg:SI 218)
>>>> +         (minus:SI (reg:SI 204)
>>>> +                   (reg:SI 205)))
>>>> +
>>>> +    (set (mem:HI (reg/v/f:SI 210))
>>>> +         (subreg:HI (reg:SI 217) 2))
>>>> +
>>>> +    (set (mem:HI (plus:SI (reg/v/f:SI 210)
>>>> +                 (const_int 2 [0x2])))
>>>> +         (subreg:HI (reg:SI 218) 2))
>>>> +
>>>> +
>>>> +  The stores only use the lower halves of pseudos 217 and 218, and are the
>>>> +  only uses.  And the plus and minus operators belong to the class of
>>>> +  operators where a bit in the result is only influenced by same-or-less
>>>> +  significant bits in the operands, so the plus and minus insns only use the
>>>> +  lower halves of pseudos 204 and 205.  Those are also the only uses of pseudos
>>>> +  204 and 205, so the zero_extends are redundant.
>>>> +
>>>> +
>>>> +  INTENDED EFFECT
>>>> +
>>>> +  This pass works by removing sign/zero-extensions, or replacing them with
>>>> +  regcopies.  The idea there is that the regcopy might be eliminated by a later
>>>> +  pass.  In case the regcopy cannot be eliminated, it might at least be cheaper
>>>> +  than the extension.
>>>> +
>>>> +
>>>> +  IMPLEMENTATION
>>>> +
>>>> +  The pass scans at most two times over all instructions.
>>>> +
>>>> +  The first scan collects all extensions.  If there are no extensions, we're
>>>> +  done.
>>>> +
>>>> +  The second scan registers all uses of a reg in the biggest_use array.
>>>> +  Additionally, it registers how the use size of a pseudo is propagated to the
>>>> +  operands of the insns defining the pseudo.
>>>> +
>>>> +  The biggest_use array now contains the size in bits of the biggest use
>>>> +  of each reg, which allows us to find redundant extensions.
>>>> +
>>>> +  If there are still non-redundant extensions left, we use the propagation
>>>> +  information in an iterative fashion to improve the biggest_use array, after
>>>> +  which we may find more redundant extensions.
>>>> +
>>>> +  Finally, redundant extensions are deleted or replaced.
>>>> +
>>>> +  In case the src and dest reg of the replacement are not of the same size,
>>>> +  we do not replace with a normal regcopy, but with a truncate or with the copy
>>>> +  of a paradoxical subreg instead.
>>>> +
>>>> +
>>>> +  ILLUSTRATION OF PASS
>>>> +
>>>> +  The dump of the pass shows us how the pass works on the motivating example.
>>>> +
>>>> +  We find the 2 extensions:
>>>> +    found extension with preserved size 16 defining reg 204
>>>> +    found extension with preserved size 16 defining reg 205
>>>> +
>>>> +  We calculate the biggest use of each register:
>>>> +    biggest_use
>>>> +    reg 204: size 32
>>>> +    reg 205: size 32
>>>> +    reg 217: size 16
>>>> +    reg 218: size 16
>>>> +
>>>> +  We propagate the biggest uses where possible:
>>>> +    propagations
>>>> +    205: 32 -> 16
>>>> +    204: 32 -> 16
>>>> +    214: 32 -> 16
>>>> +    215: 32 -> 16
>>>> +
>>>> +  We conclude that the extensions are redundant:
>>>> +    found redundant extension with preserved size 16 defining reg 205
>>>> +    found redundant extension with preserved size 16 defining reg 204
>>>> +
>>>> +  And we replace them with regcopies:
>>>> +    (set (reg:SI 204)
>>>> +        (reg:SI 213))
>>>> +
>>>> +    (set (reg:SI 205)
>>>> +        (reg:SI 216))
>>>> +
>>>> +
>>>> +  LIMITATIONS
>>>> +
>>>> +  The scope of the analysis is limited to an extension and its uses.  The other
>>>> +  type of analysis (related to the defs of the operand of an extension) is not
>>>> +  done.
>>>> +
>>>> +  Furthermore, we do the analysis of biggest use per reg.  So when determining
>>>> +  whether an extension is redundant, we take all uses of a dest reg into
>>>> +  account, also the ones that are not uses of the extension.
>>>> +  The consideration is that using use-def chains will give a more precise
>>>> +  analysis, but is much more expensive in terms of runtime.  */
>>>> +
>>>> +#include "config.h"
>>>> +#include "system.h"
>>>> +#include "coretypes.h"
>>>> +#include "tm.h"
>>>> +#include "rtl.h"
>>>> +#include "tree.h"
>>>> +#include "tm_p.h"
>>>> +#include "flags.h"
>>>> +#include "regs.h"
>>>> +#include "hard-reg-set.h"
>>>> +#include "basic-block.h"
>>>> +#include "insn-config.h"
>>>> +#include "function.h"
>>>> +#include "expr.h"
>>>> +#include "insn-attr.h"
>>>> +#include "recog.h"
>>>> +#include "toplev.h"
>>>> +#include "target.h"
>>>> +#include "timevar.h"
>>>> +#include "optabs.h"
>>>> +#include "insn-codes.h"
>>>> +#include "rtlhooks-def.h"
>>>> +#include "output.h"
>>>> +#include "params.h"
>>>> +#include "timevar.h"
>>>> +#include "tree-pass.h"
>>>> +#include "cgraph.h"
>>>> +#include "vec.h"
>>>> +
>>>> +#define SKIP_REG (-1)
>>>> +#define NONE (-1)
>>>> +
>>>> +/* Number of registers at start of pass.  */
>>>> +
>>>> +static int n_regs;
>>>> +
>>>> +/* Array to register the biggest use of a reg, in bits.  */
>>>> +
>>>> +static int *biggest_use;
>>>> +
>>>> +/* Array to register the promoted subregs.  */
>>>> +
>>>> +static VEC (rtx,heap) **promoted_subreg;
>>>> +
>>>> +/* Array to register for a reg what the last propagated size is.  */
>>>> +
>>>> +static int *propagated_size;
>>>> +
>>>> +typedef struct use
>>>> +{
>>>> +  int regno;
>>>> +  int size;
>>>> +  int offset;
>>>> +  rtx *use;
>>>> +} use_type;
>>>> +
>>>> +DEF_VEC_O(use_type);
>>>> +DEF_VEC_ALLOC_O(use_type,heap);
>>>> +
>>>> +/* Vector to register the uses.  */
>>>> +
>>>> +static VEC (use_type,heap) **uses;
>>>> +
>>>> +typedef struct prop
>>>> +{
>>>> +  rtx set;
>>>> +  int uses_regno;
>>>> +  int uses_index;
>>>> +} prop_type;
>>>> +
>>>> +DEF_VEC_O(prop_type);
>>>> +DEF_VEC_ALLOC_O(prop_type,heap);
>>>> +
>>>> +/* Vector to register the propagations.  */
>>>> +
>>>> +static VEC (prop_type,heap) **props;
>>>> +
>>>> +/* Work list for propagation.  */
>>>> +
>>>> +static VEC (int,heap) *wl;
>>>> +
>>>> +/* Array to register what regs are in the work list.  */
>>>> +
>>>> +static bool *in_wl;
>>>> +
>>>> +/* Vector that contains the extensions in the function.  */
>>>> +
>>>> +static VEC (rtx,heap) *extensions;
>>>> +
>>>> +/* Vector that contains the extensions in the function that are going to be
>>>> +   removed or replaced.  */
>>>> +
>>>> +static VEC (rtx,heap) *redundant_extensions;
>>>> +
>>>> +/* Forward declarations.  */
>>>> +
>>>> +static void note_use (rtx *x, void *data);
>>>> +static bool skip_reg_p (int regno);
>>>> +static void register_prop (rtx set, use_type *use);
>>>> +
>>>> +/* Check whether SUBREG is a promoted subreg.  */
>>>> +
>>>> +static bool
>>>> +promoted_subreg_p (rtx subreg)
>>>> +{
>>>> +  return (GET_CODE (subreg) == SUBREG
>>>> +	  && SUBREG_PROMOTED_VAR_P (subreg));
>>>> +}
>>>> +
>>>> +/* Check whether SUBREG is a promoted subreg for which we cannot reset the
>>>> +   promotion.  */
>>>> +
>>>> +static bool
>>>> +fixed_promoted_subreg_p (rtx subreg)
>>>> +{
>>>> +  int mre;
>>>> +
>>>> +  if (!promoted_subreg_p (subreg))
>>>> +    return false;
>>>> +
>>>> +  mre = targetm.mode_rep_extended (GET_MODE (subreg),
>>>> +				   GET_MODE (SUBREG_REG (subreg)));
>>>> +  return mre != UNKNOWN;
>>>> +}
>>>> +
>>>> +/* Attempt to return the size, reg number and offset of USE in SIZE, REGNO and
>>>> +   OFFSET.  Return true if successful.  */
>>>> +
>>>> +static bool
>>>> +reg_use_p (rtx use, int *size, unsigned int *regno, int *offset)
>>>> +{
>>>> +  rtx reg;
>>>> +
>>>> +  if (REG_P (use))
>>>> +    {
>>>> +      *regno = REGNO (use);
>>>> +      *offset = 0;
>>>> +      *size = GET_MODE_BITSIZE (GET_MODE (use));
>>>> +      return true;
>>>> +    }
>>>> +  else if (GET_CODE (use) == SUBREG)
>>>> +    {
>>>> +      reg = SUBREG_REG (use);
>>>> +
>>>> +      if (!REG_P (reg))
>>>> +	return false;
>>>> +
>>>> +      *regno = REGNO (reg);
>>>> +
>>>> +      if (paradoxical_subreg_p (use) || fixed_promoted_subreg_p (use))
>>>> +	{
>>>> +	  *offset = 0;
>>>> +	  *size = GET_MODE_BITSIZE (GET_MODE (reg));
>>>> +	}
>>>> +      else
>>>> +	{
>>>> +	  *offset = subreg_lsb (use);
>>>> +	  *size = *offset + GET_MODE_BITSIZE (GET_MODE (use));
>>>> +	}
>>>> +
>>>> +      return true;
>>>> +    }
>>>> +
>>>> +  return false;
>>>> +}
>>>> +
>>>> +/* Create a new empty entry in the uses[REGNO] vector.  */
>>>> +
>>>> +static use_type *
>>>> +new_use (unsigned int regno)
>>>> +{
>>>> +  if (uses[regno] == NULL)
>>>> +    uses[regno] = VEC_alloc (use_type, heap, 4);
>>>> +
>>>> +  VEC_safe_push (use_type, heap, uses[regno], NULL);
>>>> +
>>>> +  return VEC_last (use_type, uses[regno]);
>>>> +}
>>>> +
>>>> +/* Register a USE of reg REGNO with SIZE and OFFSET.  */
>>>> +
>>>> +static use_type *
>>>> +register_use (int size, unsigned int regno, int offset, rtx *use)
>>>> +{
>>>> +  int *current;
>>>> +  use_type *p;
>>>> +
>>>> +  gcc_assert (size >= 0);
>>>> +  gcc_assert (regno < (unsigned int)n_regs);
>>>> +
>>>> +  if (skip_reg_p (regno))
>>>> +    return NULL;
>>>> +
>>>> +  p = new_use (regno);
>>>> +  p->regno = regno;
>>>> +  p->size = size;
>>>> +  p->offset = offset;
>>>> +  p->use = use;
>>>> +
>>>> +  /* Update the biggest use.  */
>>>> +  current = &biggest_use[regno];
>>>> +  *current = MAX (*current, size);
>>>> +
>>>> +  return p;
>>>> +}
>>>> +
>>>> +/* Handle embedded uses in USE, which is a part of PATTERN.  */
>>>> +
>>>> +static void
>>>> +note_embedded_uses (rtx use, rtx pattern)
>>>> +{
>>>> +  const char *format_ptr;
>>>> +  int i, j;
>>>> +
>>>> +  format_ptr = GET_RTX_FORMAT (GET_CODE (use));
>>>> +  for (i = 0; i < GET_RTX_LENGTH (GET_CODE (use)); i++)
>>>> +    if (format_ptr[i] == 'e')
>>>> +      note_use (&XEXP (use, i), pattern);
>>>> +    else if (format_ptr[i] == 'E')
>>>> +      for (j = 0; j < XVECLEN (use, i); j++)
>>>> +	note_use (&XVECEXP (use, i, j), pattern);
>>>> +}
>>>> +
>>>> +/* Get the set in PATTERN that has USE as its src operand.  */
>>>> +
>>>> +static rtx
>>>> +get_set (rtx use, rtx pattern)
>>>> +{
>>>> +  rtx sub;
>>>> +  int i;
>>>> +
>>>> +  if (GET_CODE (pattern) == SET && SET_SRC (pattern) == use)
>>>> +    return pattern;
>>>> +
>>>> +  if (GET_CODE (pattern) == PARALLEL)
>>>> +    for (i = 0; i < XVECLEN (pattern, 0); ++i)
>>>> +      {
>>>> +	sub = XVECEXP (pattern, 0, i);
>>>> +	if (GET_CODE (sub) == SET && SET_SRC (sub) == use)
>>>> +	  return sub;
>>>> +      }
>>>> +
>>>> +  return NULL_RTX;
>>>> +}
>>>> +
>>>> +/* Handle a restricted op USE with NR_OPERANDS.  USE is a part of SET, which is
>>>> +   a part of PATTERN.  In this context restricted means that a bit in
>>>> +   an operand influences only the same bit or more significant bits in the
>>>> +   result.  The bitwise ops are a subclass, but PLUS is one as well.  */
>>>> +
>>>> +static void
>>>> +note_restricted_op_use (rtx set, rtx use, unsigned int nr_operands, rtx pattern)
>>>> +{
>>>> +  unsigned int i, smallest;
>>>> +  int operand_size[2];
>>>> +  int operand_offset[2];
>>>> +  int used_size;
>>>> +  unsigned int operand_regno[2];
>>>> +  bool operand_reg[2];
>>>> +  bool operand_ignore[2];
>>>> +  use_type *p;
>>>> +
>>>> +  /* Init operand_reg, operand_size, operand_regno and operand_ignore.  */
>>>> +  for (i = 0; i < nr_operands; ++i)
>>>> +    {
>>>> +      operand_reg[i] = reg_use_p (XEXP (use, i), &operand_size[i],
>>>> +				  &operand_regno[i], &operand_offset[i]);
>>>> +      operand_ignore[i] = false;
>>>> +    }
>>>> +
>>>> +  /* Handle case of reg and-masked with const.  */
>>>> +  if (GET_CODE (use) == AND && CONST_INT_P (XEXP (use, 1)) && operand_reg[0])
>>>> +    {
>>>> +      used_size =
>>>> +	HOST_BITS_PER_WIDE_INT - clz_hwi (UINTVAL (XEXP (use, 1)));
>>>> +      operand_size[0] = MIN (operand_size[0], used_size);
>>>> +    }
>>>> +
>>>> +  /* Handle case of reg or-masked with const.  */
>>>> +  if (GET_CODE (use) == IOR && CONST_INT_P (XEXP (use, 1)) && operand_reg[0])
>>>> +    {
>>>> +      used_size =
>>>> +	HOST_BITS_PER_WIDE_INT - clz_hwi (~UINTVAL (XEXP (use, 1)));
>>>> +      operand_size[0] = MIN (operand_size[0], used_size);
>>>> +    }
>>>> +
>>>> +  /* Ignore the use of a in 'a = a + b'.  */
>>>> +  /* TODO: handle GET_CODE ((SET_DEST (set))) == SUBREG.  */
>>>> +  if (set != NULL_RTX && REG_P (SET_DEST (set)))
>>>> +    for (i = 0; i < nr_operands; ++i)
>>>> +      operand_ignore[i] = (operand_reg[i]
>>>> +			   && (REGNO (SET_DEST (set)) == operand_regno[i]));
>>>> +
>>>> +  /* Handle the case where a reg is combined with don't care bits.  */
>>>> +  if (nr_operands == 2 && operand_reg[0] && operand_reg[1]
>>>> +      && operand_size[0] != operand_size[1])
>>>> +    {
>>>> +      smallest = operand_size[0] > operand_size[1];
>>>> +
>>>> +      if (paradoxical_subreg_p (XEXP (use, smallest)))
>>>> +	operand_size[1 - smallest] = operand_size[smallest];
>>>> +    }
>>>> +
>>>> +  /* Register the operand use, if necessary.  */
>>>> +  for (i = 0; i < nr_operands; ++i)
>>>> +    if (!operand_reg[i])
>>>> +      note_use (&XEXP (use, i), pattern);
>>>> +    else if (!operand_ignore[i])
>>>> +      {
>>>> +	p = register_use (operand_size[i], operand_regno[i], operand_offset[i],
>>>> +			  &XEXP (use, i));
>>>> +	register_prop (set, p);
>>>> +      }
>>>> +}
>>>> +
>>>> +/* Register promoted SUBREG in promoted_subreg.  */
>>>> +
>>>> +static void
>>>> +register_promoted_subreg (rtx subreg)
>>>> +{
>>>> +  int index = REGNO (SUBREG_REG (subreg));
>>>> +
>>>> +  if (promoted_subreg[index] == NULL)
>>>> +    promoted_subreg[index] = VEC_alloc (rtx, heap, 10);
>>>> +
>>>> +  VEC_safe_push (rtx, heap, promoted_subreg[index], subreg);
>>>> +}
>>>> +
>>>> +/* Note promoted subregs in X.  */
>>>> +
>>>> +static int
>>>> +note_promoted_subreg (rtx *x, void *y ATTRIBUTE_UNUSED)
>>>> +{
>>>> +  rtx subreg = *x;
>>>> +
>>>> +  if (promoted_subreg_p (subreg) && !fixed_promoted_subreg_p (subreg)
>>>> +      && REG_P (SUBREG_REG (subreg)))
>>>> +    register_promoted_subreg (subreg);
>>>> +
>>>> +  return 0;
>>>> +}
>>>> +
>>>> +/* Handle use X in pattern DATA noted by note_uses.  */
>>>> +
>>>> +static void
>>>> +note_use (rtx *x, void *data)
>>>> +{
>>>> +  rtx use = *x;
>>>> +  rtx pattern = (rtx)data;
>>>> +  int use_size, use_offset;
>>>> +  unsigned int use_regno;
>>>> +  rtx set;
>>>> +  use_type *p;
>>>> +
>>>> +  for_each_rtx (x, note_promoted_subreg, NULL);
>>>> +
>>>> +  set = get_set (use, pattern);
>>>> +
>>>> +  switch (GET_CODE (use))
>>>> +    {
>>>> +    case REG:
>>>> +    case SUBREG:
>>>> +      if (!reg_use_p (use, &use_size, &use_regno, &use_offset))
>>>> +	{
>>>> +	  note_embedded_uses (use, pattern);
>>>> +	  return;
>>>> +	}
>>>> +      p = register_use (use_size, use_regno, use_offset, x);
>>>> +      register_prop (set, p);
>>>> +      return;
>>>> +    case SIGN_EXTEND:
>>>> +    case ZERO_EXTEND:
>>>> +      if (!reg_use_p (XEXP (use, 0), &use_size, &use_regno, &use_offset))
>>>> +	{
>>>> +	  note_embedded_uses (use, pattern);
>>>> +	  return;
>>>> +	}
>>>> +      p = register_use (use_size, use_regno, use_offset, x);
>>>> +      register_prop (set, p);
>>>> +      return;
>>>> +    case IOR:
>>>> +    case AND:
>>>> +    case XOR:
>>>> +    case PLUS:
>>>> +    case MINUS:
>>>> +      note_restricted_op_use (set, use, 2, pattern);
>>>> +      return;
>>>> +    case NOT:
>>>> +    case NEG:
>>>> +      note_restricted_op_use (set, use, 1, pattern);
>>>> +      return;
>>>> +    case ASHIFT:
>>>> +      if (!reg_use_p (XEXP (use, 0), &use_size, &use_regno, &use_offset)
>>>> +	  || !CONST_INT_P (XEXP (use, 1))
>>>> +	  || INTVAL (XEXP (use, 1)) <= 0
>>>> +	  || paradoxical_subreg_p (XEXP (use, 0)))
>>>> +	{
>>>> +	  note_embedded_uses (use, pattern);
>>>> +	  return;
>>>> +	}
>>>> +      (void)register_use (use_size - INTVAL (XEXP (use, 1)), use_regno,
>>>> +			  use_offset, x);
>>>> +      return;
>>>> +    default:
>>>> +      note_embedded_uses (use, pattern);
>>>> +      return;
>>>> +    }
>>>> +}
>>>> +
>>>> +/* Check whether reg REGNO is implicitly used.  */
>>>> +
>>>> +static bool
>>>> +implicit_use_p (int regno ATTRIBUTE_UNUSED)
>>>> +{
>>>> +#ifdef EPILOGUE_USES
>>>> +  if (EPILOGUE_USES (regno))
>>>> +    return true;
>>>> +#endif
>>>> +
>>>> +#ifdef EH_USES
>>>> +  if (EH_USES (regno))
>>>> +    return true;
>>>> +#endif
>>>> +
>>>> +  return false;
>>>> +}
>>>> +
>>>> +/* Check whether reg REGNO should be skipped in analysis.  */
>>>> +
>>>> +static bool
>>>> +skip_reg_p (int regno)
>>>> +{
>>>> +  /* TODO: handle hard registers.  The problem with hard registers is that
>>>> +     a DI use of r0 can mean a 64bit use of r0 and a 32 bit use of r1.
>>>> +     We don't handle that properly.  */
>>>> +  return implicit_use_p (regno) || HARD_REGISTER_NUM_P (regno);
>>>> +}
>>>> +
>>>> +/* Note the uses of argument registers in call INSN.  */
>>>> +
>>>> +static void
>>>> +note_call_uses (rtx insn)
>>>> +{
>>>> +  rtx link, link_expr;
>>>> +
>>>> +  if (!CALL_P (insn))
>>>> +    return;
>>>> +
>>>> +  for (link = CALL_INSN_FUNCTION_USAGE (insn); link; link = XEXP (link, 1))
>>>> +    {
>>>> +      link_expr = XEXP (link, 0);
>>>> +
>>>> +      if (GET_CODE (link_expr) == USE)
>>>> +	note_use (&XEXP (link_expr, 0), link);
>>>> +    }
>>>> +}
>>>> +
>>>> +/* Dump the biggest uses found.  */
>>>> +
>>>> +static void
>>>> +dump_biggest_use (void)
>>>> +{
>>>> +  int i;
>>>> +
>>>> +  if (!dump_file)
>>>> +    return;
>>>> +
>>>> +  fprintf (dump_file, "biggest_use:\n");
>>>> +
>>>> +  for (i = 0; i < n_regs; i++)
>>>> +    if (biggest_use[i] > 0)
>>>> +      fprintf (dump_file, "reg %d: size %d\n", i, biggest_use[i]);
>>>> +
>>>> +  fprintf (dump_file, "\n");
>>>> +}
>>>> +
>>>> +/* Calculate the biggest use mode for all regs.  */
>>>> +
>>>> +static void
>>>> +calculate_biggest_use (void)
>>>> +{
>>>> +  basic_block bb;
>>>> +  rtx insn;
>>>> +
>>>> +  /* For all insns, call note_use for each use in insn.  */
>>>> +  FOR_EACH_BB (bb)
>>>> +    FOR_BB_INSNS (bb, insn)
>>>> +      {
>>>> +	if (!NONDEBUG_INSN_P (insn))
>>>> +	  continue;
>>>> +
>>>> +	note_uses (&PATTERN (insn), note_use, PATTERN (insn));
>>>> +
>>>> +	if (CALL_P (insn))
>>>> +	  note_call_uses (insn);
>>>> +      }
>>>> +
>>>> +  dump_biggest_use ();
>>>> +}
>>>> +
>>>> +/* Register a propagation USE in SET in the props vector.  */
>>>> +
>>>> +static void
>>>> +register_prop (rtx set, use_type *use)
>>>> +{
>>>> +  prop_type *p;
>>>> +  int regno;
>>>> +
>>>> +  if (set == NULL_RTX || use == NULL)
>>>> +    return;
>>>> +
>>>> +  if (!REG_P (SET_DEST (set)))
>>>> +    return;
>>>> +
>>>> +  regno = REGNO (SET_DEST (set));
>>>> +
>>>> +  if (skip_reg_p (regno))
>>>> +    return;
>>>> +
>>>> +  if (props[regno] == NULL)
>>>> +    props[regno] = VEC_alloc (prop_type, heap, 4);
>>>> +
>>>> +  VEC_safe_push (prop_type, heap, props[regno], NULL);
>>>> +  p = VEC_last (prop_type, props[regno]);
>>>> +  p->set = set;
>>>> +  p->uses_regno = use->regno;
>>>> +  p->uses_index = VEC_length (use_type, uses[use->regno]) - 1;
>>>> +}
>>>> +
>>>> +/* Add REGNO to the worklist.  */
>>>> +
>>>> +static void
>>>> +add_to_wl (int regno)
>>>> +{
>>>> +  if (in_wl[regno])
>>>> +    return;
>>>> +
>>>> +  if (biggest_use[regno] > 0
>>>> +      && biggest_use[regno] == GET_MODE_BITSIZE (PSEUDO_REGNO_MODE (regno)))
>>>> +    return;
>>>> +
>>>> +  if (VEC_empty (prop_type, props[regno]))
>>>> +    return;
>>>> +
>>>> +  if (propagated_size[regno] != NONE
>>>> +      && propagated_size[regno] == biggest_use[regno])
>>>> +    return;
>>>> +
>>>> +  VEC_safe_push (int, heap, wl, regno);
>>>> +  in_wl[regno] = true;
>>>> +}
>>>> +
>>>> +/* Pop a reg from the worklist and return it.  */
>>>> +
>>>> +static int
>>>> +pop_wl (void)
>>>> +{
>>>> +  int regno = VEC_pop (int, wl);
>>>> +  in_wl[regno] = false;
>>>> +  return regno;
>>>> +}
>>>> +
>>>> +/* Propagate the use size DEST_SIZE of a reg to use P.  */
>>>> +
>>>> +static int
>>>> +propagate_size (int dest_size, use_type *p)
>>>> +{
>>>> +  if (dest_size == 0)
>>>> +    return 0;
>>>> +
>>>> +  return p->offset + MIN (p->size - p->offset, dest_size);
>>>> +}
>>>> +
>>>> +/* Get the biggest use of REGNO from the uses vector.  */
>>>> +
>>>> +static int
>>>> +get_biggest_use (unsigned int regno)
>>>> +{
>>>> +  int ix;
>>>> +  use_type *p;
>>>> +  int max = 0;
>>>> +
>>>> +  gcc_assert (uses[regno] != NULL);
>>>> +
>>>> +  FOR_EACH_VEC_ELT (use_type, uses[regno], ix, p)
>>>> +    max = MAX (max, p->size);
>>>> +
>>>> +  return max;
>>>> +}
>>>> +
>>>> +/* Propagate the use size DEST_SIZE of a reg to the uses in USE.  */
>>>> +
>>>> +static void
>>>> +propagate_to_use (int dest_size, use_type *use)
>>>> +{
>>>> +  int new_use_size;
>>>> +  int prev_biggest_use;
>>>> +  int *current;
>>>> +
>>>> +  new_use_size = propagate_size (dest_size, use);
>>>> +
>>>> +  if (new_use_size >= use->size)
>>>> +    return;
>>>> +
>>>> +  use->size = new_use_size;
>>>> +
>>>> +  current = &biggest_use[use->regno];
>>>> +
>>>> +  prev_biggest_use = *current;
>>>> +  *current = get_biggest_use (use->regno);
>>>> +
>>>> +  if (*current >= prev_biggest_use)
>>>> +    return;
>>>> +
>>>> +  add_to_wl (use->regno);
>>>> +
>>>> +  if (dump_file)
>>>> +    fprintf (dump_file, "%d: %d -> %d\n", use->regno, prev_biggest_use,
>>>> +	     *current);
>>>> +
>>>> +}
>>>> +
>>>> +/* Propagate the biggest use of a reg REGNO to all its uses, and note
>>>> +   propagations in NR_PROPAGATIONS.  */
>>>> +
>>>> +static void
>>>> +propagate_to_uses (int regno, int *nr_propagations)
>>>> +{
>>>> +  int ix;
>>>> +  prop_type *p;
>>>> +
>>>> +  gcc_assert (!(propagated_size[regno] == NONE
>>>> +		&& propagated_size[regno] == biggest_use[regno]));
>>>> +
>>>> +  FOR_EACH_VEC_ELT (prop_type, props[regno], ix, p)
>>>> +    {
>>>> +      use_type *use = VEC_index (use_type, uses[p->uses_regno], p->uses_index);
>>>> +      propagate_to_use (biggest_use[regno], use);
>>>> +      ++(*nr_propagations);
>>>> +    }
>>>> +
>>>> +  propagated_size[regno] = biggest_use[regno];
>>>> +}
>>>> +
>>>> +/* Improve biggest_use array iteratively.  */
>>>> +
>>>> +static void
>>>> +propagate (void)
>>>> +{
>>>> +  int i;
>>>> +  int nr_propagations = 0;
>>>> +
>>>> +  /* Initialize work list.  */
>>>> +
>>>> +  for (i = 0; i < n_regs; ++i)
>>>> +    add_to_wl (i);
>>>> +
>>>> +  /* Work the work list.  */
>>>> +
>>>> +  if (dump_file)
>>>> +    fprintf (dump_file, "propagations: \n");
>>>> +  while (!VEC_empty (int, wl))
>>>> +    propagate_to_uses (pop_wl (), &nr_propagations);
>>>> +
>>>> +  if (dump_file)
>>>> +    fprintf (dump_file, "\nnr_propagations: %d\n\n", nr_propagations);
>>>> +}
>>>> +
>>>> +/* Check whether this is a sign/zero extension.  */
>>>> +
>>>> +static bool
>>>> +extension_p (rtx insn, rtx *dest, rtx *inner, int *preserved_size)
>>>> +{
>>>> +  rtx src, op0;
>>>> +
>>>> +  /* Detect set of reg.  */
>>>> +  if (GET_CODE (PATTERN (insn)) != SET)
>>>> +    return false;
>>>> +
>>>> +  src = SET_SRC (PATTERN (insn));
>>>> +  *dest = SET_DEST (PATTERN (insn));
>>>> +
>>>> +  if (!REG_P (*dest))
>>>> +    return false;
>>>> +
>>>> +  /* Detect sign or zero extension.  */
>>>> +  if (GET_CODE (src) == ZERO_EXTEND || GET_CODE (src) == SIGN_EXTEND
>>>> +      || (GET_CODE (src) == AND && CONST_INT_P (XEXP (src, 1))))
>>>> +    {
>>>> +      op0 = XEXP (src, 0);
>>>> +
>>>> +      /* Determine amount of least significant bits preserved by operation.  */
>>>> +      if (GET_CODE (src) == AND)
>>>> +	*preserved_size = ctz_hwi (~UINTVAL (XEXP (src, 1)));
>>>> +      else
>>>> +	*preserved_size = GET_MODE_BITSIZE (GET_MODE (op0));
>>>> +
>>>> +      if (GET_CODE (op0) == SUBREG)
>>>> +	{
>>>> +	  if (subreg_lsb (op0) != 0)
>>>> +	    return false;
>>>> +
>>>> +	  *inner = SUBREG_REG (op0);
>>>> +
>>>> +	  if (GET_MODE_CLASS (GET_MODE (*dest))
>>>> +	      != GET_MODE_CLASS (GET_MODE (*inner)))
>>>> +	    return false;
>>>> +
>>>> +	  return true;
>>>> +	}
>>>> +      else if (REG_P (op0))
>>>> +	{
>>>> +	  *inner = op0;
>>>> +
>>>> +	  if (GET_MODE_CLASS (GET_MODE (*dest))
>>>> +	      != GET_MODE_CLASS (GET_MODE (*inner)))
>>>> +	    return false;
>>>> +
>>>> +	  return true;
>>>> +	}
>>>> +      else if (GET_CODE (op0) == TRUNCATE)
>>>> +	{
>>>> +	  *inner = XEXP (op0, 0);
>>>> +	  return true;
>>>> +	}
>>>> +    }
>>>> +
>>>> +  return false;
>>>> +}
>>>> +
>>>> +/* Find extensions and store them in the extensions vector.  */
>>>> +
>>>> +static bool
>>>> +find_extensions (void)
>>>> +{
>>>> +  basic_block bb;
>>>> +  rtx insn, dest, inner;
>>>> +  int preserved_size;
>>>> +
>>>> +  /* For all insns, record the ones that are extensions.  */
>>>> +  FOR_EACH_BB (bb)
>>>> +    FOR_BB_INSNS (bb, insn)
>>>> +      {
>>>> +	if (!NONDEBUG_INSN_P (insn))
>>>> +	  continue;
>>>> +
>>>> +	if (!extension_p (insn, &dest, &inner, &preserved_size))
>>>> +	  continue;
>>>> +
>>>> +	VEC_safe_push (rtx, heap, extensions, insn);
>>>> +
>>>> +	if (dump_file)
>>>> +	  fprintf (dump_file,
>>>> +		   "found extension %u with preserved size %d defining"
>>>> +		   " reg %d\n",
>>>> +		   INSN_UID (insn), preserved_size, REGNO (dest));
>>>> +      }
>>>> +
>>>> +  if (dump_file)
>>>> +    {
>>>> +      if (!VEC_empty (rtx, extensions))
>>>> +	fprintf (dump_file, "\n");
>>>> +      else
>>>> +	fprintf (dump_file, "no extensions found.\n");
>>>> +    }
>>>> +
>>>> +  return !VEC_empty (rtx, extensions);
>>>> +}
>>>> +
>>>> +/* Check whether this is a redundant sign/zero extension.  */
>>>> +
>>>> +static bool
>>>> +redundant_extension_p (rtx insn, rtx *dest, rtx *inner, int *preserved_size)
>>>> +{
>>>> +  int biggest_dest_use;
>>>> +
>>>> +  if (!extension_p (insn, dest, inner, preserved_size))
>>>> +    gcc_unreachable ();
>>>> +
>>>> +  biggest_dest_use = biggest_use[REGNO (*dest)];
>>>> +
>>>> +  if (biggest_dest_use == SKIP_REG)
>>>> +    return false;
>>>> +
>>>> +  if (*preserved_size < biggest_dest_use)
>>>> +    return false;
>>>> +
>>>> +  return true;
>>>> +}
>>>> +
>>>> +/* Find the redundant extensions in the extensions vector and move them to the
>>>> +   redundant_extensions vector.  */
>>>> +
>>>> +static void
>>>> +find_redundant_extensions (void)
>>>> +{
>>>> +  rtx insn, dest, inner;
>>>> +  int ix;
>>>> +  bool found = false;
>>>> +  int preserved_size;
>>>> +
>>>> +  FOR_EACH_VEC_ELT_REVERSE (rtx, extensions, ix, insn)
>>>> +    if (redundant_extension_p (insn, &dest, &inner, &preserved_size))
>>>> +      {
>>>> +	VEC_safe_push (rtx, heap, redundant_extensions, insn);
>>>> +	VEC_unordered_remove (rtx, extensions, ix);
>>>> +
>>>> +	if (dump_file)
>>>> +	  fprintf (dump_file,
>>>> +		   "found redundant extension %u with preserved size %d"
>>>> +		   " defining reg %d\n",
>>>> +		   INSN_UID (insn), preserved_size, REGNO (dest));
>>>> +	found = true;
>>>> +      }
>>>> +
>>>> +  if (dump_file && found)
>>>> +    fprintf (dump_file, "\n");
>>>> +}
>>>> +
>>>> +/* Reset promotion of subregs of REG.  */
>>>> +
>>>> +static void
>>>> +reset_promoted_subreg (rtx reg)
>>>> +{
>>>> +  int ix;
>>>> +  rtx subreg;
>>>> +
>>>> +  if (promoted_subreg[REGNO (reg)] == NULL)
>>>> +    return;
>>>> +
>>>> +  FOR_EACH_VEC_ELT (rtx, promoted_subreg[REGNO (reg)], ix, subreg)
>>>> +    {
>>>> +      SUBREG_PROMOTED_UNSIGNED_SET (subreg, 0);
>>>> +      SUBREG_PROMOTED_VAR_P (subreg) = 0;
>>>> +    }
>>>> +
>>>> +  VEC_free (rtx, heap, promoted_subreg[REGNO (reg)]);
>>>> +}
>>>> +
>>>> +/* Try to remove or replace the redundant extension INSN which extends INNER and
>>>> +   writes to DEST.  */
>>>> +
>>>> +static void
>>>> +try_remove_or_replace_extension (rtx insn, rtx dest, rtx inner)
>>>> +{
>>>> +  rtx cp_src, cp_dest, seq = NULL_RTX, one;
>>>> +
>>>> +  /* Check whether replacement is needed.  */
>>>> +  if (dest != inner)
>>>> +    {
>>>> +      start_sequence ();
>>>> +
>>>> +      /* Determine the proper replacement operation.  */
>>>> +      if (GET_MODE (dest) == GET_MODE (inner))
>>>> +	{
>>>> +	  cp_src = inner;
>>>> +	  cp_dest = dest;
>>>> +	}
>>>> +      else if (GET_MODE_SIZE (GET_MODE (dest))
>>>> +	       > GET_MODE_SIZE (GET_MODE (inner)))
>>>> +	{
>>>> +	  emit_clobber (dest);
>>>> +	  cp_src = inner;
>>>> +	  cp_dest = gen_lowpart_SUBREG (GET_MODE (inner), dest);
>>>> +	}
>>>> +      else
>>>> +	{
>>>> +	  cp_src = gen_lowpart_SUBREG (GET_MODE (dest), inner);
>>>> +	  cp_dest = dest;
>>>> +	}
>>>> +
>>>> +      emit_move_insn (cp_dest, cp_src);
>>>> +
>>>> +      seq = get_insns ();
>>>> +      end_sequence ();
>>>> +
>>>> +      /* If the replacement is not supported, bail out.  */
>>>> +      for (one = seq; one != NULL_RTX; one = NEXT_INSN (one))
>>>> +	if (recog_memoized (one) < 0 && GET_CODE (PATTERN (one)) != CLOBBER)
>>>> +	  return;
>>>> +
>>>> +      /* Insert the replacement.  */
>>>> +      emit_insn_before (seq, insn);
>>>> +    }
>>>> +
>>>> +  /* Note replacement/removal in the dump.  */
>>>> +  if (dump_file)
>>>> +    {
>>>> +      fprintf (dump_file, "redundant extension %u ", INSN_UID (insn));
>>>> +      if (dest != inner)
>>>> +	fprintf (dump_file, "replaced by %u\n", INSN_UID (seq));
>>>> +      else
>>>> +	fprintf (dump_file, "removed\n");
>>>> +    }
>>>> +
>>>> +  /* Remove the extension.  */
>>>> +  delete_insn (insn);
>>>> +
>>>> +  reset_promoted_subreg (dest);
>>>> +}
>>>> +
>>>> +/* Setup the variables at the start of the pass.  */
>>>> +
>>>> +static void
>>>> +init_pass (void)
>>>> +{
>>>> +  int i;
>>>> +
>>>> +  biggest_use = XNEWVEC (int, n_regs);
>>>> +  promoted_subreg = XCNEWVEC (VEC (rtx,heap) *, n_regs);
>>>> +  propagated_size = XNEWVEC (int, n_regs);
>>>> +
>>>> +  /* Initialize biggest_use for all regs to 0.  If a reg is used implicitly, we
>>>> +     handle that reg conservatively and set it to SKIP_REG instead.  */
>>>> +  for (i = 0; i < n_regs; i++)
>>>> +    {
>>>> +      biggest_use[i] = skip_reg_p (i) ? SKIP_REG : 0;
>>>> +      propagated_size[i] = NONE;
>>>> +    }
>>>> +
>>>> +  extensions = VEC_alloc (rtx, heap, 10);
>>>> +  redundant_extensions = VEC_alloc (rtx, heap, 10);
>>>> +
>>>> +  wl = VEC_alloc (int, heap, 50);
>>>> +  in_wl = XNEWVEC (bool, n_regs);
>>>> +
>>>> +  uses = XNEWVEC (typeof (*uses), n_regs);
>>>> +  props = XNEWVEC (typeof (*props), n_regs);
>>>> +
>>>> +  for (i = 0; i < n_regs; ++i)
>>>> +    {
>>>> +      uses[i] = NULL;
>>>> +      props[i] = NULL;
>>>> +      in_wl[i] = false;
>>>> +    }
>>>> +}
>>>> +
>>>> +/* Find redundant extensions and remove or replace them if possible.  */
>>>> +
>>>> +static void
>>>> +remove_redundant_extensions (void)
>>>> +{
>>>> +  rtx insn, dest, inner;
>>>> +  int preserved_size;
>>>> +  int ix;
>>>> +
>>>> +  if (!find_extensions ())
>>>> +    return;
>>>> +
>>>> +  calculate_biggest_use ();
>>>> +
>>>> +  find_redundant_extensions ();
>>>> +
>>>> +  if (!VEC_empty (rtx, extensions))
>>>> +    {
>>>> +      propagate ();
>>>> +
>>>> +      find_redundant_extensions ();
>>>> +    }
>>>> +
>>>> +  gcc_checking_assert (n_regs == max_reg_num ());
>>>> +
>>>> +  FOR_EACH_VEC_ELT (rtx, redundant_extensions, ix, insn)
>>>> +    {
>>>> +      extension_p (insn, &dest, &inner, &preserved_size);
>>>> +      try_remove_or_replace_extension (insn, dest, inner);
>>>> +    }
>>>> +
>>>> +  if (dump_file)
>>>> +    fprintf (dump_file, "\n");
>>>> +}
>>>> +
>>>> +/* Free the variables at the end of the pass.  */
>>>> +
>>>> +static void
>>>> +finish_pass (void)
>>>> +{
>>>> +  int i;
>>>> +
>>>> +  XDELETEVEC (propagated_size);
>>>> +
>>>> +  VEC_free (rtx, heap, extensions);
>>>> +  VEC_free (rtx, heap, redundant_extensions);
>>>> +
>>>> +  VEC_free (int, heap, wl);
>>>> +
>>>> +  for (i = 0; i < n_regs; ++i)
>>>> +    {
>>>> +      if (uses[i] != NULL)
>>>> +	VEC_free (use_type, heap, uses[i]);
>>>> +
>>>> +      if (props[i] != NULL)
>>>> +	VEC_free (prop_type, heap, props[i]);
>>>> +    }
>>>> +
>>>> +  XDELETEVEC (uses);
>>>> +  XDELETEVEC (props);
>>>> +  XDELETEVEC (biggest_use);
>>>> +
>>>> +  for (i = 0; i < n_regs; ++i)
>>>> +    if (promoted_subreg[i] != NULL)
>>>> +      VEC_free (rtx, heap, promoted_subreg[i]);
>>>> +  XDELETEVEC (promoted_subreg);
>>>> +}
>>>> +
>>>> +/* Remove redundant extensions.  */
>>>> +
>>>> +static unsigned int
>>>> +rest_of_handle_ee (void)
>>>> +{
>>>> +  n_regs = max_reg_num ();
>>>> +
>>>> +  init_pass ();
>>>> +  remove_redundant_extensions ();
>>>> +  finish_pass ();
>>>> +  return 0;
>>>> +}
>>>> +
>>>> +/* Run ee pass when flag_ee is set at optimization level > 0.  */
>>>> +
>>>> +static bool
>>>> +gate_handle_ee (void)
>>>> +{
>>>> +  return (optimize > 0 && flag_ee);
>>>> +}
>>>> +
>>>> +struct rtl_opt_pass pass_ee =
>>>> +{
>>>> + {
>>>> +  RTL_PASS,
>>>> +  "ee",                                 /* name */
>>>> +  gate_handle_ee,                       /* gate */
>>>> +  rest_of_handle_ee,                    /* execute */
>>>> +  NULL,                                 /* sub */
>>>> +  NULL,                                 /* next */
>>>> +  0,                                    /* static_pass_number */
>>>> +  TV_EE,                                /* tv_id */
>>>> +  0,                                    /* properties_required */
>>>> +  0,                                    /* properties_provided */
>>>> +  0,                                    /* properties_destroyed */
>>>> +  0,                                    /* todo_flags_start */
>>>> +  TODO_ggc_collect |
>>>> +  TODO_verify_rtl_sharing,              /* todo_flags_finish */
>>>> + }
>>>> +};
>>>> Index: gcc/common.opt
>>>> ===================================================================
>>>> --- gcc/common.opt (revision 189409)
>>>> +++ gcc/common.opt (working copy)
>>>> @@ -1067,6 +1067,10 @@ feliminate-dwarf2-dups
>>>> Common Report Var(flag_eliminate_dwarf2_dups)
>>>> Perform DWARF2 duplicate elimination
>>>>
>>>> +fextension-elimination
>>>> +Common Report Var(flag_ee) Init(0) Optimization
>>>> +Perform extension elimination
>>>> +
>>>> fipa-sra
>>>> Common Report Var(flag_ipa_sra) Init(0) Optimization
>>>> Perform interprocedural reduction of aggregates
>>>> Index: gcc/Makefile.in
>>>> ===================================================================
>>>> --- gcc/Makefile.in (revision 189409)
>>>> +++ gcc/Makefile.in (working copy)
>>>> @@ -1218,6 +1218,7 @@ OBJS = \
>>>> 	dwarf2asm.o \
>>>> 	dwarf2cfi.o \
>>>> 	dwarf2out.o \
>>>> +	ee.o \
>>>> 	ebitmap.o \
>>>> 	emit-rtl.o \
>>>> 	et-forest.o \
>>>> @@ -2971,6 +2972,12 @@ cprop.o : cprop.c $(CONFIG_H) $(SYSTEM_H
>>>>      $(TM_P_H) $(PARAMS_H) cselib.h $(EXCEPT_H) $(TREE_H) $(TIMEVAR_H) \
>>>>      intl.h $(OBSTACK_H) $(TREE_PASS_H) $(DF_H) $(DBGCNT_H) $(TARGET_H) \
>>>>      $(DF_H) $(CFGLOOP_H)
>>>> +ee.o : ee.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) \
>>>> +   hard-reg-set.h $(FLAGS_H) $(BASIC_BLOCK_H) $(FUNCTION_H) output.h \
>>>> +   $(DF_H) $(TIMEVAR_H) tree-pass.h $(RECOG_H) $(EXPR_H) \
>>>> +   $(REGS_H) $(TREE_H) $(TM_P_H) insn-config.h $(INSN_ATTR_H) $(TOPLEV_H) \
>>>> +   $(DIAGNOSTIC_CORE_H) $(TARGET_H) $(OPTABS_H) insn-codes.h rtlhooks-def.h \
>>>> +   $(PARAMS_H) $(CGRAPH_H)
>>>> gcse.o : gcse.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) \
>>>>      $(REGS_H) hard-reg-set.h $(FLAGS_H) insn-config.h $(GGC_H) \
>>>>      $(RECOG_H) $(EXPR_H) $(BASIC_BLOCK_H) $(FUNCTION_H) toplev.h $(DIAGNOSTIC_CORE_H) \
>>>> Index: gcc/passes.c
>>>> ===================================================================
>>>> --- gcc/passes.c (revision 189409)
>>>> +++ gcc/passes.c (working copy)
>>>> @@ -1552,6 +1552,7 @@ init_optimization_passes (void)
>>>>         NEXT_PASS (pass_initialize_regs);
>>>>         NEXT_PASS (pass_ud_rtl_dce);
>>>>         NEXT_PASS (pass_combine);
>>>> +      NEXT_PASS (pass_ee);
>>>>         NEXT_PASS (pass_if_after_combine);
>>>>         NEXT_PASS (pass_partition_blocks);
>>>>         NEXT_PASS (pass_regmove);
>>>
>>
>



* Re: new sign/zero extension elimination pass
  2012-07-12 12:05                     ` Kenneth Zadeck
@ 2012-07-13  7:54                       ` Tom de Vries
  2012-07-13 11:39                         ` Kenneth Zadeck
  2012-07-17 15:17                         ` Kenneth Zadeck
  0 siblings, 2 replies; 43+ messages in thread
From: Tom de Vries @ 2012-07-13  7:54 UTC (permalink / raw)
  To: Kenneth Zadeck
  Cc: Tom de Vries, Eric Botcazou, tom, gcc-patches, Paolo Bonzini

On 12/07/12 14:04, Kenneth Zadeck wrote:
> you are on the right track with the example but combine will not get 
> this unless everything is in the same bb.
> the whole point of having a separate pass for doing extension 
> elimination is that it needs to be done over the entire function.
> 

There is a pass_ree, which does inter-bb combine targeted at extensions.
However, that pass is currently limited to combining extensions with the
definitions of the registers they extend. The way your example sounds, you want
the reverse, where extensions are combined with all their uses.
I would say pass_ree is the natural place to add this and handle the example you
describe.
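
To make the "extension combined with its use" direction concrete, here is a
minimal C sketch. It is not taken from the patch or from pass_ree, and the
function names are hypothetical. It only shows that an equality test on a
zero-extended value can be done on the narrow value, while a signed test
cannot, so whether a use can be narrowed depends on the operation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Use of the extended value: a 64-bit equality test against zero.
   Roughly (eq (zero_extend:DI (reg:SI x)) (const_int 0)).  */
static bool
eq_zero_wide (uint32_t x)
{
  uint64_t wide = (uint64_t) x;	/* the zero extension */
  return wide == 0;
}

/* The same test done directly on the 32-bit value; the extension is
   not needed for this use.  */
static bool
eq_zero_narrow (uint32_t x)
{
  return x == 0;
}

/* A signed "less than zero" test on the zero-extended value.  The upper
   32 bits are always zero, so the 64-bit sign bit is never set.  */
static bool
lt_zero_wide (uint32_t x)
{
  int64_t wide = (int64_t) (uint64_t) x;
  return wide < 0;
}

/* The 32-bit signed test looks at bit 31 instead, so narrowing this
   use would change the result.  */
static bool
lt_zero_narrow (uint32_t x)
{
  return (x & UINT32_C (0x80000000)) != 0;
}
```

For every x the two equality tests agree, whereas for x = 0x80000000 the two
signed tests differ.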

Thanks,
- Tom

> my example is also a little more complex because, since we are talking 
> about induction vars, you have an initial assignment outside of a loop, 
> and increment inside the loop and the test you describe at the bottom of 
> the loop.
> 
> I would point out that with respect to speed optimizations, the case i 
> am describing is in fact very important because getting code out of 
> loops is where the important gains are.   I believe that the ppc has
> some significant performance issues because of this kind of thing.
> 
> kenny
> 
> 
> On 07/12/2012 05:20 AM, Tom de Vries wrote:
>> On 12/07/12 11:05, Tom de Vries wrote:
>>> On 12/07/12 03:39, Kenneth Zadeck wrote:
>>>> Tom,
>>>>
>>>> I have a problem with the approach that you have taken here.   I believe
>>>> that this could be a very useful addition to gcc so I am in general very
>>>> supportive, but i think you are missing an important case.
>>>>
>>>> My problem is that the pass does not actually look at the target and
>>>> make any decisions based on that target.
>>>>
>>>> for instance, we have a llp64 target.   As with many targets, the target
>>>> has a rich set of compare and branch instructions.  In particular, it
>>>> can do both 32 and 64 bit comparisons.    We see that many of the
>>>> upstream optimizations that take int (SI mode) index variables generate
>>>> extension operations before doing 64 bit compare and branch
>>>> instructions, even though there are 32 bit comparison and branches on
>>>> the machine.     There are a lot of machines that can do more than one
>>>> size of comparison.
>>>>
>>>> This optimization pass, as it is currently written, will not remove those
>>>> extensions because it believes that the length of the destination is the
>>>> "final answer" unless it is wrapped in an explicit truncation.
>>>> Instead it needs to ask the port if there is a shorter compare and
>>>> branch instruction that does not cost more. In that case, those
>>>> instructions should be rewritten to use the shorter compare and branch.
>>>>
>>>> There are many operations other than compare and branch where the pass
>>>> should be asking "can i shorten the target for free and therefore get
>>>> rid of the extension?"
>>> Kenneth,
>>>
>>> I'm not sure I understand the optimization you're talking about, in particular
>>> I'm confused about whether the branch range of the 32-bit and 64-bit comparison
>>> is the same.
>>>
>>> Assuming it's the same, my understanding is that you're talking about an example
>>> like this:
>>> ...
>>>    (insn (set (reg:DI 5)
>>>               (zero_extend:DI (reg:SI 4))))
>>>
>>>    (jump_insn (set (pc)
>>>                    (if_then_else (eq (reg:DI 5)
>>>                                      (const_int 0))
>>>                                  (label_ref:DI 62)
>>>                                  (pc))))
>>>
>>>    ->
>>>
>>>    (jump_insn (set (pc)
>>>                    (if_then_else (eq (reg:SI 4)
>>>                                      (const_int 0))
>>>                                  (label_ref:DI 62)
>>>                                  (pc))))
>>>
>>> ...
>>> I would expect combine to optimize this.
>>>
>>> In case I got the example all backwards or it is too simple, please
>>> provide an rtl example that illustrates the optimization.
>>>
>>> Thanks,
>>> - Tom
>>>
>>>
>>>>   right shifts, rotates, and stores are not in
>>>> this class, but left shifts are as are all comparisons, compare and
>>>> branches, conditional moves.   There may even be machines that have this
>>>> for divide, but i do not know of any off the top of my head.
>>>>
>>>> What i am suggesting moves this pass into the target specific set of
>>>> optimizations rather than the target independent set, but given where this
>>>> pass is to be put this is completely appropriate.    Any dest instruction
>>>> where all of the operands have been extended should be checked to see if
>>>> it was really necessary to use the longer form before doing the
>>>> propagation pass.
>>>>
>>>> kenny
>>>>
>>>>
>>>> On 07/11/2012 06:30 AM, Tom de Vries wrote:
>>>>> On 13/11/10 10:50, Eric Botcazou wrote:
>>>>>>> I profiled the pass on spec2000:
>>>>>>>
>>>>>>>                      -mabi=32     -mabi=64
>>>>>>> ee-pass (usr time):     0.70         1.16
>>>>>>> total   (usr time):   919.30       879.26
>>>>>>> ee-pass        (%):     0.08         0.13
>>>>>>>
>>>>>>> The pass takes 0.13% or less of the total usr runtime.
>>>>>> For how many hits?  What are the numbers with --param ee-max-propagate=0?
>>>>>>
>>>>>>> Is it necessary to improve the runtime of this pass?
>>>>>> I've already given my opinion about the implementation.  The other passes in
>>>>>> the compiler try hard not to rescan everything when a single bit changes; as
>>>>>> currently written, yours doesn't.
>>>>>>
>>>>> Eric,
>>>>>
>>>>> I've done the following:
>>>>> - refactored the pass such that it now scans at most twice over all
>>>>>    instructions.
>>>>> - updated the patch to be applicable to current trunk
>>>>> - updated the motivating example to a more applicable one (as discussed in
>>>>>    this thread), and added that one as test-case.
>>>>> - added a part in the header comment illustrating the working of the pass
>>>>>    on the motivating example.
>>>>>
>>>>> bootstrapped and reg-tested on x86_64 and i686.
>>>>>
>>>>> built and reg-tested on mips, mips64, and arm.
>>>>>
>>>>> OK for trunk?
>>>>>
>>>>> Thanks,
>>>>> - Tom
>>>>>
>>>>> 2012-07-10  Tom de Vries  <tom@codesourcery.com>
>>>>>
>>>>> 	* ee.c: New file.
>>>>> 	* tree-pass.h (pass_ee): Declare.
>>>>> 	* opts.c (default_options_table): Set flag_ee at -O2.
>>>>> 	* timevar.def (TV_EE): New timevar.
>>>>> 	* common.opt (fextension-elimination): New option.
>>>>> 	* Makefile.in (ee.o): New rule.
>>>>> 	* passes.c (pass_ee): Add it.
>>>>>
>>>>> 	* gcc.dg/extend-1.c: New test.
>>>>> 	* gcc.dg/extend-2.c: Same.
>>>>> 	* gcc.dg/extend-2-64.c: Same.
>>>>> 	* gcc.dg/extend-3.c: Same.
>>>>> 	* gcc.dg/extend-4.c: Same.
>>>>> 	* gcc.dg/extend-5.c: Same.
>>>>> 	* gcc.target/mips/octeon-bbit-2.c: Make test more robust.
>>>>> Index: gcc/tree-pass.h
>>>>> ===================================================================
>>>>> --- gcc/tree-pass.h (revision 189409)
>>>>> +++ gcc/tree-pass.h (working copy)
>>>>> @@ -483,6 +483,7 @@ extern struct gimple_opt_pass pass_fixup
>>>>>
>>>>> extern struct rtl_opt_pass pass_expand;
>>>>> extern struct rtl_opt_pass pass_instantiate_virtual_regs;
>>>>> +extern struct rtl_opt_pass pass_ee;
>>>>> extern struct rtl_opt_pass pass_rtl_fwprop;
>>>>> extern struct rtl_opt_pass pass_rtl_fwprop_addr;
>>>>> extern struct rtl_opt_pass pass_jump;
>>>>> Index: gcc/testsuite/gcc.target/mips/octeon-bbit-2.c
>>>>> ===================================================================
>>>>> --- gcc/testsuite/gcc.target/mips/octeon-bbit-2.c (revision 189409)
>>>>> +++ gcc/testsuite/gcc.target/mips/octeon-bbit-2.c (working copy)
>>>>> @@ -5,19 +5,19 @@
>>>>> /* { dg-final { scan-assembler "\tbnel\t" } } */
>>>>> /* { dg-final { scan-assembler-not "\tbne\t" } } */
>>>>>
>>>>> -NOMIPS16 int
>>>>> -f (int n, int i)
>>>>> +NOMIPS16 long int
>>>>> +f (long int n, long int i)
>>>>> {
>>>>> -  int s = 0;
>>>>> +  long int s = 0;
>>>>>     for (; i & 1; i++)
>>>>>       s += i;
>>>>>     return s;
>>>>> }
>>>>>
>>>>> -NOMIPS16 int
>>>>> -g (int n, int i)
>>>>> +NOMIPS16 long int
>>>>> +g (long int n, long int i)
>>>>> {
>>>>> -  int s = 0;
>>>>> +  long int s = 0;
>>>>>     for (i = 0; i < n; i++)
>>>>>       s += i;
>>>>>     return s;
>>>>> Index: gcc/testsuite/gcc.dg/extend-4.c
>>>>> ===================================================================
>>>>> --- /dev/null (new file)
>>>>> +++ gcc/testsuite/gcc.dg/extend-4.c (revision 0)
>>>>> @@ -0,0 +1,16 @@
>>>>> +/* { dg-do compile } */
>>>>> +/* { dg-options "-O2 -fdump-rtl-ee" } */
>>>>> +
>>>>> +unsigned char f(unsigned int a, int c)
>>>>> +{
>>>>> +  unsigned int b = a;
>>>>> +  if (c)
>>>>> +    b = a & 0x10ff;
>>>>> +  return b;
>>>>> +}
>>>>> +
>>>>> +/* { dg-final { scan-rtl-dump-times "_extend:" 1 "ee" { target mips*-*-* } } } */
>>>>> +/* { dg-final { scan-rtl-dump-times "and:" 0 "ee" { target mips*-*-* } } } */
>>>>> +/* { dg-final { scan-rtl-dump "redundant extension \[0-9\]+ removed" "ee" { target mips*-*-* } } } */
>>>>> +/* { dg-final { cleanup-rtl-dump "ee" } } */
>>>>> +
>>>>> Index: gcc/testsuite/gcc.dg/extend-1.c
>>>>> ===================================================================
>>>>> --- /dev/null (new file)
>>>>> +++ gcc/testsuite/gcc.dg/extend-1.c (revision 0)
>>>>> @@ -0,0 +1,13 @@
>>>>> +/* { dg-do compile } */
>>>>> +/* { dg-options "-O2 -fdump-rtl-ee" } */
>>>>> +
>>>>> +void f(unsigned char * p, short s, int c, int *z)
>>>>> +{
>>>>> +  if (c)
>>>>> +    *z = 0;
>>>>> +  *p ^= (unsigned char)s;
>>>>> +}
>>>>> +
>>>>> +/* { dg-final { scan-rtl-dump-times "sign_extend:" 0 "ee" { target mips*-*-* } } } */
>>>>> +/* { dg-final { scan-rtl-dump-times "redundant extension \[0-9\]+ replaced" 1 "ee" { target mips*-*-* } } } */
>>>>> +/* { dg-final { cleanup-rtl-dump "ee" } } */
>>>>> Index: gcc/testsuite/gcc.dg/extend-5.c
>>>>> ===================================================================
>>>>> --- /dev/null (new file)
>>>>> +++ gcc/testsuite/gcc.dg/extend-5.c (revision 0)
>>>>> @@ -0,0 +1,13 @@
>>>>> +/* { dg-do compile } */
>>>>> +/* { dg-options "-O2 -fdump-rtl-ee" } */
>>>>> +
>>>>> +void f (short d[2][2])
>>>>> +{
>>>>> +  int d0 = d[0][0] + d[0][1];
>>>>> +  int d1 = d[1][0] + d[1][1];
>>>>> +  d[0][0] = d0 + d1;
>>>>> +  d[0][1] = d0 - d1;
>>>>> +}
>>>>> +
>>>>> +/* { dg-final { scan-rtl-dump-times "redundant extension \[0-9\]+ replaced" 2 "ee" { target mips*-*-* } } } */
>>>>> +/* { dg-final { cleanup-rtl-dump "ee" } } */
>>>>> Index: gcc/testsuite/gcc.dg/extend-2.c
>>>>> ===================================================================
>>>>> --- /dev/null (new file)
>>>>> +++ gcc/testsuite/gcc.dg/extend-2.c (revision 0)
>>>>> @@ -0,0 +1,20 @@
>>>>> +/* { dg-do compile } */
>>>>> +/* { dg-options "-O2 -fdump-rtl-ee" } */
>>>>> +/* { dg-require-effective-target ilp32 } */
>>>>> +
>>>>> +void f(unsigned char * p, short *s, int c)
>>>>> +{
>>>>> +  short or = 0;
>>>>> +  while (c)
>>>>> +    {
>>>>> +      or = or | s[c];
>>>>> +      c --;
>>>>> +    }
>>>>> +  *p = (unsigned char)or;
>>>>> +}
>>>>> +
>>>>> +/* { dg-final { scan-rtl-dump-times "zero_extend" 0 "ee" { target mips*-*-* } } } */
>>>>> +/* { dg-final { scan-rtl-dump-times "sign_extend" 0 "ee" { target mips*-*-* } } } */
>>>>> +/* { dg-final { scan-rtl-dump-times "redundant extension \[0-9\]+ replaced" 2 "ee" { target mips*-*-* } } } */
>>>>> +/* { dg-final { cleanup-rtl-dump "ee" } } */
>>>>> +
>>>>> Index: gcc/testsuite/gcc.dg/extend-2-64.c
>>>>> ===================================================================
>>>>> --- /dev/null (new file)
>>>>> +++ gcc/testsuite/gcc.dg/extend-2-64.c (revision 0)
>>>>> @@ -0,0 +1,20 @@
>>>>> +/* { dg-do compile } */
>>>>> +/* { dg-options "-O2 -fdump-rtl-ee -mabi=64" } */
>>>>> +/* { dg-require-effective-target mips64 } */
>>>>> +
>>>>> +void f(unsigned char * p, short *s, int c)
>>>>> +{
>>>>> +  short or = 0;
>>>>> +  while (c)
>>>>> +    {
>>>>> +      or = or | s[c];
>>>>> +      c --;
>>>>> +    }
>>>>> +  *p = (unsigned char)or;
>>>>> +}
>>>>> +
>>>>> +/* { dg-final { scan-rtl-dump-times "zero_extend:" 0 "ee" { target mips*-*-* } } } */
>>>>> +/* { dg-final { scan-rtl-dump-times "sign_extend:" 1 "ee" { target mips*-*-* } } } */
>>>>> +/* { dg-final { scan-rtl-dump-times "redundant extension \[0-9\]+ replaced" 2 "ee" { target mips*-*-* } } } */
>>>>> +/* { dg-final { cleanup-rtl-dump "ee" } } */
>>>>> +
>>>>> Index: gcc/testsuite/gcc.dg/extend-3.c
>>>>> ===================================================================
>>>>> --- /dev/null (new file)
>>>>> +++ gcc/testsuite/gcc.dg/extend-3.c (revision 0)
>>>>> @@ -0,0 +1,13 @@
>>>>> +/* { dg-do compile } */
>>>>> +/* { dg-options "-O2 -fdump-rtl-ee -mabi=64" } */
>>>>> +/* { dg-require-effective-target mips64 } */
>>>>> +
>>>>> +unsigned int f(unsigned char byte)
>>>>> +{
>>>>> +  return byte << 25;
>>>>> +}
>>>>> +
>>>>> +/* { dg-final { scan-rtl-dump-times "zero_extend:" 0 "ee" { target mips*-*-* } } } */
>>>>> +/* { dg-final { scan-rtl-dump "redundant extension \[0-9\]+ replaced" "ee" } } */
>>>>> +/* { dg-final { cleanup-rtl-dump "ee" } } */
>>>>> +
>>>>> Index: gcc/opts.c
>>>>> ===================================================================
>>>>> --- gcc/opts.c (revision 189409)
>>>>> +++ gcc/opts.c (working copy)
>>>>> @@ -490,6 +490,7 @@ static const struct default_options defa
>>>>>       { OPT_LEVELS_2_PLUS, OPT_ftree_tail_merge, NULL, 1 },
>>>>>       { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_foptimize_strlen, NULL, 1 },
>>>>>       { OPT_LEVELS_2_PLUS, OPT_fhoist_adjacent_loads, NULL, 1 },
>>>>> +    { OPT_LEVELS_2_PLUS, OPT_fextension_elimination, NULL, 1 },
>>>>>
>>>>>       /* -O3 optimizations.  */
>>>>>       { OPT_LEVELS_3_PLUS, OPT_ftree_loop_distribute_patterns, NULL, 1 },
>>>>> Index: gcc/timevar.def
>>>>> ===================================================================
>>>>> --- gcc/timevar.def (revision 189409)
>>>>> +++ gcc/timevar.def (working copy)
>>>>> @@ -201,6 +201,7 @@ DEFTIMEVAR (TV_POST_EXPAND	     , "post
>>>>> DEFTIMEVAR (TV_VARCONST              , "varconst")
>>>>> DEFTIMEVAR (TV_LOWER_SUBREG	     , "lower subreg")
>>>>> DEFTIMEVAR (TV_JUMP                  , "jump")
>>>>> +DEFTIMEVAR (TV_EE                    , "extension elimination")
>>>>> DEFTIMEVAR (TV_FWPROP                , "forward prop")
>>>>> DEFTIMEVAR (TV_CSE                   , "CSE")
>>>>> DEFTIMEVAR (TV_DCE                   , "dead code elimination")
>>>>> Index: gcc/ee.c
>>>>> ===================================================================
>>>>> --- /dev/null (new file)
>>>>> +++ gcc/ee.c (revision 0)
>>>>> @@ -0,0 +1,1190 @@
>>>>> +/* Redundant extension elimination.
>>>>> +   Copyright (C) 2010, 2011, 2012 Free Software Foundation, Inc.
>>>>> +   Contributed by Tom de Vries (tom@codesourcery.com)
>>>>> +
>>>>> +This file is part of GCC.
>>>>> +
>>>>> +GCC is free software; you can redistribute it and/or modify it under
>>>>> +the terms of the GNU General Public License as published by the Free
>>>>> +Software Foundation; either version 3, or (at your option) any later
>>>>> +version.
>>>>> +
>>>>> +GCC is distributed in the hope that it will be useful, but WITHOUT ANY
>>>>> +WARRANTY; without even the implied warranty of MERCHANTABILITY or
>>>>> +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
>>>>> +for more details.
>>>>> +
>>>>> +You should have received a copy of the GNU General Public License
>>>>> +along with GCC; see the file COPYING3.  If not see
>>>>> +<http://www.gnu.org/licenses/>.  */
>>>>> +
>>>>> +/*
>>>>> +
>>>>> +  MOTIVATING EXAMPLE
>>>>> +
>>>>> +  The motivating example for this pass is the example from PR 40893:
>>>>> +
>>>>> +    void f (short d[2][2])
>>>>> +    {
>>>>> +      int d0 = d[0][0] + d[0][1];
>>>>> +      int d1 = d[1][0] + d[1][1];
>>>>> +      d[0][0] = d0 + d1;
>>>>> +      d[0][1] = d0 - d1;
>>>>> +    }
>>>>> +
>>>>> +  For MIPS, compilation results in the following insns.
>>>>> +
>>>>> +    (set (reg:SI 204)
>>>>> +         (zero_extend:SI (subreg:HI (reg:SI 213) 2)))
>>>>> +
>>>>> +    (set (reg:SI 205)
>>>>> +         (zero_extend:SI (subreg:HI (reg:SI 216 [ d1 ]) 2)))
>>>>> +
>>>>> +    (set (reg:SI 217)
>>>>> +         (plus:SI (reg:SI 205)
>>>>> +                  (reg:SI 204)))
>>>>> +
>>>>> +    (set (reg:SI 218)
>>>>> +         (minus:SI (reg:SI 204)
>>>>> +                   (reg:SI 205)))
>>>>> +
>>>>> +    (set (mem:HI (reg/v/f:SI 210))
>>>>> +         (subreg:HI (reg:SI 217) 2))
>>>>> +
>>>>> +    (set (mem:HI (plus:SI (reg/v/f:SI 210)
>>>>> +                 (const_int 2 [0x2])))
>>>>> +         (subreg:HI (reg:SI 218) 2))
>>>>> +
>>>>> +
>>>>> +  The stores only use the lower halves of pseudos 217 and 218, and are the
>>>>> +  only uses.  And the plus and minus operators belong to the class of
>>>>> +  operators where a bit in the result is only influenced by same-or-less
>>>>> +  significant bits in the operands, so the plus and minus insns only use the
>>>>> +  lower halves of pseudos 204 and 205.  Those are also the only uses of pseudos
>>>>> +  204 and 205, so the zero_extends are redundant.
>>>>> +
>>>>> +
>>>>> +  INTENDED EFFECT
>>>>> +
>>>>> +  This pass works by removing sign/zero-extensions, or replacing them with
>>>>> +  regcopies.  The idea there is that the regcopy might be eliminated by a later
>>>>> +  pass.  In case the regcopy cannot be eliminated, it might at least be cheaper
>>>>> +  than the extension.
>>>>> +
>>>>> +
>>>>> +  IMPLEMENTATION
>>>>> +
>>>>> +  The pass scans at most two times over all instructions.
>>>>> +
>>>>> +  The first scan collects all extensions.  If there are no extensions, we're
>>>>> +  done.
>>>>> +
>>>>> +  The second scan registers all uses of a reg in the biggest_use array.
>>>>> +  Additionally, it registers how the use size of a pseudo is propagated to the
>>>>> +  operands of the insns defining the pseudo.
>>>>> +
>>>>> +  The biggest_use array now contains the size in bits of the biggest use
>>>>> +  of each reg, which allows us to find redundant extensions.
>>>>> +
>>>>> +  If there are still non-redundant extensions left, we use the propagation
>>>>> +  information in an iterative fashion to improve the biggest_use array, after
>>>>> +  which we may find more redundant extensions.
>>>>> +
>>>>> +  Finally, redundant extensions are deleted or replaced.
>>>>> +
>>>>> +  In case the src and dest reg of the replacement are not of the same size,
>>>>> +  we do not replace with a normal regcopy, but with a truncate or with the copy
>>>>> +  of a paradoxical subreg instead.
>>>>> +
>>>>> +
>>>>> +  ILLUSTRATION OF PASS
>>>>> +
>>>>> +  The dump of the pass shows us how the pass works on the motivating example.
>>>>> +
>>>>> +  We find the 2 extensions:
>>>>> +    found extension with preserved size 16 defining reg 204
>>>>> +    found extension with preserved size 16 defining reg 205
>>>>> +
>>>>> +  We calculate the biggest uses of a register:
>>>>> +    biggest_use
>>>>> +    reg 204: size 32
>>>>> +    reg 205: size 32
>>>>> +    reg 217: size 16
>>>>> +    reg 218: size 16
>>>>> +
>>>>> +  We propagate the biggest uses where possible:
>>>>> +    propagations
>>>>> +    205: 32 -> 16
>>>>> +    204: 32 -> 16
>>>>> +    214: 32 -> 16
>>>>> +    215: 32 -> 16
>>>>> +
>>>>> +  We conclude that the extensions are redundant:
>>>>> +    found redundant extension with preserved size 16 defining reg 205
>>>>> +    found redundant extension with preserved size 16 defining reg 204
>>>>> +
>>>>> +  And we replace them with regcopies:
>>>>> +    (set (reg:SI 204)
>>>>> +        (reg:SI 213))
>>>>> +
>>>>> +    (set (reg:SI 205)
>>>>> +        (reg:SI 216))
>>>>> +
>>>>> +
>>>>> +  LIMITATIONS
>>>>> +
>>>>> +  The scope of the analysis is limited to an extension and its uses.  The other
>>>>> +  type of analysis (related to the defs of the operand of an extension) is not
>>>>> +  done.
>>>>> +
>>>>> +  Furthermore, we do the analysis of biggest use per reg.  So when determining
>>>>> +  whether an extension is redundant, we take all uses of a dest reg into
>>>>> +  account, also the ones that are not uses of the extension.
>>>>> +  The consideration is that using use-def chains will give a more precise
>>>>> +  analysis, but is much more expensive in terms of runtime.  */
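
A standalone sketch, not part of the patch, of the claim made in the
MOTIVATING EXAMPLE section above: for plus and minus, the low 16 bits of the
result depend only on the low 16 bits of the operands, so zero-extending the
operands first cannot change what a HImode store of the result sees.  All
function names below are hypothetical, and zext16 is only a plain-C model of
the zero_extend.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of (zero_extend:SI (subreg:HI x)): keep the low
   16 bits and clear the rest.  */
static uint32_t
zext16 (uint32_t x)
{
  return x & 0xffffu;
}

/* Low half of a + b, as a HImode store of the sum would see it.  */
static uint16_t
low_half_of_sum (uint32_t a, uint32_t b)
{
  return (uint16_t) (a + b);
}

/* Low half of a - b, as a HImode store of the difference would see it.  */
static uint16_t
low_half_of_diff (uint32_t a, uint32_t b)
{
  return (uint16_t) (a - b);
}

/* Return 1 if extending both operands first leaves both low halves
   unchanged, i.e. the extensions are redundant for these inputs.  */
static int
extension_is_redundant_for (uint32_t a, uint32_t b)
{
  return (low_half_of_sum (zext16 (a), zext16 (b)) == low_half_of_sum (a, b)
          && low_half_of_diff (zext16 (a), zext16 (b))
             == low_half_of_diff (a, b));
}
```

The check holds for any pair of inputs because carries and borrows only
travel from less significant to more significant bits.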
>>>>> +
>>>>> +#include "config.h"
>>>>> +#include "system.h"
>>>>> +#include "coretypes.h"
>>>>> +#include "tm.h"
>>>>> +#include "rtl.h"
>>>>> +#include "tree.h"
>>>>> +#include "tm_p.h"
>>>>> +#include "flags.h"
>>>>> +#include "regs.h"
>>>>> +#include "hard-reg-set.h"
>>>>> +#include "basic-block.h"
>>>>> +#include "insn-config.h"
>>>>> +#include "function.h"
>>>>> +#include "expr.h"
>>>>> +#include "insn-attr.h"
>>>>> +#include "recog.h"
>>>>> +#include "toplev.h"
>>>>> +#include "target.h"
>>>>> +#include "timevar.h"
>>>>> +#include "optabs.h"
>>>>> +#include "insn-codes.h"
>>>>> +#include "rtlhooks-def.h"
>>>>> +#include "output.h"
>>>>> +#include "params.h"
>>>>> +#include "timevar.h"
>>>>> +#include "tree-pass.h"
>>>>> +#include "cgraph.h"
>>>>> +#include "vec.h"
>>>>> +
>>>>> +#define SKIP_REG (-1)
>>>>> +#define NONE (-1)
>>>>> +
>>>>> +/* Number of registers at start of pass.  */
>>>>> +
>>>>> +static int n_regs;
>>>>> +
>>>>> +/* Array to register the biggest use of a reg, in bits.  */
>>>>> +
>>>>> +static int *biggest_use;
>>>>> +
>>>>> +/* Array to register the promoted subregs.  */
>>>>> +
>>>>> +static VEC (rtx,heap) **promoted_subreg;
>>>>> +
>>>>> +/* Array to register for a reg what the last propagated size is.  */
>>>>> +
>>>>> +static int *propagated_size;
>>>>> +
>>>>> +typedef struct use
>>>>> +{
>>>>> +  int regno;
>>>>> +  int size;
>>>>> +  int offset;
>>>>> +  rtx *use;
>>>>> +} use_type;
>>>>> +
>>>>> +DEF_VEC_O(use_type);
>>>>> +DEF_VEC_ALLOC_O(use_type,heap);
>>>>> +
>>>>> +/* Vector to register the uses.  */
>>>>> +
>>>>> +static VEC (use_type,heap) **uses;
>>>>> +
>>>>> +typedef struct prop
>>>>> +{
>>>>> +  rtx set;
>>>>> +  int uses_regno;
>>>>> +  int uses_index;
>>>>> +} prop_type;
>>>>> +
>>>>> +DEF_VEC_O(prop_type);
>>>>> +DEF_VEC_ALLOC_O(prop_type,heap);
>>>>> +
>>>>> +/* Vector to register the propagations.  */
>>>>> +
>>>>> +static VEC (prop_type,heap) **props;
>>>>> +
>>>>> +/* Work list for propagation.  */
>>>>> +
>>>>> +static VEC (int,heap) *wl;
>>>>> +
>>>>> +/* Array to register what regs are in the work list.  */
>>>>> +
>>>>> +static bool *in_wl;
>>>>> +
>>>>> +/* Vector that contains the extensions in the function.  */
>>>>> +
>>>>> +static VEC (rtx,heap) *extensions;
>>>>> +
>>>>> +/* Vector that contains the extensions in the function that are going to be
>>>>> +   removed or replaced.  */
>>>>> +
>>>>> +static VEC (rtx,heap) *redundant_extensions;
>>>>> +
>>>>> +/* Forward declaration.  */
>>>>> +
>>>>> +static void note_use (rtx *x, void *data);
>>>>> +static bool skip_reg_p (int regno);
>>>>> +static void register_prop (rtx set, use_type *use);
>>>>> +
>>>>> +/* Check whether SUBREG is a promoted subreg.  */
>>>>> +
>>>>> +static bool
>>>>> +promoted_subreg_p (rtx subreg)
>>>>> +{
>>>>> +  return (GET_CODE (subreg) == SUBREG
>>>>> +	  && SUBREG_PROMOTED_VAR_P (subreg));
>>>>> +}
>>>>> +
>>>>> +/* Check whether SUBREG is a promoted subreg for which we cannot reset the
>>>>> +   promotion.  */
>>>>> +
>>>>> +static bool
>>>>> +fixed_promoted_subreg_p (rtx subreg)
>>>>> +{
>>>>> +  int mre;
>>>>> +
>>>>> +  if (!promoted_subreg_p (subreg))
>>>>> +    return false;
>>>>> +
>>>>> +  mre = targetm.mode_rep_extended (GET_MODE (subreg),
>>>>> +				   GET_MODE (SUBREG_REG (subreg)));
>>>>> +  return mre != UNKNOWN;
>>>>> +}
>>>>> +
>>>>> +/* Attempt to return the size, reg number and offset of USE in SIZE, REGNO and
>>>>> +   OFFSET.  Return true if successful.  */
>>>>> +
>>>>> +static bool
>>>>> +reg_use_p (rtx use, int *size, unsigned int *regno, int *offset)
>>>>> +{
>>>>> +  rtx reg;
>>>>> +
>>>>> +  if (REG_P (use))
>>>>> +    {
>>>>> +      *regno = REGNO (use);
>>>>> +      *offset = 0;
>>>>> +      *size = GET_MODE_BITSIZE (GET_MODE (use));
>>>>> +      return true;
>>>>> +    }
>>>>> +  else if (GET_CODE (use) == SUBREG)
>>>>> +    {
>>>>> +      reg = SUBREG_REG (use);
>>>>> +
>>>>> +      if (!REG_P (reg))
>>>>> +	return false;
>>>>> +
>>>>> +      *regno = REGNO (reg);
>>>>> +
>>>>> +      if (paradoxical_subreg_p (use) || fixed_promoted_subreg_p (use))
>>>>> +	{
>>>>> +	  *offset = 0;
>>>>> +	  *size = GET_MODE_BITSIZE (GET_MODE (reg));
>>>>> +	}
>>>>> +      else
>>>>> +	{
>>>>> +	  *offset = subreg_lsb (use);
>>>>> +	  *size = *offset + GET_MODE_BITSIZE (GET_MODE (use));
>>>>> +	}
>>>>> +
>>>>> +      return true;
>>>>> +    }
>>>>> +
>>>>> +  return false;
>>>>> +}
>>>>> +
>>>>> +/* Create a new empty entry in the uses[REGNO] vector.  */
>>>>> +
>>>>> +static use_type *
>>>>> +new_use (unsigned int regno)
>>>>> +{
>>>>> +  if (uses[regno] == NULL)
>>>>> +    uses[regno] = VEC_alloc (use_type, heap, 4);
>>>>> +
>>>>> +  VEC_safe_push (use_type, heap, uses[regno], NULL);
>>>>> +
>>>>> +  return VEC_last (use_type, uses[regno]);
>>>>> +}
>>>>> +
>>>>> +/* Register a USE of reg REGNO with SIZE and OFFSET.  */
>>>>> +
>>>>> +static use_type *
>>>>> +register_use (int size, unsigned int regno, int offset, rtx *use)
>>>>> +{
>>>>> +  int *current;
>>>>> +  use_type *p;
>>>>> +
>>>>> +  gcc_assert (size >= 0);
>>>>> +  gcc_assert (regno < (unsigned int)n_regs);
>>>>> +
>>>>> +  if (skip_reg_p (regno))
>>>>> +    return NULL;
>>>>> +
>>>>> +  p = new_use (regno);
>>>>> +  p->regno = regno;
>>>>> +  p->size = size;
>>>>> +  p->offset = offset;
>>>>> +  p->use = use;
>>>>> +
>>>>> +  /* Update the biggest use.  */
>>>>> +  current = &biggest_use[regno];
>>>>> +  *current = MAX (*current, size);
>>>>> +
>>>>> +  return p;
>>>>> +}
>>>>> +
>>>>> +/* Handle embedded uses in USE, which is a part of PATTERN.  */
>>>>> +
>>>>> +static void
>>>>> +note_embedded_uses (rtx use, rtx pattern)
>>>>> +{
>>>>> +  const char *format_ptr;
>>>>> +  int i, j;
>>>>> +
>>>>> +  format_ptr = GET_RTX_FORMAT (GET_CODE (use));
>>>>> +  for (i = 0; i < GET_RTX_LENGTH (GET_CODE (use)); i++)
>>>>> +    if (format_ptr[i] == 'e')
>>>>> +      note_use (&XEXP (use, i), pattern);
>>>>> +    else if (format_ptr[i] == 'E')
>>>>> +      for (j = 0; j < XVECLEN (use, i); j++)
>>>>> +	note_use (&XVECEXP (use, i, j), pattern);
>>>>> +}
>>>>> +
>>>>> +/* Get the set in PATTERN that has USE as its src operand.  */
>>>>> +
>>>>> +static rtx
>>>>> +get_set (rtx use, rtx pattern)
>>>>> +{
>>>>> +  rtx sub;
>>>>> +  int i;
>>>>> +
>>>>> +  if (GET_CODE (pattern) == SET && SET_SRC (pattern) == use)
>>>>> +    return pattern;
>>>>> +
>>>>> +  if (GET_CODE (pattern) == PARALLEL)
>>>>> +    for (i = 0; i < XVECLEN (pattern, 0); ++i)
>>>>> +      {
>>>>> +	sub = XVECEXP (pattern, 0, i);
>>>>> +	if (GET_CODE (sub) == SET && SET_SRC (sub) == use)
>>>>> +	  return sub;
>>>>> +      }
>>>>> +
>>>>> +  return NULL_RTX;
>>>>> +}
>>>>> +
>>>>> +/* Handle a restricted op USE with NR_OPERANDS.  USE is a part of SET, which is
>>>>> +   a part of PATTERN.  In this context restricted means that a bit in
>>>>> +   an operand influences only the same bit or more significant bits in the
>>>>> +   result.  The bitwise ops are a subclass, but PLUS is one as well.  */
>>>>> +
>>>>> +static void
>>>>> +note_restricted_op_use (rtx set, rtx use, unsigned int nr_operands, rtx pattern)
>>>>> +{
>>>>> +  unsigned int i, smallest;
>>>>> +  int operand_size[2];
>>>>> +  int operand_offset[2];
>>>>> +  int used_size;
>>>>> +  unsigned int operand_regno[2];
>>>>> +  bool operand_reg[2];
>>>>> +  bool operand_ignore[2];
>>>>> +  use_type *p;
>>>>> +
>>>>> +  /* Init operand_reg, operand_size, operand_regno and operand_ignore.  */
>>>>> +  for (i = 0; i < nr_operands; ++i)
>>>>> +    {
>>>>> +      operand_reg[i] = reg_use_p (XEXP (use, i), &operand_size[i],
>>>>> +				  &operand_regno[i], &operand_offset[i]);
>>>>> +      operand_ignore[i] = false;
>>>>> +    }
>>>>> +
>>>>> +  /* Handle case of reg and-masked with const.  */
>>>>> +  if (GET_CODE (use) == AND && CONST_INT_P (XEXP (use, 1)) && operand_reg[0])
>>>>> +    {
>>>>> +      used_size =
>>>>> +	HOST_BITS_PER_WIDE_INT - clz_hwi (UINTVAL (XEXP (use, 1)));
>>>>> +      operand_size[0] = MIN (operand_size[0], used_size);
>>>>> +    }
>>>>> +
>>>>> +  /* Handle case of reg or-masked with const.  */
>>>>> +  if (GET_CODE (use) == IOR && CONST_INT_P (XEXP (use, 1)) && operand_reg[0])
>>>>> +    {
>>>>> +      used_size =
>>>>> +	HOST_BITS_PER_WIDE_INT - clz_hwi (~UINTVAL (XEXP (use, 1)));
>>>>> +      operand_size[0] = MIN (operand_size[0], used_size);
>>>>> +    }
>>>>> +
>>>>> +  /* Ignore the use of a in 'a = a + b'.  */
>>>>> +  /* TODO: handle GET_CODE ((SET_DEST (set))) == SUBREG.  */
>>>>> +  if (set != NULL_RTX && REG_P (SET_DEST (set)))
>>>>> +    for (i = 0; i < nr_operands; ++i)
>>>>> +      operand_ignore[i] = (operand_reg[i]
>>>>> +			   && (REGNO (SET_DEST (set)) == operand_regno[i]));
>>>>> +
>>>>> +  /* Handle the case a reg is combined with don't care bits.  */
>>>>> +  if (nr_operands == 2 && operand_reg[0] && operand_reg[1]
>>>>> +      && operand_size[0] != operand_size[1])
>>>>> +    {
>>>>> +      smallest = operand_size[0] > operand_size[1];
>>>>> +
>>>>> +      if (paradoxical_subreg_p (XEXP (use, smallest)))
>>>>> +	operand_size[1 - smallest] = operand_size[smallest];
>>>>> +    }
>>>>> +
>>>>> +  /* Register the operand use, if necessary.  */
>>>>> +  for (i = 0; i < nr_operands; ++i)
>>>>> +    if (!operand_reg[i])
>>>>> +      note_use (&XEXP (use, i), pattern);
>>>>> +    else if (!operand_ignore[i])
>>>>> +      {
>>>>> +	p = register_use (operand_size[i], operand_regno[i], operand_offset[i],
>>>>> +			  &XEXP (use, i));
>>>>> +	register_prop (set, p);
>>>>> +      }
>>>>> +}
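
A minimal sketch, not part of the patch, of the and-mask case handled in
note_restricted_op_use above.  It assumes that the expression
HOST_BITS_PER_WIDE_INT - clz_hwi (mask) is meant as "position of the highest
set mask bit, plus one", and models that with a portable loop on a 64-bit
value.  The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Number of low bits of a register that can reach the result of
   (reg & mask): the position of the highest set mask bit, plus one.
   Plain-C stand-in (an assumption) for
   HOST_BITS_PER_WIDE_INT - clz_hwi (mask).  */
static int
and_mask_used_bits (uint64_t mask)
{
  int bits = 0;

  while (mask != 0)
    {
      bits++;
      mask >>= 1;
    }

  return bits;
}
```

For the mask 0x10ff used in extend-4.c, this model gives a use size of 13
bits, since the highest set bit is bit 12.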
>>>>> +
>>>>> +/* Register promoted SUBREG in promoted_subreg.  */
>>>>> +
>>>>> +static void
>>>>> +register_promoted_subreg (rtx subreg)
>>>>> +{
>>>>> +  int index = REGNO (SUBREG_REG (subreg));
>>>>> +
>>>>> +  if (promoted_subreg[index] == NULL)
>>>>> +    promoted_subreg[index] = VEC_alloc (rtx, heap, 10);
>>>>> +
>>>>> +  VEC_safe_push (rtx, heap, promoted_subreg[index], subreg);
>>>>> +}
>>>>> +
>>>>> +/* Note promoted subregs in X.  */
>>>>> +
>>>>> +static int
>>>>> +note_promoted_subreg (rtx *x, void *y ATTRIBUTE_UNUSED)
>>>>> +{
>>>>> +  rtx subreg = *x;
>>>>> +
>>>>> +  if (promoted_subreg_p (subreg) && !fixed_promoted_subreg_p (subreg)
>>>>> +      && REG_P (SUBREG_REG (subreg)))
>>>>> +    register_promoted_subreg (subreg);
>>>>> +
>>>>> +  return 0;
>>>>> +}
>>>>> +
>>>>> +/* Handle use X in pattern DATA noted by note_uses.  */
>>>>> +
>>>>> +static void
>>>>> +note_use (rtx *x, void *data)
>>>>> +{
>>>>> +  rtx use = *x;
>>>>> +  rtx pattern = (rtx)data;
>>>>> +  int use_size, use_offset;
>>>>> +  unsigned int use_regno;
>>>>> +  rtx set;
>>>>> +  use_type *p;
>>>>> +
>>>>> +  for_each_rtx (x, note_promoted_subreg, NULL);
>>>>> +
>>>>> +  set = get_set (use, pattern);
>>>>> +
>>>>> +  switch (GET_CODE (use))
>>>>> +    {
>>>>> +    case REG:
>>>>> +    case SUBREG:
>>>>> +      if (!reg_use_p (use, &use_size, &use_regno, &use_offset))
>>>>> +	{
>>>>> +	  note_embedded_uses (use, pattern);
>>>>> +	  return;
>>>>> +	}
>>>>> +      p = register_use (use_size, use_regno, use_offset, x);
>>>>> +      register_prop (set, p);
>>>>> +      return;
>>>>> +    case SIGN_EXTEND:
>>>>> +    case ZERO_EXTEND:
>>>>> +      if (!reg_use_p (XEXP (use, 0), &use_size, &use_regno, &use_offset))
>>>>> +	{
>>>>> +	  note_embedded_uses (use, pattern);
>>>>> +	  return;
>>>>> +	}
>>>>> +      p = register_use (use_size, use_regno, use_offset, x);
>>>>> +      register_prop (set, p);
>>>>> +      return;
>>>>> +    case IOR:
>>>>> +    case AND:
>>>>> +    case XOR:
>>>>> +    case PLUS:
>>>>> +    case MINUS:
>>>>> +      note_restricted_op_use (set, use, 2, pattern);
>>>>> +      return;
>>>>> +    case NOT:
>>>>> +    case NEG:
>>>>> +      note_restricted_op_use (set, use, 1, pattern);
>>>>> +      return;
>>>>> +    case ASHIFT:
>>>>> +      if (!reg_use_p (XEXP (use, 0), &use_size, &use_regno, &use_offset)
>>>>> +	  || !CONST_INT_P (XEXP (use, 1))
>>>>> +	  || INTVAL (XEXP (use, 1)) <= 0
>>>>> +	  || paradoxical_subreg_p (XEXP (use, 0)))
>>>>> +	{
>>>>> +	  note_embedded_uses (use, pattern);
>>>>> +	  return;
>>>>> +	}
>>>>> +      (void)register_use (use_size - INTVAL (XEXP (use, 1)), use_regno,
>>>>> +			  use_offset, x);
>>>>> +      return;
>>>>> +    default:
>>>>> +      note_embedded_uses (use, pattern);
>>>>> +      return;
>>>>> +    }
>>>>> +}
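
A small sketch, not part of the patch, of the reasoning behind the ASHIFT case
in note_use above: a left shift by a positive constant K discards the top K
bits of the operand, so only the low (size - K) bits are used.  The names
below are hypothetical, and the 32-bit width is an assumption chosen to match
the shift by 25 in extend-3.c.

```c
#include <assert.h>
#include <stdint.h>

/* Use size of the shifted operand: the operand width minus the shift
   count.  Mirrors the expression use_size - INTVAL (XEXP (use, 1)).  */
static int
ashift_used_bits (int operand_bits, int shift)
{
  /* The pass only takes this path for a positive constant shift.  */
  assert (shift > 0 && shift <= operand_bits);
  return operand_bits - shift;
}

/* A 32-bit left shift by 25, used to check the claim on real values.  */
static uint32_t
shl25 (uint32_t x)
{
  return x << 25;
}
```

Two inputs that agree in their low 7 bits give the same result from shl25,
whatever their upper bits contain.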
>>>>> +
>>>>> +/* Check whether reg REGNO is implicitly used.  */
>>>>> +
>>>>> +static bool
>>>>> +implicit_use_p (int regno ATTRIBUTE_UNUSED)
>>>>> +{
>>>>> +#ifdef EPILOGUE_USES
>>>>> +  if (EPILOGUE_USES (regno))
>>>>> +    return true;
>>>>> +#endif
>>>>> +
>>>>> +#ifdef EH_USES
>>>>> +  if (EH_USES (regno))
>>>>> +    return true;
>>>>> +#endif
>>>>> +
>>>>> +  return false;
>>>>> +}
>>>>> +
>>>>> +/* Check whether reg REGNO should be skipped in analysis.  */
>>>>> +
>>>>> +static bool
>>>>> +skip_reg_p (int regno)
>>>>> +{
>>>>> +  /* TODO: handle hard registers.  The problem with hard registers is that
>>>>> +     a DI use of r0 can mean a 64bit use of r0 and a 32 bit use of r1.
>>>>> +     We don't handle that properly.  */
>>>>> +  return implicit_use_p (regno) || HARD_REGISTER_NUM_P (regno);
>>>>> +}
>>>>> +
>>>>> +/* Note the uses of argument registers in call INSN.  */
>>>>> +
>>>>> +static void
>>>>> +note_call_uses (rtx insn)
>>>>> +{
>>>>> +  rtx link, link_expr;
>>>>> +
>>>>> +  if (!CALL_P (insn))
>>>>> +    return;
>>>>> +
>>>>> +  for (link = CALL_INSN_FUNCTION_USAGE (insn); link; link = XEXP (link, 1))
>>>>> +    {
>>>>> +      link_expr = XEXP (link, 0);
>>>>> +
>>>>> +      if (GET_CODE (link_expr) == USE)
>>>>> +	note_use (&XEXP (link_expr, 0), link);
>>>>> +    }
>>>>> +}
>>>>> +
>>>>> +/* Dump the biggest uses found.  */
>>>>> +
>>>>> +static void
>>>>> +dump_biggest_use (void)
>>>>> +{
>>>>> +  int i;
>>>>> +
>>>>> +  if (!dump_file)
>>>>> +    return;
>>>>> +
>>>>> +  fprintf (dump_file, "biggest_use:\n");
>>>>> +
>>>>> +  for (i = 0; i < n_regs; i++)
>>>>> +    if (biggest_use[i] > 0)
>>>>> +      fprintf (dump_file, "reg %d: size %d\n", i, biggest_use[i]);
>>>>> +
>>>>> +  fprintf (dump_file, "\n");
>>>>> +}
>>>>> +
>>>>> +/* Calculate the biggest use mode for all regs.  */
>>>>> +
>>>>> +static void
>>>>> +calculate_biggest_use (void)
>>>>> +{
>>>>> +  basic_block bb;
>>>>> +  rtx insn;
>>>>> +
>>>>> +  /* For all insns, call note_use for each use in insn.  */
>>>>> +  FOR_EACH_BB (bb)
>>>>> +    FOR_BB_INSNS (bb, insn)
>>>>> +      {
>>>>> +	if (!NONDEBUG_INSN_P (insn))
>>>>> +	  continue;
>>>>> +
>>>>> +	note_uses (&PATTERN (insn), note_use, PATTERN (insn));
>>>>> +
>>>>> +	if (CALL_P (insn))
>>>>> +	  note_call_uses (insn);
>>>>> +      }
>>>>> +
>>>>> +  dump_biggest_use ();
>>>>> +}
>>>>> +
>>>>> +/* Register a propagation USE in SET in the props vector.  */
>>>>> +
>>>>> +static void
>>>>> +register_prop (rtx set, use_type *use)
>>>>> +{
>>>>> +  prop_type *p;
>>>>> +  int regno;
>>>>> +
>>>>> +  if (set == NULL_RTX || use == NULL)
>>>>> +    return;
>>>>> +
>>>>> +  if (!REG_P (SET_DEST (set)))
>>>>> +    return;
>>>>> +
>>>>> +  regno = REGNO (SET_DEST (set));
>>>>> +
>>>>> +  if (skip_reg_p (regno))
>>>>> +    return;
>>>>> +
>>>>> +  if (props[regno] == NULL)
>>>>> +    props[regno] = VEC_alloc (prop_type, heap, 4);
>>>>> +
>>>>> +  VEC_safe_push (prop_type, heap, props[regno], NULL);
>>>>> +  p = VEC_last (prop_type, props[regno]);
>>>>> +  p->set = set;
>>>>> +  p->uses_regno = use->regno;
>>>>> +  p->uses_index = VEC_length (use_type, uses[use->regno]) - 1;
>>>>> +}
>>>>> +
>>>>> +/* Add REGNO to the worklist.  */
>>>>> +
>>>>> +static void
>>>>> +add_to_wl (int regno)
>>>>> +{
>>>>> +  if (in_wl[regno])
>>>>> +    return;
>>>>> +
>>>>> +  if (biggest_use[regno] > 0
>>>>> +      && biggest_use[regno] == GET_MODE_BITSIZE (PSEUDO_REGNO_MODE (regno)))
>>>>> +    return;
>>>>> +
>>>>> +  if (VEC_empty (prop_type, props[regno]))
>>>>> +    return;
>>>>> +
>>>>> +  if (propagated_size[regno] != NONE
>>>>> +      && propagated_size[regno] == biggest_use[regno])
>>>>> +    return;
>>>>> +
>>>>> +  VEC_safe_push (int, heap, wl, regno);
>>>>> +  in_wl[regno] = true;
>>>>> +}
>>>>> +
>>>>> +/* Pop a reg from the worklist and return it.  */
>>>>> +
>>>>> +static int
>>>>> +pop_wl (void)
>>>>> +{
>>>>> +  int regno = VEC_pop (int, wl);
>>>>> +  in_wl[regno] = false;
>>>>> +  return regno;
>>>>> +}
>>>>> +
>>>>> +/* Propagate the use size DEST_SIZE of a reg to use P.  */
>>>>> +
>>>>> +static int
>>>>> +propagate_size (int dest_size, use_type *p)
>>>>> +{
>>>>> +  if (dest_size == 0)
>>>>> +    return 0;
>>>>> +
>>>>> +  return p->offset + MIN (p->size - p->offset, dest_size);
>>>>> +}
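
A worked sketch, not part of the patch, of the arithmetic in propagate_size
above.  It takes the size and offset of the use as plain integers instead of
a use_type pointer; the function name is hypothetical.

```c
#include <assert.h>

/* Same computation as propagate_size, with the use passed as two ints.
   USE_SIZE counts bits from bit 0 up to the top of the use, and
   USE_OFFSET is the lowest bit that the use reads.  */
static int
propagated_use_size (int dest_size, int use_size, int use_offset)
{
  int width = use_size - use_offset;

  assert (use_size >= use_offset);

  /* Nothing of the destination is used, so nothing of the operand is.  */
  if (dest_size == 0)
    return 0;

  /* Only the low DEST_SIZE bits of the use can matter, counted from the
     offset at which the use starts.  */
  return use_offset + (width < dest_size ? width : dest_size);
}
```

With a full 32-bit use at offset 0 and a destination of which only 16 bits are
used, the result is 16, which matches the "32 -> 16" lines in the dump shown
in the header comment.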
>>>>> +
>>>>> +/* Get the biggest use of REGNO from the uses vector.  */
>>>>> +
>>>>> +static int
>>>>> +get_biggest_use (unsigned int regno)
>>>>> +{
>>>>> +  int ix;
>>>>> +  use_type *p;
>>>>> +  int max = 0;
>>>>> +
>>>>> +  gcc_assert (uses[regno] != NULL);
>>>>> +
>>>>> +  FOR_EACH_VEC_ELT (use_type, uses[regno], ix, p)
>>>>> +    max = MAX (max, p->size);
>>>>> +
>>>>> +  return max;
>>>>> +}
>>>>> +
>>>>> +/* Propagate the use size DEST_SIZE of a reg to the uses in USE.  */
>>>>> +
>>>>> +static void
>>>>> +propagate_to_use (int dest_size, use_type *use)
>>>>> +{
>>>>> +  int new_use_size;
>>>>> +  int prev_biggest_use;
>>>>> +  int *current;
>>>>> +
>>>>> +  new_use_size = propagate_size (dest_size, use);
>>>>> +
>>>>> +  if (new_use_size >= use->size)
>>>>> +    return;
>>>>> +
>>>>> +  use->size = new_use_size;
>>>>> +
>>>>> +  current = &biggest_use[use->regno];
>>>>> +
>>>>> +  prev_biggest_use = *current;
>>>>> +  *current = get_biggest_use (use->regno);
>>>>> +
>>>>> +  if (*current >= prev_biggest_use)
>>>>> +    return;
>>>>> +
>>>>> +  add_to_wl (use->regno);
>>>>> +
>>>>> +  if (dump_file)
>>>>> +    fprintf (dump_file, "%d: %d -> %d\n", use->regno, prev_biggest_use,
>>>>> +	     *current);
>>>>> +
>>>>> +}
>>>>> +
>>>>> +/* Propagate the biggest use of a reg REGNO to all its uses, and note
>>>>> +   propagations in NR_PROPAGATIONS.  */
>>>>> +
>>>>> +static void
>>>>> +propagate_to_uses (int regno, int *nr_propagations)
>>>>> +{
>>>>> +  int ix;
>>>>> +  prop_type *p;
>>>>> +
>>>>> +  gcc_assert (!(propagated_size[regno] != NONE
>>>>> +		&& propagated_size[regno] == biggest_use[regno]));
>>>>> +
>>>>> +  FOR_EACH_VEC_ELT (prop_type, props[regno], ix, p)
>>>>> +    {
>>>>> +      use_type *use = VEC_index (use_type, uses[p->uses_regno], p->uses_index);
>>>>> +      propagate_to_use (biggest_use[regno], use);
>>>>> +      ++(*nr_propagations);
>>>>> +    }
>>>>> +
>>>>> +  propagated_size[regno] = biggest_use[regno];
>>>>> +}
>>>>> +
>>>>> +/* Improve biggest_use array iteratively.  */
>>>>> +
>>>>> +static void
>>>>> +propagate (void)
>>>>> +{
>>>>> +  int i;
>>>>> +  int nr_propagations = 0;
>>>>> +
>>>>> +  /* Initialize work list.  */
>>>>> +
>>>>> +  for (i = 0; i < n_regs; ++i)
>>>>> +    add_to_wl (i);
>>>>> +
>>>>> +  /* Work the work list.  */
>>>>> +
>>>>> +  if (dump_file)
>>>>> +    fprintf (dump_file, "propagations: \n");
>>>>> +  while (!VEC_empty (int, wl))
>>>>> +    propagate_to_uses (pop_wl (), &nr_propagations);
>>>>> +
>>>>> +  if (dump_file)
>>>>> +    fprintf (dump_file, "\nnr_propagations: %d\n\n", nr_propagations);
>>>>> +}
>>>>> +
>>>>> +/* Check whether this is a sign/zero extension.  */
>>>>> +
>>>>> +static bool
>>>>> +extension_p (rtx insn, rtx *dest, rtx *inner, int *preserved_size)
>>>>> +{
>>>>> +  rtx src, op0;
>>>>> +
>>>>> +  /* Detect set of reg.  */
>>>>> +  if (GET_CODE (PATTERN (insn)) != SET)
>>>>> +    return false;
>>>>> +
>>>>> +  src = SET_SRC (PATTERN (insn));
>>>>> +  *dest = SET_DEST (PATTERN (insn));
>>>>> +
>>>>> +  if (!REG_P (*dest))
>>>>> +    return false;
>>>>> +
>>>>> +  /* Detect sign or zero extension.  */
>>>>> +  if (GET_CODE (src) == ZERO_EXTEND || GET_CODE (src) == SIGN_EXTEND
>>>>> +      || (GET_CODE (src) == AND && CONST_INT_P (XEXP (src, 1))))
>>>>> +    {
>>>>> +      op0 = XEXP (src, 0);
>>>>> +
>>>>> +      /* Determine amount of least significant bits preserved by operation.  */
>>>>> +      if (GET_CODE (src) == AND)
>>>>> +	*preserved_size = ctz_hwi (~UINTVAL (XEXP (src, 1)));
>>>>> +      else
>>>>> +	*preserved_size = GET_MODE_BITSIZE (GET_MODE (op0));
>>>>> +
>>>>> +      if (GET_CODE (op0) == SUBREG)
>>>>> +	{
>>>>> +	  if (subreg_lsb (op0) != 0)
>>>>> +	    return false;
>>>>> +
>>>>> +	  *inner = SUBREG_REG (op0);
>>>>> +
>>>>> +	  if (GET_MODE_CLASS (GET_MODE (*dest))
>>>>> +	      != GET_MODE_CLASS (GET_MODE (*inner)))
>>>>> +	    return false;
>>>>> +
>>>>> +	  return true;
>>>>> +	}
>>>>> +      else if (REG_P (op0))
>>>>> +	{
>>>>> +	  *inner = op0;
>>>>> +
>>>>> +	  if (GET_MODE_CLASS (GET_MODE (*dest))
>>>>> +	      != GET_MODE_CLASS (GET_MODE (*inner)))
>>>>> +	    return false;
>>>>> +
>>>>> +	  return true;
>>>>> +	}
>>>>> +      else if (GET_CODE (op0) == TRUNCATE)
>>>>> +	{
>>>>> +	  *inner = XEXP (op0, 0);
>>>>> +	  return true;
>>>>> +	}
>>>>> +    }
>>>>> +
>>>>> +  return false;
>>>>> +}
>>>>> +
>>>>> +/* Find extensions and store them in the extensions vector.  */
>>>>> +
>>>>> +static bool
>>>>> +find_extensions (void)
>>>>> +{
>>>>> +  basic_block bb;
>>>>> +  rtx insn, dest, inner;
>>>>> +  int preserved_size;
>>>>> +
>>>>> +  /* For all insns, record each sign/zero extension found.  */
>>>>> +  FOR_EACH_BB (bb)
>>>>> +    FOR_BB_INSNS (bb, insn)
>>>>> +      {
>>>>> +	if (!NONDEBUG_INSN_P (insn))
>>>>> +	  continue;
>>>>> +
>>>>> +	if (!extension_p (insn, &dest, &inner, &preserved_size))
>>>>> +	  continue;
>>>>> +
>>>>> +	VEC_safe_push (rtx, heap, extensions, insn);
>>>>> +
>>>>> +	if (dump_file)
>>>>> +	  fprintf (dump_file,
>>>>> +		   "found extension %u with preserved size %d defining"
>>>>> +		   " reg %d\n",
>>>>> +		   INSN_UID (insn), preserved_size, REGNO (dest));
>>>>> +      }
>>>>> +
>>>>> +  if (dump_file)
>>>>> +    {
>>>>> +      if (!VEC_empty (rtx, extensions))
>>>>> +	fprintf (dump_file, "\n");
>>>>> +      else
>>>>> +	fprintf (dump_file, "no extensions found.\n");
>>>>> +    }
>>>>> +
>>>>> +  return !VEC_empty (rtx, extensions);
>>>>> +}
>>>>> +
>>>>> +/* Check whether this is a redundant sign/zero extension.  */
>>>>> +
>>>>> +static bool
>>>>> +redundant_extension_p (rtx insn, rtx *dest, rtx *inner, int *preserved_size)
>>>>> +{
>>>>> +  int biggest_dest_use;
>>>>> +
>>>>> +  if (!extension_p (insn, dest, inner, preserved_size))
>>>>> +    gcc_unreachable ();
>>>>> +
>>>>> +  biggest_dest_use = biggest_use[REGNO (*dest)];
>>>>> +
>>>>> +  if (biggest_dest_use == SKIP_REG)
>>>>> +    return false;
>>>>> +
>>>>> +  if (*preserved_size < biggest_dest_use)
>>>>> +    return false;
>>>>> +
>>>>> +  return true;
>>>>> +}
>>>>> +
>>>>> +/* Find the redundant extensions in the extensions vector and move them to the
>>>>> +   redundant_extensions vector.  */
>>>>> +
>>>>> +static void
>>>>> +find_redundant_extensions (void)
>>>>> +{
>>>>> +  rtx insn, dest, inner;
>>>>> +  int ix;
>>>>> +  bool found = false;
>>>>> +  int preserved_size;
>>>>> +
>>>>> +  FOR_EACH_VEC_ELT_REVERSE (rtx, extensions, ix, insn)
>>>>> +    if (redundant_extension_p (insn, &dest, &inner, &preserved_size))
>>>>> +      {
>>>>> +	VEC_safe_push (rtx, heap, redundant_extensions, insn);
>>>>> +	VEC_unordered_remove (rtx, extensions, ix);
>>>>> +
>>>>> +	if (dump_file)
>>>>> +	  fprintf (dump_file,
>>>>> +		   "found redundant extension %u with preserved size %d"
>>>>> +		   " defining reg %d\n",
>>>>> +		   INSN_UID (insn), preserved_size, REGNO (dest));
>>>>> +	found = true;
>>>>> +      }
>>>>> +
>>>>> +  if (dump_file && found)
>>>>> +    fprintf (dump_file, "\n");
>>>>> +}
>>>>> +
>>>>> +/* Reset promotion of subregs of REG.  */
>>>>> +
>>>>> +static void
>>>>> +reset_promoted_subreg (rtx reg)
>>>>> +{
>>>>> +  int ix;
>>>>> +  rtx subreg;
>>>>> +
>>>>> +  if (promoted_subreg[REGNO (reg)] == NULL)
>>>>> +    return;
>>>>> +
>>>>> +  FOR_EACH_VEC_ELT (rtx, promoted_subreg[REGNO (reg)], ix, subreg)
>>>>> +    {
>>>>> +      SUBREG_PROMOTED_UNSIGNED_SET (subreg, 0);
>>>>> +      SUBREG_PROMOTED_VAR_P (subreg) = 0;
>>>>> +    }
>>>>> +
>>>>> +  VEC_free (rtx, heap, promoted_subreg[REGNO (reg)]);
>>>>> +}
>>>>> +
>>>>> +/* Try to remove or replace the redundant extension INSN which extends INNER and
>>>>> +   writes to DEST.  */
>>>>> +
>>>>> +static void
>>>>> +try_remove_or_replace_extension (rtx insn, rtx dest, rtx inner)
>>>>> +{
>>>>> +  rtx cp_src, cp_dest, seq = NULL_RTX, one;
>>>>> +
>>>>> +  /* Check whether replacement is needed.  */
>>>>> +  if (dest != inner)
>>>>> +    {
>>>>> +      start_sequence ();
>>>>> +
>>>>> +      /* Determine the proper replacement operation.  */
>>>>> +      if (GET_MODE (dest) == GET_MODE (inner))
>>>>> +	{
>>>>> +	  cp_src = inner;
>>>>> +	  cp_dest = dest;
>>>>> +	}
>>>>> +      else if (GET_MODE_SIZE (GET_MODE (dest))
>>>>> +	       > GET_MODE_SIZE (GET_MODE (inner)))
>>>>> +	{
>>>>> +	  emit_clobber (dest);
>>>>> +	  cp_src = inner;
>>>>> +	  cp_dest = gen_lowpart_SUBREG (GET_MODE (inner), dest);
>>>>> +	}
>>>>> +      else
>>>>> +	{
>>>>> +	  cp_src = gen_lowpart_SUBREG (GET_MODE (dest), inner);
>>>>> +	  cp_dest = dest;
>>>>> +	}
>>>>> +
>>>>> +      emit_move_insn (cp_dest, cp_src);
>>>>> +
>>>>> +      seq = get_insns ();
>>>>> +      end_sequence ();
>>>>> +
>>>>> +      /* If the replacement is not supported, bail out.  */
>>>>> +      for (one = seq; one != NULL_RTX; one = NEXT_INSN (one))
>>>>> +	if (recog_memoized (one) < 0 && GET_CODE (PATTERN (one)) != CLOBBER)
>>>>> +	  return;
>>>>> +
>>>>> +      /* Insert the replacement.  */
>>>>> +      emit_insn_before (seq, insn);
>>>>> +    }
>>>>> +
>>>>> +  /* Note replacement/removal in the dump.  */
>>>>> +  if (dump_file)
>>>>> +    {
>>>>> +      fprintf (dump_file, "redundant extension %u ", INSN_UID (insn));
>>>>> +      if (dest != inner)
>>>>> +	fprintf (dump_file, "replaced by %u\n", INSN_UID (seq));
>>>>> +      else
>>>>> +	fprintf (dump_file, "removed\n");
>>>>> +    }
>>>>> +
>>>>> +  /* Remove the extension.  */
>>>>> +  delete_insn (insn);
>>>>> +
>>>>> +  reset_promoted_subreg (dest);
>>>>> +}
>>>>> +
>>>>> +/* Set up the variables at the start of the pass.  */
>>>>> +
>>>>> +static void
>>>>> +init_pass (void)
>>>>> +{
>>>>> +  int i;
>>>>> +
>>>>> +  biggest_use = XNEWVEC (int, n_regs);
>>>>> +  promoted_subreg = XCNEWVEC (VEC (rtx,heap) *, n_regs);
>>>>> +  propagated_size = XNEWVEC (int, n_regs);
>>>>> +
>>>>> +  /* Initialize biggest_use for all regs to 0.  If a reg is used implicitly, we
>>>>> +     handle that reg conservatively and set it to SKIP_REG instead.  */
>>>>> +  for (i = 0; i < n_regs; i++)
>>>>> +    {
>>>>> +      biggest_use[i] = skip_reg_p (i) ? SKIP_REG : 0;
>>>>> +      propagated_size[i] = NONE;
>>>>> +    }
>>>>> +
>>>>> +  extensions = VEC_alloc (rtx, heap, 10);
>>>>> +  redundant_extensions = VEC_alloc (rtx, heap, 10);
>>>>> +
>>>>> +  wl = VEC_alloc (int, heap, 50);
>>>>> +  in_wl = XNEWVEC (bool, n_regs);
>>>>> +
>>>>> +  uses = XNEWVEC (typeof (*uses), n_regs);
>>>>> +  props = XNEWVEC (typeof (*props), n_regs);
>>>>> +
>>>>> +  for (i = 0; i < n_regs; ++i)
>>>>> +    {
>>>>> +      uses[i] = NULL;
>>>>> +      props[i] = NULL;
>>>>> +      in_wl[i] = false;
>>>>> +    }
>>>>> +}
>>>>> +
>>>>> +/* Find redundant extensions and remove or replace them if possible.  */
>>>>> +
>>>>> +static void
>>>>> +remove_redundant_extensions (void)
>>>>> +{
>>>>> +  rtx insn, dest, inner;
>>>>> +  int preserved_size;
>>>>> +  int ix;
>>>>> +
>>>>> +  if (!find_extensions ())
>>>>> +    return;
>>>>> +
>>>>> +  calculate_biggest_use ();
>>>>> +
>>>>> +  find_redundant_extensions ();
>>>>> +
>>>>> +  if (!VEC_empty (rtx, extensions))
>>>>> +    {
>>>>> +      propagate ();
>>>>> +
>>>>> +      find_redundant_extensions ();
>>>>> +    }
>>>>> +
>>>>> +  gcc_checking_assert (n_regs == max_reg_num ());
>>>>> +
>>>>> +  FOR_EACH_VEC_ELT (rtx, redundant_extensions, ix, insn)
>>>>> +    {
>>>>> +      extension_p (insn, &dest, &inner, &preserved_size);
>>>>> +      try_remove_or_replace_extension (insn, dest, inner);
>>>>> +    }
>>>>> +
>>>>> +  if (dump_file)
>>>>> +    fprintf (dump_file, "\n");
>>>>> +}
>>>>> +
>>>>> +/* Free the variables at the end of the pass.  */
>>>>> +
>>>>> +static void
>>>>> +finish_pass (void)
>>>>> +{
>>>>> +  int i;
>>>>> +
>>>>> +  XDELETEVEC (propagated_size);
>>>>> +
>>>>> +  VEC_free (rtx, heap, extensions);
>>>>> +  VEC_free (rtx, heap, redundant_extensions);
>>>>> +
>>>>> +  VEC_free (int, heap, wl);
>>>>> +
>>>>> +  for (i = 0; i < n_regs; ++i)
>>>>> +    {
>>>>> +      if (uses[i] != NULL)
>>>>> +	VEC_free (use_type, heap, uses[i]);
>>>>> +
>>>>> +      if (props[i] != NULL)
>>>>> +	VEC_free (prop_type, heap, props[i]);
>>>>> +    }
>>>>> +
>>>>> +  XDELETEVEC (uses);
>>>>> +  XDELETEVEC (props);
>>>>> +  XDELETEVEC (biggest_use);
>>>>> +
>>>>> +  for (i = 0; i < n_regs; ++i)
>>>>> +    if (promoted_subreg[i] != NULL)
>>>>> +      VEC_free (rtx, heap, promoted_subreg[i]);
>>>>> +  XDELETEVEC (promoted_subreg);
>>>>> +}
>>>>> +
>>>>> +/* Remove redundant extensions.  */
>>>>> +
>>>>> +static unsigned int
>>>>> +rest_of_handle_ee (void)
>>>>> +{
>>>>> +  n_regs = max_reg_num ();
>>>>> +
>>>>> +  init_pass ();
>>>>> +  remove_redundant_extensions ();
>>>>> +  finish_pass ();
>>>>> +  return 0;
>>>>> +}
>>>>> +
>>>>> +/* Run ee pass when flag_ee is set at optimization level > 0.  */
>>>>> +
>>>>> +static bool
>>>>> +gate_handle_ee (void)
>>>>> +{
>>>>> +  return (optimize > 0 && flag_ee);
>>>>> +}
>>>>> +
>>>>> +struct rtl_opt_pass pass_ee =
>>>>> +{
>>>>> + {
>>>>> +  RTL_PASS,
>>>>> +  "ee",                                 /* name */
>>>>> +  gate_handle_ee,                       /* gate */
>>>>> +  rest_of_handle_ee,                    /* execute */
>>>>> +  NULL,                                 /* sub */
>>>>> +  NULL,                                 /* next */
>>>>> +  0,                                    /* static_pass_number */
>>>>> +  TV_EE,                                /* tv_id */
>>>>> +  0,                                    /* properties_required */
>>>>> +  0,                                    /* properties_provided */
>>>>> +  0,                                    /* properties_destroyed */
>>>>> +  0,                                    /* todo_flags_start */
>>>>> +  TODO_ggc_collect |
>>>>> +  TODO_verify_rtl_sharing,              /* todo_flags_finish */
>>>>> + }
>>>>> +};
>>>>> Index: gcc/common.opt
>>>>> ===================================================================
>>>>> --- gcc/common.opt (revision 189409)
>>>>> +++ gcc/common.opt (working copy)
>>>>> @@ -1067,6 +1067,10 @@ feliminate-dwarf2-dups
>>>>> Common Report Var(flag_eliminate_dwarf2_dups)
>>>>> Perform DWARF2 duplicate elimination
>>>>>
>>>>> +fextension-elimination
>>>>> +Common Report Var(flag_ee) Init(0) Optimization
>>>>> +Perform extension elimination
>>>>> +
>>>>> fipa-sra
>>>>> Common Report Var(flag_ipa_sra) Init(0) Optimization
>>>>> Perform interprocedural reduction of aggregates
>>>>> Index: gcc/Makefile.in
>>>>> ===================================================================
>>>>> --- gcc/Makefile.in (revision 189409)
>>>>> +++ gcc/Makefile.in (working copy)
>>>>> @@ -1218,6 +1218,7 @@ OBJS = \
>>>>> 	dwarf2asm.o \
>>>>> 	dwarf2cfi.o \
>>>>> 	dwarf2out.o \
>>>>> +	ee.o \
>>>>> 	ebitmap.o \
>>>>> 	emit-rtl.o \
>>>>> 	et-forest.o \
>>>>> @@ -2971,6 +2972,12 @@ cprop.o : cprop.c $(CONFIG_H) $(SYSTEM_H
>>>>>      $(TM_P_H) $(PARAMS_H) cselib.h $(EXCEPT_H) $(TREE_H) $(TIMEVAR_H) \
>>>>>      intl.h $(OBSTACK_H) $(TREE_PASS_H) $(DF_H) $(DBGCNT_H) $(TARGET_H) \
>>>>>      $(DF_H) $(CFGLOOP_H)
>>>>> +ee.o : ee.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) \
>>>>> +   hard-reg-set.h $(FLAGS_H) $(BASIC_BLOCK_H) $(FUNCTION_H) output.h \
>>>>> +   $(DF_H) $(TIMEVAR_H) tree-pass.h $(RECOG_H) $(EXPR_H) \
>>>>> +   $(REGS_H) $(TREE_H) $(TM_P_H) insn-config.h $(INSN_ATTR_H) $(TOPLEV_H) \
>>>>> +   $(DIAGNOSTIC_CORE_H) $(TARGET_H) $(OPTABS_H) insn-codes.h rtlhooks-def.h \
>>>>> +   $(PARAMS_H) $(CGRAPH_H)
>>>>> gcse.o : gcse.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) \
>>>>>      $(REGS_H) hard-reg-set.h $(FLAGS_H) insn-config.h $(GGC_H) \
>>>>>      $(RECOG_H) $(EXPR_H) $(BASIC_BLOCK_H) $(FUNCTION_H) toplev.h $(DIAGNOSTIC_CORE_H) \
>>>>> Index: gcc/passes.c
>>>>> ===================================================================
>>>>> --- gcc/passes.c (revision 189409)
>>>>> +++ gcc/passes.c (working copy)
>>>>> @@ -1552,6 +1552,7 @@ init_optimization_passes (void)
>>>>>         NEXT_PASS (pass_initialize_regs);
>>>>>         NEXT_PASS (pass_ud_rtl_dce);
>>>>>         NEXT_PASS (pass_combine);
>>>>> +      NEXT_PASS (pass_ee);
>>>>>         NEXT_PASS (pass_if_after_combine);
>>>>>         NEXT_PASS (pass_partition_blocks);
>>>>>         NEXT_PASS (pass_regmove);
>>>>
>>>
>>
> 
> 


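A minimal standalone sketch of three calculations from the quoted ee.c patch
above, for illustration only; it is not part of the patch.  It assumes that
ctz_hwi counts trailing zero bits (trailing_zeros below is a hypothetical
stand-in for it), that all sizes are in bits, and that the offset and size
arguments correspond to the use_type fields of the same names.  The names
and_preserved_size and redundant_p are invented here, and the SKIP_REG and
NONE sentinels of the patch are not modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for ctz_hwi: count trailing zero bits of X,
   returning 64 when X is zero.  */
static int
trailing_zeros (uint64_t x)
{
  int n = 0;

  if (x == 0)
    return 64;

  while ((x & 1) == 0)
    {
      x >>= 1;
      n++;
    }

  return n;
}

/* Number of least significant bits preserved by (and x MASK): the length of
   the run of one bits at the bottom of MASK.  */
static int
and_preserved_size (uint64_t mask)
{
  return trailing_zeros (~mask);
}

/* Final test of redundant_extension_p: the extension is redundant when it
   preserves at least as many low bits as the biggest use reads.  Sentinel
   values such as SKIP_REG are assumed to be filtered out by the caller.  */
static int
redundant_p (int preserved_size, int biggest_use)
{
  assert (preserved_size >= 0 && biggest_use >= 0);
  return preserved_size >= biggest_use;
}

/* Same arithmetic as propagate_size in the patch, with the use_type fields
   passed as plain integers.  */
static int
propagate_size (int dest_size, int offset, int size)
{
  int rest = size - offset;

  if (dest_size == 0)
    return 0;

  return offset + (rest < dest_size ? rest : dest_size);
}
```

With the mask 0x10ff from extend-4.c, and_preserved_size returns 8, so an
8-bit biggest use makes redundant_p return 1; a 16-bit use would not.  A
32-bit use at offset 0 feeding a reg of which only 16 bits are used shrinks
to propagate_size (16, 0, 32), which is 16.
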
* Re: new sign/zero extension elimination pass
  2012-07-13  7:54                       ` Tom de Vries
@ 2012-07-13 11:39                         ` Kenneth Zadeck
  2012-07-13 12:58                           ` Tom de Vries
  2012-07-17 15:17                         ` Kenneth Zadeck
  1 sibling, 1 reply; 43+ messages in thread
From: Kenneth Zadeck @ 2012-07-13 11:39 UTC (permalink / raw)
  To: Tom de Vries; +Cc: Tom de Vries, Eric Botcazou, tom, gcc-patches, Paolo Bonzini

it really is not.

the problem is that sign extension removal is just a more difficult 
problem than what you are considering.  You have attacked a small part 
of the problem and have a good start but you really should consider the 
whole problem.

kenny


On 07/13/2012 03:53 AM, Tom de Vries wrote:
> On 12/07/12 14:04, Kenneth Zadeck wrote:
>> you are on the right track with the example but combine will not get
>> this unless everything is in the same bb.
>> the whole point of having a separate pass for doing extension
>> elimination is that it needs to be done over the entire function.
>>
> There is a pass_ree, which does inter-bb combine targeted at extensions.
> However, that pass is currently limited to combining extensions with the
> definitions of the register it extends. The way your example sounds, you want
> the reverse, where extensions are combined with all their uses.
> I would say pass_ree is the natural place to add this and handle the example you
> describe.
>
> Thanks,
> - Tom
>
>> my example is also a little more complex because, since we are talking
>> about induction vars, you have an initial assignment outside of a loop,
>> and increment inside the loop and the test you describe at the bottom of
>> the loop.
>>
>> I would point out that with respect to speed optimizations, the case i
>> am describing is in fact very important because getting code out of
>> loops is where the important gains are.   I believe that the ppc has
>> some significant performance issues because of this kind of thing.
>>
>> kenny
>>
>>
>> On 07/12/2012 05:20 AM, Tom de Vries wrote:
>>> On 12/07/12 11:05, Tom de Vries wrote:
>>>> On 12/07/12 03:39, Kenneth Zadeck wrote:
>>>>> Tom,
>>>>>
>>>>> I have a problem with the approach that you have taken here.   I believe
>>>>> that this could be a very useful addition to gcc so I am in general very
>>>>> supportive, but i think you are missing an important case.
>>>>>
>>>>> My problem is that the pass does not actually look at the target and
>>>>> make any decisions based on that target.
>>>>>
>>>>> for instance, we have a llp64 target.   As with many targets, the target
>>>>> has a rich set of compare and branch instructions.  In particular, it
>>>>> can do both 32 and 64 bit comparisons.    We see that many of the
>>>>> upstream optimizations that take int (SI mode) index variables generate
>>>>> extension operations before doing 64 bit compare and branch
>>>>> instructions, even though there are 32 bit comparison and branches on
>>>>> the machine.     There are a lot of machines that can do more than one
>>>>> size of comparison.
>>>>>
>>>>> This optimization pass, as it is currently written, will not remove those
>>>>> extensions because it believes that the length of the destination is the
>>>>> "final answer" unless it is wrapped in an explicit truncation.
>>>>> Instead it needs to ask the port if there is a shorter compare and
>>>>> branch instruction that does not cost more. in that case, those
>>>>> instructions should be rewritten to use the shorter compare and branch.
>>>>>
>>>>> There are many operations other than compare and branch where the pass
>>>>> should be asking "can i shorten the target for free and therefore get
>>>>> rid of the extension?"
>>>> Kenneth,
>>>>
>>>> I'm not sure I understand the optimization you're talking about, in particular
>>>> I'm confused about whether the branch range of the 32-bit and 64-bit comparison
>>>> is the same.
>>>>
>>>> Assuming it's the same, my understanding is that you're talking about an example
>>>> like this:
>>>> ...
>>>>     (insn (set (reg:DI 5)
>>>>                (zero_extend:DI (reg:SI 4))))
>>>>
>>>>     (jump_insn (set (pc)
>>>>                     (if_then_else (eq (reg:DI 5)
>>>>                                       (const_int 0))
>>>>                                   (label_ref:DI 62)
>>>>                                   (pc))))
>>>>
>>>>     ->
>>>>
>>>>     (jump_insn (set (pc)
>>>>                     (if_then_else (eq (reg:SI 4)
>>>>                                       (const_int 0))
>>>>                                   (label_ref:DI 62)
>>>>                                   (pc))))
>>>>
>>>> ...
>>>> I would expect combine to optimize this.
>>>>
>>>> In case I got the example all backwards or it is too simple, please
>>>> provide an rtl example that illustrates the optimization.
>>>>
>>>> Thanks,
>>>> - Tom
>>>>
>>>>
>>>>>    right shifts, rotates, and stores are not in
>>>>> this class, but left shifts are as are all comparisons, compare and
>>>>> branches, conditional moves.   There may even be machines that have this
>>>>> for divide, but i do not know of any off the top of my head.
>>>>>
>>>>> What i am suggesting moves this pass into the target specific set of
>>>>> optimizations rather than target independent set, but given where this
>>>>> pass is to be put, that is completely appropriate.    Any dest instruction
>>>>> where all of the operands have been extended should be checked to see if
>>>>> it was really necessary to use the longer form before doing the
>>>>> propagation pass.
>>>>>
>>>>> kenny
>>>>>
>>>>>
>>>>> On 07/11/2012 06:30 AM, Tom de Vries wrote:
>>>>>> On 13/11/10 10:50, Eric Botcazou wrote:
>>>>>>>> I profiled the pass on spec2000:
>>>>>>>>
>>>>>>>>                       -mabi=32     -mabi=64
>>>>>>>> ee-pass (usr time):     0.70         1.16
>>>>>>>> total   (usr time):   919.30       879.26
>>>>>>>> ee-pass        (%):     0.08         0.13
>>>>>>>>
>>>>>>>> The pass takes 0.13% or less of the total usr runtime.
>>>>>>> For how many hits?  What are the numbers with --param ee-max-propagate=0?
>>>>>>>
>>>>>>>> Is it necessary to improve the runtime of this pass?
>>>>>>> I've already given my opinion about the implementation.  The other passes in
>>>>>>> the compiler try hard not to rescan everything when a single bit changes; as
>>>>>>> currently written, yours doesn't.
>>>>>>>
>>>>>> Eric,
>>>>>>
>>>>>> I've done the following:
>>>>>> - refactored the pass such that it now scans at most twice over all
>>>>>>     instructions.
>>>>>> - updated the patch to be applicable to current trunk
>>>>>> - updated the motivating example to a more applicable one (as discussed in
>>>>>>     this thread), and added that one as test-case.
>>>>>> - added a part in the header comment illustrating the working of the pass
>>>>>>     on the motivating example.
>>>>>>
>>>>>> bootstrapped and reg-tested on x86_64 and i686.
>>>>>>
>>>>>> built and reg-tested on mips, mips64, and arm.
>>>>>>
>>>>>> OK for trunk?
>>>>>>
>>>>>> Thanks,
>>>>>> - Tom
>>>>>>
>>>>>> 2012-07-10  Tom de Vries  <tom@codesourcery.com>
>>>>>>
>>>>>> 	* ee.c: New file.
>>>>>> 	* tree-pass.h (pass_ee): Declare.
>>>>>> 	* opts.c (default_options_table): Set flag_ee at -O2.
>>>>>> 	* timevar.def (TV_EE): New timevar.
>>>>>> 	* common.opt (fextension-elimination): New option.
>>>>>> 	* Makefile.in (ee.o): New rule.
>>>>>> 	* passes.c (pass_ee): Add it.
>>>>>>
>>>>>> 	* gcc.dg/extend-1.c: New test.
>>>>>> 	* gcc.dg/extend-2.c: Same.
>>>>>> 	* gcc.dg/extend-2-64.c: Same.
>>>>>> 	* gcc.dg/extend-3.c: Same.
>>>>>> 	* gcc.dg/extend-4.c: Same.
>>>>>> 	* gcc.dg/extend-5.c: Same.
>>>>>> 	* gcc.target/mips/octeon-bbit-2.c: Make test more robust.
>>>>>> Index: gcc/tree-pass.h
>>>>>> ===================================================================
>>>>>> --- gcc/tree-pass.h (revision 189409)
>>>>>> +++ gcc/tree-pass.h (working copy)
>>>>>> @@ -483,6 +483,7 @@ extern struct gimple_opt_pass pass_fixup
>>>>>>
>>>>>> extern struct rtl_opt_pass pass_expand;
>>>>>> extern struct rtl_opt_pass pass_instantiate_virtual_regs;
>>>>>> +extern struct rtl_opt_pass pass_ee;
>>>>>> extern struct rtl_opt_pass pass_rtl_fwprop;
>>>>>> extern struct rtl_opt_pass pass_rtl_fwprop_addr;
>>>>>> extern struct rtl_opt_pass pass_jump;
>>>>>> Index: gcc/testsuite/gcc.target/mips/octeon-bbit-2.c
>>>>>> ===================================================================
>>>>>> --- gcc/testsuite/gcc.target/mips/octeon-bbit-2.c (revision 189409)
>>>>>> +++ gcc/testsuite/gcc.target/mips/octeon-bbit-2.c (working copy)
>>>>>> @@ -5,19 +5,19 @@
>>>>>> /* { dg-final { scan-assembler "\tbnel\t" } } */
>>>>>> /* { dg-final { scan-assembler-not "\tbne\t" } } */
>>>>>>
>>>>>> -NOMIPS16 int
>>>>>> -f (int n, int i)
>>>>>> +NOMIPS16 long int
>>>>>> +f (long int n, long int i)
>>>>>> {
>>>>>> -  int s = 0;
>>>>>> +  long int s = 0;
>>>>>>      for (; i & 1; i++)
>>>>>>        s += i;
>>>>>>      return s;
>>>>>> }
>>>>>>
>>>>>> -NOMIPS16 int
>>>>>> -g (int n, int i)
>>>>>> +NOMIPS16 long int
>>>>>> +g (long int n, long int i)
>>>>>> {
>>>>>> -  int s = 0;
>>>>>> +  long int s = 0;
>>>>>>      for (i = 0; i < n; i++)
>>>>>>        s += i;
>>>>>>      return s;
>>>>>> Index: gcc/testsuite/gcc.dg/extend-4.c
>>>>>> ===================================================================
>>>>>> --- /dev/null (new file)
>>>>>> +++ gcc/testsuite/gcc.dg/extend-4.c (revision 0)
>>>>>> @@ -0,0 +1,16 @@
>>>>>> +/* { dg-do compile } */
>>>>>> +/* { dg-options "-O2 -fdump-rtl-ee" } */
>>>>>> +
>>>>>> +unsigned char f(unsigned int a, int c)
>>>>>> +{
>>>>>> +  unsigned int b = a;
>>>>>> +  if (c)
>>>>>> +    b = a & 0x10ff;
>>>>>> +  return b;
>>>>>> +}
>>>>>> +
>>>>>> +/* { dg-final { scan-rtl-dump-times "_extend:" 1 "ee" { target mips*-*-* } } } */
>>>>>> +/* { dg-final { scan-rtl-dump-times "and:" 0 "ee" { target mips*-*-* } } } */
>>>>>> +/* { dg-final { scan-rtl-dump "redundant extension \[0-9\]+ removed" "ee" { target mips*-*-* } } } */
>>>>>> +/* { dg-final { cleanup-rtl-dump "ee" } } */
>>>>>> +
>>>>>> Index: gcc/testsuite/gcc.dg/extend-1.c
>>>>>> ===================================================================
>>>>>> --- /dev/null (new file)
>>>>>> +++ gcc/testsuite/gcc.dg/extend-1.c (revision 0)
>>>>>> @@ -0,0 +1,13 @@
>>>>>> +/* { dg-do compile } */
>>>>>> +/* { dg-options "-O2 -fdump-rtl-ee" } */
>>>>>> +
>>>>>> +void f(unsigned char * p, short s, int c, int *z)
>>>>>> +{
>>>>>> +  if (c)
>>>>>> +    *z = 0;
>>>>>> +  *p ^= (unsigned char)s;
>>>>>> +}
>>>>>> +
>>>>>> +/* { dg-final { scan-rtl-dump-times "sign_extend:" 0 "ee" { target mips*-*-* } } } */
>>>>>> +/* { dg-final { scan-rtl-dump-times "redundant extension \[0-9\]+ replaced" 1 "ee" { target mips*-*-* } } } */
>>>>>> +/* { dg-final { cleanup-rtl-dump "ee" } } */
>>>>>> Index: gcc/testsuite/gcc.dg/extend-5.c
>>>>>> ===================================================================
>>>>>> --- /dev/null (new file)
>>>>>> +++ gcc/testsuite/gcc.dg/extend-5.c (revision 0)
>>>>>> @@ -0,0 +1,13 @@
>>>>>> +/* { dg-do compile } */
>>>>>> +/* { dg-options "-O2 -fdump-rtl-ee" } */
>>>>>> +
>>>>>> +void f (short d[2][2])
>>>>>> +{
>>>>>> +  int d0 = d[0][0] + d[0][1];
>>>>>> +  int d1 = d[1][0] + d[1][1];
>>>>>> +  d[0][0] = d0 + d1;
>>>>>> +  d[0][1] = d0 - d1;
>>>>>> +}
>>>>>> +
>>>>>> +/* { dg-final { scan-rtl-dump-times "redundant extension \[0-9\]+ replaced" 2 "ee" { target mips*-*-* } } } */
>>>>>> +/* { dg-final { cleanup-rtl-dump "ee" } } */
>>>>>> Index: gcc/testsuite/gcc.dg/extend-2.c
>>>>>> ===================================================================
>>>>>> --- /dev/null (new file)
>>>>>> +++ gcc/testsuite/gcc.dg/extend-2.c (revision 0)
>>>>>> @@ -0,0 +1,20 @@
>>>>>> +/* { dg-do compile } */
>>>>>> +/* { dg-options "-O2 -fdump-rtl-ee" } */
>>>>>> +/* { dg-require-effective-target ilp32 } */
>>>>>> +
>>>>>> +void f(unsigned char * p, short *s, int c)
>>>>>> +{
>>>>>> +  short or = 0;
>>>>>> +  while (c)
>>>>>> +    {
>>>>>> +      or = or | s[c];
>>>>>> +      c --;
>>>>>> +    }
>>>>>> +  *p = (unsigned char)or;
>>>>>> +}
>>>>>> +
>>>>>> +/* { dg-final { scan-rtl-dump-times "zero_extend" 0 "ee" { target mips*-*-* } } } */
>>>>>> +/* { dg-final { scan-rtl-dump-times "sign_extend" 0 "ee" { target mips*-*-* } } } */
>>>>>> +/* { dg-final { scan-rtl-dump-times "redundant extension \[0-9\]+ replaced" 2 "ee" { target mips*-*-* } } } */
>>>>>> +/* { dg-final { cleanup-rtl-dump "ee" } } */
>>>>>> +
>>>>>> Index: gcc/testsuite/gcc.dg/extend-2-64.c
>>>>>> ===================================================================
>>>>>> --- /dev/null (new file)
>>>>>> +++ gcc/testsuite/gcc.dg/extend-2-64.c (revision 0)
>>>>>> @@ -0,0 +1,20 @@
>>>>>> +/* { dg-do compile } */
>>>>>> +/* { dg-options "-O2 -fdump-rtl-ee -mabi=64" } */
>>>>>> +/* { dg-require-effective-target mips64 } */
>>>>>> +
>>>>>> +void f(unsigned char * p, short *s, int c)
>>>>>> +{
>>>>>> +  short or = 0;
>>>>>> +  while (c)
>>>>>> +    {
>>>>>> +      or = or | s[c];
>>>>>> +      c --;
>>>>>> +    }
>>>>>> +  *p = (unsigned char)or;
>>>>>> +}
>>>>>> +
>>>>>> +/* { dg-final { scan-rtl-dump-times "zero_extend:" 0 "ee" { target mips*-*-* } } } */
>>>>>> +/* { dg-final { scan-rtl-dump-times "sign_extend:" 1 "ee" { target mips*-*-* } } } */
>>>>>> +/* { dg-final { scan-rtl-dump-times "redundant extension \[0-9\]+ replaced" 2 "ee" { target mips*-*-* } } } */
>>>>>> +/* { dg-final { cleanup-rtl-dump "ee" } } */
>>>>>> +
>>>>>> Index: gcc/testsuite/gcc.dg/extend-3.c
>>>>>> ===================================================================
>>>>>> --- /dev/null (new file)
>>>>>> +++ gcc/testsuite/gcc.dg/extend-3.c (revision 0)
>>>>>> @@ -0,0 +1,13 @@
>>>>>> +/* { dg-do compile } */
>>>>>> +/* { dg-options "-O2 -fdump-rtl-ee -mabi=64" } */
>>>>>> +/* { dg-require-effective-target mips64 } */
>>>>>> +
>>>>>> +unsigned int f(unsigned char byte)
>>>>>> +{
>>>>>> +  return byte << 25;
>>>>>> +}
>>>>>> +
>>>>>> +/* { dg-final { scan-rtl-dump-times "zero_extend:" 0 "ee" { target mips*-*-* } } } */
>>>>>> +/* { dg-final { scan-rtl-dump "redundant extension \[0-9\]+ replaced" "ee" } } */
>>>>>> +/* { dg-final { cleanup-rtl-dump "ee" } } */
>>>>>> +
>>>>>> Index: gcc/opts.c
>>>>>> ===================================================================
>>>>>> --- gcc/opts.c (revision 189409)
>>>>>> +++ gcc/opts.c (working copy)
>>>>>> @@ -490,6 +490,7 @@ static const struct default_options defa
>>>>>>        { OPT_LEVELS_2_PLUS, OPT_ftree_tail_merge, NULL, 1 },
>>>>>>        { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_foptimize_strlen, NULL, 1 },
>>>>>>        { OPT_LEVELS_2_PLUS, OPT_fhoist_adjacent_loads, NULL, 1 },
>>>>>> +    { OPT_LEVELS_2_PLUS, OPT_fextension_elimination, NULL, 1 },
>>>>>>
>>>>>>        /* -O3 optimizations.  */
>>>>>>        { OPT_LEVELS_3_PLUS, OPT_ftree_loop_distribute_patterns, NULL, 1 },
>>>>>> Index: gcc/timevar.def
>>>>>> ===================================================================
>>>>>> --- gcc/timevar.def (revision 189409)
>>>>>> +++ gcc/timevar.def (working copy)
>>>>>> @@ -201,6 +201,7 @@ DEFTIMEVAR (TV_POST_EXPAND	     , "post
>>>>>> DEFTIMEVAR (TV_VARCONST              , "varconst")
>>>>>> DEFTIMEVAR (TV_LOWER_SUBREG	     , "lower subreg")
>>>>>> DEFTIMEVAR (TV_JUMP                  , "jump")
>>>>>> +DEFTIMEVAR (TV_EE                    , "extension elimination")
>>>>>> DEFTIMEVAR (TV_FWPROP                , "forward prop")
>>>>>> DEFTIMEVAR (TV_CSE                   , "CSE")
>>>>>> DEFTIMEVAR (TV_DCE                   , "dead code elimination")
>>>>>> Index: gcc/ee.c
>>>>>> ===================================================================
>>>>>> --- /dev/null (new file)
>>>>>> +++ gcc/ee.c (revision 0)
>>>>>> @@ -0,0 +1,1190 @@
>>>>>> +/* Redundant extension elimination.
>>>>>> +   Copyright (C) 2010, 2011, 2012 Free Software Foundation, Inc.
>>>>>> +   Contributed by Tom de Vries (tom@codesourcery.com)
>>>>>> +
>>>>>> +This file is part of GCC.
>>>>>> +
>>>>>> +GCC is free software; you can redistribute it and/or modify it under
>>>>>> +the terms of the GNU General Public License as published by the Free
>>>>>> +Software Foundation; either version 3, or (at your option) any later
>>>>>> +version.
>>>>>> +
>>>>>> +GCC is distributed in the hope that it will be useful, but WITHOUT ANY
>>>>>> +WARRANTY; without even the implied warranty of MERCHANTABILITY or
>>>>>> +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
>>>>>> +for more details.
>>>>>> +
>>>>>> +You should have received a copy of the GNU General Public License
>>>>>> +along with GCC; see the file COPYING3.  If not see
>>>>>> +<http://www.gnu.org/licenses/>.  */
>>>>>> +
>>>>>> +/*
>>>>>> +
>>>>>> +  MOTIVATING EXAMPLE
>>>>>> +
>>>>>> +  The motivating example for this pass is the example from PR 40893:
>>>>>> +
>>>>>> +    void f (short d[2][2])
>>>>>> +    {
>>>>>> +      int d0 = d[0][0] + d[0][1];
>>>>>> +      int d1 = d[1][0] + d[1][1];
>>>>>> +      d[0][0] = d0 + d1;
>>>>>> +      d[0][1] = d0 - d1;
>>>>>> +    }
>>>>>> +
>>>>>> +  For MIPS, compilation results in the following insns.
>>>>>> +
>>>>>> +    (set (reg:SI 204)
>>>>>> +         (zero_extend:SI (subreg:HI (reg:SI 213) 2)))
>>>>>> +
>>>>>> +    (set (reg:SI 205)
>>>>>> +         (zero_extend:SI (subreg:HI (reg:SI 216 [ d1 ]) 2)))
>>>>>> +
>>>>>> +    (set (reg:SI 217)
>>>>>> +         (plus:SI (reg:SI 205)
>>>>>> +                  (reg:SI 204)))
>>>>>> +
>>>>>> +    (set (reg:SI 218)
>>>>>> +         (minus:SI (reg:SI 204)
>>>>>> +                   (reg:SI 205)))
>>>>>> +
>>>>>> +    (set (mem:HI (reg/v/f:SI 210))
>>>>>> +         (subreg:HI (reg:SI 217) 2))
>>>>>> +
>>>>>> +    (set (mem:HI (plus:SI (reg/v/f:SI 210)
>>>>>> +                 (const_int 2 [0x2])))
>>>>>> +         (subreg:HI (reg:SI 218) 2))
>>>>>> +
>>>>>> +
>>>>>> +  The two stores only use the lower halves of pseudos 217 and 218, and are the
>>>>>> +  only uses of those pseudos.  The plus and minus operators belong to the class
>>>>>> +  of operators where a bit in the result is only influenced by same-or-less
>>>>>> +  significant bits in the operands, so the plus and minus insns only use the
>>>>>> +  lower halves of pseudos 204 and 205.  Those are also the only uses of pseudos
>>>>>> +  204 and 205, so the zero_extends are redundant.
>>>>>> +
>>>>>> +
>>>>>> +  INTENDED EFFECT
>>>>>> +
>>>>>> +  This pass works by removing sign/zero-extensions, or replacing them with
>>>>>> +  regcopies.  The idea there is that the regcopy might be eliminated by a later
>>>>>> +  pass.  In case the regcopy cannot be eliminated, it might at least be cheaper
>>>>>> +  than the extension.
>>>>>> +
>>>>>> +
>>>>>> +  IMPLEMENTATION
>>>>>> +
>>>>>> +  The pass scans at most two times over all instructions.
>>>>>> +
>>>>>> +  The first scan collects all extensions.  If there are no extensions, we're
>>>>>> +  done.
>>>>>> +
>>>>>> +  The second scan registers all uses of a reg in the biggest_use array.
>>>>>> +  Additionally, it registers how the use size of a pseudo is propagated to the
>>>>>> +  operands of the insns defining the pseudo.
>>>>>> +
>>>>>> +  The biggest_use array now contains the size in bits of the biggest use
>>>>>> +  of each reg, which allows us to find redundant extensions.
>>>>>> +
>>>>>> +  If there are still non-redundant extensions left, we use the propagation
>>>>>> +  information in an iterative fashion to improve the biggest_use array, after
>>>>>> +  which we may find more redundant extensions.
>>>>>> +
>>>>>> +  Finally, redundant extensions are deleted or replaced.
>>>>>> +
>>>>>> +  If the src and dest reg of the replacement are not of the same size, we do
>>>>>> +  not replace with a normal regcopy, but with a truncate or with the copy of
>>>>>> +  a paradoxical subreg instead.
>>>>>> +
>>>>>> +
>>>>>> +  ILLUSTRATION OF PASS
>>>>>> +
>>>>>> +  The dump of the pass shows us how the pass works on the motivating example.
>>>>>> +
>>>>>> +  We find the 2 extensions:
>>>>>> +    found extension with preserved size 16 defining reg 204
>>>>>> +    found extension with preserved size 16 defining reg 205
>>>>>> +
>>>>>> +  We calculate the biggest uses of a register:
>>>>>> +    biggest_use
>>>>>> +    reg 204: size 32
>>>>>> +    reg 205: size 32
>>>>>> +    reg 217: size 16
>>>>>> +    reg 218: size 16
>>>>>> +
>>>>>> +  We propagate the biggest uses where possible:
>>>>>> +    propagations
>>>>>> +    205: 32 -> 16
>>>>>> +    204: 32 -> 16
>>>>>> +    214: 32 -> 16
>>>>>> +    215: 32 -> 16
>>>>>> +
>>>>>> +  We conclude that the extensions are redundant:
>>>>>> +    found redundant extension with preserved size 16 defining reg 205
>>>>>> +    found redundant extension with preserved size 16 defining reg 204
>>>>>> +
>>>>>> +  And we replace them with regcopies:
>>>>>> +    (set (reg:SI 204)
>>>>>> +        (reg:SI 213))
>>>>>> +
>>>>>> +    (set (reg:SI 205)
>>>>>> +        (reg:SI 216))
>>>>>> +
>>>>>> +
>>>>>> +  LIMITATIONS
>>>>>> +
>>>>>> +  The scope of the analysis is limited to an extension and its uses.  The other
>>>>>> +  type of analysis (related to the defs of the operand of an extension) is not
>>>>>> +  done.
>>>>>> +
>>>>>> +  Furthermore, we do the analysis of biggest use per reg.  So when determining
>>>>>> +  whether an extension is redundant, we take all uses of a dest reg into
>>>>>> +  account, also the ones that are not uses of the extension.
>>>>>> +  The consideration is that using use-def chains will give a more precise
>>>>>> +  analysis, but is much more expensive in terms of runtime.  */
>>>>>> +
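
A note on the "restricted operator" argument in the comment above: for plus and minus, the low 16 bits of the result depend only on the low 16 bits of the operands, which is why the zero_extends feeding regs 204 and 205 can be dropped. Below is a minimal standalone sketch of that property in plain C. It is not code from the patch; low_half and low_half_ignores_extension are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not part of ee.c: keep only the low 16 bits.  */
static uint32_t
low_half (uint32_t x)
{
  return x & 0xffffu;
}

/* Return nonzero if the low halves of a + b and a - b stay the same when
   both operands are first zero-extended from 16 bits.  This models the
   stores in the motivating example, which only read the low half.  */
static int
low_half_ignores_extension (uint32_t a, uint32_t b)
{
  uint32_t ea = low_half (a);	/* models zero_extend:SI of the HI subreg */
  uint32_t eb = low_half (b);

  return low_half (a + b) == low_half (ea + eb)
	 && low_half (a - b) == low_half (ea - eb);
}
```

The check holds for any pair of inputs because unsigned addition and subtraction are carried out modulo a power of two, so carries and borrows only travel towards more significant bits.
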
>>>>>> +#include "config.h"
>>>>>> +#include "system.h"
>>>>>> +#include "coretypes.h"
>>>>>> +#include "tm.h"
>>>>>> +#include "rtl.h"
>>>>>> +#include "tree.h"
>>>>>> +#include "tm_p.h"
>>>>>> +#include "flags.h"
>>>>>> +#include "regs.h"
>>>>>> +#include "hard-reg-set.h"
>>>>>> +#include "basic-block.h"
>>>>>> +#include "insn-config.h"
>>>>>> +#include "function.h"
>>>>>> +#include "expr.h"
>>>>>> +#include "insn-attr.h"
>>>>>> +#include "recog.h"
>>>>>> +#include "toplev.h"
>>>>>> +#include "target.h"
>>>>>> +#include "timevar.h"
>>>>>> +#include "optabs.h"
>>>>>> +#include "insn-codes.h"
>>>>>> +#include "rtlhooks-def.h"
>>>>>> +#include "output.h"
>>>>>> +#include "params.h"
>>>>>> +#include "timevar.h"
>>>>>> +#include "tree-pass.h"
>>>>>> +#include "cgraph.h"
>>>>>> +#include "vec.h"
>>>>>> +
>>>>>> +#define SKIP_REG (-1)
>>>>>> +#define NONE (-1)
>>>>>> +
>>>>>> +/* Number of registers at start of pass.  */
>>>>>> +
>>>>>> +static int n_regs;
>>>>>> +
>>>>>> +/* Array to register the biggest use of a reg, in bits.  */
>>>>>> +
>>>>>> +static int *biggest_use;
>>>>>> +
>>>>>> +/* Array to register the promoted subregs.  */
>>>>>> +
>>>>>> +static VEC (rtx,heap) **promoted_subreg;
>>>>>> +
>>>>>> +/* Array to register for a reg what the last propagated size is.  */
>>>>>> +
>>>>>> +static int *propagated_size;
>>>>>> +
>>>>>> +typedef struct use
>>>>>> +{
>>>>>> +  int regno;
>>>>>> +  int size;
>>>>>> +  int offset;
>>>>>> +  rtx *use;
>>>>>> +} use_type;
>>>>>> +
>>>>>> +DEF_VEC_O(use_type);
>>>>>> +DEF_VEC_ALLOC_O(use_type,heap);
>>>>>> +
>>>>>> +/* Vector to register the uses.  */
>>>>>> +
>>>>>> +static VEC (use_type,heap) **uses;
>>>>>> +
>>>>>> +typedef struct prop
>>>>>> +{
>>>>>> +  rtx set;
>>>>>> +  int uses_regno;
>>>>>> +  int uses_index;
>>>>>> +} prop_type;
>>>>>> +
>>>>>> +DEF_VEC_O(prop_type);
>>>>>> +DEF_VEC_ALLOC_O(prop_type,heap);
>>>>>> +
>>>>>> +/* Vector to register the propagations.  */
>>>>>> +
>>>>>> +static VEC (prop_type,heap) **props;
>>>>>> +
>>>>>> +/* Work list for propagation.  */
>>>>>> +
>>>>>> +static VEC (int,heap) *wl;
>>>>>> +
>>>>>> +/* Array to register what regs are in the work list.  */
>>>>>> +
>>>>>> +static bool *in_wl;
>>>>>> +
>>>>>> +/* Vector that contains the extensions in the function.  */
>>>>>> +
>>>>>> +static VEC (rtx,heap) *extensions;
>>>>>> +
>>>>>> +/* Vector that contains the extensions in the function that are going to be
>>>>>> +   removed or replaced.  */
>>>>>> +
>>>>>> +static VEC (rtx,heap) *redundant_extensions;
>>>>>> +
>>>>>> +/* Forward declaration.  */
>>>>>> +
>>>>>> +static void note_use (rtx *x, void *data);
>>>>>> +static bool skip_reg_p (int regno);
>>>>>> +static void register_prop (rtx set, use_type *use);
>>>>>> +
>>>>>> +/* Check whether SUBREG is a promoted subreg.  */
>>>>>> +
>>>>>> +static bool
>>>>>> +promoted_subreg_p (rtx subreg)
>>>>>> +{
>>>>>> +  return (GET_CODE (subreg) == SUBREG
>>>>>> +	  && SUBREG_PROMOTED_VAR_P (subreg));
>>>>>> +}
>>>>>> +
>>>>>> +/* Check whether SUBREG is a promoted subreg for which we cannot reset the
>>>>>> +   promotion.  */
>>>>>> +
>>>>>> +static bool
>>>>>> +fixed_promoted_subreg_p (rtx subreg)
>>>>>> +{
>>>>>> +  int mre;
>>>>>> +
>>>>>> +  if (!promoted_subreg_p (subreg))
>>>>>> +    return false;
>>>>>> +
>>>>>> +  mre = targetm.mode_rep_extended (GET_MODE (subreg),
>>>>>> +				   GET_MODE (SUBREG_REG (subreg)));
>>>>>> +  return mre != UNKNOWN;
>>>>>> +}
>>>>>> +
>>>>>> +/* Attempt to return the size, reg number and offset of USE in SIZE, REGNO and
>>>>>> +   OFFSET.  Return true if successful.  */
>>>>>> +
>>>>>> +static bool
>>>>>> +reg_use_p (rtx use, int *size, unsigned int *regno, int *offset)
>>>>>> +{
>>>>>> +  rtx reg;
>>>>>> +
>>>>>> +  if (REG_P (use))
>>>>>> +    {
>>>>>> +      *regno = REGNO (use);
>>>>>> +      *offset = 0;
>>>>>> +      *size = GET_MODE_BITSIZE (GET_MODE (use));
>>>>>> +      return true;
>>>>>> +    }
>>>>>> +  else if (GET_CODE (use) == SUBREG)
>>>>>> +    {
>>>>>> +      reg = SUBREG_REG (use);
>>>>>> +
>>>>>> +      if (!REG_P (reg))
>>>>>> +	return false;
>>>>>> +
>>>>>> +      *regno = REGNO (reg);
>>>>>> +
>>>>>> +      if (paradoxical_subreg_p (use) || fixed_promoted_subreg_p (use))
>>>>>> +	{
>>>>>> +	  *offset = 0;
>>>>>> +	  *size = GET_MODE_BITSIZE (GET_MODE (reg));
>>>>>> +	}
>>>>>> +      else
>>>>>> +	{
>>>>>> +	  *offset = subreg_lsb (use);
>>>>>> +	  *size = *offset + GET_MODE_BITSIZE (GET_MODE (use));
>>>>>> +	}
>>>>>> +
>>>>>> +      return true;
>>>>>> +    }
>>>>>> +
>>>>>> +  return false;
>>>>>> +}
>>>>>> +
>>>>>> +/* Create a new empty entry in the uses[REGNO] vector.  */
>>>>>> +
>>>>>> +static use_type *
>>>>>> +new_use (unsigned int regno)
>>>>>> +{
>>>>>> +  if (uses[regno] == NULL)
>>>>>> +    uses[regno] = VEC_alloc (use_type, heap, 4);
>>>>>> +
>>>>>> +  VEC_safe_push (use_type, heap, uses[regno], NULL);
>>>>>> +
>>>>>> +  return VEC_last (use_type, uses[regno]);
>>>>>> +}
>>>>>> +
>>>>>> +/* Register a USE of reg REGNO with SIZE and OFFSET.  */
>>>>>> +
>>>>>> +static use_type *
>>>>>> +register_use (int size, unsigned int regno, int offset, rtx *use)
>>>>>> +{
>>>>>> +  int *current;
>>>>>> +  use_type *p;
>>>>>> +
>>>>>> +  gcc_assert (size >= 0);
>>>>>> +  gcc_assert (regno < (unsigned int)n_regs);
>>>>>> +
>>>>>> +  if (skip_reg_p (regno))
>>>>>> +    return NULL;
>>>>>> +
>>>>>> +  p = new_use (regno);
>>>>>> +  p->regno = regno;
>>>>>> +  p->size = size;
>>>>>> +  p->offset = offset;
>>>>>> +  p->use = use;
>>>>>> +
>>>>>> +  /* Update the biggest use.  */
>>>>>> +  current = &biggest_use[regno];
>>>>>> +  *current = MAX (*current, size);
>>>>>> +
>>>>>> +  return p;
>>>>>> +}
>>>>>> +
>>>>>> +/* Handle embedded uses in USE, which is a part of PATTERN.  */
>>>>>> +
>>>>>> +static void
>>>>>> +note_embedded_uses (rtx use, rtx pattern)
>>>>>> +{
>>>>>> +  const char *format_ptr;
>>>>>> +  int i, j;
>>>>>> +
>>>>>> +  format_ptr = GET_RTX_FORMAT (GET_CODE (use));
>>>>>> +  for (i = 0; i < GET_RTX_LENGTH (GET_CODE (use)); i++)
>>>>>> +    if (format_ptr[i] == 'e')
>>>>>> +      note_use (&XEXP (use, i), pattern);
>>>>>> +    else if (format_ptr[i] == 'E')
>>>>>> +      for (j = 0; j < XVECLEN (use, i); j++)
>>>>>> +	note_use (&XVECEXP (use, i, j), pattern);
>>>>>> +}
>>>>>> +
>>>>>> +/* Get the set in PATTERN that has USE as its src operand.  */
>>>>>> +
>>>>>> +static rtx
>>>>>> +get_set (rtx use, rtx pattern)
>>>>>> +{
>>>>>> +  rtx sub;
>>>>>> +  int i;
>>>>>> +
>>>>>> +  if (GET_CODE (pattern) == SET && SET_SRC (pattern) == use)
>>>>>> +    return pattern;
>>>>>> +
>>>>>> +  if (GET_CODE (pattern) == PARALLEL)
>>>>>> +    for (i = 0; i < XVECLEN (pattern, 0); ++i)
>>>>>> +      {
>>>>>> +	sub = XVECEXP (pattern, 0, i);
>>>>>> +	if (GET_CODE (sub) == SET && SET_SRC (sub) == use)
>>>>>> +	  return sub;
>>>>>> +      }
>>>>>> +
>>>>>> +  return NULL_RTX;
>>>>>> +}
>>>>>> +
>>>>>> +/* Handle a restricted op USE with NR_OPERANDS.  USE is a part of SET, which is
>>>>>> +   a part of PATTERN.  In this context restricted means that a bit in
>>>>>> +   an operand influences only the same bit or more significant bits in the
>>>>>> +   result.  The bitwise ops are a subclass, but PLUS is one as well.  */
>>>>>> +
>>>>>> +static void
>>>>>> +note_restricted_op_use (rtx set, rtx use, unsigned int nr_operands, rtx pattern)
>>>>>> +{
>>>>>> +  unsigned int i, smallest;
>>>>>> +  int operand_size[2];
>>>>>> +  int operand_offset[2];
>>>>>> +  int used_size;
>>>>>> +  unsigned int operand_regno[2];
>>>>>> +  bool operand_reg[2];
>>>>>> +  bool operand_ignore[2];
>>>>>> +  use_type *p;
>>>>>> +
>>>>>> +  /* Init operand_reg, operand_size, operand_regno and operand_ignore.  */
>>>>>> +  for (i = 0; i < nr_operands; ++i)
>>>>>> +    {
>>>>>> +      operand_reg[i] = reg_use_p (XEXP (use, i), &operand_size[i],
>>>>>> +				  &operand_regno[i], &operand_offset[i]);
>>>>>> +      operand_ignore[i] = false;
>>>>>> +    }
>>>>>> +
>>>>>> +  /* Handle case of reg and-masked with const.  */
>>>>>> +  if (GET_CODE (use) == AND && CONST_INT_P (XEXP (use, 1)) && operand_reg[0])
>>>>>> +    {
>>>>>> +      used_size =
>>>>>> +	HOST_BITS_PER_WIDE_INT - clz_hwi (UINTVAL (XEXP (use, 1)));
>>>>>> +      operand_size[0] = MIN (operand_size[0], used_size);
>>>>>> +    }
>>>>>> +
>>>>>> +  /* Handle case of reg or-masked with const.  */
>>>>>> +  if (GET_CODE (use) == IOR && CONST_INT_P (XEXP (use, 1)) && operand_reg[0])
>>>>>> +    {
>>>>>> +      used_size =
>>>>>> +	HOST_BITS_PER_WIDE_INT - clz_hwi (~UINTVAL (XEXP (use, 1)));
>>>>>> +      operand_size[0] = MIN (operand_size[0], used_size);
>>>>>> +    }
>>>>>> +
>>>>>> +  /* Ignore the use of a in 'a = a + b'.  */
>>>>>> +  /* TODO: handle GET_CODE ((SET_DEST (set))) == SUBREG.  */
>>>>>> +  if (set != NULL_RTX && REG_P (SET_DEST (set)))
>>>>>> +    for (i = 0; i < nr_operands; ++i)
>>>>>> +      operand_ignore[i] = (operand_reg[i]
>>>>>> +			   && (REGNO (SET_DEST (set)) == operand_regno[i]));
>>>>>> +
>>>>>> +  /* Handle the case where a reg is combined with don't-care bits.  */
>>>>>> +  if (nr_operands == 2 && operand_reg[0] && operand_reg[1]
>>>>>> +      && operand_size[0] != operand_size[1])
>>>>>> +    {
>>>>>> +      smallest = operand_size[0] > operand_size[1];
>>>>>> +
>>>>>> +      if (paradoxical_subreg_p (XEXP (use, smallest)))
>>>>>> +	operand_size[1 - smallest] = operand_size[smallest];
>>>>>> +    }
>>>>>> +
>>>>>> +  /* Register the operand use, if necessary.  */
>>>>>> +  for (i = 0; i < nr_operands; ++i)
>>>>>> +    if (!operand_reg[i])
>>>>>> +      note_use (&XEXP (use, i), pattern);
>>>>>> +    else if (!operand_ignore[i])
>>>>>> +      {
>>>>>> +	p = register_use (operand_size[i], operand_regno[i], operand_offset[i],
>>>>>> +			  &XEXP (use, i));
>>>>>> +	register_prop (set, p);
>>>>>> +      }
>>>>>> +}
>>>>>> +
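
To make the and-mask and or-mask cases in note_restricted_op_use easier to check, here is a small standalone model of the used-size arithmetic. It is a sketch under assumptions, not the patch's code: it assumes a 64-bit HOST_WIDE_INT, relies on the GCC/Clang builtin __builtin_clzll, and the names WORD_BITS, clz64, and_used_size and ior_used_size are made up for this illustration.

```c
#include <assert.h>
#include <stdint.h>

#define WORD_BITS 64	/* stands in for HOST_BITS_PER_WIDE_INT (assumed 64) */

/* Leading-zero count that is also defined for zero; a stand-in for
   clz_hwi in this sketch.  */
static int
clz64 (uint64_t x)
{
  return x == 0 ? WORD_BITS : __builtin_clzll (x);
}

/* Number of low bits of the register operand that can influence
   (reg & mask): everything above the highest set mask bit is cleared.  */
static int
and_used_size (int operand_size, uint64_t mask)
{
  int used = WORD_BITS - clz64 (mask);
  return operand_size < used ? operand_size : used;
}

/* Number of low bits of the register operand that can influence
   (reg | mask): everything above the highest clear mask bit is forced
   to one.  */
static int
ior_used_size (int operand_size, uint64_t mask)
{
  int used = WORD_BITS - clz64 (~mask);
  return operand_size < used ? operand_size : used;
}
```

For example, masking a 32-bit operand with 0xff reads only 8 bits of it, and or-ing it with a constant whose bits 8 and up are all ones likewise leaves only the low 8 bits relevant.
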
>>>>>> +/* Register promoted SUBREG in promoted_subreg.  */
>>>>>> +
>>>>>> +static void
>>>>>> +register_promoted_subreg (rtx subreg)
>>>>>> +{
>>>>>> +  int index = REGNO (SUBREG_REG (subreg));
>>>>>> +
>>>>>> +  if (promoted_subreg[index] == NULL)
>>>>>> +    promoted_subreg[index] = VEC_alloc (rtx, heap, 10);
>>>>>> +
>>>>>> +  VEC_safe_push (rtx, heap, promoted_subreg[index], subreg);
>>>>>> +}
>>>>>> +
>>>>>> +/* Note promoted subregs in X.  */
>>>>>> +
>>>>>> +static int
>>>>>> +note_promoted_subreg (rtx *x, void *y ATTRIBUTE_UNUSED)
>>>>>> +{
>>>>>> +  rtx subreg = *x;
>>>>>> +
>>>>>> +  if (promoted_subreg_p (subreg) && !fixed_promoted_subreg_p (subreg)
>>>>>> +      && REG_P (SUBREG_REG (subreg)))
>>>>>> +    register_promoted_subreg (subreg);
>>>>>> +
>>>>>> +  return 0;
>>>>>> +}
>>>>>> +
>>>>>> +/* Handle use X in pattern DATA noted by note_uses.  */
>>>>>> +
>>>>>> +static void
>>>>>> +note_use (rtx *x, void *data)
>>>>>> +{
>>>>>> +  rtx use = *x;
>>>>>> +  rtx pattern = (rtx)data;
>>>>>> +  int use_size, use_offset;
>>>>>> +  unsigned int use_regno;
>>>>>> +  rtx set;
>>>>>> +  use_type *p;
>>>>>> +
>>>>>> +  for_each_rtx (x, note_promoted_subreg, NULL);
>>>>>> +
>>>>>> +  set = get_set (use, pattern);
>>>>>> +
>>>>>> +  switch (GET_CODE (use))
>>>>>> +    {
>>>>>> +    case REG:
>>>>>> +    case SUBREG:
>>>>>> +      if (!reg_use_p (use, &use_size, &use_regno, &use_offset))
>>>>>> +	{
>>>>>> +	  note_embedded_uses (use, pattern);
>>>>>> +	  return;
>>>>>> +	}
>>>>>> +      p = register_use (use_size, use_regno, use_offset, x);
>>>>>> +      register_prop (set, p);
>>>>>> +      return;
>>>>>> +    case SIGN_EXTEND:
>>>>>> +    case ZERO_EXTEND:
>>>>>> +      if (!reg_use_p (XEXP (use, 0), &use_size, &use_regno, &use_offset))
>>>>>> +	{
>>>>>> +	  note_embedded_uses (use, pattern);
>>>>>> +	  return;
>>>>>> +	}
>>>>>> +      p = register_use (use_size, use_regno, use_offset, x);
>>>>>> +      register_prop (set, p);
>>>>>> +      return;
>>>>>> +    case IOR:
>>>>>> +    case AND:
>>>>>> +    case XOR:
>>>>>> +    case PLUS:
>>>>>> +    case MINUS:
>>>>>> +      note_restricted_op_use (set, use, 2, pattern);
>>>>>> +      return;
>>>>>> +    case NOT:
>>>>>> +    case NEG:
>>>>>> +      note_restricted_op_use (set, use, 1, pattern);
>>>>>> +      return;
>>>>>> +    case ASHIFT:
>>>>>> +      if (!reg_use_p (XEXP (use, 0), &use_size, &use_regno, &use_offset)
>>>>>> +	  || !CONST_INT_P (XEXP (use, 1))
>>>>>> +	  || INTVAL (XEXP (use, 1)) <= 0
>>>>>> +	  || paradoxical_subreg_p (XEXP (use, 0)))
>>>>>> +	{
>>>>>> +	  note_embedded_uses (use, pattern);
>>>>>> +	  return;
>>>>>> +	}
>>>>>> +      (void)register_use (use_size - INTVAL (XEXP (use, 1)), use_regno,
>>>>>> +			  use_offset, x);
>>>>>> +      return;
>>>>>> +    default:
>>>>>> +      note_embedded_uses (use, pattern);
>>>>>> +      return;
>>>>>> +    }
>>>>>> +}
>>>>>> +
>>>>>> +/* Check whether reg REGNO is implicitly used.  */
>>>>>> +
>>>>>> +static bool
>>>>>> +implicit_use_p (int regno ATTRIBUTE_UNUSED)
>>>>>> +{
>>>>>> +#ifdef EPILOGUE_USES
>>>>>> +  if (EPILOGUE_USES (regno))
>>>>>> +    return true;
>>>>>> +#endif
>>>>>> +
>>>>>> +#ifdef EH_USES
>>>>>> +  if (EH_USES (regno))
>>>>>> +    return true;
>>>>>> +#endif
>>>>>> +
>>>>>> +  return false;
>>>>>> +}
>>>>>> +
>>>>>> +/* Check whether reg REGNO should be skipped in analysis.  */
>>>>>> +
>>>>>> +static bool
>>>>>> +skip_reg_p (int regno)
>>>>>> +{
>>>>>> +  /* TODO: handle hard registers.  The problem with hard registers is that
>>>>>> +     a DI use of r0 can mean a 64bit use of r0 and a 32 bit use of r1.
>>>>>> +     We don't handle that properly.  */
>>>>>> +  return implicit_use_p (regno) || HARD_REGISTER_NUM_P (regno);
>>>>>> +}
>>>>>> +
>>>>>> +/* Note the uses of argument registers in call INSN.  */
>>>>>> +
>>>>>> +static void
>>>>>> +note_call_uses (rtx insn)
>>>>>> +{
>>>>>> +  rtx link, link_expr;
>>>>>> +
>>>>>> +  if (!CALL_P (insn))
>>>>>> +    return;
>>>>>> +
>>>>>> +  for (link = CALL_INSN_FUNCTION_USAGE (insn); link; link = XEXP (link, 1))
>>>>>> +    {
>>>>>> +      link_expr = XEXP (link, 0);
>>>>>> +
>>>>>> +      if (GET_CODE (link_expr) == USE)
>>>>>> +	note_use (&XEXP (link_expr, 0), link);
>>>>>> +    }
>>>>>> +}
>>>>>> +
>>>>>> +/* Dump the biggest uses found.  */
>>>>>> +
>>>>>> +static void
>>>>>> +dump_biggest_use (void)
>>>>>> +{
>>>>>> +  int i;
>>>>>> +
>>>>>> +  if (!dump_file)
>>>>>> +    return;
>>>>>> +
>>>>>> +  fprintf (dump_file, "biggest_use:\n");
>>>>>> +
>>>>>> +  for (i = 0; i < n_regs; i++)
>>>>>> +    if (biggest_use[i] > 0)
>>>>>> +      fprintf (dump_file, "reg %d: size %d\n", i, biggest_use[i]);
>>>>>> +
>>>>>> +  fprintf (dump_file, "\n");
>>>>>> +}
>>>>>> +
>>>>>> +/* Calculate the biggest use size for all regs.  */
>>>>>> +
>>>>>> +static void
>>>>>> +calculate_biggest_use (void)
>>>>>> +{
>>>>>> +  basic_block bb;
>>>>>> +  rtx insn;
>>>>>> +
>>>>>> +  /* For all insns, call note_use for each use in insn.  */
>>>>>> +  FOR_EACH_BB (bb)
>>>>>> +    FOR_BB_INSNS (bb, insn)
>>>>>> +      {
>>>>>> +	if (!NONDEBUG_INSN_P (insn))
>>>>>> +	  continue;
>>>>>> +
>>>>>> +	note_uses (&PATTERN (insn), note_use, PATTERN (insn));
>>>>>> +
>>>>>> +	if (CALL_P (insn))
>>>>>> +	  note_call_uses (insn);
>>>>>> +      }
>>>>>> +
>>>>>> +  dump_biggest_use ();
>>>>>> +}
>>>>>> +
>>>>>> +/* Register a propagation USE in SET in the props vector.  */
>>>>>> +
>>>>>> +static void
>>>>>> +register_prop (rtx set, use_type *use)
>>>>>> +{
>>>>>> +  prop_type *p;
>>>>>> +  int regno;
>>>>>> +
>>>>>> +  if (set == NULL_RTX || use == NULL)
>>>>>> +    return;
>>>>>> +
>>>>>> +  if (!REG_P (SET_DEST (set)))
>>>>>> +    return;
>>>>>> +
>>>>>> +  regno = REGNO (SET_DEST (set));
>>>>>> +
>>>>>> +  if (skip_reg_p (regno))
>>>>>> +    return;
>>>>>> +
>>>>>> +  if (props[regno] == NULL)
>>>>>> +    props[regno] = VEC_alloc (prop_type, heap, 4);
>>>>>> +
>>>>>> +  VEC_safe_push (prop_type, heap, props[regno], NULL);
>>>>>> +  p = VEC_last (prop_type, props[regno]);
>>>>>> +  p->set = set;
>>>>>> +  p->uses_regno = use->regno;
>>>>>> +  p->uses_index = VEC_length (use_type, uses[use->regno]) - 1;
>>>>>> +}
>>>>>> +
>>>>>> +/* Add REGNO to the worklist.  */
>>>>>> +
>>>>>> +static void
>>>>>> +add_to_wl (int regno)
>>>>>> +{
>>>>>> +  if (in_wl[regno])
>>>>>> +    return;
>>>>>> +
>>>>>> +  if (biggest_use[regno] > 0
>>>>>> +      && biggest_use[regno] == GET_MODE_BITSIZE (PSEUDO_REGNO_MODE (regno)))
>>>>>> +    return;
>>>>>> +
>>>>>> +  if (VEC_empty (prop_type, props[regno]))
>>>>>> +    return;
>>>>>> +
>>>>>> +  if (propagated_size[regno] != NONE
>>>>>> +      && propagated_size[regno] == biggest_use[regno])
>>>>>> +    return;
>>>>>> +
>>>>>> +  VEC_safe_push (int, heap, wl, regno);
>>>>>> +  in_wl[regno] = true;
>>>>>> +}
>>>>>> +
>>>>>> +/* Pop a reg from the worklist and return it.  */
>>>>>> +
>>>>>> +static int
>>>>>> +pop_wl (void)
>>>>>> +{
>>>>>> +  int regno = VEC_pop (int, wl);
>>>>>> +  in_wl[regno] = false;
>>>>>> +  return regno;
>>>>>> +}
>>>>>> +
>>>>>> +/* Propagate the use size DEST_SIZE of a reg to use P.  */
>>>>>> +
>>>>>> +static int
>>>>>> +propagate_size (int dest_size, use_type *p)
>>>>>> +{
>>>>>> +  if (dest_size == 0)
>>>>>> +    return 0;
>>>>>> +
>>>>>> +  return p->offset + MIN (p->size - p->offset, dest_size);
>>>>>> +}
>>>>>> +
>>>>>> +/* Get the biggest use of REGNO from the uses vector.  */
>>>>>> +
>>>>>> +static int
>>>>>> +get_biggest_use (unsigned int regno)
>>>>>> +{
>>>>>> +  int ix;
>>>>>> +  use_type *p;
>>>>>> +  int max = 0;
>>>>>> +
>>>>>> +  gcc_assert (uses[regno] != NULL);
>>>>>> +
>>>>>> +  FOR_EACH_VEC_ELT (use_type, uses[regno], ix, p)
>>>>>> +    max = MAX (max, p->size);
>>>>>> +
>>>>>> +  return max;
>>>>>> +}
>>>>>> +
>>>>>> +/* Propagate the use size DEST_SIZE of a reg to the uses in USE.  */
>>>>>> +
>>>>>> +static void
>>>>>> +propagate_to_use (int dest_size, use_type *use)
>>>>>> +{
>>>>>> +  int new_use_size;
>>>>>> +  int prev_biggest_use;
>>>>>> +  int *current;
>>>>>> +
>>>>>> +  new_use_size = propagate_size (dest_size, use);
>>>>>> +
>>>>>> +  if (new_use_size >= use->size)
>>>>>> +    return;
>>>>>> +
>>>>>> +  use->size = new_use_size;
>>>>>> +
>>>>>> +  current = &biggest_use[use->regno];
>>>>>> +
>>>>>> +  prev_biggest_use = *current;
>>>>>> +  *current = get_biggest_use (use->regno);
>>>>>> +
>>>>>> +  if (*current >= prev_biggest_use)
>>>>>> +    return;
>>>>>> +
>>>>>> +  add_to_wl (use->regno);
>>>>>> +
>>>>>> +  if (dump_file)
>>>>>> +    fprintf (dump_file, "%d: %d -> %d\n", use->regno, prev_biggest_use,
>>>>>> +	     *current);
>>>>>> +
>>>>>> +}
>>>>>> +
>>>>>> +/* Propagate the biggest use of a reg REGNO to all its uses, and note
>>>>>> +   propagations in NR_PROPAGATIONS.  */
>>>>>> +
>>>>>> +static void
>>>>>> +propagate_to_uses (int regno, int *nr_propagations)
>>>>>> +{
>>>>>> +  int ix;
>>>>>> +  prop_type *p;
>>>>>> +
>>>>>> +  gcc_assert (!(propagated_size[regno] != NONE
>>>>>> +		&& propagated_size[regno] == biggest_use[regno]));
>>>>>> +
>>>>>> +  FOR_EACH_VEC_ELT (prop_type, props[regno], ix, p)
>>>>>> +    {
>>>>>> +      use_type *use = VEC_index (use_type, uses[p->uses_regno], p->uses_index);
>>>>>> +      propagate_to_use (biggest_use[regno], use);
>>>>>> +      ++(*nr_propagations);
>>>>>> +    }
>>>>>> +
>>>>>> +  propagated_size[regno] = biggest_use[regno];
>>>>>> +}
>>>>>> +
>>>>>> +/* Improve biggest_use array iteratively.  */
>>>>>> +
>>>>>> +static void
>>>>>> +propagate (void)
>>>>>> +{
>>>>>> +  int i;
>>>>>> +  int nr_propagations = 0;
>>>>>> +
>>>>>> +  /* Initialize work list.  */
>>>>>> +
>>>>>> +  for (i = 0; i < n_regs; ++i)
>>>>>> +    add_to_wl (i);
>>>>>> +
>>>>>> +  /* Work the work list.  */
>>>>>> +
>>>>>> +  if (dump_file)
>>>>>> +    fprintf (dump_file, "propagations: \n");
>>>>>> +  while (!VEC_empty (int, wl))
>>>>>> +    propagate_to_uses (pop_wl (), &nr_propagations);
>>>>>> +
>>>>>> +  if (dump_file)
>>>>>> +    fprintf (dump_file, "\nnr_propagations: %d\n\n", nr_propagations);
>>>>>> +}
>>>>>> +
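
The propagation step can be modelled on the motivating example outside of GCC. The sketch below is an illustration only: it swaps the patch's work list for a plain repeat-until-stable sweep, which should reach the same fixpoint on this input but is not how propagate is implemented. struct use_model and all function names are hypothetical; only the formula in propagate_size_model is copied from propagate_size.

```c
#include <assert.h>

/* Hypothetical model of one register use.  */
struct use_model
{
  int regno;	/* register being read */
  int size;	/* highest bit read, plus one */
  int offset;	/* lowest bit read */
  int dest;	/* register set by the containing insn, or -1 if none */
};

/* Same arithmetic as propagate_size in the patch.  */
static int
propagate_size_model (int dest_size, const struct use_model *u)
{
  int avail = u->size - u->offset;

  if (dest_size == 0)
    return 0;
  return u->offset + (avail < dest_size ? avail : dest_size);
}

/* Recompute the biggest use per register from scratch.  */
static void
recompute_biggest (const struct use_model *uses, int n_uses,
		   int *biggest, int n_regs)
{
  int i;

  for (i = 0; i < n_regs; i++)
    biggest[i] = 0;
  for (i = 0; i < n_uses; i++)
    if (uses[i].size > biggest[uses[i].regno])
      biggest[uses[i].regno] = uses[i].size;
}

/* Shrink use sizes until a full sweep changes nothing.  Returns the
   number of sweeps performed.  */
static int
propagate_model (struct use_model *uses, int n_uses, int *biggest, int n_regs)
{
  int changed, sweeps = 0;

  recompute_biggest (uses, n_uses, biggest, n_regs);
  do
    {
      int i;

      changed = 0;
      for (i = 0; i < n_uses; i++)
	{
	  int new_size;

	  if (uses[i].dest < 0)
	    continue;
	  new_size = propagate_size_model (biggest[uses[i].dest], &uses[i]);
	  if (new_size < uses[i].size)
	    {
	      uses[i].size = new_size;
	      changed = 1;
	    }
	}
      recompute_biggest (uses, n_uses, biggest, n_regs);
      sweeps++;
    }
  while (changed);

  return sweeps;
}
```

With regs 204, 205, 217 and 218 numbered 0 to 3, the four operand uses of the plus and minus insns start at size 32 and the two store uses at size 16. One sweep narrows the operand uses to 16, matching the "32 -> 16" lines for regs 204 and 205 in the dump quoted in the file header.
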
>>>>>> +/* Check whether this is a sign/zero extension.  */
>>>>>> +
>>>>>> +static bool
>>>>>> +extension_p (rtx insn, rtx *dest, rtx *inner, int *preserved_size)
>>>>>> +{
>>>>>> +  rtx src, op0;
>>>>>> +
>>>>>> +  /* Detect set of reg.  */
>>>>>> +  if (GET_CODE (PATTERN (insn)) != SET)
>>>>>> +    return false;
>>>>>> +
>>>>>> +  src = SET_SRC (PATTERN (insn));
>>>>>> +  *dest = SET_DEST (PATTERN (insn));
>>>>>> +
>>>>>> +  if (!REG_P (*dest))
>>>>>> +    return false;
>>>>>> +
>>>>>> +  /* Detect sign or zero extension.  */
>>>>>> +  if (GET_CODE (src) == ZERO_EXTEND || GET_CODE (src) == SIGN_EXTEND
>>>>>> +      || (GET_CODE (src) == AND && CONST_INT_P (XEXP (src, 1))))
>>>>>> +    {
>>>>>> +      op0 = XEXP (src, 0);
>>>>>> +
>>>>>> +      /* Determine amount of least significant bits preserved by operation.  */
>>>>>> +      if (GET_CODE (src) == AND)
>>>>>> +	*preserved_size = ctz_hwi (~UINTVAL (XEXP (src, 1)));
>>>>>> +      else
>>>>>> +	*preserved_size = GET_MODE_BITSIZE (GET_MODE (op0));
>>>>>> +
>>>>>> +      if (GET_CODE (op0) == SUBREG)
>>>>>> +	{
>>>>>> +	  if (subreg_lsb (op0) != 0)
>>>>>> +	    return false;
>>>>>> +
>>>>>> +	  *inner = SUBREG_REG (op0);
>>>>>> +
>>>>>> +	  if (GET_MODE_CLASS (GET_MODE (*dest))
>>>>>> +	      != GET_MODE_CLASS (GET_MODE (*inner)))
>>>>>> +	    return false;
>>>>>> +
>>>>>> +	  return true;
>>>>>> +	}
>>>>>> +      else if (REG_P (op0))
>>>>>> +	{
>>>>>> +	  *inner = op0;
>>>>>> +
>>>>>> +	  if (GET_MODE_CLASS (GET_MODE (*dest))
>>>>>> +	      != GET_MODE_CLASS (GET_MODE (*inner)))
>>>>>> +	    return false;
>>>>>> +
>>>>>> +	  return true;
>>>>>> +	}
>>>>>> +      else if (GET_CODE (op0) == TRUNCATE)
>>>>>> +	{
>>>>>> +	  *inner = XEXP (op0, 0);
>>>>>> +	  return true;
>>>>>> +	}
>>>>>> +    }
>>>>>> +
>>>>>> +  return false;
>>>>>> +}
>>>>>> +
>>>>>> +/* Find extensions and store them in the extensions vector.  */
>>>>>> +
>>>>>> +static bool
>>>>>> +find_extensions (void)
>>>>>> +{
>>>>>> +  basic_block bb;
>>>>>> +  rtx insn, dest, inner;
>>>>>> +  int preserved_size;
>>>>>> +
>>>>>> +  /* For all insns, check whether the insn is an extension.  */
>>>>>> +  FOR_EACH_BB (bb)
>>>>>> +    FOR_BB_INSNS (bb, insn)
>>>>>> +      {
>>>>>> +	if (!NONDEBUG_INSN_P (insn))
>>>>>> +	  continue;
>>>>>> +
>>>>>> +	if (!extension_p (insn, &dest, &inner, &preserved_size))
>>>>>> +	  continue;
>>>>>> +
>>>>>> +	VEC_safe_push (rtx, heap, extensions, insn);
>>>>>> +
>>>>>> +	if (dump_file)
>>>>>> +	  fprintf (dump_file,
>>>>>> +		   "found extension %u with preserved size %d defining"
>>>>>> +		   " reg %d\n",
>>>>>> +		   INSN_UID (insn), preserved_size, REGNO (dest));
>>>>>> +      }
>>>>>> +
>>>>>> +  if (dump_file)
>>>>>> +    {
>>>>>> +      if (!VEC_empty (rtx, extensions))
>>>>>> +	fprintf (dump_file, "\n");
>>>>>> +      else
>>>>>> +	fprintf (dump_file, "no extensions found.\n");
>>>>>> +    }
>>>>>> +
>>>>>> +  return !VEC_empty (rtx, extensions);
>>>>>> +}
>>>>>> +
>>>>>> +/* Check whether this is a redundant sign/zero extension.  */
>>>>>> +
>>>>>> +static bool
>>>>>> +redundant_extension_p (rtx insn, rtx *dest, rtx *inner, int *preserved_size)
>>>>>> +{
>>>>>> +  int biggest_dest_use;
>>>>>> +
>>>>>> +  if (!extension_p (insn, dest, inner, preserved_size))
>>>>>> +    gcc_unreachable ();
>>>>>> +
>>>>>> +  biggest_dest_use = biggest_use[REGNO (*dest)];
>>>>>> +
>>>>>> +  if (biggest_dest_use == SKIP_REG)
>>>>>> +    return false;
>>>>>> +
>>>>>> +  if (*preserved_size < biggest_dest_use)
>>>>>> +    return false;
>>>>>> +
>>>>>> +  return true;
>>>>>> +}
>>>>>> +
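
For reference, the two quantities compared in redundant_extension_p can be worked through by hand. The sketch below is a standalone model, not the patch's code: it assumes a 64-bit HOST_WIDE_INT and the GCC/Clang builtin __builtin_ctzll, and ctz64, and_preserved_size and extension_redundant are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Trailing-zero count that is also defined for zero; a stand-in for
   ctz_hwi in this sketch (64-bit words assumed).  */
static int
ctz64 (uint64_t x)
{
  return x == 0 ? 64 : __builtin_ctzll (x);
}

/* Number of low bits that (reg & mask) passes through unchanged, i.e. the
   number of trailing one bits in MASK.  */
static int
and_preserved_size (uint64_t mask)
{
  return ctz64 (~mask);
}

/* An extension is redundant when no use of its destination reads above
   the preserved bits.  A negative biggest use models SKIP_REG.  */
static int
extension_redundant (int preserved_size, int biggest_dest_use)
{
  return biggest_dest_use >= 0 && preserved_size >= biggest_dest_use;
}
```

So an AND with 0xffff preserves 16 bits, the same as a zero_extend from HImode, and is redundant exactly when the biggest use of the destination is 16 bits or fewer. A mask such as 0xf0ff only preserves 8 bits, because the run of trailing ones stops at the first clear bit.
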
>>>>>> +/* Find the redundant extensions in the extensions vector and move them to the
>>>>>> +   redundant_extensions vector.  */
>>>>>> +
>>>>>> +static void
>>>>>> +find_redundant_extensions (void)
>>>>>> +{
>>>>>> +  rtx insn, dest, inner;
>>>>>> +  int ix;
>>>>>> +  bool found = false;
>>>>>> +  int preserved_size;
>>>>>> +
>>>>>> +  FOR_EACH_VEC_ELT_REVERSE (rtx, extensions, ix, insn)
>>>>>> +    if (redundant_extension_p (insn, &dest, &inner, &preserved_size))
>>>>>> +      {
>>>>>> +	VEC_safe_push (rtx, heap, redundant_extensions, insn);
>>>>>> +	VEC_unordered_remove (rtx, extensions, ix);
>>>>>> +
>>>>>> +	if (dump_file)
>>>>>> +	  fprintf (dump_file,
>>>>>> +		   "found redundant extension %u with preserved size %d"
>>>>>> +		   " defining reg %d\n",
>>>>>> +		   INSN_UID (insn), preserved_size, REGNO (dest));
>>>>>> +	found = true;
>>>>>> +      }
>>>>>> +
>>>>>> +  if (dump_file && found)
>>>>>> +    fprintf (dump_file, "\n");
>>>>>> +}
>>>>>> +
>>>>>> +/* Reset promotion of subregs of REG.  */
>>>>>> +
>>>>>> +static void
>>>>>> +reset_promoted_subreg (rtx reg)
>>>>>> +{
>>>>>> +  int ix;
>>>>>> +  rtx subreg;
>>>>>> +
>>>>>> +  if (promoted_subreg[REGNO (reg)] == NULL)
>>>>>> +    return;
>>>>>> +
>>>>>> +  FOR_EACH_VEC_ELT (rtx, promoted_subreg[REGNO (reg)], ix, subreg)
>>>>>> +    {
>>>>>> +      SUBREG_PROMOTED_UNSIGNED_SET (subreg, 0);
>>>>>> +      SUBREG_PROMOTED_VAR_P (subreg) = 0;
>>>>>> +    }
>>>>>> +
>>>>>> +  VEC_free (rtx, heap, promoted_subreg[REGNO (reg)]);
>>>>>> +}
>>>>>> +
>>>>>> +/* Try to remove or replace the redundant extension INSN which extends INNER and
>>>>>> +   writes to DEST.  */
>>>>>> +
>>>>>> +static void
>>>>>> +try_remove_or_replace_extension (rtx insn, rtx dest, rtx inner)
>>>>>> +{
>>>>>> +  rtx cp_src, cp_dest, seq = NULL_RTX, one;
>>>>>> +
>>>>>> +  /* Check whether replacement is needed.  */
>>>>>> +  if (dest != inner)
>>>>>> +    {
>>>>>> +      start_sequence ();
>>>>>> +
>>>>>> +      /* Determine the proper replacement operation.  */
>>>>>> +      if (GET_MODE (dest) == GET_MODE (inner))
>>>>>> +	{
>>>>>> +	  cp_src = inner;
>>>>>> +	  cp_dest = dest;
>>>>>> +	}
>>>>>> +      else if (GET_MODE_SIZE (GET_MODE (dest))
>>>>>> +	       > GET_MODE_SIZE (GET_MODE (inner)))
>>>>>> +	{
>>>>>> +	  emit_clobber (dest);
>>>>>> +	  cp_src = inner;
>>>>>> +	  cp_dest = gen_lowpart_SUBREG (GET_MODE (inner), dest);
>>>>>> +	}
>>>>>> +      else
>>>>>> +	{
>>>>>> +	  cp_src = gen_lowpart_SUBREG (GET_MODE (dest), inner);
>>>>>> +	  cp_dest = dest;
>>>>>> +	}
>>>>>> +
>>>>>> +      emit_move_insn (cp_dest, cp_src);
>>>>>> +
>>>>>> +      seq = get_insns ();
>>>>>> +      end_sequence ();
>>>>>> +
>>>>>> +      /* If the replacement is not supported, bail out.  */
>>>>>> +      for (one = seq; one != NULL_RTX; one = NEXT_INSN (one))
>>>>>> +	if (recog_memoized (one) < 0 && GET_CODE (PATTERN (one)) != CLOBBER)
>>>>>> +	  return;
>>>>>> +
>>>>>> +      /* Insert the replacement.  */
>>>>>> +      emit_insn_before (seq, insn);
>>>>>> +    }
>>>>>> +
>>>>>> +  /* Note replacement/removal in the dump.  */
>>>>>> +  if (dump_file)
>>>>>> +    {
>>>>>> +      fprintf (dump_file, "redundant extension %u ", INSN_UID (insn));
>>>>>> +      if (dest != inner)
>>>>>> +	fprintf (dump_file, "replaced by %u\n", INSN_UID (seq));
>>>>>> +      else
>>>>>> +	fprintf (dump_file, "removed\n");
>>>>>> +    }
>>>>>> +
>>>>>> +  /* Remove the extension.  */
>>>>>> +  delete_insn (insn);
>>>>>> +
>>>>>> +  reset_promoted_subreg (dest);
>>>>>> +}
>>>>>> +
>>>>>> +/* Set up the variables at the start of the pass.  */
>>>>>> +
>>>>>> +static void
>>>>>> +init_pass (void)
>>>>>> +{
>>>>>> +  int i;
>>>>>> +
>>>>>> +  biggest_use = XNEWVEC (int, n_regs);
>>>>>> +  promoted_subreg = XCNEWVEC (VEC (rtx,heap) *, n_regs);
>>>>>> +  propagated_size = XNEWVEC (int, n_regs);
>>>>>> +
>>>>>> +  /* Initialize biggest_use for all regs to 0.  If a reg is used implicitly, we
>>>>>> +     handle that reg conservatively and set it to SKIP_REG instead.  */
>>>>>> +  for (i = 0; i < n_regs; i++)
>>>>>> +    {
>>>>>> +      biggest_use[i] = skip_reg_p (i) ? SKIP_REG : 0;
>>>>>> +      propagated_size[i] = NONE;
>>>>>> +    }
>>>>>> +
>>>>>> +  extensions = VEC_alloc (rtx, heap, 10);
>>>>>> +  redundant_extensions = VEC_alloc (rtx, heap, 10);
>>>>>> +
>>>>>> +  wl = VEC_alloc (int, heap, 50);
>>>>>> +  in_wl = XNEWVEC (bool, n_regs);
>>>>>> +
>>>>>> +  uses = XNEWVEC (typeof (*uses), n_regs);
>>>>>> +  props = XNEWVEC (typeof (*props), n_regs);
>>>>>> +
>>>>>> +  for (i = 0; i < n_regs; ++i)
>>>>>> +    {
>>>>>> +      uses[i] = NULL;
>>>>>> +      props[i] = NULL;
>>>>>> +      in_wl[i] = false;
>>>>>> +    }
>>>>>> +}
>>>>>> +
>>>>>> +/* Find redundant extensions and remove or replace them if possible.  */
>>>>>> +
>>>>>> +static void
>>>>>> +remove_redundant_extensions (void)
>>>>>> +{
>>>>>> +  rtx insn, dest, inner;
>>>>>> +  int preserved_size;
>>>>>> +  int ix;
>>>>>> +
>>>>>> +  if (!find_extensions ())
>>>>>> +    return;
>>>>>> +
>>>>>> +  calculate_biggest_use ();
>>>>>> +
>>>>>> +  find_redundant_extensions ();
>>>>>> +
>>>>>> +  if (!VEC_empty (rtx, extensions))
>>>>>> +    {
>>>>>> +      propagate ();
>>>>>> +
>>>>>> +      find_redundant_extensions ();
>>>>>> +    }
>>>>>> +
>>>>>> +  gcc_checking_assert (n_regs == max_reg_num ());
>>>>>> +
>>>>>> +  FOR_EACH_VEC_ELT (rtx, redundant_extensions, ix, insn)
>>>>>> +    {
>>>>>> +      extension_p (insn, &dest, &inner, &preserved_size);
>>>>>> +      try_remove_or_replace_extension (insn, dest, inner);
>>>>>> +    }
>>>>>> +
>>>>>> +  if (dump_file)
>>>>>> +    fprintf (dump_file, "\n");
>>>>>> +}
>>>>>> +
>>>>>> +/* Free the variables at the end of the pass.  */
>>>>>> +
>>>>>> +static void
>>>>>> +finish_pass (void)
>>>>>> +{
>>>>>> +  int i;
>>>>>> +
>>>>>> +  XDELETEVEC (propagated_size);
>>>>>> +
>>>>>> +  VEC_free (rtx, heap, extensions);
>>>>>> +  VEC_free (rtx, heap, redundant_extensions);
>>>>>> +
>>>>>> +  VEC_free (int, heap, wl);
>>>>>> +
>>>>>> +  for (i = 0; i < n_regs; ++i)
>>>>>> +    {
>>>>>> +      if (uses[i] != NULL)
>>>>>> +	VEC_free (use_type, heap, uses[i]);
>>>>>> +
>>>>>> +      if (props[i] != NULL)
>>>>>> +	VEC_free (prop_type, heap, props[i]);
>>>>>> +    }
>>>>>> +
>>>>>> +  XDELETEVEC (uses);
>>>>>> +  XDELETEVEC (props);
>>>>>> +  XDELETEVEC (biggest_use);
>>>>>> +
>>>>>> +  for (i = 0; i < n_regs; ++i)
>>>>>> +    if (promoted_subreg[i] != NULL)
>>>>>> +      VEC_free (rtx, heap, promoted_subreg[i]);
>>>>>> +  XDELETEVEC (promoted_subreg);
>>>>>> +}
>>>>>> +
>>>>>> +/* Remove redundant extensions.  */
>>>>>> +
>>>>>> +static unsigned int
>>>>>> +rest_of_handle_ee (void)
>>>>>> +{
>>>>>> +  n_regs = max_reg_num ();
>>>>>> +
>>>>>> +  init_pass ();
>>>>>> +  remove_redundant_extensions ();
>>>>>> +  finish_pass ();
>>>>>> +  return 0;
>>>>>> +}
>>>>>> +
>>>>>> +/* Run ee pass when flag_ee is set at optimization level > 0.  */
>>>>>> +
>>>>>> +static bool
>>>>>> +gate_handle_ee (void)
>>>>>> +{
>>>>>> +  return (optimize > 0 && flag_ee);
>>>>>> +}
>>>>>> +
>>>>>> +struct rtl_opt_pass pass_ee =
>>>>>> +{
>>>>>> + {
>>>>>> +  RTL_PASS,
>>>>>> +  "ee",                                 /* name */
>>>>>> +  gate_handle_ee,                       /* gate */
>>>>>> +  rest_of_handle_ee,                    /* execute */
>>>>>> +  NULL,                                 /* sub */
>>>>>> +  NULL,                                 /* next */
>>>>>> +  0,                                    /* static_pass_number */
>>>>>> +  TV_EE,                                /* tv_id */
>>>>>> +  0,                                    /* properties_required */
>>>>>> +  0,                                    /* properties_provided */
>>>>>> +  0,                                    /* properties_destroyed */
>>>>>> +  0,                                    /* todo_flags_start */
>>>>>> +  TODO_ggc_collect |
>>>>>> +  TODO_verify_rtl_sharing,              /* todo_flags_finish */
>>>>>> + }
>>>>>> +};
>>>>>> Index: gcc/common.opt
>>>>>> ===================================================================
>>>>>> --- gcc/common.opt (revision 189409)
>>>>>> +++ gcc/common.opt (working copy)
>>>>>> @@ -1067,6 +1067,10 @@ feliminate-dwarf2-dups
>>>>>> Common Report Var(flag_eliminate_dwarf2_dups)
>>>>>> Perform DWARF2 duplicate elimination
>>>>>>
>>>>>> +fextension-elimination
>>>>>> +Common Report Var(flag_ee) Init(0) Optimization
>>>>>> +Perform extension elimination
>>>>>> +
>>>>>> fipa-sra
>>>>>> Common Report Var(flag_ipa_sra) Init(0) Optimization
>>>>>> Perform interprocedural reduction of aggregates
>>>>>> Index: gcc/Makefile.in
>>>>>> ===================================================================
>>>>>> --- gcc/Makefile.in (revision 189409)
>>>>>> +++ gcc/Makefile.in (working copy)
>>>>>> @@ -1218,6 +1218,7 @@ OBJS = \
>>>>>> 	dwarf2asm.o \
>>>>>> 	dwarf2cfi.o \
>>>>>> 	dwarf2out.o \
>>>>>> +	ee.o \
>>>>>> 	ebitmap.o \
>>>>>> 	emit-rtl.o \
>>>>>> 	et-forest.o \
>>>>>> @@ -2971,6 +2972,12 @@ cprop.o : cprop.c $(CONFIG_H) $(SYSTEM_H
>>>>>>       $(TM_P_H) $(PARAMS_H) cselib.h $(EXCEPT_H) $(TREE_H) $(TIMEVAR_H) \
>>>>>>       intl.h $(OBSTACK_H) $(TREE_PASS_H) $(DF_H) $(DBGCNT_H) $(TARGET_H) \
>>>>>>       $(DF_H) $(CFGLOOP_H)
>>>>>> +ee.o : ee.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) \
>>>>>> +   hard-reg-set.h $(FLAGS_H) $(BASIC_BLOCK_H) $(FUNCTION_H) output.h \
>>>>>> +   $(DF_H) $(TIMEVAR_H) tree-pass.h $(RECOG_H) $(EXPR_H) \
>>>>>> +   $(REGS_H) $(TREE_H) $(TM_P_H) insn-config.h $(INSN_ATTR_H) $(TOPLEV_H) \
>>>>>> +   $(DIAGNOSTIC_CORE_H) $(TARGET_H) $(OPTABS_H) insn-codes.h rtlhooks-def.h \
>>>>>> +   $(PARAMS_H) $(CGRAPH_H)
>>>>>> gcse.o : gcse.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) \
>>>>>>       $(REGS_H) hard-reg-set.h $(FLAGS_H) insn-config.h $(GGC_H) \
>>>>>>       $(RECOG_H) $(EXPR_H) $(BASIC_BLOCK_H) $(FUNCTION_H) toplev.h $(DIAGNOSTIC_CORE_H) \
>>>>>> Index: gcc/passes.c
>>>>>> ===================================================================
>>>>>> --- gcc/passes.c (revision 189409)
>>>>>> +++ gcc/passes.c (working copy)
>>>>>> @@ -1552,6 +1552,7 @@ init_optimization_passes (void)
>>>>>>          NEXT_PASS (pass_initialize_regs);
>>>>>>          NEXT_PASS (pass_ud_rtl_dce);
>>>>>>          NEXT_PASS (pass_combine);
>>>>>> +      NEXT_PASS (pass_ee);
>>>>>>          NEXT_PASS (pass_if_after_combine);
>>>>>>          NEXT_PASS (pass_partition_blocks);
>>>>>>          NEXT_PASS (pass_regmove);
>>
>



* Re: new sign/zero extension elimination pass
  2012-07-13 11:39                         ` Kenneth Zadeck
@ 2012-07-13 12:58                           ` Tom de Vries
  0 siblings, 0 replies; 43+ messages in thread
From: Tom de Vries @ 2012-07-13 12:58 UTC (permalink / raw)
  To: Kenneth Zadeck
  Cc: Tom de Vries, Eric Botcazou, tom, gcc-patches, Paolo Bonzini

On 13/07/12 13:38, Kenneth Zadeck wrote:
> it really is not.
> 

Kenneth,

I'm not quite sure I know what you mean by 'it', but I suppose you mean to say
that pass_ree is not the natural place to add the inter-bb combining of
extensions with all their uses.

Why do you think so?

> the problem is that sign extension removal is just a more difficult 
> problem than what you are considering.  You have attacked a small part 
> of the problem and have a good start but you really should consider the 
> whole problem.

Compilers are split up into passes that handle a well-defined part of the whole
problem of generating optimal and correct code. In my view, pass_ee does exactly
that.

The goal of the pass is not to remove extensions in every way possible.  It
focuses on detecting extensions that can be replaced by a regcopy, and on
replacing them.
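
As an illustration only (a standalone C sketch, not code from the patch; the
helper names are made up for this example), the motivating case from
extend-1.c relies on the fact that reading the low byte of a sign-extended
halfword gives the same result as reading the low byte of the original
register, so a plain copy is enough:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: models (subreg:QI (sign_extend:SI (subreg:HI x))),
   i.e. sign-extend the low 16 bits of X, then read the low byte.  */
uint8_t
low_byte_via_sign_extend (uint32_t x)
{
  uint32_t low16 = x & 0xffffu;
  /* Portable sign extension of a 16-bit value to 32 bits.  */
  uint32_t extended = (low16 ^ 0x8000u) - 0x8000u;
  return (uint8_t) (extended & 0xffu);
}

/* Hypothetical helper: models the same read after the extension has been
   replaced by a plain register copy.  */
uint8_t
low_byte_via_copy (uint32_t x)
{
  return (uint8_t) (x & 0xffu);
}

/* Check that both forms agree for every bit pattern in the low half,
   with arbitrary garbage in the upper half.  */
void
check_low_byte_equivalence (void)
{
  uint32_t v;

  for (v = 0; v <= 0xffffu; v++)
    {
      uint32_t x = 0xabcd0000u | v;
      assert (low_byte_via_sign_extend (x) == low_byte_via_copy (x));
    }
}
```

Calling check_low_byte_equivalence () from a small main exercises all 65536
halfword values.  The sketch only covers the case where every use reads the
low byte; it says nothing about uses that read the extended upper bits.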

If you have an example that shows a missing optimization where pass_ee could be
adapted or extended in a certain way to handle that, I would love to see that one.

Thanks,
- Tom

> 
> kenny
> 
> 
> On 07/13/2012 03:53 AM, Tom de Vries wrote:
>> On 12/07/12 14:04, Kenneth Zadeck wrote:
>>> you are on the right track with the example but combine will not get
>>> this unless everything is in the same bb.
>>> the whole point of having a separate pass for doing extension
>>> elimination is that it needs to be done over the entire function.
>>>
>> There is a pass_ree, which does inter-bb combine targeted at extensions.
>> However, that pass is currently limited to combining extensions with the
>> definitions of the register it extends. The way your example sounds, you want
>> the reverse, where extensions are combined with all their uses.
>> I would say pass_ree is the natural place to add this and handle the example you
>> describe.
>>
>> Thanks,
>> - Tom
>>
>>> my example is also a little more complex because, since we are talking
>>> about induction vars, you have an initial assignment outside of a loop,
>>> and increment inside the loop and the test you describe at the bottom of
>>> the loop.
>>>
>>> I would point out that with respect to speed optimizations, the case i
>>> am describing is in fact very important because getting code out of
>>> loops is where the important gains are.   I believe that the ppc has
>>> some significant performance issues because of this kind of thing.
>>>
>>> kenny
>>>
>>>
>>> On 07/12/2012 05:20 AM, Tom de Vries wrote:
>>>> On 12/07/12 11:05, Tom de Vries wrote:
>>>>> On 12/07/12 03:39, Kenneth Zadeck wrote:
>>>>>> Tom,
>>>>>>
>>>>>> I have a problem with the approach that you have taken here.   I believe
>>>>>> that this could be a very useful addition to gcc so I am in general very
>>>>>> supportive, but i think you are missing an important case.
>>>>>>
>>>>>> My problem is that the pass does not actually look at the target and
>>>>>> make any decisions based on that target.
>>>>>>
>>>>>> for instance, we have a llp64 target.   As with many targets, the target
>>>>>> has a rich set of compare and branch instructions.  In particular, it
>>>>>> can do both 32 and 64 bit comparisons.    We see that many of the
>>>>>> upstream optimizations that take int (SI mode) index variables generate
>>>>>> extension operations before doing 64 bit compare and branch
>>>>>> instructions, even though there are 32 bit comparison and branches on
>>>>>> the machine.     There are a lot of machines that can do more than one
>>>>>> size of comparison.
>>>>>>
>>>>>> This optimization pass, as it is currently written, will not remove those
>>>>>> extensions because it believes that the length of the destination is the
>>>>>> "final answer" unless it is wrapped in an explicit truncation.
>>>>>> Instead it needs to ask the port if there is a shorter compare and
>>>>>> branch instruction that does not cost more. in that case, those
>>>>>> instructions should be rewritten to use the shorter compare and branch.
>>>>>>
>>>>>> There are many operations other than compare and branch where the pass
>>>>>> should be asking "can i shorten the target for free and therefore get
>>>>>> rid of the extension?"
>>>>> Kenneth,
>>>>>
>>>>> I'm not sure I understand the optimization you're talking about, in particular
>>>>> I'm confused about whether the branch range of the 32-bit and 64-bit comparison
>>>>> is the same.
>>>>>
>>>>> Assuming it's the same, my understanding is that you're talking about an example
>>>>> like this:
>>>>> ...
>>>>>     (insn (set (reg:DI 5)
>>>>>                (zero_extend:DI (reg:SI 4))))
>>>>>
>>>>>     (jump_insn (set (pc)
>>>>>                     (if_then_else (eq (reg:DI 5)
>>>>>                                       (const_int 0))
>>>>>                                   (label_ref:DI 62)
>>>>>                                   (pc))))
>>>>>
>>>>>     ->
>>>>>
>>>>>     (jump_insn (set (pc)
>>>>>                     (if_then_else (eq (reg:SI 4)
>>>>>                                       (const_int 0))
>>>>>                                   (label_ref:DI 62)
>>>>>                                   (pc))))
>>>>>
>>>>> ...
>>>>> I would expect combine to optimize this.
>>>>>
>>>>> In case I got the example all backwards or it is too simple, please
>>>>> provide an rtl example that illustrates the optimization.
>>>>>
>>>>> Thanks,
>>>>> - Tom
>>>>>
>>>>>
>>>>>>    right shifts, rotates, and stores are not in
>>>>>> this class, but left shifts are as are all comparisons, compare and
>>>>>> branches, conditional moves.   There may even be machines that have this
>>>>>> for divide, but i do not know of any off the top of my head.
>>>>>>
>>>>>> What i am suggesting moves this pass into the target specific set of
>>>>>> optimizations rather than target independent set, but given where this pass
>>>>>> is to be put this is completely appropriate.    Any dest instruction
>>>>>> where all of the operands have been extended should be checked to see if
>>>>>> it was really necessary to use the longer form before doing the
>>>>>> propagation pass.
>>>>>>
>>>>>> kenny
>>>>>>
>>>>>>
>>>>>> On 07/11/2012 06:30 AM, Tom de Vries wrote:
>>>>>>> On 13/11/10 10:50, Eric Botcazou wrote:
>>>>>>>>> I profiled the pass on spec2000:
>>>>>>>>>
>>>>>>>>>                       -mabi=32     -mabi=64
>>>>>>>>> ee-pass (usr time):     0.70         1.16
>>>>>>>>> total   (usr time):   919.30       879.26
>>>>>>>>> ee-pass        (%):     0.08         0.13
>>>>>>>>>
>>>>>>>>> The pass takes 0.13% or less of the total usr runtime.
>>>>>>>> For how many hits?  What are the numbers with --param ee-max-propagate=0?
>>>>>>>>
>>>>>>>>> Is it necessary to improve the runtime of this pass?
>>>>>>>> I've already given my opinion about the implementation.  The other passes in
>>>>>>>> the compiler try hard not to rescan everything when a single bit changes; as
>>>>>>>> currently written, yours doesn't.
>>>>>>>>
>>>>>>> Eric,
>>>>>>>
>>>>>>> I've done the following:
>>>>>>> - refactored the pass such that it now scans at most twice over all
>>>>>>>     instructions.
>>>>>>> - updated the patch to be applicable to current trunk
>>>>>>> - updated the motivating example to a more applicable one (as discussed in
>>>>>>>     this thread), and added that one as test-case.
>>>>>>> - added a part in the header comment illustrating the working of the pass
>>>>>>>     on the motivating example.
>>>>>>>
>>>>>>> bootstrapped and reg-tested on x86_64 and i686.
>>>>>>>
>>>>>>> built and reg-tested on mips, mips64, and arm.
>>>>>>>
>>>>>>> OK for trunk?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> - Tom
>>>>>>>
>>>>>>> 2012-07-10  Tom de Vries  <tom@codesourcery.com>
>>>>>>>
>>>>>>> 	* ee.c: New file.
>>>>>>> 	* tree-pass.h (pass_ee): Declare.
>>>>>>> 	* opts.c (default_options_table): Set flag_ee at -O2.
>>>>>>> 	* timevar.def (TV_EE): New timevar.
>>>>>>> 	* common.opt (fextension-elimination): New option.
>>>>>>> 	* Makefile.in (ee.o): New rule.
>>>>>>> 	* passes.c (pass_ee): Add it.
>>>>>>>
>>>>>>> 	* gcc.dg/extend-1.c: New test.
>>>>>>> 	* gcc.dg/extend-2.c: Same.
>>>>>>> 	* gcc.dg/extend-2-64.c: Same.
>>>>>>> 	* gcc.dg/extend-3.c: Same.
>>>>>>> 	* gcc.dg/extend-4.c: Same.
>>>>>>> 	* gcc.dg/extend-5.c: Same.
>>>>>>> 	* gcc.target/mips/octeon-bbit-2.c: Make test more robust.
>>>>>>> Index: gcc/tree-pass.h
>>>>>>> ===================================================================
>>>>>>> --- gcc/tree-pass.h (revision 189409)
>>>>>>> +++ gcc/tree-pass.h (working copy)
>>>>>>> @@ -483,6 +483,7 @@ extern struct gimple_opt_pass pass_fixup
>>>>>>>
>>>>>>> extern struct rtl_opt_pass pass_expand;
>>>>>>> extern struct rtl_opt_pass pass_instantiate_virtual_regs;
>>>>>>> +extern struct rtl_opt_pass pass_ee;
>>>>>>> extern struct rtl_opt_pass pass_rtl_fwprop;
>>>>>>> extern struct rtl_opt_pass pass_rtl_fwprop_addr;
>>>>>>> extern struct rtl_opt_pass pass_jump;
>>>>>>> Index: gcc/testsuite/gcc.target/mips/octeon-bbit-2.c
>>>>>>> ===================================================================
>>>>>>> --- gcc/testsuite/gcc.target/mips/octeon-bbit-2.c (revision 189409)
>>>>>>> +++ gcc/testsuite/gcc.target/mips/octeon-bbit-2.c (working copy)
>>>>>>> @@ -5,19 +5,19 @@
>>>>>>> /* { dg-final { scan-assembler "\tbnel\t" } } */
>>>>>>> /* { dg-final { scan-assembler-not "\tbne\t" } } */
>>>>>>>
>>>>>>> -NOMIPS16 int
>>>>>>> -f (int n, int i)
>>>>>>> +NOMIPS16 long int
>>>>>>> +f (long int n, long int i)
>>>>>>> {
>>>>>>> -  int s = 0;
>>>>>>> +  long int s = 0;
>>>>>>>      for (; i & 1; i++)
>>>>>>>        s += i;
>>>>>>>      return s;
>>>>>>> }
>>>>>>>
>>>>>>> -NOMIPS16 int
>>>>>>> -g (int n, int i)
>>>>>>> +NOMIPS16 long int
>>>>>>> +g (long int n, long int i)
>>>>>>> {
>>>>>>> -  int s = 0;
>>>>>>> +  long int s = 0;
>>>>>>>      for (i = 0; i < n; i++)
>>>>>>>        s += i;
>>>>>>>      return s;
>>>>>>> Index: gcc/testsuite/gcc.dg/extend-4.c
>>>>>>> ===================================================================
>>>>>>> --- /dev/null (new file)
>>>>>>> +++ gcc/testsuite/gcc.dg/extend-4.c (revision 0)
>>>>>>> @@ -0,0 +1,16 @@
>>>>>>> +/* { dg-do compile } */
>>>>>>> +/* { dg-options "-O2 -fdump-rtl-ee" } */
>>>>>>> +
>>>>>>> +unsigned char f(unsigned int a, int c)
>>>>>>> +{
>>>>>>> +  unsigned int b = a;
>>>>>>> +  if (c)
>>>>>>> +    b = a & 0x10ff;
>>>>>>> +  return b;
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* { dg-final { scan-rtl-dump-times "_extend:" 1 "ee" { target mips*-*-* } } } */
>>>>>>> +/* { dg-final { scan-rtl-dump-times "and:" 0 "ee" { target mips*-*-* } } } */
>>>>>>> +/* { dg-final { scan-rtl-dump "redundant extension \[0-9\]+ removed" "ee" { target mips*-*-* } } } */
>>>>>>> +/* { dg-final { cleanup-rtl-dump "ee" } } */
>>>>>>> +
>>>>>>> Index: gcc/testsuite/gcc.dg/extend-1.c
>>>>>>> ===================================================================
>>>>>>> --- /dev/null (new file)
>>>>>>> +++ gcc/testsuite/gcc.dg/extend-1.c (revision 0)
>>>>>>> @@ -0,0 +1,13 @@
>>>>>>> +/* { dg-do compile } */
>>>>>>> +/* { dg-options "-O2 -fdump-rtl-ee" } */
>>>>>>> +
>>>>>>> +void f(unsigned char * p, short s, int c, int *z)
>>>>>>> +{
>>>>>>> +  if (c)
>>>>>>> +    *z = 0;
>>>>>>> +  *p ^= (unsigned char)s;
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* { dg-final { scan-rtl-dump-times "sign_extend:" 0 "ee" { target mips*-*-* } } } */
>>>>>>> +/* { dg-final { scan-rtl-dump-times "redundant extension \[0-9\]+ replaced" 1 "ee" { target mips*-*-* } } } */
>>>>>>> +/* { dg-final { cleanup-rtl-dump "ee" } } */
>>>>>>> Index: gcc/testsuite/gcc.dg/extend-5.c
>>>>>>> ===================================================================
>>>>>>> --- /dev/null (new file)
>>>>>>> +++ gcc/testsuite/gcc.dg/extend-5.c (revision 0)
>>>>>>> @@ -0,0 +1,13 @@
>>>>>>> +/* { dg-do compile } */
>>>>>>> +/* { dg-options "-O2 -fdump-rtl-ee" } */
>>>>>>> +
>>>>>>> +void f (short d[2][2])
>>>>>>> +{
>>>>>>> +  int d0 = d[0][0] + d[0][1];
>>>>>>> +  int d1 = d[1][0] + d[1][1];
>>>>>>> +  d[0][0] = d0 + d1;
>>>>>>> +      d[0][1] = d0 - d1;
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* { dg-final { scan-rtl-dump-times "redundant extension \[0-9\]+ replaced" 2 "ee" { target mips*-*-* } } } */
>>>>>>> +/* { dg-final { cleanup-rtl-dump "ee" } } */
>>>>>>> Index: gcc/testsuite/gcc.dg/extend-2.c
>>>>>>> ===================================================================
>>>>>>> --- /dev/null (new file)
>>>>>>> +++ gcc/testsuite/gcc.dg/extend-2.c (revision 0)
>>>>>>> @@ -0,0 +1,20 @@
>>>>>>> +/* { dg-do compile } */
>>>>>>> +/* { dg-options "-O2 -fdump-rtl-ee" } */
>>>>>>> +/* { dg-require-effective-target ilp32 } */
>>>>>>> +
>>>>>>> +void f(unsigned char * p, short *s, int c)
>>>>>>> +{
>>>>>>> +  short or = 0;
>>>>>>> +  while (c)
>>>>>>> +    {
>>>>>>> +      or = or | s[c];
>>>>>>> +      c --;
>>>>>>> +    }
>>>>>>> +  *p = (unsigned char)or;
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* { dg-final { scan-rtl-dump-times "zero_extend" 0 "ee" { target mips*-*-* } } } */
>>>>>>> +/* { dg-final { scan-rtl-dump-times "sign_extend" 0 "ee" { target mips*-*-* } } } */
>>>>>>> +/* { dg-final { scan-rtl-dump-times "redundant extension \[0-9\]+ replaced" 2 "ee" { target mips*-*-* } } } */
>>>>>>> +/* { dg-final { cleanup-rtl-dump "ee" } } */
>>>>>>> +
>>>>>>> Index: gcc/testsuite/gcc.dg/extend-2-64.c
>>>>>>> ===================================================================
>>>>>>> --- /dev/null (new file)
>>>>>>> +++ gcc/testsuite/gcc.dg/extend-2-64.c (revision 0)
>>>>>>> @@ -0,0 +1,20 @@
>>>>>>> +/* { dg-do compile } */
>>>>>>> +/* { dg-options "-O2 -fdump-rtl-ee -mabi=64" } */
>>>>>>> +/* { dg-require-effective-target mips64 } */
>>>>>>> +
>>>>>>> +void f(unsigned char * p, short *s, int c)
>>>>>>> +{
>>>>>>> +  short or = 0;
>>>>>>> +  while (c)
>>>>>>> +    {
>>>>>>> +      or = or | s[c];
>>>>>>> +      c --;
>>>>>>> +    }
>>>>>>> +  *p = (unsigned char)or;
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* { dg-final { scan-rtl-dump-times "zero_extend:" 0 "ee" { target mips*-*-* } } } */
>>>>>>> +/* { dg-final { scan-rtl-dump-times "sign_extend:" 1 "ee" { target mips*-*-* } } } */
>>>>>>> +/* { dg-final { scan-rtl-dump-times "redundant extension \[0-9\]+ replaced" 2 "ee" { target mips*-*-* } } } */
>>>>>>> +/* { dg-final { cleanup-rtl-dump "ee" } } */
>>>>>>> +
>>>>>>> Index: gcc/testsuite/gcc.dg/extend-3.c
>>>>>>> ===================================================================
>>>>>>> --- /dev/null (new file)
>>>>>>> +++ gcc/testsuite/gcc.dg/extend-3.c (revision 0)
>>>>>>> @@ -0,0 +1,13 @@
>>>>>>> +/* { dg-do compile } */
>>>>>>> +/* { dg-options "-O2 -fdump-rtl-ee -mabi=64" } */
>>>>>>> +/* { dg-require-effective-target mips64 } */
>>>>>>> +
>>>>>>> +unsigned int f(unsigned char byte)
>>>>>>> +{
>>>>>>> +  return byte << 25;
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* { dg-final { scan-rtl-dump-times "zero_extend:" 0 "ee" { target mips*-*-* } } } */
>>>>>>> +/* { dg-final { scan-rtl-dump "redundant extension \[0-9\]+ replaced" "ee" } } */
>>>>>>> +/* { dg-final { cleanup-rtl-dump "ee" } } */
>>>>>>> +
>>>>>>> Index: gcc/opts.c
>>>>>>> ===================================================================
>>>>>>> --- gcc/opts.c (revision 189409)
>>>>>>> +++ gcc/opts.c (working copy)
>>>>>>> @@ -490,6 +490,7 @@ static const struct default_options defa
>>>>>>>        { OPT_LEVELS_2_PLUS, OPT_ftree_tail_merge, NULL, 1 },
>>>>>>>        { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_foptimize_strlen, NULL, 1 },
>>>>>>>        { OPT_LEVELS_2_PLUS, OPT_fhoist_adjacent_loads, NULL, 1 },
>>>>>>> +    { OPT_LEVELS_2_PLUS, OPT_fextension_elimination, NULL, 1 },
>>>>>>>
>>>>>>>        /* -O3 optimizations.  */
>>>>>>>        { OPT_LEVELS_3_PLUS, OPT_ftree_loop_distribute_patterns, NULL, 1 },
>>>>>>> Index: gcc/timevar.def
>>>>>>> ===================================================================
>>>>>>> --- gcc/timevar.def (revision 189409)
>>>>>>> +++ gcc/timevar.def (working copy)
>>>>>>> @@ -201,6 +201,7 @@ DEFTIMEVAR (TV_POST_EXPAND	     , "post
>>>>>>> DEFTIMEVAR (TV_VARCONST              , "varconst")
>>>>>>> DEFTIMEVAR (TV_LOWER_SUBREG	     , "lower subreg")
>>>>>>> DEFTIMEVAR (TV_JUMP                  , "jump")
>>>>>>> +DEFTIMEVAR (TV_EE                    , "extension elimination")
>>>>>>> DEFTIMEVAR (TV_FWPROP                , "forward prop")
>>>>>>> DEFTIMEVAR (TV_CSE                   , "CSE")
>>>>>>> DEFTIMEVAR (TV_DCE                   , "dead code elimination")
>>>>>>> Index: gcc/ee.c
>>>>>>> ===================================================================
>>>>>>> --- /dev/null (new file)
>>>>>>> +++ gcc/ee.c (revision 0)
>>>>>>> @@ -0,0 +1,1190 @@
>>>>>>> +/* Redundant extension elimination.
>>>>>>> +   Copyright (C) 2010, 2011, 2012 Free Software Foundation, Inc.
>>>>>>> +   Contributed by Tom de Vries (tom@codesourcery.com)
>>>>>>> +
>>>>>>> +This file is part of GCC.
>>>>>>> +
>>>>>>> +GCC is free software; you can redistribute it and/or modify it under
>>>>>>> +the terms of the GNU General Public License as published by the Free
>>>>>>> +Software Foundation; either version 3, or (at your option) any later
>>>>>>> +version.
>>>>>>> +
>>>>>>> +GCC is distributed in the hope that it will be useful, but WITHOUT ANY
>>>>>>> +WARRANTY; without even the implied warranty of MERCHANTABILITY or
>>>>>>> +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
>>>>>>> +for more details.
>>>>>>> +
>>>>>>> +You should have received a copy of the GNU General Public License
>>>>>>> +along with GCC; see the file COPYING3.  If not see
>>>>>>> +<http://www.gnu.org/licenses/>.  */
>>>>>>> +
>>>>>>> +/*
>>>>>>> +
>>>>>>> +  MOTIVATING EXAMPLE
>>>>>>> +
>>>>>>> +  The motivating example for this pass is the example from PR 40893:
>>>>>>> +
>>>>>>> +    void f (short d[2][2])
>>>>>>> +    {
>>>>>>> +      int d0 = d[0][0] + d[0][1];
>>>>>>> +      int d1 = d[1][0] + d[1][1];
>>>>>>> +      d[0][0] = d0 + d1;
>>>>>>> +      d[0][1] = d0 - d1;
>>>>>>> +    }
>>>>>>> +
>>>>>>> +  For MIPS, compilation results in the following insns.
>>>>>>> +
>>>>>>> +    (set (reg:SI 204)
>>>>>>> +         (zero_extend:SI (subreg:HI (reg:SI 213) 2)))
>>>>>>> +
>>>>>>> +    (set (reg:SI 205)
>>>>>>> +         (zero_extend:SI (subreg:HI (reg:SI 216 [ d1 ]) 2)))
>>>>>>> +
>>>>>>> +    (set (reg:SI 217)
>>>>>>> +         (plus:SI (reg:SI 205)
>>>>>>> +                  (reg:SI 204)))
>>>>>>> +
>>>>>>> +    (set (reg:SI 218)
>>>>>>> +         (minus:SI (reg:SI 204)
>>>>>>> +                   (reg:SI 205)))
>>>>>>> +
>>>>>>> +    (set (mem:HI (reg/v/f:SI 210))
>>>>>>> +         (subreg:HI (reg:SI 217) 2))
>>>>>>> +
>>>>>>> +    (set (mem:HI (plus:SI (reg/v/f:SI 210)
>>>>>>> +                 (const_int 2 [0x2])))
>>>>>>> +         (subreg:HI (reg:SI 218) 2))
>>>>>>> +
>>>>>>> +
>>>>>>> +  The stores only use the lower halves of pseudos 217 and 218, and are the
>>>>>>> +  only uses.  And the plus and minus operators belong to the class of
>>>>>>> +  operators where a bit in the result is only influenced by same-or-less
>>>>>>> +  significant bits in the operands, so the plus and minus insns only use the
>>>>>>> +  lower halves of pseudos 204 and 205.  Those are also the only uses of pseudos
>>>>>>> +  204 and 205, so the zero_extends are redundant.
>>>>>>> +
>>>>>>> +
>>>>>>> +  INTENDED EFFECT
>>>>>>> +
>>>>>>> +  This pass works by removing sign/zero-extensions, or replacing them with
>>>>>>> +  regcopies.  The idea there is that the regcopy might be eliminated by a later
>>>>>>> +  pass.  In case the regcopy cannot be eliminated, it might at least be cheaper
>>>>>>> +  than the extension.
>>>>>>> +
>>>>>>> +
>>>>>>> +  IMPLEMENTATION
>>>>>>> +
>>>>>>> +  The pass scans at most two times over all instructions.
>>>>>>> +
>>>>>>> +  The first scan collects all extensions.  If there are no extensions, we're
>>>>>>> +  done.
>>>>>>> +
>>>>>>> +  The second scan registers all uses of a reg in the biggest_use array.
>>>>>>> +  Additionally, it registers how the use size of a pseudo is propagated to the
>>>>>>> +  operands of the insns defining the pseudo.
>>>>>>> +
>>>>>>> +  The biggest_use array now contains the size in bits of the biggest use
>>>>>>> +  of each reg, which allows us to find redundant extensions.
>>>>>>> +
>>>>>>> +  If there are still non-redundant extensions left, we use the propagation
>>>>>>> +  information in an iterative fashion to improve the biggest_use array, after
>>>>>>> +  which we may find more redundant extensions.
>>>>>>> +
>>>>>>> +  Finally, redundant extensions are deleted or replaced.
>>>>>>> +
>>>>>>> +  In case that the src and dest reg of the replacement are not of the same size,
>>>>>>> +  we do not replace with a normal regcopy, but with a truncate or with the copy
>>>>>>> +  of a paradoxical subreg instead.
>>>>>>> +
>>>>>>> +
>>>>>>> +  ILLUSTRATION OF PASS
>>>>>>> +
>>>>>>> +  The dump of the pass shows us how the pass works on the motivating example.
>>>>>>> +
>>>>>>> +  We find the 2 extensions:
>>>>>>> +    found extension with preserved size 16 defining reg 204
>>>>>>> +    found extension with preserved size 16 defining reg 205
>>>>>>> +
>>>>>>> +  We calculate the biggest uses of a register:
>>>>>>> +    biggest_use
>>>>>>> +    reg 204: size 32
>>>>>>> +    reg 205: size 32
>>>>>>> +    reg 217: size 16
>>>>>>> +    reg 218: size 16
>>>>>>> +
>>>>>>> +  We propagate the biggest uses where possible:
>>>>>>> +    propagations
>>>>>>> +    205: 32 -> 16
>>>>>>> +    204: 32 -> 16
>>>>>>> +    214: 32 -> 16
>>>>>>> +    215: 32 -> 16
>>>>>>> +
>>>>>>> +  We conclude that the extensions are redundant:
>>>>>>> +    found redundant extension with preserved size 16 defining reg 205
>>>>>>> +    found redundant extension with preserved size 16 defining reg 204
>>>>>>> +
>>>>>>> +  And we replace them with regcopies:
>>>>>>> +    (set (reg:SI 204)
>>>>>>> +        (reg:SI 213))
>>>>>>> +
>>>>>>> +    (set (reg:SI 205)
>>>>>>> +        (reg:SI 216))
>>>>>>> +
>>>>>>> +
>>>>>>> +  LIMITATIONS
>>>>>>> +
>>>>>>> +  The scope of the analysis is limited to an extension and its uses.  The other
>>>>>>> +  type of analysis (related to the defs of the operand of an extension) is not
>>>>>>> +  done.
>>>>>>> +
>>>>>>> +  Furthermore, we do the analysis of biggest use per reg.  So when determining
>>>>>>> +  whether an extension is redundant, we take all uses of a dest reg into
>>>>>>> +  account, also the ones that are not uses of the extension.
>>>>>>> +  The consideration is that using use-def chains will give a more precise
>>>>>>> +  analysis, but is much more expensive in terms of runtime.  */
>>>>>>> +
>>>>>>> +#include "config.h"
>>>>>>> +#include "system.h"
>>>>>>> +#include "coretypes.h"
>>>>>>> +#include "tm.h"
>>>>>>> +#include "rtl.h"
>>>>>>> +#include "tree.h"
>>>>>>> +#include "tm_p.h"
>>>>>>> +#include "flags.h"
>>>>>>> +#include "regs.h"
>>>>>>> +#include "hard-reg-set.h"
>>>>>>> +#include "basic-block.h"
>>>>>>> +#include "insn-config.h"
>>>>>>> +#include "function.h"
>>>>>>> +#include "expr.h"
>>>>>>> +#include "insn-attr.h"
>>>>>>> +#include "recog.h"
>>>>>>> +#include "toplev.h"
>>>>>>> +#include "target.h"
>>>>>>> +#include "timevar.h"
>>>>>>> +#include "optabs.h"
>>>>>>> +#include "insn-codes.h"
>>>>>>> +#include "rtlhooks-def.h"
>>>>>>> +#include "output.h"
>>>>>>> +#include "params.h"
>>>>>>> +#include "timevar.h"
>>>>>>> +#include "tree-pass.h"
>>>>>>> +#include "cgraph.h"
>>>>>>> +#include "vec.h"
>>>>>>> +
>>>>>>> +#define SKIP_REG (-1)
>>>>>>> +#define NONE (-1)
>>>>>>> +
>>>>>>> +/* Number of registers at start of pass.  */
>>>>>>> +
>>>>>>> +static int n_regs;
>>>>>>> +
>>>>>>> +/* Array to register the biggest use of a reg, in bits.  */
>>>>>>> +
>>>>>>> +static int *biggest_use;
>>>>>>> +
>>>>>>> +/* Array to register the promoted subregs.  */
>>>>>>> +
>>>>>>> +static VEC (rtx,heap) **promoted_subreg;
>>>>>>> +
>>>>>>> +/* Array to register for a reg what the last propagated size is.  */
>>>>>>> +
>>>>>>> +static int *propagated_size;
>>>>>>> +
>>>>>>> +typedef struct use
>>>>>>> +{
>>>>>>> +  int regno;
>>>>>>> +  int size;
>>>>>>> +  int offset;
>>>>>>> +  rtx *use;
>>>>>>> +} use_type;
>>>>>>> +
>>>>>>> +DEF_VEC_O(use_type);
>>>>>>> +DEF_VEC_ALLOC_O(use_type,heap);
>>>>>>> +
>>>>>>> +/* Vector to register the uses.  */
>>>>>>> +
>>>>>>> +static VEC (use_type,heap) **uses;
>>>>>>> +
>>>>>>> +typedef struct prop
>>>>>>> +{
>>>>>>> +  rtx set;
>>>>>>> +  int uses_regno;
>>>>>>> +  int uses_index;
>>>>>>> +} prop_type;
>>>>>>> +
>>>>>>> +DEF_VEC_O(prop_type);
>>>>>>> +DEF_VEC_ALLOC_O(prop_type,heap);
>>>>>>> +
>>>>>>> +/* Vector to register the propagations.  */
>>>>>>> +
>>>>>>> +static VEC (prop_type,heap) **props;
>>>>>>> +
>>>>>>> +/* Work list for propagation.  */
>>>>>>> +
>>>>>>> +static VEC (int,heap) *wl;
>>>>>>> +
>>>>>>> +/* Array to register what regs are in the work list.  */
>>>>>>> +
>>>>>>> +static bool *in_wl;
>>>>>>> +
>>>>>>> +/* Vector that contains the extensions in the function.  */
>>>>>>> +
>>>>>>> +static VEC (rtx,heap) *extensions;
>>>>>>> +
>>>>>>> +/* Vector that contains the extensions in the function that are going to be
>>>>>>> +   removed or replaced.  */
>>>>>>> +
>>>>>>> +static VEC (rtx,heap) *redundant_extensions;
>>>>>>> +
>>>>>>> +/* Forward declaration.  */
>>>>>>> +
>>>>>>> +static void note_use (rtx *x, void *data);
>>>>>>> +static bool skip_reg_p (int regno);
>>>>>>> +static void register_prop (rtx set, use_type *use);
>>>>>>> +
>>>>>>> +/* Check whether SUBREG is a promoted subreg.  */
>>>>>>> +
>>>>>>> +static bool
>>>>>>> +promoted_subreg_p (rtx subreg)
>>>>>>> +{
>>>>>>> +  return (GET_CODE (subreg) == SUBREG
>>>>>>> +	  && SUBREG_PROMOTED_VAR_P (subreg));
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* Check whether SUBREG is a promoted subreg for which we cannot reset the
>>>>>>> +   promotion.  */
>>>>>>> +
>>>>>>> +static bool
>>>>>>> +fixed_promoted_subreg_p (rtx subreg)
>>>>>>> +{
>>>>>>> +  int mre;
>>>>>>> +
>>>>>>> +  if (!promoted_subreg_p (subreg))
>>>>>>> +    return false;
>>>>>>> +
>>>>>>> +  mre = targetm.mode_rep_extended (GET_MODE (subreg),
>>>>>>> +				   GET_MODE (SUBREG_REG (subreg)));
>>>>>>> +  return mre != UNKNOWN;
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* Attempt to return the size, reg number and offset of USE in SIZE, REGNO and
>>>>>>> +   OFFSET.  Return true if successful.  */
>>>>>>> +
>>>>>>> +static bool
>>>>>>> +reg_use_p (rtx use, int *size, unsigned int *regno, int *offset)
>>>>>>> +{
>>>>>>> +  rtx reg;
>>>>>>> +
>>>>>>> +  if (REG_P (use))
>>>>>>> +    {
>>>>>>> +      *regno = REGNO (use);
>>>>>>> +      *offset = 0;
>>>>>>> +      *size = GET_MODE_BITSIZE (GET_MODE (use));
>>>>>>> +      return true;
>>>>>>> +    }
>>>>>>> +  else if (GET_CODE (use) == SUBREG)
>>>>>>> +    {
>>>>>>> +      reg = SUBREG_REG (use);
>>>>>>> +
>>>>>>> +      if (!REG_P (reg))
>>>>>>> +	return false;
>>>>>>> +
>>>>>>> +      *regno = REGNO (reg);
>>>>>>> +
>>>>>>> +      if (paradoxical_subreg_p (use) || fixed_promoted_subreg_p (use))
>>>>>>> +	{
>>>>>>> +	  *offset = 0;
>>>>>>> +	  *size = GET_MODE_BITSIZE (GET_MODE (reg));
>>>>>>> +	}
>>>>>>> +      else
>>>>>>> +	{
>>>>>>> +	  *offset = subreg_lsb (use);
>>>>>>> +	  *size = *offset + GET_MODE_BITSIZE (GET_MODE (use));
>>>>>>> +	}
>>>>>>> +
>>>>>>> +      return true;
>>>>>>> +    }
>>>>>>> +
>>>>>>> +  return false;
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* Create a new empty entry in the uses[REGNO] vector.  */
>>>>>>> +
>>>>>>> +static use_type *
>>>>>>> +new_use (unsigned int regno)
>>>>>>> +{
>>>>>>> +  if (uses[regno] == NULL)
>>>>>>> +    uses[regno] = VEC_alloc (use_type, heap, 4);
>>>>>>> +
>>>>>>> +  VEC_safe_push (use_type, heap, uses[regno], NULL);
>>>>>>> +
>>>>>>> +  return VEC_last (use_type, uses[regno]);
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* Register a USE of reg REGNO with SIZE and OFFSET.  */
>>>>>>> +
>>>>>>> +static use_type *
>>>>>>> +register_use (int size, unsigned int regno, int offset, rtx *use)
>>>>>>> +{
>>>>>>> +  int *current;
>>>>>>> +  use_type *p;
>>>>>>> +
>>>>>>> +  gcc_assert (size >= 0);
>>>>>>> +  gcc_assert (regno < (unsigned int)n_regs);
>>>>>>> +
>>>>>>> +  if (skip_reg_p (regno))
>>>>>>> +    return NULL;
>>>>>>> +
>>>>>>> +  p = new_use (regno);
>>>>>>> +  p->regno = regno;
>>>>>>> +  p->size = size;
>>>>>>> +  p->offset = offset;
>>>>>>> +  p->use = use;
>>>>>>> +
>>>>>>> +  /* Update the biggest use.  */
>>>>>>> +  current = &biggest_use[regno];
>>>>>>> +  *current = MAX (*current, size);
>>>>>>> +
>>>>>>> +  return p;
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* Handle embedded uses in USE, which is a part of PATTERN.  */
>>>>>>> +
>>>>>>> +static void
>>>>>>> +note_embedded_uses (rtx use, rtx pattern)
>>>>>>> +{
>>>>>>> +  const char *format_ptr;
>>>>>>> +  int i, j;
>>>>>>> +
>>>>>>> +  format_ptr = GET_RTX_FORMAT (GET_CODE (use));
>>>>>>> +  for (i = 0; i < GET_RTX_LENGTH (GET_CODE (use)); i++)
>>>>>>> +    if (format_ptr[i] == 'e')
>>>>>>> +      note_use (&XEXP (use, i), pattern);
>>>>>>> +    else if (format_ptr[i] == 'E')
>>>>>>> +      for (j = 0; j < XVECLEN (use, i); j++)
>>>>>>> +	note_use (&XVECEXP (use, i, j), pattern);
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* Get the set in PATTERN that has USE as its src operand.  */
>>>>>>> +
>>>>>>> +static rtx
>>>>>>> +get_set (rtx use, rtx pattern)
>>>>>>> +{
>>>>>>> +  rtx sub;
>>>>>>> +  int i;
>>>>>>> +
>>>>>>> +  if (GET_CODE (pattern) == SET && SET_SRC (pattern) == use)
>>>>>>> +    return pattern;
>>>>>>> +
>>>>>>> +  if (GET_CODE (pattern) == PARALLEL)
>>>>>>> +    for (i = 0; i < XVECLEN (pattern, 0); ++i)
>>>>>>> +      {
>>>>>>> +	sub = XVECEXP (pattern, 0, i);
>>>>>>> +	if (GET_CODE (sub) == SET && SET_SRC (sub) == use)
>>>>>>> +	  return sub;
>>>>>>> +      }
>>>>>>> +
>>>>>>> +  return NULL_RTX;
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* Handle a restricted op USE with NR_OPERANDS.  USE is a part of SET, which is
>>>>>>> +   a part of PATTERN.  In this context restricted means that a bit in
>>>>>>> +   an operand influences only the same bit or more significant bits in the
>>>>>>> +   result.  The bitwise ops are a subclass, but PLUS is one as well.  */
>>>>>>> +
>>>>>>> +static void
>>>>>>> +note_restricted_op_use (rtx set, rtx use, unsigned int nr_operands, rtx pattern)
>>>>>>> +{
>>>>>>> +  unsigned int i, smallest;
>>>>>>> +  int operand_size[2];
>>>>>>> +  int operand_offset[2];
>>>>>>> +  int used_size;
>>>>>>> +  unsigned int operand_regno[2];
>>>>>>> +  bool operand_reg[2];
>>>>>>> +  bool operand_ignore[2];
>>>>>>> +  use_type *p;
>>>>>>> +
>>>>>>> +  /* Init operand_reg, operand_size, operand_regno and operand_ignore.  */
>>>>>>> +  for (i = 0; i < nr_operands; ++i)
>>>>>>> +    {
>>>>>>> +      operand_reg[i] = reg_use_p (XEXP (use, i), &operand_size[i],
>>>>>>> +				  &operand_regno[i], &operand_offset[i]);
>>>>>>> +      operand_ignore[i] = false;
>>>>>>> +    }
>>>>>>> +
>>>>>>> +  /* Handle case of reg and-masked with const.  */
>>>>>>> +  if (GET_CODE (use) == AND && CONST_INT_P (XEXP (use, 1)) && operand_reg[0])
>>>>>>> +    {
>>>>>>> +      used_size =
>>>>>>> +	HOST_BITS_PER_WIDE_INT - clz_hwi (UINTVAL (XEXP (use, 1)));
>>>>>>> +      operand_size[0] = MIN (operand_size[0], used_size);
>>>>>>> +    }
>>>>>>> +
>>>>>>> +  /* Handle case of reg or-masked with const.  */
>>>>>>> +  if (GET_CODE (use) == IOR && CONST_INT_P (XEXP (use, 1)) && operand_reg[0])
>>>>>>> +    {
>>>>>>> +      used_size =
>>>>>>> +	HOST_BITS_PER_WIDE_INT - clz_hwi (~UINTVAL (XEXP (use, 1)));
>>>>>>> +      operand_size[0] = MIN (operand_size[0], used_size);
>>>>>>> +    }
>>>>>>> +
>>>>>>> +  /* Ignore the use of a in 'a = a + b'.  */
>>>>>>> +  /* TODO: handle GET_CODE ((SET_DEST (set))) == SUBREG.  */
>>>>>>> +  if (set != NULL_RTX && REG_P (SET_DEST (set)))
>>>>>>> +    for (i = 0; i < nr_operands; ++i)
>>>>>>> +      operand_ignore[i] = (operand_reg[i]
>>>>>>> +			   && (REGNO (SET_DEST (set)) == operand_regno[i]));
>>>>>>> +
>>>>>>> +  /* Handle the case where a reg is combined with don't-care bits.  */
>>>>>>> +  if (nr_operands == 2 && operand_reg[0] && operand_reg[1]
>>>>>>> +      && operand_size[0] != operand_size[1])
>>>>>>> +    {
>>>>>>> +      smallest = operand_size[0] > operand_size[1];
>>>>>>> +
>>>>>>> +      if (paradoxical_subreg_p (XEXP (use, smallest)))
>>>>>>> +	operand_size[1 - smallest] = operand_size[smallest];
>>>>>>> +    }
>>>>>>> +
>>>>>>> +  /* Register the operand use, if necessary.  */
>>>>>>> +  for (i = 0; i < nr_operands; ++i)
>>>>>>> +    if (!operand_reg[i])
>>>>>>> +      note_use (&XEXP (use, i), pattern);
>>>>>>> +    else if (!operand_ignore[i])
>>>>>>> +      {
>>>>>>> +	p = register_use (operand_size[i], operand_regno[i], operand_offset[i],
>>>>>>> +			  &XEXP (use, i));
>>>>>>> +	register_prop (set, p);
>>>>>>> +      }
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* Register promoted SUBREG in promoted_subreg.  */
>>>>>>> +
>>>>>>> +static void
>>>>>>> +register_promoted_subreg (rtx subreg)
>>>>>>> +{
>>>>>>> +  int index = REGNO (SUBREG_REG (subreg));
>>>>>>> +
>>>>>>> +  if (promoted_subreg[index] == NULL)
>>>>>>> +    promoted_subreg[index] = VEC_alloc (rtx, heap, 10);
>>>>>>> +
>>>>>>> +  VEC_safe_push (rtx, heap, promoted_subreg[index], subreg);
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* Note promoted subregs in X.  */
>>>>>>> +
>>>>>>> +static int
>>>>>>> +note_promoted_subreg (rtx *x, void *y ATTRIBUTE_UNUSED)
>>>>>>> +{
>>>>>>> +  rtx subreg = *x;
>>>>>>> +
>>>>>>> +  if (promoted_subreg_p (subreg) && !fixed_promoted_subreg_p (subreg)
>>>>>>> +      && REG_P (SUBREG_REG (subreg)))
>>>>>>> +    register_promoted_subreg (subreg);
>>>>>>> +
>>>>>>> +  return 0;
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* Handle use X in pattern DATA noted by note_uses.  */
>>>>>>> +
>>>>>>> +static void
>>>>>>> +note_use (rtx *x, void *data)
>>>>>>> +{
>>>>>>> +  rtx use = *x;
>>>>>>> +  rtx pattern = (rtx)data;
>>>>>>> +  int use_size, use_offset;
>>>>>>> +  unsigned int use_regno;
>>>>>>> +  rtx set;
>>>>>>> +  use_type *p;
>>>>>>> +
>>>>>>> +  for_each_rtx (x, note_promoted_subreg, NULL);
>>>>>>> +
>>>>>>> +  set = get_set (use, pattern);
>>>>>>> +
>>>>>>> +  switch (GET_CODE (use))
>>>>>>> +    {
>>>>>>> +    case REG:
>>>>>>> +    case SUBREG:
>>>>>>> +      if (!reg_use_p (use, &use_size, &use_regno, &use_offset))
>>>>>>> +	{
>>>>>>> +	  note_embedded_uses (use, pattern);
>>>>>>> +	  return;
>>>>>>> +	}
>>>>>>> +      p = register_use (use_size, use_regno, use_offset, x);
>>>>>>> +      register_prop (set, p);
>>>>>>> +      return;
>>>>>>> +    case SIGN_EXTEND:
>>>>>>> +    case ZERO_EXTEND:
>>>>>>> +      if (!reg_use_p (XEXP (use, 0), &use_size, &use_regno, &use_offset))
>>>>>>> +	{
>>>>>>> +	  note_embedded_uses (use, pattern);
>>>>>>> +	  return;
>>>>>>> +	}
>>>>>>> +      p = register_use (use_size, use_regno, use_offset, x);
>>>>>>> +      register_prop (set, p);
>>>>>>> +      return;
>>>>>>> +    case IOR:
>>>>>>> +    case AND:
>>>>>>> +    case XOR:
>>>>>>> +    case PLUS:
>>>>>>> +    case MINUS:
>>>>>>> +      note_restricted_op_use (set, use, 2, pattern);
>>>>>>> +      return;
>>>>>>> +    case NOT:
>>>>>>> +    case NEG:
>>>>>>> +      note_restricted_op_use (set, use, 1, pattern);
>>>>>>> +      return;
>>>>>>> +    case ASHIFT:
>>>>>>> +      if (!reg_use_p (XEXP (use, 0), &use_size, &use_regno, &use_offset)
>>>>>>> +	  || !CONST_INT_P (XEXP (use, 1))
>>>>>>> +	  || INTVAL (XEXP (use, 1)) <= 0
>>>>>>> +	  || paradoxical_subreg_p (XEXP (use, 0)))
>>>>>>> +	{
>>>>>>> +	  note_embedded_uses (use, pattern);
>>>>>>> +	  return;
>>>>>>> +	}
>>>>>>> +      (void)register_use (use_size - INTVAL (XEXP (use, 1)), use_regno,
>>>>>>> +			  use_offset, x);
>>>>>>> +      return;
>>>>>>> +    default:
>>>>>>> +      note_embedded_uses (use, pattern);
>>>>>>> +      return;
>>>>>>> +    }
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* Check whether reg REGNO is implicitly used.  */
>>>>>>> +
>>>>>>> +static bool
>>>>>>> +implicit_use_p (int regno ATTRIBUTE_UNUSED)
>>>>>>> +{
>>>>>>> +#ifdef EPILOGUE_USES
>>>>>>> +  if (EPILOGUE_USES (regno))
>>>>>>> +    return true;
>>>>>>> +#endif
>>>>>>> +
>>>>>>> +#ifdef EH_USES
>>>>>>> +  if (EH_USES (regno))
>>>>>>> +    return true;
>>>>>>> +#endif
>>>>>>> +
>>>>>>> +  return false;
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* Check whether reg REGNO should be skipped in analysis.  */
>>>>>>> +
>>>>>>> +static bool
>>>>>>> +skip_reg_p (int regno)
>>>>>>> +{
>>>>>>> +  /* TODO: handle hard registers.  The problem with hard registers is that
>>>>>>> +     a DI use of r0 can mean a 64bit use of r0 and a 32 bit use of r1.
>>>>>>> +     We don't handle that properly.  */
>>>>>>> +  return implicit_use_p (regno) || HARD_REGISTER_NUM_P (regno);
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* Note the uses of argument registers in call INSN.  */
>>>>>>> +
>>>>>>> +static void
>>>>>>> +note_call_uses (rtx insn)
>>>>>>> +{
>>>>>>> +  rtx link, link_expr;
>>>>>>> +
>>>>>>> +  if (!CALL_P (insn))
>>>>>>> +    return;
>>>>>>> +
>>>>>>> +  for (link = CALL_INSN_FUNCTION_USAGE (insn); link; link = XEXP (link, 1))
>>>>>>> +    {
>>>>>>> +      link_expr = XEXP (link, 0);
>>>>>>> +
>>>>>>> +      if (GET_CODE (link_expr) == USE)
>>>>>>> +	note_use (&XEXP (link_expr, 0), link);
>>>>>>> +    }
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* Dump the biggest uses found.  */
>>>>>>> +
>>>>>>> +static void
>>>>>>> +dump_biggest_use (void)
>>>>>>> +{
>>>>>>> +  int i;
>>>>>>> +
>>>>>>> +  if (!dump_file)
>>>>>>> +    return;
>>>>>>> +
>>>>>>> +  fprintf (dump_file, "biggest_use:\n");
>>>>>>> +
>>>>>>> +  for (i = 0; i < n_regs; i++)
>>>>>>> +    if (biggest_use[i] > 0)
>>>>>>> +      fprintf (dump_file, "reg %d: size %d\n", i, biggest_use[i]);
>>>>>>> +
>>>>>>> +  fprintf (dump_file, "\n");
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* Calculate the biggest use, in bits, for all regs.  */
>>>>>>> +
>>>>>>> +static void
>>>>>>> +calculate_biggest_use (void)
>>>>>>> +{
>>>>>>> +  basic_block bb;
>>>>>>> +  rtx insn;
>>>>>>> +
>>>>>>> +  /* For all insns, call note_use for each use in insn.  */
>>>>>>> +  FOR_EACH_BB (bb)
>>>>>>> +    FOR_BB_INSNS (bb, insn)
>>>>>>> +      {
>>>>>>> +	if (!NONDEBUG_INSN_P (insn))
>>>>>>> +	  continue;
>>>>>>> +
>>>>>>> +	note_uses (&PATTERN (insn), note_use, PATTERN (insn));
>>>>>>> +
>>>>>>> +	if (CALL_P (insn))
>>>>>>> +	  note_call_uses (insn);
>>>>>>> +      }
>>>>>>> +
>>>>>>> +  dump_biggest_use ();
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* Register a propagation USE in SET in the props vector.  */
>>>>>>> +
>>>>>>> +static void
>>>>>>> +register_prop (rtx set, use_type *use)
>>>>>>> +{
>>>>>>> +  prop_type *p;
>>>>>>> +  int regno;
>>>>>>> +
>>>>>>> +  if (set == NULL_RTX || use == NULL)
>>>>>>> +    return;
>>>>>>> +
>>>>>>> +  if (!REG_P (SET_DEST (set)))
>>>>>>> +    return;
>>>>>>> +
>>>>>>> +  regno = REGNO (SET_DEST (set));
>>>>>>> +
>>>>>>> +  if (skip_reg_p (regno))
>>>>>>> +    return;
>>>>>>> +
>>>>>>> +  if (props[regno] == NULL)
>>>>>>> +    props[regno] = VEC_alloc (prop_type, heap, 4);
>>>>>>> +
>>>>>>> +  VEC_safe_push (prop_type, heap, props[regno], NULL);
>>>>>>> +  p = VEC_last (prop_type, props[regno]);
>>>>>>> +  p->set = set;
>>>>>>> +  p->uses_regno = use->regno;
>>>>>>> +  p->uses_index = VEC_length (use_type, uses[use->regno]) - 1;
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* Add REGNO to the worklist.  */
>>>>>>> +
>>>>>>> +static void
>>>>>>> +add_to_wl (int regno)
>>>>>>> +{
>>>>>>> +  if (in_wl[regno])
>>>>>>> +    return;
>>>>>>> +
>>>>>>> +  if (biggest_use[regno] > 0
>>>>>>> +      && biggest_use[regno] == GET_MODE_BITSIZE (PSEUDO_REGNO_MODE (regno)))
>>>>>>> +    return;
>>>>>>> +
>>>>>>> +  if (VEC_empty (prop_type, props[regno]))
>>>>>>> +    return;
>>>>>>> +
>>>>>>> +  if (propagated_size[regno] != NONE
>>>>>>> +      && propagated_size[regno] == biggest_use[regno])
>>>>>>> +    return;
>>>>>>> +
>>>>>>> +  VEC_safe_push (int, heap, wl, regno);
>>>>>>> +  in_wl[regno] = true;
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* Pop a reg from the worklist and return it.  */
>>>>>>> +
>>>>>>> +static int
>>>>>>> +pop_wl (void)
>>>>>>> +{
>>>>>>> +  int regno = VEC_pop (int, wl);
>>>>>>> +  in_wl[regno] = false;
>>>>>>> +  return regno;
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* Propagate the use size DEST_SIZE of a reg to use P.  */
>>>>>>> +
>>>>>>> +static int
>>>>>>> +propagate_size (int dest_size, use_type *p)
>>>>>>> +{
>>>>>>> +  if (dest_size == 0)
>>>>>>> +    return 0;
>>>>>>> +
>>>>>>> +  return p->offset + MIN (p->size - p->offset, dest_size);
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* Get the biggest use of REGNO from the uses vector.  */
>>>>>>> +
>>>>>>> +static int
>>>>>>> +get_biggest_use (unsigned int regno)
>>>>>>> +{
>>>>>>> +  int ix;
>>>>>>> +  use_type *p;
>>>>>>> +  int max = 0;
>>>>>>> +
>>>>>>> +  gcc_assert (uses[regno] != NULL);
>>>>>>> +
>>>>>>> +  FOR_EACH_VEC_ELT (use_type, uses[regno], ix, p)
>>>>>>> +    max = MAX (max, p->size);
>>>>>>> +
>>>>>>> +  return max;
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* Propagate the use size DEST_SIZE of a reg to the uses in USE.  */
>>>>>>> +
>>>>>>> +static void
>>>>>>> +propagate_to_use (int dest_size, use_type *use)
>>>>>>> +{
>>>>>>> +  int new_use_size;
>>>>>>> +  int prev_biggest_use;
>>>>>>> +  int *current;
>>>>>>> +
>>>>>>> +  new_use_size = propagate_size (dest_size, use);
>>>>>>> +
>>>>>>> +  if (new_use_size >= use->size)
>>>>>>> +    return;
>>>>>>> +
>>>>>>> +  use->size = new_use_size;
>>>>>>> +
>>>>>>> +  current = &biggest_use[use->regno];
>>>>>>> +
>>>>>>> +  prev_biggest_use = *current;
>>>>>>> +  *current = get_biggest_use (use->regno);
>>>>>>> +
>>>>>>> +  if (*current >= prev_biggest_use)
>>>>>>> +    return;
>>>>>>> +
>>>>>>> +  add_to_wl (use->regno);
>>>>>>> +
>>>>>>> +  if (dump_file)
>>>>>>> +    fprintf (dump_file, "%d: %d -> %d\n", use->regno, prev_biggest_use,
>>>>>>> +	     *current);
>>>>>>> +
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* Propagate the biggest use of a reg REGNO to all its uses, and note
>>>>>>> +   propagations in NR_PROPAGATIONS.  */
>>>>>>> +
>>>>>>> +static void
>>>>>>> +propagate_to_uses (int regno, int *nr_propagations)
>>>>>>> +{
>>>>>>> +  int ix;
>>>>>>> +  prop_type *p;
>>>>>>> +
>>>>>>> +  gcc_assert (!(propagated_size[regno] != NONE
>>>>>>> +		&& propagated_size[regno] == biggest_use[regno]));
>>>>>>> +
>>>>>>> +  FOR_EACH_VEC_ELT (prop_type, props[regno], ix, p)
>>>>>>> +    {
>>>>>>> +      use_type *use = VEC_index (use_type, uses[p->uses_regno], p->uses_index);
>>>>>>> +      propagate_to_use (biggest_use[regno], use);
>>>>>>> +      ++(*nr_propagations);
>>>>>>> +    }
>>>>>>> +
>>>>>>> +  propagated_size[regno] = biggest_use[regno];
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* Improve biggest_use array iteratively.  */
>>>>>>> +
>>>>>>> +static void
>>>>>>> +propagate (void)
>>>>>>> +{
>>>>>>> +  int i;
>>>>>>> +  int nr_propagations = 0;
>>>>>>> +
>>>>>>> +  /* Initialize work list.  */
>>>>>>> +
>>>>>>> +  for (i = 0; i < n_regs; ++i)
>>>>>>> +    add_to_wl (i);
>>>>>>> +
>>>>>>> +  /* Work the work list.  */
>>>>>>> +
>>>>>>> +  if (dump_file)
>>>>>>> +    fprintf (dump_file, "propagations: \n");
>>>>>>> +  while (!VEC_empty (int, wl))
>>>>>>> +    propagate_to_uses (pop_wl (), &nr_propagations);
>>>>>>> +
>>>>>>> +  if (dump_file)
>>>>>>> +    fprintf (dump_file, "\nnr_propagations: %d\n\n", nr_propagations);
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* Check whether this is a sign/zero extension.  */
>>>>>>> +
>>>>>>> +static bool
>>>>>>> +extension_p (rtx insn, rtx *dest, rtx *inner, int *preserved_size)
>>>>>>> +{
>>>>>>> +  rtx src, op0;
>>>>>>> +
>>>>>>> +  /* Detect set of reg.  */
>>>>>>> +  if (GET_CODE (PATTERN (insn)) != SET)
>>>>>>> +    return false;
>>>>>>> +
>>>>>>> +  src = SET_SRC (PATTERN (insn));
>>>>>>> +  *dest = SET_DEST (PATTERN (insn));
>>>>>>> +
>>>>>>> +  if (!REG_P (*dest))
>>>>>>> +    return false;
>>>>>>> +
>>>>>>> +  /* Detect sign or zero extension.  */
>>>>>>> +  if (GET_CODE (src) == ZERO_EXTEND || GET_CODE (src) == SIGN_EXTEND
>>>>>>> +      || (GET_CODE (src) == AND && CONST_INT_P (XEXP (src, 1))))
>>>>>>> +    {
>>>>>>> +      op0 = XEXP (src, 0);
>>>>>>> +
>>>>>>> +      /* Determine number of least significant bits preserved by operation.  */
>>>>>>> +      if (GET_CODE (src) == AND)
>>>>>>> +	*preserved_size = ctz_hwi (~UINTVAL (XEXP (src, 1)));
>>>>>>> +      else
>>>>>>> +	*preserved_size = GET_MODE_BITSIZE (GET_MODE (op0));
>>>>>>> +
>>>>>>> +      if (GET_CODE (op0) == SUBREG)
>>>>>>> +	{
>>>>>>> +	  if (subreg_lsb (op0) != 0)
>>>>>>> +	    return false;
>>>>>>> +
>>>>>>> +	  *inner = SUBREG_REG (op0);
>>>>>>> +
>>>>>>> +	  if (GET_MODE_CLASS (GET_MODE (*dest))
>>>>>>> +	      != GET_MODE_CLASS (GET_MODE (*inner)))
>>>>>>> +	    return false;
>>>>>>> +
>>>>>>> +	  return true;
>>>>>>> +	}
>>>>>>> +      else if (REG_P (op0))
>>>>>>> +	{
>>>>>>> +	  *inner = op0;
>>>>>>> +
>>>>>>> +	  if (GET_MODE_CLASS (GET_MODE (*dest))
>>>>>>> +	      != GET_MODE_CLASS (GET_MODE (*inner)))
>>>>>>> +	    return false;
>>>>>>> +
>>>>>>> +	  return true;
>>>>>>> +	}
>>>>>>> +      else if (GET_CODE (op0) == TRUNCATE)
>>>>>>> +	{
>>>>>>> +	  *inner = XEXP (op0, 0);
>>>>>>> +	  return true;
>>>>>>> +	}
>>>>>>> +    }
>>>>>>> +
>>>>>>> +  return false;
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* Find extensions and store them in the extensions vector.  */
>>>>>>> +
>>>>>>> +static bool
>>>>>>> +find_extensions (void)
>>>>>>> +{
>>>>>>> +  basic_block bb;
>>>>>>> +  rtx insn, dest, inner;
>>>>>>> +  int preserved_size;
>>>>>>> +
>>>>>>> +  /* For all insns, record the sign/zero extensions found.  */
>>>>>>> +  FOR_EACH_BB (bb)
>>>>>>> +    FOR_BB_INSNS (bb, insn)
>>>>>>> +      {
>>>>>>> +	if (!NONDEBUG_INSN_P (insn))
>>>>>>> +	  continue;
>>>>>>> +
>>>>>>> +	if (!extension_p (insn, &dest, &inner, &preserved_size))
>>>>>>> +	  continue;
>>>>>>> +
>>>>>>> +	VEC_safe_push (rtx, heap, extensions, insn);
>>>>>>> +
>>>>>>> +	if (dump_file)
>>>>>>> +	  fprintf (dump_file,
>>>>>>> +		   "found extension %u with preserved size %d defining"
>>>>>>> +		   " reg %d\n",
>>>>>>> +		   INSN_UID (insn), preserved_size, REGNO (dest));
>>>>>>> +      }
>>>>>>> +
>>>>>>> +  if (dump_file)
>>>>>>> +    {
>>>>>>> +      if (!VEC_empty (rtx, extensions))
>>>>>>> +	fprintf (dump_file, "\n");
>>>>>>> +      else
>>>>>>> +	fprintf (dump_file, "no extensions found.\n");
>>>>>>> +    }
>>>>>>> +
>>>>>>> +  return !VEC_empty (rtx, extensions);
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* Check whether this is a redundant sign/zero extension.  */
>>>>>>> +
>>>>>>> +static bool
>>>>>>> +redundant_extension_p (rtx insn, rtx *dest, rtx *inner, int *preserved_size)
>>>>>>> +{
>>>>>>> +  int biggest_dest_use;
>>>>>>> +
>>>>>>> +  if (!extension_p (insn, dest, inner, preserved_size))
>>>>>>> +    gcc_unreachable ();
>>>>>>> +
>>>>>>> +  biggest_dest_use = biggest_use[REGNO (*dest)];
>>>>>>> +
>>>>>>> +  if (biggest_dest_use == SKIP_REG)
>>>>>>> +    return false;
>>>>>>> +
>>>>>>> +  if (*preserved_size < biggest_dest_use)
>>>>>>> +    return false;
>>>>>>> +
>>>>>>> +  return true;
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* Find the redundant extensions in the extensions vector and move them to the
>>>>>>> +   redundant_extensions vector.  */
>>>>>>> +
>>>>>>> +static void
>>>>>>> +find_redundant_extensions (void)
>>>>>>> +{
>>>>>>> +  rtx insn, dest, inner;
>>>>>>> +  int ix;
>>>>>>> +  bool found = false;
>>>>>>> +  int preserved_size;
>>>>>>> +
>>>>>>> +  FOR_EACH_VEC_ELT_REVERSE (rtx, extensions, ix, insn)
>>>>>>> +    if (redundant_extension_p (insn, &dest, &inner, &preserved_size))
>>>>>>> +      {
>>>>>>> +	VEC_safe_push (rtx, heap, redundant_extensions, insn);
>>>>>>> +	VEC_unordered_remove (rtx, extensions, ix);
>>>>>>> +
>>>>>>> +	if (dump_file)
>>>>>>> +	  fprintf (dump_file,
>>>>>>> +		   "found redundant extension %u with preserved size %d"
>>>>>>> +		   " defining reg %d\n",
>>>>>>> +		   INSN_UID (insn), preserved_size, REGNO (dest));
>>>>>>> +	found = true;
>>>>>>> +      }
>>>>>>> +
>>>>>>> +  if (dump_file && found)
>>>>>>> +    fprintf (dump_file, "\n");
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* Reset promotion of subregs of REG.  */
>>>>>>> +
>>>>>>> +static void
>>>>>>> +reset_promoted_subreg (rtx reg)
>>>>>>> +{
>>>>>>> +  int ix;
>>>>>>> +  rtx subreg;
>>>>>>> +
>>>>>>> +  if (promoted_subreg[REGNO (reg)] == NULL)
>>>>>>> +    return;
>>>>>>> +
>>>>>>> +  FOR_EACH_VEC_ELT (rtx, promoted_subreg[REGNO (reg)], ix, subreg)
>>>>>>> +    {
>>>>>>> +      SUBREG_PROMOTED_UNSIGNED_SET (subreg, 0);
>>>>>>> +      SUBREG_PROMOTED_VAR_P (subreg) = 0;
>>>>>>> +    }
>>>>>>> +
>>>>>>> +  VEC_free (rtx, heap, promoted_subreg[REGNO (reg)]);
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* Try to remove or replace the redundant extension INSN which extends INNER and
>>>>>>> +   writes to DEST.  */
>>>>>>> +
>>>>>>> +static void
>>>>>>> +try_remove_or_replace_extension (rtx insn, rtx dest, rtx inner)
>>>>>>> +{
>>>>>>> +  rtx cp_src, cp_dest, seq = NULL_RTX, one;
>>>>>>> +
>>>>>>> +  /* Check whether replacement is needed.  */
>>>>>>> +  if (dest != inner)
>>>>>>> +    {
>>>>>>> +      start_sequence ();
>>>>>>> +
>>>>>>> +      /* Determine the proper replacement operation.  */
>>>>>>> +      if (GET_MODE (dest) == GET_MODE (inner))
>>>>>>> +	{
>>>>>>> +	  cp_src = inner;
>>>>>>> +	  cp_dest = dest;
>>>>>>> +	}
>>>>>>> +      else if (GET_MODE_SIZE (GET_MODE (dest))
>>>>>>> +	       > GET_MODE_SIZE (GET_MODE (inner)))
>>>>>>> +	{
>>>>>>> +	  emit_clobber (dest);
>>>>>>> +	  cp_src = inner;
>>>>>>> +	  cp_dest = gen_lowpart_SUBREG (GET_MODE (inner), dest);
>>>>>>> +	}
>>>>>>> +      else
>>>>>>> +	{
>>>>>>> +	  cp_src = gen_lowpart_SUBREG (GET_MODE (dest), inner);
>>>>>>> +	  cp_dest = dest;
>>>>>>> +	}
>>>>>>> +
>>>>>>> +      emit_move_insn (cp_dest, cp_src);
>>>>>>> +
>>>>>>> +      seq = get_insns ();
>>>>>>> +      end_sequence ();
>>>>>>> +
>>>>>>> +      /* If the replacement is not supported, bail out.  */
>>>>>>> +      for (one = seq; one != NULL_RTX; one = NEXT_INSN (one))
>>>>>>> +	if (recog_memoized (one) < 0 && GET_CODE (PATTERN (one)) != CLOBBER)
>>>>>>> +	  return;
>>>>>>> +
>>>>>>> +      /* Insert the replacement.  */
>>>>>>> +      emit_insn_before (seq, insn);
>>>>>>> +    }
>>>>>>> +
>>>>>>> +  /* Note replacement/removal in the dump.  */
>>>>>>> +  if (dump_file)
>>>>>>> +    {
>>>>>>> +      fprintf (dump_file, "redundant extension %u ", INSN_UID (insn));
>>>>>>> +      if (dest != inner)
>>>>>>> +	fprintf (dump_file, "replaced by %u\n", INSN_UID (seq));
>>>>>>> +      else
>>>>>>> +	fprintf (dump_file, "removed\n");
>>>>>>> +    }
>>>>>>> +
>>>>>>> +  /* Remove the extension.  */
>>>>>>> +  delete_insn (insn);
>>>>>>> +
>>>>>>> +  reset_promoted_subreg (dest);
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* Set up the variables at the start of the pass.  */
>>>>>>> +
>>>>>>> +static void
>>>>>>> +init_pass (void)
>>>>>>> +{
>>>>>>> +  int i;
>>>>>>> +
>>>>>>> +  biggest_use = XNEWVEC (int, n_regs);
>>>>>>> +  promoted_subreg = XCNEWVEC (VEC (rtx,heap) *, n_regs);
>>>>>>> +  propagated_size = XNEWVEC (int, n_regs);
>>>>>>> +
>>>>>>> +  /* Initialize biggest_use for all regs to 0.  If a reg is used implicitly, we
>>>>>>> +     handle that reg conservatively and set it to SKIP_REG instead.  */
>>>>>>> +  for (i = 0; i < n_regs; i++)
>>>>>>> +    {
>>>>>>> +      biggest_use[i] = skip_reg_p (i) ? SKIP_REG : 0;
>>>>>>> +      propagated_size[i] = NONE;
>>>>>>> +    }
>>>>>>> +
>>>>>>> +  extensions = VEC_alloc (rtx, heap, 10);
>>>>>>> +  redundant_extensions = VEC_alloc (rtx, heap, 10);
>>>>>>> +
>>>>>>> +  wl = VEC_alloc (int, heap, 50);
>>>>>>> +  in_wl = XNEWVEC (bool, n_regs);
>>>>>>> +
>>>>>>> +  uses = XNEWVEC (typeof (*uses), n_regs);
>>>>>>> +  props = XNEWVEC (typeof (*props), n_regs);
>>>>>>> +
>>>>>>> +  for (i = 0; i < n_regs; ++i)
>>>>>>> +    {
>>>>>>> +      uses[i] = NULL;
>>>>>>> +      props[i] = NULL;
>>>>>>> +      in_wl[i] = false;
>>>>>>> +    }
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* Find redundant extensions and remove or replace them if possible.  */
>>>>>>> +
>>>>>>> +static void
>>>>>>> +remove_redundant_extensions (void)
>>>>>>> +{
>>>>>>> +  rtx insn, dest, inner;
>>>>>>> +  int preserved_size;
>>>>>>> +  int ix;
>>>>>>> +
>>>>>>> +  if (!find_extensions ())
>>>>>>> +    return;
>>>>>>> +
>>>>>>> +  calculate_biggest_use ();
>>>>>>> +
>>>>>>> +  find_redundant_extensions ();
>>>>>>> +
>>>>>>> +  if (!VEC_empty (rtx, extensions))
>>>>>>> +    {
>>>>>>> +      propagate ();
>>>>>>> +
>>>>>>> +      find_redundant_extensions ();
>>>>>>> +    }
>>>>>>> +
>>>>>>> +  gcc_checking_assert (n_regs == max_reg_num ());
>>>>>>> +
>>>>>>> +  FOR_EACH_VEC_ELT (rtx, redundant_extensions, ix, insn)
>>>>>>> +    {
>>>>>>> +      extension_p (insn, &dest, &inner, &preserved_size);
>>>>>>> +      try_remove_or_replace_extension (insn, dest, inner);
>>>>>>> +    }
>>>>>>> +
>>>>>>> +  if (dump_file)
>>>>>>> +    fprintf (dump_file, "\n");
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* Free the variables at the end of the pass.  */
>>>>>>> +
>>>>>>> +static void
>>>>>>> +finish_pass (void)
>>>>>>> +{
>>>>>>> +  int i;
>>>>>>> +
>>>>>>> +  XDELETEVEC (propagated_size);
>>>>>>> +
>>>>>>> +  VEC_free (rtx, heap, extensions);
>>>>>>> +  VEC_free (rtx, heap, redundant_extensions);
>>>>>>> +
>>>>>>> +  VEC_free (int, heap, wl);
>>>>>>> +  XDELETEVEC (in_wl);
>>>>>>> +  for (i = 0; i < n_regs; ++i)
>>>>>>> +    {
>>>>>>> +      if (uses[i] != NULL)
>>>>>>> +	VEC_free (use_type, heap, uses[i]);
>>>>>>> +
>>>>>>> +      if (props[i] != NULL)
>>>>>>> +	VEC_free (prop_type, heap, props[i]);
>>>>>>> +    }
>>>>>>> +
>>>>>>> +  XDELETEVEC (uses);
>>>>>>> +  XDELETEVEC (props);
>>>>>>> +  XDELETEVEC (biggest_use);
>>>>>>> +
>>>>>>> +  for (i = 0; i < n_regs; ++i)
>>>>>>> +    if (promoted_subreg[i] != NULL)
>>>>>>> +      VEC_free (rtx, heap, promoted_subreg[i]);
>>>>>>> +  XDELETEVEC (promoted_subreg);
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* Remove redundant extensions.  */
>>>>>>> +
>>>>>>> +static unsigned int
>>>>>>> +rest_of_handle_ee (void)
>>>>>>> +{
>>>>>>> +  n_regs = max_reg_num ();
>>>>>>> +
>>>>>>> +  init_pass ();
>>>>>>> +  remove_redundant_extensions ();
>>>>>>> +  finish_pass ();
>>>>>>> +  return 0;
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* Run ee pass when flag_ee is set at optimization level > 0.  */
>>>>>>> +
>>>>>>> +static bool
>>>>>>> +gate_handle_ee (void)
>>>>>>> +{
>>>>>>> +  return (optimize > 0 && flag_ee);
>>>>>>> +}
>>>>>>> +
>>>>>>> +struct rtl_opt_pass pass_ee =
>>>>>>> +{
>>>>>>> + {
>>>>>>> +  RTL_PASS,
>>>>>>> +  "ee",                                 /* name */
>>>>>>> +  gate_handle_ee,                       /* gate */
>>>>>>> +  rest_of_handle_ee,                    /* execute */
>>>>>>> +  NULL,                                 /* sub */
>>>>>>> +  NULL,                                 /* next */
>>>>>>> +  0,                                    /* static_pass_number */
>>>>>>> +  TV_EE,                                /* tv_id */
>>>>>>> +  0,                                    /* properties_required */
>>>>>>> +  0,                                    /* properties_provided */
>>>>>>> +  0,                                    /* properties_destroyed */
>>>>>>> +  0,                                    /* todo_flags_start */
>>>>>>> +  TODO_ggc_collect |
>>>>>>> +  TODO_verify_rtl_sharing,              /* todo_flags_finish */
>>>>>>> + }
>>>>>>> +};
>>>>>>> Index: gcc/common.opt
>>>>>>> ===================================================================
>>>>>>> --- gcc/common.opt (revision 189409)
>>>>>>> +++ gcc/common.opt (working copy)
>>>>>>> @@ -1067,6 +1067,10 @@ feliminate-dwarf2-dups
>>>>>>> Common Report Var(flag_eliminate_dwarf2_dups)
>>>>>>> Perform DWARF2 duplicate elimination
>>>>>>>
>>>>>>> +fextension-elimination
>>>>>>> +Common Report Var(flag_ee) Init(0) Optimization
>>>>>>> +Perform extension elimination
>>>>>>> +
>>>>>>> fipa-sra
>>>>>>> Common Report Var(flag_ipa_sra) Init(0) Optimization
>>>>>>> Perform interprocedural reduction of aggregates
>>>>>>> Index: gcc/Makefile.in
>>>>>>> ===================================================================
>>>>>>> --- gcc/Makefile.in (revision 189409)
>>>>>>> +++ gcc/Makefile.in (working copy)
>>>>>>> @@ -1218,6 +1218,7 @@ OBJS = \
>>>>>>> 	dwarf2asm.o \
>>>>>>> 	dwarf2cfi.o \
>>>>>>> 	dwarf2out.o \
>>>>>>> +	ee.o \
>>>>>>> 	ebitmap.o \
>>>>>>> 	emit-rtl.o \
>>>>>>> 	et-forest.o \
>>>>>>> @@ -2971,6 +2972,12 @@ cprop.o : cprop.c $(CONFIG_H) $(SYSTEM_H
>>>>>>>       $(TM_P_H) $(PARAMS_H) cselib.h $(EXCEPT_H) $(TREE_H) $(TIMEVAR_H) \
>>>>>>>       intl.h $(OBSTACK_H) $(TREE_PASS_H) $(DF_H) $(DBGCNT_H) $(TARGET_H) \
>>>>>>>       $(DF_H) $(CFGLOOP_H)
>>>>>>> +ee.o : ee.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) \
>>>>>>> +   hard-reg-set.h $(FLAGS_H) $(BASIC_BLOCK_H) $(FUNCTION_H) output.h \
>>>>>>> +   $(DF_H) $(TIMEVAR_H) tree-pass.h $(RECOG_H) $(EXPR_H) \
>>>>>>> +   $(REGS_H) $(TREE_H) $(TM_P_H) insn-config.h $(INSN_ATTR_H) $(TOPLEV_H) \
>>>>>>> +   $(DIAGNOSTIC_CORE_H) $(TARGET_H) $(OPTABS_H) insn-codes.h rtlhooks-def.h \
>>>>>>> +   $(PARAMS_H) $(CGRAPH_H)
>>>>>>> gcse.o : gcse.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) \
>>>>>>>       $(REGS_H) hard-reg-set.h $(FLAGS_H) insn-config.h $(GGC_H) \
>>>>>>>       $(RECOG_H) $(EXPR_H) $(BASIC_BLOCK_H) $(FUNCTION_H) toplev.h $(DIAGNOSTIC_CORE_H) \
>>>>>>> Index: gcc/passes.c
>>>>>>> ===================================================================
>>>>>>> --- gcc/passes.c (revision 189409)
>>>>>>> +++ gcc/passes.c (working copy)
>>>>>>> @@ -1552,6 +1552,7 @@ init_optimization_passes (void)
>>>>>>>          NEXT_PASS (pass_initialize_regs);
>>>>>>>          NEXT_PASS (pass_ud_rtl_dce);
>>>>>>>          NEXT_PASS (pass_combine);
>>>>>>> +      NEXT_PASS (pass_ee);
>>>>>>>          NEXT_PASS (pass_if_after_combine);
>>>>>>>          NEXT_PASS (pass_partition_blocks);
>>>>>>>          NEXT_PASS (pass_regmove);
>>>
>>
> 
> 



* Re: new sign/zero extension elimination pass
  2012-07-13  7:54                       ` Tom de Vries
  2012-07-13 11:39                         ` Kenneth Zadeck
@ 2012-07-17 15:17                         ` Kenneth Zadeck
  2012-07-20 18:41                           ` Tom de Vries
  1 sibling, 1 reply; 43+ messages in thread
From: Kenneth Zadeck @ 2012-07-17 15:17 UTC (permalink / raw)
  To: Tom de Vries; +Cc: Tom de Vries, Eric Botcazou, tom, gcc-patches, Paolo Bonzini

the pass does not handle induction variables, i.e. variables that feed 
into themselves.
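
A minimal sketch of the kind of induction variable meant here, assuming a
target where int is 32 bits and addresses are 64 bits. The function name
sum_first_n is hypothetical and only serves as illustration:

```c
#include <stddef.h>

/* Hypothetical example.  The index i is an induction variable: it is
   assigned before the loop, incremented inside the loop (so its new
   value depends on its old value), and tested at the bottom of the
   loop.  On a target with 64-bit addresses, using the 32-bit i as an
   array index may require an extension inside the loop.  */
long
sum_first_n (const short *s, int n)
{
  long sum = 0;
  int i = 0;

  if (s == NULL || n <= 0)
    return 0;

  do
    {
      sum += s[i];
      i++;
    }
  while (i < n);

  return sum;
}
```

Because the def of i inside the loop is also one of its uses, an analysis
that ignores such self-feeding uses cannot by itself conclude anything about
the extension of i.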

kenny
On 07/13/2012 03:53 AM, Tom de Vries wrote:
> On 12/07/12 14:04, Kenneth Zadeck wrote:
>> you are on the right track with the example but combine will not get
>> this unless everything is in the same bb.
>> the whole point of having a separate pass for doing extension
>> elimination is that it needs to be done over the entire function.
>>
> There is a pass_ree, which does inter-bb combine targeted at extensions.
> However, that pass is currently limited to combining extensions with the
> definitions of the register it extends. The way your example sounds, you want
> the reverse, where extensions are combined with all their uses.
> I would say pass_ree is the natural place to add this and handle the example you
> describe.
>
> Thanks,
> - Tom
>
>> my example is also a little more complex because, since we are talking
>> about induction vars, you have an initial assignment outside of a loop,
>> an increment inside the loop, and the test you describe at the bottom of
>> the loop.
>>
>> I would point out that with respect to speed optimizations, the case i
>> am describing is in fact very important because getting code out of
>> loops is where the important gains are.   I believe that the ppc has
>> some significant performance issues because of this kind of thing.
>>
>> kenny
>>
>>
>> On 07/12/2012 05:20 AM, Tom de Vries wrote:
>>> On 12/07/12 11:05, Tom de Vries wrote:
>>>> On 12/07/12 03:39, Kenneth Zadeck wrote:
>>>>> Tom,
>>>>>
>>>>> I have a problem with the approach that you have taken here.   I believe
>>>>> that this could be a very useful addition to gcc so I am in general very
>>>>> supportive, but i think you are missing an important case.
>>>>>
>>>>> My problem is that the pass does not actually look at the target and
>>>>> make any decisions based on that target.
>>>>>
>>>>> for instance, we have a llp64 target.   As with many targets, the target
>>>>> has a rich set of compare and branch instructions.  In particular, it
>>>>> can do both 32 and 64 bit comparisons.    We see that many of the
>>>>> upstream optimizations that take int (SI mode) index variables generate
>>>>> extension operations before doing 64 bit compare and branch
>>>>> instructions, even though there are 32 bit comparison and branches on
>>>>> the machine.     There are a lot of machines that can do more than one
>>>>> size of comparison.
>>>>>
>>>>> This optimization pass, as it is currently written, will not remove those
>>>>> extensions because it believes that the length of the destination is the
>>>>> "final answer" unless it is wrapped in an explicit truncation.
>>>>> Instead it needs to ask the port if there is a shorter compare and
>>>>> branch instruction that does not cost more. in that case, those
>>>>> instructions should be rewritten to use the shorter compare and branch.
>>>>>
>>>>> There are many operations other than compare and branch where the pass
>>>>> should be asking "can i shorten the target for free and therefore get
>>>>> rid of the extension?"
>>>> Kenneth,
>>>>
>>>> I'm not sure I understand the optimization you're talking about, in particular
>>>> I'm confused about whether the branch range of the 32-bit and 64-bit comparison
>>>> is the same.
>>>>
>>>> Assuming it's the same, my understanding is that you're talking about an example
>>>> like this:
>>>> ...
>>>>     (insn (set (reg:DI 5)
>>>>                (zero_extend:DI (reg:SI 4))))
>>>>
>>>>     (jump_insn (set (pc)
>>>>                     (if_then_else (eq (reg:DI 5)
>>>>                                       (const_int 0))
>>>>                                   (label_ref:DI 62)
>>>>                                   (pc))))
>>>>
>>>>     ->
>>>>
>>>>     (jump_insn (set (pc)
>>>>                     (if_then_else (eq (reg:SI 4)
>>>>                                       (const_int 0))
>>>>                                   (label_ref:DI 62)
>>>>                                   (pc))))
>>>>
>>>> ...
>>>> I would expect combine to optimize this.
>>>>
>>>> In case I got the example all backwards or it is too simple, please
>>>> provide an rtl example that illustrates the optimization.
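
In C terms, the rtl transformation above relies on the fact that a
zero-extended value is zero exactly when the narrow value is zero.  A minimal
sketch, assuming fixed-width 32-bit and 64-bit types stand in for SI and DI
mode; the function names are hypothetical:

```c
#include <stdint.h>

/* Wide form: corresponds to zero_extend:DI followed by a 64-bit
   compare against 0.  */
int
wide_is_zero (uint32_t x)
{
  uint64_t ext = (uint64_t) x;
  return ext == 0;
}

/* Narrow form: corresponds to the 32-bit compare against 0, with no
   extension at all.  */
int
narrow_is_zero (uint32_t x)
{
  return x == 0;
}
```

Both functions return the same result for every input, which is why the
extension is redundant when the target has a 32-bit compare and branch that
does not cost more than the 64-bit one.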
>>>>
>>>> Thanks,
>>>> - Tom
>>>>
>>>>
>>>>>    right shifts, rotates, and stores are not in
>>>>> this class, but left shifts are as are all comparisons, compare and
>>>>> branches, conditional moves.   There may even be machines that have this
>>>>> for divide, but i do not know of any off the top of my head.
>>>>>
>>>>> What i am suggesting moves this pass into the target specific set of
>>>>> optimizations rather than the target independent set, but given where this
>>>>> pass is to be put this is completely appropriate.    Any dest instruction
>>>>> where all of the operands have been extended should be checked to see if
>>>>> it was really necessary to use the longer form before doing the
>>>>> propagation pass.
>>>>>
>>>>> kenny
>>>>>
>>>>>
>>>>> On 07/11/2012 06:30 AM, Tom de Vries wrote:
>>>>>> On 13/11/10 10:50, Eric Botcazou wrote:
>>>>>>>> I profiled the pass on spec2000:
>>>>>>>>
>>>>>>>>                       -mabi=32     -mabi=64
>>>>>>>> ee-pass (usr time):     0.70         1.16
>>>>>>>> total   (usr time):   919.30       879.26
>>>>>>>> ee-pass        (%):     0.08         0.13
>>>>>>>>
>>>>>>>> The pass takes 0.13% or less of the total usr runtime.
>>>>>>> For how many hits?  What are the numbers with --param ee-max-propagate=0?
>>>>>>>
>>>>>>>> Is it necessary to improve the runtime of this pass?
>>>>>>> I've already given my opinion about the implementation.  The other passes in
>>>>>>> the compiler try hard not to rescan everything when a single bit changes; as
>>>>>>> currently written, yours doesn't.
>>>>>>>
>>>>>> Eric,
>>>>>>
>>>>>> I've done the following:
>>>>>> - refactored the pass such that it now scans at most twice over all
>>>>>>     instructions.
>>>>>> - updated the patch to be applicable to current trunk
>>>>>> - updated the motivating example to a more applicable one (as discussed in
>>>>>>     this thread), and added that one as test-case.
>>>>>> - added a part in the header comment illustrating the working of the pass
>>>>>>     on the motivating example.
>>>>>>
>>>>>> bootstrapped and reg-tested on x86_64 and i686.
>>>>>>
>>>>>> built and reg-tested on mips, mips64, and arm.
>>>>>>
>>>>>> OK for trunk?
>>>>>>
>>>>>> Thanks,
>>>>>> - Tom
>>>>>>
>>>>>> 2012-07-10  Tom de Vries  <tom@codesourcery.com>
>>>>>>
>>>>>> 	* ee.c: New file.
>>>>>> 	* tree-pass.h (pass_ee): Declare.
>>>>>> 	* opts.c (default_options_table): Set flag_ee at -O2.
>>>>>> 	* timevar.def (TV_EE): New timevar.
>>>>>> 	* common.opt (fextension-elimination): New option.
>>>>>> 	* Makefile.in (ee.o): New rule.
>>>>>> 	* passes.c (pass_ee): Add it.
>>>>>>
>>>>>> 	* gcc.dg/extend-1.c: New test.
>>>>>> 	* gcc.dg/extend-2.c: Same.
>>>>>> 	* gcc.dg/extend-2-64.c: Same.
>>>>>> 	* gcc.dg/extend-3.c: Same.
>>>>>> 	* gcc.dg/extend-4.c: Same.
>>>>>> 	* gcc.dg/extend-5.c: Same.
>>>>>> 	* gcc.target/mips/octeon-bbit-2.c: Make test more robust.
>>>>>> Index: gcc/tree-pass.h
>>>>>> ===================================================================
>>>>>> --- gcc/tree-pass.h (revision 189409)
>>>>>> +++ gcc/tree-pass.h (working copy)
>>>>>> @@ -483,6 +483,7 @@ extern struct gimple_opt_pass pass_fixup
>>>>>>
>>>>>> extern struct rtl_opt_pass pass_expand;
>>>>>> extern struct rtl_opt_pass pass_instantiate_virtual_regs;
>>>>>> +extern struct rtl_opt_pass pass_ee;
>>>>>> extern struct rtl_opt_pass pass_rtl_fwprop;
>>>>>> extern struct rtl_opt_pass pass_rtl_fwprop_addr;
>>>>>> extern struct rtl_opt_pass pass_jump;
>>>>>> Index: gcc/testsuite/gcc.target/mips/octeon-bbit-2.c
>>>>>> ===================================================================
>>>>>> --- gcc/testsuite/gcc.target/mips/octeon-bbit-2.c (revision 189409)
>>>>>> +++ gcc/testsuite/gcc.target/mips/octeon-bbit-2.c (working copy)
>>>>>> @@ -5,19 +5,19 @@
>>>>>> /* { dg-final { scan-assembler "\tbnel\t" } } */
>>>>>> /* { dg-final { scan-assembler-not "\tbne\t" } } */
>>>>>>
>>>>>> -NOMIPS16 int
>>>>>> -f (int n, int i)
>>>>>> +NOMIPS16 long int
>>>>>> +f (long int n, long int i)
>>>>>> {
>>>>>> -  int s = 0;
>>>>>> +  long int s = 0;
>>>>>>      for (; i & 1; i++)
>>>>>>        s += i;
>>>>>>      return s;
>>>>>> }
>>>>>>
>>>>>> -NOMIPS16 int
>>>>>> -g (int n, int i)
>>>>>> +NOMIPS16 long int
>>>>>> +g (long int n, long int i)
>>>>>> {
>>>>>> -  int s = 0;
>>>>>> +  long int s = 0;
>>>>>>      for (i = 0; i < n; i++)
>>>>>>        s += i;
>>>>>>      return s;
>>>>>> Index: gcc/testsuite/gcc.dg/extend-4.c
>>>>>> ===================================================================
>>>>>> --- /dev/null (new file)
>>>>>> +++ gcc/testsuite/gcc.dg/extend-4.c (revision 0)
>>>>>> @@ -0,0 +1,16 @@
>>>>>> +/* { dg-do compile } */
>>>>>> +/* { dg-options "-O2 -fdump-rtl-ee" } */
>>>>>> +
>>>>>> +unsigned char f(unsigned int a, int c)
>>>>>> +{
>>>>>> +  unsigned int b = a;
>>>>>> +  if (c)
>>>>>> +    b = a & 0x10ff;
>>>>>> +  return b;
>>>>>> +}
>>>>>> +
>>>>>> +/* { dg-final { scan-rtl-dump-times "_extend:" 1 "ee" { target mips*-*-* } } } */
>>>>>> +/* { dg-final { scan-rtl-dump-times "and:" 0 "ee" { target mips*-*-* } } } */
>>>>>> +/* { dg-final { scan-rtl-dump "redundant extension \[0-9\]+ removed" "ee" { target mips*-*-* } } } */
>>>>>> +/* { dg-final { cleanup-rtl-dump "ee" } } */
>>>>>> +
>>>>>> Index: gcc/testsuite/gcc.dg/extend-1.c
>>>>>> ===================================================================
>>>>>> --- /dev/null (new file)
>>>>>> +++ gcc/testsuite/gcc.dg/extend-1.c (revision 0)
>>>>>> @@ -0,0 +1,13 @@
>>>>>> +/* { dg-do compile } */
>>>>>> +/* { dg-options "-O2 -fdump-rtl-ee" } */
>>>>>> +
>>>>>> +void f(unsigned char * p, short s, int c, int *z)
>>>>>> +{
>>>>>> +  if (c)
>>>>>> +    *z = 0;
>>>>>> +  *p ^= (unsigned char)s;
>>>>>> +}
>>>>>> +
>>>>>> +/* { dg-final { scan-rtl-dump-times "sign_extend:" 0 "ee" { target mips*-*-* } } } */
>>>>>> +/* { dg-final { scan-rtl-dump-times "redundant extension \[0-9\]+ replaced" 1 "ee" { target mips*-*-* } } } */
>>>>>> +/* { dg-final { cleanup-rtl-dump "ee" } } */
>>>>>> Index: gcc/testsuite/gcc.dg/extend-5.c
>>>>>> ===================================================================
>>>>>> --- /dev/null (new file)
>>>>>> +++ gcc/testsuite/gcc.dg/extend-5.c (revision 0)
>>>>>> @@ -0,0 +1,13 @@
>>>>>> +/* { dg-do compile } */
>>>>>> +/* { dg-options "-O2 -fdump-rtl-ee" } */
>>>>>> +
>>>>>> +void f (short d[2][2])
>>>>>> +{
>>>>>> +  int d0 = d[0][0] + d[0][1];
>>>>>> +  int d1 = d[1][0] + d[1][1];
>>>>>> +  d[0][0] = d0 + d1;
>>>>>> +  d[0][1] = d0 - d1;
>>>>>> +}
>>>>>> +
>>>>>> +/* { dg-final { scan-rtl-dump-times "redundant extension \[0-9\]+ replaced" 2 "ee" { target mips*-*-* } } } */
>>>>>> +/* { dg-final { cleanup-rtl-dump "ee" } } */
>>>>>> Index: gcc/testsuite/gcc.dg/extend-2.c
>>>>>> ===================================================================
>>>>>> --- /dev/null (new file)
>>>>>> +++ gcc/testsuite/gcc.dg/extend-2.c (revision 0)
>>>>>> @@ -0,0 +1,20 @@
>>>>>> +/* { dg-do compile } */
>>>>>> +/* { dg-options "-O2 -fdump-rtl-ee" } */
>>>>>> +/* { dg-require-effective-target ilp32 } */
>>>>>> +
>>>>>> +void f(unsigned char * p, short *s, int c)
>>>>>> +{
>>>>>> +  short or = 0;
>>>>>> +  while (c)
>>>>>> +    {
>>>>>> +      or = or | s[c];
>>>>>> +      c --;
>>>>>> +    }
>>>>>> +  *p = (unsigned char)or;
>>>>>> +}
>>>>>> +
>>>>>> +/* { dg-final { scan-rtl-dump-times "zero_extend" 0 "ee" { target mips*-*-* } } } */
>>>>>> +/* { dg-final { scan-rtl-dump-times "sign_extend" 0 "ee" { target mips*-*-* } } } */
>>>>>> +/* { dg-final { scan-rtl-dump-times "redundant extension \[0-9\]+ replaced" 2 "ee" { target mips*-*-* } } } */
>>>>>> +/* { dg-final { cleanup-rtl-dump "ee" } } */
>>>>>> +
>>>>>> Index: gcc/testsuite/gcc.dg/extend-2-64.c
>>>>>> ===================================================================
>>>>>> --- /dev/null (new file)
>>>>>> +++ gcc/testsuite/gcc.dg/extend-2-64.c (revision 0)
>>>>>> @@ -0,0 +1,20 @@
>>>>>> +/* { dg-do compile } */
>>>>>> +/* { dg-options "-O2 -fdump-rtl-ee -mabi=64" } */
>>>>>> +/* { dg-require-effective-target mips64 } */
>>>>>> +
>>>>>> +void f(unsigned char * p, short *s, int c)
>>>>>> +{
>>>>>> +  short or = 0;
>>>>>> +  while (c)
>>>>>> +    {
>>>>>> +      or = or | s[c];
>>>>>> +      c --;
>>>>>> +    }
>>>>>> +  *p = (unsigned char)or;
>>>>>> +}
>>>>>> +
>>>>>> +/* { dg-final { scan-rtl-dump-times "zero_extend:" 0 "ee" { target mips*-*-* } } } */
>>>>>> +/* { dg-final { scan-rtl-dump-times "sign_extend:" 1 "ee" { target mips*-*-* } } } */
>>>>>> +/* { dg-final { scan-rtl-dump-times "redundant extension \[0-9\]+ replaced" 2 "ee" { target mips*-*-* } } } */
>>>>>> +/* { dg-final { cleanup-rtl-dump "ee" } } */
>>>>>> +
>>>>>> Index: gcc/testsuite/gcc.dg/extend-3.c
>>>>>> ===================================================================
>>>>>> --- /dev/null (new file)
>>>>>> +++ gcc/testsuite/gcc.dg/extend-3.c (revision 0)
>>>>>> @@ -0,0 +1,13 @@
>>>>>> +/* { dg-do compile } */
>>>>>> +/* { dg-options "-O2 -fdump-rtl-ee -mabi=64" } */
>>>>>> +/* { dg-require-effective-target mips64 } */
>>>>>> +
>>>>>> +unsigned int f(unsigned char byte)
>>>>>> +{
>>>>>> +  return byte << 25;
>>>>>> +}
>>>>>> +
>>>>>> +/* { dg-final { scan-rtl-dump-times "zero_extend:" 0 "ee" { target mips*-*-* } } } */
>>>>>> +/* { dg-final { scan-rtl-dump "redundant extension \[0-9\]+ replaced" "ee" } } */
>>>>>> +/* { dg-final { cleanup-rtl-dump "ee" } } */
>>>>>> +
>>>>>> Index: gcc/opts.c
>>>>>> ===================================================================
>>>>>> --- gcc/opts.c (revision 189409)
>>>>>> +++ gcc/opts.c (working copy)
>>>>>> @@ -490,6 +490,7 @@ static const struct default_options defa
>>>>>>        { OPT_LEVELS_2_PLUS, OPT_ftree_tail_merge, NULL, 1 },
>>>>>>        { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_foptimize_strlen, NULL, 1 },
>>>>>>        { OPT_LEVELS_2_PLUS, OPT_fhoist_adjacent_loads, NULL, 1 },
>>>>>> +    { OPT_LEVELS_2_PLUS, OPT_fextension_elimination, NULL, 1 },
>>>>>>
>>>>>>        /* -O3 optimizations.  */
>>>>>>        { OPT_LEVELS_3_PLUS, OPT_ftree_loop_distribute_patterns, NULL, 1 },
>>>>>> Index: gcc/timevar.def
>>>>>> ===================================================================
>>>>>> --- gcc/timevar.def (revision 189409)
>>>>>> +++ gcc/timevar.def (working copy)
>>>>>> @@ -201,6 +201,7 @@ DEFTIMEVAR (TV_POST_EXPAND	     , "post
>>>>>> DEFTIMEVAR (TV_VARCONST              , "varconst")
>>>>>> DEFTIMEVAR (TV_LOWER_SUBREG	     , "lower subreg")
>>>>>> DEFTIMEVAR (TV_JUMP                  , "jump")
>>>>>> +DEFTIMEVAR (TV_EE                    , "extension elimination")
>>>>>> DEFTIMEVAR (TV_FWPROP                , "forward prop")
>>>>>> DEFTIMEVAR (TV_CSE                   , "CSE")
>>>>>> DEFTIMEVAR (TV_DCE                   , "dead code elimination")
>>>>>> Index: gcc/ee.c
>>>>>> ===================================================================
>>>>>> --- /dev/null (new file)
>>>>>> +++ gcc/ee.c (revision 0)
>>>>>> @@ -0,0 +1,1190 @@
>>>>>> +/* Redundant extension elimination.
>>>>>> +   Copyright (C) 2010, 2011, 2012 Free Software Foundation, Inc.
>>>>>> +   Contributed by Tom de Vries (tom@codesourcery.com)
>>>>>> +
>>>>>> +This file is part of GCC.
>>>>>> +
>>>>>> +GCC is free software; you can redistribute it and/or modify it under
>>>>>> +the terms of the GNU General Public License as published by the Free
>>>>>> +Software Foundation; either version 3, or (at your option) any later
>>>>>> +version.
>>>>>> +
>>>>>> +GCC is distributed in the hope that it will be useful, but WITHOUT ANY
>>>>>> +WARRANTY; without even the implied warranty of MERCHANTABILITY or
>>>>>> +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
>>>>>> +for more details.
>>>>>> +
>>>>>> +You should have received a copy of the GNU General Public License
>>>>>> +along with GCC; see the file COPYING3.  If not see
>>>>>> +<http://www.gnu.org/licenses/>.  */
>>>>>> +
>>>>>> +/*
>>>>>> +
>>>>>> +  MOTIVATING EXAMPLE
>>>>>> +
>>>>>> +  The motivating example for this pass is the example from PR 40893:
>>>>>> +
>>>>>> +    void f (short d[2][2])
>>>>>> +    {
>>>>>> +      int d0 = d[0][0] + d[0][1];
>>>>>> +      int d1 = d[1][0] + d[1][1];
>>>>>> +      d[0][0] = d0 + d1;
>>>>>> +      d[0][1] = d0 - d1;
>>>>>> +    }
>>>>>> +
>>>>>> +  For MIPS, compilation results in the following insns.
>>>>>> +
>>>>>> +    (set (reg:SI 204)
>>>>>> +         (zero_extend:SI (subreg:HI (reg:SI 213) 2)))
>>>>>> +
>>>>>> +    (set (reg:SI 205)
>>>>>> +         (zero_extend:SI (subreg:HI (reg:SI 216 [ d1 ]) 2)))
>>>>>> +
>>>>>> +    (set (reg:SI 217)
>>>>>> +         (plus:SI (reg:SI 205)
>>>>>> +                  (reg:SI 204)))
>>>>>> +
>>>>>> +    (set (reg:SI 218)
>>>>>> +         (minus:SI (reg:SI 204)
>>>>>> +                   (reg:SI 205)))
>>>>>> +
>>>>>> +    (set (mem:HI (reg/v/f:SI 210))
>>>>>> +         (subreg:HI (reg:SI 217) 2))
>>>>>> +
>>>>>> +    (set (mem:HI (plus:SI (reg/v/f:SI 210)
>>>>>> +                 (const_int 2 [0x2])))
>>>>>> +         (subreg:HI (reg:SI 218) 2))
>>>>>> +
>>>>>> +
>>>>>> +  The two stores only use the lower halves of pseudos 217 and 218, and are
>>>>>> +  the only uses.  And the plus and minus operators belong to the class of
>>>>>> +  operators where a bit in the result is only influenced by same-or-less
>>>>>> +  significant bits in the operands, so the plus and minus insns only use the
>>>>>> +  lower halves of pseudos 204 and 205.  Those are also the only uses of pseudos
>>>>>> +  204 and 205, so the zero_extends are redundant.
>>>>>> +
>>>>>> +
>>>>>> +  INTENDED EFFECT
>>>>>> +
>>>>>> +  This pass works by removing sign/zero-extensions, or replacing them with
>>>>>> +  regcopies.  The idea there is that the regcopy might be eliminated by a later
>>>>>> +  pass.  In case the regcopy cannot be eliminated, it might at least be cheaper
>>>>>> +  than the extension.
>>>>>> +
>>>>>> +
>>>>>> +  IMPLEMENTATION
>>>>>> +
>>>>>> +  The pass scans at most two times over all instructions.
>>>>>> +
>>>>>> +  The first scan collects all extensions.  If there are no extensions, we're
>>>>>> +  done.
>>>>>> +
>>>>>> +  The second scan registers all uses of a reg in the biggest_use array.
>>>>>> +  Additionally, it registers how the use size of a pseudo is propagated to the
>>>>>> +  operands of the insns defining the pseudo.
>>>>>> +
>>>>>> +  The biggest_use array now contains the size in bits of the biggest use
>>>>>> +  of each reg, which allows us to find redundant extensions.
>>>>>> +
>>>>>> +  If there are still non-redundant extensions left, we use the propagation
>>>>>> +  information in an iterative fashion to improve the biggest_use array, after
>>>>>> +  which we may find more redundant extensions.
>>>>>> +
>>>>>> +  Finally, redundant extensions are deleted or replaced.
>>>>>> +
>>>>>> +  In case that the src and dest reg of the replacement are not of the same size,
>>>>>> +  we do not replace with a normal regcopy, but with a truncate or with the copy
>>>>>> +  of a paradoxical subreg instead.
>>>>>> +
>>>>>> +
>>>>>> +  ILLUSTRATION OF PASS
>>>>>> +
>>>>>> +  The dump of the pass shows us how the pass works on the motivating example.
>>>>>> +
>>>>>> +  We find the 2 extensions:
>>>>>> +    found extension with preserved size 16 defining reg 204
>>>>>> +    found extension with preserved size 16 defining reg 205
>>>>>> +
>>>>>> +  We calculate the biggest uses of a register:
>>>>>> +    biggest_use
>>>>>> +    reg 204: size 32
>>>>>> +    reg 205: size 32
>>>>>> +    reg 217: size 16
>>>>>> +    reg 218: size 16
>>>>>> +
>>>>>> +  We propagate the biggest uses where possible:
>>>>>> +    propagations
>>>>>> +    205: 32 -> 16
>>>>>> +    204: 32 -> 16
>>>>>> +    214: 32 -> 16
>>>>>> +    215: 32 -> 16
>>>>>> +
>>>>>> +  We conclude that the extensions are redundant:
>>>>>> +    found redundant extension with preserved size 16 defining reg 205
>>>>>> +    found redundant extension with preserved size 16 defining reg 204
>>>>>> +
>>>>>> +  And we replace them with regcopies:
>>>>>> +    (set (reg:SI 204)
>>>>>> +        (reg:SI 213))
>>>>>> +
>>>>>> +    (set (reg:SI 205)
>>>>>> +        (reg:SI 216))
>>>>>> +
>>>>>> +
>>>>>> +  LIMITATIONS
>>>>>> +
>>>>>> +  The scope of the analysis is limited to an extension and its uses.  The other
>>>>>> +  type of analysis (related to the defs of the operand of an extension) is not
>>>>>> +  done.
>>>>>> +
>>>>>> +  Furthermore, we do the analysis of biggest use per reg.  So when determining
>>>>>> +  whether an extension is redundant, we take all uses of a dest reg into
>>>>>> +  account, also the ones that are not uses of the extension.
>>>>>> +  The consideration is that using use-def chains will give a more precise
>>>>>> +  analysis, but is much more expensive in terms of runtime.  */
>>>>>> +
>>>>>> +#include "config.h"
>>>>>> +#include "system.h"
>>>>>> +#include "coretypes.h"
>>>>>> +#include "tm.h"
>>>>>> +#include "rtl.h"
>>>>>> +#include "tree.h"
>>>>>> +#include "tm_p.h"
>>>>>> +#include "flags.h"
>>>>>> +#include "regs.h"
>>>>>> +#include "hard-reg-set.h"
>>>>>> +#include "basic-block.h"
>>>>>> +#include "insn-config.h"
>>>>>> +#include "function.h"
>>>>>> +#include "expr.h"
>>>>>> +#include "insn-attr.h"
>>>>>> +#include "recog.h"
>>>>>> +#include "toplev.h"
>>>>>> +#include "target.h"
>>>>>> +#include "timevar.h"
>>>>>> +#include "optabs.h"
>>>>>> +#include "insn-codes.h"
>>>>>> +#include "rtlhooks-def.h"
>>>>>> +#include "output.h"
>>>>>> +#include "params.h"
>>>>>> +#include "timevar.h"
>>>>>> +#include "tree-pass.h"
>>>>>> +#include "cgraph.h"
>>>>>> +#include "vec.h"
>>>>>> +
>>>>>> +#define SKIP_REG (-1)
>>>>>> +#define NONE (-1)
>>>>>> +
>>>>>> +/* Number of registers at start of pass.  */
>>>>>> +
>>>>>> +static int n_regs;
>>>>>> +
>>>>>> +/* Array to register the biggest use of a reg, in bits.  */
>>>>>> +
>>>>>> +static int *biggest_use;
>>>>>> +
>>>>>> +/* Array to register the promoted subregs.  */
>>>>>> +
>>>>>> +static VEC (rtx,heap) **promoted_subreg;
>>>>>> +
>>>>>> +/* Array to register for a reg what the last propagated size is.  */
>>>>>> +
>>>>>> +static int *propagated_size;
>>>>>> +
>>>>>> +typedef struct use
>>>>>> +{
>>>>>> +  int regno;
>>>>>> +  int size;
>>>>>> +  int offset;
>>>>>> +  rtx *use;
>>>>>> +} use_type;
>>>>>> +
>>>>>> +DEF_VEC_O(use_type);
>>>>>> +DEF_VEC_ALLOC_O(use_type,heap);
>>>>>> +
>>>>>> +/* Vector to register the uses.  */
>>>>>> +
>>>>>> +static VEC (use_type,heap) **uses;
>>>>>> +
>>>>>> +typedef struct prop
>>>>>> +{
>>>>>> +  rtx set;
>>>>>> +  int uses_regno;
>>>>>> +  int uses_index;
>>>>>> +} prop_type;
>>>>>> +
>>>>>> +DEF_VEC_O(prop_type);
>>>>>> +DEF_VEC_ALLOC_O(prop_type,heap);
>>>>>> +
>>>>>> +/* Vector to register the propagations.  */
>>>>>> +
>>>>>> +static VEC (prop_type,heap) **props;
>>>>>> +
>>>>>> +/* Work list for propagation.  */
>>>>>> +
>>>>>> +static VEC (int,heap) *wl;
>>>>>> +
>>>>>> +/* Array to register what regs are in the work list.  */
>>>>>> +
>>>>>> +static bool *in_wl;
>>>>>> +
>>>>>> +/* Vector that contains the extensions in the function.  */
>>>>>> +
>>>>>> +static VEC (rtx,heap) *extensions;
>>>>>> +
>>>>>> +/* Vector that contains the extensions in the function that are going to be
>>>>>> +   removed or replaced.  */
>>>>>> +
>>>>>> +static VEC (rtx,heap) *redundant_extensions;
>>>>>> +
>>>>>> +/* Forward declaration.  */
>>>>>> +
>>>>>> +static void note_use (rtx *x, void *data);
>>>>>> +static bool skip_reg_p (int regno);
>>>>>> +static void register_prop (rtx set, use_type *use);
>>>>>> +
>>>>>> +/* Check whether SUBREG is a promoted subreg.  */
>>>>>> +
>>>>>> +static bool
>>>>>> +promoted_subreg_p (rtx subreg)
>>>>>> +{
>>>>>> +  return (GET_CODE (subreg) == SUBREG
>>>>>> +	  && SUBREG_PROMOTED_VAR_P (subreg));
>>>>>> +}
>>>>>> +
>>>>>> +/* Check whether SUBREG is a promoted subreg for which we cannot reset the
>>>>>> +   promotion.  */
>>>>>> +
>>>>>> +static bool
>>>>>> +fixed_promoted_subreg_p (rtx subreg)
>>>>>> +{
>>>>>> +  int mre;
>>>>>> +
>>>>>> +  if (!promoted_subreg_p (subreg))
>>>>>> +    return false;
>>>>>> +
>>>>>> +  mre = targetm.mode_rep_extended (GET_MODE (subreg),
>>>>>> +				   GET_MODE (SUBREG_REG (subreg)));
>>>>>> +  return mre != UNKNOWN;
>>>>>> +}
>>>>>> +
>>>>>> +/* Attempt to return the size, reg number and offset of USE in SIZE, REGNO and
>>>>>> +   OFFSET.  Return true if successful.  */
>>>>>> +
>>>>>> +static bool
>>>>>> +reg_use_p (rtx use, int *size, unsigned int *regno, int *offset)
>>>>>> +{
>>>>>> +  rtx reg;
>>>>>> +
>>>>>> +  if (REG_P (use))
>>>>>> +    {
>>>>>> +      *regno = REGNO (use);
>>>>>> +      *offset = 0;
>>>>>> +      *size = GET_MODE_BITSIZE (GET_MODE (use));
>>>>>> +      return true;
>>>>>> +    }
>>>>>> +  else if (GET_CODE (use) == SUBREG)
>>>>>> +    {
>>>>>> +      reg = SUBREG_REG (use);
>>>>>> +
>>>>>> +      if (!REG_P (reg))
>>>>>> +	return false;
>>>>>> +
>>>>>> +      *regno = REGNO (reg);
>>>>>> +
>>>>>> +      if (paradoxical_subreg_p (use) || fixed_promoted_subreg_p (use))
>>>>>> +	{
>>>>>> +	  *offset = 0;
>>>>>> +	  *size = GET_MODE_BITSIZE (GET_MODE (reg));
>>>>>> +	}
>>>>>> +      else
>>>>>> +	{
>>>>>> +	  *offset = subreg_lsb (use);
>>>>>> +	  *size = *offset + GET_MODE_BITSIZE (GET_MODE (use));
>>>>>> +	}
>>>>>> +
>>>>>> +      return true;
>>>>>> +    }
>>>>>> +
>>>>>> +  return false;
>>>>>> +}
>>>>>> +
>>>>>> +/* Create a new empty entry in the uses[REGNO] vector.  */
>>>>>> +
>>>>>> +static use_type *
>>>>>> +new_use (unsigned int regno)
>>>>>> +{
>>>>>> +  if (uses[regno] == NULL)
>>>>>> +    uses[regno] = VEC_alloc (use_type, heap, 4);
>>>>>> +
>>>>>> +  VEC_safe_push (use_type, heap, uses[regno], NULL);
>>>>>> +
>>>>>> +  return VEC_last (use_type, uses[regno]);
>>>>>> +}
>>>>>> +
>>>>>> +/* Register a USE of reg REGNO with SIZE and OFFSET.  */
>>>>>> +
>>>>>> +static use_type *
>>>>>> +register_use (int size, unsigned int regno, int offset, rtx *use)
>>>>>> +{
>>>>>> +  int *current;
>>>>>> +  use_type *p;
>>>>>> +
>>>>>> +  gcc_assert (size >= 0);
>>>>>> +  gcc_assert (regno < (unsigned int)n_regs);
>>>>>> +
>>>>>> +  if (skip_reg_p (regno))
>>>>>> +    return NULL;
>>>>>> +
>>>>>> +  p = new_use (regno);
>>>>>> +  p->regno = regno;
>>>>>> +  p->size = size;
>>>>>> +  p->offset = offset;
>>>>>> +  p->use = use;
>>>>>> +
>>>>>> +  /* Update the biggest use.  */
>>>>>> +  current = &biggest_use[regno];
>>>>>> +  *current = MAX (*current, size);
>>>>>> +
>>>>>> +  return p;
>>>>>> +}
>>>>>> +
>>>>>> +/* Handle embedded uses in USE, which is a part of PATTERN.  */
>>>>>> +
>>>>>> +static void
>>>>>> +note_embedded_uses (rtx use, rtx pattern)
>>>>>> +{
>>>>>> +  const char *format_ptr;
>>>>>> +  int i, j;
>>>>>> +
>>>>>> +  format_ptr = GET_RTX_FORMAT (GET_CODE (use));
>>>>>> +  for (i = 0; i < GET_RTX_LENGTH (GET_CODE (use)); i++)
>>>>>> +    if (format_ptr[i] == 'e')
>>>>>> +      note_use (&XEXP (use, i), pattern);
>>>>>> +    else if (format_ptr[i] == 'E')
>>>>>> +      for (j = 0; j < XVECLEN (use, i); j++)
>>>>>> +	note_use (&XVECEXP (use, i, j), pattern);
>>>>>> +}
>>>>>> +
>>>>>> +/* Get the set in PATTERN that has USE as its src operand.  */
>>>>>> +
>>>>>> +static rtx
>>>>>> +get_set (rtx use, rtx pattern)
>>>>>> +{
>>>>>> +  rtx sub;
>>>>>> +  int i;
>>>>>> +
>>>>>> +  if (GET_CODE (pattern) == SET && SET_SRC (pattern) == use)
>>>>>> +    return pattern;
>>>>>> +
>>>>>> +  if (GET_CODE (pattern) == PARALLEL)
>>>>>> +    for (i = 0; i < XVECLEN (pattern, 0); ++i)
>>>>>> +      {
>>>>>> +	sub = XVECEXP (pattern, 0, i);
>>>>>> +	if (GET_CODE (sub) == SET && SET_SRC (sub) == use)
>>>>>> +	  return sub;
>>>>>> +      }
>>>>>> +
>>>>>> +  return NULL_RTX;
>>>>>> +}
>>>>>> +
>>>>>> +/* Handle a restricted op USE with NR_OPERANDS.  USE is a part of SET, which is
>>>>>> +   a part of PATTERN.  In this context restricted means that a bit in
>>>>>> +   an operand influences only the same bit or more significant bits in the
>>>>>> +   result.  The bitwise ops are a subclass, but PLUS is one as well.  */
>>>>>> +
>>>>>> +static void
>>>>>> +note_restricted_op_use (rtx set, rtx use, unsigned int nr_operands, rtx pattern)
>>>>>> +{
>>>>>> +  unsigned int i, smallest;
>>>>>> +  int operand_size[2];
>>>>>> +  int operand_offset[2];
>>>>>> +  int used_size;
>>>>>> +  unsigned int operand_regno[2];
>>>>>> +  bool operand_reg[2];
>>>>>> +  bool operand_ignore[2];
>>>>>> +  use_type *p;
>>>>>> +
>>>>>> +  /* Init operand_reg, operand_size, operand_regno and operand_ignore.  */
>>>>>> +  for (i = 0; i < nr_operands; ++i)
>>>>>> +    {
>>>>>> +      operand_reg[i] = reg_use_p (XEXP (use, i), &operand_size[i],
>>>>>> +				  &operand_regno[i], &operand_offset[i]);
>>>>>> +      operand_ignore[i] = false;
>>>>>> +    }
>>>>>> +
>>>>>> +  /* Handle case of reg and-masked with const.  */
>>>>>> +  if (GET_CODE (use) == AND && CONST_INT_P (XEXP (use, 1)) && operand_reg[0])
>>>>>> +    {
>>>>>> +      used_size =
>>>>>> +	HOST_BITS_PER_WIDE_INT - clz_hwi (UINTVAL (XEXP (use, 1)));
>>>>>> +      operand_size[0] = MIN (operand_size[0], used_size);
>>>>>> +    }
>>>>>> +
>>>>>> +  /* Handle case of reg or-masked with const.  */
>>>>>> +  if (GET_CODE (use) == IOR && CONST_INT_P (XEXP (use, 1)) && operand_reg[0])
>>>>>> +    {
>>>>>> +      used_size =
>>>>>> +	HOST_BITS_PER_WIDE_INT - clz_hwi (~UINTVAL (XEXP (use, 1)));
>>>>>> +      operand_size[0] = MIN (operand_size[0], used_size);
>>>>>> +    }
>>>>>> +
>>>>>> +  /* Ignore the use of a in 'a = a + b'.  */
>>>>>> +  /* TODO: handle GET_CODE ((SET_DEST (set))) == SUBREG.  */
>>>>>> +  if (set != NULL_RTX && REG_P (SET_DEST (set)))
>>>>>> +    for (i = 0; i < nr_operands; ++i)
>>>>>> +      operand_ignore[i] = (operand_reg[i]
>>>>>> +			   && (REGNO (SET_DEST (set)) == operand_regno[i]));
>>>>>> +
>>>>>> +  /* Handle the case where a reg is combined with don't care bits.  */
>>>>>> +  if (nr_operands == 2 && operand_reg[0] && operand_reg[1]
>>>>>> +      && operand_size[0] != operand_size[1])
>>>>>> +    {
>>>>>> +      smallest = operand_size[0] > operand_size[1];
>>>>>> +
>>>>>> +      if (paradoxical_subreg_p (XEXP (use, smallest)))
>>>>>> +	operand_size[1 - smallest] = operand_size[smallest];
>>>>>> +    }
>>>>>> +
>>>>>> +  /* Register the operand use, if necessary.  */
>>>>>> +  for (i = 0; i < nr_operands; ++i)
>>>>>> +    if (!operand_reg[i])
>>>>>> +      note_use (&XEXP (use, i), pattern);
>>>>>> +    else if (!operand_ignore[i])
>>>>>> +      {
>>>>>> +	p = register_use (operand_size[i], operand_regno[i], operand_offset[i],
>>>>>> +			  &XEXP (use, i));
>>>>>> +	register_prop (set, p);
>>>>>> +      }
>>>>>> +}
>>>>>> +
>>>>>> +/* Register promoted SUBREG in promoted_subreg.  */
>>>>>> +
>>>>>> +static void
>>>>>> +register_promoted_subreg (rtx subreg)
>>>>>> +{
>>>>>> +  int index = REGNO (SUBREG_REG (subreg));
>>>>>> +
>>>>>> +  if (promoted_subreg[index] == NULL)
>>>>>> +    promoted_subreg[index] = VEC_alloc (rtx, heap, 10);
>>>>>> +
>>>>>> +  VEC_safe_push (rtx, heap, promoted_subreg[index], subreg);
>>>>>> +}
>>>>>> +
>>>>>> +/* Note promoted subregs in X.  */
>>>>>> +
>>>>>> +static int
>>>>>> +note_promoted_subreg (rtx *x, void *y ATTRIBUTE_UNUSED)
>>>>>> +{
>>>>>> +  rtx subreg = *x;
>>>>>> +
>>>>>> +  if (promoted_subreg_p (subreg) && !fixed_promoted_subreg_p (subreg)
>>>>>> +      && REG_P (SUBREG_REG (subreg)))
>>>>>> +    register_promoted_subreg (subreg);
>>>>>> +
>>>>>> +  return 0;
>>>>>> +}
>>>>>> +
>>>>>> +/* Handle use X in pattern DATA noted by note_uses.  */
>>>>>> +
>>>>>> +static void
>>>>>> +note_use (rtx *x, void *data)
>>>>>> +{
>>>>>> +  rtx use = *x;
>>>>>> +  rtx pattern = (rtx)data;
>>>>>> +  int use_size, use_offset;
>>>>>> +  unsigned int use_regno;
>>>>>> +  rtx set;
>>>>>> +  use_type *p;
>>>>>> +
>>>>>> +  for_each_rtx (x, note_promoted_subreg, NULL);
>>>>>> +
>>>>>> +  set = get_set (use, pattern);
>>>>>> +
>>>>>> +  switch (GET_CODE (use))
>>>>>> +    {
>>>>>> +    case REG:
>>>>>> +    case SUBREG:
>>>>>> +      if (!reg_use_p (use, &use_size, &use_regno, &use_offset))
>>>>>> +	{
>>>>>> +	  note_embedded_uses (use, pattern);
>>>>>> +	  return;
>>>>>> +	}
>>>>>> +      p = register_use (use_size, use_regno, use_offset, x);
>>>>>> +      register_prop (set, p);
>>>>>> +      return;
>>>>>> +    case SIGN_EXTEND:
>>>>>> +    case ZERO_EXTEND:
>>>>>> +      if (!reg_use_p (XEXP (use, 0), &use_size, &use_regno, &use_offset))
>>>>>> +	{
>>>>>> +	  note_embedded_uses (use, pattern);
>>>>>> +	  return;
>>>>>> +	}
>>>>>> +      p = register_use (use_size, use_regno, use_offset, x);
>>>>>> +      register_prop (set, p);
>>>>>> +      return;
>>>>>> +    case IOR:
>>>>>> +    case AND:
>>>>>> +    case XOR:
>>>>>> +    case PLUS:
>>>>>> +    case MINUS:
>>>>>> +      note_restricted_op_use (set, use, 2, pattern);
>>>>>> +      return;
>>>>>> +    case NOT:
>>>>>> +    case NEG:
>>>>>> +      note_restricted_op_use (set, use, 1, pattern);
>>>>>> +      return;
>>>>>> +    case ASHIFT:
>>>>>> +      if (!reg_use_p (XEXP (use, 0), &use_size, &use_regno, &use_offset)
>>>>>> +	  || !CONST_INT_P (XEXP (use, 1))
>>>>>> +	  || INTVAL (XEXP (use, 1)) <= 0
>>>>>> +	  || paradoxical_subreg_p (XEXP (use, 0)))
>>>>>> +	{
>>>>>> +	  note_embedded_uses (use, pattern);
>>>>>> +	  return;
>>>>>> +	}
>>>>>> +      (void)register_use (use_size - INTVAL (XEXP (use, 1)), use_regno,
>>>>>> +			  use_offset, x);
>>>>>> +      return;
>>>>>> +    default:
>>>>>> +      note_embedded_uses (use, pattern);
>>>>>> +      return;
>>>>>> +    }
>>>>>> +}
>>>>>> +
>>>>>> +/* Check whether reg REGNO is implicitly used.  */
>>>>>> +
>>>>>> +static bool
>>>>>> +implicit_use_p (int regno ATTRIBUTE_UNUSED)
>>>>>> +{
>>>>>> +#ifdef EPILOGUE_USES
>>>>>> +  if (EPILOGUE_USES (regno))
>>>>>> +    return true;
>>>>>> +#endif
>>>>>> +
>>>>>> +#ifdef EH_USES
>>>>>> +  if (EH_USES (regno))
>>>>>> +    return true;
>>>>>> +#endif
>>>>>> +
>>>>>> +  return false;
>>>>>> +}
>>>>>> +
>>>>>> +/* Check whether reg REGNO should be skipped in analysis.  */
>>>>>> +
>>>>>> +static bool
>>>>>> +skip_reg_p (int regno)
>>>>>> +{
>>>>>> +  /* TODO: handle hard registers.  The problem with hard registers is that
>>>>>> +     a DI use of r0 can mean a 64bit use of r0 and a 32 bit use of r1.
>>>>>> +     We don't handle that properly.  */
>>>>>> +  return implicit_use_p (regno) || HARD_REGISTER_NUM_P (regno);
>>>>>> +}
>>>>>> +
>>>>>> +/* Note the uses of argument registers in call INSN.  */
>>>>>> +
>>>>>> +static void
>>>>>> +note_call_uses (rtx insn)
>>>>>> +{
>>>>>> +  rtx link, link_expr;
>>>>>> +
>>>>>> +  if (!CALL_P (insn))
>>>>>> +    return;
>>>>>> +
>>>>>> +  for (link = CALL_INSN_FUNCTION_USAGE (insn); link; link = XEXP (link, 1))
>>>>>> +    {
>>>>>> +      link_expr = XEXP (link, 0);
>>>>>> +
>>>>>> +      if (GET_CODE (link_expr) == USE)
>>>>>> +	note_use (&XEXP (link_expr, 0), link);
>>>>>> +    }
>>>>>> +}
>>>>>> +
>>>>>> +/* Dump the biggest uses found.  */
>>>>>> +
>>>>>> +static void
>>>>>> +dump_biggest_use (void)
>>>>>> +{
>>>>>> +  int i;
>>>>>> +
>>>>>> +  if (!dump_file)
>>>>>> +    return;
>>>>>> +
>>>>>> +  fprintf (dump_file, "biggest_use:\n");
>>>>>> +
>>>>>> +  for (i = 0; i < n_regs; i++)
>>>>>> +    if (biggest_use[i] > 0)
>>>>>> +      fprintf (dump_file, "reg %d: size %d\n", i, biggest_use[i]);
>>>>>> +
>>>>>> +  fprintf (dump_file, "\n");
>>>>>> +}
>>>>>> +
>>>>>> +/* Calculate the biggest use mode for all regs.  */
>>>>>> +
>>>>>> +static void
>>>>>> +calculate_biggest_use (void)
>>>>>> +{
>>>>>> +  basic_block bb;
>>>>>> +  rtx insn;
>>>>>> +
>>>>>> +  /* For all insns, call note_use for each use in insn.  */
>>>>>> +  FOR_EACH_BB (bb)
>>>>>> +    FOR_BB_INSNS (bb, insn)
>>>>>> +      {
>>>>>> +	if (!NONDEBUG_INSN_P (insn))
>>>>>> +	  continue;
>>>>>> +
>>>>>> +	note_uses (&PATTERN (insn), note_use, PATTERN (insn));
>>>>>> +
>>>>>> +	if (CALL_P (insn))
>>>>>> +	  note_call_uses (insn);
>>>>>> +      }
>>>>>> +
>>>>>> +  dump_biggest_use ();
>>>>>> +}
>>>>>> +
>>>>>> +/* Register a propagation USE in SET in the props vector.  */
>>>>>> +
>>>>>> +static void
>>>>>> +register_prop (rtx set, use_type *use)
>>>>>> +{
>>>>>> +  prop_type *p;
>>>>>> +  int regno;
>>>>>> +
>>>>>> +  if (set == NULL_RTX || use == NULL)
>>>>>> +    return;
>>>>>> +
>>>>>> +  if (!REG_P (SET_DEST (set)))
>>>>>> +    return;
>>>>>> +
>>>>>> +  regno = REGNO (SET_DEST (set));
>>>>>> +
>>>>>> +  if (skip_reg_p (regno))
>>>>>> +    return;
>>>>>> +
>>>>>> +  if (props[regno] == NULL)
>>>>>> +    props[regno] = VEC_alloc (prop_type, heap, 4);
>>>>>> +
>>>>>> +  VEC_safe_push (prop_type, heap, props[regno], NULL);
>>>>>> +  p = VEC_last (prop_type, props[regno]);
>>>>>> +  p->set = set;
>>>>>> +  p->uses_regno = use->regno;
>>>>>> +  p->uses_index = VEC_length (use_type, uses[use->regno]) - 1;
>>>>>> +}
>>>>>> +
>>>>>> +/* Add REGNO to the worklist.  */
>>>>>> +
>>>>>> +static void
>>>>>> +add_to_wl (int regno)
>>>>>> +{
>>>>>> +  if (in_wl[regno])
>>>>>> +    return;
>>>>>> +
>>>>>> +  if (biggest_use[regno] > 0
>>>>>> +      && biggest_use[regno] == GET_MODE_BITSIZE (PSEUDO_REGNO_MODE (regno)))
>>>>>> +    return;
>>>>>> +
>>>>>> +  if (VEC_empty (prop_type, props[regno]))
>>>>>> +    return;
>>>>>> +
>>>>>> +  if (propagated_size[regno] != NONE
>>>>>> +      && propagated_size[regno] == biggest_use[regno])
>>>>>> +    return;
>>>>>> +
>>>>>> +  VEC_safe_push (int, heap, wl, regno);
>>>>>> +  in_wl[regno] = true;
>>>>>> +}
>>>>>> +
>>>>>> +/* Pop a reg from the worklist and return it.  */
>>>>>> +
>>>>>> +static int
>>>>>> +pop_wl (void)
>>>>>> +{
>>>>>> +  int regno = VEC_pop (int, wl);
>>>>>> +  in_wl[regno] = false;
>>>>>> +  return regno;
>>>>>> +}
>>>>>> +
>>>>>> +/* Propagate the use size DEST_SIZE of a reg to use P.  */
>>>>>> +
>>>>>> +static int
>>>>>> +propagate_size (int dest_size, use_type *p)
>>>>>> +{
>>>>>> +  if (dest_size == 0)
>>>>>> +    return 0;
>>>>>> +
>>>>>> +  return p->offset + MIN (p->size - p->offset, dest_size);
>>>>>> +}
>>>>>> +
>>>>>> +/* Get the biggest use of REGNO from the uses vector.  */
>>>>>> +
>>>>>> +static int
>>>>>> +get_biggest_use (unsigned int regno)
>>>>>> +{
>>>>>> +  int ix;
>>>>>> +  use_type *p;
>>>>>> +  int max = 0;
>>>>>> +
>>>>>> +  gcc_assert (uses[regno] != NULL);
>>>>>> +
>>>>>> +  FOR_EACH_VEC_ELT (use_type, uses[regno], ix, p)
>>>>>> +    max = MAX (max, p->size);
>>>>>> +
>>>>>> +  return max;
>>>>>> +}
>>>>>> +
>>>>>> +/* Propagate the use size DEST_SIZE of a reg to the uses in USE.  */
>>>>>> +
>>>>>> +static void
>>>>>> +propagate_to_use (int dest_size, use_type *use)
>>>>>> +{
>>>>>> +  int new_use_size;
>>>>>> +  int prev_biggest_use;
>>>>>> +  int *current;
>>>>>> +
>>>>>> +  new_use_size = propagate_size (dest_size, use);
>>>>>> +
>>>>>> +  if (new_use_size >= use->size)
>>>>>> +    return;
>>>>>> +
>>>>>> +  use->size = new_use_size;
>>>>>> +
>>>>>> +  current = &biggest_use[use->regno];
>>>>>> +
>>>>>> +  prev_biggest_use = *current;
>>>>>> +  *current = get_biggest_use (use->regno);
>>>>>> +
>>>>>> +  if (*current >= prev_biggest_use)
>>>>>> +    return;
>>>>>> +
>>>>>> +  add_to_wl (use->regno);
>>>>>> +
>>>>>> +  if (dump_file)
>>>>>> +    fprintf (dump_file, "%d: %d -> %d\n", use->regno, prev_biggest_use,
>>>>>> +	     *current);
>>>>>> +
>>>>>> +}
>>>>>> +
>>>>>> +/* Propagate the biggest use of a reg REGNO to all its uses, and note
>>>>>> +   propagations in NR_PROPAGATIONS.  */
>>>>>> +
>>>>>> +static void
>>>>>> +propagate_to_uses (int regno, int *nr_propagations)
>>>>>> +{
>>>>>> +  int ix;
>>>>>> +  prop_type *p;
>>>>>> +
>>>>>> +  gcc_assert (!(propagated_size[regno] == NONE
>>>>>> +		&& propagated_size[regno] == biggest_use[regno]));
>>>>>> +
>>>>>> +  FOR_EACH_VEC_ELT (prop_type, props[regno], ix, p)
>>>>>> +    {
>>>>>> +      use_type *use = VEC_index (use_type, uses[p->uses_regno], p->uses_index);
>>>>>> +      propagate_to_use (biggest_use[regno], use);
>>>>>> +      ++(*nr_propagations);
>>>>>> +    }
>>>>>> +
>>>>>> +  propagated_size[regno] = biggest_use[regno];
>>>>>> +}
>>>>>> +
>>>>>> +/* Improve biggest_use array iteratively.  */
>>>>>> +
>>>>>> +static void
>>>>>> +propagate (void)
>>>>>> +{
>>>>>> +  int i;
>>>>>> +  int nr_propagations = 0;
>>>>>> +
>>>>>> +  /* Initialize work list.  */
>>>>>> +
>>>>>> +  for (i = 0; i < n_regs; ++i)
>>>>>> +    add_to_wl (i);
>>>>>> +
>>>>>> +  /* Work the work list.  */
>>>>>> +
>>>>>> +  if (dump_file)
>>>>>> +    fprintf (dump_file, "propagations: \n");
>>>>>> +  while (!VEC_empty (int, wl))
>>>>>> +    propagate_to_uses (pop_wl (), &nr_propagations);
>>>>>> +
>>>>>> +  if (dump_file)
>>>>>> +    fprintf (dump_file, "\nnr_propagations: %d\n\n", nr_propagations);
>>>>>> +}
>>>>>> +
>>>>>> +/* Check whether this is a sign/zero extension.  */
>>>>>> +
>>>>>> +static bool
>>>>>> +extension_p (rtx insn, rtx *dest, rtx *inner, int *preserved_size)
>>>>>> +{
>>>>>> +  rtx src, op0;
>>>>>> +
>>>>>> +  /* Detect set of reg.  */
>>>>>> +  if (GET_CODE (PATTERN (insn)) != SET)
>>>>>> +    return false;
>>>>>> +
>>>>>> +  src = SET_SRC (PATTERN (insn));
>>>>>> +  *dest = SET_DEST (PATTERN (insn));
>>>>>> +
>>>>>> +  if (!REG_P (*dest))
>>>>>> +    return false;
>>>>>> +
>>>>>> +  /* Detect sign or zero extension.  */
>>>>>> +  if (GET_CODE (src) == ZERO_EXTEND || GET_CODE (src) == SIGN_EXTEND
>>>>>> +      || (GET_CODE (src) == AND && CONST_INT_P (XEXP (src, 1))))
>>>>>> +    {
>>>>>> +      op0 = XEXP (src, 0);
>>>>>> +
>>>>>> +      /* Determine amount of least significant bits preserved by operation.  */
>>>>>> +      if (GET_CODE (src) == AND)
>>>>>> +	*preserved_size = ctz_hwi (~UINTVAL (XEXP (src, 1)));
>>>>>> +      else
>>>>>> +	*preserved_size = GET_MODE_BITSIZE (GET_MODE (op0));
>>>>>> +
>>>>>> +      if (GET_CODE (op0) == SUBREG)
>>>>>> +	{
>>>>>> +	  if (subreg_lsb (op0) != 0)
>>>>>> +	    return false;
>>>>>> +
>>>>>> +	  *inner = SUBREG_REG (op0);
>>>>>> +
>>>>>> +	  if (GET_MODE_CLASS (GET_MODE (*dest))
>>>>>> +	      != GET_MODE_CLASS (GET_MODE (*inner)))
>>>>>> +	    return false;
>>>>>> +
>>>>>> +	  return true;
>>>>>> +	}
>>>>>> +      else if (REG_P (op0))
>>>>>> +	{
>>>>>> +	  *inner = op0;
>>>>>> +
>>>>>> +	  if (GET_MODE_CLASS (GET_MODE (*dest))
>>>>>> +	      != GET_MODE_CLASS (GET_MODE (*inner)))
>>>>>> +	    return false;
>>>>>> +
>>>>>> +	  return true;
>>>>>> +	}
>>>>>> +      else if (GET_CODE (op0) == TRUNCATE)
>>>>>> +	{
>>>>>> +	  *inner = XEXP (op0, 0);
>>>>>> +	  return true;
>>>>>> +	}
>>>>>> +    }
>>>>>> +
>>>>>> +  return false;
>>>>>> +}
>>>>>> +
>>>>>> +/* Find extensions and store them in the extensions vector.  */
>>>>>> +
>>>>>> +static bool
>>>>>> +find_extensions (void)
>>>>>> +{
>>>>>> +  basic_block bb;
>>>>>> +  rtx insn, dest, inner;
>>>>>> +  int preserved_size;
>>>>>> +
>>>>>> +  /* For all insns, record the insn if it is a sign/zero extension.  */
>>>>>> +  FOR_EACH_BB (bb)
>>>>>> +    FOR_BB_INSNS (bb, insn)
>>>>>> +      {
>>>>>> +	if (!NONDEBUG_INSN_P (insn))
>>>>>> +	  continue;
>>>>>> +
>>>>>> +	if (!extension_p (insn, &dest, &inner, &preserved_size))
>>>>>> +	  continue;
>>>>>> +
>>>>>> +	VEC_safe_push (rtx, heap, extensions, insn);
>>>>>> +
>>>>>> +	if (dump_file)
>>>>>> +	  fprintf (dump_file,
>>>>>> +		   "found extension %u with preserved size %d defining"
>>>>>> +		   " reg %d\n",
>>>>>> +		   INSN_UID (insn), preserved_size, REGNO (dest));
>>>>>> +      }
>>>>>> +
>>>>>> +  if (dump_file)
>>>>>> +    {
>>>>>> +      if (!VEC_empty (rtx, extensions))
>>>>>> +	fprintf (dump_file, "\n");
>>>>>> +      else
>>>>>> +	fprintf (dump_file, "no extensions found.\n");
>>>>>> +    }
>>>>>> +
>>>>>> +  return !VEC_empty (rtx, extensions);
>>>>>> +}
>>>>>> +
>>>>>> +/* Check whether this is a redundant sign/zero extension.  */
>>>>>> +
>>>>>> +static bool
>>>>>> +redundant_extension_p (rtx insn, rtx *dest, rtx *inner, int *preserved_size)
>>>>>> +{
>>>>>> +  int biggest_dest_use;
>>>>>> +
>>>>>> +  if (!extension_p (insn, dest, inner, preserved_size))
>>>>>> +    gcc_unreachable ();
>>>>>> +
>>>>>> +  biggest_dest_use = biggest_use[REGNO (*dest)];
>>>>>> +
>>>>>> +  if (biggest_dest_use == SKIP_REG)
>>>>>> +    return false;
>>>>>> +
>>>>>> +  if (*preserved_size < biggest_dest_use)
>>>>>> +    return false;
>>>>>> +
>>>>>> +  return true;
>>>>>> +}
>>>>>> +
>>>>>> +/* Find the redundant extensions in the extensions vector and move them to the
>>>>>> +   redundant_extensions vector.  */
>>>>>> +
>>>>>> +static void
>>>>>> +find_redundant_extensions (void)
>>>>>> +{
>>>>>> +  rtx insn, dest, inner;
>>>>>> +  int ix;
>>>>>> +  bool found = false;
>>>>>> +  int preserved_size;
>>>>>> +
>>>>>> +  FOR_EACH_VEC_ELT_REVERSE (rtx, extensions, ix, insn)
>>>>>> +    if (redundant_extension_p (insn, &dest, &inner, &preserved_size))
>>>>>> +      {
>>>>>> +	VEC_safe_push (rtx, heap, redundant_extensions, insn);
>>>>>> +	VEC_unordered_remove (rtx, extensions, ix);
>>>>>> +
>>>>>> +	if (dump_file)
>>>>>> +	  fprintf (dump_file,
>>>>>> +		   "found redundant extension %u with preserved size %d"
>>>>>> +		   " defining reg %d\n",
>>>>>> +		   INSN_UID (insn), preserved_size, REGNO (dest));
>>>>>> +	found = true;
>>>>>> +      }
>>>>>> +
>>>>>> +  if (dump_file && found)
>>>>>> +    fprintf (dump_file, "\n");
>>>>>> +}
>>>>>> +
>>>>>> +/* Reset promotion of subregs of REG.  */
>>>>>> +
>>>>>> +static void
>>>>>> +reset_promoted_subreg (rtx reg)
>>>>>> +{
>>>>>> +  int ix;
>>>>>> +  rtx subreg;
>>>>>> +
>>>>>> +  if (promoted_subreg[REGNO (reg)] == NULL)
>>>>>> +    return;
>>>>>> +
>>>>>> +  FOR_EACH_VEC_ELT (rtx, promoted_subreg[REGNO (reg)], ix, subreg)
>>>>>> +    {
>>>>>> +      SUBREG_PROMOTED_UNSIGNED_SET (subreg, 0);
>>>>>> +      SUBREG_PROMOTED_VAR_P (subreg) = 0;
>>>>>> +    }
>>>>>> +
>>>>>> +  VEC_free (rtx, heap, promoted_subreg[REGNO (reg)]);
>>>>>> +}
>>>>>> +
>>>>>> +/* Try to remove or replace the redundant extension INSN which extends INNER and
>>>>>> +   writes to DEST.  */
>>>>>> +
>>>>>> +static void
>>>>>> +try_remove_or_replace_extension (rtx insn, rtx dest, rtx inner)
>>>>>> +{
>>>>>> +  rtx cp_src, cp_dest, seq = NULL_RTX, one;
>>>>>> +
>>>>>> +  /* Check whether replacement is needed.  */
>>>>>> +  if (dest != inner)
>>>>>> +    {
>>>>>> +      start_sequence ();
>>>>>> +
>>>>>> +      /* Determine the proper replacement operation.  */
>>>>>> +      if (GET_MODE (dest) == GET_MODE (inner))
>>>>>> +	{
>>>>>> +	  cp_src = inner;
>>>>>> +	  cp_dest = dest;
>>>>>> +	}
>>>>>> +      else if (GET_MODE_SIZE (GET_MODE (dest))
>>>>>> +	       > GET_MODE_SIZE (GET_MODE (inner)))
>>>>>> +	{
>>>>>> +	  emit_clobber (dest);
>>>>>> +	  cp_src = inner;
>>>>>> +	  cp_dest = gen_lowpart_SUBREG (GET_MODE (inner), dest);
>>>>>> +	}
>>>>>> +      else
>>>>>> +	{
>>>>>> +	  cp_src = gen_lowpart_SUBREG (GET_MODE (dest), inner);
>>>>>> +	  cp_dest = dest;
>>>>>> +	}
>>>>>> +
>>>>>> +      emit_move_insn (cp_dest, cp_src);
>>>>>> +
>>>>>> +      seq = get_insns ();
>>>>>> +      end_sequence ();
>>>>>> +
>>>>>> +      /* If the replacement is not supported, bail out.  */
>>>>>> +      for (one = seq; one != NULL_RTX; one = NEXT_INSN (one))
>>>>>> +	if (recog_memoized (one) < 0 && GET_CODE (PATTERN (one)) != CLOBBER)
>>>>>> +	  return;
>>>>>> +
>>>>>> +      /* Insert the replacement.  */
>>>>>> +      emit_insn_before (seq, insn);
>>>>>> +    }
>>>>>> +
>>>>>> +  /* Note replacement/removal in the dump.  */
>>>>>> +  if (dump_file)
>>>>>> +    {
>>>>>> +      fprintf (dump_file, "redundant extension %u ", INSN_UID (insn));
>>>>>> +      if (dest != inner)
>>>>>> +	fprintf (dump_file, "replaced by %u\n", INSN_UID (seq));
>>>>>> +      else
>>>>>> +	fprintf (dump_file, "removed\n");
>>>>>> +    }
>>>>>> +
>>>>>> +  /* Remove the extension.  */
>>>>>> +  delete_insn (insn);
>>>>>> +
>>>>>> +  reset_promoted_subreg (dest);
>>>>>> +}
>>>>>> +
>>>>>> +/* Setup the variables at the start of the pass.  */
>>>>>> +
>>>>>> +static void
>>>>>> +init_pass (void)
>>>>>> +{
>>>>>> +  int i;
>>>>>> +
>>>>>> +  biggest_use = XNEWVEC (int, n_regs);
>>>>>> +  promoted_subreg = XCNEWVEC (VEC (rtx,heap) *, n_regs);
>>>>>> +  propagated_size = XNEWVEC (int, n_regs);
>>>>>> +
>>>>>> +  /* Initialize biggest_use for all regs to 0.  If a reg is used implicitly, we
>>>>>> +     handle that reg conservatively and set it to SKIP_REG instead.  */
>>>>>> +  for (i = 0; i < n_regs; i++)
>>>>>> +    {
>>>>>> +      biggest_use[i] = skip_reg_p (i) ? SKIP_REG : 0;
>>>>>> +      propagated_size[i] = NONE;
>>>>>> +    }
>>>>>> +
>>>>>> +  extensions = VEC_alloc (rtx, heap, 10);
>>>>>> +  redundant_extensions = VEC_alloc (rtx, heap, 10);
>>>>>> +
>>>>>> +  wl = VEC_alloc (int, heap, 50);
>>>>>> +  in_wl = XNEWVEC (bool, n_regs);
>>>>>> +
>>>>>> +  uses = XNEWVEC (typeof (*uses), n_regs);
>>>>>> +  props = XNEWVEC (typeof (*props), n_regs);
>>>>>> +
>>>>>> +  for (i = 0; i < n_regs; ++i)
>>>>>> +    {
>>>>>> +      uses[i] = NULL;
>>>>>> +      props[i] = NULL;
>>>>>> +      in_wl[i] = false;
>>>>>> +    }
>>>>>> +}
>>>>>> +
>>>>>> +/* Find redundant extensions and remove or replace them if possible.  */
>>>>>> +
>>>>>> +static void
>>>>>> +remove_redundant_extensions (void)
>>>>>> +{
>>>>>> +  rtx insn, dest, inner;
>>>>>> +  int preserved_size;
>>>>>> +  int ix;
>>>>>> +
>>>>>> +  if (!find_extensions ())
>>>>>> +    return;
>>>>>> +
>>>>>> +  calculate_biggest_use ();
>>>>>> +
>>>>>> +  find_redundant_extensions ();
>>>>>> +
>>>>>> +  if (!VEC_empty (rtx, extensions))
>>>>>> +    {
>>>>>> +      propagate ();
>>>>>> +
>>>>>> +      find_redundant_extensions ();
>>>>>> +    }
>>>>>> +
>>>>>> +  gcc_checking_assert (n_regs == max_reg_num ());
>>>>>> +
>>>>>> +  FOR_EACH_VEC_ELT (rtx, redundant_extensions, ix, insn)
>>>>>> +    {
>>>>>> +      extension_p (insn, &dest, &inner, &preserved_size);
>>>>>> +      try_remove_or_replace_extension (insn, dest, inner);
>>>>>> +    }
>>>>>> +
>>>>>> +  if (dump_file)
>>>>>> +    fprintf (dump_file, "\n");
>>>>>> +}
>>>>>> +
>>>>>> +/* Free the variables at the end of the pass.  */
>>>>>> +
>>>>>> +static void
>>>>>> +finish_pass (void)
>>>>>> +{
>>>>>> +  int i;
>>>>>> +
>>>>>> +  XDELETEVEC (propagated_size);
>>>>>> +
>>>>>> +  VEC_free (rtx, heap, extensions);
>>>>>> +  VEC_free (rtx, heap, redundant_extensions);
>>>>>> +
>>>>>> +  VEC_free (int, heap, wl);
>>>>>> +
>>>>>> +  for (i = 0; i < n_regs; ++i)
>>>>>> +    {
>>>>>> +      if (uses[i] != NULL)
>>>>>> +	VEC_free (use_type, heap, uses[i]);
>>>>>> +
>>>>>> +      if (props[i] != NULL)
>>>>>> +	VEC_free (prop_type, heap, props[i]);
>>>>>> +    }
>>>>>> +
>>>>>> +  XDELETEVEC (uses);
>>>>>> +  XDELETEVEC (props);
>>>>>> +  XDELETEVEC (biggest_use);
>>>>>> +
>>>>>> +  for (i = 0; i < n_regs; ++i)
>>>>>> +    if (promoted_subreg[i] != NULL)
>>>>>> +      VEC_free (rtx, heap, promoted_subreg[i]);
>>>>>> +  XDELETEVEC (promoted_subreg);
>>>>>> +}
>>>>>> +
>>>>>> +/* Remove redundant extensions.  */
>>>>>> +
>>>>>> +static unsigned int
>>>>>> +rest_of_handle_ee (void)
>>>>>> +{
>>>>>> +  n_regs = max_reg_num ();
>>>>>> +
>>>>>> +  init_pass ();
>>>>>> +  remove_redundant_extensions ();
>>>>>> +  finish_pass ();
>>>>>> +  return 0;
>>>>>> +}
>>>>>> +
>>>>>> +/* Run ee pass when flag_ee is set at optimization level > 0.  */
>>>>>> +
>>>>>> +static bool
>>>>>> +gate_handle_ee (void)
>>>>>> +{
>>>>>> +  return (optimize > 0 && flag_ee);
>>>>>> +}
>>>>>> +
>>>>>> +struct rtl_opt_pass pass_ee =
>>>>>> +{
>>>>>> + {
>>>>>> +  RTL_PASS,
>>>>>> +  "ee",                                 /* name */
>>>>>> +  gate_handle_ee,                       /* gate */
>>>>>> +  rest_of_handle_ee,                    /* execute */
>>>>>> +  NULL,                                 /* sub */
>>>>>> +  NULL,                                 /* next */
>>>>>> +  0,                                    /* static_pass_number */
>>>>>> +  TV_EE,                                /* tv_id */
>>>>>> +  0,                                    /* properties_required */
>>>>>> +  0,                                    /* properties_provided */
>>>>>> +  0,                                    /* properties_destroyed */
>>>>>> +  0,                                    /* todo_flags_start */
>>>>>> +  TODO_ggc_collect |
>>>>>> +  TODO_verify_rtl_sharing,              /* todo_flags_finish */
>>>>>> + }
>>>>>> +};
>>>>>> Index: gcc/common.opt
>>>>>> ===================================================================
>>>>>> --- gcc/common.opt (revision 189409)
>>>>>> +++ gcc/common.opt (working copy)
>>>>>> @@ -1067,6 +1067,10 @@ feliminate-dwarf2-dups
>>>>>> Common Report Var(flag_eliminate_dwarf2_dups)
>>>>>> Perform DWARF2 duplicate elimination
>>>>>>
>>>>>> +fextension-elimination
>>>>>> +Common Report Var(flag_ee) Init(0) Optimization
>>>>>> +Perform extension elimination
>>>>>> +
>>>>>> fipa-sra
>>>>>> Common Report Var(flag_ipa_sra) Init(0) Optimization
>>>>>> Perform interprocedural reduction of aggregates
>>>>>> Index: gcc/Makefile.in
>>>>>> ===================================================================
>>>>>> --- gcc/Makefile.in (revision 189409)
>>>>>> +++ gcc/Makefile.in (working copy)
>>>>>> @@ -1218,6 +1218,7 @@ OBJS = \
>>>>>> 	dwarf2asm.o \
>>>>>> 	dwarf2cfi.o \
>>>>>> 	dwarf2out.o \
>>>>>> +	ee.o \
>>>>>> 	ebitmap.o \
>>>>>> 	emit-rtl.o \
>>>>>> 	et-forest.o \
>>>>>> @@ -2971,6 +2972,12 @@ cprop.o : cprop.c $(CONFIG_H) $(SYSTEM_H
>>>>>>       $(TM_P_H) $(PARAMS_H) cselib.h $(EXCEPT_H) $(TREE_H) $(TIMEVAR_H) \
>>>>>>       intl.h $(OBSTACK_H) $(TREE_PASS_H) $(DF_H) $(DBGCNT_H) $(TARGET_H) \
>>>>>>       $(DF_H) $(CFGLOOP_H)
>>>>>> +ee.o : ee.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) \
>>>>>> +   hard-reg-set.h $(FLAGS_H) $(BASIC_BLOCK_H) $(FUNCTION_H) output.h \
>>>>>> +   $(DF_H) $(TIMEVAR_H) tree-pass.h $(RECOG_H) $(EXPR_H) \
>>>>>> +   $(REGS_H) $(TREE_H) $(TM_P_H) insn-config.h $(INSN_ATTR_H) $(TOPLEV_H) \
>>>>>> +   $(DIAGNOSTIC_CORE_H) $(TARGET_H) $(OPTABS_H) insn-codes.h rtlhooks-def.h \
>>>>>> +   $(PARAMS_H) $(CGRAPH_H)
>>>>>> gcse.o : gcse.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) \
>>>>>>       $(REGS_H) hard-reg-set.h $(FLAGS_H) insn-config.h $(GGC_H) \
>>>>>>       $(RECOG_H) $(EXPR_H) $(BASIC_BLOCK_H) $(FUNCTION_H) toplev.h $(DIAGNOSTIC_CORE_H) \
>>>>>> Index: gcc/passes.c
>>>>>> ===================================================================
>>>>>> --- gcc/passes.c (revision 189409)
>>>>>> +++ gcc/passes.c (working copy)
>>>>>> @@ -1552,6 +1552,7 @@ init_optimization_passes (void)
>>>>>>          NEXT_PASS (pass_initialize_regs);
>>>>>>          NEXT_PASS (pass_ud_rtl_dce);
>>>>>>          NEXT_PASS (pass_combine);
>>>>>> +      NEXT_PASS (pass_ee);
>>>>>>          NEXT_PASS (pass_if_after_combine);
>>>>>>          NEXT_PASS (pass_partition_blocks);
>>>>>>          NEXT_PASS (pass_regmove);
>>
>



* Re: new sign/zero extension elimination pass
  2012-07-17 15:17                         ` Kenneth Zadeck
@ 2012-07-20 18:41                           ` Tom de Vries
  0 siblings, 0 replies; 43+ messages in thread
From: Tom de Vries @ 2012-07-20 18:41 UTC (permalink / raw)
  To: Kenneth Zadeck
  Cc: Tom de Vries, Eric Botcazou, tom, gcc-patches, Paolo Bonzini

On 17/07/12 17:16, Kenneth Zadeck wrote:
> the pass does not handle induction variables, i.e. variables that feed 
> into themselves.
> 

Kenny,

I know of two types of redundant extensions:
1. extensions that are redundant because the bits they are extending are already
   extended.
2. extensions that are redundant because the bits they are extending are never
   used.
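
To make the two types concrete, here is a small stand-alone C sketch. It is
not part of the patch, and the function names are invented for illustration.
type1_redundant_p shows that extending an already extended value changes
nothing; type2_redundant_p shows that when only the low byte of the result is
read, a sign extension from 16 bits is never observed.

```c
#include <assert.h>
#include <stdint.h>

/* zero_extend:SI of the HI lowpart of X.  */
uint32_t
zext16 (uint32_t x)
{
  return x & 0xffff;
}

/* sign_extend:SI of the HI lowpart of X, computed in well-defined
   unsigned arithmetic.  */
uint32_t
sext16 (uint32_t x)
{
  uint32_t low = x & 0xffff;
  return (low ^ 0x8000u) - 0x8000u;
}

/* Type 1: the value is already zero-extended, so extending it again
   is a no-op.  */
int
type1_redundant_p (uint32_t x)
{
  uint32_t already_extended = zext16 (x);
  return zext16 (already_extended) == already_extended;
}

/* Type 2: only the low byte is used, so the extended bits are never
   observed.  */
int
type2_redundant_p (uint32_t x)
{
  return (uint8_t) sext16 (x) == (uint8_t) x;
}

/* Check both properties for every value with up to 17 significant bits.  */
void
check_redundancy_types (void)
{
  uint32_t x;

  for (x = 0; x <= 0x1ffff; x++)
    {
      assert (type1_redundant_p (x));
      assert (type2_redundant_p (x));
    }
}
```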

The two types can be detected by different propagations. The first type can be
addressed by doing value-range propagation. The second type can be addressed by
a backward unused-bits propagation.

pass_ee implements such a backward unused-bits propagation.
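
The idea behind such a propagation can be sketched on a toy instruction list.
This is a simplified stand-alone illustration, not the algorithm in ee.c: the
struct, the opcodes and the function names are hypothetical, and it grows the
used sizes from zero up to a fixed point, whereas the pass starts from the
sizes found in the insns and shrinks them with a worklist.

```c
#include <assert.h>
#include <stdbool.h>

enum op_kind { OP_COPY, OP_ADD, OP_ZEXT, OP_STORE };

struct insn
{
  enum op_kind kind;
  int dest;   /* Defined register, or -1 for OP_STORE.  */
  int src0;   /* First source register.  */
  int src1;   /* Second source register, or -1.  */
  int width;  /* OP_ZEXT: bits preserved; OP_STORE: bits stored.  */
};

static int
min_int (int a, int b)
{
  return a < b ? a : b;
}

/* Raise USED[REG] to at least BITS.  Return true if it changed.  */
static bool
raise_use (int *used, int reg, int bits)
{
  if (reg < 0 || bits <= used[reg])
    return false;
  used[reg] = bits;
  return true;
}

/* Compute for each register how many low bits are ever used, walking the
   insns backwards until nothing changes.  */
void
compute_used_bits (const struct insn *insns, int n_insns,
		   int *used, int n_regs)
{
  bool changed;
  int i;

  for (i = 0; i < n_regs; i++)
    used[i] = 0;

  do
    {
      changed = false;
      for (i = n_insns - 1; i >= 0; i--)
	{
	  const struct insn *p = &insns[i];
	  int need;

	  if (p->kind == OP_STORE)
	    need = p->width;
	  else if (p->kind == OP_ZEXT)
	    need = min_int (p->width, used[p->dest]);
	  else
	    /* OP_COPY, OP_ADD: the low N bits of the result depend only
	       on the low N bits of the operands.  */
	    need = used[p->dest];

	  changed |= raise_use (used, p->src0, need);
	  changed |= raise_use (used, p->src1, need);
	}
    }
  while (changed);
}

/* An extension is redundant if no use reads beyond the preserved bits.  */
bool
zext_redundant_p (const struct insn *p, const int *used)
{
  return p->kind == OP_ZEXT && used[p->dest] <= p->width;
}
```

For a 16-bit extension whose result is only stored as a byte, the used size of
the extended register comes out as 8, which does not exceed the 16 preserved
bits, so the extension is reported as redundant.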

I've created a simple example, which I hope represents the example you're
concerned about:
...
unsigned int
f (unsigned short n, unsigned int *x)
{
  unsigned int sum = 0;
  unsigned short s;

  for (s = 0; s < n ; s += 1)
    {
      sum += *x;
      x++;
    }

  return sum;
}
...

Assembly for mips:
...
	andi	$4,$4,0xffff
	move	$3,$0
	j	$L2
	move	$2,$0
$L3:
	lw	$7,0($5)
	andi	$3,$6,0xffff
	addu	$2,$2,$7
	addiu	$5,$5,4
$L2:
	bne	$3,$4,$L3
	addiu	$6,$3,1
	j	$31
	nop
...

The extend 'andi $3,$6,0xffff' is redundant, because it zero-extends a value
that is already zero-extended (so the first type of redundancy). This is not
something that value range propagation by itself can figure out though.
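
The 'andi ... 0xffff' is an AND with a constant, which extension_p in the
patch also accepts as an extension, taking the preserved size from the mask.
A stand-alone sketch of that mask arithmetic follows. It assumes a 64-bit
word, and clz64/ctz64 are local stand-ins written for this example, not
GCC's clz_hwi/ctz_hwi.

```c
#include <assert.h>
#include <stdint.h>

/* Number of leading zero bits in X; 64 for X == 0.  */
int
clz64 (uint64_t x)
{
  int n = 0;

  if (x == 0)
    return 64;
  while (!(x >> 63))
    {
      x <<= 1;
      n++;
    }
  return n;
}

/* Number of trailing zero bits in X; 64 for X == 0.  */
int
ctz64 (uint64_t x)
{
  int n = 0;

  if (x == 0)
    return 64;
  while (!(x & 1))
    {
      x >>= 1;
      n++;
    }
  return n;
}

/* Low bits of the operand that can reach the result of (and reg MASK):
   everything above the highest set mask bit is forced to zero.  */
int
and_used_size (uint64_t mask)
{
  return 64 - clz64 (mask);
}

/* Low bits of the operand that can reach the result of (ior reg MASK):
   everything above the highest clear mask bit is forced to one.  */
int
ior_used_size (uint64_t mask)
{
  return 64 - clz64 (~mask);
}

/* Number of low bits that (and reg MASK) passes through unchanged,
   i.e. the length of the run of ones starting at bit 0.  */
int
and_preserved_size (uint64_t mask)
{
  return ctz64 (~mask);
}

void
check_mask_sizes (void)
{
  assert (and_used_size (0xffff) == 16);
  assert (and_preserved_size (0xffff) == 16);
  /* A mask without the low bits set preserves nothing at bit 0.  */
  assert (and_preserved_size (0xff00) == 0);
}
```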

In the rtl representation below, vrp can figure out that reg 233 is always zero
extended, as a consequence of its definitions. But it can't say that for sure
about reg 240.

The only way to figure that out is by doing induction variable analysis. If we
assume that induction variable analysis is able to figure out the range of regs
233 and 240, then removing the redundant extension can be done by either:
- vrp. In this example, this can be limited to simply applying the value
  range of reg 240 to its use, which doesn't really require a full vrp
  solution. I think that in most test-cases like this, other redundant
  extensions will also be direct uses of the induction variables, and can be
  removed like this.
- canonical induction variable insertion (replacing the induction variable
  computations with a new canonical and simple induction variable), but that
  doesn't work for other extensions which use the induction variable.
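
The range argument can be checked by brute force. The sketch below is only an
illustration: the variable names follow the register numbers in the rtl dump
for this test-case, and zero_extend_is_noop is a made-up helper that simulates
the loop, which is not how a compiler would prove the property. Inside the
loop body s != n holds, with s counting up from 0 and n <= 0xffff, so s + 1
never exceeds 0xffff and the zero_extend never changes the value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simulate the loop of the test-case for trip count N.  Return true if
   the zero_extend in insn 55 never changed the value it extends.  */
bool
zero_extend_is_noop (uint16_t n)
{
  uint32_t reg236 = n;	/* insn 41: zero_extend of n.  */
  uint32_t reg233 = 0;	/* insn 44: s = 0.  */

  while (reg233 != reg236)	/* jump_insn 59.  */
    {
      uint32_t reg240 = reg233 + 1;		/* insn 54.  */
      uint32_t extended = reg240 & 0xffff;	/* insn 55.  */

      if (extended != reg240)
	return false;
      reg233 = extended;
    }

  return true;
}

/* The largest trip count covers every value s can take.  */
void
check_zero_extend (void)
{
  assert (zero_extend_is_noop (0));
  assert (zero_extend_is_noop (1));
  assert (zero_extend_is_noop (0xffff));
}
```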

Rtl representation for the test-case:
...
(note 46 0 40 2 [bb 2] NOTE_INSN_BASIC_BLOCK)

(insn 42 40 41 2 (set (reg/v/f:SI 238 [ x ])
        (reg:SI 5 $5 [ x ])))

(insn 41 42 43 2 (set (reg/v:SI 236 [ n+-2 ])
        (zero_extend:SI (reg:HI 4 $4 [ n ]))))

(insn 44 43 45 2 (set (reg/v:SI 233 [ s+-2 ])
        (const_int 0 [0])))

(insn 45 44 58 2 (set (reg/v:SI 232 [ sum ])
        (const_int 0 [0])))

(jump_insn 72 45 73 2 (set (pc)
        (label_ref 56))
 -> 56)

(code_label 58 45 50 3 3 "" [1 uses])

(note 50 58 51 3 [bb 3] NOTE_INSN_BASIC_BLOCK)

(insn 51 50 52 3 (set (reg:SI 239)
        (mem:SI (reg/v/f:SI 238 [ x ]) )))

(insn 52 51 53 3 (set (reg/v:SI 232 [ sum ])
        (plus:SI (reg/v:SI 232 [ sum ])
            (reg:SI 239))))

(insn 53 52 54 3 (set (reg/v/f:SI 238 [ x ])
        (plus:SI (reg/v/f:SI 238 [ x ])
            (const_int 4 [0x4]))))

(insn 54 53 55 3 (set (reg:SI 240)
        (plus:SI (reg/v:SI 233 [ s+-2 ])
            (const_int 1 [0x1]))))

(insn 55 54 56 3 (set (reg/v:SI 233 [ s+-2 ])
        (zero_extend:SI (subreg:HI (reg:SI 240) 2))))

(code_label 56 55 57 4 2 "" [1 uses])

(note 57 56 59 4 [bb 4] NOTE_INSN_BASIC_BLOCK)

(jump_insn 59 57 60 4 (set (pc)
	(if_then_else (ne (reg/v:SI 233 [ s+-2 ])
                (reg/v:SI 236 [ n+-2 ]))
            (label_ref 58)
            (pc)))
 -> 58)

(note 60 59 65 5 [bb 5] NOTE_INSN_BASIC_BLOCK)

(insn 65 60 68 5 (set (reg/i:SI 2 $2)
        (reg/v:SI 232 [ sum ])))
...

A tree-level loop index promotion pass was proposed three years back (
http://gcc.gnu.org/ml/gcc-patches/2009-04/msg01860.html ) that does optimize
this example as well. It does induction variable analysis, promotes the mode of
the induction variable and rewrites the uses, which prevents the extend from
being generated at expand.
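
The effect of such a promotion can be illustrated at the source level. The
following is a hand-written sketch, not output of the proposed pass:
f_promoted is a hypothetical rewrite of the test-case with the index widened
to unsigned int. Since s < n <= USHRT_MAX holds in the loop, the index never
wraps, so both versions compute the same sum, and the promoted one has no
narrow index left to extend.

```c
#include <assert.h>

/* The test-case as written, with a 16-bit loop index.  */
unsigned int
f (unsigned short n, unsigned int *x)
{
  unsigned int sum = 0;
  unsigned short s;

  for (s = 0; s < n; s += 1)
    {
      sum += *x;
      x++;
    }

  return sum;
}

/* The same loop with the index promoted to unsigned int.  */
unsigned int
f_promoted (unsigned short n, unsigned int *x)
{
  unsigned int sum = 0;
  unsigned int s;

  for (s = 0; s < n; s += 1)
    {
      sum += *x;
      x++;
    }

  return sum;
}

/* Both versions agree for every trip count that fits the data.  */
void
check_promotion (void)
{
  unsigned int data[4] = { 10, 20, 30, 40 };
  unsigned short n;

  for (n = 0; n <= 4; n++)
    assert (f (n, data) == f_promoted (n, data));
}
```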

In conclusion:
- to remove the redundant extension of type 1 in this test-case at rtl level,
  induction variable analysis is needed, in combination with (simple) vrp.
- pass_ee implements backward unused-bits propagation, which targets the
  redundant extension of type 2. This example, and more generally extensions
  which are redundant because they use range-restricted induction variables,
  are out of scope for backward unused-bits propagation.

Thanks,
- Tom

> kenny
> On 07/13/2012 03:53 AM, Tom de Vries wrote:
>> On 12/07/12 14:04, Kenneth Zadeck wrote:
>>> you are on the right track with the example but combine will not get
>>> this unless everything is in the same bb.
>>> the whole point of having a separate pass for doing extension
>>> elimination is that it needs to be done over the entire function.
>>>
>> There is a pass_ree, which does inter-bb combine targeted at extensions.
>> However, that pass is currently limited to combining extensions with the
>> definitions of the register it extends. The way your example sounds, you want
>> the reverse, where extensions are combined with all their uses.
>> I would say pass_ree is the natural place to add this and handle the example you
>> describe.
>>
>> Thanks,
>> - Tom
>>
>>> my example is also a little more complex because, since we are talking
>>> about induction vars, you have an initial assignment outside of a loop,
>>> and increment inside the loop and the test you describe at the bottom of
>>> the loop.
>>>
>>> I would point out that with respect to speed optimizations, the case i
>>> am describing is in fact very important because getting code out of
>>> loops is where the important gains are.   I believe that the ppc has
>>> some significant performance issues because of this kind of thing.
>>>
>>> kenny
>>>
>>>
>>> On 07/12/2012 05:20 AM, Tom de Vries wrote:
>>>> On 12/07/12 11:05, Tom de Vries wrote:
>>>>> On 12/07/12 03:39, Kenneth Zadeck wrote:
>>>>>> Tom,
>>>>>>
>>>>>> I have a problem with the approach that you have taken here.   I believe
>>>>>> that this could be a very useful addition to gcc so I am in general very
>>>>>> supportive, but i think you are missing an important case.
>>>>>>
>>>>>> My problem is that the pass does not actually look at the target and
>>>>>> make any decisions based on that target.
>>>>>>
>>>>>> for instance, we have a llp64 target.   As with many targets, the target
>>>>>> has a rich set of compare and branch instructions.  In particular, it
>>>>>> can do both 32 and 64 bit comparisons.    We see that many of the
>>>>>> upstream optimizations that take int (SI mode) index variables generate
>>>>>> extension operations before doing 64 bit compare and branch
>>>>>> instructions, even though there are 32 bit comparison and branches on
>>>>>> the machine.     There are a lot of machines that can do more than one
>>>>>> size of comparison.
>>>>>>
>>>>>> This optimization pass, as it is currently written, will not remove those
>>>>>> extensions because it believes that the length of the destination is the
>>>>>> "final answer" unless it is wrapped in an explicit truncation.
>>>>>> Instead it needs to ask the port if there is a shorter compare and
>>>>>> branch instruction that does not cost more. In that case, those
>>>>>> instructions should be rewritten to use the shorter compare and branch.
>>>>>>
>>>>>> There are many operations other than compare and branch where the pass
>>>>>> should be asking "can i shorten the target for free and therefore get
>>>>>> rid of the extension?"
>>>>> Kenneth,
>>>>>
>>>>> I'm not sure I understand the optimization you're talking about, in particular
>>>>> I'm confused about whether the branch range of the 32-bit and 64-bit comparison
>>>>> is the same.
>>>>>
>>>>> Assuming it's the same, my understanding is that you're talking about an example
>>>>> like this:
>>>>> ...
>>>>>     (insn (set (reg:DI 5)
>>>>>                (zero_extend:DI (reg:SI 4))))
>>>>>
>>>>>     (jump_insn (set (pc)
>>>>>                     (if_then_else (eq (reg:DI 5)
>>>>>                                       (const_int 0))
>>>>>                                   (label_ref:DI 62)
>>>>>                                   (pc))))
>>>>>
>>>>>     ->
>>>>>
>>>>>     (jump_insn (set (pc)
>>>>>                     (if_then_else (eq (reg:SI 4)
>>>>>                                       (const_int 0))
>>>>>                                   (label_ref:DI 62)
>>>>>                                   (pc))))
>>>>>
>>>>> ...
>>>>> I would expect combine to optimize this.
>>>>>
>>>>> In case I got the example all backwards or it is too simple, please
>>>>> provide an rtl example that illustrates the optimization.
>>>>>
>>>>> Thanks,
>>>>> - Tom
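
The equivalence behind the rewrite in the RTL example above can be checked in plain C. This is only a sketch: `eq_zero_wide`, `eq_zero_narrow` and `check_eq_zero` are hypothetical names, not GCC functions, and only the equality-with-zero case from the example is modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the DImode compare: zero-extend, then test.  */
int
eq_zero_wide (uint32_t x)
{
  uint64_t wide = (uint64_t) x;   /* plays the role of zero_extend:DI */
  return wide == 0;
}

/* Hypothetical model of the SImode compare: test the narrow value.  */
int
eq_zero_narrow (uint32_t x)
{
  return x == 0;
}

/* Both forms agree on a few boundary values, which is the property the
   rewritten jump_insn relies on.  */
void
check_eq_zero (void)
{
  uint32_t samples[] = { 0u, 1u, 0x7fffffffu, 0x80000000u, 0xffffffffu };
  unsigned int i;

  for (i = 0; i < sizeof samples / sizeof samples[0]; i++)
    assert (eq_zero_wide (samples[i]) == eq_zero_narrow (samples[i]));
}
```

For ordered comparisons the signedness of the extension and of the comparison would have to match, which this sketch does not model.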
>>>>>
>>>>>
>>>>>>    right shifts, rotates, and stores are not in
>>>>>> this class, but left shifts are as are all comparisons, compare and
>>>>>> branches, conditional moves.   There may even be machines that have this
>>>>>> for divide, but i do not know of any off the top of my head.
>>>>>>
>>>>>> What i am suggesting moves this pass into the target-specific set of
>>>>>> optimizations rather than the target-independent set, but given where this
>>>>>> pass is to be put, this is completely appropriate.    Any dest instruction
>>>>>> where all of the operands have been extended should be checked to see if
>>>>>> it was really necessary to use the longer form before doing the
>>>>>> propagation pass.
>>>>>>
>>>>>> kenny
>>>>>>
>>>>>>
>>>>>> On 07/11/2012 06:30 AM, Tom de Vries wrote:
>>>>>>> On 13/11/10 10:50, Eric Botcazou wrote:
>>>>>>>>> I profiled the pass on spec2000:
>>>>>>>>>
>>>>>>>>>                       -mabi=32     -mabi=64
>>>>>>>>> ee-pass (usr time):     0.70         1.16
>>>>>>>>> total   (usr time):   919.30       879.26
>>>>>>>>> ee-pass        (%):     0.08         0.13
>>>>>>>>>
>>>>>>>>> The pass takes 0.13% or less of the total usr runtime.
>>>>>>>> For how many hits?  What are the numbers with --param ee-max-propagate=0?
>>>>>>>>
>>>>>>>>> Is it necessary to improve the runtime of this pass?
>>>>>>>> I've already given my opinion about the implementation.  The other passes in
>>>>>>>> the compiler try hard not to rescan everything when a single bit changes; as
>>>>>>>> currently written, yours doesn't.
>>>>>>>>
>>>>>>> Eric,
>>>>>>>
>>>>>>> I've done the following:
>>>>>>> - refactored the pass such that it now scans at most twice over all
>>>>>>>     instructions.
>>>>>>> - updated the patch to be applicable to current trunk
>>>>>>> - updated the motivating example to a more applicable one (as discussed in
>>>>>>>     this thread), and added that one as test-case.
>>>>>>> - added a part in the header comment illustrating the working of the pass
>>>>>>>     on the motivating example.
>>>>>>>
>>>>>>> bootstrapped and reg-tested on x86_64 and i686.
>>>>>>>
>>>>>>> built and reg-tested on mips, mips64, and arm.
>>>>>>>
>>>>>>> OK for trunk?
>>>>>>>
>>>>>>> Thanks,
>>>>>>> - Tom
>>>>>>>
>>>>>>> 2012-07-10  Tom de Vries  <tom@codesourcery.com>
>>>>>>>
>>>>>>> 	* ee.c: New file.
>>>>>>> 	* tree-pass.h (pass_ee): Declare.
>>>>>>> 	* opts.c (default_options_table): Set flag_ee at -O2.
>>>>>>> 	* timevar.def (TV_EE): New timevar.
>>>>>>> 	* common.opt (fextension-elimination): New option.
>>>>>>> 	* Makefile.in (ee.o): New rule.
>>>>>>> 	* passes.c (pass_ee): Add it.
>>>>>>>
>>>>>>> 	* gcc.dg/extend-1.c: New test.
>>>>>>> 	* gcc.dg/extend-2.c: Same.
>>>>>>> 	* gcc.dg/extend-2-64.c: Same.
>>>>>>> 	* gcc.dg/extend-3.c: Same.
>>>>>>> 	* gcc.dg/extend-4.c: Same.
>>>>>>> 	* gcc.dg/extend-5.c: Same.
>>>>>>> 	* gcc.target/mips/octeon-bbit-2.c: Make test more robust.
>>>>>>> Index: gcc/tree-pass.h
>>>>>>> ===================================================================
>>>>>>> --- gcc/tree-pass.h (revision 189409)
>>>>>>> +++ gcc/tree-pass.h (working copy)
>>>>>>> @@ -483,6 +483,7 @@ extern struct gimple_opt_pass pass_fixup
>>>>>>>
>>>>>>> extern struct rtl_opt_pass pass_expand;
>>>>>>> extern struct rtl_opt_pass pass_instantiate_virtual_regs;
>>>>>>> +extern struct rtl_opt_pass pass_ee;
>>>>>>> extern struct rtl_opt_pass pass_rtl_fwprop;
>>>>>>> extern struct rtl_opt_pass pass_rtl_fwprop_addr;
>>>>>>> extern struct rtl_opt_pass pass_jump;
>>>>>>> Index: gcc/testsuite/gcc.target/mips/octeon-bbit-2.c
>>>>>>> ===================================================================
>>>>>>> --- gcc/testsuite/gcc.target/mips/octeon-bbit-2.c (revision 189409)
>>>>>>> +++ gcc/testsuite/gcc.target/mips/octeon-bbit-2.c (working copy)
>>>>>>> @@ -5,19 +5,19 @@
>>>>>>> /* { dg-final { scan-assembler "\tbnel\t" } } */
>>>>>>> /* { dg-final { scan-assembler-not "\tbne\t" } } */
>>>>>>>
>>>>>>> -NOMIPS16 int
>>>>>>> -f (int n, int i)
>>>>>>> +NOMIPS16 long int
>>>>>>> +f (long int n, long int i)
>>>>>>> {
>>>>>>> -  int s = 0;
>>>>>>> +  long int s = 0;
>>>>>>>      for (; i & 1; i++)
>>>>>>>        s += i;
>>>>>>>      return s;
>>>>>>> }
>>>>>>>
>>>>>>> -NOMIPS16 int
>>>>>>> -g (int n, int i)
>>>>>>> +NOMIPS16 long int
>>>>>>> +g (long int n, long int i)
>>>>>>> {
>>>>>>> -  int s = 0;
>>>>>>> +  long int s = 0;
>>>>>>>      for (i = 0; i < n; i++)
>>>>>>>        s += i;
>>>>>>>      return s;
>>>>>>> Index: gcc/testsuite/gcc.dg/extend-4.c
>>>>>>> ===================================================================
>>>>>>> --- /dev/null (new file)
>>>>>>> +++ gcc/testsuite/gcc.dg/extend-4.c (revision 0)
>>>>>>> @@ -0,0 +1,16 @@
>>>>>>> +/* { dg-do compile } */
>>>>>>> +/* { dg-options "-O2 -fdump-rtl-ee" } */
>>>>>>> +
>>>>>>> +unsigned char f(unsigned int a, int c)
>>>>>>> +{
>>>>>>> +  unsigned int b = a;
>>>>>>> +  if (c)
>>>>>>> +    b = a & 0x10ff;
>>>>>>> +  return b;
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* { dg-final { scan-rtl-dump-times "_extend:" 1 "ee" { target mips*-*-* } } } */
>>>>>>> +/* { dg-final { scan-rtl-dump-times "and:" 0 "ee" { target mips*-*-* } } } */
>>>>>>> +/* { dg-final { scan-rtl-dump "redundant extension \[0-9\]+ removed" "ee" { target mips*-*-* } } } */
>>>>>>> +/* { dg-final { cleanup-rtl-dump "ee" } } */
>>>>>>> +
>>>>>>> Index: gcc/testsuite/gcc.dg/extend-1.c
>>>>>>> ===================================================================
>>>>>>> --- /dev/null (new file)
>>>>>>> +++ gcc/testsuite/gcc.dg/extend-1.c (revision 0)
>>>>>>> @@ -0,0 +1,13 @@
>>>>>>> +/* { dg-do compile } */
>>>>>>> +/* { dg-options "-O2 -fdump-rtl-ee" } */
>>>>>>> +
>>>>>>> +void f(unsigned char * p, short s, int c, int *z)
>>>>>>> +{
>>>>>>> +  if (c)
>>>>>>> +    *z = 0;
>>>>>>> +  *p ^= (unsigned char)s;
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* { dg-final { scan-rtl-dump-times "sign_extend:" 0 "ee" { target mips*-*-* } } } */
>>>>>>> +/* { dg-final { scan-rtl-dump-times "redundant extension \[0-9\]+ replaced" 1 "ee" { target mips*-*-* } } } */
>>>>>>> +/* { dg-final { cleanup-rtl-dump "ee" } } */
>>>>>>> Index: gcc/testsuite/gcc.dg/extend-5.c
>>>>>>> ===================================================================
>>>>>>> --- /dev/null (new file)
>>>>>>> +++ gcc/testsuite/gcc.dg/extend-5.c (revision 0)
>>>>>>> @@ -0,0 +1,13 @@
>>>>>>> +/* { dg-do compile } */
>>>>>>> +/* { dg-options "-O2 -fdump-rtl-ee" } */
>>>>>>> +
>>>>>>> +void f (short d[2][2])
>>>>>>> +{
>>>>>>> +  int d0 = d[0][0] + d[0][1];
>>>>>>> +  int d1 = d[1][0] + d[1][1];
>>>>>>> +  d[0][0] = d0 + d1;
>>>>>>> +  d[0][1] = d0 - d1;
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* { dg-final { scan-rtl-dump-times "redundant extension \[0-9\]+ replaced" 2 "ee" { target mips*-*-* } } } */
>>>>>>> +/* { dg-final { cleanup-rtl-dump "ee" } } */
>>>>>>> Index: gcc/testsuite/gcc.dg/extend-2.c
>>>>>>> ===================================================================
>>>>>>> --- /dev/null (new file)
>>>>>>> +++ gcc/testsuite/gcc.dg/extend-2.c (revision 0)
>>>>>>> @@ -0,0 +1,20 @@
>>>>>>> +/* { dg-do compile } */
>>>>>>> +/* { dg-options "-O2 -fdump-rtl-ee" } */
>>>>>>> +/* { dg-require-effective-target ilp32 } */
>>>>>>> +
>>>>>>> +void f(unsigned char * p, short *s, int c)
>>>>>>> +{
>>>>>>> +  short or = 0;
>>>>>>> +  while (c)
>>>>>>> +    {
>>>>>>> +      or = or | s[c];
>>>>>>> +      c --;
>>>>>>> +    }
>>>>>>> +  *p = (unsigned char)or;
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* { dg-final { scan-rtl-dump-times "zero_extend" 0 "ee" { target mips*-*-* } } } */
>>>>>>> +/* { dg-final { scan-rtl-dump-times "sign_extend" 0 "ee" { target mips*-*-* } } } */
>>>>>>> +/* { dg-final { scan-rtl-dump-times "redundant extension \[0-9\]+ replaced" 2 "ee" { target mips*-*-* } } } */
>>>>>>> +/* { dg-final { cleanup-rtl-dump "ee" } } */
>>>>>>> +
>>>>>>> Index: gcc/testsuite/gcc.dg/extend-2-64.c
>>>>>>> ===================================================================
>>>>>>> --- /dev/null (new file)
>>>>>>> +++ gcc/testsuite/gcc.dg/extend-2-64.c (revision 0)
>>>>>>> @@ -0,0 +1,20 @@
>>>>>>> +/* { dg-do compile } */
>>>>>>> +/* { dg-options "-O2 -fdump-rtl-ee -mabi=64" } */
>>>>>>> +/* { dg-require-effective-target mips64 } */
>>>>>>> +
>>>>>>> +void f(unsigned char * p, short *s, int c)
>>>>>>> +{
>>>>>>> +  short or = 0;
>>>>>>> +  while (c)
>>>>>>> +    {
>>>>>>> +      or = or | s[c];
>>>>>>> +      c --;
>>>>>>> +    }
>>>>>>> +  *p = (unsigned char)or;
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* { dg-final { scan-rtl-dump-times "zero_extend:" 0 "ee" { target mips*-*-* } } } */
>>>>>>> +/* { dg-final { scan-rtl-dump-times "sign_extend:" 1 "ee" { target mips*-*-* } } } */
>>>>>>> +/* { dg-final { scan-rtl-dump-times "redundant extension \[0-9\]+ replaced" 2 "ee" { target mips*-*-* } } } */
>>>>>>> +/* { dg-final { cleanup-rtl-dump "ee" } } */
>>>>>>> +
>>>>>>> Index: gcc/testsuite/gcc.dg/extend-3.c
>>>>>>> ===================================================================
>>>>>>> --- /dev/null (new file)
>>>>>>> +++ gcc/testsuite/gcc.dg/extend-3.c (revision 0)
>>>>>>> @@ -0,0 +1,13 @@
>>>>>>> +/* { dg-do compile } */
>>>>>>> +/* { dg-options "-O2 -fdump-rtl-ee -mabi=64" } */
>>>>>>> +/* { dg-require-effective-target mips64 } */
>>>>>>> +
>>>>>>> +unsigned int f(unsigned char byte)
>>>>>>> +{
>>>>>>> +  return byte << 25;
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* { dg-final { scan-rtl-dump-times "zero_extend:" 0 "ee" { target mips*-*-* } } } */
>>>>>>> +/* { dg-final { scan-rtl-dump "redundant extension \[0-9\]+ replaced" "ee" } } */
>>>>>>> +/* { dg-final { cleanup-rtl-dump "ee" } } */
>>>>>>> +
>>>>>>> Index: gcc/opts.c
>>>>>>> ===================================================================
>>>>>>> --- gcc/opts.c (revision 189409)
>>>>>>> +++ gcc/opts.c (working copy)
>>>>>>> @@ -490,6 +490,7 @@ static const struct default_options defa
>>>>>>>        { OPT_LEVELS_2_PLUS, OPT_ftree_tail_merge, NULL, 1 },
>>>>>>>        { OPT_LEVELS_2_PLUS_SPEED_ONLY, OPT_foptimize_strlen, NULL, 1 },
>>>>>>>        { OPT_LEVELS_2_PLUS, OPT_fhoist_adjacent_loads, NULL, 1 },
>>>>>>> +    { OPT_LEVELS_2_PLUS, OPT_fextension_elimination, NULL, 1 },
>>>>>>>
>>>>>>>        /* -O3 optimizations.  */
>>>>>>>        { OPT_LEVELS_3_PLUS, OPT_ftree_loop_distribute_patterns, NULL, 1 },
>>>>>>> Index: gcc/timevar.def
>>>>>>> ===================================================================
>>>>>>> --- gcc/timevar.def (revision 189409)
>>>>>>> +++ gcc/timevar.def (working copy)
>>>>>>> @@ -201,6 +201,7 @@ DEFTIMEVAR (TV_POST_EXPAND	     , "post
>>>>>>> DEFTIMEVAR (TV_VARCONST              , "varconst")
>>>>>>> DEFTIMEVAR (TV_LOWER_SUBREG	     , "lower subreg")
>>>>>>> DEFTIMEVAR (TV_JUMP                  , "jump")
>>>>>>> +DEFTIMEVAR (TV_EE                    , "extension elimination")
>>>>>>> DEFTIMEVAR (TV_FWPROP                , "forward prop")
>>>>>>> DEFTIMEVAR (TV_CSE                   , "CSE")
>>>>>>> DEFTIMEVAR (TV_DCE                   , "dead code elimination")
>>>>>>> Index: gcc/ee.c
>>>>>>> ===================================================================
>>>>>>> --- /dev/null (new file)
>>>>>>> +++ gcc/ee.c (revision 0)
>>>>>>> @@ -0,0 +1,1190 @@
>>>>>>> +/* Redundant extension elimination.
>>>>>>> +   Copyright (C) 2010, 2011, 2012 Free Software Foundation, Inc.
>>>>>>> +   Contributed by Tom de Vries (tom@codesourcery.com)
>>>>>>> +
>>>>>>> +This file is part of GCC.
>>>>>>> +
>>>>>>> +GCC is free software; you can redistribute it and/or modify it under
>>>>>>> +the terms of the GNU General Public License as published by the Free
>>>>>>> +Software Foundation; either version 3, or (at your option) any later
>>>>>>> +version.
>>>>>>> +
>>>>>>> +GCC is distributed in the hope that it will be useful, but WITHOUT ANY
>>>>>>> +WARRANTY; without even the implied warranty of MERCHANTABILITY or
>>>>>>> +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
>>>>>>> +for more details.
>>>>>>> +
>>>>>>> +You should have received a copy of the GNU General Public License
>>>>>>> +along with GCC; see the file COPYING3.  If not see
>>>>>>> +<http://www.gnu.org/licenses/>.  */
>>>>>>> +
>>>>>>> +/*
>>>>>>> +
>>>>>>> +  MOTIVATING EXAMPLE
>>>>>>> +
>>>>>>> +  The motivating example for this pass is the example from PR 40893:
>>>>>>> +
>>>>>>> +    void f (short d[2][2])
>>>>>>> +    {
>>>>>>> +      int d0 = d[0][0] + d[0][1];
>>>>>>> +      int d1 = d[1][0] + d[1][1];
>>>>>>> +      d[0][0] = d0 + d1;
>>>>>>> +      d[0][1] = d0 - d1;
>>>>>>> +    }
>>>>>>> +
>>>>>>> +  For MIPS, compilation results in the following insns.
>>>>>>> +
>>>>>>> +    (set (reg:SI 204)
>>>>>>> +         (zero_extend:SI (subreg:HI (reg:SI 213) 2)))
>>>>>>> +
>>>>>>> +    (set (reg:SI 205)
>>>>>>> +         (zero_extend:SI (subreg:HI (reg:SI 216 [ d1 ]) 2)))
>>>>>>> +
>>>>>>> +    (set (reg:SI 217)
>>>>>>> +         (plus:SI (reg:SI 205)
>>>>>>> +                  (reg:SI 204)))
>>>>>>> +
>>>>>>> +    (set (reg:SI 218)
>>>>>>> +         (minus:SI (reg:SI 204)
>>>>>>> +                   (reg:SI 205)))
>>>>>>> +
>>>>>>> +    (set (mem:HI (reg/v/f:SI 210))
>>>>>>> +         (subreg:HI (reg:SI 217) 2))
>>>>>>> +
>>>>>>> +    (set (mem:HI (plus:SI (reg/v/f:SI 210)
>>>>>>> +                 (const_int 2 [0x2])))
>>>>>>> +         (subreg:HI (reg:SI 218) 2))
>>>>>>> +
>>>>>>> +
>>>>>>> +  The two stores only use the lower halves of pseudos 217 and 218, and are
>>>>>>> +  the only uses of those pseudos.  And the plus and minus operators belong to
>>>>>>> +  the class of operators where a bit in the result is only influenced by
>>>>>>> +  same-or-less significant bits in the operands, so the plus and minus insns
>>>>>>> +  only use the lower halves of pseudos 204 and 205.  Those are also the only
>>>>>>> +  uses of pseudos 204 and 205, so the zero_extends are redundant.
>>>>>>> +
>>>>>>> +
>>>>>>> +  INTENDED EFFECT
>>>>>>> +
>>>>>>> +  This pass works by removing sign/zero-extensions, or replacing them with
>>>>>>> +  regcopies.  The idea there is that the regcopy might be eliminated by a later
>>>>>>> +  pass.  In case the regcopy cannot be eliminated, it might at least be cheaper
>>>>>>> +  than the extension.
>>>>>>> +
>>>>>>> +
>>>>>>> +  IMPLEMENTATION
>>>>>>> +
>>>>>>> +  The pass scans at most two times over all instructions.
>>>>>>> +
>>>>>>> +  The first scan collects all extensions.  If there are no extensions, we're
>>>>>>> +  done.
>>>>>>> +
>>>>>>> +  The second scan registers all uses of a reg in the biggest_use array.
>>>>>>> +  Additionally, it registers how the use size of a pseudo is propagated to the
>>>>>>> +  operands of the insns defining the pseudo.
>>>>>>> +
>>>>>>> +  The biggest_use array now contains the size in bits of the biggest use
>>>>>>> +  of each reg, which allows us to find redundant extensions.
>>>>>>> +
>>>>>>> +  If there are still non-redundant extensions left, we use the propagation
>>>>>>> +  information in an iterative fashion to improve the biggest_use array, after
>>>>>>> +  which we may find more redundant extensions.
>>>>>>> +
>>>>>>> +  Finally, redundant extensions are deleted or replaced.
>>>>>>> +
>>>>>>> +  In case the src and dest reg of the replacement are not of the same size,
>>>>>>> +  we do not replace with a normal regcopy, but with a truncate or with the copy
>>>>>>> +  of a paradoxical subreg instead.
>>>>>>> +
>>>>>>> +
>>>>>>> +  ILLUSTRATION OF PASS
>>>>>>> +
>>>>>>> +  The dump of the pass shows us how the pass works on the motivating example.
>>>>>>> +
>>>>>>> +  We find the 2 extensions:
>>>>>>> +    found extension with preserved size 16 defining reg 204
>>>>>>> +    found extension with preserved size 16 defining reg 205
>>>>>>> +
>>>>>>> +  We calculate the biggest use of each register:
>>>>>>> +    biggest_use
>>>>>>> +    reg 204: size 32
>>>>>>> +    reg 205: size 32
>>>>>>> +    reg 217: size 16
>>>>>>> +    reg 218: size 16
>>>>>>> +
>>>>>>> +  We propagate the biggest uses where possible:
>>>>>>> +    propagations
>>>>>>> +    205: 32 -> 16
>>>>>>> +    204: 32 -> 16
>>>>>>> +    214: 32 -> 16
>>>>>>> +    215: 32 -> 16
>>>>>>> +
>>>>>>> +  We conclude that the extensions are redundant:
>>>>>>> +    found redundant extension with preserved size 16 defining reg 205
>>>>>>> +    found redundant extension with preserved size 16 defining reg 204
>>>>>>> +
>>>>>>> +  And we replace them with regcopies:
>>>>>>> +    (set (reg:SI 204)
>>>>>>> +        (reg:SI 213))
>>>>>>> +
>>>>>>> +    (set (reg:SI 205)
>>>>>>> +        (reg:SI 216))
>>>>>>> +
>>>>>>> +
>>>>>>> +  LIMITATIONS
>>>>>>> +
>>>>>>> +  The scope of the analysis is limited to an extension and its uses.  The other
>>>>>>> +  type of analysis (related to the defs of the operand of an extension) is not
>>>>>>> +  done.
>>>>>>> +
>>>>>>> +  Furthermore, we do the analysis of biggest use per reg.  So when determining
>>>>>>> +  whether an extension is redundant, we take all uses of a dest reg into
>>>>>>> +  account, also the ones that are not uses of the extension.
>>>>>>> +  The consideration is that using use-def chains will give a more precise
>>>>>>> +  analysis, but is much more expensive in terms of runtime.  */
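
The iterative step described in the IMPLEMENTATION section can be modelled outside the compiler. The following is a rough sketch, not the code in ee.c: it uses a plain fixpoint loop instead of the pass's work list, `struct use_model`, `propagate_biggest_use`, `extension_redundant_p` and `check_motivating_example` are hypothetical names, and the redundancy test (biggest use no larger than the preserved size) is an assumption drawn from the dump shown in the comment.

```c
#include <assert.h>

/* One use of register SRC.  If DEST >= 0 the use sits in a "restricted"
   insn (plus, minus, and, ...) that defines DEST, so no more bits of SRC
   matter than are used of DEST.  DEST < 0 marks a final use such as a
   store.  SIZE is the width of the use as written, in bits.  */
struct use_model
{
  int src;
  int dest;
  int size;
};

/* Shrink BIGGEST_USE[0..N_REGS) until a fixpoint is reached.  Entries only
   ever decrease, so the loop terminates.  Returns the number of sweeps
   that changed at least one entry.  */
int
propagate_biggest_use (const struct use_model *uses, int n_uses,
                       int *biggest_use, int n_regs)
{
  int sweeps = 0;
  int changed = 1;

  while (changed)
    {
      int r, u;

      changed = 0;
      for (r = 0; r < n_regs; r++)
        {
          int biggest = 0;

          for (u = 0; u < n_uses; u++)
            {
              int size;

              if (uses[u].src != r)
                continue;
              size = uses[u].size;
              /* Cap the use by what is needed of the defined register.  */
              if (uses[u].dest >= 0 && biggest_use[uses[u].dest] < size)
                size = biggest_use[uses[u].dest];
              if (size > biggest)
                biggest = size;
            }
          if (biggest < biggest_use[r])
            {
              biggest_use[r] = biggest;
              changed = 1;
            }
        }
      if (changed)
        sweeps++;
    }
  return sweeps;
}

/* Assumed redundancy test: the extension keeps PRESERVED_SIZE low bits
   intact, and nothing reads more than BIGGEST bits of its result.  */
int
extension_redundant_p (int preserved_size, int biggest)
{
  return biggest <= preserved_size;
}

/* The motivating example, with regs 204, 205, 217, 218 mapped to 0..3.  */
void
check_motivating_example (void)
{
  static const struct use_model uses[] = {
    { 0, 2, 32 }, { 1, 2, 32 },   /* 217 = 205 + 204 */
    { 0, 3, 32 }, { 1, 3, 32 },   /* 218 = 204 - 205 */
    { 2, -1, 16 },                /* store of (subreg:HI 217) */
    { 3, -1, 16 }                 /* store of (subreg:HI 218) */
  };
  int biggest_use[4] = { 32, 32, 16, 16 };

  propagate_biggest_use (uses, 6, biggest_use, 4);
  assert (biggest_use[0] == 16 && biggest_use[1] == 16);
  assert (extension_redundant_p (16, biggest_use[0]));
  assert (extension_redundant_p (16, biggest_use[1]));
}
```

In this model regs 204 and 205 drop from 32 to 16 used bits, matching the propagations listed in the dump for those two registers.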
>>>>>>> +
>>>>>>> +#include "config.h"
>>>>>>> +#include "system.h"
>>>>>>> +#include "coretypes.h"
>>>>>>> +#include "tm.h"
>>>>>>> +#include "rtl.h"
>>>>>>> +#include "tree.h"
>>>>>>> +#include "tm_p.h"
>>>>>>> +#include "flags.h"
>>>>>>> +#include "regs.h"
>>>>>>> +#include "hard-reg-set.h"
>>>>>>> +#include "basic-block.h"
>>>>>>> +#include "insn-config.h"
>>>>>>> +#include "function.h"
>>>>>>> +#include "expr.h"
>>>>>>> +#include "insn-attr.h"
>>>>>>> +#include "recog.h"
>>>>>>> +#include "toplev.h"
>>>>>>> +#include "target.h"
>>>>>>> +#include "timevar.h"
>>>>>>> +#include "optabs.h"
>>>>>>> +#include "insn-codes.h"
>>>>>>> +#include "rtlhooks-def.h"
>>>>>>> +#include "output.h"
>>>>>>> +#include "params.h"
>>>>>>> +#include "timevar.h"
>>>>>>> +#include "tree-pass.h"
>>>>>>> +#include "cgraph.h"
>>>>>>> +#include "vec.h"
>>>>>>> +
>>>>>>> +#define SKIP_REG (-1)
>>>>>>> +#define NONE (-1)
>>>>>>> +
>>>>>>> +/* Number of registers at start of pass.  */
>>>>>>> +
>>>>>>> +static int n_regs;
>>>>>>> +
>>>>>>> +/* Array to register the biggest use of a reg, in bits.  */
>>>>>>> +
>>>>>>> +static int *biggest_use;
>>>>>>> +
>>>>>>> +/* Array to register the promoted subregs.  */
>>>>>>> +
>>>>>>> +static VEC (rtx,heap) **promoted_subreg;
>>>>>>> +
>>>>>>> +/* Array to register for a reg what the last propagated size is.  */
>>>>>>> +
>>>>>>> +static int *propagated_size;
>>>>>>> +
>>>>>>> +typedef struct use
>>>>>>> +{
>>>>>>> +  int regno;
>>>>>>> +  int size;
>>>>>>> +  int offset;
>>>>>>> +  rtx *use;
>>>>>>> +} use_type;
>>>>>>> +
>>>>>>> +DEF_VEC_O(use_type);
>>>>>>> +DEF_VEC_ALLOC_O(use_type,heap);
>>>>>>> +
>>>>>>> +/* Vector to register the uses.  */
>>>>>>> +
>>>>>>> +static VEC (use_type,heap) **uses;
>>>>>>> +
>>>>>>> +typedef struct prop
>>>>>>> +{
>>>>>>> +  rtx set;
>>>>>>> +  int uses_regno;
>>>>>>> +  int uses_index;
>>>>>>> +} prop_type;
>>>>>>> +
>>>>>>> +DEF_VEC_O(prop_type);
>>>>>>> +DEF_VEC_ALLOC_O(prop_type,heap);
>>>>>>> +
>>>>>>> +/* Vector to register the propagations.  */
>>>>>>> +
>>>>>>> +static VEC (prop_type,heap) **props;
>>>>>>> +
>>>>>>> +/* Work list for propagation.  */
>>>>>>> +
>>>>>>> +static VEC (int,heap) *wl;
>>>>>>> +
>>>>>>> +/* Array to register what regs are in the work list.  */
>>>>>>> +
>>>>>>> +static bool *in_wl;
>>>>>>> +
>>>>>>> +/* Vector that contains the extensions in the function.  */
>>>>>>> +
>>>>>>> +static VEC (rtx,heap) *extensions;
>>>>>>> +
>>>>>>> +/* Vector that contains the extensions in the function that are going to be
>>>>>>> +   removed or replaced.  */
>>>>>>> +
>>>>>>> +static VEC (rtx,heap) *redundant_extensions;
>>>>>>> +
>>>>>>> +/* Forward declaration.  */
>>>>>>> +
>>>>>>> +static void note_use (rtx *x, void *data);
>>>>>>> +static bool skip_reg_p (int regno);
>>>>>>> +static void register_prop (rtx set, use_type *use);
>>>>>>> +
>>>>>>> +/* Check whether SUBREG is a promoted subreg.  */
>>>>>>> +
>>>>>>> +static bool
>>>>>>> +promoted_subreg_p (rtx subreg)
>>>>>>> +{
>>>>>>> +  return (GET_CODE (subreg) == SUBREG
>>>>>>> +	  && SUBREG_PROMOTED_VAR_P (subreg));
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* Check whether SUBREG is a promoted subreg for which we cannot reset the
>>>>>>> +   promotion.  */
>>>>>>> +
>>>>>>> +static bool
>>>>>>> +fixed_promoted_subreg_p (rtx subreg)
>>>>>>> +{
>>>>>>> +  int mre;
>>>>>>> +
>>>>>>> +  if (!promoted_subreg_p (subreg))
>>>>>>> +    return false;
>>>>>>> +
>>>>>>> +  mre = targetm.mode_rep_extended (GET_MODE (subreg),
>>>>>>> +				   GET_MODE (SUBREG_REG (subreg)));
>>>>>>> +  return mre != UNKNOWN;
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* Attempt to return the size, reg number and offset of USE in SIZE, REGNO and
>>>>>>> +   OFFSET.  Return true if successful.  */
>>>>>>> +
>>>>>>> +static bool
>>>>>>> +reg_use_p (rtx use, int *size, unsigned int *regno, int *offset)
>>>>>>> +{
>>>>>>> +  rtx reg;
>>>>>>> +
>>>>>>> +  if (REG_P (use))
>>>>>>> +    {
>>>>>>> +      *regno = REGNO (use);
>>>>>>> +      *offset = 0;
>>>>>>> +      *size = GET_MODE_BITSIZE (GET_MODE (use));
>>>>>>> +      return true;
>>>>>>> +    }
>>>>>>> +  else if (GET_CODE (use) == SUBREG)
>>>>>>> +    {
>>>>>>> +      reg = SUBREG_REG (use);
>>>>>>> +
>>>>>>> +      if (!REG_P (reg))
>>>>>>> +	return false;
>>>>>>> +
>>>>>>> +      *regno = REGNO (reg);
>>>>>>> +
>>>>>>> +      if (paradoxical_subreg_p (use) || fixed_promoted_subreg_p (use))
>>>>>>> +	{
>>>>>>> +	  *offset = 0;
>>>>>>> +	  *size = GET_MODE_BITSIZE (GET_MODE (reg));
>>>>>>> +	}
>>>>>>> +      else
>>>>>>> +	{
>>>>>>> +	  *offset = subreg_lsb (use);
>>>>>>> +	  *size = *offset + GET_MODE_BITSIZE (GET_MODE (use));
>>>>>>> +	}
>>>>>>> +
>>>>>>> +      return true;
>>>>>>> +    }
>>>>>>> +
>>>>>>> +  return false;
>>>>>>> +}
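
To make the non-paradoxical subreg case of reg_use_p concrete: the size recorded for the inner register is the position of the subreg's least significant bit plus the subreg's width. The sketch below uses a simplified stand-in for `subreg_lsb` that assumes byte and word endianness agree. `model_subreg_lsb`, `model_subreg_use_size` and `check_subreg_use_size` are hypothetical names for illustration only.

```c
#include <assert.h>

/* Simplified stand-in for subreg_lsb: bit position, within the inner
   register, of the least significant bit of the subreg.  Assumes byte
   and word endianness agree.  */
int
model_subreg_lsb (int big_endian, int inner_bytes, int outer_bytes,
                  int byte_offset)
{
  if (big_endian)
    return (inner_bytes - byte_offset - outer_bytes) * 8;
  return byte_offset * 8;
}

/* Use size as computed in the else branch of reg_use_p: offset of the
   lsb plus the width of the subreg, in bits.  */
int
model_subreg_use_size (int big_endian, int inner_bytes, int outer_bytes,
                       int byte_offset)
{
  return model_subreg_lsb (big_endian, inner_bytes, outer_bytes, byte_offset)
         + outer_bytes * 8;
}

void
check_subreg_use_size (void)
{
  /* (subreg:HI (reg:SI) 2) on a big-endian target: the low half.  */
  assert (model_subreg_use_size (1, 4, 2, 2) == 16);
  /* (subreg:QI (reg:SI) 3) on a big-endian target: the low byte.  */
  assert (model_subreg_use_size (1, 4, 1, 3) == 8);
  /* (subreg:HI (reg:SI) 0) on a big-endian target: the high half, so
     all 32 bits count as used.  */
  assert (model_subreg_use_size (1, 4, 2, 0) == 32);
}
```

A subreg of the high half therefore counts as a full-width use, which is conservative but keeps the per-register bookkeeping to a single number.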
>>>>>>> +
>>>>>>> +/* Create a new empty entry in the uses[REGNO] vector.  */
>>>>>>> +
>>>>>>> +static use_type *
>>>>>>> +new_use (unsigned int regno)
>>>>>>> +{
>>>>>>> +  if (uses[regno] == NULL)
>>>>>>> +    uses[regno] = VEC_alloc (use_type, heap, 4);
>>>>>>> +
>>>>>>> +  VEC_safe_push (use_type, heap, uses[regno], NULL);
>>>>>>> +
>>>>>>> +  return VEC_last (use_type, uses[regno]);
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* Register a USE of reg REGNO with SIZE and OFFSET.  */
>>>>>>> +
>>>>>>> +static use_type *
>>>>>>> +register_use (int size, unsigned int regno, int offset, rtx *use)
>>>>>>> +{
>>>>>>> +  int *current;
>>>>>>> +  use_type *p;
>>>>>>> +
>>>>>>> +  gcc_assert (size >= 0);
>>>>>>> +  gcc_assert (regno < (unsigned int)n_regs);
>>>>>>> +
>>>>>>> +  if (skip_reg_p (regno))
>>>>>>> +    return NULL;
>>>>>>> +
>>>>>>> +  p = new_use (regno);
>>>>>>> +  p->regno = regno;
>>>>>>> +  p->size = size;
>>>>>>> +  p->offset = offset;
>>>>>>> +  p->use = use;
>>>>>>> +
>>>>>>> +  /* Update the biggest use.  */
>>>>>>> +  current = &biggest_use[regno];
>>>>>>> +  *current = MAX (*current, size);
>>>>>>> +
>>>>>>> +  return p;
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* Handle embedded uses in USE, which is a part of PATTERN.  */
>>>>>>> +
>>>>>>> +static void
>>>>>>> +note_embedded_uses (rtx use, rtx pattern)
>>>>>>> +{
>>>>>>> +  const char *format_ptr;
>>>>>>> +  int i, j;
>>>>>>> +
>>>>>>> +  format_ptr = GET_RTX_FORMAT (GET_CODE (use));
>>>>>>> +  for (i = 0; i < GET_RTX_LENGTH (GET_CODE (use)); i++)
>>>>>>> +    if (format_ptr[i] == 'e')
>>>>>>> +      note_use (&XEXP (use, i), pattern);
>>>>>>> +    else if (format_ptr[i] == 'E')
>>>>>>> +      for (j = 0; j < XVECLEN (use, i); j++)
>>>>>>> +	note_use (&XVECEXP (use, i, j), pattern);
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* Get the set in PATTERN that has USE as its src operand.  */
>>>>>>> +
>>>>>>> +static rtx
>>>>>>> +get_set (rtx use, rtx pattern)
>>>>>>> +{
>>>>>>> +  rtx sub;
>>>>>>> +  int i;
>>>>>>> +
>>>>>>> +  if (GET_CODE (pattern) == SET && SET_SRC (pattern) == use)
>>>>>>> +    return pattern;
>>>>>>> +
>>>>>>> +  if (GET_CODE (pattern) == PARALLEL)
>>>>>>> +    for (i = 0; i < XVECLEN (pattern, 0); ++i)
>>>>>>> +      {
>>>>>>> +	sub = XVECEXP (pattern, 0, i);
>>>>>>> +	if (GET_CODE (sub) == SET && SET_SRC (sub) == use)
>>>>>>> +	  return sub;
>>>>>>> +      }
>>>>>>> +
>>>>>>> +  return NULL_RTX;
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* Handle a restricted op USE with NR_OPERANDS.  USE is a part of SET, which is
>>>>>>> +   a part of PATTERN.  In this context restricted means that a bit in
>>>>>>> +   an operand influences only the same bit or more significant bits in the
>>>>>>> +   result.  The bitwise ops are a subclass, but PLUS is one as well.  */
>>>>>>> +
>>>>>>> +static void
>>>>>>> +note_restricted_op_use (rtx set, rtx use, unsigned int nr_operands, rtx pattern)
>>>>>>> +{
>>>>>>> +  unsigned int i, smallest;
>>>>>>> +  int operand_size[2];
>>>>>>> +  int operand_offset[2];
>>>>>>> +  int used_size;
>>>>>>> +  unsigned int operand_regno[2];
>>>>>>> +  bool operand_reg[2];
>>>>>>> +  bool operand_ignore[2];
>>>>>>> +  use_type *p;
>>>>>>> +
>>>>>>> +  /* Init operand_reg, operand_size, operand_regno and operand_ignore.  */
>>>>>>> +  for (i = 0; i < nr_operands; ++i)
>>>>>>> +    {
>>>>>>> +      operand_reg[i] = reg_use_p (XEXP (use, i), &operand_size[i],
>>>>>>> +				  &operand_regno[i], &operand_offset[i]);
>>>>>>> +      operand_ignore[i] = false;
>>>>>>> +    }
>>>>>>> +
>>>>>>> +  /* Handle case of reg and-masked with const.  */
>>>>>>> +  if (GET_CODE (use) == AND && CONST_INT_P (XEXP (use, 1)) && operand_reg[0])
>>>>>>> +    {
>>>>>>> +      used_size =
>>>>>>> +	HOST_BITS_PER_WIDE_INT - clz_hwi (UINTVAL (XEXP (use, 1)));
>>>>>>> +      operand_size[0] = MIN (operand_size[0], used_size);
>>>>>>> +    }
>>>>>>> +
>>>>>>> +  /* Handle case of reg or-masked with const.  */
>>>>>>> +  if (GET_CODE (use) == IOR && CONST_INT_P (XEXP (use, 1)) && operand_reg[0])
>>>>>>> +    {
>>>>>>> +      used_size =
>>>>>>> +	HOST_BITS_PER_WIDE_INT - clz_hwi (~UINTVAL (XEXP (use, 1)));
>>>>>>> +      operand_size[0] = MIN (operand_size[0], used_size);
>>>>>>> +    }
>>>>>>> +
>>>>>>> +  /* Ignore the use of a in 'a = a + b'.  */
>>>>>>> +  /* TODO: handle GET_CODE ((SET_DEST (set))) == SUBREG.  */
>>>>>>> +  if (set != NULL_RTX && REG_P (SET_DEST (set)))
>>>>>>> +    for (i = 0; i < nr_operands; ++i)
>>>>>>> +      operand_ignore[i] = (operand_reg[i]
>>>>>>> +			   && (REGNO (SET_DEST (set)) == operand_regno[i]));
>>>>>>> +
>>>>>>> +  /* Handle the case where a reg is combined with don't-care bits.  */
>>>>>>> +  if (nr_operands == 2 && operand_reg[0] && operand_reg[1]
>>>>>>> +      && operand_size[0] != operand_size[1])
>>>>>>> +    {
>>>>>>> +      smallest = operand_size[0] > operand_size[1];
>>>>>>> +
>>>>>>> +      if (paradoxical_subreg_p (XEXP (use, smallest)))
>>>>>>> +	operand_size[1 - smallest] = operand_size[smallest];
>>>>>>> +    }
>>>>>>> +
>>>>>>> +  /* Register the operand use, if necessary.  */
>>>>>>> +  for (i = 0; i < nr_operands; ++i)
>>>>>>> +    if (!operand_reg[i])
>>>>>>> +      note_use (&XEXP (use, i), pattern);
>>>>>>> +    else if (!operand_ignore[i])
>>>>>>> +      {
>>>>>>> +	p = register_use (operand_size[i], operand_regno[i], operand_offset[i],
>>>>>>> +			  &XEXP (use, i));
>>>>>>> +	register_prop (set, p);
>>>>>>> +      }
>>>>>>> +}
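
Two facts that note_restricted_op_use relies on can be checked in isolation: the number of low bits an AND mask lets through, and the fact that the low bits of a sum or difference depend only on the low bits of the operands. This is an illustrative sketch with hypothetical helper names (`used_low_bits`, `low16_agree`, `check_restricted_ops`); the bit count is written as a loop rather than with `clz_hwi` so that it compiles outside GCC.

```c
#include <assert.h>
#include <stdint.h>

/* Number of low bits that can be nonzero after and-masking with MASK,
   i.e. the position of the highest set bit plus one.  */
int
used_low_bits (uint64_t mask)
{
  int n = 0;

  while (mask != 0)
    {
      n++;
      mask >>= 1;
    }
  return n;
}

/* Return nonzero if the low 16 bits of A + B and A - B are unchanged when
   the operands are first truncated to 16 bits.  */
int
low16_agree (uint32_t a, uint32_t b)
{
  uint32_t full_add = (a + b) & 0xffffu;
  uint32_t trunc_add = ((a & 0xffffu) + (b & 0xffffu)) & 0xffffu;
  uint32_t full_sub = (a - b) & 0xffffu;
  uint32_t trunc_sub = ((a & 0xffffu) - (b & 0xffffu)) & 0xffffu;

  return full_add == trunc_add && full_sub == trunc_sub;
}

void
check_restricted_ops (void)
{
  /* The mask from extend-4.c: highest set bit is bit 12.  */
  assert (used_low_bits (0x10ff) == 13);
  assert (used_low_bits (0xff) == 8);
  assert (low16_agree (0x12345678u, 0x9abcdef0u));
  assert (low16_agree (0xffffffffu, 1u));
}
```

The second property is what allows a 16-bit use of a PLUS or MINUS result to be propagated to its operands.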
>>>>>>> +
>>>>>>> +/* Register promoted SUBREG in promoted_subreg.  */
>>>>>>> +
>>>>>>> +static void
>>>>>>> +register_promoted_subreg (rtx subreg)
>>>>>>> +{
>>>>>>> +  int index = REGNO (SUBREG_REG (subreg));
>>>>>>> +
>>>>>>> +  if (promoted_subreg[index] == NULL)
>>>>>>> +    promoted_subreg[index] = VEC_alloc (rtx, heap, 10);
>>>>>>> +
>>>>>>> +  VEC_safe_push (rtx, heap, promoted_subreg[index], subreg);
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* Note promoted subregs in X.  */
>>>>>>> +
>>>>>>> +static int
>>>>>>> +note_promoted_subreg (rtx *x, void *y ATTRIBUTE_UNUSED)
>>>>>>> +{
>>>>>>> +  rtx subreg = *x;
>>>>>>> +
>>>>>>> +  if (promoted_subreg_p (subreg) && !fixed_promoted_subreg_p (subreg)
>>>>>>> +      && REG_P (SUBREG_REG (subreg)))
>>>>>>> +    register_promoted_subreg (subreg);
>>>>>>> +
>>>>>>> +  return 0;
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* Handle use X in pattern DATA noted by note_uses.  */
>>>>>>> +
>>>>>>> +static void
>>>>>>> +note_use (rtx *x, void *data)
>>>>>>> +{
>>>>>>> +  rtx use = *x;
>>>>>>> +  rtx pattern = (rtx)data;
>>>>>>> +  int use_size, use_offset;
>>>>>>> +  unsigned int use_regno;
>>>>>>> +  rtx set;
>>>>>>> +  use_type *p;
>>>>>>> +
>>>>>>> +  for_each_rtx (x, note_promoted_subreg, NULL);
>>>>>>> +
>>>>>>> +  set = get_set (use, pattern);
>>>>>>> +
>>>>>>> +  switch (GET_CODE (use))
>>>>>>> +    {
>>>>>>> +    case REG:
>>>>>>> +    case SUBREG:
>>>>>>> +      if (!reg_use_p (use, &use_size, &use_regno, &use_offset))
>>>>>>> +	{
>>>>>>> +	  note_embedded_uses (use, pattern);
>>>>>>> +	  return;
>>>>>>> +	}
>>>>>>> +      p = register_use (use_size, use_regno, use_offset, x);
>>>>>>> +      register_prop (set, p);
>>>>>>> +      return;
>>>>>>> +    case SIGN_EXTEND:
>>>>>>> +    case ZERO_EXTEND:
>>>>>>> +      if (!reg_use_p (XEXP (use, 0), &use_size, &use_regno, &use_offset))
>>>>>>> +	{
>>>>>>> +	  note_embedded_uses (use, pattern);
>>>>>>> +	  return;
>>>>>>> +	}
>>>>>>> +      p = register_use (use_size, use_regno, use_offset, x);
>>>>>>> +      register_prop (set, p);
>>>>>>> +      return;
>>>>>>> +    case IOR:
>>>>>>> +    case AND:
>>>>>>> +    case XOR:
>>>>>>> +    case PLUS:
>>>>>>> +    case MINUS:
>>>>>>> +      note_restricted_op_use (set, use, 2, pattern);
>>>>>>> +      return;
>>>>>>> +    case NOT:
>>>>>>> +    case NEG:
>>>>>>> +      note_restricted_op_use (set, use, 1, pattern);
>>>>>>> +      return;
>>>>>>> +    case ASHIFT:
>>>>>>> +      if (!reg_use_p (XEXP (use, 0), &use_size, &use_regno, &use_offset)
>>>>>>> +	  || !CONST_INT_P (XEXP (use, 1))
>>>>>>> +	  || INTVAL (XEXP (use, 1)) <= 0
>>>>>>> +	  || paradoxical_subreg_p (XEXP (use, 0)))
>>>>>>> +	{
>>>>>>> +	  note_embedded_uses (use, pattern);
>>>>>>> +	  return;
>>>>>>> +	}
>>>>>>> +      (void)register_use (use_size - INTVAL (XEXP (use, 1)), use_regno,
>>>>>>> +			  use_offset, x);
>>>>>>> +      return;
>>>>>>> +    default:
>>>>>>> +      note_embedded_uses (use, pattern);
>>>>>>> +      return;
>>>>>>> +    }
>>>>>>> +}
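
The ASHIFT case registers a use of size minus the shift count because a left shift by a constant pushes the top bits of the operand out of the result. A small sketch of that reasoning, using the `byte << 25` example from extend-3.c, follows. `shift_use_size`, `shift_ignores_high_bits` and `check_shift_use` are hypothetical names, and the sketch assumes 0 < shift < size.

```c
#include <assert.h>
#include <stdint.h>

/* Bits of the operand that can still reach a SIZE-bit result after a left
   shift by SHIFT.  Assumes 0 < SHIFT < SIZE.  */
int
shift_use_size (int size, int shift)
{
  return size - shift;
}

/* Return nonzero if X << 25, computed in 32 bits, equals the same shift
   applied to only the low 7 bits of X.  */
int
shift_ignores_high_bits (uint32_t x)
{
  uint32_t low7 = x & 0x7fu;   /* 32 - 25 = 7 low bits */

  return (x << 25) == (low7 << 25);
}

void
check_shift_use (void)
{
  assert (shift_use_size (32, 25) == 7);
  assert (shift_ignores_high_bits (0xffu));
  assert (shift_ignores_high_bits (0x80u));
  assert (shift_ignores_high_bits (0xffffffffu));
}
```

Since only 7 bits of the operand matter here, an extension that preserves the low 8 bits does not change the result of the shift.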
>>>>>>> +
>>>>>>> +/* Check whether reg REGNO is implicitly used.  */
>>>>>>> +
>>>>>>> +static bool
>>>>>>> +implicit_use_p (int regno ATTRIBUTE_UNUSED)
>>>>>>> +{
>>>>>>> +#ifdef EPILOGUE_USES
>>>>>>> +  if (EPILOGUE_USES (regno))
>>>>>>> +    return true;
>>>>>>> +#endif
>>>>>>> +
>>>>>>> +#ifdef EH_USES
>>>>>>> +  if (EH_USES (regno))
>>>>>>> +    return true;
>>>>>>> +#endif
>>>>>>> +
>>>>>>> +  return false;
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* Check whether reg REGNO should be skipped in analysis.  */
>>>>>>> +
>>>>>>> +static bool
>>>>>>> +skip_reg_p (int regno)
>>>>>>> +{
>>>>>>> +  /* TODO: handle hard registers.  The problem with hard registers is that
>>>>>>> +     a DI use of r0 can mean a 64bit use of r0 and a 32 bit use of r1.
>>>>>>> +     We don't handle that properly.  */
>>>>>>> +  return implicit_use_p (regno) || HARD_REGISTER_NUM_P (regno);
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* Note the uses of argument registers in call INSN.  */
>>>>>>> +
>>>>>>> +static void
>>>>>>> +note_call_uses (rtx insn)
>>>>>>> +{
>>>>>>> +  rtx link, link_expr;
>>>>>>> +
>>>>>>> +  if (!CALL_P (insn))
>>>>>>> +    return;
>>>>>>> +
>>>>>>> +  for (link = CALL_INSN_FUNCTION_USAGE (insn); link; link = XEXP (link, 1))
>>>>>>> +    {
>>>>>>> +      link_expr = XEXP (link, 0);
>>>>>>> +
>>>>>>> +      if (GET_CODE (link_expr) == USE)
>>>>>>> +	note_use (&XEXP (link_expr, 0), link);
>>>>>>> +    }
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* Dump the biggest uses found.  */
>>>>>>> +
>>>>>>> +static void
>>>>>>> +dump_biggest_use (void)
>>>>>>> +{
>>>>>>> +  int i;
>>>>>>> +
>>>>>>> +  if (!dump_file)
>>>>>>> +    return;
>>>>>>> +
>>>>>>> +  fprintf (dump_file, "biggest_use:\n");
>>>>>>> +
>>>>>>> +  for (i = 0; i < n_regs; i++)
>>>>>>> +    if (biggest_use[i] > 0)
>>>>>>> +      fprintf (dump_file, "reg %d: size %d\n", i, biggest_use[i]);
>>>>>>> +
>>>>>>> +  fprintf (dump_file, "\n");
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* Calculate the biggest use mode for all regs.  */
>>>>>>> +
>>>>>>> +static void
>>>>>>> +calculate_biggest_use (void)
>>>>>>> +{
>>>>>>> +  basic_block bb;
>>>>>>> +  rtx insn;
>>>>>>> +
>>>>>>> +  /* For all insns, call note_use for each use in insn.  */
>>>>>>> +  FOR_EACH_BB (bb)
>>>>>>> +    FOR_BB_INSNS (bb, insn)
>>>>>>> +      {
>>>>>>> +	if (!NONDEBUG_INSN_P (insn))
>>>>>>> +	  continue;
>>>>>>> +
>>>>>>> +	note_uses (&PATTERN (insn), note_use, PATTERN (insn));
>>>>>>> +
>>>>>>> +	if (CALL_P (insn))
>>>>>>> +	  note_call_uses (insn);
>>>>>>> +      }
>>>>>>> +
>>>>>>> +  dump_biggest_use ();
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* Register a propagation USE in SET in the props vector.  */
>>>>>>> +
>>>>>>> +static void
>>>>>>> +register_prop (rtx set, use_type *use)
>>>>>>> +{
>>>>>>> +  prop_type *p;
>>>>>>> +  int regno;
>>>>>>> +
>>>>>>> +  if (set == NULL_RTX || use == NULL)
>>>>>>> +    return;
>>>>>>> +
>>>>>>> +  if (!REG_P (SET_DEST (set)))
>>>>>>> +    return;
>>>>>>> +
>>>>>>> +  regno = REGNO (SET_DEST (set));
>>>>>>> +
>>>>>>> +  if (skip_reg_p (regno))
>>>>>>> +    return;
>>>>>>> +
>>>>>>> +  if (props[regno] == NULL)
>>>>>>> +    props[regno] = VEC_alloc (prop_type, heap, 4);
>>>>>>> +
>>>>>>> +  VEC_safe_push (prop_type, heap, props[regno], NULL);
>>>>>>> +  p = VEC_last (prop_type, props[regno]);
>>>>>>> +  p->set = set;
>>>>>>> +  p->uses_regno = use->regno;
>>>>>>> +  p->uses_index = VEC_length (use_type, uses[use->regno]) - 1;
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* Add REGNO to the worklist.  */
>>>>>>> +
>>>>>>> +static void
>>>>>>> +add_to_wl (int regno)
>>>>>>> +{
>>>>>>> +  if (in_wl[regno])
>>>>>>> +    return;
>>>>>>> +
>>>>>>> +  if (biggest_use[regno] > 0
>>>>>>> +      && biggest_use[regno] == GET_MODE_BITSIZE (PSEUDO_REGNO_MODE (regno)))
>>>>>>> +    return;
>>>>>>> +
>>>>>>> +  if (VEC_empty (prop_type, props[regno]))
>>>>>>> +    return;
>>>>>>> +
>>>>>>> +  if (propagated_size[regno] != NONE
>>>>>>> +      && propagated_size[regno] == biggest_use[regno])
>>>>>>> +    return;
>>>>>>> +
>>>>>>> +  VEC_safe_push (int, heap, wl, regno);
>>>>>>> +  in_wl[regno] = true;
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* Pop a reg from the worklist and return it.  */
>>>>>>> +
>>>>>>> +static int
>>>>>>> +pop_wl (void)
>>>>>>> +{
>>>>>>> +  int regno = VEC_pop (int, wl);
>>>>>>> +  in_wl[regno] = false;
>>>>>>> +  return regno;
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* Propagate the use size DEST_SIZE of a reg to use P.  */
>>>>>>> +
>>>>>>> +static int
>>>>>>> +propagate_size (int dest_size, use_type *p)
>>>>>>> +{
>>>>>>> +  if (dest_size == 0)
>>>>>>> +    return 0;
>>>>>>> +
>>>>>>> +  return p->offset + MIN (p->size - p->offset, dest_size);
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* Get the biggest use of REGNO from the uses vector.  */
>>>>>>> +
>>>>>>> +static int
>>>>>>> +get_biggest_use (unsigned int regno)
>>>>>>> +{
>>>>>>> +  int ix;
>>>>>>> +  use_type *p;
>>>>>>> +  int max = 0;
>>>>>>> +
>>>>>>> +  gcc_assert (uses[regno] != NULL);
>>>>>>> +
>>>>>>> +  FOR_EACH_VEC_ELT (use_type, uses[regno], ix, p)
>>>>>>> +    max = MAX (max, p->size);
>>>>>>> +
>>>>>>> +  return max;
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* Propagate the use size DEST_SIZE of a reg to the uses in USE.  */
>>>>>>> +
>>>>>>> +static void
>>>>>>> +propagate_to_use (int dest_size, use_type *use)
>>>>>>> +{
>>>>>>> +  int new_use_size;
>>>>>>> +  int prev_biggest_use;
>>>>>>> +  int *current;
>>>>>>> +
>>>>>>> +  new_use_size = propagate_size (dest_size, use);
>>>>>>> +
>>>>>>> +  if (new_use_size >= use->size)
>>>>>>> +    return;
>>>>>>> +
>>>>>>> +  use->size = new_use_size;
>>>>>>> +
>>>>>>> +  current = &biggest_use[use->regno];
>>>>>>> +
>>>>>>> +  prev_biggest_use = *current;
>>>>>>> +  *current = get_biggest_use (use->regno);
>>>>>>> +
>>>>>>> +  if (*current >= prev_biggest_use)
>>>>>>> +    return;
>>>>>>> +
>>>>>>> +  add_to_wl (use->regno);
>>>>>>> +
>>>>>>> +  if (dump_file)
>>>>>>> +    fprintf (dump_file, "%d: %d -> %d\n", use->regno, prev_biggest_use,
>>>>>>> +	     *current);
>>>>>>> +
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* Propagate the biggest use of a reg REGNO to all its uses, and note
>>>>>>> +   propagations in NR_PROPAGATIONS.  */
>>>>>>> +
>>>>>>> +static void
>>>>>>> +propagate_to_uses (int regno, int *nr_propagations)
>>>>>>> +{
>>>>>>> +  int ix;
>>>>>>> +  prop_type *p;
>>>>>>> +
>>>>>>> +  gcc_assert (!(propagated_size[regno] == NONE
>>>>>>> +		&& propagated_size[regno] == biggest_use[regno]));
>>>>>>> +
>>>>>>> +  FOR_EACH_VEC_ELT (prop_type, props[regno], ix, p)
>>>>>>> +    {
>>>>>>> +      use_type *use = VEC_index (use_type, uses[p->uses_regno], p->uses_index);
>>>>>>> +      propagate_to_use (biggest_use[regno], use);
>>>>>>> +      ++(*nr_propagations);
>>>>>>> +    }
>>>>>>> +
>>>>>>> +  propagated_size[regno] = biggest_use[regno];
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* Improve biggest_use array iteratively.  */
>>>>>>> +
>>>>>>> +static void
>>>>>>> +propagate (void)
>>>>>>> +{
>>>>>>> +  int i;
>>>>>>> +  int nr_propagations = 0;
>>>>>>> +
>>>>>>> +  /* Initialize work list.  */
>>>>>>> +
>>>>>>> +  for (i = 0; i < n_regs; ++i)
>>>>>>> +    add_to_wl (i);
>>>>>>> +
>>>>>>> +  /* Work the work list.  */
>>>>>>> +
>>>>>>> +  if (dump_file)
>>>>>>> +    fprintf (dump_file, "propagations: \n");
>>>>>>> +  while (!VEC_empty (int, wl))
>>>>>>> +    propagate_to_uses (pop_wl (), &nr_propagations);
>>>>>>> +
>>>>>>> +  if (dump_file)
>>>>>>> +    fprintf (dump_file, "\nnr_propagations: %d\n\n", nr_propagations);
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* Check whether this is a sign/zero extension.  */
>>>>>>> +
>>>>>>> +static bool
>>>>>>> +extension_p (rtx insn, rtx *dest, rtx *inner, int *preserved_size)
>>>>>>> +{
>>>>>>> +  rtx src, op0;
>>>>>>> +
>>>>>>> +  /* Detect set of reg.  */
>>>>>>> +  if (GET_CODE (PATTERN (insn)) != SET)
>>>>>>> +    return false;
>>>>>>> +
>>>>>>> +  src = SET_SRC (PATTERN (insn));
>>>>>>> +  *dest = SET_DEST (PATTERN (insn));
>>>>>>> +
>>>>>>> +  if (!REG_P (*dest))
>>>>>>> +    return false;
>>>>>>> +
>>>>>>> +  /* Detect sign or zero extension.  */
>>>>>>> +  if (GET_CODE (src) == ZERO_EXTEND || GET_CODE (src) == SIGN_EXTEND
>>>>>>> +      || (GET_CODE (src) == AND && CONST_INT_P (XEXP (src, 1))))
>>>>>>> +    {
>>>>>>> +      op0 = XEXP (src, 0);
>>>>>>> +
>>>>>>> +      /* Determine amount of least significant bits preserved by operation.  */
>>>>>>> +      if (GET_CODE (src) == AND)
>>>>>>> +	*preserved_size = ctz_hwi (~UINTVAL (XEXP (src, 1)));
>>>>>>> +      else
>>>>>>> +	*preserved_size = GET_MODE_BITSIZE (GET_MODE (op0));
>>>>>>> +
>>>>>>> +      if (GET_CODE (op0) == SUBREG)
>>>>>>> +	{
>>>>>>> +	  if (subreg_lsb (op0) != 0)
>>>>>>> +	    return false;
>>>>>>> +
>>>>>>> +	  *inner = SUBREG_REG (op0);
>>>>>>> +
>>>>>>> +	  if (GET_MODE_CLASS (GET_MODE (*dest))
>>>>>>> +	      != GET_MODE_CLASS (GET_MODE (*inner)))
>>>>>>> +	    return false;
>>>>>>> +
>>>>>>> +	  return true;
>>>>>>> +	}
>>>>>>> +      else if (REG_P (op0))
>>>>>>> +	{
>>>>>>> +	  *inner = op0;
>>>>>>> +
>>>>>>> +	  if (GET_MODE_CLASS (GET_MODE (*dest))
>>>>>>> +	      != GET_MODE_CLASS (GET_MODE (*inner)))
>>>>>>> +	    return false;
>>>>>>> +
>>>>>>> +	  return true;
>>>>>>> +	}
>>>>>>> +      else if (GET_CODE (op0) == TRUNCATE)
>>>>>>> +	{
>>>>>>> +	  *inner = XEXP (op0, 0);
>>>>>>> +	  return true;
>>>>>>> +	}
>>>>>>> +    }
>>>>>>> +
>>>>>>> +  return false;
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* Find extensions and store them in the extensions vector.  */
>>>>>>> +
>>>>>>> +static bool
>>>>>>> +find_extensions (void)
>>>>>>> +{
>>>>>>> +  basic_block bb;
>>>>>>> +  rtx insn, dest, inner;
>>>>>>> +  int preserved_size;
>>>>>>> +
>>>>>>> +  /* For all insns, record each extension in the extensions vector.  */
>>>>>>> +  FOR_EACH_BB (bb)
>>>>>>> +    FOR_BB_INSNS (bb, insn)
>>>>>>> +      {
>>>>>>> +	if (!NONDEBUG_INSN_P (insn))
>>>>>>> +	  continue;
>>>>>>> +
>>>>>>> +	if (!extension_p (insn, &dest, &inner, &preserved_size))
>>>>>>> +	  continue;
>>>>>>> +
>>>>>>> +	VEC_safe_push (rtx, heap, extensions, insn);
>>>>>>> +
>>>>>>> +	if (dump_file)
>>>>>>> +	  fprintf (dump_file,
>>>>>>> +		   "found extension %u with preserved size %d defining"
>>>>>>> +		   " reg %d\n",
>>>>>>> +		   INSN_UID (insn), preserved_size, REGNO (dest));
>>>>>>> +      }
>>>>>>> +
>>>>>>> +  if (dump_file)
>>>>>>> +    {
>>>>>>> +      if (!VEC_empty (rtx, extensions))
>>>>>>> +	fprintf (dump_file, "\n");
>>>>>>> +      else
>>>>>>> +	fprintf (dump_file, "no extensions found.\n");
>>>>>>> +    }
>>>>>>> +
>>>>>>> +  return !VEC_empty (rtx, extensions);
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* Check whether this is a redundant sign/zero extension.  */
>>>>>>> +
>>>>>>> +static bool
>>>>>>> +redundant_extension_p (rtx insn, rtx *dest, rtx *inner, int *preserved_size)
>>>>>>> +{
>>>>>>> +  int biggest_dest_use;
>>>>>>> +
>>>>>>> +  if (!extension_p (insn, dest, inner, preserved_size))
>>>>>>> +    gcc_unreachable ();
>>>>>>> +
>>>>>>> +  biggest_dest_use = biggest_use[REGNO (*dest)];
>>>>>>> +
>>>>>>> +  if (biggest_dest_use == SKIP_REG)
>>>>>>> +    return false;
>>>>>>> +
>>>>>>> +  if (*preserved_size < biggest_dest_use)
>>>>>>> +    return false;
>>>>>>> +
>>>>>>> +  return true;
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* Find the redundant extensions in the extensions vector and move them to the
>>>>>>> +   redundant_extensions vector.  */
>>>>>>> +
>>>>>>> +static void
>>>>>>> +find_redundant_extensions (void)
>>>>>>> +{
>>>>>>> +  rtx insn, dest, inner;
>>>>>>> +  int ix;
>>>>>>> +  bool found = false;
>>>>>>> +  int preserved_size;
>>>>>>> +
>>>>>>> +  FOR_EACH_VEC_ELT_REVERSE (rtx, extensions, ix, insn)
>>>>>>> +    if (redundant_extension_p (insn, &dest, &inner, &preserved_size))
>>>>>>> +      {
>>>>>>> +	VEC_safe_push (rtx, heap, redundant_extensions, insn);
>>>>>>> +	VEC_unordered_remove (rtx, extensions, ix);
>>>>>>> +
>>>>>>> +	if (dump_file)
>>>>>>> +	  fprintf (dump_file,
>>>>>>> +		   "found redundant extension %u with preserved size %d"
>>>>>>> +		   " defining reg %d\n",
>>>>>>> +		   INSN_UID (insn), preserved_size, REGNO (dest));
>>>>>>> +	found = true;
>>>>>>> +      }
>>>>>>> +
>>>>>>> +  if (dump_file && found)
>>>>>>> +    fprintf (dump_file, "\n");
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* Reset promotion of subregs of REG.  */
>>>>>>> +
>>>>>>> +static void
>>>>>>> +reset_promoted_subreg (rtx reg)
>>>>>>> +{
>>>>>>> +  int ix;
>>>>>>> +  rtx subreg;
>>>>>>> +
>>>>>>> +  if (promoted_subreg[REGNO (reg)] == NULL)
>>>>>>> +    return;
>>>>>>> +
>>>>>>> +  FOR_EACH_VEC_ELT (rtx, promoted_subreg[REGNO (reg)], ix, subreg)
>>>>>>> +    {
>>>>>>> +      SUBREG_PROMOTED_UNSIGNED_SET (subreg, 0);
>>>>>>> +      SUBREG_PROMOTED_VAR_P (subreg) = 0;
>>>>>>> +    }
>>>>>>> +
>>>>>>> +  VEC_free (rtx, heap, promoted_subreg[REGNO (reg)]);
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* Try to remove or replace the redundant extension INSN which extends INNER and
>>>>>>> +   writes to DEST.  */
>>>>>>> +
>>>>>>> +static void
>>>>>>> +try_remove_or_replace_extension (rtx insn, rtx dest, rtx inner)
>>>>>>> +{
>>>>>>> +  rtx cp_src, cp_dest, seq = NULL_RTX, one;
>>>>>>> +
>>>>>>> +  /* Check whether replacement is needed.  */
>>>>>>> +  if (dest != inner)
>>>>>>> +    {
>>>>>>> +      start_sequence ();
>>>>>>> +
>>>>>>> +      /* Determine the proper replacement operation.  */
>>>>>>> +      if (GET_MODE (dest) == GET_MODE (inner))
>>>>>>> +	{
>>>>>>> +	  cp_src = inner;
>>>>>>> +	  cp_dest = dest;
>>>>>>> +	}
>>>>>>> +      else if (GET_MODE_SIZE (GET_MODE (dest))
>>>>>>> +	       > GET_MODE_SIZE (GET_MODE (inner)))
>>>>>>> +	{
>>>>>>> +	  emit_clobber (dest);
>>>>>>> +	  cp_src = inner;
>>>>>>> +	  cp_dest = gen_lowpart_SUBREG (GET_MODE (inner), dest);
>>>>>>> +	}
>>>>>>> +      else
>>>>>>> +	{
>>>>>>> +	  cp_src = gen_lowpart_SUBREG (GET_MODE (dest), inner);
>>>>>>> +	  cp_dest = dest;
>>>>>>> +	}
>>>>>>> +
>>>>>>> +      emit_move_insn (cp_dest, cp_src);
>>>>>>> +
>>>>>>> +      seq = get_insns ();
>>>>>>> +      end_sequence ();
>>>>>>> +
>>>>>>> +      /* If the replacement is not supported, bail out.  */
>>>>>>> +      for (one = seq; one != NULL_RTX; one = NEXT_INSN (one))
>>>>>>> +	if (recog_memoized (one) < 0 && GET_CODE (PATTERN (one)) != CLOBBER)
>>>>>>> +	  return;
>>>>>>> +
>>>>>>> +      /* Insert the replacement.  */
>>>>>>> +      emit_insn_before (seq, insn);
>>>>>>> +    }
>>>>>>> +
>>>>>>> +  /* Note replacement/removal in the dump.  */
>>>>>>> +  if (dump_file)
>>>>>>> +    {
>>>>>>> +      fprintf (dump_file, "redundant extension %u ", INSN_UID (insn));
>>>>>>> +      if (dest != inner)
>>>>>>> +	fprintf (dump_file, "replaced by %u\n", INSN_UID (seq));
>>>>>>> +      else
>>>>>>> +	fprintf (dump_file, "removed\n");
>>>>>>> +    }
>>>>>>> +
>>>>>>> +  /* Remove the extension.  */
>>>>>>> +  delete_insn (insn);
>>>>>>> +
>>>>>>> +  reset_promoted_subreg (dest);
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* Setup the variables at the start of the pass.  */
>>>>>>> +
>>>>>>> +static void
>>>>>>> +init_pass (void)
>>>>>>> +{
>>>>>>> +  int i;
>>>>>>> +
>>>>>>> +  biggest_use = XNEWVEC (int, n_regs);
>>>>>>> +  promoted_subreg = XCNEWVEC (VEC (rtx,heap) *, n_regs);
>>>>>>> +  propagated_size = XNEWVEC (int, n_regs);
>>>>>>> +
>>>>>>> +  /* Initialize biggest_use for all regs to 0.  If a reg is used implicitly, we
>>>>>>> +     handle that reg conservatively and set it to SKIP_REG instead.  */
>>>>>>> +  for (i = 0; i < n_regs; i++)
>>>>>>> +    {
>>>>>>> +      biggest_use[i] = skip_reg_p (i) ? SKIP_REG : 0;
>>>>>>> +      propagated_size[i] = NONE;
>>>>>>> +    }
>>>>>>> +
>>>>>>> +  extensions = VEC_alloc (rtx, heap, 10);
>>>>>>> +  redundant_extensions = VEC_alloc (rtx, heap, 10);
>>>>>>> +
>>>>>>> +  wl = VEC_alloc (int, heap, 50);
>>>>>>> +  in_wl = XNEWVEC (bool, n_regs);
>>>>>>> +
>>>>>>> +  uses = XNEWVEC (typeof (*uses), n_regs);
>>>>>>> +  props = XNEWVEC (typeof (*props), n_regs);
>>>>>>> +
>>>>>>> +  for (i = 0; i < n_regs; ++i)
>>>>>>> +    {
>>>>>>> +      uses[i] = NULL;
>>>>>>> +      props[i] = NULL;
>>>>>>> +      in_wl[i] = false;
>>>>>>> +    }
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* Find redundant extensions and remove or replace them if possible.  */
>>>>>>> +
>>>>>>> +static void
>>>>>>> +remove_redundant_extensions (void)
>>>>>>> +{
>>>>>>> +  rtx insn, dest, inner;
>>>>>>> +  int preserved_size;
>>>>>>> +  int ix;
>>>>>>> +
>>>>>>> +  if (!find_extensions ())
>>>>>>> +    return;
>>>>>>> +
>>>>>>> +  calculate_biggest_use ();
>>>>>>> +
>>>>>>> +  find_redundant_extensions ();
>>>>>>> +
>>>>>>> +  if (!VEC_empty (rtx, extensions))
>>>>>>> +    {
>>>>>>> +      propagate ();
>>>>>>> +
>>>>>>> +      find_redundant_extensions ();
>>>>>>> +    }
>>>>>>> +
>>>>>>> +  gcc_checking_assert (n_regs == max_reg_num ());
>>>>>>> +
>>>>>>> +  FOR_EACH_VEC_ELT (rtx, redundant_extensions, ix, insn)
>>>>>>> +    {
>>>>>>> +      extension_p (insn, &dest, &inner, &preserved_size);
>>>>>>> +      try_remove_or_replace_extension (insn, dest, inner);
>>>>>>> +    }
>>>>>>> +
>>>>>>> +  if (dump_file)
>>>>>>> +    fprintf (dump_file, "\n");
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* Free the variables at the end of the pass.  */
>>>>>>> +
>>>>>>> +static void
>>>>>>> +finish_pass (void)
>>>>>>> +{
>>>>>>> +  int i;
>>>>>>> +
>>>>>>> +  XDELETEVEC (propagated_size);
>>>>>>> +
>>>>>>> +  VEC_free (rtx, heap, extensions);
>>>>>>> +  VEC_free (rtx, heap, redundant_extensions);
>>>>>>> +
>>>>>>> +  VEC_free (int, heap, wl);
>>>>>>> +
>>>>>>> +  for (i = 0; i < n_regs; ++i)
>>>>>>> +    {
>>>>>>> +      if (uses[i] != NULL)
>>>>>>> +	VEC_free (use_type, heap, uses[i]);
>>>>>>> +
>>>>>>> +      if (props[i] != NULL)
>>>>>>> +	VEC_free (prop_type, heap, props[i]);
>>>>>>> +    }
>>>>>>> +
>>>>>>> +  XDELETEVEC (uses);
>>>>>>> +  XDELETEVEC (props);
>>>>>>> +  XDELETEVEC (biggest_use);
>>>>>>> +
>>>>>>> +  for (i = 0; i < n_regs; ++i)
>>>>>>> +    if (promoted_subreg[i] != NULL)
>>>>>>> +      VEC_free (rtx, heap, promoted_subreg[i]);
>>>>>>> +  XDELETEVEC (promoted_subreg);
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* Remove redundant extensions.  */
>>>>>>> +
>>>>>>> +static unsigned int
>>>>>>> +rest_of_handle_ee (void)
>>>>>>> +{
>>>>>>> +  n_regs = max_reg_num ();
>>>>>>> +
>>>>>>> +  init_pass ();
>>>>>>> +  remove_redundant_extensions ();
>>>>>>> +  finish_pass ();
>>>>>>> +  return 0;
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* Run ee pass when flag_ee is set at optimization level > 0.  */
>>>>>>> +
>>>>>>> +static bool
>>>>>>> +gate_handle_ee (void)
>>>>>>> +{
>>>>>>> +  return (optimize > 0 && flag_ee);
>>>>>>> +}
>>>>>>> +
>>>>>>> +struct rtl_opt_pass pass_ee =
>>>>>>> +{
>>>>>>> + {
>>>>>>> +  RTL_PASS,
>>>>>>> +  "ee",                                 /* name */
>>>>>>> +  gate_handle_ee,                       /* gate */
>>>>>>> +  rest_of_handle_ee,                    /* execute */
>>>>>>> +  NULL,                                 /* sub */
>>>>>>> +  NULL,                                 /* next */
>>>>>>> +  0,                                    /* static_pass_number */
>>>>>>> +  TV_EE,                                /* tv_id */
>>>>>>> +  0,                                    /* properties_required */
>>>>>>> +  0,                                    /* properties_provided */
>>>>>>> +  0,                                    /* properties_destroyed */
>>>>>>> +  0,                                    /* todo_flags_start */
>>>>>>> +  TODO_ggc_collect |
>>>>>>> +  TODO_verify_rtl_sharing,              /* todo_flags_finish */
>>>>>>> + }
>>>>>>> +};
>>>>>>> Index: gcc/common.opt
>>>>>>> ===================================================================
>>>>>>> --- gcc/common.opt (revision 189409)
>>>>>>> +++ gcc/common.opt (working copy)
>>>>>>> @@ -1067,6 +1067,10 @@ feliminate-dwarf2-dups
>>>>>>> Common Report Var(flag_eliminate_dwarf2_dups)
>>>>>>> Perform DWARF2 duplicate elimination
>>>>>>>
>>>>>>> +fextension-elimination
>>>>>>> +Common Report Var(flag_ee) Init(0) Optimization
>>>>>>> +Perform extension elimination
>>>>>>> +
>>>>>>> fipa-sra
>>>>>>> Common Report Var(flag_ipa_sra) Init(0) Optimization
>>>>>>> Perform interprocedural reduction of aggregates
>>>>>>> Index: gcc/Makefile.in
>>>>>>> ===================================================================
>>>>>>> --- gcc/Makefile.in (revision 189409)
>>>>>>> +++ gcc/Makefile.in (working copy)
>>>>>>> @@ -1218,6 +1218,7 @@ OBJS = \
>>>>>>> 	dwarf2asm.o \
>>>>>>> 	dwarf2cfi.o \
>>>>>>> 	dwarf2out.o \
>>>>>>> +	ee.o \
>>>>>>> 	ebitmap.o \
>>>>>>> 	emit-rtl.o \
>>>>>>> 	et-forest.o \
>>>>>>> @@ -2971,6 +2972,12 @@ cprop.o : cprop.c $(CONFIG_H) $(SYSTEM_H
>>>>>>>       $(TM_P_H) $(PARAMS_H) cselib.h $(EXCEPT_H) $(TREE_H) $(TIMEVAR_H) \
>>>>>>>       intl.h $(OBSTACK_H) $(TREE_PASS_H) $(DF_H) $(DBGCNT_H) $(TARGET_H) \
>>>>>>>       $(DF_H) $(CFGLOOP_H)
>>>>>>> +ee.o : ee.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) \
>>>>>>> +   hard-reg-set.h $(FLAGS_H) $(BASIC_BLOCK_H) $(FUNCTION_H) output.h \
>>>>>>> +   $(DF_H) $(TIMEVAR_H) tree-pass.h $(RECOG_H) $(EXPR_H) \
>>>>>>> +   $(REGS_H) $(TREE_H) $(TM_P_H) insn-config.h $(INSN_ATTR_H) $(TOPLEV_H) \
>>>>>>> +   $(DIAGNOSTIC_CORE_H) $(TARGET_H) $(OPTABS_H) insn-codes.h rtlhooks-def.h \
>>>>>>> +   $(PARAMS_H) $(CGRAPH_H)
>>>>>>> gcse.o : gcse.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) $(RTL_H) \
>>>>>>>       $(REGS_H) hard-reg-set.h $(FLAGS_H) insn-config.h $(GGC_H) \
>>>>>>>       $(RECOG_H) $(EXPR_H) $(BASIC_BLOCK_H) $(FUNCTION_H) toplev.h $(DIAGNOSTIC_CORE_H) \
>>>>>>> Index: gcc/passes.c
>>>>>>> ===================================================================
>>>>>>> --- gcc/passes.c (revision 189409)
>>>>>>> +++ gcc/passes.c (working copy)
>>>>>>> @@ -1552,6 +1552,7 @@ init_optimization_passes (void)
>>>>>>>          NEXT_PASS (pass_initialize_regs);
>>>>>>>          NEXT_PASS (pass_ud_rtl_dce);
>>>>>>>          NEXT_PASS (pass_combine);
>>>>>>> +      NEXT_PASS (pass_ee);
>>>>>>>          NEXT_PASS (pass_if_after_combine);
>>>>>>>          NEXT_PASS (pass_partition_blocks);
>>>>>>>          NEXT_PASS (pass_regmove);
>>>
>>
> 
> 
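
For readers skimming the quoted patch, the redundancy test in extension_p and redundant_extension_p can be sketched in isolation. This is only an illustration under assumptions: the helper names below are hypothetical and not part of ee.c, the bit-counting loop stands in for ctz_hwi (~mask) on a 64-bit host, and the SKIP_REG case is left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not from ee.c: number of least significant bits
   that "x & mask" leaves unchanged, i.e. the count of trailing one bits
   in MASK.  The patch computes this as ctz_hwi (~mask).  */
static int
preserved_bits_of_and (uint64_t mask)
{
  int n = 0;

  while (n < 64 && ((mask >> n) & 1))
    n++;

  return n;
}

/* Hypothetical helper, not from ee.c: an extension is treated as
   redundant when no use of its destination reads more bits than the
   extension preserves.  */
static bool
extension_redundant (int preserved_size, int biggest_use)
{
  return preserved_size >= biggest_use;
}
```

In the motivating example from the start of the thread, the sign_extend from HImode preserves 16 bits and the only use of reg 199 reads 8 bits, so extension_redundant (16, 8) holds.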



* Re: new sign/zero extension elimination pass
  2012-07-11 10:31             ` Tom de Vries
                                 ` (2 preceding siblings ...)
       [not found]               ` <4FFE2ADF.2060806@naturalbridge.com>
@ 2012-08-20 13:40               ` Tom de Vries
  3 siblings, 0 replies; 43+ messages in thread
From: Tom de Vries @ 2012-08-20 13:40 UTC (permalink / raw)
  To: Tom de Vries
  Cc: Eric Botcazou, Tom de Vries, Paolo Bonzini, gcc-patches, Bernd Schmidt

On 11/07/12 12:30, Tom de Vries wrote:
> On 13/11/10 10:50, Eric Botcazou wrote:
>>> I profiled the pass on spec2000:
>>>
>>>                     -mabi=32     -mabi=64
>>> ee-pass (usr time):     0.70         1.16
>>> total   (usr time):   919.30       879.26
>>> ee-pass        (%):     0.08         0.13
>>>
>>> The pass takes 0.13% or less of the total usr runtime.
>>
>> For how many hits?  What are the numbers with --param ee-max-propagate=0?
>>
>>> Is it necessary to improve the runtime of this pass?
>>
>> I've already given my opinion about the implementation.  The other passes in 
>> the compiler try hard not to rescan everything when a single bit changes; as 
>> currently written, yours doesn't.
>>
> 
> Eric,
> 
> I've done the following:
> - refactored the pass such that it now scans at most twice over all
>   instructions.
> - updated the patch to be applicable to current trunk
> - updated the motivating example to a more applicable one (as discussed in
>   this thread), and added that one as test-case.
> - added a part in the header comment illustrating the working of the pass
>   on the motivating example.
> 
> Bootstrapped and reg-tested on x86_64 and i686.
> 
> Built and reg-tested on mips, mips64, and arm.
> 
> OK for trunk?
> 

Eric,

Does the new patch address your concerns about rescanning?

If so, OK for trunk?

Thanks,
- Tom


> Thanks,
> - Tom
> 
> 2012-07-10  Tom de Vries  <tom@codesourcery.com>
> 
> 	* ee.c: New file.
> 	* tree-pass.h (pass_ee): Declare.
> 	* opts.c (default_options_table): Set flag_ee at -O2.
> 	* timevar.def (TV_EE): New timevar.
> 	* common.opt (fextension-elimination): New option.
> 	* Makefile.in (ee.o): New rule.
> 	* passes.c (pass_ee): Add it.
> 
> 	* gcc.dg/extend-1.c: New test.
> 	* gcc.dg/extend-2.c: Same.
> 	* gcc.dg/extend-2-64.c: Same.
> 	* gcc.dg/extend-3.c: Same.
> 	* gcc.dg/extend-4.c: Same.
> 	* gcc.dg/extend-5.c: Same.
> 	* gcc.target/mips/octeon-bbit-2.c: Make test more robust.
> 
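
As background to the rescanning question, the size-narrowing step that the worklist in the quoted patch applies (propagate_size) can be sketched on its own. This is a minimal sketch under assumptions: struct use_sketch is a hypothetical stand-in for the patch's use_type, and reading "size" as the upper bit bound of a use and "offset" as its lower bit bound is an interpretation of the patch, not something it states.

```c
#define MIN(a, b) ((a) < (b) ? (a) : (b))

/* Hypothetical stand-in for use_type in the patch.  SIZE is taken to be
   the upper bit bound of the use and OFFSET its lower bit bound.  */
struct use_sketch
{
  int size;
  int offset;
};

/* Mirror of propagate_size in the patch: given that only DEST_SIZE bits
   of the destination are ever used, return the narrowed upper bound for
   the source use P.  */
static int
propagate_size (int dest_size, const struct use_sketch *p)
{
  if (dest_size == 0)
    return 0;

  return p->offset + MIN (p->size - p->offset, dest_size);
}
```

With a 32-bit use at offset 0 feeding a destination of which only 8 bits are used, the use narrows to 8 bits; a register is only put back on the worklist when its biggest use actually shrinks, which is what bounds the number of repeated visits.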


end of thread, other threads:[~2012-08-20 13:40 UTC | newest]

Thread overview: 43+ messages
2010-10-18 15:54 new sign/zero extension elimination pass Tom de Vries
2010-10-18 16:03 ` Andrew Pinski
2010-10-18 16:59   ` Richard Guenther
2010-10-21 10:06     ` Tom de Vries
2010-10-21 10:44   ` Paolo Bonzini
2010-10-21 11:00     ` Paolo Bonzini
2010-10-21 17:21     ` Paolo Bonzini
2010-10-22  9:05       ` Tom de Vries
2010-10-22  9:24         ` Paolo Bonzini
2010-10-22  9:15 ` Eric Botcazou
2010-10-28 20:45   ` Tom de Vries
2010-10-29  2:11     ` Paolo Bonzini
2010-10-29  2:42       ` Paolo Bonzini
2010-10-31 19:30       ` Tom de Vries
2010-10-31 20:58         ` Joseph S. Myers
2010-10-31 21:11         ` Paolo Bonzini
2010-11-03 18:49           ` Eric Botcazou
2010-10-29  1:04   ` Paolo Bonzini
2010-10-29  1:33     ` Paolo Bonzini
2010-11-03 18:50     ` Eric Botcazou
2010-11-08 21:29     ` Tom de Vries
2010-11-08 22:11       ` Paolo Bonzini
2010-11-12  8:29         ` Tom de Vries
2010-11-13 10:41           ` Eric Botcazou
2012-07-11 10:31             ` Tom de Vries
2012-07-11 11:42               ` Jakub Jelinek
2012-07-11 13:01                 ` Tom de Vries
2012-07-12  1:52               ` Kenneth Zadeck
     [not found]               ` <4FFE2ADF.2060806@naturalbridge.com>
     [not found]                 ` <4FFE9346.2070806@mentor.com>
2012-07-12  9:21                   ` Tom de Vries
2012-07-12 12:05                     ` Kenneth Zadeck
2012-07-13  7:54                       ` Tom de Vries
2012-07-13 11:39                         ` Kenneth Zadeck
2012-07-13 12:58                           ` Tom de Vries
2012-07-17 15:17                         ` Kenneth Zadeck
2012-07-20 18:41                           ` Tom de Vries
2012-08-20 13:40               ` Tom de Vries
2010-10-28 20:55 ` Andrew Pinski
2010-10-28 21:00   ` Andrew Pinski
2010-10-28 21:12     ` Tom de Vries
2010-10-28 22:58       ` Andrew Pinski
2010-10-29 15:06         ` Tom de Vries
2010-10-29  0:34     ` Paolo Bonzini
2010-11-08 21:32 ` Andrew Pinski
