public inbox for gcc-patches@gcc.gnu.org
* [PATCH ppc64,aarch64,alpha 00/15] Improve backend constant generation
@ 2015-08-12  1:11 Richard Henderson
  2015-08-12  1:11 ` [PATCH 01/15] rs6000: Split out rs6000_is_valid_and_mask_wide Richard Henderson
                   ` (16 more replies)
  0 siblings, 17 replies; 35+ messages in thread
From: Richard Henderson @ 2015-08-12  1:11 UTC (permalink / raw)
  To: gcc-patches; +Cc: David Edelsohn, Marcus Shawcroft, Richard Earnshaw

Something last week had me looking at ppc64 code generation,
and some of what I saw was fairly bad.  Fixing it wasn't going
to be easy, because the logic for generating constants wasn't
contained within a single function.

Better is the approach that aarch64 and alpha have taken in
the past: sharing a single function with all of the logic, so
that it can be used for both cost calculation and the actual
emission of the constants.

However, the way that aarch64 and alpha have done it hasn't
been ideal, in that there's a fairly costly search that must
be done every time.  I've thought before about changing this
so that we would be able to cache results, akin to how we do
it in expmed.c for multiplication.

I've implemented such a caching scheme for three targets, as
a test of how much code could be shared.  The answer appears
to be about 100 lines of boilerplate.  Minimal, true, but it
may still be worth it as a way of encouraging backends to do
similar things in a similar way.
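
The caching idea can be sketched outside of GCC.  The cost function
below is the existing num_insns_constant_wide logic (visible in the
patch 04 diff, assuming TARGET_POWERPC64); the direct-mapped cache
around it is a toy stand-in for the genimm-hash.h infrastructure, and
all names here are ours, not GCC's:

```c
#include <assert.h>
#include <stdint.h>

typedef int64_t hwi;  /* stand-in for HOST_WIDE_INT */

/* The existing rs6000 cost computation (num_insns_constant_wide),
   64-bit case only.  */
static int cost_uncached (hwi value)
{
  /* signed 16-bit constant, loadable with addi */
  if ((uint64_t) value + 0x8000 < 0x10000)
    return 1;
  /* shifted 16-bit constant, loadable with addis */
  if ((value & 0xffff) == 0 && (value >> 31 == -1 || value >> 31 == 0))
    return 1;

  hwi low = ((value & 0xffffffff) ^ 0x80000000) - 0x80000000;
  hwi high = value >> 31;
  if (high == 0 || high == -1)
    return 2;
  high >>= 1;
  if (low == 0)
    return cost_uncached (high) + 1;
  if (high == 0)
    return cost_uncached (low) + 1;
  return cost_uncached (high) + cost_uncached (low) + 1;
}

/* Toy direct-mapped cache: repeated queries for the same constant
   (cost hooks, splitters, final emission) share one search.  The
   real infrastructure would also memoize the recipe, not just its
   cost.  */
#define NSLOTS 1024
static struct { hwi value; int cost; } cache[NSLOTS];

static int cost_memo (hwi value)
{
  unsigned slot = ((uint64_t) value * 0x9e3779b97f4a7c15u) >> 54;
  if (cache[slot].cost != 0 && cache[slot].value == value)
    return cache[slot].cost;          /* hit: reuse previous search */
  int c = cost_uncached (value);
  cache[slot].value = value;          /* collisions just overwrite */
  cache[slot].cost = c;
  return c;
}
```

The direct-mapped overwrite-on-collision policy mirrors what expmed.c
does for multiplication costs: losing an entry only costs a repeated
search, never correctness.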

Some notes about ppc64 in particular:

  * Constants aren't split until quite late, preventing all hope of
    CSE'ing portions of the generated code.  My gut feeling is that
    this is in general a mistake, but...

    I did attempt to fix it, and got nothing for my troubles except
    poorer code generation for AND/IOR/XOR with non-trivial constants.

    I'm somewhat surprised that the operands to the logicals aren't
    visible at rtl generation time, given all the work done in gimple.
    And failing that, combine has enough REG_EQUAL notes that it ought
    to be able to put things back together and see the simpler pattern.

    Perhaps there's some other predication or costing error that's
    getting in the way, and it simply wasn't obvious to me.  In any
    case, nothing in this patch set addresses this at all.

  * I go on to add 4 new methods of generating a constant, each of
    which typically saves 2 insns over the current algorithm.  There
    are a couple more that might be useful but...

  * Constants are split *really* late.  In particular, after reload.
    It would be awesome if we could at least have them all split before
    register allocation so that we can arrange to use ADDI and ADDIS when
    that could save a few instructions.  But that does of course mean
    avoiding r0 for the input.  Again, nothing here attempts to change
    when constants are split.

  * This is the only platform for which I bothered collecting any sort
    of performance data:

    As best I can tell, there is a 9% improvement in bootstrap speed
    for ppc64.  That is, 10 minutes off the original 109-minute build.

    For aarch64 and alpha, I simply assumed there would be no loss,
    since the basic search algorithm is unchanged for each.

Comments?  Especially on the shared header?


r~

Cc: David Edelsohn <dje.gcc@gmail.com>
Cc: Marcus Shawcroft <marcus.shawcroft@arm.com>
Cc: Richard Earnshaw <richard.earnshaw@arm.com>

Richard Henderson (15):
  rs6000: Split out rs6000_is_valid_and_mask_wide
  rs6000: Make num_insns_constant_wide static
  rs6000: Tidy num_insns_constant vs CONST_DOUBLE
  rs6000: Implement set_const_data infrastructure
  rs6000: Move constant via mask into build_set_const_data
  rs6000: Use rldiwi in constant construction
  rs6000: Generalize left shift in constant generation
  rs6000: Generalize masking in constant generation
  rs6000: Use xoris in constant construction
  rs6000: Use rotldi in constant generation
  aarch64: Use hashing infrastructure for generating constants
  aarch64: Test for duplicated 32-bit halves
  alpha: Use hashing infrastructure for generating constants
  alpha: Split out alpha_cost_set_const
  alpha: Remove alpha_emit_set_long_const

 gcc/config/aarch64/aarch64.c      | 463 ++++++++++++++++------------
 gcc/config/alpha/alpha.c          | 583 +++++++++++++++++------------------
 gcc/config/rs6000/rs6000-protos.h |   1 -
 gcc/config/rs6000/rs6000.c        | 617 ++++++++++++++++++++++++--------------
 gcc/config/rs6000/rs6000.md       |  15 -
 gcc/genimm-hash.h                 | 122 ++++++++
 6 files changed, 1057 insertions(+), 744 deletions(-)
 create mode 100644 gcc/genimm-hash.h

-- 
2.4.3

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH 05/15] rs6000: Move constant via mask into build_set_const_data
  2015-08-12  1:11 [PATCH ppc64,aarch64,alpha 00/15] Improve backend constant generation Richard Henderson
  2015-08-12  1:11 ` [PATCH 01/15] rs6000: Split out rs6000_is_valid_and_mask_wide Richard Henderson
@ 2015-08-12  1:11 ` Richard Henderson
  2015-08-12  1:12 ` [PATCH 12/15] aarch64: Test for duplicated 32-bit halves Richard Henderson
                   ` (14 subsequent siblings)
  16 siblings, 0 replies; 35+ messages in thread
From: Richard Henderson @ 2015-08-12  1:11 UTC (permalink / raw)
  To: gcc-patches; +Cc: David Edelsohn

No need to special-case this in several places anymore.

Cc: David Edelsohn <dje.gcc@gmail.com>
---
	* config/rs6000/rs6000.c (num_insns_constant): Remove test
	for rs6000_is_valid_and_mask.
	(genimm_ppc::exam_mask): New.
	(genimm_ppc::exam_search): Use it.
	(genimm_ppc::generate): Handle AND.
	* config/rs6000/rs6000.md (rs6000_is_valid_and_mask splitter): Remove.
---
 gcc/config/rs6000/rs6000.c  | 27 ++++++++++++++++++++++-----
 gcc/config/rs6000/rs6000.md | 15 ---------------
 2 files changed, 22 insertions(+), 20 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index a864a7e..6af5cf3 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -5246,11 +5246,7 @@ num_insns_constant (rtx op, machine_mode mode)
   switch (GET_CODE (op))
     {
     case CONST_INT:
-      if ((INTVAL (op) >> 31) != 0 && (INTVAL (op) >> 31) != -1
-	  && rs6000_is_valid_and_mask (op, mode))
-	return 2;
-      else
-	return num_insns_constant_wide (INTVAL (op));
+      return num_insns_constant_wide (INTVAL (op));
 
     case CONST_WIDE_INT:
       {
@@ -8062,6 +8058,7 @@ struct genimm_ppc : genimm_base <rtx_code, 5>
 
   bool exam_simple (HOST_WIDE_INT c, machine_mode, int budget);
   bool exam_sub (HOST_WIDE_INT c, int budget);
+  bool exam_mask (HOST_WIDE_INT c, HOST_WIDE_INT mask, int sub_budget);
   bool exam_search (HOST_WIDE_INT c, int budget);
   void exam_full (HOST_WIDE_INT c);
   void generate (rtx dest, machine_mode mode) const;
@@ -8125,6 +8122,21 @@ genimm_ppc::exam_sub (HOST_WIDE_INT c, int budget)
 	  || (budget > 1 && exam_search (c, budget)));
 }
 
+/* If we are able to construct C within SUB_BUDGET + 1,
+   return true and fill in the recipe.  */
+
+bool
+genimm_ppc::exam_mask (HOST_WIDE_INT c, HOST_WIDE_INT mask, int sub_budget)
+{
+  if (rs6000_is_valid_and_mask_wide (mask, DImode)
+      && exam_sub (c, sub_budget))
+    {
+      opN (AND, mask); /* RLDICL, et al */
+      return true;
+    }
+  return false;
+}
+
 /* The body of the recursive search for C within BUDGET.
    We've already failed exam_simple.  */
 
@@ -8157,6 +8169,10 @@ genimm_ppc::exam_search (HOST_WIDE_INT c, int budget)
 	}
     }
 
+  /* If C is a mask itself, apply it to all ones.  */
+  if (exam_mask (-1, c, sub_budget))
+    return true;
+
   /* Shift the constant left.  */
   test = HOST_WIDE_INT_UC (0xffffffff00000000);
   if ((c & test) == c && exam_sub (c >> 32, sub_budget))
@@ -8209,6 +8225,7 @@ genimm_ppc::generate (rtx dest, machine_mode mode) const
 	  x = op2;
 	  break;
 	case PLUS:
+	case AND:
 	case IOR:
 	case ASHIFT:
 	  x = gen_rtx_fmt_ee (r, mode, op1, op2);
diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 527ad98..9161931 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -7106,21 +7106,6 @@
   [(set_attr "type" "store,load,*,*,*,*,fpstore,fpload,fp,mfjmpr,mtjmpr,*,mftgpr,mffgpr,mftgpr,mffgpr,vecsimple")
    (set_attr "length" "4,4,4,4,4,20,4,4,4,4,4,4,4,4,4,4,4")])
 
-; Some DImode loads are best done as a load of -1 followed by a mask
-; instruction.
-(define_split
-  [(set (match_operand:DI 0 "gpc_reg_operand")
-	(match_operand:DI 1 "const_int_operand"))]
-  "TARGET_POWERPC64
-   && num_insns_constant (operands[1], DImode) > 1
-   && rs6000_is_valid_and_mask (operands[1], DImode)"
-  [(set (match_dup 0)
-	(const_int -1))
-   (set (match_dup 0)
-	(and:DI (match_dup 0)
-		(match_dup 1)))]
-  "")
-
 ;; Split a load of a large constant into the appropriate five-instruction
 ;; sequence.  Handle anything in a constant number of insns.
 ;; When non-easy constants can go in the TOC, this should use
-- 
2.4.3


* [PATCH 01/15] rs6000: Split out rs6000_is_valid_and_mask_wide
  2015-08-12  1:11 [PATCH ppc64,aarch64,alpha 00/15] Improve backend constant generation Richard Henderson
@ 2015-08-12  1:11 ` Richard Henderson
  2015-08-12 13:24   ` Segher Boessenkool
  2015-08-12  1:11 ` [PATCH 05/15] rs6000: Move constant via mask into build_set_const_data Richard Henderson
                   ` (15 subsequent siblings)
  16 siblings, 1 reply; 35+ messages in thread
From: Richard Henderson @ 2015-08-12  1:11 UTC (permalink / raw)
  To: gcc-patches; +Cc: David Edelsohn

This allows testing for a mask without having to call GEN_INT.
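
As a sketch of the pattern (not the rs6000 code itself): the mask test
operates on the raw integer, and the rtx-taking predicate becomes a
thin wrapper, so callers that already hold a HOST_WIDE_INT skip
GEN_INT entirely.  The contiguous-run test below is a simplification;
the real rs6000_is_valid_mask_wide also accepts wrapped masks and
per-mode widths:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Core test on the plain integer: true if the set bits form one
   contiguous run.  (Simplified: the real predicate also handles
   wrapped masks and SImode vs DImode widths.)  */
static bool is_simple_mask_wide (uint64_t val)
{
  if (val == 0)
    return false;
  uint64_t low = val & -val;          /* isolate the lowest set bit */
  return ((val + low) & val) == 0;    /* adding it must clear the run */
}

/* In GCC the rtx-taking version then reduces to a one-line wrapper:
     bool rs6000_is_valid_and_mask (rtx mask, machine_mode mode)
     { return rs6000_is_valid_and_mask_wide (UINTVAL (mask), mode); }  */
```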

Cc: David Edelsohn <dje.gcc@gmail.com>
---
	* config/rs6000/rs6000.c (rs6000_is_valid_mask_wide): Split out from...
	(rs6000_is_valid_mask): ... here.
	(rs6000_is_valid_and_mask_wide): Split out from...
	(rs6000_is_valid_and_mask): ... here.
	(rs6000_is_valid_2insn_and): Use rs6000_is_valid_and_mask_wide.
	(rs6000_emit_2insn_and): Likewise.
	(rs6000_rtx_costs): Likewise.
---
 gcc/config/rs6000/rs6000.c | 76 ++++++++++++++++++++++++++++++----------------
 1 file changed, 50 insertions(+), 26 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index e37ef9f..a33b9d3 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -1108,6 +1108,8 @@ static tree rs6000_handle_altivec_attribute (tree *, tree, tree, int, bool *);
 static tree rs6000_handle_struct_attribute (tree *, tree, tree, int, bool *);
 static tree rs6000_builtin_vectorized_libmass (tree, tree, tree);
 static void rs6000_emit_set_long_const (rtx, HOST_WIDE_INT);
+static bool rs6000_is_valid_and_mask_wide (unsigned HOST_WIDE_INT val,
+					   machine_mode mode);
 static int rs6000_memory_move_cost (machine_mode, reg_class_t, bool);
 static bool rs6000_debug_rtx_costs (rtx, machine_mode, int, int, int *, bool);
 static int rs6000_debug_address_cost (rtx, machine_mode, addr_space_t,
@@ -16378,18 +16380,13 @@ validate_condition_mode (enum rtx_code code, machine_mode mode)
    the single stretch of 1 bits begins; and similarly for B, the bit
    offset where it ends.  */
 
-bool
-rs6000_is_valid_mask (rtx mask, int *b, int *e, machine_mode mode)
+static bool
+rs6000_is_valid_mask_wide (unsigned HOST_WIDE_INT val, int *b, int *e, int n)
 {
-  unsigned HOST_WIDE_INT val = INTVAL (mask);
   unsigned HOST_WIDE_INT bit;
   int nb, ne;
-  int n = GET_MODE_PRECISION (mode);
 
-  if (mode != DImode && mode != SImode)
-    return false;
-
-  if (INTVAL (mask) >= 0)
+  if ((HOST_WIDE_INT)val >= 0)
     {
       bit = val & -val;
       ne = exact_log2 (bit);
@@ -16430,27 +16427,54 @@ rs6000_is_valid_mask (rtx mask, int *b, int *e, machine_mode mode)
   return true;
 }
 
+bool
+rs6000_is_valid_mask (rtx mask, int *b, int *e, machine_mode mode)
+{
+  int n;
+
+  if (mode == DImode)
+    n = 64;
+  else if (mode == SImode)
+    n = 32;
+  else
+    return false;
+
+  unsigned HOST_WIDE_INT val = INTVAL (mask);
+  return rs6000_is_valid_mask_wide (val, b, e, n);
+}
+
 /* Return whether MASK (a CONST_INT) is a valid mask for any rlwinm, rldicl,
    or rldicr instruction, to implement an AND with it in mode MODE.  */
 
-bool
-rs6000_is_valid_and_mask (rtx mask, machine_mode mode)
+static bool
+rs6000_is_valid_and_mask_wide (unsigned HOST_WIDE_INT val, machine_mode mode)
 {
   int nb, ne;
 
-  if (!rs6000_is_valid_mask (mask, &nb, &ne, mode))
-    return false;
+  switch (mode)
+    {
+    case DImode:
+      if (!rs6000_is_valid_mask_wide (val, &nb, &ne, 64))
+	return false;
+      /* For DImode, we need a rldicl, rldicr, or a rlwinm with
+	 mask that does not wrap.  */
+      return (ne == 0 || nb == 63 || (nb < 32 && ne <= nb));
 
-  /* For DImode, we need a rldicl, rldicr, or a rlwinm with mask that
-     does not wrap.  */
-  if (mode == DImode)
-    return (ne == 0 || nb == 63 || (nb < 32 && ne <= nb));
+    case SImode:
+      if (!rs6000_is_valid_mask_wide (val, &nb, &ne, 32))
+	return false;
+      /* For SImode, rlwinm can do everything.  */
+      return (nb < 32 && ne < 32);
 
-  /* For SImode, rlwinm can do everything.  */
-  if (mode == SImode)
-    return (nb < 32 && ne < 32);
+    default:
+      return false;
+    }
+}
 
-  return false;
+bool
+rs6000_is_valid_and_mask (rtx mask, machine_mode mode)
+{
+  return rs6000_is_valid_and_mask_wide (UINTVAL (mask), mode);
 }
 
 /* Return the instruction template for an AND with mask in mode MODE, with
@@ -16739,12 +16763,12 @@ rs6000_is_valid_2insn_and (rtx c, machine_mode mode)
 
   /* Otherwise, fill in the lowest "hole"; if we can do the result with
      one insn, we can do the whole thing with two.  */
-  unsigned HOST_WIDE_INT val = INTVAL (c);
+  unsigned HOST_WIDE_INT val = UINTVAL (c);
   unsigned HOST_WIDE_INT bit1 = val & -val;
   unsigned HOST_WIDE_INT bit2 = (val + bit1) & ~val;
   unsigned HOST_WIDE_INT val1 = (val + bit1) & val;
   unsigned HOST_WIDE_INT bit3 = val1 & -val1;
-  return rs6000_is_valid_and_mask (GEN_INT (val + bit3 - bit2), mode);
+  return rs6000_is_valid_and_mask_wide (val + bit3 - bit2, mode);
 }
 
 /* Emit a potentially record-form instruction, setting DST from SRC.
@@ -16835,10 +16859,10 @@ rs6000_emit_2insn_and (machine_mode mode, rtx *operands, bool expand, int dot)
   unsigned HOST_WIDE_INT mask1 = -bit3 + bit2 - 1;
   unsigned HOST_WIDE_INT mask2 = val + bit3 - bit2;
 
-  gcc_assert (rs6000_is_valid_and_mask (GEN_INT (mask2), mode));
+  gcc_assert (rs6000_is_valid_and_mask_wide (mask2, mode));
 
   /* Two "no-rotate"-and-mask instructions, for SImode.  */
-  if (rs6000_is_valid_and_mask (GEN_INT (mask1), mode))
+  if (rs6000_is_valid_and_mask_wide (mask1, mode))
     {
       gcc_assert (mode == SImode);
 
@@ -16855,7 +16879,7 @@ rs6000_emit_2insn_and (machine_mode mode, rtx *operands, bool expand, int dot)
   /* Two "no-rotate"-and-mask instructions, for DImode: both are rlwinm
      insns; we have to do the first in SImode, because it wraps.  */
   if (mask2 <= 0xffffffff
-      && rs6000_is_valid_and_mask (GEN_INT (mask1), SImode))
+      && rs6000_is_valid_and_mask_wide (mask1, SImode))
     {
       rtx reg = expand ? gen_reg_rtx (mode) : operands[0];
       rtx tmp = gen_rtx_AND (SImode, gen_lowpart (SImode, operands[1]),
@@ -31070,7 +31094,7 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int outer_code,
 
 	  /* rotate-and-mask (no rotate), andi., andis.: 1 insn.  */
 	  HOST_WIDE_INT val = INTVAL (XEXP (x, 1));
-	  if (rs6000_is_valid_and_mask (XEXP (x, 1), mode)
+	  if (rs6000_is_valid_and_mask_wide (val, mode)
 	      || (val & 0xffff) == val
 	      || (val & 0xffff0000) == val
 	      || ((val & 0xffff) == 0 && mode == SImode))
-- 
2.4.3


* [PATCH 14/15] alpha: Split out alpha_cost_set_const
  2015-08-12  1:11 [PATCH ppc64,aarch64,alpha 00/15] Improve backend constant generation Richard Henderson
                   ` (11 preceding siblings ...)
  2015-08-12  1:12 ` [PATCH 03/15] rs6000: Tidy num_insns_constant vs CONST_DOUBLE Richard Henderson
@ 2015-08-12  1:12 ` Richard Henderson
  2015-08-12  1:12 ` [PATCH 07/15] rs6000: Generalize left shift in constant generation Richard Henderson
                   ` (3 subsequent siblings)
  16 siblings, 0 replies; 35+ messages in thread
From: Richard Henderson @ 2015-08-12  1:12 UTC (permalink / raw)
  To: gcc-patches

Now that it's easy, avoid the "no_output" parameter
to alpha_emit_set_const.
---
	* config/alpha/alpha.c (alpha_cost_set_const): New.
	(alpha_legitimate_constant_p): Use it.
	(alpha_emit_set_const): Remove no_output param.
	(alpha_split_const_mov): Update call.
	(alpha_expand_epilogue): Likewise.
---
 gcc/config/alpha/alpha.c | 26 +++++++++++++++++---------
 1 file changed, 17 insertions(+), 9 deletions(-)

diff --git a/gcc/config/alpha/alpha.c b/gcc/config/alpha/alpha.c
index 6933601..cc25250 100644
--- a/gcc/config/alpha/alpha.c
+++ b/gcc/config/alpha/alpha.c
@@ -1998,6 +1998,18 @@ genimm_alpha::generate (rtx dest, machine_mode mode) const
 
 } // anon namespace
 
+/* Return the cost of outputting the constant C in MODE.  */
+
+static int
+alpha_cost_set_const (machine_mode mode, HOST_WIDE_INT c)
+{
+  if (mode == V8QImode || mode == V4HImode || mode == V2SImode)
+    mode = DImode;
+
+  genimm_alpha data = genimm_hash<genimm_alpha>::hash (c, mode);
+  return data.cost;
+}
+
 /* Try to output insns to set TARGET equal to the constant C if it can be
    done in less than N insns.  Do all computations in MODE.  Returns the place
    where the output has been placed if it can be done and the insns have been
@@ -2006,7 +2018,7 @@ genimm_alpha::generate (rtx dest, machine_mode mode) const
 
 static rtx
 alpha_emit_set_const (rtx target, machine_mode origmode,
-		      HOST_WIDE_INT c, int n, bool no_output)
+		      HOST_WIDE_INT c, int n)
 {
   machine_mode mode = origmode;
   if (mode == V8QImode || mode == V4HImode || mode == V2SImode)
@@ -2016,10 +2028,6 @@ alpha_emit_set_const (rtx target, machine_mode origmode,
   if (data.cost > n)
     return NULL;
 
-  /* If we're not emitting output, we only need return a nonnull value.  */
-  if (no_output)
-    return pc_rtx;
-
   if (origmode == mode)
     data.generate (target, mode);
   else
@@ -2132,7 +2140,7 @@ alpha_legitimate_constant_p (machine_mode mode, rtx x)
       mode = DImode;
       gcc_assert (CONST_WIDE_INT_NUNITS (x) == 2);
       i0 = CONST_WIDE_INT_ELT (x, 1);
-      if (alpha_emit_set_const (NULL_RTX, mode, i0, 3, true) == NULL)
+      if (alpha_cost_set_const (mode, i0) > 3)
 	return false;
       i0 = CONST_WIDE_INT_ELT (x, 0);
       goto do_integer;
@@ -2156,7 +2164,7 @@ alpha_legitimate_constant_p (machine_mode mode, rtx x)
 	return true;
       i0 = alpha_extract_integer (x);
     do_integer:
-      return alpha_emit_set_const (NULL_RTX, mode, i0, 3, true) != NULL;
+      return alpha_cost_set_const (mode, i0) <= 3;
 
     default:
       return false;
@@ -2174,7 +2182,7 @@ alpha_split_const_mov (machine_mode mode, rtx *operands)
 
   i0 = alpha_extract_integer (operands[1]);
 
-  temp = alpha_emit_set_const (operands[0], mode, i0, 3, false);
+  temp = alpha_emit_set_const (operands[0], mode, i0, 3);
 
   if (!temp && TARGET_BUILD_CONSTANTS)
     temp = alpha_emit_set_long_const (operands[0], i0);
@@ -8212,7 +8220,7 @@ alpha_expand_epilogue (void)
       else
 	{
 	  rtx tmp = gen_rtx_REG (DImode, 23);
-	  sp_adj2 = alpha_emit_set_const (tmp, DImode, frame_size, 3, false);
+	  sp_adj2 = alpha_emit_set_const (tmp, DImode, frame_size, 3);
 	  if (!sp_adj2)
 	    {
 	      /* We can't drop new things to memory this late, afaik,
-- 
2.4.3


* [PATCH 04/15] rs6000: Implement set_const_data infrastructure
  2015-08-12  1:11 [PATCH ppc64,aarch64,alpha 00/15] Improve backend constant generation Richard Henderson
                   ` (6 preceding siblings ...)
  2015-08-12  1:12 ` [PATCH 09/15] rs6000: Use xoris in constant construction Richard Henderson
@ 2015-08-12  1:12 ` Richard Henderson
  2015-08-12 13:53   ` Segher Boessenkool
  2015-08-12  1:12 ` [PATCH 06/15] rs6000: Use rldiwi in constant construction Richard Henderson
                   ` (8 subsequent siblings)
  16 siblings, 1 reply; 35+ messages in thread
From: Richard Henderson @ 2015-08-12  1:12 UTC (permalink / raw)
  To: gcc-patches; +Cc: David Edelsohn

This implements basic hashing and a minimal-cost search, but no
real new optimizations on the constants.  That said, there is one
code generation change implied by the searching: we'll find
equal or more efficient alternatives to zero-extending a
32-bit unsigned constant with bit 31 set.  E.g.

		0x80000001ul		0xf0f0f0f0ul

before		lis 3,0x8000		lis 3,0xf0f0
		ori 3,3,1		ori 3,3,0xf0f0
		rldicl 3,3,0,32		rldicl 3,3,0,32

after		li 3,1			li 3,0
		oris 3,3,0x8000		oris 3,3,0xf0f0
					ori 3,3,0xf0f0
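
That effect can be modeled in a few lines.  This is *not* the patch's
code: it is a stripped-down, cost-only caricature of the
exam_simple/exam_search budgeted recursion, and unlike the patch it
unconditionally tries the ORIS peel, which is what finds the two-insn
li+oris sequence shown above:

```c
#include <assert.h>
#include <stdint.h>

/* One insn: LI (16-bit signed) or LIS (shifted 16-bit signed).  */
static int simple (int64_t c)
{
  if ((uint64_t) c + 0x8000 < 0x10000)
    return 1;
  if ((c & 0xffff) == 0 && (c >> 31 == 0 || c >> 31 == -1))
    return 1;
  return 0;
}

/* Peel ORI / ORIS / SLDI and recurse with one less insn to spend.
   Returns the insn count, or 0 if C can't be built within BUDGET.  */
static int try_build (int64_t c, int budget)
{
  int n = simple (c);
  if (n || budget <= 1)
    return n;

  if ((c & 0xffff)
      && (n = try_build (c & ~(int64_t) 0xffff, budget - 1)))
    return n + 1;                     /* ORI fills the low 16 bits */
  if ((c & 0xffff0000)
      && (n = try_build (c & ~(int64_t) 0xffff0000, budget - 1)))
    return n + 1;                     /* ORIS fills bits 16..31 */
  if ((c & 0xffffffff) == 0
      && (n = try_build (c >> 32, budget - 1)))
    return n + 1;                     /* SLDI 32 shifts a built value up */
  return 0;
}

/* Iterative deepening over the budget reports the minimal count.  */
static int cost (int64_t c)
{
  for (int budget = 1; budget <= 5; budget++)
    {
      int n = try_build (c, budget);
      if (n)
        return n;
    }
  return 5;  /* worst case: the full 5-insn recipe always works */
}
```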

Cc: David Edelsohn <dje.gcc@gmail.com>
---
	* genimm-hash.h: New file.
	* config/rs6000/rs6000.c: Include it.
	(genimm_ppc): New class.
	(genimm_ppc::genimm_ppc): New.
	(genimm_ppc::set0, genimm_ppc::opN): New.
	(genimm_ppc::exam_simple): New.
	(genimm_ppc::exam_sub): New.
	(genimm_ppc::exam_search): New.
	(genimm_ppc::exam_full): New.
	(genimm_ppc::generate): New.
	(num_insns_constant_wide): Use genimm_hash<genimm_ppc>.
	(rs6000_emit_set_const_1): New.
	(rs6000_emit_set_const): Use it.
	(rs6000_emit_set_long_const): Remove.
---
 gcc/config/rs6000/rs6000.c | 368 +++++++++++++++++++++++++--------------------
 gcc/genimm-hash.h          | 122 +++++++++++++++
 2 files changed, 328 insertions(+), 162 deletions(-)
 create mode 100644 gcc/genimm-hash.h

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index c25aa60..a864a7e 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -83,6 +83,7 @@
 #include "builtins.h"
 #include "context.h"
 #include "tree-pass.h"
+#include "genimm-hash.h"
 #if TARGET_XCOFF
 #include "xcoffout.h"  /* get declarations of xcoff_*_section_name */
 #endif
@@ -1107,7 +1108,6 @@ static tree rs6000_handle_longcall_attribute (tree *, tree, tree, int, bool *);
 static tree rs6000_handle_altivec_attribute (tree *, tree, tree, int, bool *);
 static tree rs6000_handle_struct_attribute (tree *, tree, tree, int, bool *);
 static tree rs6000_builtin_vectorized_libmass (tree, tree, tree);
-static void rs6000_emit_set_long_const (rtx, HOST_WIDE_INT);
 static int num_insns_constant_wide (HOST_WIDE_INT);
 static bool rs6000_is_valid_and_mask_wide (unsigned HOST_WIDE_INT val,
 					   machine_mode mode);
@@ -5238,44 +5238,6 @@ direct_return (void)
   return 0;
 }
 
-/* Return the number of instructions it takes to form a constant in an
-   integer register.  */
-
-static int
-num_insns_constant_wide (HOST_WIDE_INT value)
-{
-  /* signed constant loadable with addi */
-  if (((unsigned HOST_WIDE_INT) value + 0x8000) < 0x10000)
-    return 1;
-
-  /* constant loadable with addis */
-  else if ((value & 0xffff) == 0
-	   && (value >> 31 == -1 || value >> 31 == 0))
-    return 1;
-
-  else if (TARGET_POWERPC64)
-    {
-      HOST_WIDE_INT low  = ((value & 0xffffffff) ^ 0x80000000) - 0x80000000;
-      HOST_WIDE_INT high = value >> 31;
-
-      if (high == 0 || high == -1)
-	return 2;
-
-      high >>= 1;
-
-      if (low == 0)
-	return num_insns_constant_wide (high) + 1;
-      else if (high == 0)
-	return num_insns_constant_wide (low) + 1;
-      else
-	return (num_insns_constant_wide (high)
-		+ num_insns_constant_wide (low) + 1);
-    }
-
-  else
-    return 2;
-}
-
 int
 num_insns_constant (rtx op, machine_mode mode)
 {
@@ -8086,6 +8048,204 @@ rs6000_conditional_register_usage (void)
 }
 
 \f
+namespace {
+
+/* All constants can be constructed in 5 insns.  */
+struct genimm_ppc : genimm_base <rtx_code, 5>
+{
+  static const int max_simple = 2;
+
+  genimm_ppc (HOST_WIDE_INT c);
+
+  void set0 (HOST_WIDE_INT o);
+  void opN (rtx_code r, HOST_WIDE_INT o);
+
+  bool exam_simple (HOST_WIDE_INT c, machine_mode, int budget);
+  bool exam_sub (HOST_WIDE_INT c, int budget);
+  bool exam_search (HOST_WIDE_INT c, int budget);
+  void exam_full (HOST_WIDE_INT c);
+  void generate (rtx dest, machine_mode mode) const;
+};
+
+genimm_ppc::genimm_ppc (HOST_WIDE_INT c)
+  : genimm_base (c)
+{
+#ifdef ENABLE_CHECKING
+  code[0] = code[1] = code[2] = code[3] = code[4] = UNKNOWN;
+  op[0] = op[1] = op[2] = op[3] = op[4] = 0;
+#endif
+}
+
+void
+genimm_ppc::set0 (HOST_WIDE_INT o)
+{
+  cost = 1;
+  code[0] = SET;
+  op[0] = o;
+}
+
+void
+genimm_ppc::opN (rtx_code r, HOST_WIDE_INT o)
+{
+  int n = cost++;
+  gcc_checking_assert (n > 0 && n < max_cost);
+  code[n] = r;
+  op[n] = o;
+}
+
+/* Only handle simple 32-bit sign-extended constants.  */
+
+bool
+genimm_ppc::exam_simple (HOST_WIDE_INT c, machine_mode, int budget)
+{
+  if ((unsigned HOST_WIDE_INT)c + 0x8000 < 0x10000)
+    {
+      set0 (c); /* LI */
+      return true;
+    }
+  if (c >> 31 == -1 || c >> 31 == 0)
+    {
+      int uh0 = c & 0xffff;
+
+      if (uh0 && budget < 2)
+	return false;
+
+      set0 (c - uh0); /* LIS */
+      if (uh0)
+	opN (IOR, uh0);
+      return true;
+    }
+  return false;
+}
+
+bool
+genimm_ppc::exam_sub (HOST_WIDE_INT c, int budget)
+{
+  return (exam_simple (c, DImode, budget)
+	  || (budget > 1 && exam_search (c, budget)));
+}
+
+/* The body of the recursive search for C within BUDGET.
+   We've already failed exam_simple.  */
+
+bool
+genimm_ppc::exam_search (HOST_WIDE_INT c, int budget)
+{
+  const int sub_budget = budget - 1;
+  HOST_WIDE_INT test;
+
+  /* Simple arithmetic on the low 32-bits.  */
+  test = c & 0xffff;
+  if (test != 0)
+    {
+      if (exam_sub (c ^ test, sub_budget))
+	{
+	  opN (IOR, test); /* ORI */
+	  return true;
+	}
+      /* Note that it doesn't help to try ORIS at this point.
+	 Zeroing bits in the middle of C when we know there
+	 are low bits set doesn't produce a simpler constant.  */
+    }
+  else
+    {
+      test = c & 0xffff0000u;
+      if (test != 0 && exam_sub (c ^ test, sub_budget))
+	{
+	  opN (IOR, test); /* ORIS */
+	  return true;
+	}
+    }
+
+  /* Shift the constant left.  */
+  test = HOST_WIDE_INT_UC (0xffffffff00000000);
+  if ((c & test) == c && exam_sub (c >> 32, sub_budget))
+    {
+      opN (ASHIFT, 32); /* SLDI */
+      return true;
+    }
+
+  return false;
+}
+
+/* Only handle full 64-bit constants requiring 5 insns.  */
+
+void
+genimm_ppc::exam_full (HOST_WIDE_INT c)
+{
+  bool ok = exam_simple (c >> 32, DImode, 2);
+  gcc_assert (ok);
+
+  opN (ASHIFT, 32);
+  if (c & 0xffff0000u)
+    opN (IOR, c & 0xffff0000u);
+  if (c & 0xffff)
+    opN (IOR, c & 0xffff);
+}
+
+/* Emit insns for the generated recipe.  */
+
+void
+genimm_ppc::generate (rtx dest, machine_mode mode) const
+{
+  bool do_subtargets = optimize && can_create_pseudo_p ();
+  rtx op1 = NULL;
+  rtx_insn *insn = NULL;
+  int i, n = cost;
+
+  gcc_checking_assert (n >= 1 && n <= max_cost);
+  gcc_checking_assert (code[0] == SET);
+
+  for (i = 0; i < n; ++i)
+    {
+      rtx sub = (do_subtargets && i + 1 < n ? gen_reg_rtx (mode) : dest);
+      rtx op2 = GEN_INT (op[i]);
+      rtx_code r = code[i];
+      rtx x;
+
+      switch (r)
+	{
+	case SET:
+	  x = op2;
+	  break;
+	case PLUS:
+	case IOR:
+	case ASHIFT:
+	  x = gen_rtx_fmt_ee (r, mode, op1, op2);
+	  break;
+	default:
+	  gcc_unreachable ();
+	}
+
+      insn = emit_insn (gen_rtx_SET (sub, x));
+      op1 = sub;
+    }
+
+  if (n > 1)
+    set_unique_reg_note (insn, REG_EQUAL, GEN_INT (value));
+}
+
+} // anon namespace
+
+/* Return the number of instructions it takes to form a constant in an
+   integer register.  */
+
+static int
+num_insns_constant_wide (HOST_WIDE_INT c)
+{
+  machine_mode mode = TARGET_POWERPC64 ? DImode : SImode;
+  genimm_ppc data = genimm_hash<genimm_ppc>::hash (c, mode);
+  return data.cost;
+}
+
+static void
+rs6000_emit_set_const_1 (rtx dest, HOST_WIDE_INT c)
+{
+  machine_mode mode = GET_MODE (dest);
+  genimm_ppc data = genimm_hash<genimm_ppc>::hash (c, mode);
+  data.generate (dest, mode);
+}
+
 /* Output insns to set DEST equal to the constant SOURCE as a series of
    lis, ori and shl instructions and return TRUE.  */
 
@@ -8093,12 +8253,8 @@ bool
 rs6000_emit_set_const (rtx dest, rtx source)
 {
   machine_mode mode = GET_MODE (dest);
-  rtx temp, set;
-  rtx_insn *insn;
-  HOST_WIDE_INT c;
+  HOST_WIDE_INT c = INTVAL (source);
 
-  gcc_checking_assert (CONST_INT_P (source));
-  c = INTVAL (source);
   switch (mode)
     {
     case QImode:
@@ -8107,138 +8263,26 @@ rs6000_emit_set_const (rtx dest, rtx source)
       return true;
 
     case SImode:
-      temp = !can_create_pseudo_p () ? dest : gen_reg_rtx (SImode);
-
-      emit_insn (gen_rtx_SET (copy_rtx (temp),
-			      GEN_INT (c & ~(HOST_WIDE_INT) 0xffff)));
-      emit_insn (gen_rtx_SET (dest,
-			      gen_rtx_IOR (SImode, copy_rtx (temp),
-					   GEN_INT (c & 0xffff))));
       break;
-
     case DImode:
       if (!TARGET_POWERPC64)
 	{
-	  rtx hi, lo;
-
-	  hi = operand_subword_force (copy_rtx (dest), WORDS_BIG_ENDIAN == 0,
-				      DImode);
-	  lo = operand_subword_force (dest, WORDS_BIG_ENDIAN != 0,
-				      DImode);
-	  emit_move_insn (hi, GEN_INT (c >> 32));
-	  c = ((c & 0xffffffff) ^ 0x80000000) - 0x80000000;
-	  emit_move_insn (lo, GEN_INT (c));
+	  rtx hi = operand_subword_force (dest, WORDS_BIG_ENDIAN == 0, DImode);
+	  rtx lo = operand_subword_force (dest, WORDS_BIG_ENDIAN != 0, DImode);
+	  rs6000_emit_set_const_1 (hi, trunc_int_for_mode (c >> 32, SImode));
+	  rs6000_emit_set_const_1 (lo, trunc_int_for_mode (c, SImode));
+	  return true;
 	}
-      else
-	rs6000_emit_set_long_const (dest, c);
       break;
 
     default:
       gcc_unreachable ();
     }
 
-  insn = get_last_insn ();
-  set = single_set (insn);
-  if (! CONSTANT_P (SET_SRC (set)))
-    set_unique_reg_note (insn, REG_EQUAL, GEN_INT (c));
-
+  rs6000_emit_set_const_1 (dest, c);
   return true;
 }
 
-/* Subroutine of rs6000_emit_set_const, handling PowerPC64 DImode.
-   Output insns to set DEST equal to the constant C as a series of
-   lis, ori and shl instructions.  */
-
-static void
-rs6000_emit_set_long_const (rtx dest, HOST_WIDE_INT c)
-{
-  rtx temp;
-  HOST_WIDE_INT ud1, ud2, ud3, ud4;
-
-  ud1 = c & 0xffff;
-  c = c >> 16;
-  ud2 = c & 0xffff;
-  c = c >> 16;
-  ud3 = c & 0xffff;
-  c = c >> 16;
-  ud4 = c & 0xffff;
-
-  if ((ud4 == 0xffff && ud3 == 0xffff && ud2 == 0xffff && (ud1 & 0x8000))
-      || (ud4 == 0 && ud3 == 0 && ud2 == 0 && ! (ud1 & 0x8000)))
-    emit_move_insn (dest, GEN_INT ((ud1 ^ 0x8000) - 0x8000));
-
-  else if ((ud4 == 0xffff && ud3 == 0xffff && (ud2 & 0x8000))
-	   || (ud4 == 0 && ud3 == 0 && ! (ud2 & 0x8000)))
-    {
-      temp = !can_create_pseudo_p () ? dest : gen_reg_rtx (DImode);
-
-      emit_move_insn (ud1 != 0 ? copy_rtx (temp) : dest,
-		      GEN_INT (((ud2 << 16) ^ 0x80000000) - 0x80000000));
-      if (ud1 != 0)
-	emit_move_insn (dest,
-			gen_rtx_IOR (DImode, copy_rtx (temp),
-				     GEN_INT (ud1)));
-    }
-  else if (ud3 == 0 && ud4 == 0)
-    {
-      temp = !can_create_pseudo_p () ? dest : gen_reg_rtx (DImode);
-
-      gcc_assert (ud2 & 0x8000);
-      emit_move_insn (copy_rtx (temp),
-		      GEN_INT (((ud2 << 16) ^ 0x80000000) - 0x80000000));
-      if (ud1 != 0)
-	emit_move_insn (copy_rtx (temp),
-			gen_rtx_IOR (DImode, copy_rtx (temp),
-				     GEN_INT (ud1)));
-      emit_move_insn (dest,
-		      gen_rtx_ZERO_EXTEND (DImode,
-					   gen_lowpart (SImode,
-							copy_rtx (temp))));
-    }
-  else if ((ud4 == 0xffff && (ud3 & 0x8000))
-	   || (ud4 == 0 && ! (ud3 & 0x8000)))
-    {
-      temp = !can_create_pseudo_p () ? dest : gen_reg_rtx (DImode);
-
-      emit_move_insn (copy_rtx (temp),
-		      GEN_INT (((ud3 << 16) ^ 0x80000000) - 0x80000000));
-      if (ud2 != 0)
-	emit_move_insn (copy_rtx (temp),
-			gen_rtx_IOR (DImode, copy_rtx (temp),
-				     GEN_INT (ud2)));
-      emit_move_insn (ud1 != 0 ? copy_rtx (temp) : dest,
-		      gen_rtx_ASHIFT (DImode, copy_rtx (temp),
-				      GEN_INT (16)));
-      if (ud1 != 0)
-	emit_move_insn (dest,
-			gen_rtx_IOR (DImode, copy_rtx (temp),
-				     GEN_INT (ud1)));
-    }
-  else
-    {
-      temp = !can_create_pseudo_p () ? dest : gen_reg_rtx (DImode);
-
-      emit_move_insn (copy_rtx (temp),
-		      GEN_INT (((ud4 << 16) ^ 0x80000000) - 0x80000000));
-      if (ud3 != 0)
-	emit_move_insn (copy_rtx (temp),
-			gen_rtx_IOR (DImode, copy_rtx (temp),
-				     GEN_INT (ud3)));
-
-      emit_move_insn (ud2 != 0 || ud1 != 0 ? copy_rtx (temp) : dest,
-		      gen_rtx_ASHIFT (DImode, copy_rtx (temp),
-				      GEN_INT (32)));
-      if (ud2 != 0)
-	emit_move_insn (ud1 != 0 ? copy_rtx (temp) : dest,
-			gen_rtx_IOR (DImode, copy_rtx (temp),
-				     GEN_INT (ud2 << 16)));
-      if (ud1 != 0)
-	emit_move_insn (dest,
-			gen_rtx_IOR (DImode, copy_rtx (temp),
-				     GEN_INT (ud1)));
-    }
-}
-
 /* Helper for the following.  Get rid of [r+r] memory refs
    in cases where it won't work (TImode, TFmode, TDmode, PTImode).  */
 
diff --git a/gcc/genimm-hash.h b/gcc/genimm-hash.h
new file mode 100644
index 0000000..73752e2
--- /dev/null
+++ b/gcc/genimm-hash.h
@@ -0,0 +1,122 @@
+/* Hash and memoize constant integer construction.
+   Copyright (C) 2015 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it
+under the terms of the GNU General Public License as published by the
+Free Software Foundation; either version 3, or (at your option) any
+later version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT
+ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#ifndef GENIMM_HASH_H
+#define GENIMM_HASH_H
+
+/* The base class for the genimm infrastructure.  */
+
+template <typename CODETYPE, int MAXCOST>
+struct genimm_base
+{
+  static const int max_cost = MAXCOST;
+
+  /* The value we're generating.  */
+  const HOST_WIDE_INT value;
+
+  /* The sequence of operations required to generate the constant.
+     The code[0] will always be SET, with r[0] = GEN_INT (op[0])
+     being the initial value.  Subsequent intermediate results
+     r[i] are a function of code[i], op[i] and r[i-1].  */
+  HOST_WIDE_INT op[max_cost];
+  CODETYPE code[max_cost];
+
+  /* The number of operations required to generate the constant.  */
+  int cost;
+
+  genimm_base(HOST_WIDE_INT c) : value(c), cost(0) { }
+};
+
+/* The "middle" class in the hierarchy should be provided by the backend.
+   It should provide the following functions:
+
+   -- Recognize "simple" values that shouldn't be hashed, but normally
+      also used as the base of any recursive search.  Returns true if
+      the constant can be generated within BUDGET.
+   bool exam_simple (HOST_WIDE_INT c, machine_mode mode, int budget);
+
+   -- The maximum number of insns matched by exam_simple.
+   static const int max_simple;
+
+   -- The full search for a word_mode constant.  Normally recursive,
+      with decreasing values for BUDGET.  Normally written to assume
+      that the constant has failed to match exam_simple.
+   bool exam_search (HOST_WIDE_INT c, int budget);
+
+   -- The fallback generation for the most complex word_mode constants.
+      The recipe built will be the full MAX_COST insns, as we will
+      already have shown that the constant can't be built with fewer.
+   void exam_full (HOST_WIDE_INT c);
+
+   -- Generate code via the stored recipe, placing the final result in DEST.
+   void generate (rtx dest, machine_mode mode) const;
+
+   This backend class is the template argument to genimm_hash.
+*/
+
+template <typename BACKEND>
+class genimm_hash
+{
+  struct hasher : free_ptr_hash<BACKEND>
+  {
+    typedef HOST_WIDE_INT compare_type;
+    static hashval_t hash (BACKEND *i) { return i->value; }
+    static bool equal (BACKEND *a, compare_type b) { return a->value == b; }
+  };
+
+ public:
+  static BACKEND hash (HOST_WIDE_INT c, machine_mode mode);
+};
+
+template <typename BACKEND>
+BACKEND
+genimm_hash<BACKEND>::hash (HOST_WIDE_INT c, machine_mode mode)
+{
+  BACKEND data (c);
+
+  /* Don't hash simple constants.  Just return them immediately.  */
+  if (!data.exam_simple (c, mode, BACKEND::max_simple))
+    {
+      static hash_table<hasher> *htab;
+      if (htab == NULL)
+	htab = new hash_table<hasher> (128);
+
+      BACKEND **slot = htab->find_slot_with_hash (c, c, INSERT);
+      BACKEND *save = *slot;
+
+      /* Return a previously stored result.  */
+      if (save != NULL)
+	return *save;
+
+      /* Search for a solution to C with increasing budget...  */
+      bool ok = false;
+      for (int budget = 2; !ok && budget < BACKEND::max_cost - 1; ++budget)
+	ok = data.exam_search (c, budget);
+
+      /* ... otherwise, we know that C requires the maximum budget.  */
+      if (!ok)
+	data.exam_full (c);
+
+      *slot = new BACKEND (data);
+    }
+
+  return data;
+}
+
+#endif /* GENIMM_HASH_H */
-- 
2.4.3

^ permalink raw reply	[flat|nested] 35+ messages in thread
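
A stand-alone model may help make the caching flow of genimm_hash::hash
concrete.  The sketch below is hypothetical and not GCC code: sext stands
in for GCC's sext_hwi, examine (with made-up fixed costs) stands in for a
real backend's exam_* hooks, and std::unordered_map stands in for the
hash_table used in genimm-hash.h.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Stand-in for GCC's sext_hwi: sign-extend the low BITS bits of C.
static int64_t sext(int64_t c, int bits)
{
  return (int64_t) ((uint64_t) c << (64 - bits)) >> (64 - bits);
}

struct recipe { int cost; };

// Toy stand-in for a backend's exam_* hooks: one insn for a signed
// 16-bit or shifted 16-bit value (LDA/LDAH, li/lis), two for a pair,
// and the maximum budget for everything else.  A real backend instead
// searches budgets 2 .. max_cost-1 recursively.
static recipe examine(int64_t c)
{
  int64_t lo = sext(c, 16);
  int64_t hi = sext(c - lo, 32);
  if (c == lo || c == hi)
    return {1};
  if (c == hi + lo)
    return {2};
  return {5};
}

// Mirror of genimm_hash::hash: "simple" constants bypass the cache;
// anything else is examined once and memoized by value.
static recipe genimm(int64_t c)
{
  recipe r = examine(c);
  if (r.cost <= 2)              // simple: don't hash
    return r;
  static std::unordered_map<int64_t, recipe> cache;
  auto it = cache.find(c);
  if (it == cache.end())
    it = cache.emplace(c, r).first;
  return it->second;
}
```

Only examine() differs per target here; everything around the cache is
the ~100 lines of shared boiler-plate the cover letter mentions.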

* [PATCH 13/15] alpha: Use hashing infrastructure for generating constants
  2015-08-12  1:11 [PATCH ppc64,aarch64,alpha 00/15] Improve backend constant generation Richard Henderson
                   ` (2 preceding siblings ...)
  2015-08-12  1:12 ` [PATCH 12/15] aarch64: Test for duplicated 32-bit halves Richard Henderson
@ 2015-08-12  1:12 ` Richard Henderson
  2015-08-12  1:12 ` [PATCH 15/15] alpha: Remove alpha_emit_set_long_const Richard Henderson
                   ` (12 subsequent siblings)
  16 siblings, 0 replies; 35+ messages in thread
From: Richard Henderson @ 2015-08-12  1:12 UTC (permalink / raw)
  To: gcc-patches

---
	* config/alpha/alpha.c: Include genimm-hash.h.
	(genimm_alpha): New class.
	(genimm_alpha::genimm_alpha): New.
	(genimm_alpha::set0, genimm_alpha::opN): New.
	(genimm_alpha::exam_simple): New.
	(genimm_alpha::exam_sub): New.
	(genimm_alpha::exam_search): Extract from the body of the
	old alpha_emit_set_const_1.
	(genimm_alpha::exam_full): New.
	(genimm_alpha::generate): New.
	(alpha_emit_set_const_1): Remove.
	(alpha_emit_set_const): Rewrite using genimm_hash.
	(alpha_legitimate_constant_p): Use alpha_emit_set_const.
---
 gcc/config/alpha/alpha.c | 497 ++++++++++++++++++++++++-----------------------
 1 file changed, 250 insertions(+), 247 deletions(-)

diff --git a/gcc/config/alpha/alpha.c b/gcc/config/alpha/alpha.c
index ca07cc7..6933601 100644
--- a/gcc/config/alpha/alpha.c
+++ b/gcc/config/alpha/alpha.c
@@ -77,6 +77,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "params.h"
 #include "builtins.h"
 #include "rtl-iter.h"
+#include "genimm-hash.h"
 
 /* This file should be included last.  */
 #include "target-def.h"
@@ -1726,249 +1727,277 @@ alpha_set_memflags (rtx seq, rtx ref)
       gcc_unreachable ();
 }
 \f
-static rtx alpha_emit_set_const (rtx, machine_mode, HOST_WIDE_INT,
-				 int, bool);
+namespace {
 
-/* Internal routine for alpha_emit_set_const to check for N or below insns.
-   If NO_OUTPUT is true, then we only check to see if N insns are possible,
-   and return pc_rtx if successful.  */
+/* All constants can be constructed in 5 insns.  */
+struct genimm_alpha : genimm_base<rtx_code, 5>
+{
+  static const int max_simple = 2;
 
-static rtx
-alpha_emit_set_const_1 (rtx target, machine_mode mode,
-			HOST_WIDE_INT c, int n, bool no_output)
+  genimm_alpha(HOST_WIDE_INT c);
+  void set0 (HOST_WIDE_INT o);
+  void opN (rtx_code r, HOST_WIDE_INT o);
+
+  bool exam_simple (HOST_WIDE_INT c, machine_mode mode, int budget);
+  bool exam_sub (HOST_WIDE_INT c, int budget);
+  bool exam_search (HOST_WIDE_INT c, int budget);
+  void exam_full (HOST_WIDE_INT c);
+  void generate (rtx dest, machine_mode mode) const;
+};
+
+genimm_alpha::genimm_alpha (HOST_WIDE_INT c)
+  : genimm_base (c)
 {
-  HOST_WIDE_INT new_const;
-  int i, bits;
-  /* Use a pseudo if highly optimizing and still generating RTL.  */
-  rtx subtarget
-    = (flag_expensive_optimizations && can_create_pseudo_p () ? 0 : target);
-  rtx temp, insn;
+#ifdef ENABLE_CHECKING
+  code[0] = code[1] = code[2] = code[3] = code[4] = UNKNOWN;
+  op[0] = op[1] = op[2] = op[3] = op[4] = 0;
+#endif
+}
+
+void
+genimm_alpha::set0 (HOST_WIDE_INT o)
+{
+  cost = 1;
+  code[0] = SET;
+  op[0] = o;
+}
+
+void
+genimm_alpha::opN (rtx_code r, HOST_WIDE_INT o)
+{
+  int n = cost++;
+  gcc_checking_assert (n > 0 && n < max_cost);
+  code[n] = r;
+  op[n] = o;
+}
 
-  /* If this is a sign-extended 32-bit constant, we can do this in at most
-     three insns, so do it if we have enough insns left.  */
+/* Only handle simple one and two insn constants.  */
 
-  if (c >> 31 == -1 || c >> 31 == 0)
+bool
+genimm_alpha::exam_simple (HOST_WIDE_INT c, machine_mode mode, int budget)
+{
+  HOST_WIDE_INT lo = sext_hwi (c, 16);
+  HOST_WIDE_INT hi = sext_hwi (c - lo, 32);
+
+  if (c == lo || c == hi)
+    {
+      set0 (c);
+      return true;
+    }
+  if (budget >= 2 && (c == hi + lo || mode == SImode))
     {
-      HOST_WIDE_INT low = ((c & 0xffff) ^ 0x8000) - 0x8000;
-      HOST_WIDE_INT tmp1 = c - low;
-      HOST_WIDE_INT high = (((tmp1 >> 16) & 0xffff) ^ 0x8000) - 0x8000;
-      HOST_WIDE_INT extra = 0;
+      set0 (hi);
+      opN (PLUS, lo);
+      return true;
+    }
+  return false;
+}
 
-      /* If HIGH will be interpreted as negative but the constant is
-	 positive, we must adjust it to do two ldha insns.  */
+bool
+genimm_alpha::exam_sub (HOST_WIDE_INT c, int budget)
+{
+  return (exam_simple (c, DImode, budget)
+	  || (budget > 1 && exam_search (c, budget)));
+}
 
-      if ((high & 0x8000) != 0 && c >= 0)
-	{
-	  extra = 0x4000;
-	  tmp1 -= 0x40000000;
-	  high = ((tmp1 >> 16) & 0xffff) - 2 * ((tmp1 >> 16) & 0x8000);
-	}
+/* The body of the recursive search for C within BUDGET.
+   We've already failed exam_simple.  */
 
-      if (c == low || (low == 0 && extra == 0))
+bool
+genimm_alpha::exam_search (HOST_WIDE_INT c, int budget)
+{
+  const int sub_budget = budget - 1;
+  HOST_WIDE_INT test;
+
+  /* Simple subtractions.  */
+  test = sext_hwi (c, 16);
+  if (test != 0)
+    {
+      if (exam_sub (c - test, sub_budget))
 	{
-	  /* We used to use copy_to_suggested_reg (GEN_INT (c), target, mode)
-	     but that meant that we can't handle INT_MIN on 32-bit machines
-	     (like NT/Alpha), because we recurse indefinitely through
-	     emit_move_insn to gen_movdi.  So instead, since we know exactly
-	     what we want, create it explicitly.  */
-
-	  if (no_output)
-	    return pc_rtx;
-	  if (target == NULL)
-	    target = gen_reg_rtx (mode);
-	  emit_insn (gen_rtx_SET (target, GEN_INT (c)));
-	  return target;
+	  opN (PLUS, test);
+	  return true;
 	}
-      else if (n >= 2 + (extra != 0))
+      /* Note that it doesn't help to try LDAH at this point.
+	 Zeroing bits in the middle of C when we know there
+	 are low bits set doesn't produce a simpler constant.  */
+    }
+  else
+    {
+      test = sext_hwi (c, 32);
+      if (test != 0)
 	{
-	  if (no_output)
-	    return pc_rtx;
-	  if (!can_create_pseudo_p ())
+	  if (exam_sub (c - test, sub_budget))
 	    {
-	      emit_insn (gen_rtx_SET (target, GEN_INT (high << 16)));
-	      temp = target;
+	      opN (PLUS, test);
+	      return true;
 	    }
-	  else
-	    temp = copy_to_suggested_reg (GEN_INT (high << 16),
-					  subtarget, mode);
 
-	  /* As of 2002-02-23, addsi3 is only available when not optimizing.
-	     This means that if we go through expand_binop, we'll try to
-	     generate extensions, etc, which will require new pseudos, which
-	     will fail during some split phases.  The SImode add patterns
-	     still exist, but are not named.  So build the insns by hand.  */
-
-	  if (extra != 0)
+	  /* Form constants between 0x80000000 and 0xfffe0000 by splitting
+	     the LDAH into two.  The ordering of operations here should
+	     prefer to load 0x7fff0000 into a register first, which would
+	     then be sharable by other constant loading sequences.  */
+	  HOST_WIDE_INT uhi = c & HOST_WIDE_INT_UC (0xffff0000);
+	  test = 0x7fff0000;
+	  if (uhi == c && uhi - test <= test)
 	    {
-	      if (! subtarget)
-		subtarget = gen_reg_rtx (mode);
-	      insn = gen_rtx_PLUS (mode, temp, GEN_INT (extra << 16));
-	      insn = gen_rtx_SET (subtarget, insn);
-	      emit_insn (insn);
-	      temp = subtarget;
+	      set0 (test);
+	      opN (PLUS, uhi - test);
+	      return true;
 	    }
-
-	  if (target == NULL)
-	    target = gen_reg_rtx (mode);
-	  insn = gen_rtx_PLUS (mode, temp, GEN_INT (low));
-	  insn = gen_rtx_SET (target, insn);
-	  emit_insn (insn);
-	  return target;
 	}
     }
 
-  /* If we couldn't do it that way, try some other methods.  But if we have
-     no instructions left, don't bother.  Likewise, if this is SImode and
-     we can't make pseudos, we can't do anything since the expand_binop
-     and expand_unop calls will widen and try to make pseudos.  */
-
-  if (n == 1 || (mode == SImode && !can_create_pseudo_p ()))
-    return 0;
-
-  /* Next, see if we can load a related constant and then shift and possibly
-     negate it to get the constant we want.  Try this once each increasing
-     numbers of insns.  */
+  /* Try complementing.  */
+  if (exam_sub (~c, sub_budget))
+    {
+      opN (NOT, 0);
+      return true;
+    }
 
-  for (i = 1; i < n; i++)
+  /* Try to form a constant and do a left shift.  We can do this
+     if some low-order bits are zero.  The bits we are shifting out
+     could be any value, but here we'll just try the zero- and sign-
+     extended forms of the constant.  To try to increase the chance
+     of having the same constant in more than one insn, start at the
+     highest number of bits to shift, but try all possibilities in
+     case a ZAPNOT will be useful.  */
+  for (int bits = ctz_hwi (c); bits > 0; --bits)
     {
-      /* First, see if minus some low bits, we've an easy load of
-	 high bits.  */
+      /* First try with ones being shifted out.  */
+      test = c >> bits;
+      bool ok = exam_sub (test, sub_budget);
+      if (!ok && c < 0)
+	{
+	  /* If that failed, try with zeros being shifted out.  */
+	  test = (unsigned HOST_WIDE_INT)c >> bits;
+	  ok = exam_sub (test, sub_budget);
+	}
+      if (ok)
+	{
+	  opN (ASHIFT, bits);
+	  return true;
+	}
+    }
 
-      new_const = ((c & 0xffff) ^ 0x8000) - 0x8000;
-      if (new_const != 0)
+  /* Try a right shift, shifting in zeros.  */
+  for (int bits = clz_hwi (c); bits > 0; --bits)
+    {
+      test = c << bits;
+      bool ok = exam_sub (test, sub_budget);
+      if (!ok)
 	{
-          temp = alpha_emit_set_const (subtarget, mode, c - new_const, i, no_output);
-	  if (temp)
-	    {
-	      if (no_output)
-		return temp;
-	      return expand_binop (mode, add_optab, temp, GEN_INT (new_const),
-				   target, 0, OPTAB_WIDEN);
-	    }
+	  test |= (HOST_WIDE_INT_1U << bits) - 1;
+	  ok = exam_sub (test, sub_budget);
+	}
+      if (ok)
+	{
+	  opN (LSHIFTRT, bits);
+	  return true;
 	}
+    }
 
-      /* Next try complementing.  */
-      temp = alpha_emit_set_const (subtarget, mode, ~c, i, no_output);
-      if (temp)
+  /* Try a right shift, shifting in ones.  */
+  for (int bits = clz_hwi (~c) - 1; bits > 0; --bits)
+    {
+      test = c << bits;
+      bool ok = exam_sub (test, sub_budget);
+      if (!ok)
 	{
-	  if (no_output)
-	    return temp;
-	  return expand_unop (mode, one_cmpl_optab, temp, target, 0);
+	  test |= (HOST_WIDE_INT_1U << bits) - 1;
+	  ok = exam_sub (test, sub_budget);
 	}
+      if (ok)
+	{
+	  opN (ASHIFTRT, bits);
+	  return true;
+	}
+    }
 
-      /* Next try to form a constant and do a left shift.  We can do this
-	 if some low-order bits are zero; the exact_log2 call below tells
-	 us that information.  The bits we are shifting out could be any
-	 value, but here we'll just try the 0- and sign-extended forms of
-	 the constant.  To try to increase the chance of having the same
-	 constant in more than one insn, start at the highest number of
-	 bits to shift, but try all possibilities in case a ZAPNOT will
-	 be useful.  */
-
-      bits = exact_log2 (c & -c);
-      if (bits > 0)
-	for (; bits > 0; bits--)
-	  {
-	    new_const = c >> bits;
-	    temp = alpha_emit_set_const (subtarget, mode, new_const, i, no_output);
-	    if (!temp && c < 0)
-	      {
-		new_const = (unsigned HOST_WIDE_INT)c >> bits;
-		temp = alpha_emit_set_const (subtarget, mode, new_const,
-					     i, no_output);
-	      }
-	    if (temp)
-	      {
-		if (no_output)
-		  return temp;
-	        return expand_binop (mode, ashl_optab, temp, GEN_INT (bits),
-				     target, 0, OPTAB_WIDEN);
-	      }
-	  }
+  /* Try loading a value and zapping bytes.  */
+  test = c;
+  for (int i = 0; i < 64; i += 8)
+    {
+      HOST_WIDE_INT byte = HOST_WIDE_INT_UC (0xff) << i;
+      if ((c & byte) == 0)
+	test |= byte;
+    }
+  if (test != c && exam_sub (test, sub_budget))
+    {
+      opN (AND, c | ~test);
+      return true;
+    }
 
-      /* Now try high-order zero bits.  Here we try the shifted-in bits as
-	 all zero and all ones.  Be careful to avoid shifting outside the
-	 mode and to avoid shifting outside the host wide int size.  */
+  return false;
+}
 
-      bits = (MIN (HOST_BITS_PER_WIDE_INT, GET_MODE_SIZE (mode) * 8)
-	      - floor_log2 (c) - 1);
-      if (bits > 0)
-	for (; bits > 0; bits--)
-	  {
-	    new_const = c << bits;
-	    temp = alpha_emit_set_const (subtarget, mode, new_const, i, no_output);
-	    if (!temp)
-	      {
-		new_const = (c << bits) | ((HOST_WIDE_INT_1U << bits) - 1);
-	        temp = alpha_emit_set_const (subtarget, mode, new_const,
-					     i, no_output);
-	      }
-	    if (temp)
-	      {
-		if (no_output)
-		  return temp;
-		return expand_binop (mode, lshr_optab, temp, GEN_INT (bits),
-				     target, 1, OPTAB_WIDEN);
-	      }
-	  }
+/* Only handle full 64-bit constants requiring 5 insns.  */
 
-      /* Now try high-order 1 bits.  We get that with a sign-extension.
-	 But one bit isn't enough here.  Be careful to avoid shifting outside
-	 the mode and to avoid shifting outside the host wide int size.  */
+void
+genimm_alpha::exam_full (HOST_WIDE_INT c)
+{
+  HOST_WIDE_INT lo = sext_hwi (c, 16);
+  HOST_WIDE_INT hi = sext_hwi (c - lo, 32);
 
-      bits = (MIN (HOST_BITS_PER_WIDE_INT, GET_MODE_SIZE (mode) * 8)
-	      - floor_log2 (~ c) - 2);
-      if (bits > 0)
-	for (; bits > 0; bits--)
-	  {
-	    new_const = c << bits;
-	    temp = alpha_emit_set_const (subtarget, mode, new_const, i, no_output);
-	    if (!temp)
-	      {
-		new_const = (c << bits) | ((HOST_WIDE_INT_1U << bits) - 1);
-	        temp = alpha_emit_set_const (subtarget, mode, new_const,
-					     i, no_output);
-	      }
-	    if (temp)
-	      {
-		if (no_output)
-		  return temp;
-		return expand_binop (mode, ashr_optab, temp, GEN_INT (bits),
-				     target, 0, OPTAB_WIDEN);
-	      }
-	  }
-    }
+  bool ok = exam_simple ((c - hi - lo) >> 32, DImode, max_simple);
+  gcc_assert (ok);
 
-  /* Finally, see if can load a value into the target that is the same as the
-     constant except that all bytes that are 0 are changed to be 0xff.  If we
-     can, then we can do a ZAPNOT to obtain the desired constant.  */
+  opN (ASHIFT, 32);
+  if (hi)
+    opN (PLUS, hi);
+  if (lo)
+    opN (PLUS, lo);
+}
 
-  new_const = c;
-  for (i = 0; i < 64; i += 8)
-    if ((new_const & ((HOST_WIDE_INT) 0xff << i)) == 0)
-      new_const |= (HOST_WIDE_INT) 0xff << i;
+/* Emit insns for the generated recipe.  */
 
-  /* We are only called for SImode and DImode.  If this is SImode, ensure that
-     we are sign extended to a full word.  */
+void
+genimm_alpha::generate (rtx dest, machine_mode mode) const
+{
+  bool do_subtargets = optimize && can_create_pseudo_p ();
+  rtx op1 = NULL;
+  rtx_insn *insn = NULL;
+  int i, n = cost;
 
-  if (mode == SImode)
-    new_const = ((new_const & 0xffffffff) ^ 0x80000000) - 0x80000000;
+  gcc_checking_assert (n >= 1 && n <= max_cost);
+  gcc_checking_assert (code[0] == SET);
 
-  if (new_const != c)
+  for (i = 0; i < n; ++i)
     {
-      temp = alpha_emit_set_const (subtarget, mode, new_const, n - 1, no_output);
-      if (temp)
+      rtx sub = (do_subtargets && i + 1 < n ? gen_reg_rtx (mode) : dest);
+      rtx op2 = GEN_INT (op[i]);
+      rtx_code r = code[i];
+      rtx x;
+
+      switch (r)
 	{
-	  if (no_output)
-	    return temp;
-	  return expand_binop (mode, and_optab, temp, GEN_INT (c | ~ new_const),
-			       target, 0, OPTAB_WIDEN);
+	case SET:
+	  x = op2;
+	  break;
+	case AND:
+	case PLUS:
+	case ASHIFT:
+	case ASHIFTRT:
+	case LSHIFTRT:
+	  x = gen_rtx_fmt_ee (r, mode, op1, op2);
+	  break;
+	case NOT:
+	  x = gen_rtx_fmt_e (r, mode, op1);
+	  break;
+	default:
+	  gcc_unreachable ();
 	}
+
+      insn = emit_insn (gen_rtx_SET (sub, x));
+      op1 = sub;
     }
 
-  return 0;
+  if (n > 1)
+    set_unique_reg_note (insn, REG_EQUAL, GEN_INT (value));
 }
 
+} // anon namespace
+
 /* Try to output insns to set TARGET equal to the constant C if it can be
    done in less than N insns.  Do all computations in MODE.  Returns the place
    where the output has been placed if it can be done and the insns have been
@@ -1976,62 +2005,36 @@ alpha_emit_set_const_1 (rtx target, machine_mode mode,
    insns and emitted.  */
 
 static rtx
-alpha_emit_set_const (rtx target, machine_mode mode,
+alpha_emit_set_const (rtx target, machine_mode origmode,
 		      HOST_WIDE_INT c, int n, bool no_output)
 {
-  machine_mode orig_mode = mode;
-  rtx orig_target = target;
-  rtx result = 0;
-  int i;
+  machine_mode mode = origmode;
+  if (mode == V8QImode || mode == V4HImode || mode == V2SImode)
+    mode = DImode;
 
-  /* If we can't make any pseudos, TARGET is an SImode hard register, we
-     can't load this constant in one insn, do this in DImode.  */
-  if (!can_create_pseudo_p () && mode == SImode
-      && REG_P (target) && REGNO (target) < FIRST_PSEUDO_REGISTER)
-    {
-      result = alpha_emit_set_const_1 (target, mode, c, 1, no_output);
-      if (result)
-	return result;
+  genimm_alpha data = genimm_hash<genimm_alpha>::hash (c, mode);
+  if (data.cost > n)
+    return NULL;
 
-      target = no_output ? NULL : gen_lowpart (DImode, target);
-      mode = DImode;
-    }
-  else if (mode == V8QImode || mode == V4HImode || mode == V2SImode)
-    {
-      target = no_output ? NULL : gen_lowpart (DImode, target);
-      mode = DImode;
-    }
+  /* If we're not emitting output, we need only return a nonnull value.  */
+  if (no_output)
+    return pc_rtx;
 
-  /* Try 1 insn, then 2, then up to N.  */
-  for (i = 1; i <= n; i++)
+  if (origmode == mode)
+    data.generate (target, mode);
+  else
     {
-      result = alpha_emit_set_const_1 (target, mode, c, i, no_output);
-      if (result)
+      rtx low = gen_lowpart (mode, target);
+      if (can_create_pseudo_p ())
 	{
-	  rtx_insn *insn;
-	  rtx set;
-
-	  if (no_output)
-	    return result;
-
-	  insn = get_last_insn ();
-	  set = single_set (insn);
-	  if (! CONSTANT_P (SET_SRC (set)))
-	    set_unique_reg_note (get_last_insn (), REG_EQUAL, GEN_INT (c));
-	  break;
+	  target = gen_reg_rtx (mode);
+	  data.generate (target, mode);
+	  emit_move_insn (low, target);
 	}
+      else
+	data.generate (low, mode);
     }
-
-  /* Allow for the case where we changed the mode of TARGET.  */
-  if (result)
-    {
-      if (result == target)
-	result = orig_target;
-      else if (mode != orig_mode)
-	result = gen_lowpart (orig_mode, result);
-    }
-
-  return result;
+  return target;
 }
 
 /* Having failed to find a 3 insn sequence in alpha_emit_set_const,
@@ -2129,7 +2132,7 @@ alpha_legitimate_constant_p (machine_mode mode, rtx x)
       mode = DImode;
       gcc_assert (CONST_WIDE_INT_NUNITS (x) == 2);
       i0 = CONST_WIDE_INT_ELT (x, 1);
-      if (alpha_emit_set_const_1 (NULL_RTX, mode, i0, 3, true) == NULL)
+      if (alpha_emit_set_const (NULL_RTX, mode, i0, 3, true) == NULL)
 	return false;
       i0 = CONST_WIDE_INT_ELT (x, 0);
       goto do_integer;
@@ -2153,7 +2156,7 @@ alpha_legitimate_constant_p (machine_mode mode, rtx x)
 	return true;
       i0 = alpha_extract_integer (x);
     do_integer:
-      return alpha_emit_set_const_1 (NULL_RTX, mode, i0, 3, true) != NULL;
+      return alpha_emit_set_const (NULL_RTX, mode, i0, 3, true) != NULL;
 
     default:
       return false;
-- 
2.4.3

^ permalink raw reply	[flat|nested] 35+ messages in thread
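
The exam_full fallback above always reconstructs C exactly: after
peeling off the LDA part (lo) and the LDAH part (hi), the remainder is
an exact multiple of 2**32, which is loaded first and shifted into
place.  A hypothetical numeric model of that recipe, not GCC code
(sext approximates sext_hwi, build_full is an invented name):

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for GCC's sext_hwi.
static int64_t sext(int64_t c, int bits)
{
  return (int64_t) ((uint64_t) c << (64 - bits)) >> (64 - bits);
}

// Numeric model of genimm_alpha::exam_full: LO is the final LDA
// immediate, HI the final LDAH immediate, and what remains is an
// exact multiple of 2**32, loaded first and shifted into place.
static int64_t build_full(int64_t c)
{
  int64_t lo = sext(c, 16);
  int64_t hi = sext(c - lo, 32);
  int64_t top = (c - hi - lo) >> 32;             // seed: itself 1-2 insns
  int64_t r = (int64_t) ((uint64_t) top << 32);  // ASHIFT 32
  if (hi)
    r += hi;                                     // PLUS hi  (ldah)
  if (lo)
    r += lo;                                     // PLUS lo  (lda)
  return r;
}
```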

* [PATCH 15/15] alpha: Remove alpha_emit_set_long_const
  2015-08-12  1:11 [PATCH ppc64,aarch64,alpha 00/15] Improve backend constant generation Richard Henderson
                   ` (3 preceding siblings ...)
  2015-08-12  1:12 ` [PATCH 13/15] alpha: Use hashing infrastructure for generating constants Richard Henderson
@ 2015-08-12  1:12 ` Richard Henderson
  2015-08-12  1:12 ` [PATCH 02/15] rs6000: Make num_insns_constant_wide static Richard Henderson
                   ` (11 subsequent siblings)
  16 siblings, 0 replies; 35+ messages in thread
From: Richard Henderson @ 2015-08-12  1:12 UTC (permalink / raw)
  To: gcc-patches

Its functionality is now folded into alpha_emit_set_const.
---
	* config/alpha/alpha.c (alpha_emit_set_long_const): Remove.
	(alpha_split_const_mov): Don't call it.
	(alpha_expand_epilogue): Likewise.
	(alpha_output_mi_thunk_osf): Likewise.
---
 gcc/config/alpha/alpha.c | 76 ++++++++----------------------------------------
 1 file changed, 12 insertions(+), 64 deletions(-)

diff --git a/gcc/config/alpha/alpha.c b/gcc/config/alpha/alpha.c
index cc25250..c0d8900 100644
--- a/gcc/config/alpha/alpha.c
+++ b/gcc/config/alpha/alpha.c
@@ -2045,49 +2045,6 @@ alpha_emit_set_const (rtx target, machine_mode origmode,
   return target;
 }
 
-/* Having failed to find a 3 insn sequence in alpha_emit_set_const,
-   fall back to a straight forward decomposition.  We do this to avoid
-   exponential run times encountered when looking for longer sequences
-   with alpha_emit_set_const.  */
-
-static rtx
-alpha_emit_set_long_const (rtx target, HOST_WIDE_INT c1)
-{
-  HOST_WIDE_INT d1, d2, d3, d4;
-
-  /* Decompose the entire word */
-
-  d1 = ((c1 & 0xffff) ^ 0x8000) - 0x8000;
-  c1 -= d1;
-  d2 = ((c1 & 0xffffffff) ^ 0x80000000) - 0x80000000;
-  c1 = (c1 - d2) >> 32;
-  d3 = ((c1 & 0xffff) ^ 0x8000) - 0x8000;
-  c1 -= d3;
-  d4 = ((c1 & 0xffffffff) ^ 0x80000000) - 0x80000000;
-  gcc_assert (c1 == d4);
-
-  /* Construct the high word */
-  if (d4)
-    {
-      emit_move_insn (target, GEN_INT (d4));
-      if (d3)
-	emit_move_insn (target, gen_rtx_PLUS (DImode, target, GEN_INT (d3)));
-    }
-  else
-    emit_move_insn (target, GEN_INT (d3));
-
-  /* Shift it into place */
-  emit_move_insn (target, gen_rtx_ASHIFT (DImode, target, GEN_INT (32)));
-
-  /* Add in the low bits.  */
-  if (d2)
-    emit_move_insn (target, gen_rtx_PLUS (DImode, target, GEN_INT (d2)));
-  if (d1)
-    emit_move_insn (target, gen_rtx_PLUS (DImode, target, GEN_INT (d1)));
-
-  return target;
-}
-
 /* Given an integral CONST_INT or CONST_VECTOR, return the low 64 bits.  */
 
 static HOST_WIDE_INT
@@ -2177,16 +2134,10 @@ alpha_legitimate_constant_p (machine_mode mode, rtx x)
 bool
 alpha_split_const_mov (machine_mode mode, rtx *operands)
 {
-  HOST_WIDE_INT i0;
-  rtx temp = NULL_RTX;
-
-  i0 = alpha_extract_integer (operands[1]);
-
-  temp = alpha_emit_set_const (operands[0], mode, i0, 3);
-
-  if (!temp && TARGET_BUILD_CONSTANTS)
-    temp = alpha_emit_set_long_const (operands[0], i0);
-
+  HOST_WIDE_INT i0 = alpha_extract_integer (operands[1]);
+  rtx temp = alpha_emit_set_const (operands[0], mode, i0,
+				   TARGET_BUILD_CONSTANTS
+				   ? genimm_alpha::max_cost : 3);
   if (temp)
     {
       if (!rtx_equal_p (operands[0], temp))
@@ -8220,14 +8171,10 @@ alpha_expand_epilogue (void)
       else
 	{
 	  rtx tmp = gen_rtx_REG (DImode, 23);
-	  sp_adj2 = alpha_emit_set_const (tmp, DImode, frame_size, 3);
-	  if (!sp_adj2)
-	    {
-	      /* We can't drop new things to memory this late, afaik,
-		 so build it up by pieces.  */
-	      sp_adj2 = alpha_emit_set_long_const (tmp, frame_size);
-	      gcc_assert (sp_adj2);
-	    }
+	  /* We can't drop new things to memory this late, afaik,
+	     so force the constant to be built by pieces.  */
+	  sp_adj2 = alpha_emit_set_const (tmp, DImode, frame_size,
+					  genimm_alpha::max_cost);
 	}
 
       /* From now on, things must be in order.  So emit blockages.  */
@@ -8352,7 +8299,8 @@ alpha_output_mi_thunk_osf (FILE *file, tree thunk_fndecl ATTRIBUTE_UNUSED,
     }
   else
     {
-      rtx tmp = alpha_emit_set_long_const (gen_rtx_REG (Pmode, 0), delta);
+      rtx tmp = alpha_emit_set_const (gen_rtx_REG (Pmode, 0), Pmode, delta,
+				      genimm_alpha::max_cost);
       emit_insn (gen_adddi3 (this_rtx, this_rtx, tmp));
     }
 
@@ -8373,8 +8321,8 @@ alpha_output_mi_thunk_osf (FILE *file, tree thunk_fndecl ATTRIBUTE_UNUSED,
 	}
       else
 	{
-	  tmp2 = alpha_emit_set_long_const (gen_rtx_REG (Pmode, 1),
-					    vcall_offset);
+	  tmp2 = alpha_emit_set_const (gen_rtx_REG (Pmode, 1), Pmode,
+				       vcall_offset, genimm_alpha::max_cost);
           emit_insn (gen_adddi3 (tmp, tmp, tmp2));
 	  lo = 0;
 	}
-- 
2.4.3

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH 10/15] rs6000: Use rotldi in constant generation
  2015-08-12  1:11 [PATCH ppc64,aarch64,alpha 00/15] Improve backend constant generation Richard Henderson
                   ` (13 preceding siblings ...)
  2015-08-12  1:12 ` [PATCH 07/15] rs6000: Generalize left shift in constant generation Richard Henderson
@ 2015-08-12  1:12 ` Richard Henderson
  2015-08-12  8:32 ` [PATCH ppc64,aarch64,alpha 00/15] Improve backend " Segher Boessenkool
  2015-08-12  8:32 ` Richard Earnshaw
  16 siblings, 0 replies; 35+ messages in thread
From: Richard Henderson @ 2015-08-12  1:12 UTC (permalink / raw)
  To: gcc-patches; +Cc: David Edelsohn

Helps for constants like 0x3300000000000033ul and 0xf5555ffffffffffful.

Cc: David Edelsohn <dje.gcc@gmail.com>
---
	* config/rs6000/rs6000.c (genimm_ppc::exam_rotl): New.
	(genimm_ppc::exam_search): Use it.
	(genimm_ppc::generate): Handle ROTATE.
---
 gcc/config/rs6000/rs6000.c | 51 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 51 insertions(+)

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 40b29b0..f5d6fdf 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -8059,6 +8059,7 @@ struct genimm_ppc : genimm_base <rtx_code, 5>
   bool exam_simple (HOST_WIDE_INT c, machine_mode, int budget);
   bool exam_sub (HOST_WIDE_INT c, int budget);
   bool exam_mask (HOST_WIDE_INT c, HOST_WIDE_INT mask, int sub_budget);
+  bool exam_rotl (HOST_WIDE_INT c, int bits);
   bool exam_search (HOST_WIDE_INT c, int budget);
   void exam_full (HOST_WIDE_INT c);
   void generate (rtx dest, machine_mode mode) const;
@@ -8137,6 +8138,24 @@ genimm_ppc::exam_mask (HOST_WIDE_INT c, HOST_WIDE_INT mask, int sub_budget)
   return false;
 }
 
+/* If we're able to rotate a 16-bit signed constant to form C,
+   return true and fill in the recipe.  */
+
+bool
+genimm_ppc::exam_rotl (HOST_WIDE_INT c, int bits)
+{
+  HOST_WIDE_INT sub_c = (unsigned HOST_WIDE_INT)c >> bits;
+  sub_c |= (unsigned HOST_WIDE_INT)c << (64 - bits);
+
+  if ((unsigned HOST_WIDE_INT)sub_c + 0x8000 < 0x10000)
+    {
+      set0 (sub_c);
+      opN (ROTATE, bits);
+      return true;
+    }
+  return false;
+}
+
 /* The body of the recursive search for C within BUDGET.
    We've already failed exam_simple.  */
 
@@ -8212,6 +8231,35 @@ genimm_ppc::exam_search (HOST_WIDE_INT c, int budget)
 	}
     }
 
+  /* Rotate the constant left.  Because of combinatorial complexity,
+     only consider this with a 16-bit base, which means there's no
+     point in considering this high in the search tree.  */
+  if (sub_budget == 1)
+    {
+      /* The 16-bit signed constant upon which we are going to base
+	 the rotate can have no more than 15 bits set as a positive
+	 or no less than 49 bits set as a negative.  */
+      bits = popcount_hwi (c);
+      if (bits <= 15)
+	{
+	  /* The constant must be positive, and it must have failed
+	     the simpler shift test above.  Therefore, any success
+	     will be with a rotate of more than 48.  */
+	  bits = ctz_hwi (c & ~ HOST_WIDE_INT_UC (0x7fff));
+	  if (bits > 48 && exam_rotl (c, bits))
+	    return true;
+	}
+      else if (bits > 48)
+	{
+	  /* The constant must be negative, and it must have rotated
+	     copies of the sign bit around into the low order bits.
+	     Those copies must be the number of rotations.  */
+	  bits = ctz_hwi (~c);
+	  if (exam_rotl (c, bits))
+	    return true;
+	}
+    }
+
   return false;
 }
 
@@ -8255,6 +8303,9 @@ genimm_ppc::generate (rtx dest, machine_mode mode) const
 	case SET:
 	  x = op2;
 	  break;
+	case ROTATE:
+	  gcc_assert (mode == DImode);
+	  /* FALLTHRU */
 	case PLUS:
 	case AND:
 	case IOR:
-- 
2.4.3

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH 09/15] rs6000: Use xoris in constant construction
  2015-08-12  1:11 [PATCH ppc64,aarch64,alpha 00/15] Improve backend constant generation Richard Henderson
                   ` (5 preceding siblings ...)
  2015-08-12  1:12 ` [PATCH 02/15] rs6000: Make num_insns_constant_wide static Richard Henderson
@ 2015-08-12  1:12 ` Richard Henderson
  2015-08-12  1:12 ` [PATCH 04/15] rs6000: Implement set_const_data infrastructure Richard Henderson
                   ` (9 subsequent siblings)
  16 siblings, 0 replies; 35+ messages in thread
From: Richard Henderson @ 2015-08-12  1:12 UTC (permalink / raw)
  To: gcc-patches; +Cc: David Edelsohn

Helps for constants like 0xfffff70008000ul, 0xffffffff55555555ul,
0xffffffff550ffffful.

There doesn't appear to be any benefit to using xori; every test that
I expected to use it found an alternate solution of the same cost.

Cc: David Edelsohn <dje.gcc@gmail.com>
---
	* config/rs6000/rs6000.c (genimm_ppc::exam_search): Test for
	inverting the second half-word.
	(genimm_ppc::generate): Handle XOR.
---
 gcc/config/rs6000/rs6000.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 9c08cca..40b29b0 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -8169,6 +8169,11 @@ genimm_ppc::exam_search (HOST_WIDE_INT c, int budget)
 	  return true;
 	}
     }
+  if (exam_sub (c | 0xffff0000u, sub_budget))
+    {
+      opN (XOR, ~c & 0xffff0000u); /* XORIS */
+      return true;
+    }
 
   /* If C is a mask itself, apply it to all ones.  */
   if (exam_mask (-1, c, sub_budget))
@@ -8253,6 +8258,7 @@ genimm_ppc::generate (rtx dest, machine_mode mode) const
 	case PLUS:
 	case AND:
 	case IOR:
+	case XOR:
 	case ASHIFT:
 	  x = gen_rtx_fmt_ee (r, mode, op1, op2);
 	  break;
-- 
2.4.3

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH 03/15] rs6000: Tidy num_insns_constant vs CONST_DOUBLE
  2015-08-12  1:11 [PATCH ppc64,aarch64,alpha 00/15] Improve backend constant generation Richard Henderson
                   ` (10 preceding siblings ...)
  2015-08-12  1:12 ` [PATCH 08/15] rs6000: Generalize masking in constant generation Richard Henderson
@ 2015-08-12  1:12 ` Richard Henderson
  2015-08-12  1:12 ` [PATCH 14/15] alpha: Split out alpha_cost_set_const Richard Henderson
                   ` (4 subsequent siblings)
  16 siblings, 0 replies; 35+ messages in thread
From: Richard Henderson @ 2015-08-12  1:12 UTC (permalink / raw)
  To: gcc-patches; +Cc: David Edelsohn

After the fact I noticed that this temporarily removes the
rs6000_is_valid_and_mask test from the DFmode path, but that
will be fixed shortly.  It was missing from the SFmode path
the whole time.

Cc: David Edelsohn <dje.gcc@gmail.com>
---
	* config/rs6000/rs6000.c (num_insns_constant): Share code between
	single-precision and double-precision paths.  Form 64-bit constant
	for 64-bit target.
---
 gcc/config/rs6000/rs6000.c | 54 ++++++++++++++++++----------------------------
 1 file changed, 21 insertions(+), 33 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index abaf7eb..c25aa60 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -5299,50 +5299,38 @@ num_insns_constant (rtx op, machine_mode mode)
 	return ins;
       }
 
-      case CONST_DOUBLE:
+    case CONST_DOUBLE:
+      {
+	REAL_VALUE_TYPE rv;
+	long l[2];
+
 	if (mode == SFmode || mode == SDmode)
 	  {
-	    long l;
-	    REAL_VALUE_TYPE rv;
-
 	    REAL_VALUE_FROM_CONST_DOUBLE (rv, op);
 	    if (DECIMAL_FLOAT_MODE_P (mode))
-	      REAL_VALUE_TO_TARGET_DECIMAL32 (rv, l);
+	      REAL_VALUE_TO_TARGET_DECIMAL32 (rv, l[0]);
 	    else
-	      REAL_VALUE_TO_TARGET_SINGLE (rv, l);
-	    return num_insns_constant_wide ((HOST_WIDE_INT) l);
+	      REAL_VALUE_TO_TARGET_SINGLE (rv, l[0]);
+	    low = l[0];
 	  }
-
-	long l[2];
-	REAL_VALUE_TYPE rv;
-
-	REAL_VALUE_FROM_CONST_DOUBLE (rv, op);
-	if (DECIMAL_FLOAT_MODE_P (mode))
-	  REAL_VALUE_TO_TARGET_DECIMAL64 (rv, l);
-	else
-	  REAL_VALUE_TO_TARGET_DOUBLE (rv, l);
-	high = l[WORDS_BIG_ENDIAN == 0];
-	low  = l[WORDS_BIG_ENDIAN != 0];
-
-	if (TARGET_32BIT)
-	  return (num_insns_constant_wide (low)
-		  + num_insns_constant_wide (high));
 	else
 	  {
-	    if ((high == 0 && low >= 0)
-		|| (high == -1 && low < 0))
-	      return num_insns_constant_wide (low);
-
-	    else if (rs6000_is_valid_and_mask (op, mode))
-	      return 2;
+	    REAL_VALUE_FROM_CONST_DOUBLE (rv, op);
+	    if (DECIMAL_FLOAT_MODE_P (mode))
+	      REAL_VALUE_TO_TARGET_DECIMAL64 (rv, l);
+	    else
+	      REAL_VALUE_TO_TARGET_DOUBLE (rv, l);
+	    high = l[WORDS_BIG_ENDIAN == 0];
+	    low  = l[WORDS_BIG_ENDIAN != 0];
 
-	    else if (low == 0)
-	      return num_insns_constant_wide (high) + 1;
+	    if (TARGET_32BIT)
+	      return (num_insns_constant_wide (low)
+		      + num_insns_constant_wide (high));
 
-	    else
-	      return (num_insns_constant_wide (high)
-		      + num_insns_constant_wide (low) + 1);
+	    low = (low & 0xfffffffful) | (high << 32);
 	  }
+	return num_insns_constant_wide (low);
+      }
 
     default:
       gcc_unreachable ();
-- 
2.4.3

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH 12/15] aarch64: Test for duplicated 32-bit halves
  2015-08-12  1:11 [PATCH ppc64,aarch64,alpha 00/15] Improve backend constant generation Richard Henderson
  2015-08-12  1:11 ` [PATCH 01/15] rs6000: Split out rs6000_is_valid_and_mask_wide Richard Henderson
  2015-08-12  1:11 ` [PATCH 05/15] rs6000: Move constant via mask into build_set_const_data Richard Henderson
@ 2015-08-12  1:12 ` Richard Henderson
  2015-08-12  1:12 ` [PATCH 13/15] alpha: Use hashing infrastructure for generating constants Richard Henderson
                   ` (13 subsequent siblings)
  16 siblings, 0 replies; 35+ messages in thread
From: Richard Henderson @ 2015-08-12  1:12 UTC (permalink / raw)
  To: gcc-patches; +Cc: Marcus Shawcroft, Richard Earnshaw

Cc: Marcus Shawcroft <marcus.shawcroft@arm.com>
Cc: Richard Earnshaw <richard.earnshaw@arm.com>
---
	* config/aarch64/aarch64.c (AA_GI_DUP): New.
	(genimm_aa64::exam_full): Test for equal 32-bit parts.
	(genimm_aa64::generate): Handle AA_GI_DUP.
---
 gcc/config/aarch64/aarch64.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 6b12a07..828526e 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -1328,6 +1328,7 @@ STATIC_ASSERT (((int)AND & ~48) != 0);
 
 enum aa_gi_code
 {
+  AA_GI_DUP = -3,
   AA_GI_NIL = -2,
   AA_GI_SET = -1,
 
@@ -1565,6 +1566,17 @@ genimm_aa64::exam_full (unsigned HOST_WIDE_INT val)
       return;
     }
 
+  /* If the two halves of the constant are the same, use an insert.
+     Since we have already excluded one_match and zero_match == 2,
+     this must require three insns to generate.  */
+  if ((val >> 32) == (val & 0xffffffffu))
+    {
+      set0 (val & 0xffff);
+      insN (16, val);
+      opN (AA_GI_DUP, 32);
+      return;
+    }
+
  simple_sequence:
   cost = 0;
   for (int i = 0; i < 64; i += 16)
@@ -1629,6 +1641,11 @@ genimm_aa64::generate (rtx dest, machine_mode mode) const
 	  else
 	    x = gen_insv_immdi (dest, GEN_INT ((int)code[i]), x);
 	  break;
+	case AA_GI_DUP:
+	  x = gen_rtx_ASHIFT (mode, dest, x);
+	  x = gen_rtx_IOR (mode, x, dest);
+	  x = gen_rtx_SET (dest, x);
+	  break;
 	default:
 	  gcc_unreachable ();
 	}
-- 
2.4.3

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH 07/15] rs6000: Generalize left shift in constant generation
  2015-08-12  1:11 [PATCH ppc64,aarch64,alpha 00/15] Improve backend constant generation Richard Henderson
                   ` (12 preceding siblings ...)
  2015-08-12  1:12 ` [PATCH 14/15] alpha: Split out alpha_cost_set_const Richard Henderson
@ 2015-08-12  1:12 ` Richard Henderson
  2015-08-12  1:12 ` [PATCH 10/15] rs6000: Use rotldi " Richard Henderson
                   ` (2 subsequent siblings)
  16 siblings, 0 replies; 35+ messages in thread
From: Richard Henderson @ 2015-08-12  1:12 UTC (permalink / raw)
  To: gcc-patches; +Cc: David Edelsohn

Rather than always shifting by 32, shift by the number of trailing
zeros in the constant.  The improvement of also trying to shift out
ones comes from Kenner via the alpha backend.

Cc: David Edelsohn <dje.gcc@gmail.com>
---
	* config/rs6000/rs6000.c (genimm_ppc::exam_search): Use ctz to
	compute the shift.  Also attempt to shift out ones.
---
 gcc/config/rs6000/rs6000.c | 21 +++++++++++++++++----
 1 file changed, 17 insertions(+), 4 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 59c5014..b27e476 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -8145,6 +8145,7 @@ genimm_ppc::exam_search (HOST_WIDE_INT c, int budget)
 {
   const int sub_budget = budget - 1;
   HOST_WIDE_INT test;
+  int bits;
 
   /* Simple arithmetic on the low 32-bits.  */
   test = c & 0xffff;
@@ -8181,11 +8182,23 @@ genimm_ppc::exam_search (HOST_WIDE_INT c, int budget)
     }
 
   /* Shift the constant left.  */
-  test = HOST_WIDE_INT_UC (0xffffffff00000000);
-  if ((c & test) == c && exam_sub (c >> 32, sub_budget))
+  bits = ctz_hwi (c);
+  if (bits > 0)
     {
-      opN (ASHIFT, 32); /* SLDI */
-      return true;
+      /* First try with zeros being shifted out.  */
+      HOST_WIDE_INT sub_c = (unsigned HOST_WIDE_INT)c >> bits;
+      bool ok = exam_sub (sub_c, sub_budget);
+      if (!ok)
+	{
+	  /* If that failed, try with ones being shifted out.  */
+	  sub_c |= ~(HOST_WIDE_INT_M1U >> bits);
+	  ok = exam_sub (sub_c, sub_budget);
+	}
+      if (ok)
+	{
+	  opN (ASHIFT, bits); /* SLDI */
+	  return true;
+	}
     }
 
   return false;
-- 
2.4.3

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH 08/15] rs6000: Generalize masking in constant generation
  2015-08-12  1:11 [PATCH ppc64,aarch64,alpha 00/15] Improve backend constant generation Richard Henderson
                   ` (9 preceding siblings ...)
  2015-08-12  1:12 ` [PATCH 11/15] aarch64: Use hashing infrastructure for generating constants Richard Henderson
@ 2015-08-12  1:12 ` Richard Henderson
  2015-08-12  1:12 ` [PATCH 03/15] rs6000: Tidy num_insns_constant vs CONST_DOUBLE Richard Henderson
                   ` (5 subsequent siblings)
  16 siblings, 0 replies; 35+ messages in thread
From: Richard Henderson @ 2015-08-12  1:12 UTC (permalink / raw)
  To: gcc-patches; +Cc: David Edelsohn

Rather than only considering -1 as a base, consider any negative number.

Cc: David Edelsohn <dje.gcc@gmail.com>
---
	* config/rs6000/rs6000.c (genimm_ppc::exam_search): Attempt to
	mask negative numbers.
---
 gcc/config/rs6000/rs6000.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index b27e476..9c08cca 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -8174,6 +8174,12 @@ genimm_ppc::exam_search (HOST_WIDE_INT c, int budget)
   if (exam_mask (-1, c, sub_budget))
     return true;
 
+  /* If the low 32-bits are negative, see if we can mask
+     those with RLDICL.  */
+  test = ((c & 0xffffffffu) ^ 0x80000000u) - 0x80000000u;
+  if (test < 0 && exam_mask (test, c | 0xffffffffu, sub_budget))
+    return true;
+
   /* If the two halves are equal, use an insert.  */
   if (c >> 32 == test && exam_sub (test, sub_budget))
     {
-- 
2.4.3

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH 11/15] aarch64: Use hashing infrastructure for generating constants
  2015-08-12  1:11 [PATCH ppc64,aarch64,alpha 00/15] Improve backend constant generation Richard Henderson
                   ` (8 preceding siblings ...)
  2015-08-12  1:12 ` [PATCH 06/15] rs6000: Use rldimi in constant construction Richard Henderson
@ 2015-08-12  1:12 ` Richard Henderson
  2015-08-12  1:12 ` [PATCH 08/15] rs6000: Generalize masking in constant generation Richard Henderson
                   ` (6 subsequent siblings)
  16 siblings, 0 replies; 35+ messages in thread
From: Richard Henderson @ 2015-08-12  1:12 UTC (permalink / raw)
  To: gcc-patches; +Cc: Marcus Shawcroft, Richard Earnshaw

Aside from the hashing, and the splitting of insn generation from
recipe creation, there's no algorithmic change.

Cc: Marcus Shawcroft <marcus.shawcroft@arm.com>
Cc: Richard Earnshaw <richard.earnshaw@arm.com>
---
	* config/aarch64/aarch64.c: Include genimm-hash.h
	(aa_gi_code): New enum.
	(genimm_aa64): New class.
	(genimm_aa64::genimm_aa64): New.
	(genimm_aa64::set0, genimm_aa64::opN, genimm_aa64::insN): New.
	(genimm_aa64::exam_simple): New.
	(genimm_aa64::exam_plus): New.
	(genimm_aa64::generate): New.
	(genimm_aa64::exam_full): Extract from the body of the
	old aarch64_internal_mov_immediate.
	(aarch64_internal_mov_immediate): Rewrite using genimm_hash.
---
 gcc/config/aarch64/aarch64.c | 446 +++++++++++++++++++++++++------------------
 1 file changed, 256 insertions(+), 190 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 1394ed7..6b12a07 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -76,6 +76,7 @@
 #include "sched-int.h"
 #include "cortex-a57-fma-steering.h"
 #include "target-globals.h"
+#include "genimm-hash.h"
 
 /* This file should be included last.  */
 #include "target-def.h"
@@ -1317,54 +1318,144 @@ aarch64_add_offset (machine_mode mode, rtx temp, rtx reg, HOST_WIDE_INT offset)
   return plus_constant (mode, reg, offset);
 }
 
-static int
-aarch64_internal_mov_immediate (rtx dest, rtx imm, bool generate,
-				machine_mode mode)
+namespace {
+
+/* In order to simplify the below, make sure none of the
+   given rtx codes are in {0,16,32,48}.  */
+STATIC_ASSERT (((int)PLUS & ~48) != 0);
+STATIC_ASSERT (((int)IOR & ~48) != 0);
+STATIC_ASSERT (((int)AND & ~48) != 0);
+
+enum aa_gi_code
 {
-  unsigned HOST_WIDE_INT mask;
-  int i;
-  bool first;
-  unsigned HOST_WIDE_INT val;
-  bool subtargets;
-  rtx subtarget;
-  int one_match, zero_match, first_not_ffff_match;
-  int num_insns = 0;
+  AA_GI_NIL = -2,
+  AA_GI_SET = -1,
+
+  AA_GI_INS0 = 0,
+  AA_GI_INS1 = 16,
+  AA_GI_INS2 = 32,
+  AA_GI_INS3 = 48,
+
+  AA_GI_PLUS = PLUS,
+  AA_GI_IOR = IOR,
+  AA_GI_AND = AND
+};
+
+struct genimm_aa64 : genimm_base<aa_gi_code, 4>
+{
+  static const int max_simple = 2;
+
+  static rtx_code aa_gi_binop(aa_gi_code c)
+  {
+    return (c == AA_GI_PLUS || c == AA_GI_IOR || c == AA_GI_AND
+	    ? (rtx_code)c : UNKNOWN);
+  }
+
+  genimm_aa64 (HOST_WIDE_INT c);
+
+  void set0 (HOST_WIDE_INT v);
+  void opN (aa_gi_code o, HOST_WIDE_INT v);
+  void insN (int b, unsigned HOST_WIDE_INT v);
+
+  /* The search algorithm that we use for aarch64 is non-recursive.
+     Thus we do not require the iteration provided by genimm_hash.
+     Produce an empty loop and go straight to exam_full.  */
+  bool exam_search (unsigned HOST_WIDE_INT, int) { return false; }
+
+  bool exam_simple (HOST_WIDE_INT val, machine_mode mode, int);
+  bool exam_plus (unsigned HOST_WIDE_INT val, unsigned HOST_WIDE_INT base);
+  void exam_full (unsigned HOST_WIDE_INT val);
+  void generate (rtx dest, machine_mode mode) const;
+};
 
-  if (CONST_INT_P (imm) && aarch64_move_imm (INTVAL (imm), mode))
+genimm_aa64::genimm_aa64 (HOST_WIDE_INT c)
+  : genimm_base (c)
+{
+#ifdef ENABLE_CHECKING
+  code[0] = code[1] = code[2] = code[3] = AA_GI_NIL;
+  op[0] = op[1] = op[2] = op[3] = 0;
+#endif
+}
+
+void
+genimm_aa64::set0 (HOST_WIDE_INT v)
+{
+  cost = 1;
+  code[0] = AA_GI_SET;
+  op[0] = v;
+}
+
+void
+genimm_aa64::opN (aa_gi_code c, HOST_WIDE_INT v)
+{
+  int n = cost++;
+  gcc_checking_assert (n > 0 && n < max_cost);
+  code[n] = c;
+  op[n] = v;
+}
+
+void
+genimm_aa64::insN (int b, unsigned HOST_WIDE_INT v)
+{
+  int n = cost++;
+  gcc_checking_assert (n > 0 && n < max_cost);
+  gcc_checking_assert ((b & ~48) == 0);
+  code[n] = (aa_gi_code)b;
+  op[n] = (v >> b) & 0xffff;
+}
+
+/* Look for simple constants that aren't worth hashing.  */
+
+bool
+genimm_aa64::exam_simple (HOST_WIDE_INT val, machine_mode mode, int)
+{
+  if (aarch64_move_imm (val, mode))
     {
-      if (generate)
-	emit_insn (gen_rtx_SET (dest, imm));
-      num_insns++;
-      return num_insns;
+      set0 (val);
+      return true;
     }
-
   if (mode == SImode)
     {
       /* We know we can't do this in 1 insn, and we must be able to do it
 	 in two; so don't mess around looking for sequences that don't buy
 	 us anything.  */
-      if (generate)
-	{
-	  emit_insn (gen_rtx_SET (dest, GEN_INT (INTVAL (imm) & 0xffff)));
-	  emit_insn (gen_insv_immsi (dest, GEN_INT (16),
-				     GEN_INT ((INTVAL (imm) >> 16) & 0xffff)));
-	}
-      num_insns += 2;
-      return num_insns;
+      set0 (val & 0xffff);
+      insN (16, val);
+      return true;
     }
+  return false;
+}
 
-  /* Remaining cases are all for DImode.  */
+/* A subroutine of genimm_aa64::exam_full.  If VAL can be created from BASE
+   via the addition of a constant, construct the recipe as appropriate and
+   return true.  Otherwise return false.  */
 
-  val = INTVAL (imm);
-  subtargets = optimize && can_create_pseudo_p ();
+bool
+genimm_aa64::exam_plus (unsigned HOST_WIDE_INT val, unsigned HOST_WIDE_INT base)
+{
+  HOST_WIDE_INT diff = val - base;
+  if (aarch64_uimm12_shift (diff < 0 ? -diff : diff))
+    {
+      set0 (base);
+      opN (AA_GI_PLUS, diff);
+      return true;
+    }
+  return false;
+}
 
-  one_match = 0;
-  zero_match = 0;
-  mask = 0xffff;
-  first_not_ffff_match = -1;
+/* Examine the DImode quantity VAL, and store a recipe for its creation.  */
 
-  for (i = 0; i < 64; i += 16, mask <<= 16)
+void
+genimm_aa64::exam_full (unsigned HOST_WIDE_INT val)
+{
+  unsigned HOST_WIDE_INT mask;
+  int one_match = 0;
+  int zero_match = 0;
+  int first_not_ffff_match = -1;
+
+  for (int i = 0; i < 64; i += 16)
     {
+      mask = HOST_WIDE_INT_UC (0xffff) << i;
       if ((val & mask) == mask)
 	one_match++;
       else
@@ -1379,211 +1470,186 @@ aarch64_internal_mov_immediate (rtx dest, rtx imm, bool generate,
   if (one_match == 2)
     {
       /* Set one of the quarters and then insert back into result.  */
-      mask = 0xffffll << first_not_ffff_match;
-      if (generate)
-	{
-	  emit_insn (gen_rtx_SET (dest, GEN_INT (val | mask)));
-	  emit_insn (gen_insv_immdi (dest, GEN_INT (first_not_ffff_match),
-				     GEN_INT ((val >> first_not_ffff_match)
-					      & 0xffff)));
-	}
-      num_insns += 2;
-      return num_insns;
+      mask = HOST_WIDE_INT_UC (0xffff) << first_not_ffff_match;
+      set0 (val | mask);
+      insN (first_not_ffff_match, val);
+      return;
     }
 
   if (zero_match == 2)
     goto simple_sequence;
 
-  mask = 0x0ffff0000UL;
-  for (i = 16; i < 64; i += 16, mask <<= 16)
+  for (int i = 16; i < 64; i += 16, mask <<= 16)
     {
-      HOST_WIDE_INT comp = mask & ~(mask - 1);
+      unsigned HOST_WIDE_INT comp = HOST_WIDE_INT_1U << i;
+      mask = HOST_WIDE_INT_UC (0xffff) << i;
 
-      if (aarch64_uimm12_shift (val - (val & mask)))
-	{
-	  if (generate)
-	    {
-	      subtarget = subtargets ? gen_reg_rtx (DImode) : dest;
-	      emit_insn (gen_rtx_SET (subtarget, GEN_INT (val & mask)));
-	      emit_insn (gen_adddi3 (dest, subtarget,
-				     GEN_INT (val - (val & mask))));
-	    }
-	  num_insns += 2;
-	  return num_insns;
-	}
-      else if (aarch64_uimm12_shift (-(val - ((val + comp) & mask))))
-	{
-	  if (generate)
-	    {
-	      subtarget = subtargets ? gen_reg_rtx (DImode) : dest;
-	      emit_insn (gen_rtx_SET (subtarget,
-				      GEN_INT ((val + comp) & mask)));
-	      emit_insn (gen_adddi3 (dest, subtarget,
-				     GEN_INT (val - ((val + comp) & mask))));
-	    }
-	  num_insns += 2;
-	  return num_insns;
-	}
-      else if (aarch64_uimm12_shift (val - ((val - comp) | ~mask)))
-	{
-	  if (generate)
-	    {
-	      subtarget = subtargets ? gen_reg_rtx (DImode) : dest;
-	      emit_insn (gen_rtx_SET (subtarget,
-				      GEN_INT ((val - comp) | ~mask)));
-	      emit_insn (gen_adddi3 (dest, subtarget,
-				     GEN_INT (val - ((val - comp) | ~mask))));
-	    }
-	  num_insns += 2;
-	  return num_insns;
-	}
-      else if (aarch64_uimm12_shift (-(val - (val | ~mask))))
-	{
-	  if (generate)
-	    {
-	      subtarget = subtargets ? gen_reg_rtx (DImode) : dest;
-	      emit_insn (gen_rtx_SET (subtarget, GEN_INT (val | ~mask)));
-	      emit_insn (gen_adddi3 (dest, subtarget,
-				     GEN_INT (val - (val | ~mask))));
-	    }
-	  num_insns += 2;
-	  return num_insns;
-	}
+      if (exam_plus (val, val & mask))
+	return;
+      if (exam_plus (val, (val + comp) & mask))
+	return;
+      if (exam_plus (val, (val - comp) | ~mask))
+	return;
+      if (exam_plus (val, val | ~mask))
+	return;
     }
 
-  /* See if we can do it by arithmetically combining two
-     immediates.  */
-  for (i = 0; i < AARCH64_NUM_BITMASKS; i++)
+  /* See if we can do it by arithmetically combining two immediates.  */
+  for (int i = 0; i < AARCH64_NUM_BITMASKS; i++)
     {
-      int j;
-      mask = 0xffff;
+      unsigned HOST_WIDE_INT bmi = aarch64_bitmasks[i];
 
-      if (aarch64_uimm12_shift (val - aarch64_bitmasks[i])
-	  || aarch64_uimm12_shift (-val + aarch64_bitmasks[i]))
-	{
-	  if (generate)
-	    {
-	      subtarget = subtargets ? gen_reg_rtx (DImode) : dest;
-	      emit_insn (gen_rtx_SET (subtarget,
-				      GEN_INT (aarch64_bitmasks[i])));
-	      emit_insn (gen_adddi3 (dest, subtarget,
-				     GEN_INT (val - aarch64_bitmasks[i])));
-	    }
-	  num_insns += 2;
-	  return num_insns;
-	}
+      if (exam_plus (val, bmi))
+	return;
 
-      for (j = 0; j < 64; j += 16, mask <<= 16)
+      for (int j = 0; j < 64; j += 16)
 	{
-	  if ((aarch64_bitmasks[i] & ~mask) == (val & ~mask))
+          mask = HOST_WIDE_INT_UC (0xffff) << j;
+	  if ((bmi & ~mask) == (val & ~mask))
 	    {
-	      if (generate)
-		{
-		  emit_insn (gen_rtx_SET (dest,
-					  GEN_INT (aarch64_bitmasks[i])));
-		  emit_insn (gen_insv_immdi (dest, GEN_INT (j),
-					     GEN_INT ((val >> j) & 0xffff)));
-		}
-	      num_insns += 2;
-	      return num_insns;
+	      set0 (bmi);
+	      insN (j, val);
+	      return;
 	    }
 	}
     }
 
   /* See if we can do it by logically combining two immediates.  */
-  for (i = 0; i < AARCH64_NUM_BITMASKS; i++)
+  for (int i = 0; i < AARCH64_NUM_BITMASKS; i++)
     {
-      if ((aarch64_bitmasks[i] & val) == aarch64_bitmasks[i])
+      unsigned HOST_WIDE_INT bmi = aarch64_bitmasks[i];
+
+      if ((bmi & val) == bmi)
 	{
-	  int j;
+	  for (int j = i + 1; j < AARCH64_NUM_BITMASKS; j++)
+	    {
+	      unsigned HOST_WIDE_INT bmj = aarch64_bitmasks[j];
 
-	  for (j = i + 1; j < AARCH64_NUM_BITMASKS; j++)
-	    if (val == (aarch64_bitmasks[i] | aarch64_bitmasks[j]))
-	      {
-		if (generate)
-		  {
-		    subtarget = subtargets ? gen_reg_rtx (mode) : dest;
-		    emit_insn (gen_rtx_SET (subtarget,
-					    GEN_INT (aarch64_bitmasks[i])));
-		    emit_insn (gen_iordi3 (dest, subtarget,
-					   GEN_INT (aarch64_bitmasks[j])));
-		  }
-		num_insns += 2;
-		return num_insns;
-	      }
+	      if (val == (bmi | bmj))
+		{
+		  set0 (bmi);
+		  opN (AA_GI_IOR, bmj);
+		  return;
+		}
+	    }
 	}
-      else if ((val & aarch64_bitmasks[i]) == val)
+      else if ((val & bmi) == val)
 	{
-	  int j;
+	  for (int j = i + 1; j < AARCH64_NUM_BITMASKS; j++)
+	    {
+	      unsigned HOST_WIDE_INT bmj = aarch64_bitmasks[j];
 
-	  for (j = i + 1; j < AARCH64_NUM_BITMASKS; j++)
-	    if (val == (aarch64_bitmasks[j] & aarch64_bitmasks[i]))
-	      {
-		if (generate)
-		  {
-		    subtarget = subtargets ? gen_reg_rtx (mode) : dest;
-		    emit_insn (gen_rtx_SET (subtarget,
-					    GEN_INT (aarch64_bitmasks[j])));
-		    emit_insn (gen_anddi3 (dest, subtarget,
-					   GEN_INT (aarch64_bitmasks[i])));
-		  }
-		num_insns += 2;
-		return num_insns;
-	      }
+	      if (val == (bmi & bmj))
+		{
+		  set0 (bmi);
+		  opN (AA_GI_AND, bmj);
+		  return;
+		}
+	    }
 	}
     }
 
   if (one_match > zero_match)
     {
       /* Set either first three quarters or all but the third.	 */
-      mask = 0xffffll << (16 - first_not_ffff_match);
-      if (generate)
-	emit_insn (gen_rtx_SET (dest,
-				GEN_INT (val | mask | 0xffffffff00000000ull)));
-      num_insns ++;
+      mask = HOST_WIDE_INT_UC (0xffff) << (16 - first_not_ffff_match);
+      set0 (val | mask | HOST_WIDE_INT_UC (0xffffffff00000000));
 
       /* Now insert other two quarters.	 */
-      for (i = first_not_ffff_match + 16, mask <<= (first_not_ffff_match << 1);
-	   i < 64; i += 16, mask <<= 16)
+      for (int i = first_not_ffff_match + 16; i < 64; i += 16)
 	{
+	  mask = HOST_WIDE_INT_UC (0xffff) << i;
 	  if ((val & mask) != mask)
-	    {
-	      if (generate)
-		emit_insn (gen_insv_immdi (dest, GEN_INT (i),
-					   GEN_INT ((val >> i) & 0xffff)));
-	      num_insns ++;
-	    }
+	    insN (i, val);
 	}
-      return num_insns;
+      return;
     }
 
  simple_sequence:
-  first = true;
-  mask = 0xffff;
-  for (i = 0; i < 64; i += 16, mask <<= 16)
+  cost = 0;
+  for (int i = 0; i < 64; i += 16)
     {
+      mask = HOST_WIDE_INT_UC (0xffff) << i;
       if ((val & mask) != 0)
 	{
-	  if (first)
-	    {
-	      if (generate)
-		emit_insn (gen_rtx_SET (dest, GEN_INT (val & mask)));
-	      num_insns ++;
-	      first = false;
-	    }
+	  if (cost == 0)
+	    set0 (val & mask);
 	  else
-	    {
-	      if (generate)
-		emit_insn (gen_insv_immdi (dest, GEN_INT (i),
-					   GEN_INT ((val >> i) & 0xffff)));
-	      num_insns ++;
-	    }
+	    insN (i, val);
 	}
     }
+}
+
+/* Follow the recipe to construct a value in MODE
+   placing the result in DEST.  */
+
+void
+genimm_aa64::generate (rtx dest, machine_mode mode) const
+{
+  int n = cost;
 
-  return num_insns;
+  gcc_checking_assert (n >= 1 && n <= max_cost);
+  gcc_checking_assert (code[0] == AA_GI_SET);
+
+  /* If possible, put the original SET into its own pseudo, so that
+     it might be CSE'd.  We can't do this if we use INSV, and we only
+     ever use arithmetic with N == 2.  */
+  if (n == 2 && optimize && can_create_pseudo_p ())
+    {
+      rtx_code rc = aa_gi_binop (code[1]);
+      if (rc != UNKNOWN)
+	{
+	  rtx sub = gen_reg_rtx (mode);
+	  emit_insn (gen_rtx_SET (sub, GEN_INT (op[0])));
+	  sub = gen_rtx_fmt_ee (rc, mode, sub, GEN_INT (op[1]));
+	  emit_insn (gen_rtx_SET (dest, sub));
+	  return;
+	}
+    }
+
+  emit_insn (gen_rtx_SET (dest, GEN_INT (op[0])));
+
+  for (int i = 1; i < n; ++i)
+    {
+      rtx x = GEN_INT (op[i]);
+      switch (code[i])
+	{
+	case AA_GI_PLUS:
+	case AA_GI_IOR:
+	case AA_GI_AND:
+	  x = gen_rtx_fmt_ee (aa_gi_binop (code[i]), mode, dest, x);
+	  x = gen_rtx_SET (dest, x);
+	  break;
+	case AA_GI_INS0:
+	case AA_GI_INS1:
+	case AA_GI_INS2:
+	case AA_GI_INS3:
+	  if (mode == SImode)
+	    x = gen_insv_immsi (dest, GEN_INT ((int)code[i]), x);
+	  else
+	    x = gen_insv_immdi (dest, GEN_INT ((int)code[i]), x);
+	  break;
+	default:
+	  gcc_unreachable ();
+	}
+      emit_insn (x);
+    }
 }
 
+} // anon namespace
+
+/* Examine IMM in MODE and return the number of insns required to construct
+   it.  If GENERATE is true, emit instructions to compute IMM into DEST.  */
+   If GENERATE is true, emit instructions to compute IMM into DEST.  */
+
+static inline int
+aarch64_internal_mov_immediate (rtx dest, rtx imm, bool generate,
+                                machine_mode mode)
+{
+  genimm_aa64 data = genimm_hash<genimm_aa64>::hash (INTVAL (imm), mode);
+  if (generate)
+    data.generate (dest, mode);
+  return data.cost;
+}
 
 void
 aarch64_expand_mov_immediate (rtx dest, rtx imm)
-- 
2.4.3

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH 06/15] rs6000: Use rldimi in constant construction
  2015-08-12  1:11 [PATCH ppc64,aarch64,alpha 00/15] Improve backend constant generation Richard Henderson
                   ` (7 preceding siblings ...)
  2015-08-12  1:12 ` [PATCH 04/15] rs6000: Implement set_const_data infrastructure Richard Henderson
@ 2015-08-12  1:12 ` Richard Henderson
  2015-08-12 14:02   ` Segher Boessenkool
  2015-08-12  1:12 ` [PATCH 11/15] aarch64: Use hashing infrastructure for generating constants Richard Henderson
                   ` (7 subsequent siblings)
  16 siblings, 1 reply; 35+ messages in thread
From: Richard Henderson @ 2015-08-12  1:12 UTC (permalink / raw)
  To: gcc-patches; +Cc: David Edelsohn

Save 2 insns for constants like 0x5555555555555555, where the
low 32 bits can be replicated into the high 32 bits.

Cc: David Edelsohn <dje.gcc@gmail.com>
---
	* config/rs6000/rs6000.c (genimm_ppc::exam_search): Check for
	two equal 32-bit pieces.
	(genimm_ppc::generate): Handle VEC_DUPLICATE.
---
 gcc/config/rs6000/rs6000.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 6af5cf3..59c5014 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -8173,6 +8173,13 @@ genimm_ppc::exam_search (HOST_WIDE_INT c, int budget)
   if (exam_mask (-1, c, sub_budget))
     return true;
 
+  /* If the two halves are equal, use an insert.  */
+  if (c >> 32 == test && exam_sub (test, sub_budget))
+    {
+      opN (VEC_DUPLICATE, 0xffffffffu); /* RLDIMI */
+      return true;
+    }
+
   /* Shift the constant left.  */
   test = HOST_WIDE_INT_UC (0xffffffff00000000);
   if ((c & test) == c && exam_sub (c >> 32, sub_budget))
@@ -8230,6 +8237,14 @@ genimm_ppc::generate (rtx dest, machine_mode mode) const
 	case ASHIFT:
 	  x = gen_rtx_fmt_ee (r, mode, op1, op2);
 	  break;
+	case VEC_DUPLICATE:
+	  /* Abusing the rtx code to indicate RLDIMI.
+	     This should match *rotl<mode>3_insert_3.  */
+	  x = GEN_INT (exact_log2 (op[i] + 1));
+	  x = gen_rtx_IOR (mode,
+			   gen_rtx_AND (mode, op1, op2),
+			   gen_rtx_ASHIFT (mode, op1, x));
+	  break;
 	default:
 	  gcc_unreachable ();
 	}
-- 
2.4.3

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH 02/15] rs6000: Make num_insns_constant_wide static
  2015-08-12  1:11 [PATCH ppc64,aarch64,alpha 00/15] Improve backend constant generation Richard Henderson
                   ` (4 preceding siblings ...)
  2015-08-12  1:12 ` [PATCH 15/15] alpha: Remove alpha_emit_set_long_const Richard Henderson
@ 2015-08-12  1:12 ` Richard Henderson
  2015-08-12  1:12 ` [PATCH 09/15] rs6000: Use xoris in constant construction Richard Henderson
                   ` (10 subsequent siblings)
  16 siblings, 0 replies; 35+ messages in thread
From: Richard Henderson @ 2015-08-12  1:12 UTC (permalink / raw)
  To: gcc-patches; +Cc: David Edelsohn

Cc: David Edelsohn <dje.gcc@gmail.com>
---
	* config/rs6000/rs6000-protos.h (num_insns_constant_wide): Move...
	* config/rs6000/rs6000.c: ... prototype here.  Make static.
---
 gcc/config/rs6000/rs6000-protos.h | 1 -
 gcc/config/rs6000/rs6000.c        | 3 ++-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
index f5d3476..2407060 100644
--- a/gcc/config/rs6000/rs6000-protos.h
+++ b/gcc/config/rs6000/rs6000-protos.h
@@ -34,7 +34,6 @@ extern bool easy_altivec_constant (rtx, machine_mode);
 extern HOST_WIDE_INT const_vector_elt_as_int (rtx, unsigned int);
 extern bool macho_lo_sum_memory_operand (rtx, machine_mode);
 extern int num_insns_constant (rtx, machine_mode);
-extern int num_insns_constant_wide (HOST_WIDE_INT);
 extern int small_data_operand (rtx, machine_mode);
 extern bool mem_operand_gpr (rtx, machine_mode);
 extern bool toc_relative_expr_p (const_rtx, bool);
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index a33b9d3..abaf7eb 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -1108,6 +1108,7 @@ static tree rs6000_handle_altivec_attribute (tree *, tree, tree, int, bool *);
 static tree rs6000_handle_struct_attribute (tree *, tree, tree, int, bool *);
 static tree rs6000_builtin_vectorized_libmass (tree, tree, tree);
 static void rs6000_emit_set_long_const (rtx, HOST_WIDE_INT);
+static int num_insns_constant_wide (HOST_WIDE_INT);
 static bool rs6000_is_valid_and_mask_wide (unsigned HOST_WIDE_INT val,
 					   machine_mode mode);
 static int rs6000_memory_move_cost (machine_mode, reg_class_t, bool);
@@ -5240,7 +5241,7 @@ direct_return (void)
 /* Return the number of instructions it takes to form a constant in an
    integer register.  */
 
-int
+static int
 num_insns_constant_wide (HOST_WIDE_INT value)
 {
   /* signed constant loadable with addi */
-- 
2.4.3


* Re: [PATCH ppc64,aarch64,alpha 00/15] Improve backend constant generation
  2015-08-12  1:11 [PATCH ppc64,aarch64,alpha 00/15] Improve backend constant generation Richard Henderson
                   ` (15 preceding siblings ...)
  2015-08-12  8:32 ` [PATCH ppc64,aarch64,alpha 00/15] Improve backend " Segher Boessenkool
@ 2015-08-12  8:32 ` Richard Earnshaw
  2015-08-12  8:43   ` Richard Earnshaw
  2015-08-12 15:45   ` Richard Henderson
  16 siblings, 2 replies; 35+ messages in thread
From: Richard Earnshaw @ 2015-08-12  8:32 UTC (permalink / raw)
  To: Richard Henderson, gcc-patches
  Cc: David Edelsohn, Marcus Shawcroft, Richard Earnshaw

On 12/08/15 02:11, Richard Henderson wrote:
> Something last week had me looking at ppc64 code generation,
> and some of what I saw was fairly bad.  Fixing it wasn't going
> to be easy, due to the fact that the logic for generating
> constants wasn't contained within a single function.
> 
> Better is the way that aarch64 and alpha have done it in the
> past, sharing a single function with all of the logical that
> can be used for both cost calculation and the actual emission
> of the constants.
> 
> However, the way that aarch64 and alpha have done it hasn't
> been ideal, in that there's a fairly costly search that must
> be done every time.  I've thought before about changing this
> so that we would be able to cache results, akin to how we do
> it in expmed.c for multiplication.
> 
> I've implemented such a caching scheme for three targets, as
> a test of how much code could be shared.  The answer appears
> to be about 100 lines of boiler-plate.  Minimal, true, but it
> may still be worth it as a way of encouraging backends to do
> similar things in a similar way.
> 

I've got a short week this week, so won't have time to look at this in
detail for a while.  So a bunch of questions... but not necessarily
objections :-)

How do we clear the cache, and when?  For example, on ARM, switching
between ARM and Thumb state means we need to generate potentially
radically different sequences?  We can do such splitting at function
boundaries now.

Can we generate different sequences for hot/cold code within a single
function?

Can we cache sequences with the context (eg use with AND, OR, ADD, etc)?


> Some notes about ppc64 in particular:
> 
>   * Constants aren't split until quite late, preventing all hope of
>     CSE'ing portions of the generated code.  My gut feeling is that
>     this is in general a mistake, but...
> 
>     I did attempt to fix it, and got nothing for my troubles except
>     poorer code generation for AND/IOR/XOR with non-trivial constants.
> 
On AArch64 in particular, building complex constants is generally
destructive on the source register (if you want to preserve intermediate
values you have to make intermediate copies); that's clearly never going
to be a win if you don't need at least 3 instructions to form the
constant.

There might be some cases where you could form a second constant as a
difference from an earlier one, but that then creates data-flow
dependencies and in OoO machines that might not be worthwhile.  Even
for in-order machines it can restrict scheduling and result in worse code.


>     I'm somewhat surprised that the operands to the logicals aren't
>     visible at rtl generation time, given all the work done in gimple.
>     And failing that, combine has enough REG_EQUAL notes that it ought
>     to be able to put things back together and see the simpler pattern.
> 

We've tried it in the past.  Exposing the individual steps prevents the
higher-level rtl-based optimizations since they can no longer deal with
the complete sub-expression.

>     Perhaps there's some other predication or costing error that's
>     getting in the way, and it simply wasn't obvious to me.   In any
>     case, nothing in this patch set addresses this at all.
> 
>   * I go on to add 4 new methods of generating a constant, each of
>     which typically saves 2 insns over the current algorithm.  There
>     are a couple more that might be useful but...
> 
>   * Constants are split *really* late.  In particular, after reload.
>     It would be awesome if we could at least have them all split before
>     register allocation so that we arrange to use ADDI and ADDIS when
>     that could save a few instructions.  But that does of course mean
>     avoiding r0 for the input.  Again, nothing here attempts to change
>     when constants are split.
> 

Certainly, in the ARM port we try to split immediately before register
allocation; that way we can be sure that we have scratch registers
available if that helps with generating more efficient sequences.

R.

>   * This is the only platform for which I bothered collecting any sort
>     of performance data:
> 
>     As best I can tell, there is a 9% improvement in bootstrap speed
>     for ppc64.  That is, 10 minutes off the original 109 minute build.
> 
>     For aarch64 and alpha, I simply assumed there would be no loss,
>     since the basic search algorithm is unchanged for each.
> 
> Comments?  Especially on the shared header?
> 
> 
> r~
> 
> Cc: David Edelsohn <dje.gcc@gmail.com>
> Cc: Marcus Shawcroft <marcus.shawcroft@arm.com>
> Cc: Richard Earnshaw <richard.earnshaw@arm.com>
> 
> Richard Henderson (15):
>   rs6000: Split out rs6000_is_valid_and_mask_wide
>   rs6000: Make num_insns_constant_wide static
>   rs6000: Tidy num_insns_constant vs CONST_DOUBLE
>   rs6000: Implement set_const_data infrastructure
>   rs6000: Move constant via mask into build_set_const_data
>   rs6000: Use rldiwi in constant construction
>   rs6000: Generalize left shift in constant generation
>   rs6000: Generalize masking in constant generation
>   rs6000: Use xoris in constant construction
>   rs6000: Use rotldi in constant generation
>   aarch64: Use hashing infrastructure for generating constants
>   aarch64: Test for duplicated 32-bit halves
>   alpha: Use hashing infrastructure for generating constants
>   alpha: Split out alpha_cost_set_const
>   alpha: Remove alpha_emit_set_long_const
> 
>  gcc/config/aarch64/aarch64.c      | 463 ++++++++++++++++------------
>  gcc/config/alpha/alpha.c          | 583 +++++++++++++++++------------------
>  gcc/config/rs6000/rs6000-protos.h |   1 -
>  gcc/config/rs6000/rs6000.c        | 617 ++++++++++++++++++++++++--------------
>  gcc/config/rs6000/rs6000.md       |  15 -
>  gcc/genimm-hash.h                 | 122 ++++++++
>  6 files changed, 1057 insertions(+), 744 deletions(-)
>  create mode 100644 gcc/genimm-hash.h
> 

* Re: [PATCH ppc64,aarch64,alpha 00/15] Improve backend constant generation
  2015-08-12  1:11 [PATCH ppc64,aarch64,alpha 00/15] Improve backend constant generation Richard Henderson
                   ` (14 preceding siblings ...)
  2015-08-12  1:12 ` [PATCH 10/15] rs6000: Use rotldi " Richard Henderson
@ 2015-08-12  8:32 ` Segher Boessenkool
  2015-08-12 15:32   ` Richard Henderson
  2015-08-13  3:10   ` Segher Boessenkool
  2015-08-12  8:32 ` Richard Earnshaw
  16 siblings, 2 replies; 35+ messages in thread
From: Segher Boessenkool @ 2015-08-12  8:32 UTC (permalink / raw)
  To: Richard Henderson
  Cc: gcc-patches, David Edelsohn, Marcus Shawcroft, Richard Earnshaw

Hi!

This looks really nice.  I'll try it out soon :-)

Some comments now...


On Tue, Aug 11, 2015 at 06:11:29PM -0700, Richard Henderson wrote:
> However, the way that aarch64 and alpha have done it hasn't
> been ideal, in that there's a fairly costly search that must
> be done every time.  I've thought before about changing this
> so that we would be able to cache results, akin to how we do
> it in expmed.c for multiplication.

Is there something that makes the cache not get too big?  Do we
care, anyway?

> Some notes about ppc64 in particular:
> 
>   * Constants aren't split until quite late, preventing all hope of
>     CSE'ing portions of the generated code.  My gut feeling is that
>     this is in general a mistake, but...

Constant arguments to IOR/XOR/AND that can be done with two machine
insns are split at expand.  Then combine comes along and just loves
to recombine them, but then they are split again at split1 (before
RA).

For AND this was optimal in my experiments; for IOR/XOR it has been
this way since the dawn of time.

Simple SETs aren't split at expand, maybe they should be.  But they
are split at split1.

>     I did attempt to fix it, and got nothing for my troubles except
>     poorer code generation for AND/IOR/XOR with non-trivial constants.

Could you give an example of code that isn't split early enough?

>     I'm somewhat surprised that the operands to the logicals aren't
>     visible at rtl generation time, given all the work done in gimple.

So am I, because that is not what I'm seeing?  E.g.

int f(int x) { return x | 0x12345678; }

is expanded as two IORs already.  There must be something in your
testcases that prevents this?

>     And failing that, combine has enough REG_EQUAL notes that it ought
>     to be able to put things back together and see the simpler pattern.
> 
>     Perhaps there's some other predication or costing error that's
>     getting in the way, and it simply wasn't obvious to me.   In any
>     case, nothing in this patch set addresses this at all.

The instruction (set (reg) (const_int 0x12345678)) is costed as 4
(i.e. one insn).  That cannot be good.  This is alternative #5 in
*movsi_internal1_single (there are many more variants of that
pattern).

>   * I go on to add 4 new methods of generating a constant, each of
>     which typically saves 2 insns over the current algorithm.  There
>     are a couple more that might be useful but...

New methods look to be really simple to add with your framework,
very nice :-)

>   * Constants are split *really* late.  In particular, after reload.

Yeah that is bad.  But I'm still not seeing it.  Hrm, maybe only
DImode ones?

>     It would be awesome if we could at least have them all split before
>     register allocation

And before sched1, yeah.

>     so that we arrange to use ADDI and ADDIS when
>     that could save a few instructions.  But that does of course mean
>     avoiding r0 for the input.

That is no problem at all before RA.

>     Again, nothing here attempts to change
>     when constants are split.
> 
>   * This is the only platform for which I bothered collecting any sort
>     of performance data:
> 
>     As best I can tell, there is a 9% improvement in bootstrap speed
>     for ppc64.  That is, 10 minutes off the original 109 minute build.

That is, wow.  Wow :-)

Have you looked at generated code quality?


Segher

* Re: [PATCH ppc64,aarch64,alpha 00/15] Improve backend constant generation
  2015-08-12  8:32 ` Richard Earnshaw
@ 2015-08-12  8:43   ` Richard Earnshaw
  2015-08-12  9:02     ` Richard Earnshaw
  2015-08-12 15:45   ` Richard Henderson
  1 sibling, 1 reply; 35+ messages in thread
From: Richard Earnshaw @ 2015-08-12  8:43 UTC (permalink / raw)
  To: Richard Earnshaw, Richard Henderson, gcc-patches
  Cc: David Edelsohn, Marcus Shawcroft

On 12/08/15 09:32, Richard Earnshaw wrote:
> On 12/08/15 02:11, Richard Henderson wrote:
>> Something last week had me looking at ppc64 code generation,
>> and some of what I saw was fairly bad.  Fixing it wasn't going
>> to be easy, due to the fact that the logic for generating
>> constants wasn't contained within a single function.
>>
>> Better is the way that aarch64 and alpha have done it in the
>> past, sharing a single function with all of the logical that
>> can be used for both cost calculation and the actual emission
>> of the constants.
>>
>> However, the way that aarch64 and alpha have done it hasn't
>> been ideal, in that there's a fairly costly search that must
>> be done every time.  I've thought before about changing this
>> so that we would be able to cache results, akin to how we do
>> it in expmed.c for multiplication.
>>
>> I've implemented such a caching scheme for three targets, as
>> a test of how much code could be shared.  The answer appears
>> to be about 100 lines of boiler-plate.  Minimal, true, but it
>> may still be worth it as a way of encouraging backends to do
>> similar things in a similar way.
>>
> 
> I've got a short week this week, so won't have time to look at this in
> detail for a while.  So a bunch of questions... but not necessarily
> objections :-)
> 
> How do we clear the cache, and when?  For example, on ARM, switching
> between ARM and Thumb state means we need to generate potentially
> radically different sequences?  We can do such splitting at function
> boundaries now.
> 
> Can we generate different sequences for hot/cold code within a single
> function?
> 
> Can we cache sequences with the context (eg use with AND, OR, ADD, etc)?
> 
> 
>> Some notes about ppc64 in particular:
>>
>>   * Constants aren't split until quite late, preventing all hope of
>>     CSE'ing portions of the generated code.  My gut feeling is that
>>     this is in general a mistake, but...
>>
>>     I did attempt to fix it, and got nothing for my troubles except
>>     poorer code generation for AND/IOR/XOR with non-trivial constants.
>>
> On AArch64 in particular, building complex constants is generally
> destructive on the source register (if you want to preserve intermediate
> values you have to make intermediate copies); that's clearly never going
> to be a win if you don't need at least 3 instructions to form the
> constant.
> 
> There might be some cases where you could form a second constant as a
> difference from an earlier one, but that then creates data-flow
> dependencies and in OoO machines that might not be worth-while.  Even
> for in-order machines it can restrict scheduling and result in worse code.
> 
> 
>>     I'm somewhat surprised that the operands to the logicals aren't
>>     visible at rtl generation time, given all the work done in gimple.
>>     And failing that, combine has enough REG_EQUAL notes that it ought
>>     to be able to put things back together and see the simpler pattern.
>>
> 
> We've tried it in the past.  Exposing the individual steps prevents the
> higher-level rtl-based optimizations since they can no-longer deal with
> the complete sub-expression.

Eg. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63724

R.

* Re: [PATCH ppc64,aarch64,alpha 00/15] Improve backend constant generation
  2015-08-12  8:43   ` Richard Earnshaw
@ 2015-08-12  9:02     ` Richard Earnshaw
  0 siblings, 0 replies; 35+ messages in thread
From: Richard Earnshaw @ 2015-08-12  9:02 UTC (permalink / raw)
  To: Richard Henderson, gcc-patches; +Cc: David Edelsohn, Marcus Shawcroft

On 12/08/15 09:43, Richard Earnshaw wrote:
> On 12/08/15 09:32, Richard Earnshaw wrote:
>> On 12/08/15 02:11, Richard Henderson wrote:
>>>     I'm somewhat surprised that the operands to the logicals aren't
>>>     visible at rtl generation time, given all the work done in gimple.
>>>     And failing that, combine has enough REG_EQUAL notes that it ought
>>>     to be able to put things back together and see the simpler pattern.
>>>
>>
>> We've tried it in the past.  Exposing the individual steps prevents the
>> higher-level rtl-based optimizations since they can no-longer deal with
>> the complete sub-expression.
> 
> Eg. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63724
> 
> R.
> 
And https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65768

R.

* Re: [PATCH 01/15] rs6000: Split out rs6000_is_valid_and_mask_wide
  2015-08-12  1:11 ` [PATCH 01/15] rs6000: Split out rs6000_is_valid_and_mask_wide Richard Henderson
@ 2015-08-12 13:24   ` Segher Boessenkool
  2015-08-12 15:50     ` Richard Henderson
  0 siblings, 1 reply; 35+ messages in thread
From: Segher Boessenkool @ 2015-08-12 13:24 UTC (permalink / raw)
  To: Richard Henderson; +Cc: gcc-patches, David Edelsohn

On Tue, Aug 11, 2015 at 06:11:30PM -0700, Richard Henderson wrote:
> This allows testing for a mask without having to call GEN_INT.
> 
> Cc: David Edelsohn <dje.gcc@gmail.com>
> ---
> 	* config/rs6000/rs6000.c (rs6000_is_valid_mask_wide): Split out from...
> 	(rs6000_is_valid_mask): ... here.
> 	(rs6000_is_valid_and_mask_wide): Split out from...
> 	(rs6000_is_valid_and_mask): ... here.

I don't like these "_wide" names much.  You could overload the shorter
name, if you really think creating some garbage const_int's is too much
overhead (it might well be if you use it a lot more in later patches).

The original functions really want rtx's since they are used like
predicates (so should look and behave like one); rs6000_is_valid_mask
itself is different (and a lousy name; suggestions welcome).

> -bool
> -rs6000_is_valid_mask (rtx mask, int *b, int *e, machine_mode mode)
> +static bool
> +rs6000_is_valid_mask_wide (unsigned HOST_WIDE_INT val, int *b, int *e, int n)

But why change the mode parameter?  The code was clearer before.

>  {
> -  unsigned HOST_WIDE_INT val = INTVAL (mask);
>    unsigned HOST_WIDE_INT bit;
>    int nb, ne;
> -  int n = GET_MODE_PRECISION (mode);
>  
> -  if (mode != DImode && mode != SImode)
> -    return false;
> -
> -  if (INTVAL (mask) >= 0)
> +  if ((HOST_WIDE_INT)val >= 0)
                        ^ missing space

>      {
>        bit = val & -val;
>        ne = exact_log2 (bit);
> @@ -16430,27 +16427,54 @@ rs6000_is_valid_mask (rtx mask, int *b, int *e, machine_mode mode)
>    return true;
>  }
>  
> +bool
> +rs6000_is_valid_mask (rtx mask, int *b, int *e, machine_mode mode)
> +{
> +  int n;
> +
> +  if (mode == DImode)
> +    n = 64;
> +  else if (mode == SImode)
> +    n = 32;
> +  else
> +    return false;
> +
> +  unsigned HOST_WIDE_INT val = INTVAL (mask);
> +  return rs6000_is_valid_mask_wide (val, b, e, n);
> +}
> +
>  /* Return whether MASK (a CONST_INT) is a valid mask for any rlwinm, rldicl,
>     or rldicr instruction, to implement an AND with it in mode MODE.  */
>  
> -bool
> -rs6000_is_valid_and_mask (rtx mask, machine_mode mode)
> +static bool
> +rs6000_is_valid_and_mask_wide (unsigned HOST_WIDE_INT val, machine_mode mode)
>  {
>    int nb, ne;
>  
> -  if (!rs6000_is_valid_mask (mask, &nb, &ne, mode))
> -    return false;
> +  switch (mode)
> +    {
> +    case DImode:
> +      if (!rs6000_is_valid_mask_wide (val, &nb, &ne, 64))
> +	return false;
> +      /* For DImode, we need a rldicl, rldicr, or a rlwinm with
> +	 mask that does not wrap.  */
> +      return (ne == 0 || nb == 63 || (nb < 32 && ne <= nb));
>  
> -  /* For DImode, we need a rldicl, rldicr, or a rlwinm with mask that
> -     does not wrap.  */
> -  if (mode == DImode)
> -    return (ne == 0 || nb == 63 || (nb < 32 && ne <= nb));
> +    case SImode:
> +      if (!rs6000_is_valid_mask_wide (val, &nb, &ne, 32))
> +	return false;
> +      /* For SImode, rlwinm can do everything.  */
> +      return (nb < 32 && ne < 32);
>  
> -  /* For SImode, rlwinm can do everything.  */
> -  if (mode == SImode)
> -    return (nb < 32 && ne < 32);
> +    default:
> +      return false;
> +    }
> +}
>  
> -  return false;

You don't need any of these changes then, either.

> +bool
> +rs6000_is_valid_and_mask (rtx mask, machine_mode mode)
> +{
> +  return rs6000_is_valid_and_mask_wide (UINTVAL (mask), mode);
>  }
>  
>  /* Return the instruction template for an AND with mask in mode MODE, with
> @@ -16739,12 +16763,12 @@ rs6000_is_valid_2insn_and (rtx c, machine_mode mode)
>  
>    /* Otherwise, fill in the lowest "hole"; if we can do the result with
>       one insn, we can do the whole thing with two.  */
> -  unsigned HOST_WIDE_INT val = INTVAL (c);
> +  unsigned HOST_WIDE_INT val = UINTVAL (c);

Does it matter?


Segher

* Re: [PATCH 04/15] rs6000: Implement set_const_data infrastructure
  2015-08-12  1:12 ` [PATCH 04/15] rs6000: Implement set_const_data infrastructure Richard Henderson
@ 2015-08-12 13:53   ` Segher Boessenkool
  0 siblings, 0 replies; 35+ messages in thread
From: Segher Boessenkool @ 2015-08-12 13:53 UTC (permalink / raw)
  To: Richard Henderson; +Cc: gcc-patches, David Edelsohn

Hi Richard,

You wanted us to read this file...

On Tue, Aug 11, 2015 at 06:11:33PM -0700, Richard Henderson wrote:
> +   -- The fallback generation for the most complex word_mode constants.
> +      The receipe built will be the full MAX_COST insns, as we will
                ^-- typo.


Segher

* Re: [PATCH 06/15] rs6000: Use rldiwi in constant construction
  2015-08-12  1:12 ` [PATCH 06/15] rs6000: Use rldiwi in constant construction Richard Henderson
@ 2015-08-12 14:02   ` Segher Boessenkool
  2015-08-12 15:55     ` Richard Henderson
  0 siblings, 1 reply; 35+ messages in thread
From: Segher Boessenkool @ 2015-08-12 14:02 UTC (permalink / raw)
  To: Richard Henderson; +Cc: gcc-patches, David Edelsohn

On Tue, Aug 11, 2015 at 06:11:35PM -0700, Richard Henderson wrote:
> @@ -8173,6 +8173,13 @@ genimm_ppc::exam_search (HOST_WIDE_INT c, int budget)
>    if (exam_mask (-1, c, sub_budget))
>      return true;
>  
> +  /* If the two halves are equal, use an insert.  */
> +  if (c >> 32 == test && exam_sub (test, sub_budget))
> +    {
> +      opN (VEC_DUPLICATE, 0xffffffffu); /* RLDIMI */
> +      return true;
> +    }

Does this work for c with the high bit set?  I think you need
to cast it to unsigned HOST_WIDE_INT first?


Segher

* Re: [PATCH ppc64,aarch64,alpha 00/15] Improve backend constant generation
  2015-08-12  8:32 ` [PATCH ppc64,aarch64,alpha 00/15] Improve backend " Segher Boessenkool
@ 2015-08-12 15:32   ` Richard Henderson
  2015-08-13  3:07     ` Segher Boessenkool
  2015-08-13  3:10   ` Segher Boessenkool
  1 sibling, 1 reply; 35+ messages in thread
From: Richard Henderson @ 2015-08-12 15:32 UTC (permalink / raw)
  To: Segher Boessenkool
  Cc: gcc-patches, David Edelsohn, Marcus Shawcroft, Richard Earnshaw

On 08/12/2015 01:31 AM, Segher Boessenkool wrote:
> Is there something that makes the cache not get too big?  Do we
> care, anyway?

No, nothing ever reduces the size of the cache.  I doubt we care, but I haven't
instrumented to see how big it grows.

My intuition says the most important thing about managing this cache is not to
put the most common trivial constants in, and I already do that.

>>     I did attempt to fix it, and got nothing for my troubles except
>>     poorer code generation for AND/IOR/XOR with non-trivial constants.
> 
> Could you give an example of code that isn't split early enough?

I believe the examples I was seeing was in the libdecnumber code.  I'd have to
go back and reproduce it now...

>>     Perhaps there's some other predication or costing error that's
>>     getting in the way, and it simply wasn't obvious to me.   In any
>>     case, nothing in this patch set addresses this at all.
> 
> The instruction (set (reg) (const_int 0x12345678)) is costed as 4
> (i.e. one insn).  That cannot be good.  This is alternative #5 in
> *movsi_internal1_single (there are many more variants of that
> pattern).

Yes, when I tried to fix it, I did adjust that costing, but still...

>>   * Constants are split *really* late.  In particular, after reload.
> 
> Yeah that is bad.  But I'm still not seeing it.  Hrm, maybe only
> DImode ones?

Dunno.  I found it after I did add a recipe to use ADDI, which then triggered
an ICE due to the r0 base register.

> Have you looked at generated code quality?

I've looked at diffs of all of the object files in the target directory.  There
were definite spots of improvement.  I wasn't able to spot any regressions.


r~

* Re: [PATCH ppc64,aarch64,alpha 00/15] Improve backend constant generation
  2015-08-12  8:32 ` Richard Earnshaw
  2015-08-12  8:43   ` Richard Earnshaw
@ 2015-08-12 15:45   ` Richard Henderson
  1 sibling, 0 replies; 35+ messages in thread
From: Richard Henderson @ 2015-08-12 15:45 UTC (permalink / raw)
  To: Richard Earnshaw, gcc-patches
  Cc: David Edelsohn, Marcus Shawcroft, Richard Earnshaw

On 08/12/2015 01:32 AM, Richard Earnshaw wrote:
> How do we clear the cache, and when?  For example, on ARM, switching
> between ARM and Thumb state means we need to generate potentially
> radically different sequences?  We can do such splitting at function
> boundaries now.

At present I never clear the cache.  Maybe we'll find that's a mistake.

For arm vs thumb I would start with just using two different caches.  The way
the code is structured currently, that would mean two different classes.  Which
could just be trivial wrappers around a common base class containing the
generator code.

> Can we generate different sequences for hot/cold code within a single
> function?

Not without using different caches.

> Can we cache sequences with the context (eg use with AND, OR, ADD, etc)?

No.  At least not without...


r~

* Re: [PATCH 01/15] rs6000: Split out rs6000_is_valid_and_mask_wide
  2015-08-12 13:24   ` Segher Boessenkool
@ 2015-08-12 15:50     ` Richard Henderson
  2015-08-13  2:29       ` Segher Boessenkool
  0 siblings, 1 reply; 35+ messages in thread
From: Richard Henderson @ 2015-08-12 15:50 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: gcc-patches, David Edelsohn

On 08/12/2015 06:23 AM, Segher Boessenkool wrote:
> On Tue, Aug 11, 2015 at 06:11:30PM -0700, Richard Henderson wrote:
>> This allows testing for a mask without having to call GEN_INT.
>>
>> Cc: David Edelsohn <dje.gcc@gmail.com>
>> ---
>> 	* config/rs6000/rs6000.c (rs6000_is_valid_mask_wide): Split out from...
>> 	(rs6000_is_valid_mask): ... here.
>> 	(rs6000_is_valid_and_mask_wide): Split out from...
>> 	(rs6000_is_valid_and_mask): ... here.
> 
> I don't like these "_wide" names much.

It follows the existing practice within the backend.

>  You could overload the shorter
> name, if you really think creating some garbage const_int's is too much
> overhead (it might well be if you use it a lot more in later patches).

At one stage in the development (before I became much leaner with the search
for rotate), it really really mattered.

>> -bool
>> -rs6000_is_valid_mask (rtx mask, int *b, int *e, machine_mode mode)
>> +static bool
>> +rs6000_is_valid_mask_wide (unsigned HOST_WIDE_INT val, int *b, int *e, int n)
> 
> But why change the mode parameter?  The code was clearer before.

So that we don't have to look up GET_MODE_BITSIZE (mode).

>> +static bool
>> +rs6000_is_valid_and_mask_wide (unsigned HOST_WIDE_INT val, machine_mode mode)
>>  {
>>    int nb, ne;
>>  
>> -  if (!rs6000_is_valid_mask (mask, &nb, &ne, mode))
>> -    return false;
>> +  switch (mode)
>> +    {
>> +    case DImode:
>> +      if (!rs6000_is_valid_mask_wide (val, &nb, &ne, 64))
>> +	return false;
>> +      /* For DImode, we need a rldicl, rldicr, or a rlwinm with
>> +	 mask that does not wrap.  */
>> +      return (ne == 0 || nb == 63 || (nb < 32 && ne <= nb));
>>  
>> -  /* For DImode, we need a rldicl, rldicr, or a rlwinm with mask that
>> -     does not wrap.  */
>> -  if (mode == DImode)
>> -    return (ne == 0 || nb == 63 || (nb < 32 && ne <= nb));
>> +    case SImode:
>> +      if (!rs6000_is_valid_mask_wide (val, &nb, &ne, 32))
>> +	return false;
>> +      /* For SImode, rlwinm can do everything.  */
>> +      return (nb < 32 && ne < 32);
>>  
>> -  /* For SImode, rlwinm can do everything.  */
>> -  if (mode == SImode)
>> -    return (nb < 32 && ne < 32);
>> +    default:
>> +      return false;
>> +    }
>> +}
>>  
>> -  return false;
> 
> You don't need any of these changes then, either.

True, not *needed* per se, but if you look closer I'm combining conditionals.
I think the replacement here is clearer.

>>    /* Otherwise, fill in the lowest "hole"; if we can do the result with
>>       one insn, we can do the whole thing with two.  */
>> -  unsigned HOST_WIDE_INT val = INTVAL (c);
>> +  unsigned HOST_WIDE_INT val = UINTVAL (c);
> 
> Does it matter?

No.


r~

* Re: [PATCH 06/15] rs6000: Use rldiwi in constant construction
  2015-08-12 14:02   ` Segher Boessenkool
@ 2015-08-12 15:55     ` Richard Henderson
  2015-08-13  2:43       ` Segher Boessenkool
  0 siblings, 1 reply; 35+ messages in thread
From: Richard Henderson @ 2015-08-12 15:55 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: gcc-patches, David Edelsohn

On 08/12/2015 07:02 AM, Segher Boessenkool wrote:
> On Tue, Aug 11, 2015 at 06:11:35PM -0700, Richard Henderson wrote:
>> @@ -8173,6 +8173,13 @@ genimm_ppc::exam_search (HOST_WIDE_INT c, int budget)
>>    if (exam_mask (-1, c, sub_budget))
>>      return true;
>>  
>> +  /* If the two halves are equal, use an insert.  */
>> +  if (c >> 32 == test && exam_sub (test, sub_budget))
>> +    {
>> +      opN (VEC_DUPLICATE, 0xffffffffu); /* RLDIMI */
>> +      return true;
>> +    }
> 
> Does this work for c with the high bit set?  I think you need
> to cast it to unsigned HOST_WIDE_INT first?

Indeed, a sign-extension works better.  It means the base constant will use
LIS+ORIS without trying to create an unsigned version.

If you're talking about ubsan-style restrictions on shifting signed
constants...  I choose to totally ignore that.  Certainly nothing else in gcc
has been audited for that, beginning with hwint.h itself.


r~

* Re: [PATCH 01/15] rs6000: Split out rs6000_is_valid_and_mask_wide
  2015-08-12 15:50     ` Richard Henderson
@ 2015-08-13  2:29       ` Segher Boessenkool
  0 siblings, 0 replies; 35+ messages in thread
From: Segher Boessenkool @ 2015-08-13  2:29 UTC (permalink / raw)
  To: Richard Henderson; +Cc: gcc-patches, David Edelsohn

On Wed, Aug 12, 2015 at 08:50:48AM -0700, Richard Henderson wrote:
> On 08/12/2015 06:23 AM, Segher Boessenkool wrote:
> > On Tue, Aug 11, 2015 at 06:11:30PM -0700, Richard Henderson wrote:
> >> This allows testing for a mask without having to call GEN_INT.
> >>
> >> Cc: David Edelsohn <dje.gcc@gmail.com>
> >> ---
> >> 	* config/rs6000/rs6000.c (rs6000_is_valid_mask_wide): Split out from...
> >> 	(rs6000_is_valid_mask): ... here.
> >> 	(rs6000_is_valid_and_mask_wide): Split out from...
> >> 	(rs6000_is_valid_and_mask): ... here.
> > 
> > I don't like these "_wide" names much.
> 
> It follows the existing practice within the backend.

One existing function name, yes.  And you are replacing that function :-)

> >  You could overload the shorter
> > name, if you really think creating some garbage const_int's is too much
> > overhead (it might well be if you use it a lot more in later patches).
> 
> At one stage in the development (before I became much leaner with the search
> for rotate), it really really mattered.

For the AND patterns I considered such a search too; I didn't do it
because, as you say, it would have to consider a *lot* of possibilities,
most of them useless.  AND sequences of more than two insns often prevented
other optimisations too, so I settled on two insns maximum, and then you
can generate the sequence directly with no problem.

So yes, if you call it way too often it also creates too much garbage.

> >> -bool
> >> -rs6000_is_valid_mask (rtx mask, int *b, int *e, machine_mode mode)
> >> +static bool
> >> +rs6000_is_valid_mask_wide (unsigned HOST_WIDE_INT val, int *b, int *e, int n)
> > 
> > But why change the mode parameter?  The code was clearer before.
> 
> So that we don't have to look up GET_MODE_BITSIZE (mode).

Getting rid of a single array lookup matters more than interface clarity?
You must have been calling it *very* often!  Thankfully you don't anymore.

> >> +static bool
> >> +rs6000_is_valid_and_mask_wide (unsigned HOST_WIDE_INT val, machine_mode mode)
> >>  {
> >>    int nb, ne;
> >>  
> >> -  if (!rs6000_is_valid_mask (mask, &nb, &ne, mode))
> >> -    return false;
> >> +  switch (mode)
> >> +    {
> >> +    case DImode:
> >> +      if (!rs6000_is_valid_mask_wide (val, &nb, &ne, 64))
> >> +	return false;
> >> +      /* For DImode, we need a rldicl, rldicr, or a rlwinm with
> >> +	 mask that does not wrap.  */
> >> +      return (ne == 0 || nb == 63 || (nb < 32 && ne <= nb));
> >>  
> >> -  /* For DImode, we need a rldicl, rldicr, or a rlwinm with mask that
> >> -     does not wrap.  */
> >> -  if (mode == DImode)
> >> -    return (ne == 0 || nb == 63 || (nb < 32 && ne <= nb));
> >> +    case SImode:
> >> +      if (!rs6000_is_valid_mask_wide (val, &nb, &ne, 32))
> >> +	return false;
> >> +      /* For SImode, rlwinm can do everything.  */
> >> +      return (nb < 32 && ne < 32);
> >>  
> >> -  /* For SImode, rlwinm can do everything.  */
> >> -  if (mode == SImode)
> >> -    return (nb < 32 && ne < 32);
> >> +    default:
> >> +      return false;
> >> +    }
> >> +}
> >>  
> >> -  return false;
> > 
> > You don't need any of these changes then, either.
> 
> True, not *needed* per se, but if you look closer I'm combining conditionals.
> I think the replacement here is clearer.

You're combining it with a conditional that you yourself add (the
mode -> 32/64 switch), and the code doesn't become any clearer at all IMHO.
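For readers following along, the run-of-ones test that underlies this mask validation can be sketched in isolation.  This is a simplified stand-in, not the real rs6000_is_valid_mask code:

```c
#include <stdint.h>

/* Is m a single contiguous run of ones (no wraparound)?  Adding the
   lowest set bit to such a run clears the whole run, leaving no bits
   in common with the original.  */
int
run_of_ones (uint64_t m)
{
  if (m == 0)
    return 0;
  uint64_t lsb = m & -m;          /* isolate the lowest set bit */
  return ((m + lsb) & m) == 0;
}

/* The rotate-and-mask insns also accept masks that wrap around the
   word boundary; a wrapping run is one whose complement is a
   non-wrapping run.  */
int
run_of_ones_wrapping (uint64_t m)
{
  return run_of_ones (m) || (~m != 0 && run_of_ones (~m));
}
```

The "does not wrap" conditions quoted above then reduce to checking where the run's endpoints fall for the given mode.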

> >> -  unsigned HOST_WIDE_INT val = INTVAL (c);
> >> +  unsigned HOST_WIDE_INT val = UINTVAL (c);
> > 
> > Does it matter?
> 
> No.

Ah okay, you were getting me worried!  :-)


Segher

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 06/15] rs6000: Use rldiwi in constant construction
  2015-08-12 15:55     ` Richard Henderson
@ 2015-08-13  2:43       ` Segher Boessenkool
  2015-08-13 19:01         ` Mike Stump
  0 siblings, 1 reply; 35+ messages in thread
From: Segher Boessenkool @ 2015-08-13  2:43 UTC (permalink / raw)
  To: Richard Henderson; +Cc: gcc-patches, David Edelsohn

On Wed, Aug 12, 2015 at 08:55:51AM -0700, Richard Henderson wrote:
> On 08/12/2015 07:02 AM, Segher Boessenkool wrote:
> > On Tue, Aug 11, 2015 at 06:11:35PM -0700, Richard Henderson wrote:
> >> @@ -8173,6 +8173,13 @@ genimm_ppc::exam_search (HOST_WIDE_INT c, int budget)
> >>    if (exam_mask (-1, c, sub_budget))
> >>      return true;
> >>  
> >> +  /* If the two halves are equal, use an insert.  */
> >> +  if (c >> 32 == test && exam_sub (test, sub_budget))
> >> +    {
> >> +      opN (VEC_DUPLICATE, 0xffffffffu); /* RLDIMI */
> >> +      return true;
> >> +    }
> > 
> > Does this work for c with the high bit set?  I think you need
> > to cast it to unsigned HOST_WIDE_INT first?
> 
> Indeed, a sign-extension works better.  It means the base constant will use
> LIS+ORIS without trying to create an unsigned version.

Patch 8/15 changes this so that "test" is assigned the sign-extended low
32 bits right before this code; that should work just fine.

> If you're talking about ubsan sort of restrictions on shifting signed
> constants...  I choose to totally ignore that.

Good plan.  We rely on arithmetic shifts rounding towards negative
infinity, and so does the rest of the world.

> Certainly nowhere else in gcc
> has been audited for that, beginning with hwint.h itself.

Yes.  And there are much worse problems, like many things not working
right if your HOST_WIDE_INT would happen to be more than 64 bits; we
cannot really shake those out because there is no actual system to
test that on -- but it also doesn't actually matter, because there is
no system to run it on :-)


Segher

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH ppc64,aarch64,alpha 00/15] Improve backend constant generation
  2015-08-12 15:32   ` Richard Henderson
@ 2015-08-13  3:07     ` Segher Boessenkool
  2015-08-13  5:36       ` Segher Boessenkool
  0 siblings, 1 reply; 35+ messages in thread
From: Segher Boessenkool @ 2015-08-13  3:07 UTC (permalink / raw)
  To: Richard Henderson
  Cc: gcc-patches, David Edelsohn, Marcus Shawcroft, Richard Earnshaw

On Wed, Aug 12, 2015 at 08:32:46AM -0700, Richard Henderson wrote:
> On 08/12/2015 01:31 AM, Segher Boessenkool wrote:
> > Is there something that makes the cache not get too big?  Do we
> > care, anyway?
> 
> No, nothing ever reduces the size of the cache.  I doubt we care, but I haven't
> instrumented to see how big it grows.
> 
> My intuition says the most important thing about managing this cache is not to
> put the most common trivial constants in, and I already do that.

Right.  And it seems to cache negative results too (the five-insn
sequence).
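As a concrete sketch of that caching policy (hypothetical code, not the actual genimm infrastructure from the patches): trivial constants bypass the cache entirely, everything else keys a small direct-mapped table, and nothing ever evicts or shrinks it.

```c
#include <stdint.h>

#define CACHE_SLOTS 1024

struct entry { int64_t c; int cost; int valid; };
static struct entry cache[CACHE_SLOTS];

int search_calls;   /* instrumentation: how often the slow search ran */

/* Stand-in for the real budgeted search over insn recipes.  */
static int
expensive_search (int64_t c)
{
  search_calls++;
  (void) c;
  return 5;
}

int
const_cost (int64_t c)
{
  if (c >= -0x8000 && c < 0x8000)
    return 1;                      /* trivial: one LI insn, never cached */
  unsigned slot = (unsigned) (((uint64_t) c * 0x9e3779b97f4a7c15ULL) >> 32)
                  % CACHE_SLOTS;
  struct entry *e = &cache[slot];
  if (!e->valid || e->c != c)      /* direct-mapped: collisions overwrite */
    {
      e->c = c;
      e->cost = expensive_search (c);
      e->valid = 1;
    }
  return e->cost;
}
```

Note that, as observed above, this memoizes whatever the search returns, including the worst-case (five-insn) results.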

> >>     I did attempt to fix it, and got nothing for my troubles except
> >>     poorer code generation for AND/IOR/XOR with non-trivial constants.
> > 
> > Could you give an example of code that isn't split early enough?
> 
> I believe the examples I was seeing were in the libdecnumber code.  I'd have to
> go back and reproduce them now...

If you could, please do.

> >>     Perhaps there's some other predication or costing error that's
> >>     getting in the way, and it simply wasn't obvious to me.   In any
> >>     case, nothing in this patch set addresses this at all.
> > 
> > The instruction (set (reg) (const_int 0x12345678)) is costed as 4
> > (i.e. one insn).  That cannot be good.  This is alternative #5 in
> > *movsi_internal1_single (there are many more variants of that
> > pattern).
> 
> Yes, when I tried to fix it, I did adjust that costing, but still...

I misread it (it's alt #6, with cost 8).  Maybe Alan's patches would
fix this one?

> >>   * Constants are split *really* late.  In particular, after reload.
> > 
> > Yeah that is bad.  But I'm still not seeing it.  Hrm, maybe only
> > DImode ones?
> 
> Dunno.  I found it after I did add a recipe to use ADDI, which then triggered
> an ICE due to the r0 base register.

Ah!  Reload generates *new* addition insns out of thin air, and that
is exactly the case where ADDI won't work.  LRA works a bit better
there, it seems (but it is still not the default for rs6000); if the
constraint ("b" in this case) does not match, it tries other things.
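The r0 restriction here is a PowerPC ISA quirk worth spelling out: in addi (and D-form addressing generally), an RA field of 0 denotes the literal value 0, not the contents of r0, which is exactly why a fresh addition insn materialized by reload cannot take r0 as its base.  A tiny illustrative model of that semantics (not an emulator):

```c
#include <stdint.h>

/* addi RT,RA,SI: if RA is 0 the first operand is the constant 0,
   not r0's contents -- the quirk behind the ICE mentioned above.  */
int64_t
addi (const int64_t gpr[32], int ra, int16_t si)
{
  return (ra == 0 ? 0 : gpr[ra]) + si;
}
```

So `addi r4,r3,8` adds 8 to r3's contents, while `addi r4,r0,8` simply loads 8, regardless of what r0 holds; hence the "b" constraint, which excludes r0 from base registers.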

> > Have you looked at generated code quality?
> 
> I've looked at diffs of all of the object files in the target directory.  There
> were definite spots of improvement.  I wasn't able to spot any regressions.

[ I've looked now. ]

Some of the new combos help a bit, yes.  Total code size increased
a tiny bit; it looks to be all because of unfortunate scheduling and
less tail merging.  The usual.

32-bit code is identical, as it should be ;-)


Segher

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH ppc64,aarch64,alpha 00/15] Improve backend constant generation
  2015-08-12  8:32 ` [PATCH ppc64,aarch64,alpha 00/15] Improve backend " Segher Boessenkool
  2015-08-12 15:32   ` Richard Henderson
@ 2015-08-13  3:10   ` Segher Boessenkool
  2015-08-13 11:32     ` David Edelsohn
  1 sibling, 1 reply; 35+ messages in thread
From: Segher Boessenkool @ 2015-08-13  3:10 UTC (permalink / raw)
  To: Richard Henderson
  Cc: gcc-patches, David Edelsohn, Marcus Shawcroft, Richard Earnshaw

On Wed, Aug 12, 2015 at 03:31:48AM -0500, Segher Boessenkool wrote:
> >   * This is the only platform for which I bothered collecting any sort
> >     of performance data:
> > 
> >     As best I can tell, there is a 9% improvement in bootstrap speed
> >     for ppc64.  That is, 10 minutes off the original 109 minute build.
> 
> That is, wow.  Wow :-)

Bootstrap + 4 regtests of a virgin trunk du jour took me 127m;
with your patches, 130m.  So I'm not seeing that improvement (but
no regression either).  This is gcc110 fwiw.


Segher

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH ppc64,aarch64,alpha 00/15] Improve backend constant generation
  2015-08-13  3:10   ` Segher Boessenkool
@ 2015-08-13 11:32     ` David Edelsohn
  0 siblings, 0 replies; 35+ messages in thread
From: David Edelsohn @ 2015-08-13 11:32 UTC (permalink / raw)
  To: Segher Boessenkool, Richard Henderson
  Cc: GCC Patches, Marcus Shawcroft, Richard Earnshaw

On Wed, Aug 12, 2015 at 11:10 PM, Segher Boessenkool
<segher@kernel.crashing.org> wrote:
> On Wed, Aug 12, 2015 at 03:31:48AM -0500, Segher Boessenkool wrote:
>> >   * This is the only platform for which I bothered collecting any sort
>> >     of performance data:
>> >
>> >     As best I can tell, there is a 9% improvement in bootstrap speed
>> >     for ppc64.  That is, 10 minutes off the original 109 minute build.
>>
>> That is, wow.  Wow :-)
>
> Bootstrap + 4 regtests of a virgin trunk du jour took me 127m;
> with your patches, 130m.  So I'm not seeing that improvement (but
> no regression either).  This is gcc110 fwiw.

The patch series is fine with me.

As Segher mentioned, the performance impact is in the noise for my tests.

Thanks, David

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 06/15] rs6000: Use rldiwi in constant construction
  2015-08-13  2:43       ` Segher Boessenkool
@ 2015-08-13 19:01         ` Mike Stump
  2015-08-13 20:30           ` Joseph Myers
  0 siblings, 1 reply; 35+ messages in thread
From: Mike Stump @ 2015-08-13 19:01 UTC (permalink / raw)
  To: Segher Boessenkool; +Cc: Richard Henderson, gcc-patches, David Edelsohn

On Aug 12, 2015, at 7:43 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> Yes.  And there are much worse problems, like many things not working
> right if your HOST_WIDE_INT would happen to be more than 64 bits; we
> cannot really shake those out because there is no actual system to
> test that on -- but it also doesn't actually matter, because there is
> no system to run it on :-)

Lots of systems support 128-bit types just fine, and one could use TImode for HOST_WIDE_INT, if one really, really wanted to.  x86_64, I think, is one of those systems.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 06/15] rs6000: Use rldiwi in constant construction
  2015-08-13 19:01         ` Mike Stump
@ 2015-08-13 20:30           ` Joseph Myers
  0 siblings, 0 replies; 35+ messages in thread
From: Joseph Myers @ 2015-08-13 20:30 UTC (permalink / raw)
  To: Mike Stump
  Cc: Segher Boessenkool, Richard Henderson, gcc-patches, David Edelsohn

On Thu, 13 Aug 2015, Mike Stump wrote:

> On Aug 12, 2015, at 7:43 PM, Segher Boessenkool <segher@kernel.crashing.org> wrote:
> > Yes.  And there are much worse problems, like many things not working
> > right if your HOST_WIDE_INT would happen to be more than 64 bits; we
> > cannot really shake those out because there is no actual system to
> > test that on -- but it also doesn't actually matter, because there is
> > no system to run it on :-)
> 
> Lots of systems support 128 bit types just fine, and one could use 
> TImode for HOST_WIDE_INT, if one really, really wanted to.  x86_64 I 
> think is one of those systems.

There's no printf support (or other standard library support); you'd run 
into practical problems there.
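This point is easy to demonstrate: GCC's unsigned __int128 arithmetic works fine on 64-bit hosts, but printf has no length modifier for it, so formatting has to go through 64-bit halves by hand (a workaround sketch, not a recommendation):

```c
#include <stdio.h>
#include <string.h>

static char buf[33];

/* printf cannot format an __int128 directly, so split it into two
   64-bit halves and print those.  Returns a static buffer for
   simplicity of illustration.  */
const char *
format_u128 (unsigned __int128 x)
{
  sprintf (buf, "%016llx%016llx",
           (unsigned long long) (x >> 64), (unsigned long long) x);
  return buf;
}
```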

-- 
Joseph S. Myers
joseph@codesourcery.com

^ permalink raw reply	[flat|nested] 35+ messages in thread

end of thread, other threads:[~2015-08-13 20:23 UTC | newest]

Thread overview: 35+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-08-12  1:11 [PATCH ppc64,aarch64,alpha 00/15] Improve backend constant generation Richard Henderson
2015-08-12  1:11 ` [PATCH 01/15] rs6000: Split out rs6000_is_valid_and_mask_wide Richard Henderson
2015-08-12 13:24   ` Segher Boessenkool
2015-08-12 15:50     ` Richard Henderson
2015-08-13  2:29       ` Segher Boessenkool
2015-08-12  1:11 ` [PATCH 05/15] rs6000: Move constant via mask into build_set_const_data Richard Henderson
2015-08-12  1:12 ` [PATCH 12/15] aarch64: Test for duplicated 32-bit halves Richard Henderson
2015-08-12  1:12 ` [PATCH 13/15] alpha: Use hashing infrastructure for generating constants Richard Henderson
2015-08-12  1:12 ` [PATCH 15/15] alpha: Remove alpha_emit_set_long_const Richard Henderson
2015-08-12  1:12 ` [PATCH 02/15] rs6000: Make num_insns_constant_wide static Richard Henderson
2015-08-12  1:12 ` [PATCH 09/15] rs6000: Use xoris in constant construction Richard Henderson
2015-08-12  1:12 ` [PATCH 04/15] rs6000: Implement set_const_data infrastructure Richard Henderson
2015-08-12 13:53   ` Segher Boessenkool
2015-08-12  1:12 ` [PATCH 06/15] rs6000: Use rldiwi in constant construction Richard Henderson
2015-08-12 14:02   ` Segher Boessenkool
2015-08-12 15:55     ` Richard Henderson
2015-08-13  2:43       ` Segher Boessenkool
2015-08-13 19:01         ` Mike Stump
2015-08-13 20:30           ` Joseph Myers
2015-08-12  1:12 ` [PATCH 11/15] aarch64: Use hashing infrastructure for generating constants Richard Henderson
2015-08-12  1:12 ` [PATCH 08/15] rs6000: Generalize masking in constant generation Richard Henderson
2015-08-12  1:12 ` [PATCH 03/15] rs6000: Tidy num_insns_constant vs CONST_DOUBLE Richard Henderson
2015-08-12  1:12 ` [PATCH 14/15] alpha: Split out alpha_cost_set_const Richard Henderson
2015-08-12  1:12 ` [PATCH 07/15] rs6000: Generalize left shift in constant generation Richard Henderson
2015-08-12  1:12 ` [PATCH 10/15] rs6000: Use rotldi " Richard Henderson
2015-08-12  8:32 ` [PATCH ppc64,aarch64,alpha 00/15] Improve backend " Segher Boessenkool
2015-08-12 15:32   ` Richard Henderson
2015-08-13  3:07     ` Segher Boessenkool
2015-08-13  5:36       ` Segher Boessenkool
2015-08-13  3:10   ` Segher Boessenkool
2015-08-13 11:32     ` David Edelsohn
2015-08-12  8:32 ` Richard Earnshaw
2015-08-12  8:43   ` Richard Earnshaw
2015-08-12  9:02     ` Richard Earnshaw
2015-08-12 15:45   ` Richard Henderson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).