public inbox for gcc-patches@gcc.gnu.org
* [00/nn] Patches preparing for runtime offsets and sizes
From: Richard Sandiford @ 2017-10-23 11:16 UTC
  To: gcc-patches

This series of patches adds or does things that are needed for SVE
runtime offsets and sizes, but that aren't directly related to the
offsets and sizes themselves.  It's a prerequisite to the main series
that I'll post later today.

Tested by compiling the testsuite before and after the series on:

    aarch64-linux-gnu aarch64_be-linux-gnu alpha-linux-gnu arc-elf
    arm-linux-gnueabi arm-linux-gnueabihf avr-elf bfin-elf c6x-elf
    cr16-elf cris-elf epiphany-elf fr30-elf frv-linux-gnu ft32-elf
    h8300-elf hppa64-hp-hpux11.23 ia64-linux-gnu i686-pc-linux-gnu
    i686-apple-darwin iq2000-elf lm32-elf m32c-elf m32r-elf
    m68k-linux-gnu mcore-elf microblaze-elf mipsel-linux-gnu
    mipsisa64-linux-gnu mmix mn10300-elf moxie-rtems msp430-elf
    nds32le-elf nios2-linux-gnu nvptx-none pdp11 powerpc-linux-gnuspe
    powerpc-eabispe powerpc64-linux-gnu powerpc64le-linux-gnu
    powerpc-ibm-aix7.0 riscv32-elf riscv64-elf rl78-elf rx-elf
    s390-linux-gnu s390x-linux-gnu sh-linux-gnu sparc-linux-gnu
    sparc64-linux-gnu sparc-wrs-vxworks spu-elf tilegx-elf tilepro-elf
    xstormy16-elf v850-elf vax-netbsdelf visium-elf x86_64-darwin
    x86_64-linux-gnu xtensa-elf

There were no differences besides the ones described in the
covering notes (except on powerpc-ibm-aix7.0, where symbol names
aren't stable).

Also tested normally on aarch64-linux-gnu, x86_64-linux-gnu and
powerpc64le-linux-gnu.

Thanks,
Richard


* [01/nn] Add gen_(const_)vec_duplicate helpers
From: Richard Sandiford @ 2017-10-23 11:17 UTC
  To: gcc-patches

This patch adds helper functions for generating constant and
non-constant vector duplicates.  These routines help with SVE because
it is then easier to use:

   (const:M (vec_duplicate:M X))

for a broadcast of X, even if the number of elements in M isn't known
at compile time.  It also makes it easier for general rtx code to treat
constant and non-constant duplicates in the same way.

In the target code, the patch uses gen_vec_duplicate instead of
gen_rtx_VEC_DUPLICATE if handling constants correctly is potentially
useful.  It might be that some or all of the call sites only handle
non-constants in practice, in which case the change is a harmless
no-op (and a saving of a few characters).

Otherwise, the target changes use gen_const_vec_duplicate instead
of gen_rtx_CONST_VECTOR if the constant is obviously a duplicate.
They also include some changes to use CONSTxx_RTX for easy global
constants.
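
As a minimal sketch of the intended call-site pattern (the mode and
values below are illustrative placeholders, not code from the patch):

  /* Build a V4SI vector in which every element is 1.  The element is
     a "tiny" constant, so this returns CONST1_RTX (V4SImode) rather
     than building a CONST_VECTOR by hand.  */
  rtx one = gen_int_mode (1, SImode);
  rtx cst_vec = gen_const_vec_duplicate (V4SImode, one);

  /* Broadcast a register value.  gen_vec_duplicate wraps X in
     (vec_duplicate ...) here, but would defer to
     gen_const_vec_duplicate if X were constant.  */
  rtx x = gen_reg_rtx (SImode);
  rtx dup = gen_vec_duplicate (V4SImode, x);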


2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* emit-rtl.h (gen_const_vec_duplicate): Declare.
	(gen_vec_duplicate): Likewise.
	* emit-rtl.c (gen_const_vec_duplicate_1): New function, split
	out from...
	(gen_const_vector): ...here.
	(gen_const_vec_duplicate, gen_vec_duplicate): New functions.
	(gen_rtx_CONST_VECTOR): Use gen_const_vec_duplicate for constants
	whose elements are all equal.
	* optabs.c (expand_vector_broadcast): Use gen_const_vec_duplicate.
	* simplify-rtx.c (simplify_const_unary_operation): Likewise.
	(simplify_relational_operation): Likewise.
	* config/aarch64/aarch64.c (aarch64_simd_gen_const_vector_dup):
	Likewise.
	(aarch64_simd_dup_constant): Use gen_vec_duplicate.
	(aarch64_expand_vector_init): Likewise.
	* config/arm/arm.c (neon_vdup_constant): Likewise.
	(neon_expand_vector_init): Likewise.
	(arm_expand_vec_perm): Use gen_const_vec_duplicate.
	(arm_block_set_unaligned_vect): Likewise.
	(arm_block_set_aligned_vect): Likewise.
	* config/arm/neon.md (neon_copysignf<mode>): Likewise.
	* config/i386/i386.c (ix86_expand_vec_perm): Likewise.
	(expand_vec_perm_even_odd_pack): Likewise.
	(ix86_vector_duplicate_value): Use gen_vec_duplicate.
	* config/i386/sse.md (one_cmpl<mode>2): Use CONSTM1_RTX.
	* config/ia64/ia64.c (ia64_expand_vecint_compare): Use
	gen_const_vec_duplicate.
	* config/ia64/vect.md (addv2sf3, subv2sf3): Use CONST1_RTX.
	* config/mips/mips.c (mips_gen_const_int_vector): Use
	gen_const_vec_duplicate.
	(mips_expand_vector_init): Use CONST0_RTX.
	* config/powerpcspe/altivec.md (abs<mode>2, nabs<mode>2): Likewise.
	(define_split): Use gen_const_vec_duplicate.
	* config/rs6000/altivec.md (abs<mode>2, nabs<mode>2): Use CONST0_RTX.
	(define_split): Use gen_const_vec_duplicate.
	* config/s390/vx-builtins.md (vec_genmask<mode>): Likewise.
	(vec_ctd_s64, vec_ctd_u64, vec_ctsl, vec_ctul): Likewise.
	* config/spu/spu.c (spu_const): Likewise.

Index: gcc/emit-rtl.h
===================================================================
--- gcc/emit-rtl.h	2017-10-23 11:40:11.561479591 +0100
+++ gcc/emit-rtl.h	2017-10-23 11:41:32.369050264 +0100
@@ -438,6 +438,9 @@ get_max_uid (void)
   return crtl->emit.x_cur_insn_uid;
 }
 
+extern rtx gen_const_vec_duplicate (machine_mode, rtx);
+extern rtx gen_vec_duplicate (machine_mode, rtx);
+
 extern void set_decl_incoming_rtl (tree, rtx, bool);
 
 /* Return a memory reference like MEMREF, but with its mode changed
Index: gcc/emit-rtl.c
===================================================================
--- gcc/emit-rtl.c	2017-10-23 11:41:25.541909864 +0100
+++ gcc/emit-rtl.c	2017-10-23 11:41:32.369050264 +0100
@@ -5756,32 +5756,60 @@ init_emit (void)
 #endif
 }
 
-/* Generate a vector constant for mode MODE and constant value CONSTANT.  */
+/* Like gen_const_vec_duplicate, but ignore const_tiny_rtx.  */
 
 static rtx
-gen_const_vector (machine_mode mode, int constant)
+gen_const_vec_duplicate_1 (machine_mode mode, rtx el)
 {
-  rtx tem;
-  rtvec v;
-  int units, i;
-  machine_mode inner;
+  int nunits = GET_MODE_NUNITS (mode);
+  rtvec v = rtvec_alloc (nunits);
+  for (int i = 0; i < nunits; ++i)
+    RTVEC_ELT (v, i) = el;
+  return gen_rtx_raw_CONST_VECTOR (mode, v);
+}
 
-  units = GET_MODE_NUNITS (mode);
-  inner = GET_MODE_INNER (mode);
+/* Generate a vector constant of mode MODE in which every element has
+   value ELT.  */
 
-  gcc_assert (!DECIMAL_FLOAT_MODE_P (inner));
+rtx
+gen_const_vec_duplicate (machine_mode mode, rtx elt)
+{
+  scalar_mode inner_mode = GET_MODE_INNER (mode);
+  if (elt == CONST0_RTX (inner_mode))
+    return CONST0_RTX (mode);
+  else if (elt == CONST1_RTX (inner_mode))
+    return CONST1_RTX (mode);
+  else if (elt == CONSTM1_RTX (inner_mode))
+    return CONSTM1_RTX (mode);
+
+  return gen_const_vec_duplicate_1 (mode, elt);
+}
 
-  v = rtvec_alloc (units);
+/* Return a vector rtx of mode MODE in which every element has value X.
+   The result will be a constant if X is constant.  */
 
-  /* We need to call this function after we set the scalar const_tiny_rtx
-     entries.  */
-  gcc_assert (const_tiny_rtx[constant][(int) inner]);
+rtx
+gen_vec_duplicate (machine_mode mode, rtx x)
+{
+  if (CONSTANT_P (x))
+    return gen_const_vec_duplicate (mode, x);
+  return gen_rtx_VEC_DUPLICATE (mode, x);
+}
 
-  for (i = 0; i < units; ++i)
-    RTVEC_ELT (v, i) = const_tiny_rtx[constant][(int) inner];
+/* Generate a new vector constant for mode MODE and constant value
+   CONSTANT.  */
 
-  tem = gen_rtx_raw_CONST_VECTOR (mode, v);
-  return tem;
+static rtx
+gen_const_vector (machine_mode mode, int constant)
+{
+  machine_mode inner = GET_MODE_INNER (mode);
+
+  gcc_assert (!DECIMAL_FLOAT_MODE_P (inner));
+
+  rtx el = const_tiny_rtx[constant][(int) inner];
+  gcc_assert (el);
+
+  return gen_const_vec_duplicate_1 (mode, el);
 }
 
 /* Generate a vector like gen_rtx_raw_CONST_VEC, but use the zero vector when
@@ -5789,28 +5817,12 @@ gen_const_vector (machine_mode mode, int
 rtx
 gen_rtx_CONST_VECTOR (machine_mode mode, rtvec v)
 {
-  machine_mode inner = GET_MODE_INNER (mode);
-  int nunits = GET_MODE_NUNITS (mode);
-  rtx x;
-  int i;
-
-  /* Check to see if all of the elements have the same value.  */
-  x = RTVEC_ELT (v, nunits - 1);
-  for (i = nunits - 2; i >= 0; i--)
-    if (RTVEC_ELT (v, i) != x)
-      break;
+  gcc_assert (GET_MODE_NUNITS (mode) == GET_NUM_ELEM (v));
 
   /* If the values are all the same, check to see if we can use one of the
      standard constant vectors.  */
-  if (i == -1)
-    {
-      if (x == CONST0_RTX (inner))
-	return CONST0_RTX (mode);
-      else if (x == CONST1_RTX (inner))
-	return CONST1_RTX (mode);
-      else if (x == CONSTM1_RTX (inner))
-	return CONSTM1_RTX (mode);
-    }
+  if (rtvec_all_equal_p (v))
+    return gen_const_vec_duplicate (mode, RTVEC_ELT (v, 0));
 
   return gen_rtx_raw_CONST_VECTOR (mode, v);
 }
Index: gcc/optabs.c
===================================================================
--- gcc/optabs.c	2017-10-23 11:41:23.502006982 +0100
+++ gcc/optabs.c	2017-10-23 11:41:32.369050264 +0100
@@ -377,13 +377,8 @@ expand_vector_broadcast (machine_mode vm
 
   gcc_checking_assert (VECTOR_MODE_P (vmode));
 
-  n = GET_MODE_NUNITS (vmode);
-  vec = rtvec_alloc (n);
-  for (i = 0; i < n; ++i)
-    RTVEC_ELT (vec, i) = op;
-
   if (CONSTANT_P (op))
-    return gen_rtx_CONST_VECTOR (vmode, vec);
+    return gen_const_vec_duplicate (vmode, op);
 
   /* ??? If the target doesn't have a vec_init, then we have no easy way
      of performing this operation.  Most of this sort of generic support
@@ -393,6 +388,10 @@ expand_vector_broadcast (machine_mode vm
   if (icode == CODE_FOR_nothing)
     return NULL;
 
+  n = GET_MODE_NUNITS (vmode);
+  vec = rtvec_alloc (n);
+  for (i = 0; i < n; ++i)
+    RTVEC_ELT (vec, i) = op;
   ret = gen_reg_rtx (vmode);
   emit_insn (GEN_FCN (icode) (ret, gen_rtx_PARALLEL (vmode, vec)));
 
Index: gcc/simplify-rtx.c
===================================================================
--- gcc/simplify-rtx.c	2017-10-23 11:41:25.549647760 +0100
+++ gcc/simplify-rtx.c	2017-10-23 11:41:32.370050264 +0100
@@ -1704,28 +1704,23 @@ simplify_const_unary_operation (enum rtx
 	  gcc_assert (GET_MODE_INNER (mode) == GET_MODE_INNER
 						(GET_MODE (op)));
       }
-      if (CONST_SCALAR_INT_P (op) || CONST_DOUBLE_AS_FLOAT_P (op)
-	  || GET_CODE (op) == CONST_VECTOR)
+      if (CONST_SCALAR_INT_P (op) || CONST_DOUBLE_AS_FLOAT_P (op))
+	return gen_const_vec_duplicate (mode, op);
+      if (GET_CODE (op) == CONST_VECTOR)
 	{
 	  int elt_size = GET_MODE_UNIT_SIZE (mode);
           unsigned n_elts = (GET_MODE_SIZE (mode) / elt_size);
 	  rtvec v = rtvec_alloc (n_elts);
 	  unsigned int i;
 
-	  if (GET_CODE (op) != CONST_VECTOR)
-	    for (i = 0; i < n_elts; i++)
-	      RTVEC_ELT (v, i) = op;
-	  else
-	    {
-	      machine_mode inmode = GET_MODE (op);
-	      int in_elt_size = GET_MODE_UNIT_SIZE (inmode);
-              unsigned in_n_elts = (GET_MODE_SIZE (inmode) / in_elt_size);
-
-	      gcc_assert (in_n_elts < n_elts);
-	      gcc_assert ((n_elts % in_n_elts) == 0);
-	      for (i = 0; i < n_elts; i++)
-	        RTVEC_ELT (v, i) = CONST_VECTOR_ELT (op, i % in_n_elts);
-	    }
+	  machine_mode inmode = GET_MODE (op);
+	  int in_elt_size = GET_MODE_UNIT_SIZE (inmode);
+	  unsigned in_n_elts = (GET_MODE_SIZE (inmode) / in_elt_size);
+
+	  gcc_assert (in_n_elts < n_elts);
+	  gcc_assert ((n_elts % in_n_elts) == 0);
+	  for (i = 0; i < n_elts; i++)
+	    RTVEC_ELT (v, i) = CONST_VECTOR_ELT (op, i % in_n_elts);
 	  return gen_rtx_CONST_VECTOR (mode, v);
 	}
     }
@@ -4632,20 +4627,13 @@ simplify_relational_operation (enum rtx_
 	    return CONST0_RTX (mode);
 #ifdef VECTOR_STORE_FLAG_VALUE
 	  {
-	    int i, units;
-	    rtvec v;
-
 	    rtx val = VECTOR_STORE_FLAG_VALUE (mode);
 	    if (val == NULL_RTX)
 	      return NULL_RTX;
 	    if (val == const1_rtx)
 	      return CONST1_RTX (mode);
 
-	    units = GET_MODE_NUNITS (mode);
-	    v = rtvec_alloc (units);
-	    for (i = 0; i < units; i++)
-	      RTVEC_ELT (v, i) = val;
-	    return gen_rtx_raw_CONST_VECTOR (mode, v);
+	    return gen_const_vec_duplicate (mode, val);
 	  }
 #else
 	  return NULL_RTX;
Index: gcc/config/aarch64/aarch64.c
===================================================================
--- gcc/config/aarch64/aarch64.c	2017-10-23 11:41:23.125751780 +0100
+++ gcc/config/aarch64/aarch64.c	2017-10-23 11:41:32.352050263 +0100
@@ -11726,16 +11726,8 @@ aarch64_mov_operand_p (rtx x, machine_mo
 rtx
 aarch64_simd_gen_const_vector_dup (machine_mode mode, HOST_WIDE_INT val)
 {
-  int nunits = GET_MODE_NUNITS (mode);
-  rtvec v = rtvec_alloc (nunits);
-  int i;
-
-  rtx cache = GEN_INT (val);
-
-  for (i=0; i < nunits; i++)
-    RTVEC_ELT (v, i) = cache;
-
-  return gen_rtx_CONST_VECTOR (mode, v);
+  rtx c = gen_int_mode (val, GET_MODE_INNER (mode));
+  return gen_const_vec_duplicate (mode, c);
 }
 
 /* Check OP is a legal scalar immediate for the MOVI instruction.  */
@@ -11947,7 +11939,7 @@ aarch64_simd_dup_constant (rtx vals)
      single ARM register.  This will be cheaper than a vector
      load.  */
   x = copy_to_mode_reg (inner_mode, x);
-  return gen_rtx_VEC_DUPLICATE (mode, x);
+  return gen_vec_duplicate (mode, x);
 }
 
 
@@ -12046,7 +12038,7 @@ aarch64_expand_vector_init (rtx target,
   if (all_same)
     {
       rtx x = copy_to_mode_reg (inner_mode, v0);
-      aarch64_emit_move (target, gen_rtx_VEC_DUPLICATE (mode, x));
+      aarch64_emit_move (target, gen_vec_duplicate (mode, x));
       return;
     }
 
@@ -12087,7 +12079,7 @@ aarch64_expand_vector_init (rtx target,
 
       /* Create a duplicate of the most common element.  */
       rtx x = copy_to_mode_reg (inner_mode, XVECEXP (vals, 0, maxelement));
-      aarch64_emit_move (target, gen_rtx_VEC_DUPLICATE (mode, x));
+      aarch64_emit_move (target, gen_vec_duplicate (mode, x));
 
       /* Insert the rest.  */
       for (int i = 0; i < n_elts; i++)
Index: gcc/config/arm/arm.c
===================================================================
--- gcc/config/arm/arm.c	2017-10-23 11:41:22.965190434 +0100
+++ gcc/config/arm/arm.c	2017-10-23 11:41:32.355050263 +0100
@@ -12151,7 +12151,7 @@ neon_vdup_constant (rtx vals)
      load.  */
 
   x = copy_to_mode_reg (inner_mode, x);
-  return gen_rtx_VEC_DUPLICATE (mode, x);
+  return gen_vec_duplicate (mode, x);
 }
 
 /* Generate code to load VALS, which is a PARALLEL containing only
@@ -12246,7 +12246,7 @@ neon_expand_vector_init (rtx target, rtx
   if (all_same && GET_MODE_SIZE (inner_mode) <= 4)
     {
       x = copy_to_mode_reg (inner_mode, XVECEXP (vals, 0, 0));
-      emit_insn (gen_rtx_SET (target, gen_rtx_VEC_DUPLICATE (mode, x)));
+      emit_insn (gen_rtx_SET (target, gen_vec_duplicate (mode, x)));
       return;
     }
 
@@ -28731,9 +28731,9 @@ arm_expand_vec_perm_1 (rtx target, rtx o
 arm_expand_vec_perm (rtx target, rtx op0, rtx op1, rtx sel)
 {
   machine_mode vmode = GET_MODE (target);
-  unsigned int i, nelt = GET_MODE_NUNITS (vmode);
+  unsigned int nelt = GET_MODE_NUNITS (vmode);
   bool one_vector_p = rtx_equal_p (op0, op1);
-  rtx rmask[MAX_VECT_LEN], mask;
+  rtx mask;
 
   /* TODO: ARM's VTBL indexing is little-endian.  In order to handle GCC's
      numbering of elements for big-endian, we must reverse the order.  */
@@ -28742,9 +28742,7 @@ arm_expand_vec_perm (rtx target, rtx op0
   /* The VTBL instruction does not use a modulo index, so we must take care
      of that ourselves.  */
   mask = GEN_INT (one_vector_p ? nelt - 1 : 2 * nelt - 1);
-  for (i = 0; i < nelt; ++i)
-    rmask[i] = mask;
-  mask = gen_rtx_CONST_VECTOR (vmode, gen_rtvec_v (nelt, rmask));
+  mask = gen_const_vec_duplicate (vmode, mask);
   sel = expand_simple_binop (vmode, AND, sel, mask, NULL, 0, OPTAB_LIB_WIDEN);
 
   arm_expand_vec_perm_1 (target, op0, op1, sel);
@@ -29798,10 +29796,9 @@ arm_block_set_unaligned_vect (rtx dstbas
 			      unsigned HOST_WIDE_INT value,
 			      unsigned HOST_WIDE_INT align)
 {
-  unsigned int i, j, nelt_v16, nelt_v8, nelt_mode;
+  unsigned int i, nelt_v16, nelt_v8, nelt_mode;
   rtx dst, mem;
-  rtx val_elt, val_vec, reg;
-  rtx rval[MAX_VECT_LEN];
+  rtx val_vec, reg;
   rtx (*gen_func) (rtx, rtx);
   machine_mode mode;
   unsigned HOST_WIDE_INT v = value;
@@ -29829,12 +29826,9 @@ arm_block_set_unaligned_vect (rtx dstbas
   mem = adjust_automodify_address (dstbase, mode, dst, offset);
 
   v = sext_hwi (v, BITS_PER_WORD);
-  val_elt = GEN_INT (v);
-  for (j = 0; j < nelt_mode; j++)
-    rval[j] = val_elt;
 
   reg = gen_reg_rtx (mode);
-  val_vec = gen_rtx_CONST_VECTOR (mode, gen_rtvec_v (nelt_mode, rval));
+  val_vec = gen_const_vec_duplicate (mode, GEN_INT (v));
   /* Emit instruction loading the constant value.  */
   emit_move_insn (reg, val_vec);
 
@@ -29898,10 +29892,9 @@ arm_block_set_aligned_vect (rtx dstbase,
 			    unsigned HOST_WIDE_INT value,
 			    unsigned HOST_WIDE_INT align)
 {
-  unsigned int i, j, nelt_v8, nelt_v16, nelt_mode;
+  unsigned int i, nelt_v8, nelt_v16, nelt_mode;
   rtx dst, addr, mem;
-  rtx val_elt, val_vec, reg;
-  rtx rval[MAX_VECT_LEN];
+  rtx val_vec, reg;
   machine_mode mode;
   unsigned HOST_WIDE_INT v = value;
   unsigned int offset = 0;
@@ -29923,12 +29916,9 @@ arm_block_set_aligned_vect (rtx dstbase,
   dst = copy_addr_to_reg (XEXP (dstbase, 0));
 
   v = sext_hwi (v, BITS_PER_WORD);
-  val_elt = GEN_INT (v);
-  for (j = 0; j < nelt_mode; j++)
-    rval[j] = val_elt;
 
   reg = gen_reg_rtx (mode);
-  val_vec = gen_rtx_CONST_VECTOR (mode, gen_rtvec_v (nelt_mode, rval));
+  val_vec = gen_const_vec_duplicate (mode, GEN_INT (v));
   /* Emit instruction loading the constant value.  */
   emit_move_insn (reg, val_vec);
 
Index: gcc/config/arm/neon.md
===================================================================
--- gcc/config/arm/neon.md	2017-10-23 11:41:22.968092145 +0100
+++ gcc/config/arm/neon.md	2017-10-23 11:41:32.356050263 +0100
@@ -3052,15 +3052,10 @@ (define_expand "neon_copysignf<mode>"
   "{
      rtx v_bitmask_cast;
      rtx v_bitmask = gen_reg_rtx (<VCVTF:V_cmp_result>mode);
-     int i, n_elt = GET_MODE_NUNITS (<MODE>mode);
-     rtvec v = rtvec_alloc (n_elt);
-
-     /* Create bitmask for vector select.  */
-     for (i = 0; i < n_elt; ++i)
-       RTVEC_ELT (v, i) = GEN_INT (0x80000000);
+     rtx c = GEN_INT (0x80000000);
 
      emit_move_insn (v_bitmask,
-		     gen_rtx_CONST_VECTOR (<VCVTF:V_cmp_result>mode, v));
+		     gen_const_vec_duplicate (<VCVTF:V_cmp_result>mode, c));
      emit_move_insn (operands[0], operands[2]);
      v_bitmask_cast = simplify_gen_subreg (<MODE>mode, v_bitmask,
 					   <VCVTF:V_cmp_result>mode, 0);
Index: gcc/config/i386/i386.c
===================================================================
--- gcc/config/i386/i386.c	2017-10-23 11:41:22.913926872 +0100
+++ gcc/config/i386/i386.c	2017-10-23 11:41:32.360050263 +0100
@@ -24117,9 +24117,7 @@ ix86_expand_vec_perm (rtx operands[])
 	  t2 = gen_reg_rtx (V32QImode);
 	  t3 = gen_reg_rtx (V32QImode);
 	  vt2 = GEN_INT (-128);
-	  for (i = 0; i < 32; i++)
-	    vec[i] = vt2;
-	  vt = gen_rtx_CONST_VECTOR (V32QImode, gen_rtvec_v (32, vec));
+	  vt = gen_const_vec_duplicate (V32QImode, vt2);
 	  vt = force_reg (V32QImode, vt);
 	  for (i = 0; i < 32; i++)
 	    vec[i] = i < 16 ? vt2 : const0_rtx;
@@ -24227,9 +24225,7 @@ ix86_expand_vec_perm (rtx operands[])
       vt = GEN_INT (w - 1);
     }
 
-  for (i = 0; i < w; i++)
-    vec[i] = vt;
-  vt = gen_rtx_CONST_VECTOR (maskmode, gen_rtvec_v (w, vec));
+  vt = gen_const_vec_duplicate (maskmode, vt);
   mask = expand_simple_binop (maskmode, AND, mask, vt,
 			      NULL_RTX, 0, OPTAB_DIRECT);
 
@@ -24319,9 +24315,7 @@ ix86_expand_vec_perm (rtx operands[])
 	  e = w = 4;
 	}
 
-      for (i = 0; i < w; i++)
-	vec[i] = vt;
-      vt = gen_rtx_CONST_VECTOR (maskmode, gen_rtvec_v (w, vec));
+      vt = gen_const_vec_duplicate (maskmode, vt);
       vt = force_reg (maskmode, vt);
       mask = expand_simple_binop (maskmode, AND, mask, vt,
 				  NULL_RTX, 0, OPTAB_DIRECT);
@@ -40814,7 +40808,7 @@ ix86_vector_duplicate_value (machine_mod
   rtx dup;
 
   /* First attempt to recognize VAL as-is.  */
-  dup = gen_rtx_VEC_DUPLICATE (mode, val);
+  dup = gen_vec_duplicate (mode, val);
   insn = emit_insn (gen_rtx_SET (target, dup));
   if (recog_memoized (insn) < 0)
     {
@@ -46120,7 +46114,7 @@ expand_vec_perm_vpshufb2_vpermq_even_odd
 static bool
 expand_vec_perm_even_odd_pack (struct expand_vec_perm_d *d)
 {
-  rtx op, dop0, dop1, t, rperm[16];
+  rtx op, dop0, dop1, t;
   unsigned i, odd, c, s, nelt = d->nelt;
   bool end_perm = false;
   machine_mode half_mode;
@@ -46197,9 +46191,7 @@ expand_vec_perm_even_odd_pack (struct ex
   dop1 = gen_reg_rtx (half_mode);
   if (odd == 0)
     {
-      for (i = 0; i < nelt / 2; i++)
-	rperm[i] = GEN_INT (c);
-      t = gen_rtx_CONST_VECTOR (half_mode, gen_rtvec_v (nelt / 2, rperm));
+      t = gen_const_vec_duplicate (half_mode, GEN_INT (c));
       t = force_reg (half_mode, t);
       emit_insn (gen_and (dop0, t, gen_lowpart (half_mode, d->op0)));
       emit_insn (gen_and (dop1, t, gen_lowpart (half_mode, d->op1)));
Index: gcc/config/i386/sse.md
===================================================================
--- gcc/config/i386/sse.md	2017-10-23 11:41:22.905221739 +0100
+++ gcc/config/i386/sse.md	2017-10-23 11:41:32.362050263 +0100
@@ -11529,13 +11529,7 @@ (define_expand "one_cmpl<mode>2"
 		(match_dup 2)))]
   "TARGET_SSE"
 {
-  int i, n = GET_MODE_NUNITS (<MODE>mode);
-  rtvec v = rtvec_alloc (n);
-
-  for (i = 0; i < n; ++i)
-    RTVEC_ELT (v, i) = constm1_rtx;
-
-  operands[2] = force_reg (<MODE>mode, gen_rtx_CONST_VECTOR (<MODE>mode, v));
+  operands[2] = force_reg (<MODE>mode, CONSTM1_RTX (<MODE>mode));
 })
 
 (define_expand "<sse2_avx2>_andnot<mode>3"
Index: gcc/config/ia64/ia64.c
===================================================================
--- gcc/config/ia64/ia64.c	2017-10-23 11:40:11.561479591 +0100
+++ gcc/config/ia64/ia64.c	2017-10-23 11:41:32.363050263 +0100
@@ -1938,7 +1938,7 @@ ia64_expand_vecint_compare (enum rtx_cod
 	    /* Subtract (-(INT MAX) - 1) from both operands to make
 	       them signed.  */
 	    mask = gen_int_mode (0x80000000, SImode);
-	    mask = gen_rtx_CONST_VECTOR (V2SImode, gen_rtvec (2, mask, mask));
+	    mask = gen_const_vec_duplicate (V2SImode, mask);
 	    mask = force_reg (mode, mask);
 	    t1 = gen_reg_rtx (mode);
 	    emit_insn (gen_subv2si3 (t1, op0, mask));
Index: gcc/config/ia64/vect.md
===================================================================
--- gcc/config/ia64/vect.md	2017-10-23 11:40:11.561479591 +0100
+++ gcc/config/ia64/vect.md	2017-10-23 11:41:32.363050263 +0100
@@ -1138,8 +1138,7 @@ (define_expand "addv2sf3"
 		  (match_operand:V2SF 2 "fr_register_operand" "")))]
   ""
 {
-  rtvec v = gen_rtvec (2, CONST1_RTX (SFmode), CONST1_RTX (SFmode));
-  operands[3] = force_reg (V2SFmode, gen_rtx_CONST_VECTOR (V2SFmode, v));
+  operands[3] = force_reg (V2SFmode, CONST1_RTX (V2SFmode));
 })
 
 (define_expand "subv2sf3"
@@ -1150,8 +1149,7 @@ (define_expand "subv2sf3"
 	  (neg:V2SF (match_operand:V2SF 2 "fr_register_operand" ""))))]
   ""
 {
-  rtvec v = gen_rtvec (2, CONST1_RTX (SFmode), CONST1_RTX (SFmode));
-  operands[3] = force_reg (V2SFmode, gen_rtx_CONST_VECTOR (V2SFmode, v));
+  operands[3] = force_reg (V2SFmode, CONST1_RTX (V2SFmode));
 })
 
 (define_insn "mulv2sf3"
Index: gcc/config/mips/mips.c
===================================================================
--- gcc/config/mips/mips.c	2017-10-23 11:41:22.797858429 +0100
+++ gcc/config/mips/mips.c	2017-10-23 11:41:32.365050264 +0100
@@ -21681,14 +21681,8 @@ mips_expand_vi_broadcast (machine_mode v
 rtx
 mips_gen_const_int_vector (machine_mode mode, HOST_WIDE_INT val)
 {
-  int nunits = GET_MODE_NUNITS (mode);
-  rtvec v = rtvec_alloc (nunits);
-  int i;
-
-  for (i = 0; i < nunits; i++)
-    RTVEC_ELT (v, i) = gen_int_mode (val, GET_MODE_INNER (mode));
-
-  return gen_rtx_CONST_VECTOR (mode, v);
+  rtx c = gen_int_mode (val, GET_MODE_INNER (mode));
+  return gen_const_vec_duplicate (mode, c);
 }
 
 /* Return a vector of repeated 4-element sets generated from
@@ -21843,12 +21837,7 @@ mips_expand_vector_init (rtx target, rtx
 	}
       else
 	{
-	  rtvec vec = shallow_copy_rtvec (XVEC (vals, 0));
-
-	  for (i = 0; i < nelt; ++i)
-	    RTVEC_ELT (vec, i) = CONST0_RTX (imode);
-
-	  emit_move_insn (target, gen_rtx_CONST_VECTOR (vmode, vec));
+	  emit_move_insn (target, CONST0_RTX (vmode));
 
 	  for (i = 0; i < nelt; ++i)
 	    {
Index: gcc/config/powerpcspe/altivec.md
===================================================================
--- gcc/config/powerpcspe/altivec.md	2017-10-23 11:40:11.561479591 +0100
+++ gcc/config/powerpcspe/altivec.md	2017-10-23 11:41:32.366050264 +0100
@@ -352,12 +352,10 @@ (define_split
   HOST_WIDE_INT val = const_vector_elt_as_int (op1, elt);
   rtx rtx_val = GEN_INT (val);
   int shift = vspltis_shifted (op1);
-  int nunits = GET_MODE_NUNITS (<MODE>mode);
-  int i;
 
   gcc_assert (shift != 0);
   operands[2] = gen_reg_rtx (<MODE>mode);
-  operands[3] = gen_rtx_CONST_VECTOR (<MODE>mode, rtvec_alloc (nunits));
+  operands[3] = gen_const_vec_duplicate (<MODE>mode, rtx_val);
   operands[4] = gen_reg_rtx (<MODE>mode);
 
   if (shift < 0)
@@ -370,10 +368,6 @@ (define_split
       operands[5] = CONST0_RTX (<MODE>mode);
       operands[6] = GEN_INT (shift);
     }
-
-  /* Populate the constant vectors.  */
-  for (i = 0; i < nunits; i++)
-    XVECEXP (operands[3], 0, i) = rtx_val;
 })
 
 (define_insn "get_vrsave_internal"
@@ -2752,15 +2746,8 @@ (define_expand "abs<mode>2"
         (smax:VI2 (match_dup 1) (match_dup 4)))]
   "<VI_unit>"
 {
-  int i, n_elt = GET_MODE_NUNITS (<MODE>mode);
-  rtvec v = rtvec_alloc (n_elt);
-
-  /* Create an all 0 constant.  */
-  for (i = 0; i < n_elt; ++i)
-    RTVEC_ELT (v, i) = const0_rtx;
-
   operands[2] = gen_reg_rtx (<MODE>mode);
-  operands[3] = gen_rtx_CONST_VECTOR (<MODE>mode, v);
+  operands[3] = CONST0_RTX (<MODE>mode);
   operands[4] = gen_reg_rtx (<MODE>mode);
 })
 
@@ -2777,17 +2764,8 @@ (define_expand "nabs<mode>2"
         (smin:VI2 (match_dup 1) (match_dup 4)))]
   "<VI_unit>"
 {
-  int i;
-  int n_elt = GET_MODE_NUNITS (<MODE>mode);
-
-  rtvec v = rtvec_alloc (n_elt);
-
-  /* Create an all 0 constant.  */
-  for (i = 0; i < n_elt; ++i)
-    RTVEC_ELT (v, i) = const0_rtx;
-
   operands[2] = gen_reg_rtx (<MODE>mode);
-  operands[3] = gen_rtx_CONST_VECTOR (<MODE>mode, v);
+  operands[3] = CONST0_RTX (<MODE>mode);
   operands[4] = gen_reg_rtx (<MODE>mode);
 })
 
Index: gcc/config/rs6000/altivec.md
===================================================================
--- gcc/config/rs6000/altivec.md	2017-10-23 11:40:11.561479591 +0100
+++ gcc/config/rs6000/altivec.md	2017-10-23 11:41:32.366050264 +0100
@@ -363,12 +363,10 @@ (define_split
   HOST_WIDE_INT val = const_vector_elt_as_int (op1, elt);
   rtx rtx_val = GEN_INT (val);
   int shift = vspltis_shifted (op1);
-  int nunits = GET_MODE_NUNITS (<MODE>mode);
-  int i;
 
   gcc_assert (shift != 0);
   operands[2] = gen_reg_rtx (<MODE>mode);
-  operands[3] = gen_rtx_CONST_VECTOR (<MODE>mode, rtvec_alloc (nunits));
+  operands[3] = gen_const_vec_duplicate (<MODE>mode, rtx_val);
   operands[4] = gen_reg_rtx (<MODE>mode);
 
   if (shift < 0)
@@ -381,10 +379,6 @@ (define_split
       operands[5] = CONST0_RTX (<MODE>mode);
       operands[6] = GEN_INT (shift);
     }
-
-  /* Populate the constant vectors.  */
-  for (i = 0; i < nunits; i++)
-    XVECEXP (operands[3], 0, i) = rtx_val;
 })
 
 (define_insn "get_vrsave_internal"
@@ -3237,15 +3231,8 @@ (define_expand "abs<mode>2"
         (smax:VI2 (match_dup 1) (match_dup 4)))]
   "<VI_unit>"
 {
-  int i, n_elt = GET_MODE_NUNITS (<MODE>mode);
-  rtvec v = rtvec_alloc (n_elt);
-
-  /* Create an all 0 constant.  */
-  for (i = 0; i < n_elt; ++i)
-    RTVEC_ELT (v, i) = const0_rtx;
-
   operands[2] = gen_reg_rtx (<MODE>mode);
-  operands[3] = gen_rtx_CONST_VECTOR (<MODE>mode, v);
+  operands[3] = CONST0_RTX (<MODE>mode);
   operands[4] = gen_reg_rtx (<MODE>mode);
 })
 
@@ -3262,17 +3249,8 @@ (define_expand "nabs<mode>2"
         (smin:VI2 (match_dup 1) (match_dup 4)))]
   "<VI_unit>"
 {
-  int i;
-  int n_elt = GET_MODE_NUNITS (<MODE>mode);
-
-  rtvec v = rtvec_alloc (n_elt);
-
-  /* Create an all 0 constant.  */
-  for (i = 0; i < n_elt; ++i)
-    RTVEC_ELT (v, i) = const0_rtx;
-
   operands[2] = gen_reg_rtx (<MODE>mode);
-  operands[3] = gen_rtx_CONST_VECTOR (<MODE>mode, v);
+  operands[3] = CONST0_RTX (<MODE>mode);
   operands[4] = gen_reg_rtx (<MODE>mode);
 })
 
Index: gcc/config/s390/vx-builtins.md
===================================================================
--- gcc/config/s390/vx-builtins.md	2017-10-23 11:40:11.561479591 +0100
+++ gcc/config/s390/vx-builtins.md	2017-10-23 11:41:32.367050264 +0100
@@ -91,12 +91,10 @@ (define_expand "vec_genmask<mode>"
    (match_operand:QI    2 "const_int_operand" "C")]
   "TARGET_VX"
 {
-  int nunits = GET_MODE_NUNITS (<VI_HW:MODE>mode);
   int bitlen = GET_MODE_UNIT_BITSIZE (<VI_HW:MODE>mode);
   /* To bit little endian style.  */
   int end = bitlen - 1 - INTVAL (operands[1]);
   int start = bitlen - 1 - INTVAL (operands[2]);
-  rtx const_vec[16];
   int i;
   unsigned HOST_WIDE_INT mask;
   bool swapped_p = false;
@@ -116,13 +114,11 @@ (define_expand "vec_genmask<mode>"
   if (swapped_p)
     mask = ~mask;
 
-  for (i = 0; i < nunits; i++)
-    const_vec[i] = GEN_INT (trunc_int_for_mode (mask,
-			      GET_MODE_INNER (<VI_HW:MODE>mode)));
+  rtx mask_rtx = gen_int_mode (mask, GET_MODE_INNER (<VI_HW:MODE>mode));
 
   emit_insn (gen_rtx_SET (operands[0],
-			  gen_rtx_CONST_VECTOR (<VI_HW:MODE>mode,
-						gen_rtvec_v (nunits, const_vec))));
+			  gen_const_vec_duplicate (<VI_HW:MODE>mode,
+						   mask_rtx)));
   DONE;
 })
 
@@ -1623,7 +1619,7 @@ (define_expand "vec_ctd_s64"
   real_2expN (&f, -INTVAL (operands[2]), DFmode);
   c = const_double_from_real_value (f, DFmode);
 
-  operands[3] = gen_rtx_CONST_VECTOR (V2DFmode, gen_rtvec (2, c, c));
+  operands[3] = gen_const_vec_duplicate (V2DFmode, c);
   operands[3] = force_reg (V2DFmode, operands[3]);
 })
 
@@ -1654,7 +1650,7 @@ (define_expand "vec_ctd_u64"
   real_2expN (&f, -INTVAL (operands[2]), DFmode);
   c = const_double_from_real_value (f, DFmode);
 
-  operands[3] = gen_rtx_CONST_VECTOR (V2DFmode, gen_rtvec (2, c, c));
+  operands[3] = gen_const_vec_duplicate (V2DFmode, c);
   operands[3] = force_reg (V2DFmode, operands[3]);
 })
 
@@ -1686,7 +1682,7 @@ (define_expand "vec_ctsl"
   real_2expN (&f, INTVAL (operands[2]), DFmode);
   c = const_double_from_real_value (f, DFmode);
 
-  operands[3] = gen_rtx_CONST_VECTOR (V2DFmode, gen_rtvec (2, c, c));
+  operands[3] = gen_const_vec_duplicate (V2DFmode, c);
   operands[3] = force_reg (V2DFmode, operands[3]);
   operands[4] = gen_reg_rtx (V2DFmode);
 })
@@ -1719,7 +1715,7 @@ (define_expand "vec_ctul"
   real_2expN (&f, INTVAL (operands[2]), DFmode);
   c = const_double_from_real_value (f, DFmode);
 
-  operands[3] = gen_rtx_CONST_VECTOR (V2DFmode, gen_rtvec (2, c, c));
+  operands[3] = gen_const_vec_duplicate (V2DFmode, c);
   operands[3] = force_reg (V2DFmode, operands[3]);
   operands[4] = gen_reg_rtx (V2DFmode);
 })
Index: gcc/config/spu/spu.c
===================================================================
--- gcc/config/spu/spu.c	2017-10-23 11:41:23.057077951 +0100
+++ gcc/config/spu/spu.c	2017-10-23 11:41:32.368050264 +0100
@@ -1903,8 +1903,6 @@ spu_return_addr (int count, rtx frame AT
 spu_const (machine_mode mode, HOST_WIDE_INT val)
 {
   rtx inner;
-  rtvec v;
-  int units, i;
 
   gcc_assert (GET_MODE_CLASS (mode) == MODE_INT
 	      || GET_MODE_CLASS (mode) == MODE_FLOAT
@@ -1923,14 +1921,7 @@ spu_const (machine_mode mode, HOST_WIDE_
   else 
     inner = hwint_to_const_double (GET_MODE_INNER (mode), val);
 
-  units = GET_MODE_NUNITS (mode);
-
-  v = rtvec_alloc (units);
-
-  for (i = 0; i < units; ++i)
-    RTVEC_ELT (v, i) = inner;
-
-  return gen_rtx_CONST_VECTOR (mode, v);
+  return gen_const_vec_duplicate (mode, inner);
 }
 
 /* Create a MODE vector constant from 4 ints. */


* [03/nn] Allow vector CONSTs
From: Richard Sandiford @ 2017-10-23 11:19 UTC
  To: gcc-patches

This patch allows (const ...) wrappers to be used for rtx vector
constants, as an alternative to CONST_VECTOR.  This is useful
for SVE, where the number of elements isn't known until runtime.

It could also be useful in future for fixed-length vectors, to
reduce the amount of memory needed to represent simple constants
with high element counts.  However, one nice thing about keeping
it restricted to variable-length vectors is that there is never
any need to handle combinations of (const ...) and CONST_VECTOR.
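
As an illustration, a broadcast of the integer 1 in a fixed-length
mode and in a variable-length mode (the VNx4SI name below is just a
placeholder for an SVE-style mode) would be:

  ;; Fixed-length: CONST_VECTOR lists every element explicitly.
  (const_vector:V4SI [(const_int 1) (const_int 1)
                      (const_int 1) (const_int 1)])

  ;; Variable-length: a (const ...) wrapper around vec_duplicate.
  (const:VNx4SI (vec_duplicate:VNx4SI (const_int 1)))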


2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* doc/rtl.texi (const): Update description of address constants.
	Say that vector constants are allowed too.
	* common.md (E, F): Use CONSTANT_P instead of checking for
	CONST_VECTOR.
	* emit-rtl.c (gen_lowpart_common): Use const_vec_p instead of
	checking for CONST_VECTOR.
	* expmed.c (make_tree): Use build_vector_from_val for a CONST
	VEC_DUPLICATE.
	* expr.c (expand_expr_real_2): Check for vector modes instead
	of checking for CONST_VECTOR.
	* rtl.h (const_vec_p): New function.
	(const_vec_duplicate_p): Check for a CONST VEC_DUPLICATE.
	(unwrap_const_vec_duplicate): Handle them here too.


Index: gcc/doc/rtl.texi
===================================================================
--- gcc/doc/rtl.texi	2017-10-23 11:41:22.176892260 +0100
+++ gcc/doc/rtl.texi	2017-10-23 11:41:39.185050437 +0100
@@ -1667,14 +1667,17 @@ Usually that is the only mode for which
 
 @findex const
 @item (const:@var{m} @var{exp})
-Represents a constant that is the result of an assembly-time
-arithmetic computation.  The operand, @var{exp}, is an expression that
-contains only constants (@code{const_int}, @code{symbol_ref} and
-@code{label_ref} expressions) combined with @code{plus} and
-@code{minus}.  However, not all combinations are valid, since the
-assembler cannot do arbitrary arithmetic on relocatable symbols.
+Wraps an rtx computation @var{exp} whose inputs and result do not
+change during the execution of a thread.  There are two valid uses.
+The first is to represent a global or thread-local address calculation.
+In this case @var{exp} should contain @code{const_int},
+@code{symbol_ref}, @code{label_ref} or @code{unspec} expressions,
+combined with @code{plus} and @code{minus}.  Any such @code{unspec}s
+are target-specific and typically represent some form of relocation
+operator.  @var{m} should be a valid address mode.
 
-@var{m} should be @code{Pmode}.
+The second use of @code{const} is to wrap a vector operation.
+In this case @var{exp} must be a @code{vec_duplicate} expression.
 
 @findex high
 @item (high:@var{m} @var{exp})
Index: gcc/common.md
===================================================================
--- gcc/common.md	2017-10-23 11:40:11.431285821 +0100
+++ gcc/common.md	2017-10-23 11:41:39.184050436 +0100
@@ -80,14 +80,14 @@ (define_constraint "n"
 (define_constraint "E"
   "Matches a floating-point constant."
   (ior (match_test "CONST_DOUBLE_AS_FLOAT_P (op)")
-       (match_test "GET_CODE (op) == CONST_VECTOR
+       (match_test "CONSTANT_P (op)
 		    && GET_MODE_CLASS (GET_MODE (op)) == MODE_VECTOR_FLOAT")))
 
 ;; There is no longer a distinction between "E" and "F".
 (define_constraint "F"
   "Matches a floating-point constant."
   (ior (match_test "CONST_DOUBLE_AS_FLOAT_P (op)")
-       (match_test "GET_CODE (op) == CONST_VECTOR
+       (match_test "CONSTANT_P (op)
 		    && GET_MODE_CLASS (GET_MODE (op)) == MODE_VECTOR_FLOAT")))
 
 (define_constraint "X"
Index: gcc/emit-rtl.c
===================================================================
--- gcc/emit-rtl.c	2017-10-23 11:41:32.369050264 +0100
+++ gcc/emit-rtl.c	2017-10-23 11:41:39.186050437 +0100
@@ -1470,7 +1470,7 @@ gen_lowpart_common (machine_mode mode, r
 	return gen_rtx_fmt_e (GET_CODE (x), int_mode, XEXP (x, 0));
     }
   else if (GET_CODE (x) == SUBREG || REG_P (x)
-	   || GET_CODE (x) == CONCAT || GET_CODE (x) == CONST_VECTOR
+	   || GET_CODE (x) == CONCAT || const_vec_p (x)
 	   || CONST_DOUBLE_AS_FLOAT_P (x) || CONST_SCALAR_INT_P (x))
     return lowpart_subreg (mode, x, innermode);
 
Index: gcc/expmed.c
===================================================================
--- gcc/expmed.c	2017-10-23 11:41:25.541909864 +0100
+++ gcc/expmed.c	2017-10-23 11:41:39.186050437 +0100
@@ -5246,7 +5246,15 @@ make_tree (tree type, rtx x)
       return fold_convert (type, make_tree (t, XEXP (x, 0)));
 
     case CONST:
-      return make_tree (type, XEXP (x, 0));
+      {
+	rtx op = XEXP (x, 0);
+	if (GET_CODE (op) == VEC_DUPLICATE)
+	  {
+	    tree elt_tree = make_tree (TREE_TYPE (type), XEXP (op, 0));
+	    return build_vector_from_val (type, elt_tree);
+	  }
+	return make_tree (type, op);
+      }
 
     case SYMBOL_REF:
       t = SYMBOL_REF_DECL (x);
Index: gcc/expr.c
===================================================================
--- gcc/expr.c	2017-10-23 11:41:24.408308073 +0100
+++ gcc/expr.c	2017-10-23 11:41:39.187050437 +0100
@@ -9429,7 +9429,7 @@ #define REDUCE_BIT_FIELD(expr)	(reduce_b
       /* Careful here: if the target doesn't support integral vector modes,
 	 a constant selection vector could wind up smooshed into a normal
 	 integral constant.  */
-      if (CONSTANT_P (op2) && GET_CODE (op2) != CONST_VECTOR)
+      if (CONSTANT_P (op2) && !VECTOR_MODE_P (GET_MODE (op2)))
 	{
 	  tree sel_type = TREE_TYPE (treeop2);
 	  machine_mode vmode
Index: gcc/rtl.h
===================================================================
--- gcc/rtl.h	2017-10-23 11:41:36.307050364 +0100
+++ gcc/rtl.h	2017-10-23 11:41:39.188050437 +0100
@@ -2749,12 +2749,22 @@ extern rtx shallow_copy_rtx (const_rtx C
 extern int rtx_equal_p (const_rtx, const_rtx);
 extern bool rtvec_all_equal_p (const_rtvec);
 
+/* Return true if X is some form of vector constant.  */
+
+inline bool
+const_vec_p (const_rtx x)
+{
+  return VECTOR_MODE_P (GET_MODE (x)) && CONSTANT_P (x);
+}
+
 /* Return true if X is a vector constant with a duplicated element value.  */
 
 inline bool
 const_vec_duplicate_p (const_rtx x)
 {
-  return GET_CODE (x) == CONST_VECTOR && rtvec_all_equal_p (XVEC (x, 0));
+  return ((GET_CODE (x) == CONST_VECTOR && rtvec_all_equal_p (XVEC (x, 0)))
+	  || (GET_CODE (x) == CONST
+	      && GET_CODE (XEXP (x, 0)) == VEC_DUPLICATE));
 }
 
 /* Return true if X is a vector constant with a duplicated element value.
@@ -2764,11 +2774,16 @@ const_vec_duplicate_p (const_rtx x)
 inline bool
 const_vec_duplicate_p (T x, T *elt)
 {
-  if (const_vec_duplicate_p (x))
+  if (GET_CODE (x) == CONST_VECTOR && rtvec_all_equal_p (XVEC (x, 0)))
     {
       *elt = CONST_VECTOR_ELT (x, 0);
       return true;
     }
+  if (GET_CODE (x) == CONST && GET_CODE (XEXP (x, 0)) == VEC_DUPLICATE)
+    {
+      *elt = XEXP (XEXP (x, 0), 0);
+      return true;
+    }
   return false;
 }
 
@@ -2794,8 +2809,10 @@ vec_duplicate_p (T x, T *elt)
 inline T
 unwrap_const_vec_duplicate (T x)
 {
-  if (const_vec_duplicate_p (x))
-    x = CONST_VECTOR_ELT (x, 0);
+  if (GET_CODE (x) == CONST_VECTOR && rtvec_all_equal_p (XVEC (x, 0)))
+    return CONST_VECTOR_ELT (x, 0);
+  if (GET_CODE (x) == CONST && GET_CODE (XEXP (x, 0)) == VEC_DUPLICATE)
+    return XEXP (XEXP (x, 0), 0);
   return x;
 }
 


* [02/nn] Add more vec_duplicate simplifications
From: Richard Sandiford @ 2017-10-23 11:19 UTC
  To: gcc-patches

This patch adds a vec_duplicate_p helper that tests for constant
or non-constant vector duplicates.  Together with the existing
const_vec_duplicate_p, this complements the gen_vec_duplicate
and gen_const_vec_duplicate added by a previous patch.
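
A minimal sketch of the intended usage (X and the surrounding context
are assumed here, not taken from the patch):

  /* If X broadcasts a single value -- whether as a VEC_DUPLICATE or
     in any constant form accepted by const_vec_duplicate_p -- then
     every element of X is a copy of ELT, so a use of one element of
     X can use ELT directly.  */
  rtx elt;
  if (vec_duplicate_p (x, &elt))
    return elt;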

The patch uses the new routines to add more rtx simplifications
involving vector duplicates.  These mirror simplifications that
we already do for CONST_VECTOR broadcasts and are needed for
variable-length SVE, which uses:

  (const:M (vec_duplicate:M X))

to represent constant broadcasts instead.  The simplifications do
trigger on the testsuite for variable duplicates too, and in each
case I saw, the change was an improvement.  E.g.:

- Several targets had this simplification in gcc.dg/pr49948.c
  when compiled at -O3:

    -Failed to match this instruction:
    +Successfully matched this instruction:
     (set (reg:DI 88)
    -    (subreg:DI (vec_duplicate:V2DI (reg/f:DI 75 [ _4 ])) 0))
    +    (reg/f:DI 75 [ _4 ]))

  On aarch64 this gives:

            ret
            .p2align 2
     .L8:
    +       adrp    x1, b
            sub     sp, sp, #80
    -       adrp    x2, b
    -       add     x1, sp, 12
    +       add     x2, sp, 12
            str     wzr, [x0, #:lo12:a]
    +       str     x2, [x1, #:lo12:b]
            mov     w0, 0
    -       dup     v0.2d, x1
    -       str     d0, [x2, #:lo12:b]
            add     sp, sp, 80
            ret
            .size   foo, .-foo

  On x86_64:

            jg      .L2
            leaq    -76(%rsp), %rax
            movl    $0, a(%rip)
    -       movq    %rax, -96(%rsp)
    -       movq    -96(%rsp), %xmm0
    -       punpcklqdq      %xmm0, %xmm0
    -       movq    %xmm0, b(%rip)
    +       movq    %rax, b(%rip)
     .L2:
            xorl    %eax, %eax
            ret

  etc.

- gcc.dg/torture/pr58018.c compiled at -O3 on aarch64 has an instance of:

     Trying 50, 52, 46 -> 53:
     Failed to match this instruction:
     (set (reg:V4SI 167)
    -    (and:V4SI (and:V4SI (vec_duplicate:V4SI (reg:SI 132 [ _165 ]))
    -            (reg:V4SI 209))
    -        (const_vector:V4SI [
    -                (const_int 1 [0x1])
    -                (const_int 1 [0x1])
    -                (const_int 1 [0x1])
    -                (const_int 1 [0x1])
    -            ])))
    +    (and:V4SI (vec_duplicate:V4SI (reg:SI 132 [ _165 ]))
    +        (reg:V4SI 209)))
     Successfully matched this instruction:
     (set (reg:V4SI 163 [ vect_patt_16.14 ])
         (vec_duplicate:V4SI (reg:SI 132 [ _165 ])))
    +Successfully matched this instruction:
    +(set (reg:V4SI 167)
    +    (and:V4SI (reg:V4SI 163 [ vect_patt_16.14 ])
    +        (reg:V4SI 209)))

  where (reg:SI 132) is the result of a scalar comparison and so
  is known to be 0 or 1.  This saves a MOVI and vector AND:

            cmp     w7, 4
            bls     .L15
            dup     v1.4s, w2
    -       lsr     w2, w1, 2
    +       dup     v2.4s, w6
            movi    v3.4s, 0
    -       mov     w0, 0
    -       movi    v2.4s, 0x1
    +       lsr     w2, w1, 2
            mvni    v0.4s, 0
    +       mov     w0, 0
            cmge    v1.4s, v1.4s, v3.4s
            and     v1.16b, v2.16b, v1.16b
    -       dup     v2.4s, w6
    -       and     v1.16b, v1.16b, v2.16b
            .p2align 3
     .L7:
            and     v0.16b, v0.16b, v1.16b

- powerpc64le has many instances of things like:

    -Failed to match this instruction:
    +Successfully matched this instruction:
     (set (reg:V4SI 161 [ vect_cst__24 ])
    -    (vec_select:V4SI (vec_duplicate:V4SI (vec_select:SI (reg:V4SI 143)
    -                (parallel [
    -                        (const_int 0 [0])
    -                    ])))
    -        (parallel [
    -                (const_int 2 [0x2])
    -                (const_int 3 [0x3])
    -                (const_int 0 [0])
    -                (const_int 1 [0x1])
    -            ])))
    +    (vec_duplicate:V4SI (vec_select:SI (reg:V4SI 143)
    +            (parallel [
    +                    (const_int 0 [0])
    +                ]))))

  This removes redundant XXPERMDIs from many tests.

The best way of testing the new simplifications seemed to be
via selftests.  The patch cribs part of David's patch here:
https://gcc.gnu.org/ml/gcc-patches/2016-07/msg00270.html .


2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
	    David Malcolm  <dmalcolm@redhat.com>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* rtl.h (vec_duplicate_p): New function.
	* selftest-rtl.c (assert_rtx_eq_at): New function.
	* selftest-rtl.h (ASSERT_RTX_EQ): New macro.
	(assert_rtx_eq_at): Declare.
	* selftest.h (selftest::simplify_rtx_c_tests): Declare.
	* selftest-run-tests.c (selftest::run_tests): Call it.
	* simplify-rtx.c: Include selftest.h and selftest-rtl.h.
	(simplify_unary_operation_1): Recursively handle vector duplicates.
	(simplify_binary_operation_1): Likewise.  Handle VEC_SELECTs of
	vector duplicates.
	(simplify_subreg): Handle subregs of vector duplicates.
	(make_test_reg, test_vector_ops_duplicate, test_vector_ops)
	(selftest::simplify_rtx_c_tests): New functions.

Index: gcc/rtl.h
===================================================================
--- gcc/rtl.h	2017-10-23 11:40:11.485292126 +0100
+++ gcc/rtl.h	2017-10-23 11:41:36.307050364 +0100
@@ -2772,6 +2772,21 @@ const_vec_duplicate_p (T x, T *elt)
   return false;
 }
 
+/* Return true if X is a vector with a duplicated element value, either
+   constant or nonconstant.  Store the duplicated element in *ELT if so.  */
+
+template <typename T>
+inline bool
+vec_duplicate_p (T x, T *elt)
+{
+  if (GET_CODE (x) == VEC_DUPLICATE)
+    {
+      *elt = XEXP (x, 0);
+      return true;
+    }
+  return const_vec_duplicate_p (x, elt);
+}
+
 /* If X is a vector constant with a duplicated element value, return that
    element value, otherwise return X.  */
 
Index: gcc/selftest-rtl.c
===================================================================
--- gcc/selftest-rtl.c	2017-10-23 11:40:11.485292126 +0100
+++ gcc/selftest-rtl.c	2017-10-23 11:41:36.307050364 +0100
@@ -35,6 +35,29 @@ Software Foundation; either version 3, o
 
 namespace selftest {
 
+/* Compare rtx EXPECTED and ACTUAL using rtx_equal_p, calling
+   ::selftest::pass if they are equal, aborting if they are non-equal.
+   LOC is the effective location of the assertion, MSG describes it.  */
+
+void
+assert_rtx_eq_at (const location &loc, const char *msg,
+		  rtx expected, rtx actual)
+{
+  if (rtx_equal_p (expected, actual))
+    ::selftest::pass (loc, msg);
+  else
+    {
+      fprintf (stderr, "%s:%i: %s: FAIL: %s\n", loc.m_file, loc.m_line,
+	       loc.m_function, msg);
+      fprintf (stderr, "  expected: ");
+      print_rtl (stderr, expected);
+      fprintf (stderr, "\n  actual: ");
+      print_rtl (stderr, actual);
+      fprintf (stderr, "\n");
+      abort ();
+    }
+}
+
 /* Compare rtx EXPECTED and ACTUAL by pointer equality, calling
    ::selftest::pass if they are equal, aborting if they are non-equal.
    LOC is the effective location of the assertion, MSG describes it.  */
Index: gcc/selftest-rtl.h
===================================================================
--- gcc/selftest-rtl.h	2017-10-23 11:40:11.485292126 +0100
+++ gcc/selftest-rtl.h	2017-10-23 11:41:36.307050364 +0100
@@ -47,6 +47,15 @@ #define ASSERT_RTL_DUMP_EQ_WITH_REUSE(EX
   assert_rtl_dump_eq (SELFTEST_LOCATION, (EXPECTED_DUMP), (RTX), \
 		      (REUSE_MANAGER))
 
+#define ASSERT_RTX_EQ(EXPECTED, ACTUAL) 				\
+  SELFTEST_BEGIN_STMT							\
+  const char *desc = "ASSERT_RTX_EQ (" #EXPECTED ", " #ACTUAL ")";	\
+  ::selftest::assert_rtx_eq_at (SELFTEST_LOCATION, desc, (EXPECTED),	\
+				(ACTUAL));				\
+  SELFTEST_END_STMT
+
+extern void assert_rtx_eq_at (const location &, const char *, rtx, rtx);
+
 /* Evaluate rtx EXPECTED and ACTUAL and compare them with ==
    (i.e. pointer equality), calling ::selftest::pass if they are
    equal, aborting if they are non-equal.  */
Index: gcc/selftest.h
===================================================================
--- gcc/selftest.h	2017-10-23 11:41:25.513859990 +0100
+++ gcc/selftest.h	2017-10-23 11:41:36.308050364 +0100
@@ -198,6 +198,7 @@ extern void tree_cfg_c_tests ();
 extern void vec_c_tests ();
 extern void wide_int_cc_tests ();
 extern void predict_c_tests ();
+extern void simplify_rtx_c_tests ();
 
 extern int num_passes;
 
Index: gcc/selftest-run-tests.c
===================================================================
--- gcc/selftest-run-tests.c	2017-10-23 11:41:25.872704926 +0100
+++ gcc/selftest-run-tests.c	2017-10-23 11:41:36.308050364 +0100
@@ -94,6 +94,7 @@ selftest::run_tests ()
 
   store_merging_c_tests ();
   predict_c_tests ();
+  simplify_rtx_c_tests ();
 
   /* Run any lang-specific selftests.  */
   lang_hooks.run_lang_selftests ();
Index: gcc/simplify-rtx.c
===================================================================
--- gcc/simplify-rtx.c	2017-10-23 11:41:32.370050264 +0100
+++ gcc/simplify-rtx.c	2017-10-23 11:41:36.309050364 +0100
@@ -33,6 +33,8 @@ Software Foundation; either version 3, o
 #include "diagnostic-core.h"
 #include "varasm.h"
 #include "flags.h"
+#include "selftest.h"
+#include "selftest-rtl.h"
 
 /* Simplification and canonicalization of RTL.  */
 
@@ -925,7 +927,7 @@ exact_int_to_float_conversion_p (const_r
 simplify_unary_operation_1 (enum rtx_code code, machine_mode mode, rtx op)
 {
   enum rtx_code reversed;
-  rtx temp;
+  rtx temp, elt;
   scalar_int_mode inner, int_mode, op_mode, op0_mode;
 
   switch (code)
@@ -1681,6 +1683,28 @@ simplify_unary_operation_1 (enum rtx_cod
       break;
     }
 
+  if (VECTOR_MODE_P (mode) && vec_duplicate_p (op, &elt))
+    {
+      /* Try applying the operator to ELT and see if that simplifies.
+	 We can duplicate the result if so.
+
+	 The reason we don't use simplify_gen_unary is that it isn't
+	 necessarily a win to convert things like:
+
+	   (neg:V (vec_duplicate:V (reg:S R)))
+
+	 to:
+
+	   (vec_duplicate:V (neg:S (reg:S R)))
+
+	 The first might be done entirely in vector registers while the
+	 second might need a move between register files.  */
+      temp = simplify_unary_operation (code, GET_MODE_INNER (mode),
+				       elt, GET_MODE_INNER (GET_MODE (op)));
+      if (temp)
+	return gen_vec_duplicate (mode, temp);
+    }
+
   return 0;
 }
 
@@ -2138,7 +2162,7 @@ simplify_binary_operation (enum rtx_code
 simplify_binary_operation_1 (enum rtx_code code, machine_mode mode,
 			     rtx op0, rtx op1, rtx trueop0, rtx trueop1)
 {
-  rtx tem, reversed, opleft, opright;
+  rtx tem, reversed, opleft, opright, elt0, elt1;
   HOST_WIDE_INT val;
   unsigned int width = GET_MODE_PRECISION (mode);
   scalar_int_mode int_mode, inner_mode;
@@ -3480,6 +3504,9 @@ simplify_binary_operation_1 (enum rtx_co
 	  gcc_assert (XVECLEN (trueop1, 0) == 1);
 	  gcc_assert (CONST_INT_P (XVECEXP (trueop1, 0, 0)));
 
+	  if (vec_duplicate_p (trueop0, &elt0))
+	    return elt0;
+
 	  if (GET_CODE (trueop0) == CONST_VECTOR)
 	    return CONST_VECTOR_ELT (trueop0, INTVAL (XVECEXP
 						      (trueop1, 0, 0)));
@@ -3562,9 +3589,6 @@ simplify_binary_operation_1 (enum rtx_co
 				    tmp_op, gen_rtx_PARALLEL (VOIDmode, vec));
 	      return tmp;
 	    }
-	  if (GET_CODE (trueop0) == VEC_DUPLICATE
-	      && GET_MODE (XEXP (trueop0, 0)) == mode)
-	    return XEXP (trueop0, 0);
 	}
       else
 	{
@@ -3573,6 +3597,11 @@ simplify_binary_operation_1 (enum rtx_co
 		      == GET_MODE_INNER (GET_MODE (trueop0)));
 	  gcc_assert (GET_CODE (trueop1) == PARALLEL);
 
+	  if (vec_duplicate_p (trueop0, &elt0))
+	    /* It doesn't matter which elements are selected by trueop1,
+	       because they are all the same.  */
+	    return gen_vec_duplicate (mode, elt0);
+
 	  if (GET_CODE (trueop0) == CONST_VECTOR)
 	    {
 	      int elt_size = GET_MODE_UNIT_SIZE (mode);
@@ -3873,6 +3902,32 @@ simplify_binary_operation_1 (enum rtx_co
       gcc_unreachable ();
     }
 
+  if (mode == GET_MODE (op0)
+      && mode == GET_MODE (op1)
+      && vec_duplicate_p (op0, &elt0)
+      && vec_duplicate_p (op1, &elt1))
+    {
+      /* Try applying the operator to ELT and see if that simplifies.
+	 We can duplicate the result if so.
+
+	 The reason we don't use simplify_gen_binary is that it isn't
+	 necessarily a win to convert things like:
+
+	   (plus:V (vec_duplicate:V (reg:S R1))
+		   (vec_duplicate:V (reg:S R2)))
+
+	 to:
+
+	   (vec_duplicate:V (plus:S (reg:S R1) (reg:S R2)))
+
+	 The first might be done entirely in vector registers while the
+	 second might need a move between register files.  */
+      tem = simplify_binary_operation (code, GET_MODE_INNER (mode),
+				       elt0, elt1);
+      if (tem)
+	return gen_vec_duplicate (mode, tem);
+    }
+
   return 0;
 }
 
@@ -6021,6 +6076,20 @@ simplify_subreg (machine_mode outermode,
   if (outermode == innermode && !byte)
     return op;
 
+  if (byte % GET_MODE_UNIT_SIZE (innermode) == 0)
+    {
+      rtx elt;
+
+      if (VECTOR_MODE_P (outermode)
+	  && GET_MODE_INNER (outermode) == GET_MODE_INNER (innermode)
+	  && vec_duplicate_p (op, &elt))
+	return gen_vec_duplicate (outermode, elt);
+
+      if (outermode == GET_MODE_INNER (innermode)
+	  && vec_duplicate_p (op, &elt))
+	return elt;
+    }
+
   if (CONST_SCALAR_INT_P (op)
       || CONST_DOUBLE_AS_FLOAT_P (op)
       || GET_CODE (op) == CONST_FIXED
@@ -6326,3 +6395,125 @@ simplify_rtx (const_rtx x)
     }
   return NULL;
 }
+
+#if CHECKING_P
+
+namespace selftest {
+
+/* Make a unique pseudo REG of mode MODE for use by selftests.  */
+
+static rtx
+make_test_reg (machine_mode mode)
+{
+  static int test_reg_num = LAST_VIRTUAL_REGISTER + 1;
+
+  return gen_rtx_REG (mode, test_reg_num++);
+}
+
+/* Test vector simplifications involving VEC_DUPLICATE in which the
+   operands and result have vector mode MODE.  SCALAR_REG is a pseudo
+   register that holds one element of MODE.  */
+
+static void
+test_vector_ops_duplicate (machine_mode mode, rtx scalar_reg)
+{
+  scalar_mode inner_mode = GET_MODE_INNER (mode);
+  rtx duplicate = gen_rtx_VEC_DUPLICATE (mode, scalar_reg);
+  unsigned int nunits = GET_MODE_NUNITS (mode);
+  if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT)
+    {
+      /* Test some simple unary cases with VEC_DUPLICATE arguments.  */
+      rtx not_scalar_reg = gen_rtx_NOT (inner_mode, scalar_reg);
+      rtx duplicate_not = gen_rtx_VEC_DUPLICATE (mode, not_scalar_reg);
+      ASSERT_RTX_EQ (duplicate,
+		     simplify_unary_operation (NOT, mode,
+					       duplicate_not, mode));
+
+      rtx neg_scalar_reg = gen_rtx_NEG (inner_mode, scalar_reg);
+      rtx duplicate_neg = gen_rtx_VEC_DUPLICATE (mode, neg_scalar_reg);
+      ASSERT_RTX_EQ (duplicate,
+		     simplify_unary_operation (NEG, mode,
+					       duplicate_neg, mode));
+
+      /* Test some simple binary cases with VEC_DUPLICATE arguments.  */
+      ASSERT_RTX_EQ (duplicate,
+		     simplify_binary_operation (PLUS, mode, duplicate,
+						CONST0_RTX (mode)));
+
+      ASSERT_RTX_EQ (duplicate,
+		     simplify_binary_operation (MINUS, mode, duplicate,
+						CONST0_RTX (mode)));
+
+      ASSERT_RTX_PTR_EQ (CONST0_RTX (mode),
+			 simplify_binary_operation (MINUS, mode, duplicate,
+						    duplicate));
+    }
+
+  /* Test a scalar VEC_SELECT of a VEC_DUPLICATE.  */
+  rtx zero_par = gen_rtx_PARALLEL (VOIDmode, gen_rtvec (1, const0_rtx));
+  ASSERT_RTX_PTR_EQ (scalar_reg,
+		     simplify_binary_operation (VEC_SELECT, inner_mode,
+						duplicate, zero_par));
+
+  /* And again with the final element.  */
+  rtx last_index = gen_int_mode (GET_MODE_NUNITS (mode) - 1, word_mode);
+  rtx last_par = gen_rtx_PARALLEL (VOIDmode, gen_rtvec (1, last_index));
+  ASSERT_RTX_PTR_EQ (scalar_reg,
+		     simplify_binary_operation (VEC_SELECT, inner_mode,
+						duplicate, last_par));
+
+  /* Test a scalar subreg of a VEC_DUPLICATE.  */
+  unsigned int offset = subreg_lowpart_offset (inner_mode, mode);
+  ASSERT_RTX_EQ (scalar_reg,
+		 simplify_gen_subreg (inner_mode, duplicate,
+				      mode, offset));
+
+  machine_mode narrower_mode;
+  if (nunits > 2
+      && mode_for_vector (inner_mode, 2).exists (&narrower_mode)
+      && VECTOR_MODE_P (narrower_mode))
+    {
+      /* Test VEC_SELECT of a vector.  */
+      rtx vec_par
+	= gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, const1_rtx, const0_rtx));
+      rtx narrower_duplicate
+	= gen_rtx_VEC_DUPLICATE (narrower_mode, scalar_reg);
+      ASSERT_RTX_EQ (narrower_duplicate,
+		     simplify_binary_operation (VEC_SELECT, narrower_mode,
+						duplicate, vec_par));
+
+      /* Test a vector subreg of a VEC_DUPLICATE.  */
+      unsigned int offset = subreg_lowpart_offset (narrower_mode, mode);
+      ASSERT_RTX_EQ (narrower_duplicate,
+		     simplify_gen_subreg (narrower_mode, duplicate,
+					  mode, offset));
+    }
+}
+
+/* Verify some simplifications involving vectors.  */
+
+static void
+test_vector_ops ()
+{
+  for (unsigned int i = 0; i < NUM_MACHINE_MODES; ++i)
+    {
+      machine_mode mode = (machine_mode) i;
+      if (VECTOR_MODE_P (mode))
+	{
+	  rtx scalar_reg = make_test_reg (GET_MODE_INNER (mode));
+	  test_vector_ops_duplicate (mode, scalar_reg);
+	}
+    }
+}
+
+/* Run all of the selftests within this file.  */
+
+void
+simplify_rtx_c_tests ()
+{
+  test_vector_ops ();
+}
+
+} // namespace selftest
+
+#endif /* CHECKING_P */

^ permalink raw reply	[flat|nested] 90+ messages in thread

* [04/nn] Add a VEC_SERIES rtl code
  2017-10-23 11:16 [00/nn] Patches preparing for runtime offsets and sizes Richard Sandiford
                   ` (2 preceding siblings ...)
  2017-10-23 11:19 ` [02/nn] Add more vec_duplicate simplifications Richard Sandiford
@ 2017-10-23 11:20 ` Richard Sandiford
  2017-10-26 11:49   ` Richard Biener
  2017-10-23 11:21 ` [05/nn] Add VEC_DUPLICATE_{CST,EXPR} and associated optab Richard Sandiford
                   ` (17 subsequent siblings)
  21 siblings, 1 reply; 90+ messages in thread
From: Richard Sandiford @ 2017-10-23 11:20 UTC (permalink / raw)
  To: gcc-patches

This patch adds an rtl representation of a vector linear series
of the form:

  a[I] = BASE + I * STEP

Like vec_duplicate:

- the new rtx can be used for both constant and non-constant vectors
- when used for constant vectors it is wrapped in a (const ...)
- the constant form is only used for variable-length vectors;
  fixed-length vectors still use CONST_VECTOR

At the moment the code is restricted to integer elements, to avoid
concerns over floating-point rounding.
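
As a rough illustration (not part of the patch; V4SImode and the
register operands are just example placeholders), the new helpers
give:

    /* Constant series: builds the CONST_VECTOR { 0, 1, 2, 3 }.  */
    rtx c = gen_const_vec_series (V4SImode, const0_rtx, const1_rtx);

    /* Non-constant series: builds (vec_series:V4SI base step).
       A constant zero step instead degenerates to a vec_duplicate.  */
    rtx v = gen_vec_series (V4SImode, base, step);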


2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* doc/rtl.texi (vec_series): Document.
	(const): Say that the operand can be a vec_series.
	* rtl.def (VEC_SERIES): New rtx code.
	* rtl.h (const_vec_series_p_1): Declare.
	(const_vec_series_p): New function.
	* emit-rtl.h (gen_const_vec_series): Declare.
	(gen_vec_series): Likewise.
	* emit-rtl.c (const_vec_series_p_1, gen_const_vec_series)
	(gen_vec_series): Likewise.
	* optabs.c (expand_mult_highpart): Use gen_const_vec_series.
	* simplify-rtx.c (simplify_unary_operation): Handle negations
	of vector series.
	(simplify_binary_operation_series): New function.
	(simplify_binary_operation_1): Use it.  Handle VEC_SERIES.
	(test_vector_ops_series): New function.
	(test_vector_ops): Call it.
	* config/powerpcspe/altivec.md (altivec_lvsl): Use
	gen_const_vec_series.
	(altivec_lvsr): Likewise.
	* config/rs6000/altivec.md (altivec_lvsl, altivec_lvsr): Likewise.

Index: gcc/doc/rtl.texi
===================================================================
--- gcc/doc/rtl.texi	2017-10-23 11:41:39.185050437 +0100
+++ gcc/doc/rtl.texi	2017-10-23 11:41:41.547050496 +0100
@@ -1677,7 +1677,8 @@ are target-specific and typically repres
 operator.  @var{m} should be a valid address mode.
 
 The second use of @code{const} is to wrap a vector operation.
-In this case @var{exp} must be a @code{vec_duplicate} expression.
+In this case @var{exp} must be a @code{vec_duplicate} or
+@code{vec_series} expression.
 
 @findex high
 @item (high:@var{m} @var{exp})
@@ -2722,6 +2723,10 @@ the same submodes as the input vector mo
 number of output parts must be an integer multiple of the number of input
 parts.
 
+@findex vec_series
+@item (vec_series:@var{m} @var{base} @var{step})
+This operation creates a vector in which element @var{i} is equal to
+@samp{@var{base} + @var{i}*@var{step}}.  @var{m} must be a vector integer mode.
 @end table
 
 @node Conversions
Index: gcc/rtl.def
===================================================================
--- gcc/rtl.def	2017-10-23 11:40:11.378243915 +0100
+++ gcc/rtl.def	2017-10-23 11:41:41.549050496 +0100
@@ -710,6 +710,11 @@ DEF_RTL_EXPR(VEC_CONCAT, "vec_concat", "
    an integer multiple of the number of input parts.  */
 DEF_RTL_EXPR(VEC_DUPLICATE, "vec_duplicate", "e", RTX_UNARY)
 
+/* Creation of a vector in which element I has the value BASE + I * STEP,
+   where BASE is the first operand and STEP is the second.  The result
+   must have a vector integer mode.  */
+DEF_RTL_EXPR(VEC_SERIES, "vec_series", "ee", RTX_BIN_ARITH)
+
 /* Addition with signed saturation */
 DEF_RTL_EXPR(SS_PLUS, "ss_plus", "ee", RTX_COMM_ARITH)
 
Index: gcc/rtl.h
===================================================================
--- gcc/rtl.h	2017-10-23 11:41:39.188050437 +0100
+++ gcc/rtl.h	2017-10-23 11:41:41.549050496 +0100
@@ -2816,6 +2816,51 @@ unwrap_const_vec_duplicate (T x)
   return x;
 }
 
+/* In emit-rtl.c.  */
+extern bool const_vec_series_p_1 (const_rtx, rtx *, rtx *);
+
+/* Return true if X is a constant vector that contains a linear series
+   of the form:
+
+   { B, B + S, B + 2 * S, B + 3 * S, ... }
+
+   for a nonzero S.  Store B and S in *BASE_OUT and *STEP_OUT on success.  */
+
+inline bool
+const_vec_series_p (const_rtx x, rtx *base_out, rtx *step_out)
+{
+  if (GET_CODE (x) == CONST_VECTOR
+      && GET_MODE_CLASS (GET_MODE (x)) == MODE_VECTOR_INT)
+    return const_vec_series_p_1 (x, base_out, step_out);
+  if (GET_CODE (x) == CONST && GET_CODE (XEXP (x, 0)) == VEC_SERIES)
+    {
+      *base_out = XEXP (XEXP (x, 0), 0);
+      *step_out = XEXP (XEXP (x, 0), 1);
+      return true;
+    }
+  return false;
+}
+
+/* Return true if X is a vector that contains a linear series of the
+   form:
+
+   { B, B + S, B + 2 * S, B + 3 * S, ... }
+
+   where B and S are constant or nonconstant.  Store B and S in
+   *BASE_OUT and *STEP_OUT on success.  */
+
+inline bool
+vec_series_p (const_rtx x, rtx *base_out, rtx *step_out)
+{
+  if (GET_CODE (x) == VEC_SERIES)
+    {
+      *base_out = XEXP (x, 0);
+      *step_out = XEXP (x, 1);
+      return true;
+    }
+  return const_vec_series_p (x, base_out, step_out);
+}
+
 /* Return the unpromoted (outer) mode of SUBREG_PROMOTED_VAR_P subreg X.  */
 
 inline scalar_int_mode
Index: gcc/emit-rtl.h
===================================================================
--- gcc/emit-rtl.h	2017-10-23 11:41:32.369050264 +0100
+++ gcc/emit-rtl.h	2017-10-23 11:41:41.548050496 +0100
@@ -441,6 +441,9 @@ get_max_uid (void)
 extern rtx gen_const_vec_duplicate (machine_mode, rtx);
 extern rtx gen_vec_duplicate (machine_mode, rtx);
 
+extern rtx gen_const_vec_series (machine_mode, rtx, rtx);
+extern rtx gen_vec_series (machine_mode, rtx, rtx);
+
 extern void set_decl_incoming_rtl (tree, rtx, bool);
 
 /* Return a memory reference like MEMREF, but with its mode changed
Index: gcc/emit-rtl.c
===================================================================
--- gcc/emit-rtl.c	2017-10-23 11:41:39.186050437 +0100
+++ gcc/emit-rtl.c	2017-10-23 11:41:41.548050496 +0100
@@ -5796,6 +5796,69 @@ gen_vec_duplicate (machine_mode mode, rt
   return gen_rtx_VEC_DUPLICATE (mode, x);
 }
 
+/* A subroutine of const_vec_series_p that handles the case in which
+   X is known to be an integer CONST_VECTOR.  */
+
+bool
+const_vec_series_p_1 (const_rtx x, rtx *base_out, rtx *step_out)
+{
+  unsigned int nelts = CONST_VECTOR_NUNITS (x);
+  if (nelts < 2)
+    return false;
+
+  scalar_mode inner = GET_MODE_INNER (GET_MODE (x));
+  rtx base = CONST_VECTOR_ELT (x, 0);
+  rtx step = simplify_binary_operation (MINUS, inner,
+					CONST_VECTOR_ELT (x, 1), base);
+  if (rtx_equal_p (step, CONST0_RTX (inner)))
+    return false;
+
+  for (unsigned int i = 2; i < nelts; ++i)
+    {
+      rtx diff = simplify_binary_operation (MINUS, inner,
+					    CONST_VECTOR_ELT (x, i),
+					    CONST_VECTOR_ELT (x, i - 1));
+      if (!rtx_equal_p (step, diff))
+	return false;
+    }
+
+  *base_out = base;
+  *step_out = step;
+  return true;
+}
+
+/* Generate a vector constant of mode MODE in which element I has
+   the value BASE + I * STEP.  */
+
+rtx
+gen_const_vec_series (machine_mode mode, rtx base, rtx step)
+{
+  gcc_assert (CONSTANT_P (base) && CONSTANT_P (step));
+
+  int nunits = GET_MODE_NUNITS (mode);
+  rtvec v = rtvec_alloc (nunits);
+  scalar_mode inner_mode = GET_MODE_INNER (mode);
+  RTVEC_ELT (v, 0) = base;
+  for (int i = 1; i < nunits; ++i)
+    RTVEC_ELT (v, i) = simplify_gen_binary (PLUS, inner_mode,
+					    RTVEC_ELT (v, i - 1), step);
+  return gen_rtx_raw_CONST_VECTOR (mode, v);
+}
+
+/* Generate a vector of mode MODE in which element I has the value
+   BASE + I * STEP.  The result will be a constant if BASE and STEP
+   are both constants.  */
+
+rtx
+gen_vec_series (machine_mode mode, rtx base, rtx step)
+{
+  if (step == const0_rtx)
+    return gen_vec_duplicate (mode, base);
+  if (CONSTANT_P (base) && CONSTANT_P (step))
+    return gen_const_vec_series (mode, base, step);
+  return gen_rtx_VEC_SERIES (mode, base, step);
+}
+
 /* Generate a new vector constant for mode MODE and constant value
    CONSTANT.  */
 
Index: gcc/optabs.c
===================================================================
--- gcc/optabs.c	2017-10-23 11:41:32.369050264 +0100
+++ gcc/optabs.c	2017-10-23 11:41:41.549050496 +0100
@@ -5784,13 +5784,13 @@ expand_mult_highpart (machine_mode mode,
       for (i = 0; i < nunits; ++i)
 	RTVEC_ELT (v, i) = GEN_INT (!BYTES_BIG_ENDIAN + (i & ~1)
 				    + ((i & 1) ? nunits : 0));
+      perm = gen_rtx_CONST_VECTOR (mode, v);
     }
   else
     {
-      for (i = 0; i < nunits; ++i)
-	RTVEC_ELT (v, i) = GEN_INT (2 * i + (BYTES_BIG_ENDIAN ? 0 : 1));
+      int base = BYTES_BIG_ENDIAN ? 0 : 1;
+      perm = gen_const_vec_series (mode, GEN_INT (base), GEN_INT (2));
     }
-  perm = gen_rtx_CONST_VECTOR (mode, v);
 
   return expand_vec_perm (mode, m1, m2, perm, target);
 }
Index: gcc/simplify-rtx.c
===================================================================
--- gcc/simplify-rtx.c	2017-10-23 11:41:36.309050364 +0100
+++ gcc/simplify-rtx.c	2017-10-23 11:41:41.550050496 +0100
@@ -927,7 +927,7 @@ exact_int_to_float_conversion_p (const_r
 simplify_unary_operation_1 (enum rtx_code code, machine_mode mode, rtx op)
 {
   enum rtx_code reversed;
-  rtx temp, elt;
+  rtx temp, elt, base, step;
   scalar_int_mode inner, int_mode, op_mode, op0_mode;
 
   switch (code)
@@ -1185,6 +1185,22 @@ simplify_unary_operation_1 (enum rtx_cod
 	      return simplify_gen_unary (TRUNCATE, int_mode, temp, inner);
 	    }
 	}
+
+      if (vec_series_p (op, &base, &step))
+	{
+	  /* Only create a new series if we can simplify both parts.  In other
+	     cases this isn't really a simplification, and it's not necessarily
+	     a win to replace a vector operation with a scalar operation.  */
+	  scalar_mode inner_mode = GET_MODE_INNER (mode);
+	  base = simplify_unary_operation (NEG, inner_mode, base, inner_mode);
+	  if (base)
+	    {
+	      step = simplify_unary_operation (NEG, inner_mode,
+					       step, inner_mode);
+	      if (step)
+		return gen_vec_series (mode, base, step);
+	    }
+	}
       break;
 
     case TRUNCATE:
@@ -2153,6 +2169,46 @@ simplify_binary_operation (enum rtx_code
   return NULL_RTX;
 }
 
+/* Subroutine of simplify_binary_operation_1 that looks for cases in
+   which OP0 and OP1 are both vector series or vector duplicates
+   (which are really just series with a step of 0).  If so, try to
+   form a new series by applying CODE to the bases and to the steps.
+   Return null if no simplification is possible.
+
+   MODE is the mode of the operation and is known to be a vector
+   integer mode.  */
+
+static rtx
+simplify_binary_operation_series (rtx_code code, machine_mode mode,
+				  rtx op0, rtx op1)
+{
+  rtx base0, step0;
+  if (vec_duplicate_p (op0, &base0))
+    step0 = const0_rtx;
+  else if (!vec_series_p (op0, &base0, &step0))
+    return NULL_RTX;
+
+  rtx base1, step1;
+  if (vec_duplicate_p (op1, &base1))
+    step1 = const0_rtx;
+  else if (!vec_series_p (op1, &base1, &step1))
+    return NULL_RTX;
+
+  /* Only create a new series if we can simplify both parts.  In other
+     cases this isn't really a simplification, and it's not necessarily
+     a win to replace a vector operation with a scalar operation.  */
+  scalar_mode inner_mode = GET_MODE_INNER (mode);
+  rtx new_base = simplify_binary_operation (code, inner_mode, base0, base1);
+  if (!new_base)
+    return NULL_RTX;
+
+  rtx new_step = simplify_binary_operation (code, inner_mode, step0, step1);
+  if (!new_step)
+    return NULL_RTX;
+
+  return gen_vec_series (mode, new_base, new_step);
+}
+
 /* Subroutine of simplify_binary_operation.  Simplify a binary operation
    CODE with result mode MODE, operating on OP0 and OP1.  If OP0 and/or
    OP1 are constant pool references, TRUEOP0 and TRUEOP1 represent the
@@ -2333,6 +2389,14 @@ simplify_binary_operation_1 (enum rtx_co
 	  if (tem)
 	    return tem;
 	}
+
+      /* Handle vector series.  */
+      if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT)
+	{
+	  tem = simplify_binary_operation_series (code, mode, op0, op1);
+	  if (tem)
+	    return tem;
+	}
       break;
 
     case COMPARE:
@@ -2544,6 +2608,14 @@ simplify_binary_operation_1 (enum rtx_co
 	      || plus_minus_operand_p (op1))
 	  && (tem = simplify_plus_minus (code, mode, op0, op1)) != 0)
 	return tem;
+
+      /* Handle vector series.  */
+      if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT)
+	{
+	  tem = simplify_binary_operation_series (code, mode, op0, op1);
+	  if (tem)
+	    return tem;
+	}
       break;
 
     case MULT:
@@ -3495,6 +3567,11 @@ simplify_binary_operation_1 (enum rtx_co
       /* ??? There are simplifications that can be done.  */
       return 0;
 
+    case VEC_SERIES:
+      if (op1 == CONST0_RTX (GET_MODE_INNER (mode)))
+	return gen_vec_duplicate (mode, op0);
+      return 0;
+
     case VEC_SELECT:
       if (!VECTOR_MODE_P (mode))
 	{
@@ -6490,6 +6567,60 @@ test_vector_ops_duplicate (machine_mode
     }
 }
 
+/* Test vector simplifications involving VEC_SERIES in which the
+   operands and result have vector mode MODE.  SCALAR_REG is a pseudo
+   register that holds one element of MODE.  */
+
+static void
+test_vector_ops_series (machine_mode mode, rtx scalar_reg)
+{
+  /* Test unary cases with VEC_SERIES arguments.  */
+  scalar_mode inner_mode = GET_MODE_INNER (mode);
+  rtx duplicate = gen_rtx_VEC_DUPLICATE (mode, scalar_reg);
+  rtx neg_scalar_reg = gen_rtx_NEG (inner_mode, scalar_reg);
+  rtx series_0_r = gen_rtx_VEC_SERIES (mode, const0_rtx, scalar_reg);
+  rtx series_0_nr = gen_rtx_VEC_SERIES (mode, const0_rtx, neg_scalar_reg);
+  rtx series_nr_1 = gen_rtx_VEC_SERIES (mode, neg_scalar_reg, const1_rtx);
+  rtx series_r_m1 = gen_rtx_VEC_SERIES (mode, scalar_reg, constm1_rtx);
+  rtx series_r_r = gen_rtx_VEC_SERIES (mode, scalar_reg, scalar_reg);
+  rtx series_nr_nr = gen_rtx_VEC_SERIES (mode, neg_scalar_reg,
+					 neg_scalar_reg);
+  ASSERT_RTX_EQ (series_0_r,
+		 simplify_unary_operation (NEG, mode, series_0_nr, mode));
+  ASSERT_RTX_EQ (series_r_m1,
+		 simplify_unary_operation (NEG, mode, series_nr_1, mode));
+  ASSERT_RTX_EQ (series_r_r,
+		 simplify_unary_operation (NEG, mode, series_nr_nr, mode));
+
+  /* Test that a VEC_SERIES with a zero step is simplified away.  */
+  ASSERT_RTX_EQ (duplicate,
+		 simplify_binary_operation (VEC_SERIES, mode,
+					    scalar_reg, const0_rtx));
+
+  /* Test PLUS and MINUS with VEC_SERIES.  */
+  rtx series_0_1 = gen_const_vec_series (mode, const0_rtx, const1_rtx);
+  rtx series_0_m1 = gen_const_vec_series (mode, const0_rtx, constm1_rtx);
+  rtx series_r_1 = gen_rtx_VEC_SERIES (mode, scalar_reg, const1_rtx);
+  ASSERT_RTX_EQ (series_r_r,
+		 simplify_binary_operation (PLUS, mode, series_0_r,
+					    duplicate));
+  ASSERT_RTX_EQ (series_r_1,
+		 simplify_binary_operation (PLUS, mode, duplicate,
+					    series_0_1));
+  ASSERT_RTX_EQ (series_r_m1,
+		 simplify_binary_operation (PLUS, mode, duplicate,
+					    series_0_m1));
+  ASSERT_RTX_EQ (series_0_r,
+		 simplify_binary_operation (MINUS, mode, series_r_r,
+					    duplicate));
+  ASSERT_RTX_EQ (series_r_m1,
+		 simplify_binary_operation (MINUS, mode, duplicate,
+					    series_0_1));
+  ASSERT_RTX_EQ (series_r_1,
+		 simplify_binary_operation (MINUS, mode, duplicate,
+					    series_0_m1));
+}
+
 /* Verify some simplifications involving vectors.  */
 
 static void
@@ -6502,6 +6633,9 @@ test_vector_ops ()
 	{
 	  rtx scalar_reg = make_test_reg (GET_MODE_INNER (mode));
 	  test_vector_ops_duplicate (mode, scalar_reg);
+	  if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT
+	      && GET_MODE_NUNITS (mode) > 2)
+	    test_vector_ops_series (mode, scalar_reg);
 	}
     }
 }
Index: gcc/config/powerpcspe/altivec.md
===================================================================
--- gcc/config/powerpcspe/altivec.md	2017-10-23 11:41:32.366050264 +0100
+++ gcc/config/powerpcspe/altivec.md	2017-10-23 11:41:41.546050496 +0100
@@ -2456,13 +2456,10 @@ (define_expand "altivec_lvsl"
     emit_insn (gen_altivec_lvsl_direct (operands[0], operands[1]));
   else
     {
-      int i;
-      rtx mask, perm[16], constv, vperm;
+      rtx mask, constv, vperm;
       mask = gen_reg_rtx (V16QImode);
       emit_insn (gen_altivec_lvsl_direct (mask, operands[1]));
-      for (i = 0; i < 16; ++i)
-        perm[i] = GEN_INT (i);
-      constv = gen_rtx_CONST_VECTOR (V16QImode, gen_rtvec_v (16, perm));
+      constv = gen_const_vec_series (V16QImode, const0_rtx, const1_rtx);
       constv = force_reg (V16QImode, constv);
       vperm = gen_rtx_UNSPEC (V16QImode, gen_rtvec (3, mask, mask, constv),
                               UNSPEC_VPERM);
@@ -2488,13 +2485,10 @@ (define_expand "altivec_lvsr"
     emit_insn (gen_altivec_lvsr_direct (operands[0], operands[1]));
   else
     {
-      int i;
-      rtx mask, perm[16], constv, vperm;
+      rtx mask, constv, vperm;
       mask = gen_reg_rtx (V16QImode);
       emit_insn (gen_altivec_lvsr_direct (mask, operands[1]));
-      for (i = 0; i < 16; ++i)
-        perm[i] = GEN_INT (i);
-      constv = gen_rtx_CONST_VECTOR (V16QImode, gen_rtvec_v (16, perm));
+      constv = gen_const_vec_series (V16QImode, const0_rtx, const1_rtx);
       constv = force_reg (V16QImode, constv);
       vperm = gen_rtx_UNSPEC (V16QImode, gen_rtvec (3, mask, mask, constv),
                               UNSPEC_VPERM);
Index: gcc/config/rs6000/altivec.md
===================================================================
--- gcc/config/rs6000/altivec.md	2017-10-23 11:41:32.366050264 +0100
+++ gcc/config/rs6000/altivec.md	2017-10-23 11:41:41.547050496 +0100
@@ -2573,13 +2573,10 @@ (define_expand "altivec_lvsl"
     emit_insn (gen_altivec_lvsl_direct (operands[0], operands[1]));
   else
     {
-      int i;
-      rtx mask, perm[16], constv, vperm;
+      rtx mask, constv, vperm;
       mask = gen_reg_rtx (V16QImode);
       emit_insn (gen_altivec_lvsl_direct (mask, operands[1]));
-      for (i = 0; i < 16; ++i)
-        perm[i] = GEN_INT (i);
-      constv = gen_rtx_CONST_VECTOR (V16QImode, gen_rtvec_v (16, perm));
+      constv = gen_const_vec_series (V16QImode, const0_rtx, const1_rtx);
       constv = force_reg (V16QImode, constv);
       vperm = gen_rtx_UNSPEC (V16QImode, gen_rtvec (3, mask, mask, constv),
                               UNSPEC_VPERM);
@@ -2614,13 +2611,10 @@ (define_expand "altivec_lvsr"
     emit_insn (gen_altivec_lvsr_direct (operands[0], operands[1]));
   else
     {
-      int i;
-      rtx mask, perm[16], constv, vperm;
+      rtx mask, constv, vperm;
       mask = gen_reg_rtx (V16QImode);
       emit_insn (gen_altivec_lvsr_direct (mask, operands[1]));
-      for (i = 0; i < 16; ++i)
-        perm[i] = GEN_INT (i);
-      constv = gen_rtx_CONST_VECTOR (V16QImode, gen_rtvec_v (16, perm));
+      constv = gen_const_vec_series (V16QImode, const0_rtx, const1_rtx);
       constv = force_reg (V16QImode, constv);
       vperm = gen_rtx_UNSPEC (V16QImode, gen_rtvec (3, mask, mask, constv),
                               UNSPEC_VPERM);

^ permalink raw reply	[flat|nested] 90+ messages in thread

* [05/nn] Add VEC_DUPLICATE_{CST,EXPR} and associated optab
  2017-10-23 11:16 [00/nn] Patches preparing for runtime offsets and sizes Richard Sandiford
                   ` (3 preceding siblings ...)
  2017-10-23 11:20 ` [04/nn] Add a VEC_SERIES rtl code Richard Sandiford
@ 2017-10-23 11:21 ` Richard Sandiford
  2017-10-26 11:53   ` Richard Biener
  2017-12-15  0:29   ` Richard Sandiford
  2017-10-23 11:22 ` [07/nn] Add unique CONSTs Richard Sandiford
                   ` (16 subsequent siblings)
  21 siblings, 2 replies; 90+ messages in thread
From: Richard Sandiford @ 2017-10-23 11:21 UTC (permalink / raw)
  To: gcc-patches

SVE needs a way of broadcasting a scalar to a variable-length vector.
This patch adds VEC_DUPLICATE_CST, which plays the role that VECTOR_CST
plays for fixed-length vectors, and VEC_DUPLICATE_EXPR, which likewise
stands in for CONSTRUCTOR.  VEC_DUPLICATE_EXPR is the tree equivalent
of the existing rtl code VEC_DUPLICATE.

Originally we had a single VEC_DUPLICATE_EXPR and used TREE_CONSTANT
to mark constant nodes, but in response to last year's RFC, Richard B.
suggested it would be better to have separate codes for the constant
and non-constant cases.  This allows VEC_DUPLICATE_EXPR to be treated
as a normal unary operation and avoids the previous need for treating
it as a GIMPLE_SINGLE_RHS.
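
For example (with hypothetical SSA names), a broadcast then shows up
in GIMPLE dumps roughly as an ordinary unary assignment:

    vec_2 = VEC_DUPLICATE_EXPR <scalar_1>;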

It might make sense to use VEC_DUPLICATE_CST for all duplicated
vector constants, since it's a bit more compact than VECTOR_CST
in that case, and is potentially more efficient to process.
However, the nice thing about keeping it restricted to variable-length
vectors is that there is then no need to handle combinations of
VECTOR_CST and VEC_DUPLICATE_CST; a vector type will always use
VECTOR_CST or never use it.

The patch also adds a vec_duplicate_optab to go with VEC_DUPLICATE_EXPR.
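
A minimal sketch of how the new codes are exercised, mirroring the new
selftests (the 4-element type is just an example):

    tree type = build_vector_type (ssizetype, 4);
    tree dup = build_vec_duplicate_cst (type, ssize_int (5));
    tree elt = uniform_vector_p (dup);  /* elt == ssize_int (5) */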


2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* doc/generic.texi (VEC_DUPLICATE_CST, VEC_DUPLICATE_EXPR): Document.
	(VEC_COND_EXPR): Add missing @tindex.
	* doc/md.texi (vec_duplicate@var{m}): Document.
	* tree.def (VEC_DUPLICATE_CST, VEC_DUPLICATE_EXPR): New tree codes.
	* tree-core.h (tree_base): Document that u.nelts and TREE_OVERFLOW
	are used for VEC_DUPLICATE_CST as well.
	(tree_vector): Access base.n.nelts directly.
	* tree.h (TREE_OVERFLOW): Add VEC_DUPLICATE_CST to the list of
	valid codes.
	(VEC_DUPLICATE_CST_ELT): New macro.
	(build_vec_duplicate_cst): Declare.
	* tree.c (tree_node_structure_for_code, tree_code_size, tree_size)
	(integer_zerop, integer_onep, integer_all_onesp, integer_truep)
	(real_zerop, real_onep, real_minus_onep, add_expr, initializer_zerop)
	(walk_tree_1, drop_tree_overflow): Handle VEC_DUPLICATE_CST.
	(build_vec_duplicate_cst): New function.
	(uniform_vector_p): Handle the new codes.
	(test_vec_duplicate_predicates_int): New function.
	(test_vec_duplicate_predicates_float): Likewise.
	(test_vec_duplicate_predicates): Likewise.
	(tree_c_tests): Call test_vec_duplicate_predicates.
	* cfgexpand.c (expand_debug_expr): Handle the new codes.
	* tree-pretty-print.c (dump_generic_node): Likewise.
	* dwarf2out.c (rtl_for_decl_init): Handle VEC_DUPLICATE_CST.
	* gimple-expr.h (is_gimple_constant): Likewise.
	* gimplify.c (gimplify_expr): Likewise.
	* graphite-isl-ast-to-gimple.c
	(translate_isl_ast_to_gimple::is_constant): Likewise.
	* graphite-scop-detection.c (scan_tree_for_params): Likewise.
	* ipa-icf-gimple.c (func_checker::compare_cst_or_decl): Likewise.
	(func_checker::compare_operand): Likewise.
	* ipa-icf.c (sem_item::add_expr, sem_variable::equals): Likewise.
	* match.pd (negate_expr_p): Likewise.
	* print-tree.c (print_node): Likewise.
	* tree-chkp.c (chkp_find_bounds_1): Likewise.
	* tree-loop-distribution.c (const_with_all_bytes_same): Likewise.
	* tree-ssa-loop.c (for_each_index): Likewise.
	* tree-ssa-pre.c (create_component_ref_by_pieces_1): Likewise.
	* tree-ssa-sccvn.c (copy_reference_ops_from_ref): Likewise.
	(ao_ref_init_from_vn_reference): Likewise.
	* tree-vect-generic.c (ssa_uniform_vector_p): Likewise.
	* varasm.c (const_hash_1, compare_constant): Likewise.
	* fold-const.c (negate_expr_p, fold_negate_expr_1, const_binop)
	(fold_convert_const, operand_equal_p, fold_view_convert_expr)
	(exact_inverse, fold_checksum_tree): Likewise.
	(const_unop): Likewise.  Fold VEC_DUPLICATE_EXPRs of a constant.
	(test_vec_duplicate_folding): New function.
	(fold_const_c_tests): Call it.
	* optabs.def (vec_duplicate_optab): New optab.
	* optabs-tree.c (optab_for_tree_code): Handle VEC_DUPLICATE_EXPR.
	* optabs.h (expand_vector_broadcast): Declare.
	* optabs.c (expand_vector_broadcast): Make non-static.  Try using
	vec_duplicate_optab.
	* expr.c (store_constructor): Try using vec_duplicate_optab for
	uniform vectors.
	(const_vector_element): New function, split out from...
	(const_vector_from_tree): ...here.
	(expand_expr_real_2): Handle VEC_DUPLICATE_EXPR.
	(expand_expr_real_1): Handle VEC_DUPLICATE_CST.
	* internal-fn.c (expand_vector_ubsan_overflow): Use CONSTANT_P
	instead of checking for VECTOR_CST.
	* tree-cfg.c (verify_gimple_assign_unary): Handle VEC_DUPLICATE_EXPR.
	(verify_gimple_assign_single): Handle VEC_DUPLICATE_CST.
	* tree-inline.c (estimate_operator_cost): Handle VEC_DUPLICATE_EXPR.

Index: gcc/doc/generic.texi
===================================================================
--- gcc/doc/generic.texi	2017-10-23 11:38:53.934094740 +0100
+++ gcc/doc/generic.texi	2017-10-23 11:41:51.760448406 +0100
@@ -1036,6 +1036,7 @@ As this example indicates, the operands
 @tindex FIXED_CST
 @tindex COMPLEX_CST
 @tindex VECTOR_CST
+@tindex VEC_DUPLICATE_CST
 @tindex STRING_CST
 @findex TREE_STRING_LENGTH
 @findex TREE_STRING_POINTER
@@ -1089,6 +1090,14 @@ constant nodes.  Each individual constan
 double constant node.  The first operand is a @code{TREE_LIST} of the
 constant nodes and is accessed through @code{TREE_VECTOR_CST_ELTS}.
 
+@item VEC_DUPLICATE_CST
+These nodes represent a vector constant in which every element has the
+same scalar value.  At present only variable-length vectors use
+@code{VEC_DUPLICATE_CST}; constant-length vectors use @code{VECTOR_CST}
+instead.  The scalar element value is given by
+@code{VEC_DUPLICATE_CST_ELT} and has the same restrictions as the
+element of a @code{VECTOR_CST}.
+
 @item STRING_CST
 These nodes represent string-constants.  The @code{TREE_STRING_LENGTH}
 returns the length of the string, as an @code{int}.  The
@@ -1692,6 +1701,7 @@ a value from @code{enum annot_expr_kind}
 
 @node Vectors
 @subsection Vectors
+@tindex VEC_DUPLICATE_EXPR
 @tindex VEC_LSHIFT_EXPR
 @tindex VEC_RSHIFT_EXPR
 @tindex VEC_WIDEN_MULT_HI_EXPR
@@ -1703,9 +1713,14 @@ a value from @code{enum annot_expr_kind}
 @tindex VEC_PACK_TRUNC_EXPR
 @tindex VEC_PACK_SAT_EXPR
 @tindex VEC_PACK_FIX_TRUNC_EXPR
+@tindex VEC_COND_EXPR
 @tindex SAD_EXPR
 
 @table @code
+@item VEC_DUPLICATE_EXPR
+This node has a single operand and represents a vector in which every
+element is equal to that operand.
+
 @item VEC_LSHIFT_EXPR
 @itemx VEC_RSHIFT_EXPR
 These nodes represent whole vector left and right shifts, respectively.
Index: gcc/doc/md.texi
===================================================================
--- gcc/doc/md.texi	2017-10-23 11:41:22.189466342 +0100
+++ gcc/doc/md.texi	2017-10-23 11:41:51.761413027 +0100
@@ -4888,6 +4888,17 @@ and operand 1 is parallel containing val
 the vector mode @var{m}, or a vector mode with the same element mode and
 smaller number of elements.
 
+@cindex @code{vec_duplicate@var{m}} instruction pattern
+@item @samp{vec_duplicate@var{m}}
+Initialize vector output operand 0 so that each element has the value given
+by scalar input operand 1.  The vector has mode @var{m} and the scalar has
+the mode appropriate for one element of @var{m}.
+
+This pattern only handles duplicates of non-constant inputs.  Constant
+vectors go through the @code{mov@var{m}} pattern instead.
+
+This pattern is not allowed to @code{FAIL}.
+
 @cindex @code{vec_cmp@var{m}@var{n}} instruction pattern
 @item @samp{vec_cmp@var{m}@var{n}}
 Output a vector comparison.  Operand 0 of mode @var{n} is the destination for
Index: gcc/tree.def
===================================================================
--- gcc/tree.def	2017-10-23 11:38:53.934094740 +0100
+++ gcc/tree.def	2017-10-23 11:41:51.774917721 +0100
@@ -304,6 +304,10 @@ DEFTREECODE (COMPLEX_CST, "complex_cst",
 /* Contents are in VECTOR_CST_ELTS field.  */
 DEFTREECODE (VECTOR_CST, "vector_cst", tcc_constant, 0)
 
+/* Represents a vector constant in which every element is equal to
+   VEC_DUPLICATE_CST_ELT.  */
+DEFTREECODE (VEC_DUPLICATE_CST, "vec_duplicate_cst", tcc_constant, 0)
+
 /* Contents are TREE_STRING_LENGTH and the actual contents of the string.  */
 DEFTREECODE (STRING_CST, "string_cst", tcc_constant, 0)
 
@@ -534,6 +538,9 @@ DEFTREECODE (TARGET_EXPR, "target_expr",
    1 and 2 are NULL.  The operands are then taken from the cfg edges. */
 DEFTREECODE (COND_EXPR, "cond_expr", tcc_expression, 3)
 
+/* Represents a vector in which every element is equal to operand 0.  */
+DEFTREECODE (VEC_DUPLICATE_EXPR, "vec_duplicate_expr", tcc_unary, 1)
+
 /* Vector conditional expression. It is like COND_EXPR, but with
    vector operands.
 
Index: gcc/tree-core.h
===================================================================
--- gcc/tree-core.h	2017-10-23 11:41:25.862065318 +0100
+++ gcc/tree-core.h	2017-10-23 11:41:51.771059237 +0100
@@ -975,7 +975,8 @@ struct GTY(()) tree_base {
     /* VEC length.  This field is only used with TREE_VEC.  */
     int length;
 
-    /* Number of elements.  This field is only used with VECTOR_CST.  */
+    /* Number of elements.  This field is only used with VECTOR_CST
+       and VEC_DUPLICATE_CST.  It is always 1 for VEC_DUPLICATE_CST.  */
     unsigned int nelts;
 
     /* SSA version number.  This field is only used with SSA_NAME.  */
@@ -1065,7 +1066,7 @@ struct GTY(()) tree_base {
    public_flag:
 
        TREE_OVERFLOW in
-           INTEGER_CST, REAL_CST, COMPLEX_CST, VECTOR_CST
+           INTEGER_CST, REAL_CST, COMPLEX_CST, VECTOR_CST, VEC_DUPLICATE_CST
 
        TREE_PUBLIC in
            VAR_DECL, FUNCTION_DECL
@@ -1332,7 +1333,7 @@ struct GTY(()) tree_complex {
 
 struct GTY(()) tree_vector {
   struct tree_typed typed;
-  tree GTY ((length ("VECTOR_CST_NELTS ((tree) &%h)"))) elts[1];
+  tree GTY ((length ("((tree) &%h)->base.u.nelts"))) elts[1];
 };
 
 struct GTY(()) tree_identifier {
Index: gcc/tree.h
===================================================================
--- gcc/tree.h	2017-10-23 11:41:23.517482774 +0100
+++ gcc/tree.h	2017-10-23 11:41:51.775882341 +0100
@@ -730,8 +730,8 @@ #define TREE_SYMBOL_REFERENCED(NODE) \
 #define TYPE_REF_CAN_ALIAS_ALL(NODE) \
   (PTR_OR_REF_CHECK (NODE)->base.static_flag)
 
-/* In an INTEGER_CST, REAL_CST, COMPLEX_CST, or VECTOR_CST, this means
-   there was an overflow in folding.  */
+/* In an INTEGER_CST, REAL_CST, COMPLEX_CST, VECTOR_CST or VEC_DUPLICATE_CST,
+   this means there was an overflow in folding.  */
 
 #define TREE_OVERFLOW(NODE) (CST_CHECK (NODE)->base.public_flag)
 
@@ -1030,6 +1030,10 @@ #define VECTOR_CST_NELTS(NODE) (VECTOR_C
 #define VECTOR_CST_ELTS(NODE) (VECTOR_CST_CHECK (NODE)->vector.elts)
 #define VECTOR_CST_ELT(NODE,IDX) (VECTOR_CST_CHECK (NODE)->vector.elts[IDX])
 
+/* In a VEC_DUPLICATE_CST node.  */
+#define VEC_DUPLICATE_CST_ELT(NODE) \
+  (VEC_DUPLICATE_CST_CHECK (NODE)->vector.elts[0])
+
 /* Define fields and accessors for some special-purpose tree nodes.  */
 
 #define IDENTIFIER_LENGTH(NODE) \
@@ -4025,6 +4029,7 @@ extern tree build_int_cst (tree, HOST_WI
 extern tree build_int_cstu (tree type, unsigned HOST_WIDE_INT cst);
 extern tree build_int_cst_type (tree, HOST_WIDE_INT);
 extern tree make_vector (unsigned CXX_MEM_STAT_INFO);
+extern tree build_vec_duplicate_cst (tree, tree CXX_MEM_STAT_INFO);
 extern tree build_vector (tree, vec<tree> CXX_MEM_STAT_INFO);
 extern tree build_vector_from_ctor (tree, vec<constructor_elt, va_gc> *);
 extern tree build_vector_from_val (tree, tree);
Index: gcc/tree.c
===================================================================
--- gcc/tree.c	2017-10-23 11:41:23.515548300 +0100
+++ gcc/tree.c	2017-10-23 11:41:51.774917721 +0100
@@ -464,6 +464,7 @@ tree_node_structure_for_code (enum tree_
     case FIXED_CST:		return TS_FIXED_CST;
     case COMPLEX_CST:		return TS_COMPLEX;
     case VECTOR_CST:		return TS_VECTOR;
+    case VEC_DUPLICATE_CST:	return TS_VECTOR;
     case STRING_CST:		return TS_STRING;
       /* tcc_exceptional cases.  */
     case ERROR_MARK:		return TS_COMMON;
@@ -816,6 +817,7 @@ tree_code_size (enum tree_code code)
 	case FIXED_CST:		return sizeof (struct tree_fixed_cst);
 	case COMPLEX_CST:	return sizeof (struct tree_complex);
 	case VECTOR_CST:	return sizeof (struct tree_vector);
+	case VEC_DUPLICATE_CST:	return sizeof (struct tree_vector);
 	case STRING_CST:	gcc_unreachable ();
 	default:
 	  return lang_hooks.tree_size (code);
@@ -875,6 +877,9 @@ tree_size (const_tree node)
       return (sizeof (struct tree_vector)
 	      + (VECTOR_CST_NELTS (node) - 1) * sizeof (tree));
 
+    case VEC_DUPLICATE_CST:
+      return sizeof (struct tree_vector);
+
     case STRING_CST:
       return TREE_STRING_LENGTH (node) + offsetof (struct tree_string, str) + 1;
 
@@ -1682,6 +1687,30 @@ cst_and_fits_in_hwi (const_tree x)
 	  && (tree_fits_shwi_p (x) || tree_fits_uhwi_p (x)));
 }
 
+/* Build a new VEC_DUPLICATE_CST with type TYPE and operand EXP.
+
+   Note that this function is only suitable for callers that specifically
+   need a VEC_DUPLICATE_CST node.  Use build_vector_from_val to duplicate
+   a general scalar into a general vector type.  */
+
+tree
+build_vec_duplicate_cst (tree type, tree exp MEM_STAT_DECL)
+{
+  int length = sizeof (struct tree_vector);
+
+  record_node_allocation_statistics (VEC_DUPLICATE_CST, length);
+
+  tree t = ggc_alloc_cleared_tree_node_stat (length PASS_MEM_STAT);
+
+  TREE_SET_CODE (t, VEC_DUPLICATE_CST);
+  TREE_TYPE (t) = type;
+  t->base.u.nelts = 1;
+  VEC_DUPLICATE_CST_ELT (t) = exp;
+  TREE_CONSTANT (t) = 1;
+
+  return t;
+}
+
 /* Build a newly constructed VECTOR_CST node of length LEN.  */
 
 tree
@@ -2343,6 +2372,8 @@ integer_zerop (const_tree expr)
 	    return false;
 	return true;
       }
+    case VEC_DUPLICATE_CST:
+      return integer_zerop (VEC_DUPLICATE_CST_ELT (expr));
     default:
       return false;
     }
@@ -2369,6 +2400,8 @@ integer_onep (const_tree expr)
 	    return false;
 	return true;
       }
+    case VEC_DUPLICATE_CST:
+      return integer_onep (VEC_DUPLICATE_CST_ELT (expr));
     default:
       return false;
     }
@@ -2407,6 +2440,9 @@ integer_all_onesp (const_tree expr)
       return 1;
     }
 
+  else if (TREE_CODE (expr) == VEC_DUPLICATE_CST)
+    return integer_all_onesp (VEC_DUPLICATE_CST_ELT (expr));
+
   else if (TREE_CODE (expr) != INTEGER_CST)
     return 0;
 
@@ -2463,7 +2499,7 @@ integer_nonzerop (const_tree expr)
 int
 integer_truep (const_tree expr)
 {
-  if (TREE_CODE (expr) == VECTOR_CST)
+  if (TREE_CODE (expr) == VECTOR_CST || TREE_CODE (expr) == VEC_DUPLICATE_CST)
     return integer_all_onesp (expr);
   return integer_onep (expr);
 }
@@ -2634,6 +2670,8 @@ real_zerop (const_tree expr)
 	    return false;
 	return true;
       }
+    case VEC_DUPLICATE_CST:
+      return real_zerop (VEC_DUPLICATE_CST_ELT (expr));
     default:
       return false;
     }
@@ -2662,6 +2700,8 @@ real_onep (const_tree expr)
 	    return false;
 	return true;
       }
+    case VEC_DUPLICATE_CST:
+      return real_onep (VEC_DUPLICATE_CST_ELT (expr));
     default:
       return false;
     }
@@ -2689,6 +2729,8 @@ real_minus_onep (const_tree expr)
 	    return false;
 	return true;
       }
+    case VEC_DUPLICATE_CST:
+      return real_minus_onep (VEC_DUPLICATE_CST_ELT (expr));
     default:
       return false;
     }
@@ -7091,6 +7133,9 @@ add_expr (const_tree t, inchash::hash &h
 	  inchash::add_expr (VECTOR_CST_ELT (t, i), hstate, flags);
 	return;
       }
+    case VEC_DUPLICATE_CST:
+      inchash::add_expr (VEC_DUPLICATE_CST_ELT (t), hstate);
+      return;
     case SSA_NAME:
       /* We can just compare by pointer.  */
       hstate.add_wide_int (SSA_NAME_VERSION (t));
@@ -10345,6 +10390,9 @@ initializer_zerop (const_tree init)
 	return true;
       }
 
+    case VEC_DUPLICATE_CST:
+      return initializer_zerop (VEC_DUPLICATE_CST_ELT (init));
+
     case CONSTRUCTOR:
       {
 	unsigned HOST_WIDE_INT idx;
@@ -10390,7 +10438,13 @@ uniform_vector_p (const_tree vec)
 
   gcc_assert (VECTOR_TYPE_P (TREE_TYPE (vec)));
 
-  if (TREE_CODE (vec) == VECTOR_CST)
+  if (TREE_CODE (vec) == VEC_DUPLICATE_CST)
+    return VEC_DUPLICATE_CST_ELT (vec);
+
+  else if (TREE_CODE (vec) == VEC_DUPLICATE_EXPR)
+    return TREE_OPERAND (vec, 0);
+
+  else if (TREE_CODE (vec) == VECTOR_CST)
     {
       first = VECTOR_CST_ELT (vec, 0);
       for (i = 1; i < VECTOR_CST_NELTS (vec); ++i)
@@ -11095,6 +11149,7 @@ #define WALK_SUBTREE_TAIL(NODE)				\
     case REAL_CST:
     case FIXED_CST:
     case VECTOR_CST:
+    case VEC_DUPLICATE_CST:
     case STRING_CST:
     case BLOCK:
     case PLACEHOLDER_EXPR:
@@ -12381,6 +12436,12 @@ drop_tree_overflow (tree t)
 	    elt = drop_tree_overflow (elt);
 	}
     }
+  if (TREE_CODE (t) == VEC_DUPLICATE_CST)
+    {
+      tree *elt = &VEC_DUPLICATE_CST_ELT (t);
+      if (TREE_OVERFLOW (*elt))
+	*elt = drop_tree_overflow (*elt);
+    }
   return t;
 }
 
@@ -13798,6 +13859,92 @@ test_integer_constants ()
   ASSERT_EQ (type, TREE_TYPE (zero));
 }
 
+/* Verify predicate handling of VEC_DUPLICATE_CSTs and VEC_DUPLICATE_EXPRs
+   for integral type TYPE.  */
+
+static void
+test_vec_duplicate_predicates_int (tree type)
+{
+  tree vec_type = build_vector_type (type, 4);
+
+  tree zero = build_zero_cst (type);
+  tree vec_zero = build_vec_duplicate_cst (vec_type, zero);
+  ASSERT_TRUE (integer_zerop (vec_zero));
+  ASSERT_FALSE (integer_onep (vec_zero));
+  ASSERT_FALSE (integer_minus_onep (vec_zero));
+  ASSERT_FALSE (integer_all_onesp (vec_zero));
+  ASSERT_FALSE (integer_truep (vec_zero));
+  ASSERT_TRUE (initializer_zerop (vec_zero));
+
+  tree one = build_one_cst (type);
+  tree vec_one = build_vec_duplicate_cst (vec_type, one);
+  ASSERT_FALSE (integer_zerop (vec_one));
+  ASSERT_TRUE (integer_onep (vec_one));
+  ASSERT_FALSE (integer_minus_onep (vec_one));
+  ASSERT_FALSE (integer_all_onesp (vec_one));
+  ASSERT_FALSE (integer_truep (vec_one));
+  ASSERT_FALSE (initializer_zerop (vec_one));
+
+  tree minus_one = build_minus_one_cst (type);
+  tree vec_minus_one = build_vec_duplicate_cst (vec_type, minus_one);
+  ASSERT_FALSE (integer_zerop (vec_minus_one));
+  ASSERT_FALSE (integer_onep (vec_minus_one));
+  ASSERT_TRUE (integer_minus_onep (vec_minus_one));
+  ASSERT_TRUE (integer_all_onesp (vec_minus_one));
+  ASSERT_TRUE (integer_truep (vec_minus_one));
+  ASSERT_FALSE (initializer_zerop (vec_minus_one));
+
+  tree x = create_tmp_var_raw (type, "x");
+  tree vec_x = build1 (VEC_DUPLICATE_EXPR, vec_type, x);
+  ASSERT_EQ (uniform_vector_p (vec_zero), zero);
+  ASSERT_EQ (uniform_vector_p (vec_one), one);
+  ASSERT_EQ (uniform_vector_p (vec_minus_one), minus_one);
+  ASSERT_EQ (uniform_vector_p (vec_x), x);
+}
+
+/* Verify predicate handling of VEC_DUPLICATE_CSTs for floating-point
+   type TYPE.  */
+
+static void
+test_vec_duplicate_predicates_float (tree type)
+{
+  tree vec_type = build_vector_type (type, 4);
+
+  tree zero = build_zero_cst (type);
+  tree vec_zero = build_vec_duplicate_cst (vec_type, zero);
+  ASSERT_TRUE (real_zerop (vec_zero));
+  ASSERT_FALSE (real_onep (vec_zero));
+  ASSERT_FALSE (real_minus_onep (vec_zero));
+  ASSERT_TRUE (initializer_zerop (vec_zero));
+
+  tree one = build_one_cst (type);
+  tree vec_one = build_vec_duplicate_cst (vec_type, one);
+  ASSERT_FALSE (real_zerop (vec_one));
+  ASSERT_TRUE (real_onep (vec_one));
+  ASSERT_FALSE (real_minus_onep (vec_one));
+  ASSERT_FALSE (initializer_zerop (vec_one));
+
+  tree minus_one = build_minus_one_cst (type);
+  tree vec_minus_one = build_vec_duplicate_cst (vec_type, minus_one);
+  ASSERT_FALSE (real_zerop (vec_minus_one));
+  ASSERT_FALSE (real_onep (vec_minus_one));
+  ASSERT_TRUE (real_minus_onep (vec_minus_one));
+  ASSERT_FALSE (initializer_zerop (vec_minus_one));
+
+  ASSERT_EQ (uniform_vector_p (vec_zero), zero);
+  ASSERT_EQ (uniform_vector_p (vec_one), one);
+  ASSERT_EQ (uniform_vector_p (vec_minus_one), minus_one);
+}
+
+/* Verify predicate handling of VEC_DUPLICATE_CSTs and VEC_DUPLICATE_EXPRs.  */
+
+static void
+test_vec_duplicate_predicates ()
+{
+  test_vec_duplicate_predicates_int (integer_type_node);
+  test_vec_duplicate_predicates_float (float_type_node);
+}
+
 /* Verify identifiers.  */
 
 static void
@@ -13826,6 +13973,7 @@ test_labels ()
 tree_c_tests ()
 {
   test_integer_constants ();
+  test_vec_duplicate_predicates ();
   test_identifiers ();
   test_labels ();
 }
Index: gcc/cfgexpand.c
===================================================================
--- gcc/cfgexpand.c	2017-10-23 11:41:23.137358624 +0100
+++ gcc/cfgexpand.c	2017-10-23 11:41:51.760448406 +0100
@@ -5049,6 +5049,8 @@ expand_debug_expr (tree exp)
     case VEC_WIDEN_LSHIFT_HI_EXPR:
     case VEC_WIDEN_LSHIFT_LO_EXPR:
     case VEC_PERM_EXPR:
+    case VEC_DUPLICATE_CST:
+    case VEC_DUPLICATE_EXPR:
       return NULL;
 
     /* Misc codes.  */
Index: gcc/tree-pretty-print.c
===================================================================
--- gcc/tree-pretty-print.c	2017-10-23 11:38:53.934094740 +0100
+++ gcc/tree-pretty-print.c	2017-10-23 11:41:51.772023858 +0100
@@ -1802,6 +1802,12 @@ dump_generic_node (pretty_printer *pp, t
       }
       break;
 
+    case VEC_DUPLICATE_CST:
+      pp_string (pp, "{ ");
+      dump_generic_node (pp, VEC_DUPLICATE_CST_ELT (node), spc, flags, false);
+      pp_string (pp, ", ... }");
+      break;
+
     case FUNCTION_TYPE:
     case METHOD_TYPE:
       dump_generic_node (pp, TREE_TYPE (node), spc, flags, false);
@@ -3231,6 +3237,15 @@ dump_generic_node (pretty_printer *pp, t
       pp_string (pp, " > ");
       break;
 
+    case VEC_DUPLICATE_EXPR:
+      pp_space (pp);
+      for (str = get_tree_code_name (code); *str; str++)
+	pp_character (pp, TOUPPER (*str));
+      pp_string (pp, " < ");
+      dump_generic_node (pp, TREE_OPERAND (node, 0), spc, flags, false);
+      pp_string (pp, " > ");
+      break;
+
     case VEC_UNPACK_HI_EXPR:
       pp_string (pp, " VEC_UNPACK_HI_EXPR < ");
       dump_generic_node (pp, TREE_OPERAND (node, 0), spc, flags, false);
Index: gcc/dwarf2out.c
===================================================================
--- gcc/dwarf2out.c	2017-10-23 11:41:24.407340836 +0100
+++ gcc/dwarf2out.c	2017-10-23 11:41:51.763342269 +0100
@@ -18862,6 +18862,7 @@ rtl_for_decl_init (tree init, tree type)
 	switch (TREE_CODE (init))
 	  {
 	  case VECTOR_CST:
+	  case VEC_DUPLICATE_CST:
 	    break;
 	  case CONSTRUCTOR:
 	    if (TREE_CONSTANT (init))
Index: gcc/gimple-expr.h
===================================================================
--- gcc/gimple-expr.h	2017-10-23 11:38:53.934094740 +0100
+++ gcc/gimple-expr.h	2017-10-23 11:41:51.765271511 +0100
@@ -134,6 +134,7 @@ is_gimple_constant (const_tree t)
     case FIXED_CST:
     case COMPLEX_CST:
     case VECTOR_CST:
+    case VEC_DUPLICATE_CST:
     case STRING_CST:
       return true;
 
Index: gcc/gimplify.c
===================================================================
--- gcc/gimplify.c	2017-10-23 11:41:25.531270256 +0100
+++ gcc/gimplify.c	2017-10-23 11:41:51.766236132 +0100
@@ -11506,6 +11506,7 @@ gimplify_expr (tree *expr_p, gimple_seq
 	case STRING_CST:
 	case COMPLEX_CST:
 	case VECTOR_CST:
+	case VEC_DUPLICATE_CST:
 	  /* Drop the overflow flag on constants, we do not want
 	     that in the GIMPLE IL.  */
 	  if (TREE_OVERFLOW_P (*expr_p))
Index: gcc/graphite-isl-ast-to-gimple.c
===================================================================
--- gcc/graphite-isl-ast-to-gimple.c	2017-10-23 11:41:23.205065216 +0100
+++ gcc/graphite-isl-ast-to-gimple.c	2017-10-23 11:41:51.767200753 +0100
@@ -222,7 +222,8 @@ enum phi_node_kind
     return TREE_CODE (op) == INTEGER_CST
       || TREE_CODE (op) == REAL_CST
       || TREE_CODE (op) == COMPLEX_CST
-      || TREE_CODE (op) == VECTOR_CST;
+      || TREE_CODE (op) == VECTOR_CST
+      || TREE_CODE (op) == VEC_DUPLICATE_CST;
   }
 
 private:
Index: gcc/graphite-scop-detection.c
===================================================================
--- gcc/graphite-scop-detection.c	2017-10-23 11:41:25.533204730 +0100
+++ gcc/graphite-scop-detection.c	2017-10-23 11:41:51.767200753 +0100
@@ -1243,6 +1243,7 @@ scan_tree_for_params (sese_info_p s, tre
     case REAL_CST:
     case COMPLEX_CST:
     case VECTOR_CST:
+    case VEC_DUPLICATE_CST:
       break;
 
    default:
Index: gcc/ipa-icf-gimple.c
===================================================================
--- gcc/ipa-icf-gimple.c	2017-10-23 11:38:53.934094740 +0100
+++ gcc/ipa-icf-gimple.c	2017-10-23 11:41:51.767200753 +0100
@@ -333,6 +333,7 @@ func_checker::compare_cst_or_decl (tree
     case INTEGER_CST:
     case COMPLEX_CST:
     case VECTOR_CST:
+    case VEC_DUPLICATE_CST:
     case STRING_CST:
     case REAL_CST:
       {
@@ -528,6 +529,7 @@ func_checker::compare_operand (tree t1,
     case INTEGER_CST:
     case COMPLEX_CST:
     case VECTOR_CST:
+    case VEC_DUPLICATE_CST:
     case STRING_CST:
     case REAL_CST:
     case FUNCTION_DECL:
Index: gcc/ipa-icf.c
===================================================================
--- gcc/ipa-icf.c	2017-10-23 11:41:25.874639400 +0100
+++ gcc/ipa-icf.c	2017-10-23 11:41:51.768165374 +0100
@@ -1478,6 +1478,7 @@ sem_item::add_expr (const_tree exp, inch
     case STRING_CST:
     case COMPLEX_CST:
     case VECTOR_CST:
+    case VEC_DUPLICATE_CST:
       inchash::add_expr (exp, hstate);
       break;
     case CONSTRUCTOR:
@@ -2030,6 +2031,9 @@ sem_variable::equals (tree t1, tree t2)
 
 	return 1;
       }
+    case VEC_DUPLICATE_CST:
+      return sem_variable::equals (VEC_DUPLICATE_CST_ELT (t1),
+				   VEC_DUPLICATE_CST_ELT (t2));
     case ARRAY_REF:
     case ARRAY_RANGE_REF:
       {
Index: gcc/match.pd
===================================================================
--- gcc/match.pd	2017-10-23 11:38:53.934094740 +0100
+++ gcc/match.pd	2017-10-23 11:41:51.768165374 +0100
@@ -958,6 +958,9 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 (match negate_expr_p
  VECTOR_CST
  (if (FLOAT_TYPE_P (TREE_TYPE (type)) || TYPE_OVERFLOW_WRAPS (type))))
+(match negate_expr_p
+ VEC_DUPLICATE_CST
+ (if (FLOAT_TYPE_P (TREE_TYPE (type)) || TYPE_OVERFLOW_WRAPS (type))))
 
 /* (-A) * (-B) -> A * B  */
 (simplify
Index: gcc/print-tree.c
===================================================================
--- gcc/print-tree.c	2017-10-23 11:38:53.934094740 +0100
+++ gcc/print-tree.c	2017-10-23 11:41:51.769129995 +0100
@@ -783,6 +783,10 @@ print_node (FILE *file, const char *pref
 	  }
 	  break;
 
+	case VEC_DUPLICATE_CST:
+	  print_node (file, "elt", VEC_DUPLICATE_CST_ELT (node), indent + 4);
+	  break;
+
 	case COMPLEX_CST:
 	  print_node (file, "real", TREE_REALPART (node), indent + 4);
 	  print_node (file, "imag", TREE_IMAGPART (node), indent + 4);
Index: gcc/tree-chkp.c
===================================================================
--- gcc/tree-chkp.c	2017-10-23 11:41:23.201196268 +0100
+++ gcc/tree-chkp.c	2017-10-23 11:41:51.770094616 +0100
@@ -3800,6 +3800,7 @@ chkp_find_bounds_1 (tree ptr, tree ptr_s
     case INTEGER_CST:
     case COMPLEX_CST:
     case VECTOR_CST:
+    case VEC_DUPLICATE_CST:
       if (integer_zerop (ptr_src))
 	bounds = chkp_get_none_bounds ();
       else
Index: gcc/tree-loop-distribution.c
===================================================================
--- gcc/tree-loop-distribution.c	2017-10-23 11:41:23.228278904 +0100
+++ gcc/tree-loop-distribution.c	2017-10-23 11:41:51.771059237 +0100
@@ -921,6 +921,9 @@ const_with_all_bytes_same (tree val)
           && CONSTRUCTOR_NELTS (val) == 0))
     return 0;
 
+  if (TREE_CODE (val) == VEC_DUPLICATE_CST)
+    return const_with_all_bytes_same (VEC_DUPLICATE_CST_ELT (val));
+
   if (real_zerop (val))
     {
       /* Only return 0 for +0.0, not for -0.0, which doesn't have
Index: gcc/tree-ssa-loop.c
===================================================================
--- gcc/tree-ssa-loop.c	2017-10-23 11:38:53.934094740 +0100
+++ gcc/tree-ssa-loop.c	2017-10-23 11:41:51.772023858 +0100
@@ -616,6 +616,7 @@ for_each_index (tree *addr_p, bool (*cbc
 	case STRING_CST:
 	case RESULT_DECL:
 	case VECTOR_CST:
+	case VEC_DUPLICATE_CST:
 	case COMPLEX_CST:
 	case INTEGER_CST:
 	case REAL_CST:
Index: gcc/tree-ssa-pre.c
===================================================================
--- gcc/tree-ssa-pre.c	2017-10-23 11:41:25.549647760 +0100
+++ gcc/tree-ssa-pre.c	2017-10-23 11:41:51.772023858 +0100
@@ -2675,6 +2675,7 @@ create_component_ref_by_pieces_1 (basic_
     case INTEGER_CST:
     case COMPLEX_CST:
     case VECTOR_CST:
+    case VEC_DUPLICATE_CST:
     case REAL_CST:
     case CONSTRUCTOR:
     case VAR_DECL:
Index: gcc/tree-ssa-sccvn.c
===================================================================
--- gcc/tree-ssa-sccvn.c	2017-10-23 11:38:53.934094740 +0100
+++ gcc/tree-ssa-sccvn.c	2017-10-23 11:41:51.773953100 +0100
@@ -858,6 +858,7 @@ copy_reference_ops_from_ref (tree ref, v
 	case INTEGER_CST:
 	case COMPLEX_CST:
 	case VECTOR_CST:
+	case VEC_DUPLICATE_CST:
 	case REAL_CST:
 	case FIXED_CST:
 	case CONSTRUCTOR:
@@ -1050,6 +1051,7 @@ ao_ref_init_from_vn_reference (ao_ref *r
 	case INTEGER_CST:
 	case COMPLEX_CST:
 	case VECTOR_CST:
+	case VEC_DUPLICATE_CST:
 	case REAL_CST:
 	case CONSTRUCTOR:
 	case CONST_DECL:
Index: gcc/tree-vect-generic.c
===================================================================
--- gcc/tree-vect-generic.c	2017-10-23 11:38:53.934094740 +0100
+++ gcc/tree-vect-generic.c	2017-10-23 11:41:51.773953100 +0100
@@ -1419,6 +1419,7 @@ lower_vec_perm (gimple_stmt_iterator *gs
 ssa_uniform_vector_p (tree op)
 {
   if (TREE_CODE (op) == VECTOR_CST
+      || TREE_CODE (op) == VEC_DUPLICATE_CST
       || TREE_CODE (op) == CONSTRUCTOR)
     return uniform_vector_p (op);
   if (TREE_CODE (op) == SSA_NAME)
Index: gcc/varasm.c
===================================================================
--- gcc/varasm.c	2017-10-23 11:41:25.822408600 +0100
+++ gcc/varasm.c	2017-10-23 11:41:51.775882341 +0100
@@ -3068,6 +3068,9 @@ const_hash_1 (const tree exp)
     CASE_CONVERT:
       return const_hash_1 (TREE_OPERAND (exp, 0)) * 7 + 2;
 
+    case VEC_DUPLICATE_CST:
+      return const_hash_1 (VEC_DUPLICATE_CST_ELT (exp)) * 7 + 3;
+
     default:
       /* A language specific constant. Just hash the code.  */
       return code;
@@ -3158,6 +3161,10 @@ compare_constant (const tree t1, const t
 	return 1;
       }
 
+    case VEC_DUPLICATE_CST:
+      return compare_constant (VEC_DUPLICATE_CST_ELT (t1),
+			       VEC_DUPLICATE_CST_ELT (t2));
+
     case CONSTRUCTOR:
       {
 	vec<constructor_elt, va_gc> *v1, *v2;
Index: gcc/fold-const.c
===================================================================
--- gcc/fold-const.c	2017-10-23 11:41:23.535860278 +0100
+++ gcc/fold-const.c	2017-10-23 11:41:51.765271511 +0100
@@ -418,6 +418,9 @@ negate_expr_p (tree t)
 	return true;
       }
 
+    case VEC_DUPLICATE_CST:
+      return negate_expr_p (VEC_DUPLICATE_CST_ELT (t));
+
     case COMPLEX_EXPR:
       return negate_expr_p (TREE_OPERAND (t, 0))
 	     && negate_expr_p (TREE_OPERAND (t, 1));
@@ -579,6 +582,14 @@ fold_negate_expr_1 (location_t loc, tree
 	return build_vector (type, elts);
       }
 
+    case VEC_DUPLICATE_CST:
+      {
+	tree sub = fold_negate_expr (loc, VEC_DUPLICATE_CST_ELT (t));
+	if (!sub)
+	  return NULL_TREE;
+	return build_vector_from_val (type, sub);
+      }
+
     case COMPLEX_EXPR:
       if (negate_expr_p (t))
 	return fold_build2_loc (loc, COMPLEX_EXPR, type,
@@ -1436,6 +1447,16 @@ const_binop (enum tree_code code, tree a
       return build_vector (type, elts);
     }
 
+  if (TREE_CODE (arg1) == VEC_DUPLICATE_CST
+      && TREE_CODE (arg2) == VEC_DUPLICATE_CST)
+    {
+      tree sub = const_binop (code, VEC_DUPLICATE_CST_ELT (arg1),
+			      VEC_DUPLICATE_CST_ELT (arg2));
+      if (!sub)
+	return NULL_TREE;
+      return build_vector_from_val (TREE_TYPE (arg1), sub);
+    }
+
   /* Shifts allow a scalar offset for a vector.  */
   if (TREE_CODE (arg1) == VECTOR_CST
       && TREE_CODE (arg2) == INTEGER_CST)
@@ -1459,6 +1480,15 @@ const_binop (enum tree_code code, tree a
 
       return build_vector (type, elts);
     }
+
+  if (TREE_CODE (arg1) == VEC_DUPLICATE_CST
+      && TREE_CODE (arg2) == INTEGER_CST)
+    {
+      tree sub = const_binop (code, VEC_DUPLICATE_CST_ELT (arg1), arg2);
+      if (!sub)
+	return NULL_TREE;
+      return build_vector_from_val (TREE_TYPE (arg1), sub);
+    }
   return NULL_TREE;
 }
 
@@ -1652,6 +1682,13 @@ const_unop (enum tree_code code, tree ty
 	  if (i == count)
 	    return build_vector (type, elements);
 	}
+      else if (TREE_CODE (arg0) == VEC_DUPLICATE_CST)
+	{
+	  tree sub = const_unop (BIT_NOT_EXPR, TREE_TYPE (type),
+				 VEC_DUPLICATE_CST_ELT (arg0));
+	  if (sub)
+	    return build_vector_from_val (type, sub);
+	}
       break;
 
     case TRUTH_NOT_EXPR:
@@ -1737,6 +1774,11 @@ const_unop (enum tree_code code, tree ty
 	return res;
       }
 
+    case VEC_DUPLICATE_EXPR:
+      if (CONSTANT_CLASS_P (arg0))
+	return build_vector_from_val (type, arg0);
+      return NULL_TREE;
+
     default:
       break;
     }
@@ -2167,6 +2209,15 @@ fold_convert_const (enum tree_code code,
 	    }
 	  return build_vector (type, v);
 	}
+      if (TREE_CODE (arg1) == VEC_DUPLICATE_CST
+	  && (TYPE_VECTOR_SUBPARTS (type)
+	      == TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg1))))
+	{
+	  tree sub = fold_convert_const (code, TREE_TYPE (type),
+					 VEC_DUPLICATE_CST_ELT (arg1));
+	  if (sub)
+	    return build_vector_from_val (type, sub);
+	}
     }
   return NULL_TREE;
 }
@@ -2953,6 +3004,10 @@ operand_equal_p (const_tree arg0, const_
 	  return 1;
 	}
 
+      case VEC_DUPLICATE_CST:
+	return operand_equal_p (VEC_DUPLICATE_CST_ELT (arg0),
+				VEC_DUPLICATE_CST_ELT (arg1), flags);
+
       case COMPLEX_CST:
 	return (operand_equal_p (TREE_REALPART (arg0), TREE_REALPART (arg1),
 				 flags)
@@ -7492,6 +7547,20 @@ can_native_interpret_type_p (tree type)
 static tree
 fold_view_convert_expr (tree type, tree expr)
 {
+  /* Recurse on duplicated vectors if the target type is also a vector
+     and if the elements line up.  */
+  tree expr_type = TREE_TYPE (expr);
+  if (TREE_CODE (expr) == VEC_DUPLICATE_CST
+      && VECTOR_TYPE_P (type)
+      && TYPE_VECTOR_SUBPARTS (type) == TYPE_VECTOR_SUBPARTS (expr_type)
+      && TYPE_SIZE (TREE_TYPE (type)) == TYPE_SIZE (TREE_TYPE (expr_type)))
+    {
+      tree sub = fold_view_convert_expr (TREE_TYPE (type),
+					 VEC_DUPLICATE_CST_ELT (expr));
+      if (sub)
+	return build_vector_from_val (type, sub);
+    }
+
   /* We support up to 512-bit values (for V8DFmode).  */
   unsigned char buffer[64];
   int len;
@@ -8891,6 +8960,15 @@ exact_inverse (tree type, tree cst)
 	return build_vector (type, elts);
       }
 
+    case VEC_DUPLICATE_CST:
+      {
+	tree sub = exact_inverse (TREE_TYPE (type),
+				  VEC_DUPLICATE_CST_ELT (cst));
+	if (!sub)
+	  return NULL_TREE;
+	return build_vector_from_val (type, sub);
+      }
+
     default:
       return NULL_TREE;
     }
@@ -11969,6 +12047,9 @@ fold_checksum_tree (const_tree expr, str
 	  for (i = 0; i < (int) VECTOR_CST_NELTS (expr); ++i)
 	    fold_checksum_tree (VECTOR_CST_ELT (expr, i), ctx, ht);
 	  break;
+	case VEC_DUPLICATE_CST:
+	  fold_checksum_tree (VEC_DUPLICATE_CST_ELT (expr), ctx, ht);
+	  break;
 	default:
 	  break;
 	}
@@ -14436,6 +14517,36 @@ test_vector_folding ()
   ASSERT_FALSE (integer_nonzerop (fold_build2 (NE_EXPR, res_type, one, one)));
 }
 
+/* Verify folding of VEC_DUPLICATE_CSTs and VEC_DUPLICATE_EXPRs.  */
+
+static void
+test_vec_duplicate_folding ()
+{
+  tree type = build_vector_type (ssizetype, 4);
+  tree dup5 = build_vec_duplicate_cst (type, ssize_int (5));
+  tree dup3 = build_vec_duplicate_cst (type, ssize_int (3));
+
+  tree neg_dup5 = fold_unary (NEGATE_EXPR, type, dup5);
+  ASSERT_EQ (uniform_vector_p (neg_dup5), ssize_int (-5));
+
+  tree not_dup5 = fold_unary (BIT_NOT_EXPR, type, dup5);
+  ASSERT_EQ (uniform_vector_p (not_dup5), ssize_int (-6));
+
+  tree dup5_plus_dup3 = fold_binary (PLUS_EXPR, type, dup5, dup3);
+  ASSERT_EQ (uniform_vector_p (dup5_plus_dup3), ssize_int (8));
+
+  tree dup5_lsl_2 = fold_binary (LSHIFT_EXPR, type, dup5, ssize_int (2));
+  ASSERT_EQ (uniform_vector_p (dup5_lsl_2), ssize_int (20));
+
+  tree size_vector = build_vector_type (sizetype, 4);
+  tree size_dup5 = fold_convert (size_vector, dup5);
+  ASSERT_EQ (uniform_vector_p (size_dup5), size_int (5));
+
+  tree dup5_expr = fold_unary (VEC_DUPLICATE_EXPR, type, ssize_int (5));
+  tree dup5_cst = build_vector_from_val (type, ssize_int (5));
+  ASSERT_TRUE (operand_equal_p (dup5_expr, dup5_cst, 0));
+}
+
 /* Run all of the selftests within this file.  */
 
 void
@@ -14443,6 +14554,7 @@ fold_const_c_tests ()
 {
   test_arithmetic_folding ();
   test_vector_folding ();
+  test_vec_duplicate_folding ();
 }
 
 } // namespace selftest
Index: gcc/optabs.def
===================================================================
--- gcc/optabs.def	2017-10-23 11:38:53.934094740 +0100
+++ gcc/optabs.def	2017-10-23 11:41:51.769129995 +0100
@@ -364,3 +364,5 @@ OPTAB_D (atomic_xor_optab, "atomic_xor$I
 
 OPTAB_D (get_thread_pointer_optab, "get_thread_pointer$I$a")
 OPTAB_D (set_thread_pointer_optab, "set_thread_pointer$I$a")
+
+OPTAB_DC (vec_duplicate_optab, "vec_duplicate$a", VEC_DUPLICATE)
Index: gcc/optabs-tree.c
===================================================================
--- gcc/optabs-tree.c	2017-10-23 11:38:53.934094740 +0100
+++ gcc/optabs-tree.c	2017-10-23 11:41:51.768165374 +0100
@@ -210,6 +210,9 @@ optab_for_tree_code (enum tree_code code
       return TYPE_UNSIGNED (type) ?
 	vec_pack_ufix_trunc_optab : vec_pack_sfix_trunc_optab;
 
+    case VEC_DUPLICATE_EXPR:
+      return vec_duplicate_optab;
+
     default:
       break;
     }
Index: gcc/optabs.h
===================================================================
--- gcc/optabs.h	2017-10-23 11:38:53.934094740 +0100
+++ gcc/optabs.h	2017-10-23 11:41:51.769129995 +0100
@@ -181,6 +181,7 @@ extern rtx simplify_expand_binop (machin
 				  enum optab_methods methods);
 extern bool force_expand_binop (machine_mode, optab, rtx, rtx, rtx, int,
 				enum optab_methods);
+extern rtx expand_vector_broadcast (machine_mode, rtx);
 
 /* Generate code for a simple binary or unary operation.  "Simple" in
    this case means "can be unambiguously described by a (mode, code)
Index: gcc/optabs.c
===================================================================
--- gcc/optabs.c	2017-10-23 11:41:41.549050496 +0100
+++ gcc/optabs.c	2017-10-23 11:41:51.769129995 +0100
@@ -367,7 +367,7 @@ force_expand_binop (machine_mode mode, o
    mode of OP must be the element mode of VMODE.  If OP is a constant,
    then the return value will be a constant.  */
 
-static rtx
+rtx
 expand_vector_broadcast (machine_mode vmode, rtx op)
 {
   enum insn_code icode;
@@ -380,6 +380,16 @@ expand_vector_broadcast (machine_mode vm
   if (CONSTANT_P (op))
     return gen_const_vec_duplicate (vmode, op);
 
+  icode = optab_handler (vec_duplicate_optab, vmode);
+  if (icode != CODE_FOR_nothing)
+    {
+      struct expand_operand ops[2];
+      create_output_operand (&ops[0], NULL_RTX, vmode);
+      create_input_operand (&ops[1], op, GET_MODE (op));
+      expand_insn (icode, 2, ops);
+      return ops[0].value;
+    }
+
   /* ??? If the target doesn't have a vec_init, then we have no easy way
      of performing this operation.  Most of this sort of generic support
      is hidden away in the vector lowering support in gimple.  */
Index: gcc/expr.c
===================================================================
--- gcc/expr.c	2017-10-23 11:41:39.187050437 +0100
+++ gcc/expr.c	2017-10-23 11:41:51.764306890 +0100
@@ -6572,7 +6572,8 @@ store_constructor (tree exp, rtx target,
 	constructor_elt *ce;
 	int i;
 	int need_to_clear;
-	int icode = CODE_FOR_nothing;
+	insn_code icode = CODE_FOR_nothing;
+	tree elt;
 	tree elttype = TREE_TYPE (type);
 	int elt_size = tree_to_uhwi (TYPE_SIZE (elttype));
 	machine_mode eltmode = TYPE_MODE (elttype);
@@ -6582,13 +6583,30 @@ store_constructor (tree exp, rtx target,
 	unsigned n_elts;
 	alias_set_type alias;
 	bool vec_vec_init_p = false;
+	machine_mode mode = GET_MODE (target);
 
 	gcc_assert (eltmode != BLKmode);
 
+	/* Try using vec_duplicate_optab for uniform vectors.  */
+	if (!TREE_SIDE_EFFECTS (exp)
+	    && VECTOR_MODE_P (mode)
+	    && eltmode == GET_MODE_INNER (mode)
+	    && ((icode = optab_handler (vec_duplicate_optab, mode))
+		!= CODE_FOR_nothing)
+	    && (elt = uniform_vector_p (exp)))
+	  {
+	    struct expand_operand ops[2];
+	    create_output_operand (&ops[0], target, mode);
+	    create_input_operand (&ops[1], expand_normal (elt), eltmode);
+	    expand_insn (icode, 2, ops);
+	    if (!rtx_equal_p (target, ops[0].value))
+	      emit_move_insn (target, ops[0].value);
+	    break;
+	  }
+
 	n_elts = TYPE_VECTOR_SUBPARTS (type);
-	if (REG_P (target) && VECTOR_MODE_P (GET_MODE (target)))
+	if (REG_P (target) && VECTOR_MODE_P (mode))
 	  {
-	    machine_mode mode = GET_MODE (target);
 	    machine_mode emode = eltmode;
 
 	    if (CONSTRUCTOR_NELTS (exp)
@@ -6600,7 +6618,7 @@ store_constructor (tree exp, rtx target,
 			    == n_elts);
 		emode = TYPE_MODE (etype);
 	      }
-	    icode = (int) convert_optab_handler (vec_init_optab, mode, emode);
+	    icode = convert_optab_handler (vec_init_optab, mode, emode);
 	    if (icode != CODE_FOR_nothing)
 	      {
 		unsigned int i, n = n_elts;
@@ -6648,7 +6666,7 @@ store_constructor (tree exp, rtx target,
 	if (need_to_clear && size > 0 && !vector)
 	  {
 	    if (REG_P (target))
-	      emit_move_insn (target, CONST0_RTX (GET_MODE (target)));
+	      emit_move_insn (target, CONST0_RTX (mode));
 	    else
 	      clear_storage (target, GEN_INT (size), BLOCK_OP_NORMAL);
 	    cleared = 1;
@@ -6656,7 +6674,7 @@ store_constructor (tree exp, rtx target,
 
 	/* Inform later passes that the old value is dead.  */
 	if (!cleared && !vector && REG_P (target))
-	  emit_move_insn (target, CONST0_RTX (GET_MODE (target)));
+	  emit_move_insn (target, CONST0_RTX (mode));
 
         if (MEM_P (target))
 	  alias = MEM_ALIAS_SET (target);
@@ -6707,8 +6725,7 @@ store_constructor (tree exp, rtx target,
 
 	if (vector)
 	  emit_insn (GEN_FCN (icode) (target,
-				      gen_rtx_PARALLEL (GET_MODE (target),
-							vector)));
+				      gen_rtx_PARALLEL (mode, vector)));
 	break;
       }
 
@@ -7686,6 +7703,19 @@ expand_operands (tree exp0, tree exp1, r
 }
 
 \f
+/* Expand constant vector element ELT, which has mode MODE.  This is used
+   for members of VECTOR_CST and VEC_DUPLICATE_CST.  */
+
+static rtx
+const_vector_element (scalar_mode mode, const_tree elt)
+{
+  if (TREE_CODE (elt) == REAL_CST)
+    return const_double_from_real_value (TREE_REAL_CST (elt), mode);
+  if (TREE_CODE (elt) == FIXED_CST)
+    return CONST_FIXED_FROM_FIXED_VALUE (TREE_FIXED_CST (elt), mode);
+  return immed_wide_int_const (wi::to_wide (elt), mode);
+}
+
 /* Return a MEM that contains constant EXP.  DEFER is as for
    output_constant_def and MODIFIER is as for expand_expr.  */
 
@@ -9551,6 +9581,12 @@ #define REDUCE_BIT_FIELD(expr)	(reduce_b
       target = expand_vec_cond_expr (type, treeop0, treeop1, treeop2, target);
       return target;
 
+    case VEC_DUPLICATE_EXPR:
+      op0 = expand_expr (treeop0, NULL_RTX, VOIDmode, modifier);
+      target = expand_vector_broadcast (mode, op0);
+      gcc_assert (target);
+      return target;
+
     case BIT_INSERT_EXPR:
       {
 	unsigned bitpos = tree_to_uhwi (treeop2);
@@ -10003,6 +10039,11 @@ expand_expr_real_1 (tree exp, rtx target
 			    tmode, modifier);
       }
 
+    case VEC_DUPLICATE_CST:
+      op0 = const_vector_element (GET_MODE_INNER (mode),
+				  VEC_DUPLICATE_CST_ELT (exp));
+      return gen_const_vec_duplicate (mode, op0);
+
     case CONST_DECL:
       if (modifier == EXPAND_WRITE)
 	{
@@ -11764,8 +11805,7 @@ const_vector_from_tree (tree exp)
 {
   rtvec v;
   unsigned i, units;
-  tree elt;
-  machine_mode inner, mode;
+  machine_mode mode;
 
   mode = TYPE_MODE (TREE_TYPE (exp));
 
@@ -11776,23 +11816,12 @@ const_vector_from_tree (tree exp)
     return const_vector_mask_from_tree (exp);
 
   units = VECTOR_CST_NELTS (exp);
-  inner = GET_MODE_INNER (mode);
 
   v = rtvec_alloc (units);
 
   for (i = 0; i < units; ++i)
-    {
-      elt = VECTOR_CST_ELT (exp, i);
-
-      if (TREE_CODE (elt) == REAL_CST)
-	RTVEC_ELT (v, i) = const_double_from_real_value (TREE_REAL_CST (elt),
-							 inner);
-      else if (TREE_CODE (elt) == FIXED_CST)
-	RTVEC_ELT (v, i) = CONST_FIXED_FROM_FIXED_VALUE (TREE_FIXED_CST (elt),
-							 inner);
-      else
-	RTVEC_ELT (v, i) = immed_wide_int_const (wi::to_wide (elt), inner);
-    }
+    RTVEC_ELT (v, i) = const_vector_element (GET_MODE_INNER (mode),
+					     VECTOR_CST_ELT (exp, i));
 
   return gen_rtx_CONST_VECTOR (mode, v);
 }
Index: gcc/internal-fn.c
===================================================================
--- gcc/internal-fn.c	2017-10-23 11:41:23.529089619 +0100
+++ gcc/internal-fn.c	2017-10-23 11:41:51.767200753 +0100
@@ -1911,12 +1911,12 @@ expand_vector_ubsan_overflow (location_t
       emit_move_insn (cntvar, const0_rtx);
       emit_label (loop_lab);
     }
-  if (TREE_CODE (arg0) != VECTOR_CST)
+  if (!CONSTANT_CLASS_P (arg0))
     {
       rtx arg0r = expand_normal (arg0);
       arg0 = make_tree (TREE_TYPE (arg0), arg0r);
     }
-  if (TREE_CODE (arg1) != VECTOR_CST)
+  if (!CONSTANT_CLASS_P (arg1))
     {
       rtx arg1r = expand_normal (arg1);
       arg1 = make_tree (TREE_TYPE (arg1), arg1r);
Index: gcc/tree-cfg.c
===================================================================
--- gcc/tree-cfg.c	2017-10-23 11:41:25.864967029 +0100
+++ gcc/tree-cfg.c	2017-10-23 11:41:51.770094616 +0100
@@ -3803,6 +3803,17 @@ verify_gimple_assign_unary (gassign *stm
     case CONJ_EXPR:
       break;
 
+    case VEC_DUPLICATE_EXPR:
+      if (TREE_CODE (lhs_type) != VECTOR_TYPE
+	  || !useless_type_conversion_p (TREE_TYPE (lhs_type), rhs1_type))
+	{
+	  error ("vec_duplicate should be from a scalar to a like vector");
+	  debug_generic_expr (lhs_type);
+	  debug_generic_expr (rhs1_type);
+	  return true;
+	}
+      return false;
+
     default:
       gcc_unreachable ();
     }
@@ -4473,6 +4484,7 @@ verify_gimple_assign_single (gassign *st
     case FIXED_CST:
     case COMPLEX_CST:
     case VECTOR_CST:
+    case VEC_DUPLICATE_CST:
     case STRING_CST:
       return res;
 
Index: gcc/tree-inline.c
===================================================================
--- gcc/tree-inline.c	2017-10-23 11:41:25.833048208 +0100
+++ gcc/tree-inline.c	2017-10-23 11:41:51.771059237 +0100
@@ -4002,6 +4002,7 @@ estimate_operator_cost (enum tree_code c
     case VEC_PACK_FIX_TRUNC_EXPR:
     case VEC_WIDEN_LSHIFT_HI_EXPR:
     case VEC_WIDEN_LSHIFT_LO_EXPR:
+    case VEC_DUPLICATE_EXPR:
 
       return 1;
 

^ permalink raw reply	[flat|nested] 90+ messages in thread

* [06/nn] Add VEC_SERIES_{CST,EXPR} and associated optab
  2017-10-23 11:16 [00/nn] Patches preparing for runtime offsets and sizes Richard Sandiford
                   ` (5 preceding siblings ...)
  2017-10-23 11:22 ` [07/nn] Add unique CONSTs Richard Sandiford
@ 2017-10-23 11:22 ` Richard Sandiford
  2017-10-26 12:26   ` Richard Biener
  2017-12-15  0:34   ` Richard Sandiford
  2017-10-23 11:22 ` [08/nn] Add a fixed_size_mode class Richard Sandiford
                   ` (14 subsequent siblings)
  21 siblings, 2 replies; 90+ messages in thread
From: Richard Sandiford @ 2017-10-23 11:22 UTC (permalink / raw)
  To: gcc-patches

As with VEC_DUPLICATE_{CST,EXPR}, this patch adds two
tree code equivalents of the VEC_SERIES rtx code.  VEC_SERIES_EXPR
is for non-constant inputs and is a normal tcc_binary.  VEC_SERIES_CST
is a tcc_constant.

Like VEC_DUPLICATE_CST, VEC_SERIES_CST is only used for variable-length
vectors.  This avoids the need to handle combinations of VECTOR_CST
and VEC_SERIES_CST.
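
To make the semantics concrete, here is a minimal sketch of the
intended usage, assuming TYPE is an integral vector type (the values
are purely illustrative):

    /* Build { 1, 3, 5, ... } of type TYPE.  build_vec_series (added
       below) folds a zero step through build_vector_from_val, wraps
       constant operands in a VEC_SERIES_CST, and otherwise builds a
       VEC_SERIES_EXPR.  */
    tree elt_type = TREE_TYPE (type);
    tree base = build_int_cst (elt_type, 1);
    tree step = build_int_cst (elt_type, 2);
    tree series = build_vec_series (type, base, step);
    /* Element I of SERIES is 1 + I * 2.  */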


2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* doc/generic.texi (VEC_SERIES_CST, VEC_SERIES_EXPR): Document.
	* doc/md.texi (vec_series@var{m}): Document.
	* tree.def (VEC_SERIES_CST, VEC_SERIES_EXPR): New tree codes.
	* tree.h (TREE_OVERFLOW): Add VEC_SERIES_CST to the list of valid
	codes.
	(VEC_SERIES_CST_BASE, VEC_SERIES_CST_STEP): New macros.
	(build_vec_series_cst, build_vec_series): Declare.
	* tree.c (tree_node_structure_for_code, tree_code_size, tree_size)
	(add_expr, walk_tree_1, drop_tree_overflow): Handle VEC_SERIES_CST.
	(build_vec_series_cst, build_vec_series): New functions.
	* cfgexpand.c (expand_debug_expr): Handle the new codes.
	* tree-pretty-print.c (dump_generic_node): Likewise.
	* dwarf2out.c (rtl_for_decl_init): Handle VEC_SERIES_CST.
	* gimple-expr.h (is_gimple_constant): Likewise.
	* gimplify.c (gimplify_expr): Likewise.
	* graphite-scop-detection.c (scan_tree_for_params): Likewise.
	* ipa-icf-gimple.c (func_checker::compare_cst_or_decl): Likewise.
	(func_checker::compare_operand): Likewise.
	* ipa-icf.c (sem_item::add_expr, sem_variable::equals): Likewise.
	* print-tree.c (print_node): Likewise.
	* tree-ssa-loop.c (for_each_index): Likewise.
	* tree-ssa-pre.c (create_component_ref_by_pieces_1): Likewise.
	* tree-ssa-sccvn.c (copy_reference_ops_from_ref): Likewise.
	(ao_ref_init_from_vn_reference): Likewise.
	* varasm.c (const_hash_1, compare_constant): Likewise.
	* fold-const.c (negate_expr_p, fold_negate_expr_1, operand_equal_p)
	(fold_checksum_tree): Likewise.
	(vec_series_equivalent_p): New function.
	(const_binop): Use it.  Fold VEC_SERIES_EXPRs of constants.
	* expmed.c (make_tree): Handle VEC_SERIES.
	* gimple-pretty-print.c (dump_binary_rhs): Likewise.
	* tree-inline.c (estimate_operator_cost): Likewise.
	* expr.c (const_vector_element): Include VEC_SERIES_CST in comment.
	(expand_expr_real_2): Handle VEC_SERIES_EXPR.
	(expand_expr_real_1): Handle VEC_SERIES_CST.
	* optabs.def (vec_series_optab): New optab.
	* optabs.h (expand_vec_series_expr): Declare.
	* optabs.c (expand_vec_series_expr): New function.
	* optabs-tree.c (optab_for_tree_code): Handle VEC_SERIES_EXPR.
	* tree-cfg.c (verify_gimple_assign_binary): Handle VEC_SERIES_EXPR.
	(verify_gimple_assign_single): Handle VEC_SERIES_CST.
	* tree-vect-generic.c (expand_vector_operations_1): Check that
	the operands also have vector type.

Index: gcc/doc/generic.texi
===================================================================
--- gcc/doc/generic.texi	2017-10-23 11:41:51.760448406 +0100
+++ gcc/doc/generic.texi	2017-10-23 11:42:34.910720660 +0100
@@ -1037,6 +1037,7 @@ As this example indicates, the operands
 @tindex COMPLEX_CST
 @tindex VECTOR_CST
 @tindex VEC_DUPLICATE_CST
+@tindex VEC_SERIES_CST
 @tindex STRING_CST
 @findex TREE_STRING_LENGTH
 @findex TREE_STRING_POINTER
@@ -1098,6 +1099,16 @@ instead.  The scalar element value is gi
 @code{VEC_DUPLICATE_CST_ELT} and has the same restrictions as the
 element of a @code{VECTOR_CST}.
 
+@item VEC_SERIES_CST
+These nodes represent a vector constant in which element @var{i}
+has the value @samp{@var{base} + @var{i} * @var{step}}, for some
+constant @var{base} and @var{step}.  The value of @var{base} is
+given by @code{VEC_SERIES_CST_BASE} and the value of @var{step} is
+given by @code{VEC_SERIES_CST_STEP}.
+
+These nodes are restricted to integral types, in order to avoid
+specifying the rounding behavior for floating-point types.
+
 @item STRING_CST
 These nodes represent string-constants.  The @code{TREE_STRING_LENGTH}
 returns the length of the string, as an @code{int}.  The
@@ -1702,6 +1713,7 @@ a value from @code{enum annot_expr_kind}
 @node Vectors
 @subsection Vectors
 @tindex VEC_DUPLICATE_EXPR
+@tindex VEC_SERIES_EXPR
 @tindex VEC_LSHIFT_EXPR
 @tindex VEC_RSHIFT_EXPR
 @tindex VEC_WIDEN_MULT_HI_EXPR
@@ -1721,6 +1733,14 @@ a value from @code{enum annot_expr_kind}
 This node has a single operand and represents a vector in which every
 element is equal to that operand.
 
+@item VEC_SERIES_EXPR
+This node represents a vector formed from a scalar base and step,
+given as the first and second operands respectively.  Element @var{i}
+of the result is equal to @samp{@var{base} + @var{i}*@var{step}}.
+
+This node is restricted to integral types, in order to avoid
+specifying the rounding behavior for floating-point types.
+
 @item VEC_LSHIFT_EXPR
 @itemx VEC_RSHIFT_EXPR
 These nodes represent whole vector left and right shifts, respectively.
Index: gcc/doc/md.texi
===================================================================
--- gcc/doc/md.texi	2017-10-23 11:41:51.761413027 +0100
+++ gcc/doc/md.texi	2017-10-23 11:42:34.911720660 +0100
@@ -4899,6 +4899,19 @@ vectors go through the @code{mov@var{m}}
 
 This pattern is not allowed to @code{FAIL}.
 
+@cindex @code{vec_series@var{m}} instruction pattern
+@item @samp{vec_series@var{m}}
+Initialize vector output operand 0 so that element @var{i} is equal to
+operand 1 plus @var{i} times operand 2.  In other words, create a linear
+series whose base value is operand 1 and whose step is operand 2.
+
+The vector output has mode @var{m} and the scalar inputs have the mode
+appropriate for one element of @var{m}.  This pattern is not used for
+floating-point vectors, in order to avoid having to specify the
+rounding behavior for @var{i} > 1.
+
+This pattern is not allowed to @code{FAIL}.
+
 @cindex @code{vec_cmp@var{m}@var{n}} instruction pattern
 @item @samp{vec_cmp@var{m}@var{n}}
 Output a vector comparison.  Operand 0 of mode @var{n} is the destination for
Index: gcc/tree.def
===================================================================
--- gcc/tree.def	2017-10-23 11:41:51.774917721 +0100
+++ gcc/tree.def	2017-10-23 11:42:34.924720660 +0100
@@ -308,6 +308,10 @@ DEFTREECODE (VECTOR_CST, "vector_cst", t
    VEC_DUPLICATE_CST_ELT.  */
 DEFTREECODE (VEC_DUPLICATE_CST, "vec_duplicate_cst", tcc_constant, 0)
 
+/* Represents a vector constant in which element i is equal to
+   VEC_SERIES_CST_BASE + i * VEC_SERIES_CST_STEP.  */
+DEFTREECODE (VEC_SERIES_CST, "vec_series_cst", tcc_constant, 0)
+
 /* Contents are TREE_STRING_LENGTH and the actual contents of the string.  */
 DEFTREECODE (STRING_CST, "string_cst", tcc_constant, 0)
 
@@ -541,6 +545,16 @@ DEFTREECODE (COND_EXPR, "cond_expr", tcc
 /* Represents a vector in which every element is equal to operand 0.  */
 DEFTREECODE (VEC_DUPLICATE_EXPR, "vec_duplicate_expr", tcc_unary, 1)
 
+/* Vector series created from a start (base) value and a step.
+
+   A = VEC_SERIES_EXPR (B, C)
+
+   means
+
+   for (i = 0; i < N; i++)
+     A[i] = B + C * i;  */
+DEFTREECODE (VEC_SERIES_EXPR, "vec_series_expr", tcc_binary, 2)
+
 /* Vector conditional expression. It is like COND_EXPR, but with
    vector operands.
 
Index: gcc/tree.h
===================================================================
--- gcc/tree.h	2017-10-23 11:41:51.775882341 +0100
+++ gcc/tree.h	2017-10-23 11:42:34.925720660 +0100
@@ -730,8 +730,8 @@ #define TREE_SYMBOL_REFERENCED(NODE) \
 #define TYPE_REF_CAN_ALIAS_ALL(NODE) \
   (PTR_OR_REF_CHECK (NODE)->base.static_flag)
 
-/* In an INTEGER_CST, REAL_CST, COMPLEX_CST, VECTOR_CST or VEC_DUPLICATE_CST,
-   this means there was an overflow in folding.  */
+/* In an INTEGER_CST, REAL_CST, COMPLEX_CST, VECTOR_CST, VEC_DUPLICATE_CST
+   or VEC_SERIES_CST, this means there was an overflow in folding.  */
 
 #define TREE_OVERFLOW(NODE) (CST_CHECK (NODE)->base.public_flag)
 
@@ -1034,6 +1034,12 @@ #define VECTOR_CST_ELT(NODE,IDX) (VECTOR
 #define VEC_DUPLICATE_CST_ELT(NODE) \
   (VEC_DUPLICATE_CST_CHECK (NODE)->vector.elts[0])
 
+/* In a VEC_SERIES_CST node.  */
+#define VEC_SERIES_CST_BASE(NODE) \
+  (VEC_SERIES_CST_CHECK (NODE)->vector.elts[0])
+#define VEC_SERIES_CST_STEP(NODE) \
+  (VEC_SERIES_CST_CHECK (NODE)->vector.elts[1])
+
 /* Define fields and accessors for some special-purpose tree nodes.  */
 
 #define IDENTIFIER_LENGTH(NODE) \
@@ -4030,9 +4036,11 @@ extern tree build_int_cstu (tree type, u
 extern tree build_int_cst_type (tree, HOST_WIDE_INT);
 extern tree make_vector (unsigned CXX_MEM_STAT_INFO);
 extern tree build_vec_duplicate_cst (tree, tree CXX_MEM_STAT_INFO);
+extern tree build_vec_series_cst (tree, tree, tree CXX_MEM_STAT_INFO);
 extern tree build_vector (tree, vec<tree> CXX_MEM_STAT_INFO);
 extern tree build_vector_from_ctor (tree, vec<constructor_elt, va_gc> *);
 extern tree build_vector_from_val (tree, tree);
+extern tree build_vec_series (tree, tree, tree);
 extern void recompute_constructor_flags (tree);
 extern void verify_constructor_flags (tree);
 extern tree build_constructor (tree, vec<constructor_elt, va_gc> *);
Index: gcc/tree.c
===================================================================
--- gcc/tree.c	2017-10-23 11:41:51.774917721 +0100
+++ gcc/tree.c	2017-10-23 11:42:34.924720660 +0100
@@ -465,6 +465,7 @@ tree_node_structure_for_code (enum tree_
     case COMPLEX_CST:		return TS_COMPLEX;
     case VECTOR_CST:		return TS_VECTOR;
     case VEC_DUPLICATE_CST:	return TS_VECTOR;
+    case VEC_SERIES_CST:	return TS_VECTOR;
     case STRING_CST:		return TS_STRING;
       /* tcc_exceptional cases.  */
     case ERROR_MARK:		return TS_COMMON;
@@ -818,6 +819,8 @@ tree_code_size (enum tree_code code)
 	case COMPLEX_CST:	return sizeof (struct tree_complex);
 	case VECTOR_CST:	return sizeof (struct tree_vector);
 	case VEC_DUPLICATE_CST:	return sizeof (struct tree_vector);
+	case VEC_SERIES_CST:
+	  return sizeof (struct tree_vector) + sizeof (tree);
 	case STRING_CST:	gcc_unreachable ();
 	default:
 	  return lang_hooks.tree_size (code);
@@ -880,6 +883,9 @@ tree_size (const_tree node)
     case VEC_DUPLICATE_CST:
       return sizeof (struct tree_vector);
 
+    case VEC_SERIES_CST:
+      return sizeof (struct tree_vector) + sizeof (tree);
+
     case STRING_CST:
       return TREE_STRING_LENGTH (node) + offsetof (struct tree_string, str) + 1;
 
@@ -1711,6 +1717,31 @@ build_vec_duplicate_cst (tree type, tree
   return t;
 }
 
+/* Build a new VEC_SERIES_CST with type TYPE, base BASE and step STEP.
+
+   Note that this function is only suitable for callers that specifically
+   need a VEC_SERIES_CST node.  Use build_vec_series to build a general
+   series vector from a general base and step.  */
+
+tree
+build_vec_series_cst (tree type, tree base, tree step MEM_STAT_DECL)
+{
+  int length = sizeof (struct tree_vector) + sizeof (tree);
+
+  record_node_allocation_statistics (VEC_SERIES_CST, length);
+
+  tree t = ggc_alloc_cleared_tree_node_stat (length PASS_MEM_STAT);
+
+  TREE_SET_CODE (t, VEC_SERIES_CST);
+  TREE_TYPE (t) = type;
+  t->base.u.nelts = 2;
+  VEC_SERIES_CST_BASE (t) = base;
+  VEC_SERIES_CST_STEP (t) = step;
+  TREE_CONSTANT (t) = 1;
+
+  return t;
+}
+
 /* Build a newly constructed VECTOR_CST node of length LEN.  */
 
 tree
@@ -1821,6 +1852,19 @@ build_vector_from_val (tree vectype, tre
     }
 }
 
+/* Build a vector series of type TYPE in which element I has the value
+   BASE + I * STEP.  */
+
+tree
+build_vec_series (tree type, tree base, tree step)
+{
+  if (integer_zerop (step))
+    return build_vector_from_val (type, base);
+  if (CONSTANT_CLASS_P (base) && CONSTANT_CLASS_P (step))
+    return build_vec_series_cst (type, base, step);
+  return build2 (VEC_SERIES_EXPR, type, base, step);
+}
+
 /* Something has messed with the elements of CONSTRUCTOR C after it was built;
    calculate TREE_CONSTANT and TREE_SIDE_EFFECTS.  */
 
@@ -7136,6 +7180,10 @@ add_expr (const_tree t, inchash::hash &h
     case VEC_DUPLICATE_CST:
       inchash::add_expr (VEC_DUPLICATE_CST_ELT (t), hstate);
       return;
+    case VEC_SERIES_CST:
+      inchash::add_expr (VEC_SERIES_CST_BASE (t), hstate);
+      inchash::add_expr (VEC_SERIES_CST_STEP (t), hstate);
+      return;
     case SSA_NAME:
       /* We can just compare by pointer.  */
       hstate.add_wide_int (SSA_NAME_VERSION (t));
@@ -11150,6 +11198,7 @@ #define WALK_SUBTREE_TAIL(NODE)				\
     case FIXED_CST:
     case VECTOR_CST:
     case VEC_DUPLICATE_CST:
+    case VEC_SERIES_CST:
     case STRING_CST:
     case BLOCK:
     case PLACEHOLDER_EXPR:
@@ -12442,6 +12491,15 @@ drop_tree_overflow (tree t)
       if (TREE_OVERFLOW (*elt))
 	*elt = drop_tree_overflow (*elt);
     }
+  if (TREE_CODE (t) == VEC_SERIES_CST)
+    {
+      tree *elt = &VEC_SERIES_CST_BASE (t);
+      if (TREE_OVERFLOW (*elt))
+	*elt = drop_tree_overflow (*elt);
+      elt = &VEC_SERIES_CST_STEP (t);
+      if (TREE_OVERFLOW (*elt))
+	*elt = drop_tree_overflow (*elt);
+    }
   return t;
 }
 
Index: gcc/cfgexpand.c
===================================================================
--- gcc/cfgexpand.c	2017-10-23 11:41:51.760448406 +0100
+++ gcc/cfgexpand.c	2017-10-23 11:42:34.909720660 +0100
@@ -5051,6 +5051,8 @@ expand_debug_expr (tree exp)
     case VEC_PERM_EXPR:
     case VEC_DUPLICATE_CST:
     case VEC_DUPLICATE_EXPR:
+    case VEC_SERIES_CST:
+    case VEC_SERIES_EXPR:
       return NULL;
 
     /* Misc codes.  */
Index: gcc/tree-pretty-print.c
===================================================================
--- gcc/tree-pretty-print.c	2017-10-23 11:41:51.772023858 +0100
+++ gcc/tree-pretty-print.c	2017-10-23 11:42:34.921720660 +0100
@@ -1808,6 +1808,14 @@ dump_generic_node (pretty_printer *pp, t
       pp_string (pp, ", ... }");
       break;
 
+    case VEC_SERIES_CST:
+      pp_string (pp, "{ ");
+      dump_generic_node (pp, VEC_SERIES_CST_BASE (node), spc, flags, false);
+      pp_string (pp, ", +, ");
+      dump_generic_node (pp, VEC_SERIES_CST_STEP (node), spc, flags, false);
+      pp_string (pp, " }");
+      break;
+
     case FUNCTION_TYPE:
     case METHOD_TYPE:
       dump_generic_node (pp, TREE_TYPE (node), spc, flags, false);
@@ -3221,6 +3229,7 @@ dump_generic_node (pretty_printer *pp, t
       pp_string (pp, " > ");
       break;
 
+    case VEC_SERIES_EXPR:
     case VEC_WIDEN_MULT_HI_EXPR:
     case VEC_WIDEN_MULT_LO_EXPR:
     case VEC_WIDEN_MULT_EVEN_EXPR:
Index: gcc/dwarf2out.c
===================================================================
--- gcc/dwarf2out.c	2017-10-23 11:41:51.763342269 +0100
+++ gcc/dwarf2out.c	2017-10-23 11:42:34.913720660 +0100
@@ -18863,6 +18863,7 @@ rtl_for_decl_init (tree init, tree type)
 	  {
 	  case VECTOR_CST:
 	  case VEC_DUPLICATE_CST:
+	  case VEC_SERIES_CST:
 	    break;
 	  case CONSTRUCTOR:
 	    if (TREE_CONSTANT (init))
Index: gcc/gimple-expr.h
===================================================================
--- gcc/gimple-expr.h	2017-10-23 11:41:51.765271511 +0100
+++ gcc/gimple-expr.h	2017-10-23 11:42:34.916720660 +0100
@@ -135,6 +135,7 @@ is_gimple_constant (const_tree t)
     case COMPLEX_CST:
     case VECTOR_CST:
     case VEC_DUPLICATE_CST:
+    case VEC_SERIES_CST:
     case STRING_CST:
       return true;
 
Index: gcc/gimplify.c
===================================================================
--- gcc/gimplify.c	2017-10-23 11:41:51.766236132 +0100
+++ gcc/gimplify.c	2017-10-23 11:42:34.917720660 +0100
@@ -11507,6 +11507,7 @@ gimplify_expr (tree *expr_p, gimple_seq
 	case COMPLEX_CST:
 	case VECTOR_CST:
 	case VEC_DUPLICATE_CST:
+	case VEC_SERIES_CST:
 	  /* Drop the overflow flag on constants, we do not want
 	     that in the GIMPLE IL.  */
 	  if (TREE_OVERFLOW_P (*expr_p))
Index: gcc/graphite-scop-detection.c
===================================================================
--- gcc/graphite-scop-detection.c	2017-10-23 11:41:51.767200753 +0100
+++ gcc/graphite-scop-detection.c	2017-10-23 11:42:34.917720660 +0100
@@ -1244,6 +1244,7 @@ scan_tree_for_params (sese_info_p s, tre
     case COMPLEX_CST:
     case VECTOR_CST:
     case VEC_DUPLICATE_CST:
+    case VEC_SERIES_CST:
       break;
 
    default:
Index: gcc/ipa-icf-gimple.c
===================================================================
--- gcc/ipa-icf-gimple.c	2017-10-23 11:41:51.767200753 +0100
+++ gcc/ipa-icf-gimple.c	2017-10-23 11:42:34.917720660 +0100
@@ -334,6 +334,7 @@ func_checker::compare_cst_or_decl (tree
     case COMPLEX_CST:
     case VECTOR_CST:
     case VEC_DUPLICATE_CST:
+    case VEC_SERIES_CST:
     case STRING_CST:
     case REAL_CST:
       {
@@ -530,6 +531,7 @@ func_checker::compare_operand (tree t1,
     case COMPLEX_CST:
     case VECTOR_CST:
     case VEC_DUPLICATE_CST:
+    case VEC_SERIES_CST:
     case STRING_CST:
     case REAL_CST:
     case FUNCTION_DECL:
Index: gcc/ipa-icf.c
===================================================================
--- gcc/ipa-icf.c	2017-10-23 11:41:51.768165374 +0100
+++ gcc/ipa-icf.c	2017-10-23 11:42:34.918720660 +0100
@@ -1479,6 +1479,7 @@ sem_item::add_expr (const_tree exp, inch
     case COMPLEX_CST:
     case VECTOR_CST:
     case VEC_DUPLICATE_CST:
+    case VEC_SERIES_CST:
       inchash::add_expr (exp, hstate);
       break;
     case CONSTRUCTOR:
@@ -2034,6 +2035,11 @@ sem_variable::equals (tree t1, tree t2)
     case VEC_DUPLICATE_CST:
       return sem_variable::equals (VEC_DUPLICATE_CST_ELT (t1),
 				   VEC_DUPLICATE_CST_ELT (t2));
+     case VEC_SERIES_CST:
+       return (sem_variable::equals (VEC_SERIES_CST_BASE (t1),
+				     VEC_SERIES_CST_BASE (t2))
+	       && sem_variable::equals (VEC_SERIES_CST_STEP (t1),
+					VEC_SERIES_CST_STEP (t2)));
     case ARRAY_REF:
     case ARRAY_RANGE_REF:
       {
Index: gcc/print-tree.c
===================================================================
--- gcc/print-tree.c	2017-10-23 11:41:51.769129995 +0100
+++ gcc/print-tree.c	2017-10-23 11:42:34.919720660 +0100
@@ -787,6 +787,11 @@ print_node (FILE *file, const char *pref
 	  print_node (file, "elt", VEC_DUPLICATE_CST_ELT (node), indent + 4);
 	  break;
 
+	case VEC_SERIES_CST:
+	  print_node (file, "base", VEC_SERIES_CST_BASE (node), indent + 4);
+	  print_node (file, "step", VEC_SERIES_CST_STEP (node), indent + 4);
+	  break;
+
 	case COMPLEX_CST:
 	  print_node (file, "real", TREE_REALPART (node), indent + 4);
 	  print_node (file, "imag", TREE_IMAGPART (node), indent + 4);
Index: gcc/tree-ssa-loop.c
===================================================================
--- gcc/tree-ssa-loop.c	2017-10-23 11:41:51.772023858 +0100
+++ gcc/tree-ssa-loop.c	2017-10-23 11:42:34.921720660 +0100
@@ -617,6 +617,7 @@ for_each_index (tree *addr_p, bool (*cbc
 	case RESULT_DECL:
 	case VECTOR_CST:
 	case VEC_DUPLICATE_CST:
+	case VEC_SERIES_CST:
 	case COMPLEX_CST:
 	case INTEGER_CST:
 	case REAL_CST:
Index: gcc/tree-ssa-pre.c
===================================================================
--- gcc/tree-ssa-pre.c	2017-10-23 11:41:51.772023858 +0100
+++ gcc/tree-ssa-pre.c	2017-10-23 11:42:34.922720660 +0100
@@ -2676,6 +2676,7 @@ create_component_ref_by_pieces_1 (basic_
     case COMPLEX_CST:
     case VECTOR_CST:
     case VEC_DUPLICATE_CST:
+    case VEC_SERIES_CST:
     case REAL_CST:
     case CONSTRUCTOR:
     case VAR_DECL:
Index: gcc/tree-ssa-sccvn.c
===================================================================
--- gcc/tree-ssa-sccvn.c	2017-10-23 11:41:51.773953100 +0100
+++ gcc/tree-ssa-sccvn.c	2017-10-23 11:42:34.922720660 +0100
@@ -859,6 +859,7 @@ copy_reference_ops_from_ref (tree ref, v
 	case COMPLEX_CST:
 	case VECTOR_CST:
 	case VEC_DUPLICATE_CST:
+	case VEC_SERIES_CST:
 	case REAL_CST:
 	case FIXED_CST:
 	case CONSTRUCTOR:
@@ -1052,6 +1053,7 @@ ao_ref_init_from_vn_reference (ao_ref *r
 	case COMPLEX_CST:
 	case VECTOR_CST:
 	case VEC_DUPLICATE_CST:
+	case VEC_SERIES_CST:
 	case REAL_CST:
 	case CONSTRUCTOR:
 	case CONST_DECL:
Index: gcc/varasm.c
===================================================================
--- gcc/varasm.c	2017-10-23 11:41:51.775882341 +0100
+++ gcc/varasm.c	2017-10-23 11:42:34.927720660 +0100
@@ -3065,6 +3065,10 @@ const_hash_1 (const tree exp)
       return (const_hash_1 (TREE_OPERAND (exp, 0)) * 9
 	      + const_hash_1 (TREE_OPERAND (exp, 1)));
 
+    case VEC_SERIES_CST:
+      return (const_hash_1 (VEC_SERIES_CST_BASE (exp)) * 11
+	      + const_hash_1 (VEC_SERIES_CST_STEP (exp)));
+
     CASE_CONVERT:
       return const_hash_1 (TREE_OPERAND (exp, 0)) * 7 + 2;
 
@@ -3165,6 +3169,12 @@ compare_constant (const tree t1, const t
       return compare_constant (VEC_DUPLICATE_CST_ELT (t1),
 			       VEC_DUPLICATE_CST_ELT (t2));
 
+    case VEC_SERIES_CST:
+      return (compare_constant (VEC_SERIES_CST_BASE (t1),
+				VEC_SERIES_CST_BASE (t2))
+	      && compare_constant (VEC_SERIES_CST_STEP (t1),
+				   VEC_SERIES_CST_STEP (t2)));
+
     case CONSTRUCTOR:
       {
 	vec<constructor_elt, va_gc> *v1, *v2;
Index: gcc/fold-const.c
===================================================================
--- gcc/fold-const.c	2017-10-23 11:41:51.765271511 +0100
+++ gcc/fold-const.c	2017-10-23 11:42:34.916720660 +0100
@@ -421,6 +421,10 @@ negate_expr_p (tree t)
     case VEC_DUPLICATE_CST:
       return negate_expr_p (VEC_DUPLICATE_CST_ELT (t));
 
+    case VEC_SERIES_CST:
+      return (negate_expr_p (VEC_SERIES_CST_BASE (t))
+	      && negate_expr_p (VEC_SERIES_CST_STEP (t)));
+
     case COMPLEX_EXPR:
       return negate_expr_p (TREE_OPERAND (t, 0))
 	     && negate_expr_p (TREE_OPERAND (t, 1));
@@ -590,6 +594,17 @@ fold_negate_expr_1 (location_t loc, tree
 	return build_vector_from_val (type, sub);
       }
 
+    case VEC_SERIES_CST:
+      {
+	tree neg_base = fold_negate_expr (loc, VEC_SERIES_CST_BASE (t));
+	if (!neg_base)
+	  return NULL_TREE;
+	tree neg_step = fold_negate_expr (loc, VEC_SERIES_CST_STEP (t));
+	if (!neg_step)
+	  return NULL_TREE;
+	return build_vec_series (type, neg_base, neg_step);
+      }
+
     case COMPLEX_EXPR:
       if (negate_expr_p (t))
 	return fold_build2_loc (loc, COMPLEX_EXPR, type,
@@ -1131,6 +1146,28 @@ int_const_binop (enum tree_code code, co
   return int_const_binop_1 (code, arg1, arg2, 1);
 }
 
+/* Return true if EXP is a VEC_DUPLICATE_CST or a VEC_SERIES_CST,
+   and if so express it as a linear series in *BASE_OUT and *STEP_OUT.
+   The step will be zero for VEC_DUPLICATE_CST.  */
+
+static bool
+vec_series_equivalent_p (const_tree exp, tree *base_out, tree *step_out)
+{
+  if (TREE_CODE (exp) == VEC_SERIES_CST)
+    {
+      *base_out = VEC_SERIES_CST_BASE (exp);
+      *step_out = VEC_SERIES_CST_STEP (exp);
+      return true;
+    }
+  if (TREE_CODE (exp) == VEC_DUPLICATE_CST)
+    {
+      *base_out = VEC_DUPLICATE_CST_ELT (exp);
+      *step_out = build_zero_cst (TREE_TYPE (*base_out));
+      return true;
+    }
+  return false;
+}
+
 /* Combine two constants ARG1 and ARG2 under operation CODE to produce a new
    constant.  We assume ARG1 and ARG2 have the same data type, or at least
    are the same kind of constant and the same machine mode.  Return zero if
@@ -1457,6 +1494,20 @@ const_binop (enum tree_code code, tree a
       return build_vector_from_val (TREE_TYPE (arg1), sub);
     }
 
+  tree base1, step1, base2, step2;
+  if ((code == PLUS_EXPR || code == MINUS_EXPR)
+      && vec_series_equivalent_p (arg1, &base1, &step1)
+      && vec_series_equivalent_p (arg2, &base2, &step2))
+    {
+      tree new_base = const_binop (code, base1, base2);
+      if (!new_base)
+	return NULL_TREE;
+      tree new_step = const_binop (code, step1, step2);
+      if (!new_step)
+	return NULL_TREE;
+      return build_vec_series (TREE_TYPE (arg1), new_base, new_step);
+    }
+
   /* Shifts allow a scalar offset for a vector.  */
   if (TREE_CODE (arg1) == VECTOR_CST
       && TREE_CODE (arg2) == INTEGER_CST)
@@ -1505,6 +1556,12 @@ const_binop (enum tree_code code, tree t
      result as argument put those cases that need it here.  */
   switch (code)
     {
+    case VEC_SERIES_EXPR:
+      if (CONSTANT_CLASS_P (arg1)
+	  && CONSTANT_CLASS_P (arg2))
+	return build_vec_series (type, arg1, arg2);
+      return NULL_TREE;
+
     case COMPLEX_EXPR:
       if ((TREE_CODE (arg1) == REAL_CST
 	   && TREE_CODE (arg2) == REAL_CST)
@@ -3008,6 +3065,12 @@ operand_equal_p (const_tree arg0, const_
 	return operand_equal_p (VEC_DUPLICATE_CST_ELT (arg0),
 				VEC_DUPLICATE_CST_ELT (arg1), flags);
 
+      case VEC_SERIES_CST:
+	return (operand_equal_p (VEC_SERIES_CST_BASE (arg0),
+				 VEC_SERIES_CST_BASE (arg1), flags)
+		&& operand_equal_p (VEC_SERIES_CST_STEP (arg0),
+				    VEC_SERIES_CST_STEP (arg1), flags));
+
       case COMPLEX_CST:
 	return (operand_equal_p (TREE_REALPART (arg0), TREE_REALPART (arg1),
 				 flags)
@@ -12050,6 +12113,10 @@ fold_checksum_tree (const_tree expr, str
 	case VEC_DUPLICATE_CST:
 	  fold_checksum_tree (VEC_DUPLICATE_CST_ELT (expr), ctx, ht);
 	  break;
+	case VEC_SERIES_CST:
+	  fold_checksum_tree (VEC_SERIES_CST_BASE (expr), ctx, ht);
+	  fold_checksum_tree (VEC_SERIES_CST_STEP (expr), ctx, ht);
+	  break;
 	default:
 	  break;
 	}
Index: gcc/expmed.c
===================================================================
--- gcc/expmed.c	2017-10-23 11:41:39.186050437 +0100
+++ gcc/expmed.c	2017-10-23 11:42:34.914720660 +0100
@@ -5253,6 +5253,13 @@ make_tree (tree type, rtx x)
 	    tree elt_tree = make_tree (TREE_TYPE (type), XEXP (op, 0));
 	    return build_vector_from_val (type, elt_tree);
 	  }
+	if (GET_CODE (op) == VEC_SERIES)
+	  {
+	    tree itype = TREE_TYPE (type);
+	    tree base_tree = make_tree (itype, XEXP (op, 0));
+	    tree step_tree = make_tree (itype, XEXP (op, 1));
+	    return build_vec_series (type, base_tree, step_tree);
+	  }
 	return make_tree (type, op);
       }
 
Index: gcc/gimple-pretty-print.c
===================================================================
--- gcc/gimple-pretty-print.c	2017-10-23 11:41:25.500318672 +0100
+++ gcc/gimple-pretty-print.c	2017-10-23 11:42:34.916720660 +0100
@@ -438,6 +438,7 @@ dump_binary_rhs (pretty_printer *buffer,
     case VEC_PACK_FIX_TRUNC_EXPR:
     case VEC_WIDEN_LSHIFT_HI_EXPR:
     case VEC_WIDEN_LSHIFT_LO_EXPR:
+    case VEC_SERIES_EXPR:
       for (p = get_tree_code_name (code); *p; p++)
 	pp_character (buffer, TOUPPER (*p));
       pp_string (buffer, " <");
Index: gcc/tree-inline.c
===================================================================
--- gcc/tree-inline.c	2017-10-23 11:41:51.771059237 +0100
+++ gcc/tree-inline.c	2017-10-23 11:42:34.921720660 +0100
@@ -4003,6 +4003,7 @@ estimate_operator_cost (enum tree_code c
     case VEC_WIDEN_LSHIFT_HI_EXPR:
     case VEC_WIDEN_LSHIFT_LO_EXPR:
     case VEC_DUPLICATE_EXPR:
+    case VEC_SERIES_EXPR:
 
       return 1;
 
Index: gcc/expr.c
===================================================================
--- gcc/expr.c	2017-10-23 11:41:51.764306890 +0100
+++ gcc/expr.c	2017-10-23 11:42:34.915720660 +0100
@@ -7704,7 +7704,7 @@ expand_operands (tree exp0, tree exp1, r
 
 \f
 /* Expand constant vector element ELT, which has mode MODE.  This is used
-   for members of VECTOR_CST and VEC_DUPLICATE_CST.  */
+   for members of VECTOR_CST, VEC_DUPLICATE_CST and VEC_SERIES_CST.  */
 
 static rtx
 const_vector_element (scalar_mode mode, const_tree elt)
@@ -9587,6 +9587,10 @@ #define REDUCE_BIT_FIELD(expr)	(reduce_b
       gcc_assert (target);
       return target;
 
+    case VEC_SERIES_EXPR:
+      expand_operands (treeop0, treeop1, NULL_RTX, &op0, &op1, modifier);
+      return expand_vec_series_expr (mode, op0, op1, target);
+
     case BIT_INSERT_EXPR:
       {
 	unsigned bitpos = tree_to_uhwi (treeop2);
@@ -10044,6 +10048,13 @@ expand_expr_real_1 (tree exp, rtx target
 				  VEC_DUPLICATE_CST_ELT (exp));
       return gen_const_vec_duplicate (mode, op0);
 
+    case VEC_SERIES_CST:
+      op0 = const_vector_element (GET_MODE_INNER (mode),
+				  VEC_SERIES_CST_BASE (exp));
+      op1 = const_vector_element (GET_MODE_INNER (mode),
+				  VEC_SERIES_CST_STEP (exp));
+      return gen_const_vec_series (mode, op0, op1);
+
     case CONST_DECL:
       if (modifier == EXPAND_WRITE)
 	{
Index: gcc/optabs.def
===================================================================
--- gcc/optabs.def	2017-10-23 11:41:51.769129995 +0100
+++ gcc/optabs.def	2017-10-23 11:42:34.919720660 +0100
@@ -366,3 +366,4 @@ OPTAB_D (get_thread_pointer_optab, "get_
 OPTAB_D (set_thread_pointer_optab, "set_thread_pointer$I$a")
 
 OPTAB_DC (vec_duplicate_optab, "vec_duplicate$a", VEC_DUPLICATE)
+OPTAB_DC (vec_series_optab, "vec_series$a", VEC_SERIES)
Index: gcc/optabs.h
===================================================================
--- gcc/optabs.h	2017-10-23 11:41:51.769129995 +0100
+++ gcc/optabs.h	2017-10-23 11:42:34.919720660 +0100
@@ -316,6 +316,9 @@ extern rtx expand_vec_cmp_expr (tree, tr
 /* Generate code for VEC_COND_EXPR.  */
 extern rtx expand_vec_cond_expr (tree, tree, tree, tree, rtx);
 
+/* Generate code for VEC_SERIES_EXPR.  */
+extern rtx expand_vec_series_expr (machine_mode, rtx, rtx, rtx);
+
 /* Generate code for MULT_HIGHPART_EXPR.  */
 extern rtx expand_mult_highpart (machine_mode, rtx, rtx, rtx, bool);
 
Index: gcc/optabs.c
===================================================================
--- gcc/optabs.c	2017-10-23 11:41:51.769129995 +0100
+++ gcc/optabs.c	2017-10-23 11:42:34.919720660 +0100
@@ -5693,6 +5693,27 @@ expand_vec_cond_expr (tree vec_cond_type
   return ops[0].value;
 }
 
+/* Generate VEC_SERIES_EXPR <OP0, OP1>, returning a value of mode VMODE.
+   Use TARGET for the result if nonnull and convenient.  */
+
+rtx
+expand_vec_series_expr (machine_mode vmode, rtx op0, rtx op1, rtx target)
+{
+  struct expand_operand ops[3];
+  enum insn_code icode;
+  machine_mode emode = GET_MODE_INNER (vmode);
+
+  icode = direct_optab_handler (vec_series_optab, vmode);
+  gcc_assert (icode != CODE_FOR_nothing);
+
+  create_output_operand (&ops[0], target, vmode);
+  create_input_operand (&ops[1], op0, emode);
+  create_input_operand (&ops[2], op1, emode);
+
+  expand_insn (icode, 3, ops);
+  return ops[0].value;
+}
+
 /* Generate insns for a vector comparison into a mask.  */
 
 rtx
Index: gcc/optabs-tree.c
===================================================================
--- gcc/optabs-tree.c	2017-10-23 11:41:51.768165374 +0100
+++ gcc/optabs-tree.c	2017-10-23 11:42:34.918720660 +0100
@@ -213,6 +213,9 @@ optab_for_tree_code (enum tree_code code
     case VEC_DUPLICATE_EXPR:
       return vec_duplicate_optab;
 
+    case VEC_SERIES_EXPR:
+      return vec_series_optab;
+
     default:
       break;
     }
Index: gcc/tree-cfg.c
===================================================================
--- gcc/tree-cfg.c	2017-10-23 11:41:51.770094616 +0100
+++ gcc/tree-cfg.c	2017-10-23 11:42:34.920720660 +0100
@@ -4119,6 +4119,23 @@ verify_gimple_assign_binary (gassign *st
       /* Continue with generic binary expression handling.  */
       break;
 
+    case VEC_SERIES_EXPR:
+      if (!useless_type_conversion_p (rhs1_type, rhs2_type))
+	{
+	  error ("type mismatch in series expression");
+	  debug_generic_expr (rhs1_type);
+	  debug_generic_expr (rhs2_type);
+	  return true;
+	}
+      if (TREE_CODE (lhs_type) != VECTOR_TYPE
+	  || !useless_type_conversion_p (TREE_TYPE (lhs_type), rhs1_type))
+	{
+	  error ("vector type expected in series expression");
+	  debug_generic_expr (lhs_type);
+	  return true;
+	}
+      return false;
+
     default:
       gcc_unreachable ();
     }
@@ -4485,6 +4502,7 @@ verify_gimple_assign_single (gassign *st
     case COMPLEX_CST:
     case VECTOR_CST:
     case VEC_DUPLICATE_CST:
+    case VEC_SERIES_CST:
     case STRING_CST:
       return res;
 
Index: gcc/tree-vect-generic.c
===================================================================
--- gcc/tree-vect-generic.c	2017-10-23 11:41:51.773953100 +0100
+++ gcc/tree-vect-generic.c	2017-10-23 11:42:34.922720660 +0100
@@ -1595,7 +1595,8 @@ expand_vector_operations_1 (gimple_stmt_
   if (rhs_class == GIMPLE_BINARY_RHS)
     rhs2 = gimple_assign_rhs2 (stmt);
 
-  if (TREE_CODE (type) != VECTOR_TYPE)
+  if (!VECTOR_TYPE_P (type)
+      || !VECTOR_TYPE_P (TREE_TYPE (rhs1)))
     return;
 
   /* If the vector operation is operating on all same vector elements

^ permalink raw reply	[flat|nested] 90+ messages in thread

* [07/nn] Add unique CONSTs
  2017-10-23 11:16 [00/nn] Patches preparing for runtime offsets and sizes Richard Sandiford
                   ` (4 preceding siblings ...)
  2017-10-23 11:21 ` [05/nn] Add VEC_DUPLICATE_{CST,EXPR} and associated optab Richard Sandiford
@ 2017-10-23 11:22 ` Richard Sandiford
  2017-10-27 15:51   ` Jeff Law
  2017-10-23 11:22 ` [06/nn] Add VEC_SERIES_{CST,EXPR} and associated optab Richard Sandiford
                   ` (15 subsequent siblings)
  21 siblings, 1 reply; 90+ messages in thread
From: Richard Sandiford @ 2017-10-23 11:22 UTC (permalink / raw)
  To: gcc-patches

This patch adds a way of treating certain kinds of CONST as unique,
so that pointer equality is equivalent to value equality.  For now it
is restricted to VEC_DUPLICATE and VEC_SERIES, although the code to
generate them remains in the else arm of an "if (1)" until a later
patch.

This is needed so that (const (vec_duplicate xx)) can be used as the
CONSTxx_RTX of a variable-length vector.
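
Once that "if (1)" is flipped, the intended invariant is that two
broadcasts of the same value share a single rtx.  A sketch only, using
a made-up variable-length mode VNx4SImode that doesn't exist yet:

    rtx a = gen_const_vec_duplicate (VNx4SImode, const1_rtx);
    rtx b = gen_const_vec_duplicate (VNx4SImode, const1_rtx);
    /* Both calls hash to the same (const (vec_duplicate ...)),
       so pointer equality is enough.  */
    gcc_checking_assert (a == b);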


2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* rtl.h (unique_const_p): New function.
	(gen_rtx_CONST): Declare.
	* emit-rtl.c (const_hasher): New struct.
	(const_htab): New variable.
	(init_emit_once): Initialize it.
	(const_hasher::hash, const_hasher::equal): New functions.
	(gen_rtx_CONST): New function.
	(spare_vec_duplicate, spare_vec_series): New variables.
	(gen_const_vec_duplicate_1): Add code to use (const (vec_duplicate)),
	but disable it for now.
	(gen_const_vec_series): Likewise for (const (vec_series)).
	* gengenrtl.c (special_rtx): Return true for CONST.
	* rtl.c (shared_const_p): Return true if unique_const_p.

Index: gcc/rtl.h
===================================================================
--- gcc/rtl.h	2017-10-23 11:41:41.549050496 +0100
+++ gcc/rtl.h	2017-10-23 11:42:47.297720974 +0100
@@ -2861,6 +2861,23 @@ vec_series_p (const_rtx x, rtx *base_out
   return const_vec_series_p (x, base_out, step_out);
 }
 
+/* Return true if there should only ever be one instance of (const X),
+   so that constants of this type can be compared using pointer equality.  */
+
+inline bool
+unique_const_p (const_rtx x)
+{
+  switch (GET_CODE (x))
+    {
+    case VEC_DUPLICATE:
+    case VEC_SERIES:
+      return true;
+
+    default:
+      return false;
+    }
+}
+
 /* Return the unpromoted (outer) mode of SUBREG_PROMOTED_VAR_P subreg X.  */
 
 inline scalar_int_mode
@@ -3542,6 +3559,7 @@ extern rtx_insn_list *gen_rtx_INSN_LIST
 gen_rtx_INSN (machine_mode mode, rtx_insn *prev_insn, rtx_insn *next_insn,
 	      basic_block bb, rtx pattern, int location, int code,
 	      rtx reg_notes);
+extern rtx gen_rtx_CONST (machine_mode, rtx);
 extern rtx gen_rtx_CONST_INT (machine_mode, HOST_WIDE_INT);
 extern rtx gen_rtx_CONST_VECTOR (machine_mode, rtvec);
 extern void set_mode_and_regno (rtx, machine_mode, unsigned int);
Index: gcc/emit-rtl.c
===================================================================
--- gcc/emit-rtl.c	2017-10-23 11:41:41.548050496 +0100
+++ gcc/emit-rtl.c	2017-10-23 11:42:47.296720974 +0100
@@ -175,6 +175,15 @@ struct const_fixed_hasher : ggc_cache_pt
 
 static GTY ((cache)) hash_table<const_fixed_hasher> *const_fixed_htab;
 
+/* A hash table storing unique CONSTs.  */
+struct const_hasher : ggc_cache_ptr_hash<rtx_def>
+{
+  static hashval_t hash (rtx x);
+  static bool equal (rtx x, rtx y);
+};
+
+static GTY ((cache)) hash_table<const_hasher> *const_htab;
+
 #define cur_insn_uid (crtl->emit.x_cur_insn_uid)
 #define cur_debug_insn_uid (crtl->emit.x_cur_debug_insn_uid)
 #define first_label_num (crtl->emit.x_first_label_num)
@@ -310,6 +319,28 @@ const_fixed_hasher::equal (rtx x, rtx y)
   return fixed_identical (CONST_FIXED_VALUE (a), CONST_FIXED_VALUE (b));
 }
 
+/* Returns a hash code for X (which is either an existing unique CONST
+   or an operand to gen_rtx_CONST).  */
+
+hashval_t
+const_hasher::hash (rtx x)
+{
+  if (GET_CODE (x) == CONST)
+    x = XEXP (x, 0);
+
+  int do_not_record_p = 0;
+  return hash_rtx (x, GET_MODE (x), &do_not_record_p, NULL, false);
+}
+
+/* Returns true if the operand of unique CONST X is equal to Y.  */
+
+bool
+const_hasher::equal (rtx x, rtx y)
+{
+  gcc_checking_assert (GET_CODE (x) == CONST);
+  return rtx_equal_p (XEXP (x, 0), y);
+}
+
 /* Return true if the given memory attributes are equal.  */
 
 bool
@@ -5756,16 +5787,55 @@ init_emit (void)
 #endif
 }
 
+rtx
+gen_rtx_CONST (machine_mode mode, rtx val)
+{
+  if (unique_const_p (val))
+    {
+      /* Look up the CONST in the hash table.  */
+      rtx *slot = const_htab->find_slot (val, INSERT);
+      if (*slot == 0)
+	*slot = gen_rtx_raw_CONST (mode, val);
+      return *slot;
+    }
+
+  return gen_rtx_raw_CONST (mode, val);
+}
+
+/* Temporary rtx used by gen_const_vec_duplicate_1.  */
+static GTY((deletable)) rtx spare_vec_duplicate;
+
 /* Like gen_const_vec_duplicate, but ignore const_tiny_rtx.  */
 
 static rtx
 gen_const_vec_duplicate_1 (machine_mode mode, rtx el)
 {
   int nunits = GET_MODE_NUNITS (mode);
-  rtvec v = rtvec_alloc (nunits);
-  for (int i = 0; i < nunits; ++i)
-    RTVEC_ELT (v, i) = el;
-  return gen_rtx_raw_CONST_VECTOR (mode, v);
+  if (1)
+    {
+      rtvec v = rtvec_alloc (nunits);
+
+      for (int i = 0; i < nunits; ++i)
+	RTVEC_ELT (v, i) = el;
+
+      return gen_rtx_raw_CONST_VECTOR (mode, v);
+    }
+  else
+    {
+      if (spare_vec_duplicate)
+	{
+	  PUT_MODE (spare_vec_duplicate, mode);
+	  XEXP (spare_vec_duplicate, 0) = el;
+	}
+      else
+	spare_vec_duplicate = gen_rtx_VEC_DUPLICATE (mode, el);
+
+      rtx res = gen_rtx_CONST (mode, spare_vec_duplicate);
+      if (XEXP (res, 0) == spare_vec_duplicate)
+	spare_vec_duplicate = NULL_RTX;
+
+      return res;
+    }
 }
 
 /* Generate a vector constant of mode MODE in which every element has
@@ -5827,6 +5897,9 @@ const_vec_series_p_1 (const_rtx x, rtx *
   return true;
 }
 
+/* Temporary rtx used by gen_const_vec_series.  */
+static GTY((deletable)) rtx spare_vec_series;
+
 /* Generate a vector constant of mode MODE in which element I has
    the value BASE + I * STEP.  */
 
@@ -5836,13 +5909,33 @@ gen_const_vec_series (machine_mode mode,
   gcc_assert (CONSTANT_P (base) && CONSTANT_P (step));
 
   int nunits = GET_MODE_NUNITS (mode);
-  rtvec v = rtvec_alloc (nunits);
-  scalar_mode inner_mode = GET_MODE_INNER (mode);
-  RTVEC_ELT (v, 0) = base;
-  for (int i = 1; i < nunits; ++i)
-    RTVEC_ELT (v, i) = simplify_gen_binary (PLUS, inner_mode,
-					    RTVEC_ELT (v, i - 1), step);
-  return gen_rtx_raw_CONST_VECTOR (mode, v);
+  if (1)
+    {
+      rtvec v = rtvec_alloc (nunits);
+      scalar_mode inner_mode = GET_MODE_INNER (mode);
+      RTVEC_ELT (v, 0) = base;
+      for (int i = 1; i < nunits; ++i)
+	RTVEC_ELT (v, i) = simplify_gen_binary (PLUS, inner_mode,
+						RTVEC_ELT (v, i - 1), step);
+      return gen_rtx_raw_CONST_VECTOR (mode, v);
+    }
+  else
+    {
+      if (spare_vec_series)
+	{
+	  PUT_MODE (spare_vec_series, mode);
+	  XEXP (spare_vec_series, 0) = base;
+	  XEXP (spare_vec_series, 1) = step;
+	}
+      else
+	spare_vec_series = gen_rtx_VEC_SERIES (mode, base, step);
+
+      rtx res = gen_rtx_CONST (mode, spare_vec_series);
+      if (XEXP (res, 0) == spare_vec_series)
+	spare_vec_series = NULL_RTX;
+
+      return res;
+    }
 }
 
 /* Generate a vector of mode MODE in which element I has the value
@@ -6000,6 +6093,8 @@ init_emit_once (void)
 
   reg_attrs_htab = hash_table<reg_attr_hasher>::create_ggc (37);
 
+  const_htab = hash_table<const_hasher>::create_ggc (37);
+
 #ifdef INIT_EXPANDERS
   /* This is to initialize {init|mark|free}_machine_status before the first
      call to push_function_context_to.  This is needed by the Chill front
Index: gcc/gengenrtl.c
===================================================================
--- gcc/gengenrtl.c	2017-08-03 10:40:53.029491180 +0100
+++ gcc/gengenrtl.c	2017-10-23 11:42:47.297720974 +0100
@@ -143,7 +143,8 @@ special_rtx (int idx)
 	  || strcmp (defs[idx].enumname, "CC0") == 0
 	  || strcmp (defs[idx].enumname, "RETURN") == 0
 	  || strcmp (defs[idx].enumname, "SIMPLE_RETURN") == 0
-	  || strcmp (defs[idx].enumname, "CONST_VECTOR") == 0);
+	  || strcmp (defs[idx].enumname, "CONST_VECTOR") == 0
+	  || strcmp (defs[idx].enumname, "CONST") == 0);
 }
 
 /* Return nonzero if the RTL code given by index IDX is one that we should
Index: gcc/rtl.c
===================================================================
--- gcc/rtl.c	2017-08-03 10:40:55.646123304 +0100
+++ gcc/rtl.c	2017-10-23 11:42:47.297720974 +0100
@@ -252,6 +252,9 @@ shared_const_p (const_rtx orig)
 {
   gcc_assert (GET_CODE (orig) == CONST);
 
+  if (unique_const_p (XEXP (orig, 0)))
+    return true;
+
   /* CONST can be shared if it contains a SYMBOL_REF.  If it contains
      a LABEL_REF, it isn't sharable.  */
   return (GET_CODE (XEXP (orig, 0)) == PLUS

^ permalink raw reply	[flat|nested] 90+ messages in thread

* [08/nn] Add a fixed_size_mode class
  2017-10-23 11:16 [00/nn] Patches preparing for runtime offsets and sizes Richard Sandiford
                   ` (6 preceding siblings ...)
  2017-10-23 11:22 ` [06/nn] Add VEC_SERIES_{CST,EXPR} and associated optab Richard Sandiford
@ 2017-10-23 11:22 ` Richard Sandiford
  2017-10-26 11:57   ` Richard Biener
  2017-10-23 11:23 ` [09/nn] Add a fixed_size_mode_pod class Richard Sandiford
                   ` (13 subsequent siblings)
  21 siblings, 1 reply; 90+ messages in thread
From: Richard Sandiford @ 2017-10-23 11:22 UTC (permalink / raw)
  To: gcc-patches

This patch adds a fixed_size_mode machine_mode wrapper
for modes that are known to have a fixed size.  That applies
to all current modes, but future patches will add support for
variable-sized modes.

The use of this class should be pretty restricted.  One important
use case is to hold the mode of static data, which can never be
variable-sized with current file formats.  Another is to hold
the modes of registers involved in __builtin_apply and
__builtin_result, since those interfaces don't cope well with
variable-sized data.

The class can also be useful when reinterpreting the contents of
a fixed-length bit string as a different kind of value.
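
The expected way of using it is the usual is_a/as_a machinery, which
dispatches through fixed_size_mode::includes_p (always true until
variable-sized modes exist).  A sketch, where process_fixed_size is a
made-up consumer:

    fixed_size_mode fmode;
    if (is_a <fixed_size_mode> (mode, &fmode))
      /* GET_MODE_SIZE (fmode) is known to be a fixed number
	 of bytes.  */
      process_fixed_size (fmode, GET_MODE_SIZE (fmode));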


2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* machmode.h (fixed_size_mode): New class.
	* rtl.h (get_pool_mode): Return fixed_size_mode.
	* gengtype.c (main): Add fixed_size_mode.
	* target.def (get_raw_result_mode): Return a fixed_size_mode.
	(get_raw_arg_mode): Likewise.
	* doc/tm.texi: Regenerate.
	* targhooks.h (default_get_reg_raw_mode): Return a fixed_size_mode.
	* targhooks.c (default_get_reg_raw_mode): Likewise.
	* config/ia64/ia64.c (ia64_get_reg_raw_mode): Likewise.
	* config/mips/mips.c (mips_get_reg_raw_mode): Likewise.
	* config/msp430/msp430.c (msp430_get_raw_arg_mode): Likewise.
	(msp430_get_raw_result_mode): Likewise.
	* config/avr/avr-protos.h (regmask): Use as_a <fixed_size_mode>.
	* dbxout.c (dbxout_parms): Require fixed-size modes.
	* expr.c (copy_blkmode_from_reg, copy_blkmode_to_reg): Likewise.
	* gimple-ssa-store-merging.c (encode_tree_to_bitpos): Likewise.
	* omp-low.c (lower_oacc_reductions): Likewise.
	* simplify-rtx.c (simplify_immed_subreg): Take fixed_size_modes.
	(simplify_subreg): Update accordingly.
	* varasm.c (constant_descriptor_rtx::mode): Change to fixed_size_mode.
	(force_const_mem): Update accordingly.  Return NULL_RTX for modes
	that aren't fixed-size.
	(get_pool_mode): Return a fixed_size_mode.
	(output_constant_pool_2): Take a fixed_size_mode.

Index: gcc/machmode.h
===================================================================
--- gcc/machmode.h	2017-09-15 14:47:33.184331588 +0100
+++ gcc/machmode.h	2017-10-23 11:42:52.014721093 +0100
@@ -652,6 +652,39 @@ GET_MODE_2XWIDER_MODE (const T &m)
 extern const unsigned char mode_complex[NUM_MACHINE_MODES];
 #define GET_MODE_COMPLEX_MODE(MODE) ((machine_mode) mode_complex[MODE])
 
+/* Represents a machine mode that must have a fixed size.  The main
+   use of this class is to represent the modes of objects that always
+   have static storage duration, such as constant pool entries.
+   (No current target supports the concept of variable-size static data.)  */
+class fixed_size_mode
+{
+public:
+  typedef mode_traits<fixed_size_mode>::from_int from_int;
+
+  ALWAYS_INLINE fixed_size_mode () {}
+  ALWAYS_INLINE fixed_size_mode (from_int m) : m_mode (machine_mode (m)) {}
+  ALWAYS_INLINE fixed_size_mode (const scalar_mode &m) : m_mode (m) {}
+  ALWAYS_INLINE fixed_size_mode (const scalar_int_mode &m) : m_mode (m) {}
+  ALWAYS_INLINE fixed_size_mode (const scalar_float_mode &m) : m_mode (m) {}
+  ALWAYS_INLINE fixed_size_mode (const scalar_mode_pod &m) : m_mode (m) {}
+  ALWAYS_INLINE fixed_size_mode (const scalar_int_mode_pod &m) : m_mode (m) {}
+  ALWAYS_INLINE fixed_size_mode (const complex_mode &m) : m_mode (m) {}
+  ALWAYS_INLINE operator machine_mode () const { return m_mode; }
+
+  static bool includes_p (machine_mode);
+
+protected:
+  machine_mode m_mode;
+};
+
+/* Return true if MODE has a fixed size.  */
+
+inline bool
+fixed_size_mode::includes_p (machine_mode)
+{
+  return true;
+}
+
 extern opt_machine_mode mode_for_size (unsigned int, enum mode_class, int);
 
 /* Return the machine mode to use for a MODE_INT of SIZE bits, if one
Index: gcc/rtl.h
===================================================================
--- gcc/rtl.h	2017-10-23 11:42:47.297720974 +0100
+++ gcc/rtl.h	2017-10-23 11:42:52.015721094 +0100
@@ -3020,7 +3020,7 @@ extern rtx force_const_mem (machine_mode
 struct function;
 extern rtx get_pool_constant (const_rtx);
 extern rtx get_pool_constant_mark (rtx, bool *);
-extern machine_mode get_pool_mode (const_rtx);
+extern fixed_size_mode get_pool_mode (const_rtx);
 extern rtx simplify_subtraction (rtx);
 extern void decide_function_section (tree);
 
Index: gcc/gengtype.c
===================================================================
--- gcc/gengtype.c	2017-05-23 19:29:56.919436344 +0100
+++ gcc/gengtype.c	2017-10-23 11:42:52.014721093 +0100
@@ -5197,6 +5197,7 @@ #define POS_HERE(Call) do { pos.file = t
       POS_HERE (do_scalar_typedef ("JCF_u2", &pos));
       POS_HERE (do_scalar_typedef ("void", &pos));
       POS_HERE (do_scalar_typedef ("machine_mode", &pos));
+      POS_HERE (do_scalar_typedef ("fixed_size_mode", &pos));
       POS_HERE (do_typedef ("PTR", 
 			    create_pointer (resolve_typedef ("void", &pos)),
 			    &pos));
Index: gcc/target.def
===================================================================
--- gcc/target.def	2017-10-23 11:41:23.134456913 +0100
+++ gcc/target.def	2017-10-23 11:42:52.017721094 +0100
@@ -5021,7 +5021,7 @@ DEFHOOK
  "This target hook returns the mode to be used when accessing raw return\
  registers in @code{__builtin_return}.  Define this macro if the value\
  in @var{reg_raw_mode} is not correct.",
- machine_mode, (int regno),
+ fixed_size_mode, (int regno),
  default_get_reg_raw_mode)
 
 /* Return a mode wide enough to copy any argument value that might be
@@ -5031,7 +5031,7 @@ DEFHOOK
  "This target hook returns the mode to be used when accessing raw argument\
  registers in @code{__builtin_apply_args}.  Define this macro if the value\
  in @var{reg_raw_mode} is not correct.",
- machine_mode, (int regno),
+ fixed_size_mode, (int regno),
  default_get_reg_raw_mode)
 
 HOOK_VECTOR_END (calls)
Index: gcc/doc/tm.texi
===================================================================
--- gcc/doc/tm.texi	2017-10-23 11:41:22.175925023 +0100
+++ gcc/doc/tm.texi	2017-10-23 11:42:52.012721093 +0100
@@ -4536,11 +4536,11 @@ This macro has effect in @option{-fpcc-s
 nothing when you use @option{-freg-struct-return} mode.
 @end defmac
 
-@deftypefn {Target Hook} machine_mode TARGET_GET_RAW_RESULT_MODE (int @var{regno})
+@deftypefn {Target Hook} fixed_size_mode TARGET_GET_RAW_RESULT_MODE (int @var{regno})
 This target hook returns the mode to be used when accessing raw return registers in @code{__builtin_return}.  Define this macro if the value in @var{reg_raw_mode} is not correct.
 @end deftypefn
 
-@deftypefn {Target Hook} machine_mode TARGET_GET_RAW_ARG_MODE (int @var{regno})
+@deftypefn {Target Hook} fixed_size_mode TARGET_GET_RAW_ARG_MODE (int @var{regno})
 This target hook returns the mode to be used when accessing raw argument registers in @code{__builtin_apply_args}.  Define this macro if the value in @var{reg_raw_mode} is not correct.
 @end deftypefn
 
Index: gcc/targhooks.h
===================================================================
--- gcc/targhooks.h	2017-10-02 09:08:43.318933786 +0100
+++ gcc/targhooks.h	2017-10-23 11:42:52.017721094 +0100
@@ -233,7 +233,7 @@ extern int default_jump_align_max_skip (
 extern section * default_function_section(tree decl, enum node_frequency freq,
 					  bool startup, bool exit);
 extern machine_mode default_dwarf_frame_reg_mode (int);
-extern machine_mode default_get_reg_raw_mode (int);
+extern fixed_size_mode default_get_reg_raw_mode (int);
 extern bool default_keep_leaf_when_profiled ();
 
 extern void *default_get_pch_validity (size_t *);
Index: gcc/targhooks.c
===================================================================
--- gcc/targhooks.c	2017-10-23 11:41:23.195392846 +0100
+++ gcc/targhooks.c	2017-10-23 11:42:52.017721094 +0100
@@ -1834,10 +1834,12 @@ default_dwarf_frame_reg_mode (int regno)
 /* To be used by targets where reg_raw_mode doesn't return the right
    mode for registers used in apply_builtin_return and apply_builtin_arg.  */
 
-machine_mode
+fixed_size_mode
 default_get_reg_raw_mode (int regno)
 {
-  return reg_raw_mode[regno];
+  /* Targets must override this hook if the underlying register is
+     variable-sized.  */
+  return as_a <fixed_size_mode> (reg_raw_mode[regno]);
 }
 
 /* Return true if a leaf function should stay leaf even with profiling
Index: gcc/config/ia64/ia64.c
===================================================================
--- gcc/config/ia64/ia64.c	2017-10-23 11:41:32.363050263 +0100
+++ gcc/config/ia64/ia64.c	2017-10-23 11:42:52.009721093 +0100
@@ -329,7 +329,7 @@ static tree ia64_fold_builtin (tree, int
 static tree ia64_builtin_decl (unsigned, bool);
 
 static reg_class_t ia64_preferred_reload_class (rtx, reg_class_t);
-static machine_mode ia64_get_reg_raw_mode (int regno);
+static fixed_size_mode ia64_get_reg_raw_mode (int regno);
 static section * ia64_hpux_function_section (tree, enum node_frequency,
 					     bool, bool);
 
@@ -11328,7 +11328,7 @@ ia64_dconst_0_375 (void)
   return ia64_dconst_0_375_rtx;
 }
 
-static machine_mode
+static fixed_size_mode
 ia64_get_reg_raw_mode (int regno)
 {
   if (FR_REGNO_P (regno))
Index: gcc/config/mips/mips.c
===================================================================
--- gcc/config/mips/mips.c	2017-10-23 11:41:32.365050264 +0100
+++ gcc/config/mips/mips.c	2017-10-23 11:42:52.010721093 +0100
@@ -1132,7 +1132,6 @@ static rtx mips_find_pic_call_symbol (rt
 static int mips_register_move_cost (machine_mode, reg_class_t,
 				    reg_class_t);
 static unsigned int mips_function_arg_boundary (machine_mode, const_tree);
-static machine_mode mips_get_reg_raw_mode (int regno);
 static rtx mips_gen_const_int_vector_shuffle (machine_mode, int);
 \f
 /* This hash table keeps track of implicit "mips16" and "nomips16" attributes
@@ -6111,7 +6110,7 @@ mips_function_arg_boundary (machine_mode
 
 /* Implement TARGET_GET_RAW_RESULT_MODE and TARGET_GET_RAW_ARG_MODE.  */
 
-static machine_mode
+static fixed_size_mode
 mips_get_reg_raw_mode (int regno)
 {
   if (TARGET_FLOATXX && FP_REG_P (regno))
Index: gcc/config/msp430/msp430.c
===================================================================
--- gcc/config/msp430/msp430.c	2017-10-23 11:41:23.047405581 +0100
+++ gcc/config/msp430/msp430.c	2017-10-23 11:42:52.011721093 +0100
@@ -1398,16 +1398,17 @@ msp430_return_in_memory (const_tree ret_
 #undef  TARGET_GET_RAW_ARG_MODE
 #define TARGET_GET_RAW_ARG_MODE msp430_get_raw_arg_mode
 
-static machine_mode
+static fixed_size_mode
 msp430_get_raw_arg_mode (int regno)
 {
-  return (regno == ARG_POINTER_REGNUM) ? VOIDmode : Pmode;
+  return as_a <fixed_size_mode> (regno == ARG_POINTER_REGNUM
+				 ? VOIDmode : Pmode);
 }
 
 #undef  TARGET_GET_RAW_RESULT_MODE
 #define TARGET_GET_RAW_RESULT_MODE msp430_get_raw_result_mode
 
-static machine_mode
+static fixed_size_mode
 msp430_get_raw_result_mode (int regno ATTRIBUTE_UNUSED)
 {
   return Pmode;
Index: gcc/config/avr/avr-protos.h
===================================================================
--- gcc/config/avr/avr-protos.h	2017-10-23 11:41:22.812366984 +0100
+++ gcc/config/avr/avr-protos.h	2017-10-23 11:42:52.007721093 +0100
@@ -132,7 +132,7 @@ extern bool avr_casei_sequence_check_ope
 static inline unsigned
 regmask (machine_mode mode, unsigned regno)
 {
-  return ((1u << GET_MODE_SIZE (mode)) - 1) << regno;
+  return ((1u << GET_MODE_SIZE (as_a <fixed_size_mode> (mode))) - 1) << regno;
 }
 
 extern void avr_fix_inputs (rtx*, unsigned, unsigned);
Index: gcc/dbxout.c
===================================================================
--- gcc/dbxout.c	2017-10-10 17:55:22.088175460 +0100
+++ gcc/dbxout.c	2017-10-23 11:42:52.011721093 +0100
@@ -3393,12 +3393,16 @@ dbxout_parms (tree parms)
 {
   ++debug_nesting;
   emit_pending_bincls_if_required ();
+  fixed_size_mode rtl_mode, type_mode;
 
   for (; parms; parms = DECL_CHAIN (parms))
     if (DECL_NAME (parms)
 	&& TREE_TYPE (parms) != error_mark_node
 	&& DECL_RTL_SET_P (parms)
-	&& DECL_INCOMING_RTL (parms))
+	&& DECL_INCOMING_RTL (parms)
+	/* We can't represent variable-sized types in this format.  */
+	&& is_a <fixed_size_mode> (TYPE_MODE (TREE_TYPE (parms)), &type_mode)
+	&& is_a <fixed_size_mode> (GET_MODE (DECL_RTL (parms)), &rtl_mode))
       {
 	tree eff_type;
 	char letter;
@@ -3555,10 +3559,9 @@ dbxout_parms (tree parms)
 	    /* Make a big endian correction if the mode of the type of the
 	       parameter is not the same as the mode of the rtl.  */
 	    if (BYTES_BIG_ENDIAN
-		&& TYPE_MODE (TREE_TYPE (parms)) != GET_MODE (DECL_RTL (parms))
-		&& GET_MODE_SIZE (TYPE_MODE (TREE_TYPE (parms))) < UNITS_PER_WORD)
-	      number += (GET_MODE_SIZE (GET_MODE (DECL_RTL (parms)))
-			 - GET_MODE_SIZE (TYPE_MODE (TREE_TYPE (parms))));
+		&& type_mode != rtl_mode
+		&& GET_MODE_SIZE (type_mode) < UNITS_PER_WORD)
+	      number += GET_MODE_SIZE (rtl_mode) - GET_MODE_SIZE (type_mode);
 	  }
 	else
 	  /* ??? We don't know how to represent this argument.  */
Index: gcc/expr.c
===================================================================
--- gcc/expr.c	2017-10-23 11:42:34.915720660 +0100
+++ gcc/expr.c	2017-10-23 11:42:52.013721093 +0100
@@ -2628,9 +2628,10 @@ copy_blkmode_from_reg (rtx target, rtx s
   rtx src = NULL, dst = NULL;
   unsigned HOST_WIDE_INT bitsize = MIN (TYPE_ALIGN (type), BITS_PER_WORD);
   unsigned HOST_WIDE_INT bitpos, xbitpos, padding_correction = 0;
-  machine_mode mode = GET_MODE (srcreg);
-  machine_mode tmode = GET_MODE (target);
-  machine_mode copy_mode;
+  /* No current ABI uses variable-sized modes to pass a BLKmode type.  */
+  fixed_size_mode mode = as_a <fixed_size_mode> (GET_MODE (srcreg));
+  fixed_size_mode tmode = as_a <fixed_size_mode> (GET_MODE (target));
+  fixed_size_mode copy_mode;
 
   /* BLKmode registers created in the back-end shouldn't have survived.  */
   gcc_assert (mode != BLKmode);
@@ -2728,19 +2729,21 @@ copy_blkmode_from_reg (rtx target, rtx s
     }
 }
 
-/* Copy BLKmode value SRC into a register of mode MODE.  Return the
+/* Copy BLKmode value SRC into a register of mode MODE_IN.  Return the
    register if it contains any data, otherwise return null.
 
    This is used on targets that return BLKmode values in registers.  */
 
 rtx
-copy_blkmode_to_reg (machine_mode mode, tree src)
+copy_blkmode_to_reg (machine_mode mode_in, tree src)
 {
   int i, n_regs;
   unsigned HOST_WIDE_INT bitpos, xbitpos, padding_correction = 0, bytes;
   unsigned int bitsize;
   rtx *dst_words, dst, x, src_word = NULL_RTX, dst_word = NULL_RTX;
-  machine_mode dst_mode;
+  /* No current ABI uses variable-sized modes to pass a BLKmode type.  */
+  fixed_size_mode mode = as_a <fixed_size_mode> (mode_in);
+  fixed_size_mode dst_mode;
 
   gcc_assert (TYPE_MODE (TREE_TYPE (src)) == BLKmode);
 
Index: gcc/gimple-ssa-store-merging.c
===================================================================
--- gcc/gimple-ssa-store-merging.c	2017-10-09 11:50:52.446411111 +0100
+++ gcc/gimple-ssa-store-merging.c	2017-10-23 11:42:52.014721093 +0100
@@ -401,8 +401,11 @@ encode_tree_to_bitpos (tree expr, unsign
     The awkwardness comes from the fact that bitpos is counted from the
     most significant bit of a byte.  */
 
+  /* We must be dealing with fixed-size data at this point, since the
+     total size is also fixed.  */
+  fixed_size_mode mode = as_a <fixed_size_mode> (TYPE_MODE (TREE_TYPE (expr)));
   /* Allocate an extra byte so that we have space to shift into.  */
-  unsigned int byte_size = GET_MODE_SIZE (TYPE_MODE (TREE_TYPE (expr))) + 1;
+  unsigned int byte_size = GET_MODE_SIZE (mode) + 1;
   unsigned char *tmpbuf = XALLOCAVEC (unsigned char, byte_size);
   memset (tmpbuf, '\0', byte_size);
   /* The store detection code should only have allowed constants that are
Index: gcc/omp-low.c
===================================================================
--- gcc/omp-low.c	2017-10-10 17:55:22.100175459 +0100
+++ gcc/omp-low.c	2017-10-23 11:42:52.015721094 +0100
@@ -5067,8 +5067,10 @@ lower_oacc_reductions (location_t loc, t
 	  v1 = v2 = v3 = var;
 
 	/* Determine position in reduction buffer, which may be used
-	   by target.  */
-	machine_mode mode = TYPE_MODE (TREE_TYPE (var));
+	   by target.  The parser has ensured that this is not a
+	   variable-sized type.  */
+	fixed_size_mode mode
+	  = as_a <fixed_size_mode> (TYPE_MODE (TREE_TYPE (var)));
 	unsigned align = GET_MODE_ALIGNMENT (mode) /  BITS_PER_UNIT;
 	offset = (offset + align - 1) & ~(align - 1);
 	tree off = build_int_cst (sizetype, offset);
Index: gcc/simplify-rtx.c
===================================================================
--- gcc/simplify-rtx.c	2017-10-23 11:41:41.550050496 +0100
+++ gcc/simplify-rtx.c	2017-10-23 11:42:52.016721094 +0100
@@ -48,8 +48,6 @@ #define HWI_SIGN_EXTEND(low) \
 static rtx neg_const_int (machine_mode, const_rtx);
 static bool plus_minus_operand_p (const_rtx);
 static rtx simplify_plus_minus (enum rtx_code, machine_mode, rtx, rtx);
-static rtx simplify_immed_subreg (machine_mode, rtx, machine_mode,
-				  unsigned int);
 static rtx simplify_associative_operation (enum rtx_code, machine_mode,
 					   rtx, rtx);
 static rtx simplify_relational_operation_1 (enum rtx_code, machine_mode,
@@ -5802,8 +5800,8 @@ simplify_ternary_operation (enum rtx_cod
    and then repacking them again for OUTERMODE.  */
 
 static rtx
-simplify_immed_subreg (machine_mode outermode, rtx op,
-		       machine_mode innermode, unsigned int byte)
+simplify_immed_subreg (fixed_size_mode outermode, rtx op,
+		       fixed_size_mode innermode, unsigned int byte)
 {
   enum {
     value_bit = 8,
@@ -6171,7 +6169,18 @@ simplify_subreg (machine_mode outermode,
       || CONST_DOUBLE_AS_FLOAT_P (op)
       || GET_CODE (op) == CONST_FIXED
       || GET_CODE (op) == CONST_VECTOR)
-    return simplify_immed_subreg (outermode, op, innermode, byte);
+    {
+      /* simplify_immed_subreg deconstructs OP into bytes and constructs
+	 the result from bytes, so it only works if the sizes of the modes
+	 are known at compile time.  Cases that apply to general modes
+	 should be handled here before calling simplify_immed_subreg.  */
+      fixed_size_mode fs_outermode, fs_innermode;
+      if (is_a <fixed_size_mode> (outermode, &fs_outermode)
+	  && is_a <fixed_size_mode> (innermode, &fs_innermode))
+	return simplify_immed_subreg (fs_outermode, op, fs_innermode, byte);
+
+      return NULL_RTX;
+    }
 
   /* Changing mode twice with SUBREG => just change it once,
      or not at all if changing back op starting mode.  */
Index: gcc/varasm.c
===================================================================
--- gcc/varasm.c	2017-10-23 11:42:34.927720660 +0100
+++ gcc/varasm.c	2017-10-23 11:42:52.018721094 +0100
@@ -3584,7 +3584,7 @@ struct GTY((chain_next ("%h.next"), for_
   rtx constant;
   HOST_WIDE_INT offset;
   hashval_t hash;
-  machine_mode mode;
+  fixed_size_mode mode;
   unsigned int align;
   int labelno;
   int mark;
@@ -3760,10 +3760,11 @@ simplify_subtraction (rtx x)
 }
 \f
 /* Given a constant rtx X, make (or find) a memory constant for its value
-   and return a MEM rtx to refer to it in memory.  */
+   and return a MEM rtx to refer to it in memory.  IN_MODE is the mode
+   of X.  */
 
 rtx
-force_const_mem (machine_mode mode, rtx x)
+force_const_mem (machine_mode in_mode, rtx x)
 {
   struct constant_descriptor_rtx *desc, tmp;
   struct rtx_constant_pool *pool;
@@ -3772,6 +3773,11 @@ force_const_mem (machine_mode mode, rtx
   hashval_t hash;
   unsigned int align;
   constant_descriptor_rtx **slot;
+  fixed_size_mode mode;
+
+  /* We can't force variable-sized objects to memory.  */
+  if (!is_a <fixed_size_mode> (in_mode, &mode))
+    return NULL_RTX;
 
   /* If we're not allowed to drop X into the constant pool, don't.  */
   if (targetm.cannot_force_const_mem (mode, x))
@@ -3881,7 +3887,7 @@ get_pool_constant_mark (rtx addr, bool *
 
 /* Similar, return the mode.  */
 
-machine_mode
+fixed_size_mode
 get_pool_mode (const_rtx addr)
 {
   return SYMBOL_REF_CONSTANT (addr)->mode;
@@ -3901,7 +3907,7 @@ constant_pool_empty_p (void)
    in MODE with known alignment ALIGN.  */
 
 static void
-output_constant_pool_2 (machine_mode mode, rtx x, unsigned int align)
+output_constant_pool_2 (fixed_size_mode mode, rtx x, unsigned int align)
 {
   switch (GET_MODE_CLASS (mode))
     {

^ permalink raw reply	[flat|nested] 90+ messages in thread

* [09/nn] Add a fixed_size_mode_pod class
  2017-10-23 11:16 [00/nn] Patches preparing for runtime offsets and sizes Richard Sandiford
                   ` (7 preceding siblings ...)
  2017-10-23 11:22 ` [08/nn] Add a fixed_size_mode class Richard Sandiford
@ 2017-10-23 11:23 ` Richard Sandiford
  2017-10-26 11:59   ` Richard Biener
  2017-10-23 11:24 ` [10/nn] Widening optab cleanup Richard Sandiford
                   ` (12 subsequent siblings)
  21 siblings, 1 reply; 90+ messages in thread
From: Richard Sandiford @ 2017-10-23 11:23 UTC (permalink / raw)
  To: gcc-patches

This patch adds a POD version of fixed_size_mode.  The only current use
is for storing the __builtin_apply and __builtin_return register modes,
which were made fixed_size_modes by the previous patch.
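
As a rough sketch of why the POD form is needed: the arrays live in a
statically initialized target structure, so their element type must
remain a POD, while still converting implicitly to and from the real
class (names here are taken from the diff below):

  /* Declaration in the target structure (must stay POD).  */
  fixed_size_mode_pod x_apply_args_mode[FIRST_PSEUDO_REGISTER];

  /* Reads convert implicitly back to fixed_size_mode...  */
  fixed_size_mode mode = apply_args_mode[regno];

  /* ...and writes go through the usual checked conversions.  */
  apply_args_mode[regno] = as_a <fixed_size_mode> (VOIDmode);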


2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* coretypes.h (fixed_size_mode): Declare.
	(fixed_size_mode_pod): New typedef.
	* builtins.h (target_builtins::x_apply_args_mode)
	(target_builtins::x_apply_result_mode): Change type to
	fixed_size_mode_pod.
	* builtins.c (apply_args_size, apply_result_size, result_vector)
	(expand_builtin_apply_args_1, expand_builtin_apply)
	(expand_builtin_return): Update accordingly.

Index: gcc/coretypes.h
===================================================================
--- gcc/coretypes.h	2017-09-11 17:10:58.656085547 +0100
+++ gcc/coretypes.h	2017-10-23 11:42:57.592545063 +0100
@@ -59,6 +59,7 @@ typedef const struct rtx_def *const_rtx;
 class scalar_int_mode;
 class scalar_float_mode;
 class complex_mode;
+class fixed_size_mode;
 template<typename> class opt_mode;
 typedef opt_mode<scalar_mode> opt_scalar_mode;
 typedef opt_mode<scalar_int_mode> opt_scalar_int_mode;
@@ -66,6 +67,7 @@ typedef opt_mode<scalar_float_mode> opt_
 template<typename> class pod_mode;
 typedef pod_mode<scalar_mode> scalar_mode_pod;
 typedef pod_mode<scalar_int_mode> scalar_int_mode_pod;
+typedef pod_mode<fixed_size_mode> fixed_size_mode_pod;
 
 /* Subclasses of rtx_def, using indentation to show the class
    hierarchy, along with the relevant invariant.
Index: gcc/builtins.h
===================================================================
--- gcc/builtins.h	2017-08-30 12:18:46.602740973 +0100
+++ gcc/builtins.h	2017-10-23 11:42:57.592545063 +0100
@@ -29,14 +29,14 @@ struct target_builtins {
      the register is not used for calling a function.  If the machine
      has register windows, this gives only the outbound registers.
      INCOMING_REGNO gives the corresponding inbound register.  */
-  machine_mode x_apply_args_mode[FIRST_PSEUDO_REGISTER];
+  fixed_size_mode_pod x_apply_args_mode[FIRST_PSEUDO_REGISTER];
 
   /* For each register that may be used for returning values, this gives
      a mode used to copy the register's value.  VOIDmode indicates the
      register is not used for returning values.  If the machine has
      register windows, this gives only the outbound registers.
      INCOMING_REGNO gives the corresponding inbound register.  */
-  machine_mode x_apply_result_mode[FIRST_PSEUDO_REGISTER];
+  fixed_size_mode_pod x_apply_result_mode[FIRST_PSEUDO_REGISTER];
 };
 
 extern struct target_builtins default_target_builtins;
Index: gcc/builtins.c
===================================================================
--- gcc/builtins.c	2017-10-23 11:41:23.140260335 +0100
+++ gcc/builtins.c	2017-10-23 11:42:57.592545063 +0100
@@ -1358,7 +1358,6 @@ apply_args_size (void)
   static int size = -1;
   int align;
   unsigned int regno;
-  machine_mode mode;
 
   /* The values computed by this function never change.  */
   if (size < 0)
@@ -1374,7 +1373,7 @@ apply_args_size (void)
       for (regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
 	if (FUNCTION_ARG_REGNO_P (regno))
 	  {
-	    mode = targetm.calls.get_raw_arg_mode (regno);
+	    fixed_size_mode mode = targetm.calls.get_raw_arg_mode (regno);
 
 	    gcc_assert (mode != VOIDmode);
 
@@ -1386,7 +1385,7 @@ apply_args_size (void)
 	  }
 	else
 	  {
-	    apply_args_mode[regno] = VOIDmode;
+	    apply_args_mode[regno] = as_a <fixed_size_mode> (VOIDmode);
 	  }
     }
   return size;
@@ -1400,7 +1399,6 @@ apply_result_size (void)
 {
   static int size = -1;
   int align, regno;
-  machine_mode mode;
 
   /* The values computed by this function never change.  */
   if (size < 0)
@@ -1410,7 +1408,7 @@ apply_result_size (void)
       for (regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
 	if (targetm.calls.function_value_regno_p (regno))
 	  {
-	    mode = targetm.calls.get_raw_result_mode (regno);
+	    fixed_size_mode mode = targetm.calls.get_raw_result_mode (regno);
 
 	    gcc_assert (mode != VOIDmode);
 
@@ -1421,7 +1419,7 @@ apply_result_size (void)
 	    apply_result_mode[regno] = mode;
 	  }
 	else
-	  apply_result_mode[regno] = VOIDmode;
+	  apply_result_mode[regno] = as_a <fixed_size_mode> (VOIDmode);
 
       /* Allow targets that use untyped_call and untyped_return to override
 	 the size so that machine-specific information can be stored here.  */
@@ -1440,7 +1438,7 @@ apply_result_size (void)
 result_vector (int savep, rtx result)
 {
   int regno, size, align, nelts;
-  machine_mode mode;
+  fixed_size_mode mode;
   rtx reg, mem;
   rtx *savevec = XALLOCAVEC (rtx, FIRST_PSEUDO_REGISTER);
 
@@ -1469,7 +1467,7 @@ expand_builtin_apply_args_1 (void)
 {
   rtx registers, tem;
   int size, align, regno;
-  machine_mode mode;
+  fixed_size_mode mode;
   rtx struct_incoming_value = targetm.calls.struct_value_rtx (cfun ? TREE_TYPE (cfun->decl) : 0, 1);
 
   /* Create a block where the arg-pointer, structure value address,
@@ -1573,7 +1571,7 @@ expand_builtin_apply_args (void)
 expand_builtin_apply (rtx function, rtx arguments, rtx argsize)
 {
   int size, align, regno;
-  machine_mode mode;
+  fixed_size_mode mode;
   rtx incoming_args, result, reg, dest, src;
   rtx_call_insn *call_insn;
   rtx old_stack_level = 0;
@@ -1734,7 +1732,7 @@ expand_builtin_apply (rtx function, rtx
 expand_builtin_return (rtx result)
 {
   int size, align, regno;
-  machine_mode mode;
+  fixed_size_mode mode;
   rtx reg;
   rtx_insn *call_fusage = 0;
 

^ permalink raw reply	[flat|nested] 90+ messages in thread

* [10/nn] Widening optab cleanup
  2017-10-23 11:16 [00/nn] Patches preparing for runtime offsets and sizes Richard Sandiford
                   ` (8 preceding siblings ...)
  2017-10-23 11:23 ` [09/nn] Add a fixed_size_mode_pod class Richard Sandiford
@ 2017-10-23 11:24 ` Richard Sandiford
  2017-10-30 18:32   ` Jeff Law
  2017-10-23 11:24 ` [11/nn] Add narrower_subreg_mode helper function Richard Sandiford
                   ` (11 subsequent siblings)
  21 siblings, 1 reply; 90+ messages in thread
From: Richard Sandiford @ 2017-10-23 11:24 UTC (permalink / raw)
  To: gcc-patches

widening_optab_handler had the comment:

      /* ??? Why does find_widening_optab_handler_and_mode attempt to
         widen things that can't be widened?  E.g. add_optab... */
      if (op > LAST_CONV_OPTAB)
        return CODE_FOR_nothing;

I think it comes from expand_binop using
find_widening_optab_handler_and_mode for two things: to test whether
a "normal" optab like add_optab is supported for a standard binary
operation and to test whether a "convert" optab is supported for a
widening operation like umul_widen_optab.  In the former case from_mode
and to_mode must be the same; in the latter, from_mode must be narrower
than to_mode.

For the former case, find_widening_optab_handler_and_mode is only really
testing the modes that are passed in.  permit_non_widening must be true
here.

For the latter case, find_widening_optab_handler_and_mode should only
really consider new from_modes that are wider than the original
from_mode and narrower than the original to_mode.  Logically
permit_non_widening should be false, since widening optabs aren't
supposed to take operands that are the same width as the destination.
We get away with permit_non_widening being true because no target
would/should define a widening .md pattern with matching modes.

But really, it seems better for expand_binop to handle these two
cases itself rather than pushing them down.  With that change,
find_widening_optab_handler_and_mode is only ever called with
permit_non_widening set to false and is only ever called with
a "proper" convert optab.  We then no longer need widening_optab_handler,
we can just use convert_optab_handler directly.

The patch also passes the instruction code down to expand_binop_directly.
This should be more efficient and removes an extra call to
find_widening_optab_handler_and_mode.
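
Concretely, the new dispatch in expand_binop looks like this (a
condensed sketch of the hunk further down):

  enum insn_code icode;
  if (convert_optab_p (binoptab))
    {
      /* Widening case: allow a wider from_mode if the exact one
	 isn't supported.  */
      machine_mode from_mode = widened_mode (mode, op0, op1);
      icode = find_widening_optab_handler (binoptab, mode, from_mode);
    }
  else
    /* Normal case: operands and result share MODE.  */
    icode = optab_handler (binoptab, mode);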


2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* optabs-query.h (convert_optab_p): New function, split out from...
	(convert_optab_handler): ...here.
	(widening_optab_handler): Delete.
	(find_widening_optab_handler): Remove permit_non_widening parameter.
	(find_widening_optab_handler_and_mode): Likewise.  Provide an
	override that operates on mode class wrappers.
	* optabs-query.c (widening_optab_handler): Delete.
	(find_widening_optab_handler_and_mode): Remove permit_non_widening
	parameter.  Assert that the two modes are the same class and that
	the "from" mode is narrower than the "to" mode.  Use
	convert_optab_handler instead of widening_optab_handler.
	* expmed.c (expmed_mult_highpart_optab): Use convert_optab_handler
	instead of widening_optab_handler.
	* expr.c (expand_expr_real_2): Update calls to
	find_widening_optab_handler.
	* optabs.c (expand_widen_pattern_expr): Likewise.
	(expand_binop_directly): Take the insn_code as a parameter.
	(expand_binop): Only call find_widening_optab_handler for
	conversion optabs; use optab_handler otherwise.  Update calls
	to find_widening_optab_handler and expand_binop_directly.
	Use convert_optab_handler instead of widening_optab_handler.
	* tree-ssa-math-opts.c (convert_mult_to_widen): Update calls to
	find_widening_optab_handler and use scalar_mode rather than
	machine_mode.
	(convert_plusminus_to_widen): Likewise.

Index: gcc/optabs-query.h
===================================================================
--- gcc/optabs-query.h	2017-09-14 17:04:19.080694343 +0100
+++ gcc/optabs-query.h	2017-10-23 11:43:01.517673716 +0100
@@ -23,6 +23,14 @@ #define GCC_OPTABS_QUERY_H
 #include "insn-opinit.h"
 #include "target.h"
 
+/* Return true if OP is a conversion optab.  */
+
+inline bool
+convert_optab_p (optab op)
+{
+  return op > unknown_optab && op <= LAST_CONV_OPTAB;
+}
+
 /* Return the insn used to implement mode MODE of OP, or CODE_FOR_nothing
    if the target does not have such an insn.  */
 
@@ -43,7 +51,7 @@ convert_optab_handler (convert_optab op,
 		       machine_mode from_mode)
 {
   unsigned scode = (op << 16) | (from_mode << 8) | to_mode;
-  gcc_assert (op > unknown_optab && op <= LAST_CONV_OPTAB);
+  gcc_assert (convert_optab_p (op));
   return raw_optab_handler (scode);
 }
 
@@ -167,12 +175,11 @@ enum insn_code can_float_p (machine_mode
 enum insn_code can_fix_p (machine_mode, machine_mode, int, bool *);
 bool can_conditionally_move_p (machine_mode mode);
 bool can_vec_perm_p (machine_mode, bool, vec_perm_indices *);
-enum insn_code widening_optab_handler (optab, machine_mode, machine_mode);
 /* Find a widening optab even if it doesn't widen as much as we want.  */
-#define find_widening_optab_handler(A,B,C,D) \
-  find_widening_optab_handler_and_mode (A, B, C, D, NULL)
+#define find_widening_optab_handler(A, B, C) \
+  find_widening_optab_handler_and_mode (A, B, C, NULL)
 enum insn_code find_widening_optab_handler_and_mode (optab, machine_mode,
-						     machine_mode, int,
+						     machine_mode,
 						     machine_mode *);
 int can_mult_highpart_p (machine_mode, bool);
 bool can_vec_mask_load_store_p (machine_mode, machine_mode, bool);
@@ -181,4 +188,20 @@ bool can_atomic_exchange_p (machine_mode
 bool can_atomic_load_p (machine_mode);
 bool lshift_cheap_p (bool);
 
+/* Version of find_widening_optab_handler_and_mode that operates on
+   specific mode types.  */
+
+template<typename T>
+inline enum insn_code
+find_widening_optab_handler_and_mode (optab op, const T &to_mode,
+				      const T &from_mode, T *found_mode)
+{
+  machine_mode tmp;
+  enum insn_code icode = find_widening_optab_handler_and_mode
+    (op, machine_mode (to_mode), machine_mode (from_mode), &tmp);
+  if (icode != CODE_FOR_nothing && found_mode)
+    *found_mode = as_a <T> (tmp);
+  return icode;
+}
+
 #endif
Index: gcc/optabs-query.c
===================================================================
--- gcc/optabs-query.c	2017-09-25 13:57:21.028734061 +0100
+++ gcc/optabs-query.c	2017-10-23 11:43:01.517673716 +0100
@@ -401,44 +401,20 @@ can_vec_perm_p (machine_mode mode, bool
   return true;
 }
 
-/* Like optab_handler, but for widening_operations that have a
-   TO_MODE and a FROM_MODE.  */
-
-enum insn_code
-widening_optab_handler (optab op, machine_mode to_mode,
-			machine_mode from_mode)
-{
-  unsigned scode = (op << 16) | to_mode;
-  if (to_mode != from_mode && from_mode != VOIDmode)
-    {
-      /* ??? Why does find_widening_optab_handler_and_mode attempt to
-	 widen things that can't be widened?  E.g. add_optab... */
-      if (op > LAST_CONV_OPTAB)
-	return CODE_FOR_nothing;
-      scode |= from_mode << 8;
-    }
-  return raw_optab_handler (scode);
-}
-
 /* Find a widening optab even if it doesn't widen as much as we want.
    E.g. if from_mode is HImode, and to_mode is DImode, and there is no
-   direct HI->SI insn, then return SI->DI, if that exists.
-   If PERMIT_NON_WIDENING is non-zero then this can be used with
-   non-widening optabs also.  */
+   direct HI->SI insn, then return SI->DI, if that exists.  */
 
 enum insn_code
 find_widening_optab_handler_and_mode (optab op, machine_mode to_mode,
 				      machine_mode from_mode,
-				      int permit_non_widening,
 				      machine_mode *found_mode)
 {
-  for (; (permit_non_widening || from_mode != to_mode)
-	 && GET_MODE_SIZE (from_mode) <= GET_MODE_SIZE (to_mode)
-	 && from_mode != VOIDmode;
-       from_mode = GET_MODE_WIDER_MODE (from_mode).else_void ())
+  gcc_checking_assert (GET_MODE_CLASS (from_mode) == GET_MODE_CLASS (to_mode));
+  gcc_checking_assert (from_mode < to_mode);
+  FOR_EACH_MODE (from_mode, from_mode, to_mode)
     {
-      enum insn_code handler = widening_optab_handler (op, to_mode,
-						       from_mode);
+      enum insn_code handler = convert_optab_handler (op, to_mode, from_mode);
 
       if (handler != CODE_FOR_nothing)
 	{
Index: gcc/expmed.c
===================================================================
--- gcc/expmed.c	2017-10-23 11:42:34.914720660 +0100
+++ gcc/expmed.c	2017-10-23 11:43:01.515743957 +0100
@@ -3701,7 +3701,7 @@ expmed_mult_highpart_optab (scalar_int_m
 
   /* Try widening multiplication.  */
   moptab = unsignedp ? umul_widen_optab : smul_widen_optab;
-  if (widening_optab_handler (moptab, wider_mode, mode) != CODE_FOR_nothing
+  if (convert_optab_handler (moptab, wider_mode, mode) != CODE_FOR_nothing
       && mul_widen_cost (speed, wider_mode) < max_cost)
     {
       tem = expand_binop (wider_mode, moptab, op0, narrow_op1, 0,
@@ -3740,7 +3740,7 @@ expmed_mult_highpart_optab (scalar_int_m
 
   /* Try widening multiplication of opposite signedness, and adjust.  */
   moptab = unsignedp ? smul_widen_optab : umul_widen_optab;
-  if (widening_optab_handler (moptab, wider_mode, mode) != CODE_FOR_nothing
+  if (convert_optab_handler (moptab, wider_mode, mode) != CODE_FOR_nothing
       && size - 1 < BITS_PER_WORD
       && (mul_widen_cost (speed, wider_mode)
 	  + 2 * shift_cost (speed, mode, size-1)
Index: gcc/expr.c
===================================================================
--- gcc/expr.c	2017-10-23 11:42:52.013721093 +0100
+++ gcc/expr.c	2017-10-23 11:43:01.517673716 +0100
@@ -8640,7 +8640,7 @@ #define REDUCE_BIT_FIELD(expr)	(reduce_b
 	{
 	  machine_mode innermode = TYPE_MODE (TREE_TYPE (treeop0));
 	  this_optab = usmul_widen_optab;
-	  if (find_widening_optab_handler (this_optab, mode, innermode, 0)
+	  if (find_widening_optab_handler (this_optab, mode, innermode)
 		!= CODE_FOR_nothing)
 	    {
 	      if (TYPE_UNSIGNED (TREE_TYPE (treeop0)))
@@ -8675,7 +8675,7 @@ #define REDUCE_BIT_FIELD(expr)	(reduce_b
 
 	  if (TREE_CODE (treeop0) != INTEGER_CST)
 	    {
-	      if (find_widening_optab_handler (this_optab, mode, innermode, 0)
+	      if (find_widening_optab_handler (this_optab, mode, innermode)
 		    != CODE_FOR_nothing)
 		{
 		  expand_operands (treeop0, treeop1, NULL_RTX, &op0, &op1,
@@ -8697,7 +8697,7 @@ #define REDUCE_BIT_FIELD(expr)	(reduce_b
 					       unsignedp, this_optab);
 		  return REDUCE_BIT_FIELD (temp);
 		}
-	      if (find_widening_optab_handler (other_optab, mode, innermode, 0)
+	      if (find_widening_optab_handler (other_optab, mode, innermode)
 		    != CODE_FOR_nothing
 		  && innermode == word_mode)
 		{
Index: gcc/optabs.c
===================================================================
--- gcc/optabs.c	2017-10-23 11:42:34.919720660 +0100
+++ gcc/optabs.c	2017-10-23 11:43:01.518638595 +0100
@@ -264,7 +264,7 @@ expand_widen_pattern_expr (sepops ops, r
       || ops->code == WIDEN_MULT_MINUS_EXPR)
     icode = find_widening_optab_handler (widen_pattern_optab,
 					 TYPE_MODE (TREE_TYPE (ops->op2)),
-					 tmode0, 0);
+					 tmode0);
   else
     icode = optab_handler (widen_pattern_optab, tmode0);
   gcc_assert (icode != CODE_FOR_nothing);
@@ -989,17 +989,14 @@ avoid_expensive_constant (machine_mode m
 }
 
 /* Helper function for expand_binop: handle the case where there
-   is an insn that directly implements the indicated operation.
+   is an insn ICODE that directly implements the indicated operation.
    Returns null if this is not possible.  */
 static rtx
-expand_binop_directly (machine_mode mode, optab binoptab,
+expand_binop_directly (enum insn_code icode, machine_mode mode, optab binoptab,
 		       rtx op0, rtx op1,
 		       rtx target, int unsignedp, enum optab_methods methods,
 		       rtx_insn *last)
 {
-  machine_mode from_mode = widened_mode (mode, op0, op1);
-  enum insn_code icode = find_widening_optab_handler (binoptab, mode,
-						      from_mode, 1);
   machine_mode xmode0 = insn_data[(int) icode].operand[1].mode;
   machine_mode xmode1 = insn_data[(int) icode].operand[2].mode;
   machine_mode mode0, mode1, tmp_mode;
@@ -1123,6 +1120,7 @@ expand_binop (machine_mode mode, optab b
     = (methods == OPTAB_LIB || methods == OPTAB_LIB_WIDEN
        ? OPTAB_WIDEN : methods);
   enum mode_class mclass;
+  enum insn_code icode;
   machine_mode wider_mode;
   scalar_int_mode int_mode;
   rtx libfunc;
@@ -1156,23 +1154,30 @@ expand_binop (machine_mode mode, optab b
 
   /* If we can do it with a three-operand insn, do so.  */
 
-  if (methods != OPTAB_MUST_WIDEN
-      && find_widening_optab_handler (binoptab, mode,
-				      widened_mode (mode, op0, op1), 1)
-	    != CODE_FOR_nothing)
+  if (methods != OPTAB_MUST_WIDEN)
     {
-      temp = expand_binop_directly (mode, binoptab, op0, op1, target,
-				    unsignedp, methods, last);
-      if (temp)
-	return temp;
+      if (convert_optab_p (binoptab))
+	{
+	  machine_mode from_mode = widened_mode (mode, op0, op1);
+	  icode = find_widening_optab_handler (binoptab, mode, from_mode);
+	}
+      else
+	icode = optab_handler (binoptab, mode);
+      if (icode != CODE_FOR_nothing)
+	{
+	  temp = expand_binop_directly (icode, mode, binoptab, op0, op1,
+					target, unsignedp, methods, last);
+	  if (temp)
+	    return temp;
+	}
     }
 
   /* If we were trying to rotate, and that didn't work, try rotating
      the other direction before falling back to shifts and bitwise-or.  */
   if (((binoptab == rotl_optab
-	&& optab_handler (rotr_optab, mode) != CODE_FOR_nothing)
+	&& (icode = optab_handler (rotr_optab, mode)) != CODE_FOR_nothing)
        || (binoptab == rotr_optab
-	   && optab_handler (rotl_optab, mode) != CODE_FOR_nothing))
+	   && (icode = optab_handler (rotl_optab, mode)) != CODE_FOR_nothing))
       && is_int_mode (mode, &int_mode))
     {
       optab otheroptab = (binoptab == rotl_optab ? rotr_optab : rotl_optab);
@@ -1188,7 +1193,7 @@ expand_binop (machine_mode mode, optab b
 			       gen_int_mode (bits, GET_MODE (op1)), op1,
 			       NULL_RTX, unsignedp, OPTAB_DIRECT);
 
-      temp = expand_binop_directly (int_mode, otheroptab, op0, newop1,
+      temp = expand_binop_directly (icode, int_mode, otheroptab, op0, newop1,
 				    target, unsignedp, methods, last);
       if (temp)
 	return temp;
@@ -1235,7 +1240,8 @@ expand_binop (machine_mode mode, optab b
       else if (binoptab == rotr_optab)
 	otheroptab = vrotr_optab;
 
-      if (otheroptab && optab_handler (otheroptab, mode) != CODE_FOR_nothing)
+      if (otheroptab
+	  && (icode = optab_handler (otheroptab, mode)) != CODE_FOR_nothing)
 	{
 	  /* The scalar may have been extended to be too wide.  Truncate
 	     it back to the proper size to fit in the broadcast vector.  */
@@ -1249,7 +1255,7 @@ expand_binop (machine_mode mode, optab b
 	  rtx vop1 = expand_vector_broadcast (mode, op1);
 	  if (vop1)
 	    {
-	      temp = expand_binop_directly (mode, otheroptab, op0, vop1,
+	      temp = expand_binop_directly (icode, mode, otheroptab, op0, vop1,
 					    target, unsignedp, methods, last);
 	      if (temp)
 		return temp;
@@ -1272,7 +1278,7 @@ expand_binop (machine_mode mode, optab b
 		&& (find_widening_optab_handler ((unsignedp
 						  ? umul_widen_optab
 						  : smul_widen_optab),
-						 next_mode, mode, 0)
+						 next_mode, mode)
 		    != CODE_FOR_nothing)))
 	  {
 	    rtx xop0 = op0, xop1 = op1;
@@ -1703,7 +1709,7 @@ expand_binop (machine_mode mode, optab b
       && optab_handler (add_optab, word_mode) != CODE_FOR_nothing)
     {
       rtx product = NULL_RTX;
-      if (widening_optab_handler (umul_widen_optab, int_mode, word_mode)
+      if (convert_optab_handler (umul_widen_optab, int_mode, word_mode)
 	  != CODE_FOR_nothing)
 	{
 	  product = expand_doubleword_mult (int_mode, op0, op1, target,
@@ -1713,7 +1719,7 @@ expand_binop (machine_mode mode, optab b
 	}
 
       if (product == NULL_RTX
-	  && (widening_optab_handler (smul_widen_optab, int_mode, word_mode)
+	  && (convert_optab_handler (smul_widen_optab, int_mode, word_mode)
 	      != CODE_FOR_nothing))
 	{
 	  product = expand_doubleword_mult (int_mode, op0, op1, target,
@@ -1806,10 +1812,13 @@ expand_binop (machine_mode mode, optab b
 
   if (CLASS_HAS_WIDER_MODES_P (mclass))
     {
+      /* This code doesn't make sense for conversion optabs, since we
+	 wouldn't then want to extend the operands to be the same size
+	 as the result.  */
+      gcc_assert (!convert_optab_p (binoptab));
       FOR_EACH_WIDER_MODE (wider_mode, mode)
 	{
-	  if (find_widening_optab_handler (binoptab, wider_mode, mode, 1)
-		  != CODE_FOR_nothing
+	  if (optab_handler (binoptab, wider_mode)
 	      || (methods == OPTAB_LIB
 		  && optab_libfunc (binoptab, wider_mode)))
 	    {
Index: gcc/tree-ssa-math-opts.c
===================================================================
--- gcc/tree-ssa-math-opts.c	2017-10-09 11:51:27.664982724 +0100
+++ gcc/tree-ssa-math-opts.c	2017-10-23 11:43:01.519603474 +0100
@@ -3242,7 +3242,7 @@ convert_mult_to_widen (gimple *stmt, gim
 {
   tree lhs, rhs1, rhs2, type, type1, type2;
   enum insn_code handler;
-  machine_mode to_mode, from_mode, actual_mode;
+  scalar_int_mode to_mode, from_mode, actual_mode;
   optab op;
   int actual_precision;
   location_t loc = gimple_location (stmt);
@@ -3269,7 +3269,7 @@ convert_mult_to_widen (gimple *stmt, gim
     op = usmul_widen_optab;
 
   handler = find_widening_optab_handler_and_mode (op, to_mode, from_mode,
-						  0, &actual_mode);
+						  &actual_mode);
 
   if (handler == CODE_FOR_nothing)
     {
@@ -3290,7 +3290,7 @@ convert_mult_to_widen (gimple *stmt, gim
 
 	  op = smul_widen_optab;
 	  handler = find_widening_optab_handler_and_mode (op, to_mode,
-							  from_mode, 0,
+							  from_mode,
 							  &actual_mode);
 
 	  if (handler == CODE_FOR_nothing)
@@ -3350,8 +3350,7 @@ convert_plusminus_to_widen (gimple_stmt_
   optab this_optab;
   enum tree_code wmult_code;
   enum insn_code handler;
-  scalar_mode to_mode, from_mode;
-  machine_mode actual_mode;
+  scalar_mode to_mode, from_mode, actual_mode;
   location_t loc = gimple_location (stmt);
   int actual_precision;
   bool from_unsigned1, from_unsigned2;
@@ -3509,7 +3508,7 @@ convert_plusminus_to_widen (gimple_stmt_
      this transformation is likely to pessimize code.  */
   this_optab = optab_for_tree_code (wmult_code, optype, optab_default);
   handler = find_widening_optab_handler_and_mode (this_optab, to_mode,
-						  from_mode, 0, &actual_mode);
+						  from_mode, &actual_mode);
 
   if (handler == CODE_FOR_nothing)
     return false;

^ permalink raw reply	[flat|nested] 90+ messages in thread

* [11/nn] Add narrower_subreg_mode helper function
  2017-10-23 11:16 [00/nn] Patches preparing for runtime offsets and sizes Richard Sandiford
                   ` (9 preceding siblings ...)
  2017-10-23 11:24 ` [10/nn] Widening optab cleanup Richard Sandiford
@ 2017-10-23 11:24 ` Richard Sandiford
  2017-10-30 15:06   ` Jeff Law
  2017-10-23 11:25 ` [13/nn] More is_a <scalar_int_mode> Richard Sandiford
                   ` (10 subsequent siblings)
  21 siblings, 1 reply; 90+ messages in thread
From: Richard Sandiford @ 2017-10-23 11:24 UTC (permalink / raw)
  To: gcc-patches

This patch adds a narrowing equivalent of wider_subreg_mode.  At present
there is only one user.
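
Behaviour in the two interesting cases, as a quick sketch:

  /* Normal subreg, e.g. (subreg:SI (reg:DI R) 0): the outer mode
     is the narrower one.  */
  narrower_subreg_mode (SImode, DImode);   /* SImode */

  /* Paradoxical subreg, e.g. (subreg:DI (reg:SI R) 0): the inner
     mode is the narrower one.  */
  narrower_subreg_mode (DImode, SImode);   /* SImode */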


2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* rtl.h (narrower_subreg_mode): New function.
	* ira-color.c (update_costs_from_allocno): Use it.

Index: gcc/rtl.h
===================================================================
--- gcc/rtl.h	2017-10-23 11:44:06.562686090 +0100
+++ gcc/rtl.h	2017-10-23 11:44:15.916785881 +0100
@@ -2972,6 +2972,16 @@ subreg_lowpart_offset (machine_mode oute
 }
 
 /* Given that a subreg has outer mode OUTERMODE and inner mode INNERMODE,
+   return the smaller of the two modes if they are different sizes,
+   otherwise return the outer mode.  */
+
+inline machine_mode
+narrower_subreg_mode (machine_mode outermode, machine_mode innermode)
+{
+  return paradoxical_subreg_p (outermode, innermode) ? innermode : outermode;
+}
+
+/* Given that a subreg has outer mode OUTERMODE and inner mode INNERMODE,
    return the mode that is big enough to hold both the outer and inner
    values.  Prefer the outer mode in the event of a tie.  */
 
Index: gcc/ira-color.c
===================================================================
--- gcc/ira-color.c	2017-10-23 11:44:11.500538024 +0100
+++ gcc/ira-color.c	2017-10-23 11:44:15.915819948 +0100
@@ -1367,15 +1367,14 @@ update_costs_from_allocno (ira_allocno_t
 	      || ALLOCNO_ASSIGNED_P (another_allocno))
 	    continue;
 
-	  if (GET_MODE_SIZE (ALLOCNO_MODE (cp->second)) < GET_MODE_SIZE (mode))
-	    /* If we have different modes use the smallest one.  It is
-	       a sub-register move.  It is hard to predict what LRA
-	       will reload (the pseudo or its sub-register) but LRA
-	       will try to minimize the data movement.  Also for some
-	       register classes bigger modes might be invalid,
-	       e.g. DImode for AREG on x86.  For such cases the
-	       register move cost will be maximal. */
-	    mode = ALLOCNO_MODE (cp->second);
+	  /* If we have different modes use the smallest one.  It is
+	     a sub-register move.  It is hard to predict what LRA
+	     will reload (the pseudo or its sub-register) but LRA
+	     will try to minimize the data movement.  Also for some
+	     register classes bigger modes might be invalid,
+	     e.g. DImode for AREG on x86.  For such cases the
+	     register move cost will be maximal.  */
+	  mode = narrower_subreg_mode (mode, ALLOCNO_MODE (cp->second));
 	  
 	  cost = (cp->second == allocno
 		  ? ira_register_move_cost[mode][rclass][aclass]

^ permalink raw reply	[flat|nested] 90+ messages in thread

* [13/nn] More is_a <scalar_int_mode>
  2017-10-23 11:16 [00/nn] Patches preparing for runtime offsets and sizes Richard Sandiford
                   ` (10 preceding siblings ...)
  2017-10-23 11:24 ` [11/nn] Add narrower_subreg_mode helper function Richard Sandiford
@ 2017-10-23 11:25 ` Richard Sandiford
  2017-10-26 12:03   ` Richard Biener
  2017-10-23 11:25 ` [12/nn] Add an is_narrower_int_mode helper function Richard Sandiford
                   ` (9 subsequent siblings)
  21 siblings, 1 reply; 90+ messages in thread
From: Richard Sandiford @ 2017-10-23 11:25 UTC (permalink / raw)
  To: gcc-patches

alias.c:find_base_term and find_base_value checked:

      if (GET_MODE_SIZE (GET_MODE (src)) < GET_MODE_SIZE (Pmode))

but (a) comparing the precision seems more correct, since it's possible
for modes to have the same memory size as Pmode but fewer bits, and
(b) the functions are called on arbitrary rtl, so there's no guarantee
that we're handling an integer truncation.

Since there's no point processing truncations of anything other than an
integer, this patch checks that first.
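
A partial integer mode shows the difference (a hypothetical target
where PSImode is stored in the same number of bytes as SImode but
carries only 24 significant bits, with Pmode == SImode):

  GET_MODE_SIZE (PSImode) < GET_MODE_SIZE (Pmode)            /* false */
  GET_MODE_PRECISION (PSImode) < GET_MODE_PRECISION (Pmode)  /* true  */

so the old size-based test would wrongly treat a truncation to PSImode
as if it preserved the full address.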


2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* alias.c (find_base_value, find_base_term): Only process integer
	truncations.  Check the precision rather than the size.

Index: gcc/alias.c
===================================================================
--- gcc/alias.c	2017-10-23 11:41:25.511925516 +0100
+++ gcc/alias.c	2017-10-23 11:44:27.544693078 +0100
@@ -1349,6 +1349,7 @@ known_base_value_p (rtx x)
 find_base_value (rtx src)
 {
   unsigned int regno;
+  scalar_int_mode int_mode;
 
 #if defined (FIND_BASE_TERM)
   /* Try machine-dependent ways to find the base term.  */
@@ -1475,7 +1476,8 @@ find_base_value (rtx src)
 	 address modes depending on the address space.  */
       if (!target_default_pointer_address_modes_p ())
 	break;
-      if (GET_MODE_SIZE (GET_MODE (src)) < GET_MODE_SIZE (Pmode))
+      if (!is_a <scalar_int_mode> (GET_MODE (src), &int_mode)
+	  || GET_MODE_PRECISION (int_mode) < GET_MODE_PRECISION (Pmode))
 	break;
       /* Fall through.  */
     case HIGH:
@@ -1876,6 +1878,7 @@ find_base_term (rtx x)
   cselib_val *val;
   struct elt_loc_list *l, *f;
   rtx ret;
+  scalar_int_mode int_mode;
 
 #if defined (FIND_BASE_TERM)
   /* Try machine-dependent ways to find the base term.  */
@@ -1893,7 +1896,8 @@ find_base_term (rtx x)
 	 address modes depending on the address space.  */
       if (!target_default_pointer_address_modes_p ())
 	return 0;
-      if (GET_MODE_SIZE (GET_MODE (x)) < GET_MODE_SIZE (Pmode))
+      if (!is_a <scalar_int_mode> (GET_MODE (x), &int_mode)
+	  || GET_MODE_PRECISION (int_mode) < GET_MODE_PRECISION (Pmode))
 	return 0;
       /* Fall through.  */
     case HIGH:

^ permalink raw reply	[flat|nested] 90+ messages in thread

* [12/nn] Add an is_narrower_int_mode helper function
  2017-10-23 11:16 [00/nn] Patches preparing for runtime offsets and sizes Richard Sandiford
                   ` (11 preceding siblings ...)
  2017-10-23 11:25 ` [13/nn] More is_a <scalar_int_mode> Richard Sandiford
@ 2017-10-23 11:25 ` Richard Sandiford
  2017-10-26 11:59   ` Richard Biener
  2017-10-23 11:26 ` [14/nn] Add helpers for shift count modes Richard Sandiford
                   ` (8 subsequent siblings)
  21 siblings, 1 reply; 90+ messages in thread
From: Richard Sandiford @ 2017-10-23 11:25 UTC (permalink / raw)
  To: gcc-patches

This patch adds a function for testing whether an arbitrary mode X
is an integer mode that is narrower than integer mode Y.  This is
useful for code like expand_float and expand_fix that could in
principle handle vectors as well as scalars.
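
The helper folds the is_a test and the precision comparison into one
call; a before/after sketch based on the expand_fix hunk below:

  /* Before: assumes GET_MODE (to) is a scalar integer mode.  */
  if (GET_MODE_PRECISION (GET_MODE (to)) < GET_MODE_PRECISION (SImode))
    ...

  /* After: also safely rejects vector and other non-integer modes.  */
  if (is_narrower_int_mode (GET_MODE (to), SImode))
    ...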


2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* machmode.h (is_narrower_int_mode): New function.
	* optabs.c (expand_float, expand_fix): Use it.
	* dwarf2out.c (rotate_loc_descriptor): Likewise.

Index: gcc/machmode.h
===================================================================
--- gcc/machmode.h	2017-10-23 11:44:06.561720156 +0100
+++ gcc/machmode.h	2017-10-23 11:44:23.979432614 +0100
@@ -893,6 +893,17 @@ is_complex_float_mode (machine_mode mode
   return false;
 }
 
+/* Return true if MODE is a scalar integer mode with a precision
+   smaller than LIMIT's precision.  */
+
+inline bool
+is_narrower_int_mode (machine_mode mode, scalar_int_mode limit)
+{
+  scalar_int_mode int_mode;
+  return (is_a <scalar_int_mode> (mode, &int_mode)
+	  && GET_MODE_PRECISION (int_mode) < GET_MODE_PRECISION (limit));
+}
+
 namespace mode_iterator
 {
   /* Start mode iterator *ITER at the first mode in class MCLASS, if any.  */
Index: gcc/optabs.c
===================================================================
--- gcc/optabs.c	2017-10-23 11:44:07.732431531 +0100
+++ gcc/optabs.c	2017-10-23 11:44:23.980398548 +0100
@@ -4820,7 +4820,7 @@ expand_float (rtx to, rtx from, int unsi
       rtx value;
       convert_optab tab = unsignedp ? ufloat_optab : sfloat_optab;
 
-      if (GET_MODE_PRECISION (GET_MODE (from)) < GET_MODE_PRECISION (SImode))
+      if (is_narrower_int_mode (GET_MODE (from), SImode))
 	from = convert_to_mode (SImode, from, unsignedp);
 
       libfunc = convert_optab_libfunc (tab, GET_MODE (to), GET_MODE (from));
@@ -5002,7 +5002,7 @@ expand_fix (rtx to, rtx from, int unsign
      that the mode of TO is at least as wide as SImode, since those are the
      only library calls we know about.  */
 
-  if (GET_MODE_PRECISION (GET_MODE (to)) < GET_MODE_PRECISION (SImode))
+  if (is_narrower_int_mode (GET_MODE (to), SImode))
     {
       target = gen_reg_rtx (SImode);
 
Index: gcc/dwarf2out.c
===================================================================
--- gcc/dwarf2out.c	2017-10-23 11:44:05.684652559 +0100
+++ gcc/dwarf2out.c	2017-10-23 11:44:23.979432614 +0100
@@ -14530,8 +14530,7 @@ rotate_loc_descriptor (rtx rtl, scalar_i
   dw_loc_descr_ref op0, op1, ret, mask[2] = { NULL, NULL };
   int i;
 
-  if (GET_MODE (rtlop1) != VOIDmode
-      && GET_MODE_BITSIZE (GET_MODE (rtlop1)) < GET_MODE_BITSIZE (mode))
+  if (is_narrower_int_mode (GET_MODE (rtlop1), mode))
     rtlop1 = gen_rtx_ZERO_EXTEND (mode, rtlop1);
   op0 = mem_loc_descriptor (XEXP (rtl, 0), mode, mem_mode,
 			    VAR_INIT_STATUS_INITIALIZED);

^ permalink raw reply	[flat|nested] 90+ messages in thread

* [14/nn] Add helpers for shift count modes
  2017-10-23 11:16 [00/nn] Patches preparing for runtime offsets and sizes Richard Sandiford
                   ` (12 preceding siblings ...)
  2017-10-23 11:25 ` [12/nn] Add an is_narrower_int_mode helper function Richard Sandiford
@ 2017-10-23 11:26 ` Richard Sandiford
  2017-10-26 12:07   ` Richard Biener
  2017-10-23 11:27 ` [16/nn] Factor out the mode handling in lower-subreg.c Richard Sandiford
                   ` (7 subsequent siblings)
  21 siblings, 1 reply; 90+ messages in thread
From: Richard Sandiford @ 2017-10-23 11:26 UTC (permalink / raw)
  To: gcc-patches

This patch adds a stub helper routine to provide the mode
of a scalar shift amount, given the mode of the values
being shifted.

One long-standing problem has been to decide what this mode
should be for arbitrary rtxes (as opposed to those directly
tied to a target pattern).  Is it the mode of the shifted
elements?  Is it word_mode?  Or maybe QImode?  Is it whatever
the corresponding target pattern says?  (In which case what
should the mode be when the target doesn't have a pattern?)

For now the patch picks word_mode, which should be safe on
all targets but could perhaps become suboptimal if the helper
routine is used more often than it is in this patch.  As it
stands the patch does not change the generated code.

The patch also adds a helper function that constructs rtxes
for constant shift amounts, again given the mode of the value
being shifted.  As well as helping with the SVE patches, this
is one step towards allowing CONST_INTs to have a real mode.
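
In practice the pair replaces bare GEN_INT calls; a minimal sketch of
shifting OP0 (a value of mode MODE) left by three bits:

  /* Build the shift amount in whatever mode the target expects
     for shift counts (currently always word_mode).  */
  rtx amount = gen_int_shift_amount (mode, 3);
  rtx res = expand_binop (mode, ashl_optab, op0, amount,
			  NULL_RTX, 1, OPTAB_DIRECT);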


2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* target.h (get_shift_amount_mode): New function.
	* emit-rtl.h (gen_int_shift_amount): Declare.
	* emit-rtl.c (gen_int_shift_amount): New function.
	* asan.c (asan_emit_stack_protection): Use gen_int_shift_amount
	instead of GEN_INT.
	* calls.c (shift_return_value): Likewise.
	* cse.c (fold_rtx): Likewise.
	* dse.c (find_shift_sequence): Likewise.
	* expmed.c (init_expmed_one_mode, store_bit_field_1, expand_shift_1)
	(expand_shift, expand_smod_pow2): Likewise.
	* lower-subreg.c (shift_cost): Likewise.
	* simplify-rtx.c (simplify_unary_operation_1): Likewise.
	(simplify_binary_operation_1): Likewise.
	* combine.c (try_combine, find_split_point, force_int_to_mode)
	(simplify_shift_const_1, simplify_shift_const): Likewise.
	(change_zero_ext): Likewise.  Use simplify_gen_binary.
	* optabs.c (expand_superword_shift, expand_doubleword_mult)
	(expand_unop): Use gen_int_shift_amount instead of GEN_INT.
	(expand_binop): Likewise.  Use get_shift_amount_mode instead
	of word_mode as the mode of a CONST_INT shift amount.
	(shift_amt_for_vec_perm_mask): Add a machine_mode argument.
	Use gen_int_shift_amount instead of GEN_INT.
	(expand_vec_perm): Update caller accordingly.  Use
	gen_int_shift_amount instead of GEN_INT.

Index: gcc/target.h
===================================================================
--- gcc/target.h	2017-10-23 11:47:06.643477568 +0100
+++ gcc/target.h	2017-10-23 11:47:11.277288162 +0100
@@ -209,6 +209,17 @@ #define HOOKSTRUCT(FRAGMENT) FRAGMENT
 
 extern struct gcc_target targetm;
 
+/* Return the mode that should be used to hold a scalar shift amount
+   when shifting values of the given mode.  */
+/* ??? This could in principle be generated automatically from the .md
+   shift patterns, but for now word_mode should be universally OK.  */
+
+inline scalar_int_mode
+get_shift_amount_mode (machine_mode)
+{
+  return word_mode;
+}
+
 #ifdef GCC_TM_H
 
 #ifndef CUMULATIVE_ARGS_MAGIC
Index: gcc/emit-rtl.h
===================================================================
--- gcc/emit-rtl.h	2017-10-23 11:47:06.643477568 +0100
+++ gcc/emit-rtl.h	2017-10-23 11:47:11.274393237 +0100
@@ -369,6 +369,7 @@ extern void set_reg_attrs_for_parm (rtx,
 extern void set_reg_attrs_for_decl_rtl (tree t, rtx x);
 extern void adjust_reg_mode (rtx, machine_mode);
 extern int mem_expr_equal_p (const_tree, const_tree);
+extern rtx gen_int_shift_amount (machine_mode, HOST_WIDE_INT);
 
 extern bool need_atomic_barrier_p (enum memmodel, bool);
 
Index: gcc/emit-rtl.c
===================================================================
--- gcc/emit-rtl.c	2017-10-23 11:47:06.643477568 +0100
+++ gcc/emit-rtl.c	2017-10-23 11:47:11.273428262 +0100
@@ -6478,6 +6478,15 @@ need_atomic_barrier_p (enum memmodel mod
     }
 }
 
+/* Return a constant shift amount for shifting a value of mode MODE
+   by VALUE bits.  */
+
+rtx
+gen_int_shift_amount (machine_mode mode, HOST_WIDE_INT value)
+{
+  return gen_int_mode (value, get_shift_amount_mode (mode));
+}
+
 /* Initialize fields of rtl_data related to stack alignment.  */
 
 void
Index: gcc/asan.c
===================================================================
--- gcc/asan.c	2017-10-23 11:47:06.643477568 +0100
+++ gcc/asan.c	2017-10-23 11:47:11.270533336 +0100
@@ -1388,7 +1388,7 @@ asan_emit_stack_protection (rtx base, rt
   TREE_ASM_WRITTEN (id) = 1;
   emit_move_insn (mem, expand_normal (build_fold_addr_expr (decl)));
   shadow_base = expand_binop (Pmode, lshr_optab, base,
-			      GEN_INT (ASAN_SHADOW_SHIFT),
+			      gen_int_shift_amount (Pmode, ASAN_SHADOW_SHIFT),
 			      NULL_RTX, 1, OPTAB_DIRECT);
   shadow_base
     = plus_constant (Pmode, shadow_base,
Index: gcc/calls.c
===================================================================
--- gcc/calls.c	2017-10-23 11:47:06.643477568 +0100
+++ gcc/calls.c	2017-10-23 11:47:11.270533336 +0100
@@ -2749,15 +2749,17 @@ shift_return_value (machine_mode mode, b
   HOST_WIDE_INT shift;
 
   gcc_assert (REG_P (value) && HARD_REGISTER_P (value));
-  shift = GET_MODE_BITSIZE (GET_MODE (value)) - GET_MODE_BITSIZE (mode);
+  machine_mode value_mode = GET_MODE (value);
+  shift = GET_MODE_BITSIZE (value_mode) - GET_MODE_BITSIZE (mode);
   if (shift == 0)
     return false;
 
   /* Use ashr rather than lshr for right shifts.  This is for the benefit
      of the MIPS port, which requires SImode values to be sign-extended
      when stored in 64-bit registers.  */
-  if (!force_expand_binop (GET_MODE (value), left_p ? ashl_optab : ashr_optab,
-			   value, GEN_INT (shift), value, 1, OPTAB_WIDEN))
+  if (!force_expand_binop (value_mode, left_p ? ashl_optab : ashr_optab,
+			   value, gen_int_shift_amount (value_mode, shift),
+			   value, 1, OPTAB_WIDEN))
     gcc_unreachable ();
   return true;
 }
Index: gcc/cse.c
===================================================================
--- gcc/cse.c	2017-10-23 11:47:03.707058235 +0100
+++ gcc/cse.c	2017-10-23 11:47:11.273428262 +0100
@@ -3611,9 +3611,9 @@ fold_rtx (rtx x, rtx_insn *insn)
 		      || INTVAL (const_arg1) < 0))
 		{
 		  if (SHIFT_COUNT_TRUNCATED)
-		    canon_const_arg1 = GEN_INT (INTVAL (const_arg1)
-						& (GET_MODE_UNIT_BITSIZE (mode)
-						   - 1));
+		    canon_const_arg1 = gen_int_shift_amount
+		      (mode, (INTVAL (const_arg1)
+			      & (GET_MODE_UNIT_BITSIZE (mode) - 1)));
 		  else
 		    break;
 		}
@@ -3660,9 +3660,9 @@ fold_rtx (rtx x, rtx_insn *insn)
 		      || INTVAL (inner_const) < 0))
 		{
 		  if (SHIFT_COUNT_TRUNCATED)
-		    inner_const = GEN_INT (INTVAL (inner_const)
-					   & (GET_MODE_UNIT_BITSIZE (mode)
-					      - 1));
+		    inner_const = gen_int_shift_amount
+		      (mode, (INTVAL (inner_const)
+			      & (GET_MODE_UNIT_BITSIZE (mode) - 1)));
 		  else
 		    break;
 		}
@@ -3692,7 +3692,8 @@ fold_rtx (rtx x, rtx_insn *insn)
 		  /* As an exception, we can turn an ASHIFTRT of this
 		     form into a shift of the number of bits - 1.  */
 		  if (code == ASHIFTRT)
-		    new_const = GEN_INT (GET_MODE_UNIT_BITSIZE (mode) - 1);
+		    new_const = gen_int_shift_amount
+		      (mode, GET_MODE_UNIT_BITSIZE (mode) - 1);
 		  else if (!side_effects_p (XEXP (y, 0)))
 		    return CONST0_RTX (mode);
 		  else
Index: gcc/dse.c
===================================================================
--- gcc/dse.c	2017-10-23 11:47:06.643477568 +0100
+++ gcc/dse.c	2017-10-23 11:47:11.273428262 +0100
@@ -1605,8 +1605,9 @@ find_shift_sequence (int access_size,
 				     store_mode, byte);
 	  if (ret && CONSTANT_P (ret))
 	    {
+	      rtx shift_rtx = gen_int_shift_amount (new_mode, shift);
 	      ret = simplify_const_binary_operation (LSHIFTRT, new_mode,
-						     ret, GEN_INT (shift));
+						     ret, shift_rtx);
 	      if (ret && CONSTANT_P (ret))
 		{
 		  byte = subreg_lowpart_offset (read_mode, new_mode);
@@ -1642,7 +1643,8 @@ find_shift_sequence (int access_size,
 	 of one dsp where the cost of these two was not the same.  But
 	 this really is a rare case anyway.  */
       target = expand_binop (new_mode, lshr_optab, new_reg,
-			     GEN_INT (shift), new_reg, 1, OPTAB_DIRECT);
+			     gen_int_shift_amount (new_mode, shift),
+			     new_reg, 1, OPTAB_DIRECT);
 
       shift_seq = get_insns ();
       end_sequence ();
Index: gcc/expmed.c
===================================================================
--- gcc/expmed.c	2017-10-23 11:47:06.643477568 +0100
+++ gcc/expmed.c	2017-10-23 11:47:11.274393237 +0100
@@ -222,7 +222,8 @@ init_expmed_one_mode (struct init_expmed
 	  PUT_MODE (all->zext, wider_mode);
 	  PUT_MODE (all->wide_mult, wider_mode);
 	  PUT_MODE (all->wide_lshr, wider_mode);
-	  XEXP (all->wide_lshr, 1) = GEN_INT (mode_bitsize);
+	  XEXP (all->wide_lshr, 1)
+	    = gen_int_shift_amount (wider_mode, mode_bitsize);
 
 	  set_mul_widen_cost (speed, wider_mode,
 			      set_src_cost (all->wide_mult, wider_mode, speed));
@@ -908,12 +909,14 @@ store_bit_field_1 (rtx str_rtx, unsigned
 	     to make sure that for big-endian machines the higher order
 	     bits are used.  */
 	  if (new_bitsize < BITS_PER_WORD && BYTES_BIG_ENDIAN && !backwards)
-	    value_word = simplify_expand_binop (word_mode, lshr_optab,
-						value_word,
-						GEN_INT (BITS_PER_WORD
-							 - new_bitsize),
-						NULL_RTX, true,
-						OPTAB_LIB_WIDEN);
+	    {
+	      int shift = BITS_PER_WORD - new_bitsize;
+	      rtx shift_rtx = gen_int_shift_amount (word_mode, shift);
+	      value_word = simplify_expand_binop (word_mode, lshr_optab,
+						  value_word, shift_rtx,
+						  NULL_RTX, true,
+						  OPTAB_LIB_WIDEN);
+	    }
 
 	  if (!store_bit_field_1 (op0, new_bitsize,
 				  bitnum + bit_offset,
@@ -2366,8 +2369,9 @@ expand_shift_1 (enum tree_code code, mac
       if (CONST_INT_P (op1)
 	  && ((unsigned HOST_WIDE_INT) INTVAL (op1) >=
 	      (unsigned HOST_WIDE_INT) GET_MODE_BITSIZE (scalar_mode)))
-	op1 = GEN_INT ((unsigned HOST_WIDE_INT) INTVAL (op1)
-		       % GET_MODE_BITSIZE (scalar_mode));
+	op1 = gen_int_shift_amount (mode,
+				    (unsigned HOST_WIDE_INT) INTVAL (op1)
+				    % GET_MODE_BITSIZE (scalar_mode));
       else if (GET_CODE (op1) == SUBREG
 	       && subreg_lowpart_p (op1)
 	       && SCALAR_INT_MODE_P (GET_MODE (SUBREG_REG (op1)))
@@ -2384,7 +2388,8 @@ expand_shift_1 (enum tree_code code, mac
       && IN_RANGE (INTVAL (op1), GET_MODE_BITSIZE (scalar_mode) / 2 + left,
 		   GET_MODE_BITSIZE (scalar_mode) - 1))
     {
-      op1 = GEN_INT (GET_MODE_BITSIZE (scalar_mode) - INTVAL (op1));
+      op1 = gen_int_shift_amount (mode, (GET_MODE_BITSIZE (scalar_mode)
+					 - INTVAL (op1)));
       left = !left;
       code = left ? LROTATE_EXPR : RROTATE_EXPR;
     }
@@ -2464,8 +2469,8 @@ expand_shift_1 (enum tree_code code, mac
 	      if (op1 == const0_rtx)
 		return shifted;
 	      else if (CONST_INT_P (op1))
-		other_amount = GEN_INT (GET_MODE_BITSIZE (scalar_mode)
-					- INTVAL (op1));
+		other_amount = gen_int_shift_amount
+		  (mode, GET_MODE_BITSIZE (scalar_mode) - INTVAL (op1));
 	      else
 		{
 		  other_amount
@@ -2538,8 +2543,9 @@ expand_shift_1 (enum tree_code code, mac
 expand_shift (enum tree_code code, machine_mode mode, rtx shifted,
 	      int amount, rtx target, int unsignedp)
 {
-  return expand_shift_1 (code, mode,
-			 shifted, GEN_INT (amount), target, unsignedp);
+  return expand_shift_1 (code, mode, shifted,
+			 gen_int_shift_amount (mode, amount),
+			 target, unsignedp);
 }
 
 /* Likewise, but return 0 if that cannot be done.  */
@@ -3855,7 +3861,7 @@ expand_smod_pow2 (scalar_int_mode mode,
 	{
 	  HOST_WIDE_INT masklow = (HOST_WIDE_INT_1 << logd) - 1;
 	  signmask = force_reg (mode, signmask);
-	  shift = GEN_INT (GET_MODE_BITSIZE (mode) - logd);
+	  shift = gen_int_shift_amount (mode, GET_MODE_BITSIZE (mode) - logd);
 
 	  /* Use the rtx_cost of a LSHIFTRT instruction to determine
 	     which instruction sequence to use.  If logical right shifts
Index: gcc/lower-subreg.c
===================================================================
--- gcc/lower-subreg.c	2017-10-23 11:47:06.643477568 +0100
+++ gcc/lower-subreg.c	2017-10-23 11:47:11.274393237 +0100
@@ -129,7 +129,7 @@ shift_cost (bool speed_p, struct cost_rt
   PUT_CODE (rtxes->shift, code);
   PUT_MODE (rtxes->shift, mode);
   PUT_MODE (rtxes->source, mode);
-  XEXP (rtxes->shift, 1) = GEN_INT (op1);
+  XEXP (rtxes->shift, 1) = gen_int_shift_amount (mode, op1);
   return set_src_cost (rtxes->shift, mode, speed_p);
 }
 
Index: gcc/simplify-rtx.c
===================================================================
--- gcc/simplify-rtx.c	2017-10-23 11:47:06.643477568 +0100
+++ gcc/simplify-rtx.c	2017-10-23 11:47:11.277288162 +0100
@@ -1165,7 +1165,8 @@ simplify_unary_operation_1 (enum rtx_cod
 	  if (STORE_FLAG_VALUE == 1)
 	    {
 	      temp = simplify_gen_binary (ASHIFTRT, inner, XEXP (op, 0),
-					  GEN_INT (isize - 1));
+					  gen_int_shift_amount (inner,
+								isize - 1));
 	      if (int_mode == inner)
 		return temp;
 	      if (GET_MODE_PRECISION (int_mode) > isize)
@@ -1175,7 +1176,8 @@ simplify_unary_operation_1 (enum rtx_cod
 	  else if (STORE_FLAG_VALUE == -1)
 	    {
 	      temp = simplify_gen_binary (LSHIFTRT, inner, XEXP (op, 0),
-					  GEN_INT (isize - 1));
+					  gen_int_shift_amount (inner,
+								isize - 1));
 	      if (int_mode == inner)
 		return temp;
 	      if (GET_MODE_PRECISION (int_mode) > isize)
@@ -2679,7 +2681,8 @@ simplify_binary_operation_1 (enum rtx_co
 	{
 	  val = wi::exact_log2 (rtx_mode_t (trueop1, mode));
 	  if (val >= 0)
-	    return simplify_gen_binary (ASHIFT, mode, op0, GEN_INT (val));
+	    return simplify_gen_binary (ASHIFT, mode, op0,
+					gen_int_shift_amount (mode, val));
 	}
 
       /* x*2 is x+x and x*(-1) is -x */
@@ -3303,7 +3306,8 @@ simplify_binary_operation_1 (enum rtx_co
       /* Convert divide by power of two into shift.  */
       if (CONST_INT_P (trueop1)
 	  && (val = exact_log2 (UINTVAL (trueop1))) > 0)
-	return simplify_gen_binary (LSHIFTRT, mode, op0, GEN_INT (val));
+	return simplify_gen_binary (LSHIFTRT, mode, op0,
+				    gen_int_shift_amount (mode, val));
       break;
 
     case DIV:
@@ -3423,10 +3427,12 @@ simplify_binary_operation_1 (enum rtx_co
 	  && IN_RANGE (INTVAL (trueop1),
 		       GET_MODE_UNIT_PRECISION (mode) / 2 + (code == ROTATE),
 		       GET_MODE_UNIT_PRECISION (mode) - 1))
-	return simplify_gen_binary (code == ROTATE ? ROTATERT : ROTATE,
-				    mode, op0,
-				    GEN_INT (GET_MODE_UNIT_PRECISION (mode)
-					     - INTVAL (trueop1)));
+	{
+	  int new_amount = GET_MODE_UNIT_PRECISION (mode) - INTVAL (trueop1);
+	  rtx new_amount_rtx = gen_int_shift_amount (mode, new_amount);
+	  return simplify_gen_binary (code == ROTATE ? ROTATERT : ROTATE,
+				      mode, op0, new_amount_rtx);
+	}
 #endif
       /* FALLTHRU */
     case ASHIFTRT:
@@ -3466,8 +3472,8 @@ simplify_binary_operation_1 (enum rtx_co
 	      == GET_MODE_BITSIZE (inner_mode) - GET_MODE_BITSIZE (int_mode))
 	  && subreg_lowpart_p (op0))
 	{
-	  rtx tmp = GEN_INT (INTVAL (XEXP (SUBREG_REG (op0), 1))
-			     + INTVAL (op1));
+	  rtx tmp = gen_int_shift_amount
+	    (inner_mode, INTVAL (XEXP (SUBREG_REG (op0), 1)) + INTVAL (op1));
 	  tmp = simplify_gen_binary (code, inner_mode,
 				     XEXP (SUBREG_REG (op0), 0),
 				     tmp);
@@ -3478,7 +3484,8 @@ simplify_binary_operation_1 (enum rtx_co
 	{
 	  val = INTVAL (op1) & (GET_MODE_UNIT_PRECISION (mode) - 1);
 	  if (val != INTVAL (op1))
-	    return simplify_gen_binary (code, mode, op0, GEN_INT (val));
+	    return simplify_gen_binary (code, mode, op0,
+					gen_int_shift_amount (mode, val));
 	}
       break;
 
Index: gcc/combine.c
===================================================================
--- gcc/combine.c	2017-10-23 11:47:06.643477568 +0100
+++ gcc/combine.c	2017-10-23 11:47:11.272463287 +0100
@@ -3773,8 +3773,9 @@ try_combine (rtx_insn *i3, rtx_insn *i2,
 	      && INTVAL (XEXP (*split, 1)) > 0
 	      && (i = exact_log2 (UINTVAL (XEXP (*split, 1)))) >= 0)
 	    {
+	      rtx i_rtx = gen_int_shift_amount (split_mode, i);
 	      SUBST (*split, gen_rtx_ASHIFT (split_mode,
-					     XEXP (*split, 0), GEN_INT (i)));
+					     XEXP (*split, 0), i_rtx));
 	      /* Update split_code because we may not have a multiply
 		 anymore.  */
 	      split_code = GET_CODE (*split);
@@ -3788,8 +3789,10 @@ try_combine (rtx_insn *i3, rtx_insn *i2,
 	      && (i = exact_log2 (UINTVAL (XEXP (XEXP (*split, 0), 1)))) >= 0)
 	    {
 	      rtx nsplit = XEXP (*split, 0);
+	      rtx i_rtx = gen_int_shift_amount (GET_MODE (nsplit), i);
 	      SUBST (XEXP (*split, 0), gen_rtx_ASHIFT (GET_MODE (nsplit),
-					     XEXP (nsplit, 0), GEN_INT (i)));
+						       XEXP (nsplit, 0),
+						       i_rtx));
 	      /* Update split_code because we may not have a multiply
 		 anymore.  */
 	      split_code = GET_CODE (*split);
@@ -5057,12 +5060,12 @@ find_split_point (rtx *loc, rtx_insn *in
 				      GET_MODE (XEXP (SET_SRC (x), 0))))))
 	    {
 	      machine_mode mode = GET_MODE (XEXP (SET_SRC (x), 0));
-
+	      rtx pos_rtx = gen_int_shift_amount (mode, pos);
 	      SUBST (SET_SRC (x),
 		     gen_rtx_NEG (mode,
 				  gen_rtx_LSHIFTRT (mode,
 						    XEXP (SET_SRC (x), 0),
-						    GEN_INT (pos))));
+						    pos_rtx)));
 
 	      split = find_split_point (&SET_SRC (x), insn, true);
 	      if (split && split != &SET_SRC (x))
@@ -5120,11 +5123,11 @@ find_split_point (rtx *loc, rtx_insn *in
 	    {
 	      unsigned HOST_WIDE_INT mask
 		= (HOST_WIDE_INT_1U << len) - 1;
+	      rtx pos_rtx = gen_int_shift_amount (mode, pos);
 	      SUBST (SET_SRC (x),
 		     gen_rtx_AND (mode,
 				  gen_rtx_LSHIFTRT
-				  (mode, gen_lowpart (mode, inner),
-				   GEN_INT (pos)),
+				  (mode, gen_lowpart (mode, inner), pos_rtx),
 				  gen_int_mode (mask, mode)));
 
 	      split = find_split_point (&SET_SRC (x), insn, true);
@@ -5133,14 +5136,15 @@ find_split_point (rtx *loc, rtx_insn *in
 	    }
 	  else
 	    {
+	      int left_bits = GET_MODE_PRECISION (mode) - len - pos;
+	      int right_bits = GET_MODE_PRECISION (mode) - len;
 	      SUBST (SET_SRC (x),
 		     gen_rtx_fmt_ee
 		     (unsignedp ? LSHIFTRT : ASHIFTRT, mode,
 		      gen_rtx_ASHIFT (mode,
 				      gen_lowpart (mode, inner),
-				      GEN_INT (GET_MODE_PRECISION (mode)
-					       - len - pos)),
-		      GEN_INT (GET_MODE_PRECISION (mode) - len)));
+				      gen_int_shift_amount (mode, left_bits)),
+		      gen_int_shift_amount (mode, right_bits)));
 
 	      split = find_split_point (&SET_SRC (x), insn, true);
 	      if (split && split != &SET_SRC (x))
@@ -8915,10 +8919,11 @@ force_int_to_mode (rtx x, scalar_int_mod
 	  /* Must be more sign bit copies than the mask needs.  */
 	  && ((int) num_sign_bit_copies (XEXP (x, 0), GET_MODE (XEXP (x, 0)))
 	      >= exact_log2 (mask + 1)))
-	x = simplify_gen_binary (LSHIFTRT, xmode, XEXP (x, 0),
-				 GEN_INT (GET_MODE_PRECISION (xmode)
-					  - exact_log2 (mask + 1)));
-
+	{
+	  int nbits = GET_MODE_PRECISION (xmode) - exact_log2 (mask + 1);
+	  x = simplify_gen_binary (LSHIFTRT, xmode, XEXP (x, 0),
+				   gen_int_shift_amount (xmode, nbits));
+	}
       goto shiftrt;
 
     case ASHIFTRT:
@@ -10415,7 +10420,7 @@ simplify_shift_const_1 (enum rtx_code co
 {
   enum rtx_code orig_code = code;
   rtx orig_varop = varop;
-  int count;
+  int count, log2;
   machine_mode mode = result_mode;
   machine_mode shift_mode;
   scalar_int_mode tmode, inner_mode, int_mode, int_varop_mode, int_result_mode;
@@ -10618,13 +10623,11 @@ simplify_shift_const_1 (enum rtx_code co
 	     is cheaper.  But it is still better on those machines to
 	     merge two shifts into one.  */
 	  if (CONST_INT_P (XEXP (varop, 1))
-	      && exact_log2 (UINTVAL (XEXP (varop, 1))) >= 0)
+	      && (log2 = exact_log2 (UINTVAL (XEXP (varop, 1)))) >= 0)
 	    {
-	      varop
-		= simplify_gen_binary (ASHIFT, GET_MODE (varop),
-				       XEXP (varop, 0),
-				       GEN_INT (exact_log2 (
-						UINTVAL (XEXP (varop, 1)))));
+	      rtx log2_rtx = gen_int_shift_amount (GET_MODE (varop), log2);
+	      varop = simplify_gen_binary (ASHIFT, GET_MODE (varop),
+					   XEXP (varop, 0), log2_rtx);
 	      continue;
 	    }
 	  break;
@@ -10632,13 +10635,11 @@ simplify_shift_const_1 (enum rtx_code co
 	case UDIV:
 	  /* Similar, for when divides are cheaper.  */
 	  if (CONST_INT_P (XEXP (varop, 1))
-	      && exact_log2 (UINTVAL (XEXP (varop, 1))) >= 0)
+	      && (log2 = exact_log2 (UINTVAL (XEXP (varop, 1)))) >= 0)
 	    {
-	      varop
-		= simplify_gen_binary (LSHIFTRT, GET_MODE (varop),
-				       XEXP (varop, 0),
-				       GEN_INT (exact_log2 (
-						UINTVAL (XEXP (varop, 1)))));
+	      rtx log2_rtx = gen_int_shift_amount (GET_MODE (varop), log2);
+	      varop = simplify_gen_binary (LSHIFTRT, GET_MODE (varop),
+					   XEXP (varop, 0), log2_rtx);
 	      continue;
 	    }
 	  break;
@@ -10773,10 +10774,10 @@ simplify_shift_const_1 (enum rtx_code co
 
 	      mask_rtx = gen_int_mode (nonzero_bits (varop, int_varop_mode),
 				       int_result_mode);
-
+	      rtx count_rtx = gen_int_shift_amount (int_result_mode, count);
 	      mask_rtx
 		= simplify_const_binary_operation (code, int_result_mode,
-						   mask_rtx, GEN_INT (count));
+						   mask_rtx, count_rtx);
 
 	      /* Give up if we can't compute an outer operation to use.  */
 	      if (mask_rtx == 0
@@ -10832,9 +10833,10 @@ simplify_shift_const_1 (enum rtx_code co
 	      if (code == ASHIFTRT && int_mode != int_result_mode)
 		break;
 
+	      rtx count_rtx = gen_int_shift_amount (int_result_mode, count);
 	      rtx new_rtx = simplify_const_binary_operation (code, int_mode,
 							     XEXP (varop, 0),
-							     GEN_INT (count));
+							     count_rtx);
 	      varop = gen_rtx_fmt_ee (code, int_mode, new_rtx, XEXP (varop, 1));
 	      count = 0;
 	      continue;
@@ -10900,7 +10902,7 @@ simplify_shift_const_1 (enum rtx_code co
 	      && (new_rtx = simplify_const_binary_operation
 		  (code, int_result_mode,
 		   gen_int_mode (INTVAL (XEXP (varop, 1)), int_result_mode),
-		   GEN_INT (count))) != 0
+		   gen_int_shift_amount (int_result_mode, count))) != 0
 	      && CONST_INT_P (new_rtx)
 	      && merge_outer_ops (&outer_op, &outer_const, GET_CODE (varop),
 				  INTVAL (new_rtx), int_result_mode,
@@ -11043,7 +11045,7 @@ simplify_shift_const_1 (enum rtx_code co
 	      && (new_rtx = simplify_const_binary_operation
 		  (ASHIFT, int_result_mode,
 		   gen_int_mode (INTVAL (XEXP (varop, 1)), int_result_mode),
-		   GEN_INT (count))) != 0
+		   gen_int_shift_amount (int_result_mode, count))) != 0
 	      && CONST_INT_P (new_rtx)
 	      && merge_outer_ops (&outer_op, &outer_const, PLUS,
 				  INTVAL (new_rtx), int_result_mode,
@@ -11064,7 +11066,7 @@ simplify_shift_const_1 (enum rtx_code co
 	      && (new_rtx = simplify_const_binary_operation
 		  (code, int_result_mode,
 		   gen_int_mode (INTVAL (XEXP (varop, 1)), int_result_mode),
-		   GEN_INT (count))) != 0
+		   gen_int_shift_amount (int_result_mode, count))) != 0
 	      && CONST_INT_P (new_rtx)
 	      && merge_outer_ops (&outer_op, &outer_const, XOR,
 				  INTVAL (new_rtx), int_result_mode,
@@ -11119,12 +11121,12 @@ simplify_shift_const_1 (enum rtx_code co
 		      - GET_MODE_UNIT_PRECISION (GET_MODE (varop)))))
 	    {
 	      rtx varop_inner = XEXP (varop, 0);
-
-	      varop_inner
-		= gen_rtx_LSHIFTRT (GET_MODE (varop_inner),
-				    XEXP (varop_inner, 0),
-				    GEN_INT
-				    (count + INTVAL (XEXP (varop_inner, 1))));
+	      int new_count = count + INTVAL (XEXP (varop_inner, 1));
+	      rtx new_count_rtx = gen_int_shift_amount (GET_MODE (varop_inner),
+							new_count);
+	      varop_inner = gen_rtx_LSHIFTRT (GET_MODE (varop_inner),
+					      XEXP (varop_inner, 0),
+					      new_count_rtx);
 	      varop = gen_rtx_TRUNCATE (GET_MODE (varop), varop_inner);
 	      count = 0;
 	      continue;
@@ -11176,7 +11178,8 @@ simplify_shift_const_1 (enum rtx_code co
     x = NULL_RTX;
 
   if (x == NULL_RTX)
-    x = simplify_gen_binary (code, shift_mode, varop, GEN_INT (count));
+    x = simplify_gen_binary (code, shift_mode, varop,
+			     gen_int_shift_amount (shift_mode, count));
 
   /* If we were doing an LSHIFTRT in a wider mode than it was originally,
      turn off all the bits that the shift would have turned off.  */
@@ -11238,7 +11241,8 @@ simplify_shift_const (rtx x, enum rtx_co
     return tem;
 
   if (!x)
-    x = simplify_gen_binary (code, GET_MODE (varop), varop, GEN_INT (count));
+    x = simplify_gen_binary (code, GET_MODE (varop), varop,
+			     gen_int_shift_amount (GET_MODE (varop), count));
   if (GET_MODE (x) != result_mode)
     x = gen_lowpart (result_mode, x);
   return x;
@@ -11429,8 +11433,9 @@ change_zero_ext (rtx pat)
 	  if (BITS_BIG_ENDIAN)
 	    start = GET_MODE_PRECISION (inner_mode) - size - start;
 
-	  if (start)
-	    x = gen_rtx_LSHIFTRT (inner_mode, XEXP (x, 0), GEN_INT (start));
+	  if (start != 0)
+	    x = gen_rtx_LSHIFTRT (inner_mode, XEXP (x, 0),
+				  gen_int_shift_amount (inner_mode, start));
 	  else
 	    x = XEXP (x, 0);
 	  if (mode != inner_mode)
Index: gcc/optabs.c
===================================================================
--- gcc/optabs.c	2017-10-23 11:47:06.643477568 +0100
+++ gcc/optabs.c	2017-10-23 11:47:11.276323187 +0100
@@ -431,8 +431,9 @@ expand_superword_shift (optab binoptab,
       if (binoptab != ashr_optab)
 	emit_move_insn (outof_target, CONST0_RTX (word_mode));
       else
-	if (!force_expand_binop (word_mode, binoptab,
-				 outof_input, GEN_INT (BITS_PER_WORD - 1),
+	if (!force_expand_binop (word_mode, binoptab, outof_input,
+				 gen_int_shift_amount (word_mode,
+						       BITS_PER_WORD - 1),
 				 outof_target, unsignedp, methods))
 	  return false;
     }
@@ -789,7 +790,8 @@ expand_doubleword_mult (machine_mode mod
 {
   int low = (WORDS_BIG_ENDIAN ? 1 : 0);
   int high = (WORDS_BIG_ENDIAN ? 0 : 1);
-  rtx wordm1 = umulp ? NULL_RTX : GEN_INT (BITS_PER_WORD - 1);
+  rtx wordm1 = (umulp ? NULL_RTX
+		: gen_int_shift_amount (word_mode, BITS_PER_WORD - 1));
   rtx product, adjust, product_high, temp;
 
   rtx op0_high = operand_subword_force (op0, high, mode);
@@ -1185,7 +1187,7 @@ expand_binop (machine_mode mode, optab b
       unsigned int bits = GET_MODE_PRECISION (int_mode);
 
       if (CONST_INT_P (op1))
-        newop1 = GEN_INT (bits - INTVAL (op1));
+	newop1 = gen_int_shift_amount (int_mode, bits - INTVAL (op1));
       else if (targetm.shift_truncation_mask (int_mode) == bits - 1)
         newop1 = negate_rtx (GET_MODE (op1), op1);
       else
@@ -1399,11 +1401,11 @@ expand_binop (machine_mode mode, optab b
       shift_mask = targetm.shift_truncation_mask (word_mode);
       op1_mode = (GET_MODE (op1) != VOIDmode
 		  ? as_a <scalar_int_mode> (GET_MODE (op1))
-		  : word_mode);
+		  : get_shift_amount_mode (word_mode));
 
       /* Apply the truncation to constant shifts.  */
       if (double_shift_mask > 0 && CONST_INT_P (op1))
-	op1 = GEN_INT (INTVAL (op1) & double_shift_mask);
+	op1 = gen_int_mode (INTVAL (op1) & double_shift_mask, op1_mode);
 
       if (op1 == CONST0_RTX (op1_mode))
 	return op0;
@@ -1513,7 +1515,7 @@ expand_binop (machine_mode mode, optab b
       else
 	{
 	  rtx into_temp1, into_temp2, outof_temp1, outof_temp2;
-	  rtx first_shift_count, second_shift_count;
+	  HOST_WIDE_INT first_shift_count, second_shift_count;
 	  optab reverse_unsigned_shift, unsigned_shift;
 
 	  reverse_unsigned_shift = (left_shift ^ (shift_count < BITS_PER_WORD)
@@ -1524,20 +1526,24 @@ expand_binop (machine_mode mode, optab b
 
 	  if (shift_count > BITS_PER_WORD)
 	    {
-	      first_shift_count = GEN_INT (shift_count - BITS_PER_WORD);
-	      second_shift_count = GEN_INT (2 * BITS_PER_WORD - shift_count);
+	      first_shift_count = shift_count - BITS_PER_WORD;
+	      second_shift_count = 2 * BITS_PER_WORD - shift_count;
 	    }
 	  else
 	    {
-	      first_shift_count = GEN_INT (BITS_PER_WORD - shift_count);
-	      second_shift_count = GEN_INT (shift_count);
+	      first_shift_count = BITS_PER_WORD - shift_count;
+	      second_shift_count = shift_count;
 	    }
+	  rtx first_shift_count_rtx
+	    = gen_int_shift_amount (word_mode, first_shift_count);
+	  rtx second_shift_count_rtx
+	    = gen_int_shift_amount (word_mode, second_shift_count);
 
 	  into_temp1 = expand_binop (word_mode, unsigned_shift,
-				     outof_input, first_shift_count,
+				     outof_input, first_shift_count_rtx,
 				     NULL_RTX, unsignedp, next_methods);
 	  into_temp2 = expand_binop (word_mode, reverse_unsigned_shift,
-				     into_input, second_shift_count,
+				     into_input, second_shift_count_rtx,
 				     NULL_RTX, unsignedp, next_methods);
 
 	  if (into_temp1 != 0 && into_temp2 != 0)
@@ -1550,10 +1556,10 @@ expand_binop (machine_mode mode, optab b
 	    emit_move_insn (into_target, inter);
 
 	  outof_temp1 = expand_binop (word_mode, unsigned_shift,
-				      into_input, first_shift_count,
+				      into_input, first_shift_count_rtx,
 				      NULL_RTX, unsignedp, next_methods);
 	  outof_temp2 = expand_binop (word_mode, reverse_unsigned_shift,
-				      outof_input, second_shift_count,
+				      outof_input, second_shift_count_rtx,
 				      NULL_RTX, unsignedp, next_methods);
 
 	  if (inter != 0 && outof_temp1 != 0 && outof_temp2 != 0)
@@ -2793,25 +2799,29 @@ expand_unop (machine_mode mode, optab un
 
 	  if (optab_handler (rotl_optab, mode) != CODE_FOR_nothing)
 	    {
-	      temp = expand_binop (mode, rotl_optab, op0, GEN_INT (8), target,
-				   unsignedp, OPTAB_DIRECT);
+	      temp = expand_binop (mode, rotl_optab, op0,
+				   gen_int_shift_amount (mode, 8),
+				   target, unsignedp, OPTAB_DIRECT);
 	      if (temp)
 		return temp;
 	     }
 
 	  if (optab_handler (rotr_optab, mode) != CODE_FOR_nothing)
 	    {
-	      temp = expand_binop (mode, rotr_optab, op0, GEN_INT (8), target,
-				   unsignedp, OPTAB_DIRECT);
+	      temp = expand_binop (mode, rotr_optab, op0,
+				   gen_int_shift_amount (mode, 8),
+				   target, unsignedp, OPTAB_DIRECT);
 	      if (temp)
 		return temp;
 	    }
 
 	  last = get_last_insn ();
 
-	  temp1 = expand_binop (mode, ashl_optab, op0, GEN_INT (8), NULL_RTX,
+	  temp1 = expand_binop (mode, ashl_optab, op0,
+				gen_int_shift_amount (mode, 8), NULL_RTX,
 			        unsignedp, OPTAB_WIDEN);
-	  temp2 = expand_binop (mode, lshr_optab, op0, GEN_INT (8), NULL_RTX,
+	  temp2 = expand_binop (mode, lshr_optab, op0,
+				gen_int_shift_amount (mode, 8), NULL_RTX,
 			        unsignedp, OPTAB_WIDEN);
 	  if (temp1 && temp2)
 	    {
@@ -5369,11 +5379,11 @@ vector_compare_rtx (machine_mode cmp_mod
 }
 
 /* Checks if vec_perm mask SEL is a constant equivalent to a shift of the first
-   vec_perm operand, assuming the second operand is a constant vector of zeroes.
-   Return the shift distance in bits if so, or NULL_RTX if the vec_perm is not a
-   shift.  */
+   vec_perm operand (which has mode OP0_MODE), assuming the second
+   operand is a constant vector of zeroes.  Return the shift distance in
+   bits if so, or NULL_RTX if the vec_perm is not a shift.  */
 static rtx
-shift_amt_for_vec_perm_mask (rtx sel)
+shift_amt_for_vec_perm_mask (machine_mode op0_mode, rtx sel)
 {
   unsigned int i, first, nelt = GET_MODE_NUNITS (GET_MODE (sel));
   unsigned int bitsize = GET_MODE_UNIT_BITSIZE (GET_MODE (sel));
@@ -5393,7 +5403,7 @@ shift_amt_for_vec_perm_mask (rtx sel)
 	return NULL_RTX;
     }
 
-  return GEN_INT (first * bitsize);
+  return gen_int_shift_amount (op0_mode, first * bitsize);
 }
 
 /* A subroutine of expand_vec_perm for expanding one vec_perm insn.  */
@@ -5473,7 +5483,7 @@ expand_vec_perm (machine_mode mode, rtx
 	  && (shift_code != CODE_FOR_nothing
 	      || shift_code_qi != CODE_FOR_nothing))
 	{
-	  shift_amt = shift_amt_for_vec_perm_mask (sel);
+	  shift_amt = shift_amt_for_vec_perm_mask (mode, sel);
 	  if (shift_amt)
 	    {
 	      struct expand_operand ops[3];
@@ -5563,7 +5573,8 @@ expand_vec_perm (machine_mode mode, rtx
 				   NULL, 0, OPTAB_DIRECT);
       else
 	sel = expand_simple_binop (selmode, ASHIFT, sel,
-				   GEN_INT (exact_log2 (u)),
+				   gen_int_shift_amount (selmode,
+							 exact_log2 (u)),
 				   NULL, 0, OPTAB_DIRECT);
       gcc_assert (sel != NULL);
 

^ permalink raw reply	[flat|nested] 90+ messages in thread

* [16/nn] Factor out the mode handling in lower-subreg.c
  2017-10-23 11:16 [00/nn] Patches preparing for runtime offsets and sizes Richard Sandiford
                   ` (13 preceding siblings ...)
  2017-10-23 11:26 ` [14/nn] Add helpers for shift count modes Richard Sandiford
@ 2017-10-23 11:27 ` Richard Sandiford
  2017-10-26 12:09   ` Richard Biener
  2017-10-23 11:27 ` [15/nn] Use more specific hash functions in rtlhash.c Richard Sandiford
                   ` (6 subsequent siblings)
  21 siblings, 1 reply; 90+ messages in thread
From: Richard Sandiford @ 2017-10-23 11:27 UTC (permalink / raw)
  To: gcc-patches

This patch adds a helper routine (interesting_mode_p) to lower-subreg.c
to decide whether a mode can be split and, if so, to calculate the
number of bytes and words in the mode.  At present this function always
returns true; a later patch will add cases in which it can return
false.
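
As a usage sketch, the typical call-site pattern after this patch
(modelled on the resolve_clobber hunk below) is:

  unsigned int size, words;
  if (!interesting_mode_p (GET_MODE (reg), &size, &words))
    gcc_unreachable ();

with callers that can cope with uninteresting modes testing the return
value and skipping the mode instead of asserting.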


2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* lower-subreg.c (interesting_mode_p): New function.
	(compute_costs, find_decomposable_subregs, decompose_register)
	(simplify_subreg_concatn, can_decompose_p, resolve_simple_move)
	(resolve_clobber, dump_choices): Use it.

Index: gcc/lower-subreg.c
===================================================================
--- gcc/lower-subreg.c	2017-10-23 11:47:11.274393237 +0100
+++ gcc/lower-subreg.c	2017-10-23 11:47:23.555013148 +0100
@@ -103,6 +103,18 @@ #define twice_word_mode \
 #define choices \
   this_target_lower_subreg->x_choices
 
+/* Return true if MODE is a mode we know how to lower.  When returning true,
+   store its byte size in *BYTES and its word size in *WORDS.  */
+
+static inline bool
+interesting_mode_p (machine_mode mode, unsigned int *bytes,
+		    unsigned int *words)
+{
+  *bytes = GET_MODE_SIZE (mode);
+  *words = CEIL (*bytes, UNITS_PER_WORD);
+  return true;
+}
+
 /* RTXes used while computing costs.  */
 struct cost_rtxes {
   /* Source and target registers.  */
@@ -199,10 +211,10 @@ compute_costs (bool speed_p, struct cost
   for (i = 0; i < MAX_MACHINE_MODE; i++)
     {
       machine_mode mode = (machine_mode) i;
-      int factor = GET_MODE_SIZE (mode) / UNITS_PER_WORD;
-      if (factor > 1)
+      unsigned int size, factor;
+      if (interesting_mode_p (mode, &size, &factor) && factor > 1)
 	{
-	  int mode_move_cost;
+	  unsigned int mode_move_cost;
 
 	  PUT_MODE (rtxes->target, mode);
 	  PUT_MODE (rtxes->source, mode);
@@ -469,10 +481,10 @@ find_decomposable_subregs (rtx *loc, enu
 	      continue;
 	    }
 
-	  outer_size = GET_MODE_SIZE (GET_MODE (x));
-	  inner_size = GET_MODE_SIZE (GET_MODE (inner));
-	  outer_words = (outer_size + UNITS_PER_WORD - 1) / UNITS_PER_WORD;
-	  inner_words = (inner_size + UNITS_PER_WORD - 1) / UNITS_PER_WORD;
+	  if (!interesting_mode_p (GET_MODE (x), &outer_size, &outer_words)
+	      || !interesting_mode_p (GET_MODE (inner), &inner_size,
+				      &inner_words))
+	    continue;
 
 	  /* We only try to decompose single word subregs of multi-word
 	     registers.  When we find one, we return -1 to avoid iterating
@@ -507,7 +519,7 @@ find_decomposable_subregs (rtx *loc, enu
 	}
       else if (REG_P (x))
 	{
-	  unsigned int regno;
+	  unsigned int regno, size, words;
 
 	  /* We will see an outer SUBREG before we see the inner REG, so
 	     when we see a plain REG here it means a direct reference to
@@ -527,7 +539,8 @@ find_decomposable_subregs (rtx *loc, enu
 
 	  regno = REGNO (x);
 	  if (!HARD_REGISTER_NUM_P (regno)
-	      && GET_MODE_SIZE (GET_MODE (x)) > UNITS_PER_WORD)
+	      && interesting_mode_p (GET_MODE (x), &size, &words)
+	      && words > 1)
 	    {
 	      switch (*pcmi)
 		{
@@ -567,15 +580,15 @@ find_decomposable_subregs (rtx *loc, enu
 decompose_register (unsigned int regno)
 {
   rtx reg;
-  unsigned int words, i;
+  unsigned int size, words, i;
   rtvec v;
 
   reg = regno_reg_rtx[regno];
 
   regno_reg_rtx[regno] = NULL_RTX;
 
-  words = GET_MODE_SIZE (GET_MODE (reg));
-  words = (words + UNITS_PER_WORD - 1) / UNITS_PER_WORD;
+  if (!interesting_mode_p (GET_MODE (reg), &size, &words))
+    gcc_unreachable ();
 
   v = rtvec_alloc (words);
   for (i = 0; i < words; ++i)
@@ -599,25 +612,29 @@ decompose_register (unsigned int regno)
 simplify_subreg_concatn (machine_mode outermode, rtx op,
 			 unsigned int byte)
 {
-  unsigned int inner_size;
+  unsigned int outer_size, outer_words, inner_size, inner_words;
   machine_mode innermode, partmode;
   rtx part;
   unsigned int final_offset;
 
+  innermode = GET_MODE (op);
+  if (!interesting_mode_p (outermode, &outer_size, &outer_words)
+      || !interesting_mode_p (innermode, &inner_size, &inner_words))
+    gcc_unreachable ();
+
   gcc_assert (GET_CODE (op) == CONCATN);
-  gcc_assert (byte % GET_MODE_SIZE (outermode) == 0);
+  gcc_assert (byte % outer_size == 0);
 
-  innermode = GET_MODE (op);
-  gcc_assert (byte < GET_MODE_SIZE (innermode));
-  if (GET_MODE_SIZE (outermode) > GET_MODE_SIZE (innermode))
+  gcc_assert (byte < inner_size);
+  if (outer_size > inner_size)
     return NULL_RTX;
 
-  inner_size = GET_MODE_SIZE (innermode) / XVECLEN (op, 0);
+  inner_size /= XVECLEN (op, 0);
   part = XVECEXP (op, 0, byte / inner_size);
   partmode = GET_MODE (part);
 
   final_offset = byte % inner_size;
-  if (final_offset + GET_MODE_SIZE (outermode) > inner_size)
+  if (final_offset + outer_size > inner_size)
     return NULL_RTX;
 
   /* VECTOR_CSTs in debug expressions are expanded into CONCATN instead of
@@ -801,9 +818,10 @@ can_decompose_p (rtx x)
 
       if (HARD_REGISTER_NUM_P (regno))
 	{
-	  unsigned int byte, num_bytes;
+	  unsigned int byte, num_bytes, num_words;
 
-	  num_bytes = GET_MODE_SIZE (GET_MODE (x));
+	  if (!interesting_mode_p (GET_MODE (x), &num_bytes, &num_words))
+	    return false;
 	  for (byte = 0; byte < num_bytes; byte += UNITS_PER_WORD)
 	    if (simplify_subreg_regno (regno, GET_MODE (x), byte, word_mode) < 0)
 	      return false;
@@ -826,14 +844,15 @@ resolve_simple_move (rtx set, rtx_insn *
   rtx src, dest, real_dest;
   rtx_insn *insns;
   machine_mode orig_mode, dest_mode;
-  unsigned int words;
+  unsigned int orig_size, words;
   bool pushing;
 
   src = SET_SRC (set);
   dest = SET_DEST (set);
   orig_mode = GET_MODE (dest);
 
-  words = (GET_MODE_SIZE (orig_mode) + UNITS_PER_WORD - 1) / UNITS_PER_WORD;
+  if (!interesting_mode_p (orig_mode, &orig_size, &words))
+    gcc_unreachable ();
   gcc_assert (words > 1);
 
   start_sequence ();
@@ -964,7 +983,7 @@ resolve_simple_move (rtx set, rtx_insn *
     {
       unsigned int i, j, jinc;
 
-      gcc_assert (GET_MODE_SIZE (orig_mode) % UNITS_PER_WORD == 0);
+      gcc_assert (orig_size % UNITS_PER_WORD == 0);
       gcc_assert (GET_CODE (XEXP (dest, 0)) != PRE_MODIFY);
       gcc_assert (GET_CODE (XEXP (dest, 0)) != POST_MODIFY);
 
@@ -1059,7 +1078,7 @@ resolve_clobber (rtx pat, rtx_insn *insn
 {
   rtx reg;
   machine_mode orig_mode;
-  unsigned int words, i;
+  unsigned int orig_size, words, i;
   int ret;
 
   reg = XEXP (pat, 0);
@@ -1067,8 +1086,8 @@ resolve_clobber (rtx pat, rtx_insn *insn
     return false;
 
   orig_mode = GET_MODE (reg);
-  words = GET_MODE_SIZE (orig_mode);
-  words = (words + UNITS_PER_WORD - 1) / UNITS_PER_WORD;
+  if (!interesting_mode_p (orig_mode, &orig_size, &words))
+    gcc_unreachable ();
 
   ret = validate_change (NULL_RTX, &XEXP (pat, 0),
 			 simplify_gen_subreg_concatn (word_mode, reg,
@@ -1332,12 +1351,13 @@ dump_shift_choices (enum rtx_code code,
 static void
 dump_choices (bool speed_p, const char *description)
 {
-  unsigned int i;
+  unsigned int size, factor, i;
 
   fprintf (dump_file, "Choices when optimizing for %s:\n", description);
 
   for (i = 0; i < MAX_MACHINE_MODE; i++)
-    if (GET_MODE_SIZE ((machine_mode) i) > UNITS_PER_WORD)
+    if (interesting_mode_p ((machine_mode) i, &size, &factor)
+	&& factor > 1)
       fprintf (dump_file, "  %s mode %s for copy lowering.\n",
 	       choices[speed_p].move_modes_to_split[i]
 	       ? "Splitting"

^ permalink raw reply	[flat|nested] 90+ messages in thread

* [15/nn] Use more specific hash functions in rtlhash.c
  2017-10-23 11:16 [00/nn] Patches preparing for runtime offsets and sizes Richard Sandiford
                   ` (14 preceding siblings ...)
  2017-10-23 11:27 ` [16/nn] Factor out the mode handling in lower-subreg.c Richard Sandiford
@ 2017-10-23 11:27 ` Richard Sandiford
  2017-10-26 12:08   ` Richard Biener
  2017-10-23 11:28 ` [17/nn] Turn var-tracking.c:INT_MEM_OFFSET into a function Richard Sandiford
                   ` (5 subsequent siblings)
  21 siblings, 1 reply; 90+ messages in thread
From: Richard Sandiford @ 2017-10-23 11:27 UTC (permalink / raw)
  To: gcc-patches

Avoid using add_object when we have more specific routines available.
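
For reference, a minimal sketch of the resulting usage, assuming the
usual inchash::hash interface from inchash.h:

  inchash::hash hstate;
  hstate.add_hwi (XWINT (x, i));   /* 'w' fields.  */
  hstate.add_int (XINT (x, i));    /* 'i' and 'n' fields.  */
  hashval_t result = hstate.end ();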


2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* rtlhash.c (add_rtx): Use add_hwi for 'w' and add_int for 'i'.

Index: gcc/rtlhash.c
===================================================================
--- gcc/rtlhash.c	2017-02-23 19:54:03.000000000 +0000
+++ gcc/rtlhash.c	2017-10-23 11:47:20.120201389 +0100
@@ -77,11 +77,11 @@ add_rtx (const_rtx x, hash &hstate)
     switch (fmt[i])
       {
       case 'w':
-	hstate.add_object (XWINT (x, i));
+	hstate.add_hwi (XWINT (x, i));
 	break;
       case 'n':
       case 'i':
-	hstate.add_object (XINT (x, i));
+	hstate.add_int (XINT (x, i));
 	break;
       case 'V':
       case 'E':

^ permalink raw reply	[flat|nested] 90+ messages in thread

* [17/nn] Turn var-tracking.c:INT_MEM_OFFSET into a function
  2017-10-23 11:16 [00/nn] Patches preparing for runtime offsets and sizes Richard Sandiford
                   ` (15 preceding siblings ...)
  2017-10-23 11:27 ` [15/nn] Use more specific hash functions in rtlhash.c Richard Sandiford
@ 2017-10-23 11:28 ` Richard Sandiford
  2017-10-26 12:10   ` Richard Biener
  2017-10-23 11:29 ` [19/nn] Don't treat zero-sized ranges as overlapping Richard Sandiford
                   ` (4 subsequent siblings)
  21 siblings, 1 reply; 90+ messages in thread
From: Richard Sandiford @ 2017-10-23 11:28 UTC (permalink / raw)
  To: gcc-patches

This avoids the double evaluation mentioned in the macro's comment and
simplifies the later change that makes MEM_OFFSET variable.
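
As a purely hypothetical illustration of the hazard, the old macro
evaluated its argument twice, so a side-effecting argument would run
twice (next_mem below is illustrative, not a real function):

  /* Expands to
     (MEM_OFFSET_KNOWN_P (next_mem ()) ? MEM_OFFSET (next_mem ()) : 0),
     so next_mem () is called twice.  */
  HOST_WIDE_INT off = INT_MEM_OFFSET (next_mem ());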


2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* var-tracking.c (INT_MEM_OFFSET): Replace with...
	(int_mem_offset): ...this new function.
	(var_mem_set, var_mem_delete_and_set, var_mem_delete)
	(find_mem_expr_in_1pdv, dataflow_set_preserve_mem_locs)
	(same_variable_part_p, use_type, add_stores, vt_get_decl_and_offset):
	Update accordingly.

Index: gcc/var-tracking.c
===================================================================
--- gcc/var-tracking.c	2017-09-12 14:28:56.401824826 +0100
+++ gcc/var-tracking.c	2017-10-23 11:47:27.197231712 +0100
@@ -390,8 +390,15 @@ struct variable
 /* Pointer to the BB's information specific to variable tracking pass.  */
 #define VTI(BB) ((variable_tracking_info *) (BB)->aux)
 
-/* Macro to access MEM_OFFSET as an HOST_WIDE_INT.  Evaluates MEM twice.  */
-#define INT_MEM_OFFSET(mem) (MEM_OFFSET_KNOWN_P (mem) ? MEM_OFFSET (mem) : 0)
+/* Return MEM_OFFSET (MEM) as a HOST_WIDE_INT, or 0 if we can't.  */
+
+static inline HOST_WIDE_INT
+int_mem_offset (const_rtx mem)
+{
+  if (MEM_OFFSET_KNOWN_P (mem))
+    return MEM_OFFSET (mem);
+  return 0;
+}
 
 #if CHECKING_P && (GCC_VERSION >= 2007)
 
@@ -2336,7 +2343,7 @@ var_mem_set (dataflow_set *set, rtx loc,
 	     rtx set_src)
 {
   tree decl = MEM_EXPR (loc);
-  HOST_WIDE_INT offset = INT_MEM_OFFSET (loc);
+  HOST_WIDE_INT offset = int_mem_offset (loc);
 
   var_mem_decl_set (set, loc, initialized,
 		    dv_from_decl (decl), offset, set_src, INSERT);
@@ -2354,7 +2361,7 @@ var_mem_delete_and_set (dataflow_set *se
 			enum var_init_status initialized, rtx set_src)
 {
   tree decl = MEM_EXPR (loc);
-  HOST_WIDE_INT offset = INT_MEM_OFFSET (loc);
+  HOST_WIDE_INT offset = int_mem_offset (loc);
 
   clobber_overlapping_mems (set, loc);
   decl = var_debug_decl (decl);
@@ -2375,7 +2382,7 @@ var_mem_delete_and_set (dataflow_set *se
 var_mem_delete (dataflow_set *set, rtx loc, bool clobber)
 {
   tree decl = MEM_EXPR (loc);
-  HOST_WIDE_INT offset = INT_MEM_OFFSET (loc);
+  HOST_WIDE_INT offset = int_mem_offset (loc);
 
   clobber_overlapping_mems (set, loc);
   decl = var_debug_decl (decl);
@@ -4618,7 +4625,7 @@ find_mem_expr_in_1pdv (tree expr, rtx va
   for (node = var->var_part[0].loc_chain; node; node = node->next)
     if (MEM_P (node->loc)
 	&& MEM_EXPR (node->loc) == expr
-	&& INT_MEM_OFFSET (node->loc) == 0)
+	&& int_mem_offset (node->loc) == 0)
       {
 	where = node;
 	break;
@@ -4683,7 +4690,7 @@ dataflow_set_preserve_mem_locs (variable
 	      /* We want to remove dying MEMs that don't refer to DECL.  */
 	      if (GET_CODE (loc->loc) == MEM
 		  && (MEM_EXPR (loc->loc) != decl
-		      || INT_MEM_OFFSET (loc->loc) != 0)
+		      || int_mem_offset (loc->loc) != 0)
 		  && mem_dies_at_call (loc->loc))
 		break;
 	      /* We want to move here MEMs that do refer to DECL.  */
@@ -4727,7 +4734,7 @@ dataflow_set_preserve_mem_locs (variable
 
 	  if (GET_CODE (loc->loc) != MEM
 	      || (MEM_EXPR (loc->loc) == decl
-		  && INT_MEM_OFFSET (loc->loc) == 0)
+		  && int_mem_offset (loc->loc) == 0)
 	      || !mem_dies_at_call (loc->loc))
 	    {
 	      if (old_loc != loc->loc && emit_notes)
@@ -5254,7 +5261,7 @@ same_variable_part_p (rtx loc, tree expr
   else if (MEM_P (loc))
     {
       expr2 = MEM_EXPR (loc);
-      offset2 = INT_MEM_OFFSET (loc);
+      offset2 = int_mem_offset (loc);
     }
   else
     return false;
@@ -5522,7 +5529,7 @@ use_type (rtx loc, struct count_use_info
 	return MO_CLOBBER;
       else if (target_for_debug_bind (var_debug_decl (expr)))
 	return MO_CLOBBER;
-      else if (track_loc_p (loc, expr, INT_MEM_OFFSET (loc),
+      else if (track_loc_p (loc, expr, int_mem_offset (loc),
 			    false, modep, NULL)
 	       /* Multi-part variables shouldn't refer to one-part
 		  variable names such as VALUEs (never happens) or
@@ -6017,7 +6024,7 @@ add_stores (rtx loc, const_rtx expr, voi
 	      rtx xexpr = gen_rtx_SET (loc, src);
 	      if (same_variable_part_p (SET_SRC (xexpr),
 					MEM_EXPR (loc),
-					INT_MEM_OFFSET (loc)))
+					int_mem_offset (loc)))
 		mo.type = MO_COPY;
 	      else
 		mo.type = MO_SET;
@@ -9579,7 +9586,7 @@ vt_get_decl_and_offset (rtx rtl, tree *d
       if (MEM_ATTRS (rtl))
 	{
 	  *declp = MEM_EXPR (rtl);
-	  *offsetp = INT_MEM_OFFSET (rtl);
+	  *offsetp = int_mem_offset (rtl);
 	  return true;
 	}
     }

^ permalink raw reply	[flat|nested] 90+ messages in thread

* [19/nn] Don't treat zero-sized ranges as overlapping
  2017-10-23 11:16 [00/nn] Patches preparing for runtime offsets and sizes Richard Sandiford
                   ` (16 preceding siblings ...)
  2017-10-23 11:28 ` [17/nn] Turn var-tracking.c:INT_MEM_OFFSET into a function Richard Sandiford
@ 2017-10-23 11:29 ` Richard Sandiford
  2017-10-26 12:14   ` Richard Biener
  2017-10-23 11:29 ` [18/nn] Use (CONST_VECTOR|GET_MODE)_NUNITS in simplify-rtx.c Richard Sandiford
                   ` (3 subsequent siblings)
  21 siblings, 1 reply; 90+ messages in thread
From: Richard Sandiford @ 2017-10-23 11:29 UTC (permalink / raw)
  To: gcc-patches

Most GCC ranges seem to be represented as an offset and a size (rather
than as a start and an inclusive end, or a start and an exclusive end).
The usual test for whether X is in a range is of course:

  x >= start && x < start + size
or:
  x >= start && x - start < size

which means that an empty range of size 0 contains nothing.  But other
range tests aren't as obvious.

The usual test for whether one range is contained within another
range is:

  start1 >= start2 && start1 + size1 <= start2 + size2

while the test for whether two ranges overlap (from ranges_overlap_p) is:

     (start1 >= start2 && start1 < start2 + size2)
  || (start2 >= start1 && start2 < start1 + size1)

i.e. the ranges overlap if one range contains the start of the other
range.  This leads to strange results like:

  (start X, size 0) is a subrange of (start X, size 0) but
  (start X, size 0) does not overlap (start X, size 0)

Similarly:

  (start 4, size 0) is a subrange of (start 2, size 2) but
  (start 4, size 0) does not overlap (start 2, size 2)

It seems like "X is a subrange of Y" should imply "X overlaps Y".

This becomes harder to ignore with the runtime sizes and offsets
added for SVE.  The most obvious fix seemed to be to say that
an empty range does not overlap anything, and is therefore not
a subrange of anything.

Using the new definition of subranges didn't seem to cause any
codegen differences in the testsuite.  But there was one change
with the new definition of overlapping ranges.  strncpy-chk.c has:

  memset (dst, 0, sizeof (dst));
  if (strncpy (dst, src, 0) != dst || strcmp (dst, ""))
    abort();

The strncpy is detected as a zero-size write, and so with the new
definition of overlapping ranges, we treat the strncpy as having
no effect on the strcmp (which is true).  The reaching definition
is the memset instead.

This patch makes ranges_overlap_p return false for zero-sized
ranges, even if the other range has an unknown size.


2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>

gcc/
	* tree-ssa-alias.h (ranges_overlap_p): Return false if either
	range is known to be empty.

Index: gcc/tree-ssa-alias.h
===================================================================
--- gcc/tree-ssa-alias.h	2017-03-28 16:19:22.000000000 +0100
+++ gcc/tree-ssa-alias.h	2017-10-23 11:47:38.181155696 +0100
@@ -171,6 +171,8 @@ ranges_overlap_p (HOST_WIDE_INT pos1,
 		  HOST_WIDE_INT pos2,
 		  unsigned HOST_WIDE_INT size2)
 {
+  if (size1 == 0 || size2 == 0)
+    return false;
   if (pos1 >= pos2
       && (size2 == (unsigned HOST_WIDE_INT)-1
 	  || pos1 < (pos2 + (HOST_WIDE_INT) size2)))

^ permalink raw reply	[flat|nested] 90+ messages in thread

* [18/nn] Use (CONST_VECTOR|GET_MODE)_NUNITS in simplify-rtx.c
  2017-10-23 11:16 [00/nn] Patches preparing for runtime offsets and sizes Richard Sandiford
                   ` (17 preceding siblings ...)
  2017-10-23 11:29 ` [19/nn] Don't treat zero-sized ranges as overlapping Richard Sandiford
@ 2017-10-23 11:29 ` Richard Sandiford
  2017-10-26 12:13   ` Richard Biener
  2017-10-23 11:30 ` [20/nn] Make tree-ssa-dse.c:normalize_ref return a bool Richard Sandiford
                   ` (2 subsequent siblings)
  21 siblings, 1 reply; 90+ messages in thread
From: Richard Sandiford @ 2017-10-23 11:29 UTC (permalink / raw)
  To: gcc-patches

This patch avoids some calculations of the form:

  GET_MODE_SIZE (vector_mode) / GET_MODE_SIZE (element_mode)

in simplify-rtx.c.  If we're dealing with CONST_VECTORs, it's better
to use CONST_VECTOR_NUNITS, since that remains constant even after the
SVE patches.  In other cases we can get the number from GET_MODE_NUNITS.
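
For a fixed-width mode the two calculations agree; taking V4SImode as
an example:

  GET_MODE_SIZE (V4SImode) / GET_MODE_SIZE (SImode)   /* 16 / 4 == 4 */
  GET_MODE_NUNITS (V4SImode)                          /* 4 */

but dividing byte sizes stops being natural once GET_MODE_SIZE can
depend on the runtime vector length, whereas CONST_VECTOR_NUNITS
stays available for constant vectors.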


2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* simplify-rtx.c (simplify_const_unary_operation): Use GET_MODE_NUNITS
	and CONST_VECTOR_NUNITS instead of computing the number of units from
	the byte sizes of the vector and element.
	(simplify_binary_operation_1): Likewise.
	(simplify_const_binary_operation): Likewise.
	(simplify_ternary_operation): Likewise.

Index: gcc/simplify-rtx.c
===================================================================
--- gcc/simplify-rtx.c	2017-10-23 11:47:11.277288162 +0100
+++ gcc/simplify-rtx.c	2017-10-23 11:47:32.868935554 +0100
@@ -1752,18 +1752,12 @@ simplify_const_unary_operation (enum rtx
 	return gen_const_vec_duplicate (mode, op);
       if (GET_CODE (op) == CONST_VECTOR)
 	{
-	  int elt_size = GET_MODE_UNIT_SIZE (mode);
-          unsigned n_elts = (GET_MODE_SIZE (mode) / elt_size);
-	  rtvec v = rtvec_alloc (n_elts);
-	  unsigned int i;
-
-	  machine_mode inmode = GET_MODE (op);
-	  int in_elt_size = GET_MODE_UNIT_SIZE (inmode);
-	  unsigned in_n_elts = (GET_MODE_SIZE (inmode) / in_elt_size);
-
+	  unsigned int n_elts = GET_MODE_NUNITS (mode);
+	  unsigned int in_n_elts = CONST_VECTOR_NUNITS (op);
 	  gcc_assert (in_n_elts < n_elts);
 	  gcc_assert ((n_elts % in_n_elts) == 0);
-	  for (i = 0; i < n_elts; i++)
+	  rtvec v = rtvec_alloc (n_elts);
+	  for (unsigned i = 0; i < n_elts; i++)
 	    RTVEC_ELT (v, i) = CONST_VECTOR_ELT (op, i % in_n_elts);
 	  return gen_rtx_CONST_VECTOR (mode, v);
 	}
@@ -3608,9 +3602,7 @@ simplify_binary_operation_1 (enum rtx_co
 	      rtx op0 = XEXP (trueop0, 0);
 	      rtx op1 = XEXP (trueop0, 1);
 
-	      machine_mode opmode = GET_MODE (op0);
-	      int elt_size = GET_MODE_UNIT_SIZE (opmode);
-	      int n_elts = GET_MODE_SIZE (opmode) / elt_size;
+	      int n_elts = GET_MODE_NUNITS (GET_MODE (op0));
 
 	      int i = INTVAL (XVECEXP (trueop1, 0, 0));
 	      int elem;
@@ -3637,21 +3629,8 @@ simplify_binary_operation_1 (enum rtx_co
 		  mode01 = GET_MODE (op01);
 
 		  /* Find out number of elements of each operand.  */
-		  if (VECTOR_MODE_P (mode00))
-		    {
-		      elt_size = GET_MODE_UNIT_SIZE (mode00);
-		      n_elts00 = GET_MODE_SIZE (mode00) / elt_size;
-		    }
-		  else
-		    n_elts00 = 1;
-
-		  if (VECTOR_MODE_P (mode01))
-		    {
-		      elt_size = GET_MODE_UNIT_SIZE (mode01);
-		      n_elts01 = GET_MODE_SIZE (mode01) / elt_size;
-		    }
-		  else
-		    n_elts01 = 1;
+		  n_elts00 = GET_MODE_NUNITS (mode00);
+		  n_elts01 = GET_MODE_NUNITS (mode01);
 
 		  gcc_assert (n_elts == n_elts00 + n_elts01);
 
@@ -3771,9 +3750,8 @@ simplify_binary_operation_1 (enum rtx_co
 	      rtx subop1 = XEXP (trueop0, 1);
 	      machine_mode mode0 = GET_MODE (subop0);
 	      machine_mode mode1 = GET_MODE (subop1);
-	      int li = GET_MODE_UNIT_SIZE (mode0);
-	      int l0 = GET_MODE_SIZE (mode0) / li;
-	      int l1 = GET_MODE_SIZE (mode1) / li;
+	      int l0 = GET_MODE_NUNITS (mode0);
+	      int l1 = GET_MODE_NUNITS (mode1);
 	      int i0 = INTVAL (XVECEXP (trueop1, 0, 0));
 	      if (i0 == 0 && !side_effects_p (op1) && mode == mode0)
 		{
@@ -3931,14 +3909,10 @@ simplify_binary_operation_1 (enum rtx_co
 		|| CONST_SCALAR_INT_P (trueop1) 
 		|| CONST_DOUBLE_AS_FLOAT_P (trueop1)))
 	  {
-	    int elt_size = GET_MODE_UNIT_SIZE (mode);
-	    unsigned n_elts = (GET_MODE_SIZE (mode) / elt_size);
+	    unsigned n_elts = GET_MODE_NUNITS (mode);
+	    unsigned in_n_elts = GET_MODE_NUNITS (op0_mode);
 	    rtvec v = rtvec_alloc (n_elts);
 	    unsigned int i;
-	    unsigned in_n_elts = 1;
-
-	    if (VECTOR_MODE_P (op0_mode))
-	      in_n_elts = (GET_MODE_SIZE (op0_mode) / elt_size);
 	    for (i = 0; i < n_elts; i++)
 	      {
 		if (i < in_n_elts)
@@ -4026,16 +4000,12 @@ simplify_const_binary_operation (enum rt
       && GET_CODE (op0) == CONST_VECTOR
       && GET_CODE (op1) == CONST_VECTOR)
     {
-      unsigned n_elts = GET_MODE_NUNITS (mode);
-      machine_mode op0mode = GET_MODE (op0);
-      unsigned op0_n_elts = GET_MODE_NUNITS (op0mode);
-      machine_mode op1mode = GET_MODE (op1);
-      unsigned op1_n_elts = GET_MODE_NUNITS (op1mode);
+      unsigned int n_elts = CONST_VECTOR_NUNITS (op0);
+      gcc_assert (n_elts == (unsigned int) CONST_VECTOR_NUNITS (op1));
+      gcc_assert (n_elts == GET_MODE_NUNITS (mode));
       rtvec v = rtvec_alloc (n_elts);
       unsigned int i;
 
-      gcc_assert (op0_n_elts == n_elts);
-      gcc_assert (op1_n_elts == n_elts);
       for (i = 0; i < n_elts; i++)
 	{
 	  rtx x = simplify_binary_operation (code, GET_MODE_INNER (mode),
@@ -5712,8 +5682,7 @@ simplify_ternary_operation (enum rtx_cod
       trueop2 = avoid_constant_pool_reference (op2);
       if (CONST_INT_P (trueop2))
 	{
-	  int elt_size = GET_MODE_UNIT_SIZE (mode);
-	  unsigned n_elts = (GET_MODE_SIZE (mode) / elt_size);
+	  unsigned n_elts = GET_MODE_NUNITS (mode);
 	  unsigned HOST_WIDE_INT sel = UINTVAL (trueop2);
 	  unsigned HOST_WIDE_INT mask;
 	  if (n_elts == HOST_BITS_PER_WIDE_INT)

^ permalink raw reply	[flat|nested] 90+ messages in thread

* [20/nn] Make tree-ssa-dse.c:normalize_ref return a bool
  2017-10-23 11:16 [00/nn] Patches preparing for runtime offsets and sizes Richard Sandiford
                   ` (18 preceding siblings ...)
  2017-10-23 11:29 ` [18/nn] Use (CONST_VECTOR|GET_MODE)_NUNITS in simplify-rtx.c Richard Sandiford
@ 2017-10-23 11:30 ` Richard Sandiford
  2017-10-30 17:49   ` Jeff Law
  2017-10-23 11:31 ` [21/nn] Minor vn_reference_lookup_3 tweak Richard Sandiford
  2017-10-23 11:45 ` [22/nn] Make dse.c use offset/width instead of start/end Richard Sandiford
  21 siblings, 1 reply; 90+ messages in thread
From: Richard Sandiford @ 2017-10-23 11:30 UTC (permalink / raw)
  To: gcc-patches

This patch moves the check for an overlapping byte from the callers of
normalize_ref into normalize_ref itself, so that it's easier to convert
to poly_ints later.  It's not really worth it on its own.
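
The caller pattern then collapses from an explicit range-overlap test
plus an unconditional normalize_ref call into a single guarded call,
as in the clear_bytes_written_by hunk below:

  if (normalize_ref (&write, ref))
    {
      HOST_WIDE_INT start = write.offset - ref->offset;
      bitmap_clear_range (live_bytes, start / BITS_PER_UNIT,
                          write.size / BITS_PER_UNIT);
    }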


2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>

gcc/
	* tree-ssa-dse.c (normalize_ref): Check whether the ranges overlap
	and return false if not.
	(clear_bytes_written_by, live_bytes_read): Update accordingly.

Index: gcc/tree-ssa-dse.c
===================================================================
--- gcc/tree-ssa-dse.c	2017-10-23 11:41:23.587123840 +0100
+++ gcc/tree-ssa-dse.c	2017-10-23 11:47:41.546155781 +0100
@@ -137,13 +137,11 @@ valid_ao_ref_for_dse (ao_ref *ref)
 	  && (ref->size != -1));
 }
 
-/* Normalize COPY (an ao_ref) relative to REF.  Essentially when we are
-   done COPY will only refer bytes found within REF.
+/* Try to normalize COPY (an ao_ref) relative to REF.  Essentially when we are
+   done COPY will only refer bytes found within REF.  Return true if COPY
+   is known to intersect at least one byte of REF.  */
 
-   We have already verified that COPY intersects at least one
-   byte with REF.  */
-
-static void
+static bool
 normalize_ref (ao_ref *copy, ao_ref *ref)
 {
   /* If COPY starts before REF, then reset the beginning of
@@ -151,13 +149,22 @@ normalize_ref (ao_ref *copy, ao_ref *ref
      number of bytes removed from COPY.  */
   if (copy->offset < ref->offset)
     {
-      copy->size -= (ref->offset - copy->offset);
+      HOST_WIDE_INT diff = ref->offset - copy->offset;
+      if (copy->size <= diff)
+	return false;
+      copy->size -= diff;
       copy->offset = ref->offset;
     }
 
+  HOST_WIDE_INT diff = copy->offset - ref->offset;
+  if (ref->size <= diff)
+    return false;
+
   /* If COPY extends beyond REF, chop off its size appropriately.  */
-  if (copy->offset + copy->size > ref->offset + ref->size)
-    copy->size -= (copy->offset + copy->size - (ref->offset + ref->size));
+  HOST_WIDE_INT limit = ref->size - diff;
+  if (copy->size > limit)
+    copy->size = limit;
+  return true;
 }
 
 /* Clear any bytes written by STMT from the bitmap LIVE_BYTES.  The base
@@ -179,14 +186,10 @@ clear_bytes_written_by (sbitmap live_byt
   if (valid_ao_ref_for_dse (&write)
       && operand_equal_p (write.base, ref->base, OEP_ADDRESS_OF)
       && write.size == write.max_size
-      && ((write.offset < ref->offset
-	   && write.offset + write.size > ref->offset)
-	  || (write.offset >= ref->offset
-	      && write.offset < ref->offset + ref->size)))
-    {
-      normalize_ref (&write, ref);
-      bitmap_clear_range (live_bytes,
-			  (write.offset - ref->offset) / BITS_PER_UNIT,
+      && normalize_ref (&write, ref))
+    {
+      HOST_WIDE_INT start = write.offset - ref->offset;
+      bitmap_clear_range (live_bytes, start / BITS_PER_UNIT,
 			  write.size / BITS_PER_UNIT);
     }
 }
@@ -480,21 +483,20 @@ live_bytes_read (ao_ref use_ref, ao_ref
 {
   /* We have already verified that USE_REF and REF hit the same object.
      Now verify that there's actually an overlap between USE_REF and REF.  */
-  if (ranges_overlap_p (use_ref.offset, use_ref.size, ref->offset, ref->size))
+  if (normalize_ref (&use_ref, ref))
     {
-      normalize_ref (&use_ref, ref);
+      HOST_WIDE_INT start = use_ref.offset - ref->offset;
+      HOST_WIDE_INT size = use_ref.size;
 
       /* If USE_REF covers all of REF, then it will hit one or more
 	 live bytes.   This avoids useless iteration over the bitmap
 	 below.  */
-      if (use_ref.offset <= ref->offset
-	  && use_ref.offset + use_ref.size >= ref->offset + ref->size)
+      if (start == 0 && size == ref->size)
 	return true;
 
       /* Now check if any of the remaining bits in use_ref are set in LIVE.  */
-      unsigned int start = (use_ref.offset - ref->offset) / BITS_PER_UNIT;
-      unsigned int end  = ((use_ref.offset + use_ref.size) / BITS_PER_UNIT) - 1;
-      return bitmap_bit_in_range_p (live, start, end);
+      return bitmap_bit_in_range_p (live, start / BITS_PER_UNIT,
+				    (start + size - 1) / BITS_PER_UNIT);
     }
   return true;
 }

^ permalink raw reply	[flat|nested] 90+ messages in thread

* [21/nn] Minor vn_reference_lookup_3 tweak
  2017-10-23 11:16 [00/nn] Patches preparing for runtime offsets and sizes Richard Sandiford
                   ` (19 preceding siblings ...)
  2017-10-23 11:30 ` [20/nn] Make tree-ssa-dse.c:normalize_ref return a bool Richard Sandiford
@ 2017-10-23 11:31 ` Richard Sandiford
  2017-10-26 12:18   ` Richard Biener
  2017-10-23 11:45 ` [22/nn] Make dse.c use offset/width instead of start/end Richard Sandiford
  21 siblings, 1 reply; 90+ messages in thread
From: Richard Sandiford @ 2017-10-23 11:31 UTC (permalink / raw)
  To: gcc-patches

The repeated checks for MEM_REF made this code hard to convert to
poly_ints as-is.  Hopefully the new structure also makes it clearer
at a glance what the two cases are.
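
(Editorial illustration, not part of the patch: a worked example of the
new "at" computation.  An access at bit offset 128 whose base is
MEM_REF <lhs, 4> starts at byte 20, which is the value later compared
against the memcpy destination range.)

    #include <stdio.h>

    int main (void)
    {
      const long bits_per_unit = 8;      /* stand-in for BITS_PER_UNIT */
      long offset = 128;                 /* bit offset of the access */
      long mem_ref_offset = 4;           /* constant offset in the MEM_REF */
      long at = offset / bits_per_unit;  /* now hoisted to the top */
      at += mem_ref_offset;              /* only added in the MEM_REF case */
      printf ("at = %ld bytes\n", at);   /* prints "at = 20 bytes" */
      return 0;
    }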


2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* tree-ssa-sccvn.c (vn_reference_lookup_3): Avoid repeated
	checks for MEM_REF.

Index: gcc/tree-ssa-sccvn.c
===================================================================
--- gcc/tree-ssa-sccvn.c	2017-10-23 11:47:03.852769480 +0100
+++ gcc/tree-ssa-sccvn.c	2017-10-23 11:47:44.596155858 +0100
@@ -2234,6 +2234,7 @@ vn_reference_lookup_3 (ao_ref *ref, tree
 	  || offset % BITS_PER_UNIT != 0
 	  || ref->size % BITS_PER_UNIT != 0)
 	return (void *)-1;
+      at = offset / BITS_PER_UNIT;
 
       /* Extract a pointer base and an offset for the destination.  */
       lhs = gimple_call_arg (def_stmt, 0);
@@ -2301,19 +2302,18 @@ vn_reference_lookup_3 (ao_ref *ref, tree
       copy_size = tree_to_uhwi (gimple_call_arg (def_stmt, 2));
 
       /* The bases of the destination and the references have to agree.  */
-      if ((TREE_CODE (base) != MEM_REF
-	   && !DECL_P (base))
-	  || (TREE_CODE (base) == MEM_REF
-	      && (TREE_OPERAND (base, 0) != lhs
-		  || !tree_fits_uhwi_p (TREE_OPERAND (base, 1))))
-	  || (DECL_P (base)
-	      && (TREE_CODE (lhs) != ADDR_EXPR
-		  || TREE_OPERAND (lhs, 0) != base)))
+      if (TREE_CODE (base) == MEM_REF)
+	{
+	  if (TREE_OPERAND (base, 0) != lhs
+	      || !tree_fits_uhwi_p (TREE_OPERAND (base, 1)))
+	    return (void *) -1;
+	  at += tree_to_uhwi (TREE_OPERAND (base, 1));
+	}
+      else if (!DECL_P (base)
+	       || TREE_CODE (lhs) != ADDR_EXPR
+	       || TREE_OPERAND (lhs, 0) != base)
 	return (void *)-1;
 
-      at = offset / BITS_PER_UNIT;
-      if (TREE_CODE (base) == MEM_REF)
-	at += tree_to_uhwi (TREE_OPERAND (base, 1));
       /* If the access is completely outside of the memcpy destination
 	 area there is no aliasing.  */
       if (lhs_offset >= at + maxsize / BITS_PER_UNIT

^ permalink raw reply	[flat|nested] 90+ messages in thread

* [22/nn] Make dse.c use offset/width instead of start/end
  2017-10-23 11:16 [00/nn] Patches preparing for runtime offsets and sizes Richard Sandiford
                   ` (20 preceding siblings ...)
  2017-10-23 11:31 ` [21/nn] Minor vn_reference_lookup_3 tweak Richard Sandiford
@ 2017-10-23 11:45 ` Richard Sandiford
  2017-10-26 12:18   ` Richard Biener
  21 siblings, 1 reply; 90+ messages in thread
From: Richard Sandiford @ 2017-10-23 11:45 UTC (permalink / raw)
  To: gcc-patches

store_info and read_info_type in dse.c represented the ranges as
start/end, but a lot of the internal code used offset/width instead.
Using offset/width throughout fits better with the poly_int.h
range-checking functions.
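
(Editorial illustration, not part of the patch: standalone sketches of
what known_subrange_p and ranges_may_overlap_p mean for constant
offsets and sizes.  The real poly_int.h versions are assumed to reduce
to these when everything is known at compile time; they correspond to
the open-coded begin/end comparisons removed below.)

    #include <stdbool.h>

    /* [off1, off1 + size1) lies entirely within [off2, off2 + size2).  */
    static bool
    known_subrange (long off1, long size1, long off2, long size2)
    {
      return off1 >= off2 && off1 + size1 <= off2 + size2;
    }

    /* [off1, off1 + size1) and [off2, off2 + size2) share a byte.  */
    static bool
    ranges_may_overlap (long off1, long size1, long off2, long size2)
    {
      return off1 < off2 + size2 && off2 < off1 + size1;
    }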


2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* dse.c (store_info, read_info_type): Replace begin and end with
	offset and width.
	(print_range): New function.
	(set_all_positions_unneeded, any_positions_needed_p)
	(check_mem_read_rtx, scan_stores, scan_reads, dse_step5): Update
	accordingly.
	(record_store): Likewise.  Optimize the case in which all positions
	are unneeded.
	(get_stored_val): Replace read_begin and read_end with read_offset
	and read_width.
	(replace_read): Update call accordingly.

Index: gcc/dse.c
===================================================================
--- gcc/dse.c	2017-10-23 11:47:11.273428262 +0100
+++ gcc/dse.c	2017-10-23 11:47:48.294155952 +0100
@@ -243,9 +243,12 @@ struct store_info
   /* Canonized MEM address for use by canon_true_dependence.  */
   rtx mem_addr;
 
-  /* The offset of the first and byte before the last byte associated
-     with the operation.  */
-  HOST_WIDE_INT begin, end;
+  /* The offset of the first byte associated with the operation.  */
+  HOST_WIDE_INT offset;
+
+  /* The number of bytes covered by the operation.  This is always exact
+     and known (rather than -1).  */
+  HOST_WIDE_INT width;
 
   union
     {
@@ -261,7 +264,7 @@ struct store_info
 	  bitmap bmap;
 
 	  /* Number of set bits (i.e. unneeded bytes) in BITMAP.  If it is
-	     equal to END - BEGIN, the whole store is unused.  */
+	     equal to WIDTH, the whole store is unused.  */
 	  int count;
 	} large;
     } positions_needed;
@@ -304,10 +307,11 @@ struct read_info_type
   /* The id of the mem group of the base address.  */
   int group_id;
 
-  /* The offset of the first and byte after the last byte associated
-     with the operation.  If begin == end == 0, the read did not have
-     a constant offset.  */
-  int begin, end;
+  /* The offset of the first byte associated with the operation.  */
+  HOST_WIDE_INT offset;
+
+  /* The number of bytes covered by the operation, or -1 if not known.  */
+  HOST_WIDE_INT width;
 
   /* The mem being read.  */
   rtx mem;
@@ -586,6 +590,18 @@ static deferred_change *deferred_change_
 
 /* The number of bits used in the global bitmaps.  */
 static unsigned int current_position;
+
+/* Print offset range [OFFSET, OFFSET + WIDTH) to FILE.  */
+
+static void
+print_range (FILE *file, poly_int64 offset, poly_int64 width)
+{
+  fprintf (file, "[");
+  print_dec (offset, file, SIGNED);
+  fprintf (file, "..");
+  print_dec (offset + width, file, SIGNED);
+  fprintf (file, ")");
+}
 \f
 /*----------------------------------------------------------------------------
    Zeroth step.
@@ -1212,10 +1228,9 @@ set_all_positions_unneeded (store_info *
 {
   if (__builtin_expect (s_info->is_large, false))
     {
-      int pos, end = s_info->end - s_info->begin;
-      for (pos = 0; pos < end; pos++)
-	bitmap_set_bit (s_info->positions_needed.large.bmap, pos);
-      s_info->positions_needed.large.count = end;
+      bitmap_set_range (s_info->positions_needed.large.bmap,
+			0, s_info->width);
+      s_info->positions_needed.large.count = s_info->width;
     }
   else
     s_info->positions_needed.small_bitmask = HOST_WIDE_INT_0U;
@@ -1227,8 +1242,7 @@ set_all_positions_unneeded (store_info *
 any_positions_needed_p (store_info *s_info)
 {
   if (__builtin_expect (s_info->is_large, false))
-    return (s_info->positions_needed.large.count
-	    < s_info->end - s_info->begin);
+    return s_info->positions_needed.large.count < s_info->width;
   else
     return (s_info->positions_needed.small_bitmask != HOST_WIDE_INT_0U);
 }
@@ -1355,8 +1369,12 @@ record_store (rtx body, bb_info_t bb_inf
       set_usage_bits (group, offset, width, expr);
 
       if (dump_file && (dump_flags & TDF_DETAILS))
-	fprintf (dump_file, " processing const base store gid=%d[%d..%d)\n",
-		 group_id, (int)offset, (int)(offset+width));
+	{
+	  fprintf (dump_file, " processing const base store gid=%d",
+		   group_id);
+	  print_range (dump_file, offset, width);
+	  fprintf (dump_file, "\n");
+	}
     }
   else
     {
@@ -1368,8 +1386,11 @@ record_store (rtx body, bb_info_t bb_inf
       group_id = -1;
 
       if (dump_file && (dump_flags & TDF_DETAILS))
-	fprintf (dump_file, " processing cselib store [%d..%d)\n",
-		 (int)offset, (int)(offset+width));
+	{
+	  fprintf (dump_file, " processing cselib store ");
+	  print_range (dump_file, offset, width);
+	  fprintf (dump_file, "\n");
+	}
     }
 
   const_rhs = rhs = NULL_RTX;
@@ -1435,18 +1456,21 @@ record_store (rtx body, bb_info_t bb_inf
 	{
 	  HOST_WIDE_INT i;
 	  if (dump_file && (dump_flags & TDF_DETAILS))
-	    fprintf (dump_file, "    trying store in insn=%d gid=%d[%d..%d)\n",
-		     INSN_UID (ptr->insn), s_info->group_id,
-		     (int)s_info->begin, (int)s_info->end);
+	    {
+	      fprintf (dump_file, "    trying store in insn=%d gid=%d",
+		       INSN_UID (ptr->insn), s_info->group_id);
+	      print_range (dump_file, s_info->offset, s_info->width);
+	      fprintf (dump_file, "\n");
+	    }
 
 	  /* Even if PTR won't be eliminated as unneeded, if both
 	     PTR and this insn store the same constant value, we might
 	     eliminate this insn instead.  */
 	  if (s_info->const_rhs
 	      && const_rhs
-	      && offset >= s_info->begin
-	      && offset + width <= s_info->end
-	      && all_positions_needed_p (s_info, offset - s_info->begin,
+	      && known_subrange_p (offset, width,
+				   s_info->offset, s_info->width)
+	      && all_positions_needed_p (s_info, offset - s_info->offset,
 					 width))
 	    {
 	      if (GET_MODE (mem) == BLKmode)
@@ -1462,8 +1486,7 @@ record_store (rtx body, bb_info_t bb_inf
 		{
 		  rtx val;
 		  start_sequence ();
-		  val = get_stored_val (s_info, GET_MODE (mem),
-					offset, offset + width,
+		  val = get_stored_val (s_info, GET_MODE (mem), offset, width,
 					BLOCK_FOR_INSN (insn_info->insn),
 					true);
 		  if (get_insns () != NULL)
@@ -1474,10 +1497,18 @@ record_store (rtx body, bb_info_t bb_inf
 		}
 	    }
 
-	  for (i = MAX (offset, s_info->begin);
-	       i < offset + width && i < s_info->end;
-	       i++)
-	    set_position_unneeded (s_info, i - s_info->begin);
+	  if (known_subrange_p (s_info->offset, s_info->width, offset, width))
+	    /* The new store touches every byte that S_INFO does.  */
+	    set_all_positions_unneeded (s_info);
+	  else
+	    {
+	      HOST_WIDE_INT begin_unneeded = offset - s_info->offset;
+	      HOST_WIDE_INT end_unneeded = begin_unneeded + width;
+	      begin_unneeded = MAX (begin_unneeded, 0);
+	      end_unneeded = MIN (end_unneeded, s_info->width);
+	      for (i = begin_unneeded; i < end_unneeded; ++i)
+		set_position_unneeded (s_info, i);
+	    }
 	}
       else if (s_info->rhs)
 	/* Need to see if it is possible for this store to overwrite
@@ -1535,8 +1566,8 @@ record_store (rtx body, bb_info_t bb_inf
       store_info->positions_needed.small_bitmask = lowpart_bitmask (width);
     }
   store_info->group_id = group_id;
-  store_info->begin = offset;
-  store_info->end = offset + width;
+  store_info->offset = offset;
+  store_info->width = width;
   store_info->is_set = GET_CODE (body) == SET;
   store_info->rhs = rhs;
   store_info->const_rhs = const_rhs;
@@ -1700,39 +1731,38 @@ look_for_hardregs (rtx x, const_rtx pat
 }
 
 /* Helper function for replace_read and record_store.
-   Attempt to return a value stored in STORE_INFO, from READ_BEGIN
-   to one before READ_END bytes read in READ_MODE.  Return NULL
+   Attempt to return a value of mode READ_MODE stored in STORE_INFO,
+   consisting of READ_WIDTH bytes starting from READ_OFFSET.  Return NULL
    if not successful.  If REQUIRE_CST is true, return always constant.  */
 
 static rtx
 get_stored_val (store_info *store_info, machine_mode read_mode,
-		HOST_WIDE_INT read_begin, HOST_WIDE_INT read_end,
+		HOST_WIDE_INT read_offset, HOST_WIDE_INT read_width,
 		basic_block bb, bool require_cst)
 {
   machine_mode store_mode = GET_MODE (store_info->mem);
-  int shift;
-  int access_size; /* In bytes.  */
+  HOST_WIDE_INT gap;
   rtx read_reg;
 
   /* To get here the read is within the boundaries of the write so
      shift will never be negative.  Start out with the shift being in
      bytes.  */
   if (store_mode == BLKmode)
-    shift = 0;
+    gap = 0;
   else if (BYTES_BIG_ENDIAN)
-    shift = store_info->end - read_end;
+    gap = ((store_info->offset + store_info->width)
+	   - (read_offset + read_width));
   else
-    shift = read_begin - store_info->begin;
-
-  access_size = shift + GET_MODE_SIZE (read_mode);
-
-  /* From now on it is bits.  */
-  shift *= BITS_PER_UNIT;
+    gap = read_offset - store_info->offset;
 
-  if (shift)
-    read_reg = find_shift_sequence (access_size, store_info, read_mode, shift,
-    				    optimize_bb_for_speed_p (bb),
-				    require_cst);
+  if (gap != 0)
+    {
+      HOST_WIDE_INT shift = gap * BITS_PER_UNIT;
+      HOST_WIDE_INT access_size = GET_MODE_SIZE (read_mode) + gap;
+      read_reg = find_shift_sequence (access_size, store_info, read_mode,
+				      shift, optimize_bb_for_speed_p (bb),
+				      require_cst);
+    }
   else if (store_mode == BLKmode)
     {
       /* The store is a memset (addr, const_val, const_size).  */
@@ -1835,7 +1865,7 @@ replace_read (store_info *store_info, in
   start_sequence ();
   bb = BLOCK_FOR_INSN (read_insn->insn);
   read_reg = get_stored_val (store_info,
-			     read_mode, read_info->begin, read_info->end,
+			     read_mode, read_info->offset, read_info->width,
 			     bb, false);
   if (read_reg == NULL_RTX)
     {
@@ -1986,8 +2016,8 @@ check_mem_read_rtx (rtx *loc, bb_info_t
   read_info = read_info_type_pool.allocate ();
   read_info->group_id = group_id;
   read_info->mem = mem;
-  read_info->begin = offset;
-  read_info->end = offset + width;
+  read_info->offset = offset;
+  read_info->width = width;
   read_info->next = insn_info->read_rec;
   insn_info->read_rec = read_info;
   if (group_id < 0)
@@ -2013,8 +2043,11 @@ check_mem_read_rtx (rtx *loc, bb_info_t
 	    fprintf (dump_file, " processing const load gid=%d[BLK]\n",
 		     group_id);
 	  else
-	    fprintf (dump_file, " processing const load gid=%d[%d..%d)\n",
-		     group_id, (int)offset, (int)(offset+width));
+	    {
+	      fprintf (dump_file, " processing const load gid=%d", group_id);
+	      print_range (dump_file, offset, width);
+	      fprintf (dump_file, "\n");
+	    }
 	}
 
       while (i_ptr)
@@ -2052,19 +2085,19 @@ check_mem_read_rtx (rtx *loc, bb_info_t
 	      else
 		{
 		  if (store_info->rhs
-		      && offset >= store_info->begin
-		      && offset + width <= store_info->end
+		      && known_subrange_p (offset, width, store_info->offset,
+					   store_info->width)
 		      && all_positions_needed_p (store_info,
-						 offset - store_info->begin,
+						 offset - store_info->offset,
 						 width)
 		      && replace_read (store_info, i_ptr, read_info,
 				       insn_info, loc, bb_info->regs_live))
 		    return;
 
 		  /* The bases are the same, just see if the offsets
-		     overlap.  */
-		  if ((offset < store_info->end)
-		      && (offset + width > store_info->begin))
+		     could overlap.  */
+		  if (ranges_may_overlap_p (offset, width, store_info->offset,
+					    store_info->width))
 		    remove = true;
 		}
 	    }
@@ -2119,11 +2152,10 @@ check_mem_read_rtx (rtx *loc, bb_info_t
 	  if (store_info->rhs
 	      && store_info->group_id == -1
 	      && store_info->cse_base == base
-	      && width != -1
-	      && offset >= store_info->begin
-	      && offset + width <= store_info->end
+	      && known_subrange_p (offset, width, store_info->offset,
+				   store_info->width)
 	      && all_positions_needed_p (store_info,
-					 offset - store_info->begin, width)
+					 offset - store_info->offset, width)
 	      && replace_read (store_info, i_ptr,  read_info, insn_info, loc,
 			       bb_info->regs_live))
 	    return;
@@ -2775,16 +2807,19 @@ scan_stores (store_info *store_info, bit
       group_info *group_info
 	= rtx_group_vec[store_info->group_id];
       if (group_info->process_globally)
-	for (i = store_info->begin; i < store_info->end; i++)
-	  {
-	    int index = get_bitmap_index (group_info, i);
-	    if (index != 0)
-	      {
-		bitmap_set_bit (gen, index);
-		if (kill)
-		  bitmap_clear_bit (kill, index);
-	      }
-	  }
+	{
+	  HOST_WIDE_INT end = store_info->offset + store_info->width;
+	  for (i = store_info->offset; i < end; i++)
+	    {
+	      int index = get_bitmap_index (group_info, i);
+	      if (index != 0)
+		{
+		  bitmap_set_bit (gen, index);
+		  if (kill)
+		    bitmap_clear_bit (kill, index);
+		}
+	    }
+	}
       store_info = store_info->next;
     }
 }
@@ -2834,9 +2869,9 @@ scan_reads (insn_info_t insn_info, bitma
 	    {
 	      if (i == read_info->group_id)
 		{
-		  if (read_info->begin > read_info->end)
+		  if (!known_size_p (read_info->width))
 		    {
-		      /* Begin > end for block mode reads.  */
+		      /* Handle block mode reads.  */
 		      if (kill)
 			bitmap_ior_into (kill, group->group_kill);
 		      bitmap_and_compl_into (gen, group->group_kill);
@@ -2846,7 +2881,8 @@ scan_reads (insn_info_t insn_info, bitma
 		      /* The groups are the same, just process the
 			 offsets.  */
 		      HOST_WIDE_INT j;
-		      for (j = read_info->begin; j < read_info->end; j++)
+		      HOST_WIDE_INT end = read_info->offset + read_info->width;
+		      for (j = read_info->offset; j < end; j++)
 			{
 			  int index = get_bitmap_index (group, j);
 			  if (index != 0)
@@ -3265,7 +3301,8 @@ dse_step5 (void)
 	      HOST_WIDE_INT i;
 	      group_info *group_info = rtx_group_vec[store_info->group_id];
 
-	      for (i = store_info->begin; i < store_info->end; i++)
+	      HOST_WIDE_INT end = store_info->offset + store_info->width;
+	      for (i = store_info->offset; i < end; i++)
 		{
 		  int index = get_bitmap_index (group_info, i);
 

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [01/nn] Add gen_(const_)vec_duplicate helpers
  2017-10-23 11:17 ` [01/nn] Add gen_(const_)vec_duplicate helpers Richard Sandiford
@ 2017-10-25 16:29   ` Jeff Law
  2017-10-27 16:12     ` Richard Sandiford
  0 siblings, 1 reply; 90+ messages in thread
From: Jeff Law @ 2017-10-25 16:29 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

On 10/23/2017 05:16 AM, Richard Sandiford wrote:
> This patch adds helper functions for generating constant and
> non-constant vector duplicates.  These routines help with SVE because
> it is then easier to use:
> 
>    (const:M (vec_duplicate:M X))
> 
> for a broadcast of X, even if the number of elements in M isn't known
> at compile time.  It also makes it easier for general rtx code to treat
> constant and non-constant duplicates in the same way.
> 
> In the target code, the patch uses gen_vec_duplicate instead of
> gen_rtx_VEC_DUPLICATE if handling constants correctly is potentially
> useful.  It might be that some or all of the call sites only handle
> non-constants in practice, in which case the change is a harmless
> no-op (and a saving of a few characters).
> 
> Otherwise, the target changes use gen_const_vec_duplicate instead
> of gen_rtx_CONST_VECTOR if the constant is obviously a duplicate.
> They also include some changes to use CONSTxx_RTX for easy global
> constants.
> 
> 
> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
> 	    Alan Hayward  <alan.hayward@arm.com>
> 	    David Sherwood  <david.sherwood@arm.com>
> 
> gcc/
> 	* emit-rtl.h (gen_const_vec_duplicate): Declare.
> 	(gen_vec_duplicate): Likewise.
> 	* emit-rtl.c (gen_const_vec_duplicate_1): New function, split
> 	out from...
> 	(gen_const_vector): ...here.
> 	(gen_const_vec_duplicate, gen_vec_duplicate): New functions.
> 	(gen_rtx_CONST_VECTOR): Use gen_const_vec_duplicate for constants
> 	whose elements are all equal.
> 	* optabs.c (expand_vector_broadcast): Use gen_const_vec_duplicate.
> 	* simplify-rtx.c (simplify_const_unary_operation): Likewise.
> 	(simplify_relational_operation): Likewise.
> 	* config/aarch64/aarch64.c (aarch64_simd_gen_const_vector_dup):
> 	Likewise.
> 	(aarch64_simd_dup_constant): Use gen_vec_duplicate.
> 	(aarch64_expand_vector_init): Likewise.
> 	* config/arm/arm.c (neon_vdup_constant): Likewise.
> 	(neon_expand_vector_init): Likewise.
> 	(arm_expand_vec_perm): Use gen_const_vec_duplicate.
> 	(arm_block_set_unaligned_vect): Likewise.
> 	(arm_block_set_aligned_vect): Likewise.
> 	* config/arm/neon.md (neon_copysignf<mode>): Likewise.
> 	* config/i386/i386.c (ix86_expand_vec_perm): Likewise.
> 	(expand_vec_perm_even_odd_pack): Likewise.
> 	(ix86_vector_duplicate_value): Use gen_vec_duplicate.
> 	* config/i386/sse.md (one_cmpl<mode>2): Use CONSTM1_RTX.
> 	* config/ia64/ia64.c (ia64_expand_vecint_compare): Use
> 	gen_const_vec_duplicate.
> 	* config/ia64/vect.md (addv2sf3, subv2sf3): Use CONST1_RTX.
> 	* config/mips/mips.c (mips_gen_const_int_vector): Use
> 	gen_const_vec_duplicate.
> 	(mips_expand_vector_init): Use CONST0_RTX.
> 	* config/powerpcspe/altivec.md (abs<mode>2, nabs<mode>2): Likewise.
> 	(define_split): Use gen_const_vec_duplicate.
> 	* config/rs6000/altivec.md (abs<mode>2, nabs<mode>2): Use CONST0_RTX.
> 	(define_split): Use gen_const_vec_duplicate.
> 	* config/s390/vx-builtins.md (vec_genmask<mode>): Likewise.
> 	(vec_ctd_s64, vec_ctd_u64, vec_ctsl, vec_ctul): Likewise.
> 	* config/spu/spu.c (spu_const): Likewise.
I'd started looking at this a couple times when it was originally
submitted, but never seemed to get through it.  It seems like a nice
cleanup.

So in gen_const_vector we had an assert to verify that const_tiny_rtx
was set up.  That seems to have been lost.  It's probably not a big
deal, but I mention it in case the loss was unintentional.
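
(For reference, a sketch of the check being discussed.  The indexing is
recalled from the old gen_const_vector, so treat it as an assumption
rather than a quote:

    /* We need to call this function after we set the scalar
       const_tiny_rtx entries.  */
    gcc_assert (const_tiny_rtx[constant][(int) GET_MODE_INNER (mode)]);

where "constant" selects between the 0, 1 and -1 tables.)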


OK.  Your call whether or not to re-introduce the assert for const_tiny_rtx.

Jeff

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [02/nn] Add more vec_duplicate simplifications
  2017-10-23 11:19 ` [02/nn] Add more vec_duplicate simplifications Richard Sandiford
@ 2017-10-25 16:35   ` Jeff Law
  2017-11-10  9:42     ` Christophe Lyon
  0 siblings, 1 reply; 90+ messages in thread
From: Jeff Law @ 2017-10-25 16:35 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

On 10/23/2017 05:17 AM, Richard Sandiford wrote:
> This patch adds a vec_duplicate_p helper that tests for constant
> or non-constant vector duplicates.  Together with the existing
> const_vec_duplicate_p, this complements the gen_vec_duplicate
> and gen_const_vec_duplicate added by a previous patch.
> 
> The patch uses the new routines to add more rtx simplifications
> involving vector duplicates.  These mirror simplifications that
> we already do for CONST_VECTOR broadcasts and are needed for
> variable-length SVE, which uses:
> 
>   (const:M (vec_duplicate:M X))
> 
> to represent constant broadcasts instead.  The simplifications do
> trigger on the testsuite for variable duplicates too, and in each
> case I saw the change was an improvement.  E.g.:
> 
[ snip ]

> 
> The best way of testing the new simplifications seemed to be
> via selftests.  The patch cribs part of David's patch here:
> https://gcc.gnu.org/ml/gcc-patches/2016-07/msg00270.html .
Cool.  I really wish I had more time to promote David's work by adding
selftests to various things.  There are certainly cases where it's the
most direct and useful way to test certain bits of lower-level
infrastructure we have.  Glad to see you found it useful here.



> 
> 
> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
> 	    David Malcolm  <dmalcolm@redhat.com>
> 	    Alan Hayward  <alan.hayward@arm.com>
> 	    David Sherwood  <david.sherwood@arm.com>
> 
> gcc/
> 	* rtl.h (vec_duplicate_p): New function.
> 	* selftest-rtl.c (assert_rtx_eq_at): New function.
> 	* selftest-rtl.h (ASSERT_RTX_EQ): New macro.
> 	(assert_rtx_eq_at): Declare.
> 	* selftest.h (selftest::simplify_rtx_c_tests): Declare.
> 	* selftest-run-tests.c (selftest::run_tests): Call it.
> 	* simplify-rtx.c: Include selftest.h and selftest-rtl.h.
> 	(simplify_unary_operation_1): Recursively handle vector duplicates.
> 	(simplify_binary_operation_1): Likewise.  Handle VEC_SELECTs of
> 	vector duplicates.
> 	(simplify_subreg): Handle subregs of vector duplicates.
> 	(make_test_reg, test_vector_ops_duplicate, test_vector_ops)
> 	(selftest::simplify_rtx_c_tests): New functions.
Thanks for the examples of how this affects various targets.  Seems like
it ought to be a consistent win when they trigger.

jeff

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [03/nn] Allow vector CONSTs
  2017-10-23 11:19 ` [03/nn] Allow vector CONSTs Richard Sandiford
@ 2017-10-25 16:59   ` Jeff Law
  2017-10-27 16:19     ` Richard Sandiford
  0 siblings, 1 reply; 90+ messages in thread
From: Jeff Law @ 2017-10-25 16:59 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

On 10/23/2017 05:18 AM, Richard Sandiford wrote:
> This patch allows (const ...) wrappers to be used for rtx vector
> constants, as an alternative to const_vector.  This is useful
> for SVE, where the number of elements isn't known until runtime.
Right.  It's constant, but not knowable at compile time.  That seems an
exact match for how we've used CONST.

> 
> It could also be useful in future for fixed-length vectors, to
> reduce the amount of memory needed to represent simple constants
> with high element counts.  However, one nice thing about keeping
> it restricted to variable-length vectors is that there is never
> any need to handle combinations of (const ...) and CONST_VECTOR.
Yea, but is the memory consumption of these large vectors a real
problem?  I suspect that, relative to other memory issues, they're in the noise.

> 
> 
> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
> 	    Alan Hayward  <alan.hayward@arm.com>
> 	    David Sherwood  <david.sherwood@arm.com>
> 
> gcc/
> 	* doc/rtl.texi (const): Update description of address constants.
> 	Say that vector constants are allowed too.
> 	* common.md (E, F): Use CONSTANT_P instead of checking for
> 	CONST_VECTOR.
> 	* emit-rtl.c (gen_lowpart_common): Use const_vec_p instead of
> 	checking for CONST_VECTOR.
> 	* expmed.c (make_tree): Use build_vector_from_val for a CONST
> 	VEC_DUPLICATE.
> 	* expr.c (expand_expr_real_2): Check for vector modes instead
> 	of checking for CONST_VECTOR.
> 	* rtl.h (const_vec_p): New function.
> 	(const_vec_duplicate_p): Check for a CONST VEC_DUPLICATE.
> 	(unwrap_const_vec_duplicate): Handle them here too.
My only worry here is code that is a bit loose in checking for a CONST
but not the innards, and perhaps isn't prepared for the new forms
that appear inside the CONST.

If we have such problems I'd expect them to be in the targets, as the
targets have traditionally had to validate the innards of a CONST to
ensure it could be handled by the assembler/linker.  Hmm, that may save the
targets since they'd likely need an update to LEGITIMATE_CONSTANT_P to
ever see these new forms.

Presumably an aarch64 specific patch to recognize these as valid
constants in LEGITIMATE_CONSTANT_P is in the works?
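
(Editorial sketch, not from the series: one shape such a target-hook
change could take.  example_valid_broadcast_p is a hypothetical helper,
not an existing function, and the fragment assumes GCC's internal
headers.

    static bool
    example_legitimate_constant_p (machine_mode mode, rtx x)
    {
      /* Accept (const (vec_duplicate X)) when X can be broadcast.  */
      if (GET_CODE (x) == CONST
          && GET_CODE (XEXP (x, 0)) == VEC_DUPLICATE)
        return example_valid_broadcast_p (mode, XEXP (XEXP (x, 0), 0));
      return default_legitimate_constant_p (mode, x);
    }
)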

OK for the trunk.

jeff

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [04/nn] Add a VEC_SERIES rtl code
  2017-10-23 11:20 ` [04/nn] Add a VEC_SERIES rtl code Richard Sandiford
@ 2017-10-26 11:49   ` Richard Biener
  0 siblings, 0 replies; 90+ messages in thread
From: Richard Biener @ 2017-10-26 11:49 UTC (permalink / raw)
  To: GCC Patches, Richard Sandiford

On Mon, Oct 23, 2017 at 1:19 PM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> This patch adds an rtl representation of a vector linear series
> of the form:
>
>   a[I] = BASE + I * STEP
>
> Like vec_duplicate;
>
> - the new rtx can be used for both constant and non-constant vectors
> - when used for constant vectors it is wrapped in a (const ...)
> - the constant form is only used for variable-length vectors;
>   fixed-length vectors still use CONST_VECTOR
>
> At the moment the code is restricted to integer elements, to avoid
> concerns over floating-point rounding.

Ok.

Richard.

>
> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
>             Alan Hayward  <alan.hayward@arm.com>
>             David Sherwood  <david.sherwood@arm.com>
>
> gcc/
>         * doc/rtl.texi (vec_series): Document.
>         (const): Say that the operand can be a vec_series.
>         * rtl.def (VEC_SERIES): New rtx code.
>         * rtl.h (const_vec_series_p_1): Declare.
>         (const_vec_series_p): New function.
>         * emit-rtl.h (gen_const_vec_series): Declare.
>         (gen_vec_series): Likewise.
>         * emit-rtl.c (const_vec_series_p_1, gen_const_vec_series)
>         (gen_vec_series): Likewise.
>         * optabs.c (expand_mult_highpart): Use gen_const_vec_series.
>         * simplify-rtx.c (simplify_unary_operation): Handle negations
>         of vector series.
>         (simplify_binary_operation_series): New function.
>         (simplify_binary_operation_1): Use it.  Handle VEC_SERIES.
>         (test_vector_ops_series): New function.
>         (test_vector_ops): Call it.
>         * config/powerpcspe/altivec.md (altivec_lvsl): Use
>         gen_const_vec_series.
>         (altivec_lvsr): Likewise.
>         * config/rs6000/altivec.md (altivec_lvsl, altivec_lvsr): Likewise.
>
> Index: gcc/doc/rtl.texi
> ===================================================================
> --- gcc/doc/rtl.texi    2017-10-23 11:41:39.185050437 +0100
> +++ gcc/doc/rtl.texi    2017-10-23 11:41:41.547050496 +0100
> @@ -1677,7 +1677,8 @@ are target-specific and typically repres
>  operator.  @var{m} should be a valid address mode.
>
>  The second use of @code{const} is to wrap a vector operation.
> -In this case @var{exp} must be a @code{vec_duplicate} expression.
> +In this case @var{exp} must be a @code{vec_duplicate} or
> +@code{vec_series} expression.
>
>  @findex high
>  @item (high:@var{m} @var{exp})
> @@ -2722,6 +2723,10 @@ the same submodes as the input vector mo
>  number of output parts must be an integer multiple of the number of input
>  parts.
>
> +@findex vec_series
> +@item (vec_series:@var{m} @var{base} @var{step})
> +This operation creates a vector in which element @var{i} is equal to
> +@samp{@var{base} + @var{i}*@var{step}}.  @var{m} must be a vector integer mode.
>  @end table
>
>  @node Conversions
> Index: gcc/rtl.def
> ===================================================================
> --- gcc/rtl.def 2017-10-23 11:40:11.378243915 +0100
> +++ gcc/rtl.def 2017-10-23 11:41:41.549050496 +0100
> @@ -710,6 +710,11 @@ DEF_RTL_EXPR(VEC_CONCAT, "vec_concat", "
>     an integer multiple of the number of input parts.  */
>  DEF_RTL_EXPR(VEC_DUPLICATE, "vec_duplicate", "e", RTX_UNARY)
>
> +/* Creation of a vector in which element I has the value BASE + I * STEP,
> +   where BASE is the first operand and STEP is the second.  The result
> +   must have a vector integer mode.  */
> +DEF_RTL_EXPR(VEC_SERIES, "vec_series", "ee", RTX_BIN_ARITH)
> +
>  /* Addition with signed saturation */
>  DEF_RTL_EXPR(SS_PLUS, "ss_plus", "ee", RTX_COMM_ARITH)
>
> Index: gcc/rtl.h
> ===================================================================
> --- gcc/rtl.h   2017-10-23 11:41:39.188050437 +0100
> +++ gcc/rtl.h   2017-10-23 11:41:41.549050496 +0100
> @@ -2816,6 +2816,51 @@ unwrap_const_vec_duplicate (T x)
>    return x;
>  }
>
> +/* In emit-rtl.c.  */
> +extern bool const_vec_series_p_1 (const_rtx, rtx *, rtx *);
> +
> +/* Return true if X is a constant vector that contains a linear series
> +   of the form:
> +
> +   { B, B + S, B + 2 * S, B + 3 * S, ... }
> +
> +   for a nonzero S.  Store B and S in *BASE_OUT and *STEP_OUT on success.  */
> +
> +inline bool
> +const_vec_series_p (const_rtx x, rtx *base_out, rtx *step_out)
> +{
> +  if (GET_CODE (x) == CONST_VECTOR
> +      && GET_MODE_CLASS (GET_MODE (x)) == MODE_VECTOR_INT)
> +    return const_vec_series_p_1 (x, base_out, step_out);
> +  if (GET_CODE (x) == CONST && GET_CODE (XEXP (x, 0)) == VEC_SERIES)
> +    {
> +      *base_out = XEXP (XEXP (x, 0), 0);
> +      *step_out = XEXP (XEXP (x, 0), 1);
> +      return true;
> +    }
> +  return false;
> +}
> +
> +/* Return true if X is a vector that contains a linear series of the
> +   form:
> +
> +   { B, B + S, B + 2 * S, B + 3 * S, ... }
> +
> +   where B and S are constant or nonconstant.  Store B and S in
> +   *BASE_OUT and *STEP_OUT on success.  */
> +
> +inline bool
> +vec_series_p (const_rtx x, rtx *base_out, rtx *step_out)
> +{
> +  if (GET_CODE (x) == VEC_SERIES)
> +    {
> +      *base_out = XEXP (x, 0);
> +      *step_out = XEXP (x, 1);
> +      return true;
> +    }
> +  return const_vec_series_p (x, base_out, step_out);
> +}
> +
>  /* Return the unpromoted (outer) mode of SUBREG_PROMOTED_VAR_P subreg X.  */
>
>  inline scalar_int_mode
> Index: gcc/emit-rtl.h
> ===================================================================
> --- gcc/emit-rtl.h      2017-10-23 11:41:32.369050264 +0100
> +++ gcc/emit-rtl.h      2017-10-23 11:41:41.548050496 +0100
> @@ -441,6 +441,9 @@ get_max_uid (void)
>  extern rtx gen_const_vec_duplicate (machine_mode, rtx);
>  extern rtx gen_vec_duplicate (machine_mode, rtx);
>
> +extern rtx gen_const_vec_series (machine_mode, rtx, rtx);
> +extern rtx gen_vec_series (machine_mode, rtx, rtx);
> +
>  extern void set_decl_incoming_rtl (tree, rtx, bool);
>
>  /* Return a memory reference like MEMREF, but with its mode changed
> Index: gcc/emit-rtl.c
> ===================================================================
> --- gcc/emit-rtl.c      2017-10-23 11:41:39.186050437 +0100
> +++ gcc/emit-rtl.c      2017-10-23 11:41:41.548050496 +0100
> @@ -5796,6 +5796,69 @@ gen_vec_duplicate (machine_mode mode, rt
>    return gen_rtx_VEC_DUPLICATE (mode, x);
>  }
>
> +/* A subroutine of const_vec_series_p that handles the case in which
> +   X is known to be an integer CONST_VECTOR.  */
> +
> +bool
> +const_vec_series_p_1 (const_rtx x, rtx *base_out, rtx *step_out)
> +{
> +  unsigned int nelts = CONST_VECTOR_NUNITS (x);
> +  if (nelts < 2)
> +    return false;
> +
> +  scalar_mode inner = GET_MODE_INNER (GET_MODE (x));
> +  rtx base = CONST_VECTOR_ELT (x, 0);
> +  rtx step = simplify_binary_operation (MINUS, inner,
> +                                       CONST_VECTOR_ELT (x, 1), base);
> +  if (rtx_equal_p (step, CONST0_RTX (inner)))
> +    return false;
> +
> +  for (unsigned int i = 2; i < nelts; ++i)
> +    {
> +      rtx diff = simplify_binary_operation (MINUS, inner,
> +                                           CONST_VECTOR_ELT (x, i),
> +                                           CONST_VECTOR_ELT (x, i - 1));
> +      if (!rtx_equal_p (step, diff))
> +       return false;
> +    }
> +
> +  *base_out = base;
> +  *step_out = step;
> +  return true;
> +}
> +
> +/* Generate a vector constant of mode MODE in which element I has
> +   the value BASE + I * STEP.  */
> +
> +rtx
> +gen_const_vec_series (machine_mode mode, rtx base, rtx step)
> +{
> +  gcc_assert (CONSTANT_P (base) && CONSTANT_P (step));
> +
> +  int nunits = GET_MODE_NUNITS (mode);
> +  rtvec v = rtvec_alloc (nunits);
> +  scalar_mode inner_mode = GET_MODE_INNER (mode);
> +  RTVEC_ELT (v, 0) = base;
> +  for (int i = 1; i < nunits; ++i)
> +    RTVEC_ELT (v, i) = simplify_gen_binary (PLUS, inner_mode,
> +                                           RTVEC_ELT (v, i - 1), step);
> +  return gen_rtx_raw_CONST_VECTOR (mode, v);
> +}
> +
> +/* Generate a vector of mode MODE in which element I has the value
> +   BASE + I * STEP.  The result will be a constant if BASE and STEP
> +   are both constants.  */
> +
> +rtx
> +gen_vec_series (machine_mode mode, rtx base, rtx step)
> +{
> +  if (step == const0_rtx)
> +    return gen_vec_duplicate (mode, base);
> +  if (CONSTANT_P (base) && CONSTANT_P (step))
> +    return gen_const_vec_series (mode, base, step);
> +  return gen_rtx_VEC_SERIES (mode, base, step);
> +}
> +
>  /* Generate a new vector constant for mode MODE and constant value
>     CONSTANT.  */
>
> Index: gcc/optabs.c
> ===================================================================
> --- gcc/optabs.c        2017-10-23 11:41:32.369050264 +0100
> +++ gcc/optabs.c        2017-10-23 11:41:41.549050496 +0100
> @@ -5784,13 +5784,13 @@ expand_mult_highpart (machine_mode mode,
>        for (i = 0; i < nunits; ++i)
>         RTVEC_ELT (v, i) = GEN_INT (!BYTES_BIG_ENDIAN + (i & ~1)
>                                     + ((i & 1) ? nunits : 0));
> +      perm = gen_rtx_CONST_VECTOR (mode, v);
>      }
>    else
>      {
> -      for (i = 0; i < nunits; ++i)
> -       RTVEC_ELT (v, i) = GEN_INT (2 * i + (BYTES_BIG_ENDIAN ? 0 : 1));
> +      int base = BYTES_BIG_ENDIAN ? 0 : 1;
> +      perm = gen_const_vec_series (mode, GEN_INT (base), GEN_INT (2));
>      }
> -  perm = gen_rtx_CONST_VECTOR (mode, v);
>
>    return expand_vec_perm (mode, m1, m2, perm, target);
>  }
> Index: gcc/simplify-rtx.c
> ===================================================================
> --- gcc/simplify-rtx.c  2017-10-23 11:41:36.309050364 +0100
> +++ gcc/simplify-rtx.c  2017-10-23 11:41:41.550050496 +0100
> @@ -927,7 +927,7 @@ exact_int_to_float_conversion_p (const_r
>  simplify_unary_operation_1 (enum rtx_code code, machine_mode mode, rtx op)
>  {
>    enum rtx_code reversed;
> -  rtx temp, elt;
> +  rtx temp, elt, base, step;
>    scalar_int_mode inner, int_mode, op_mode, op0_mode;
>
>    switch (code)
> @@ -1185,6 +1185,22 @@ simplify_unary_operation_1 (enum rtx_cod
>               return simplify_gen_unary (TRUNCATE, int_mode, temp, inner);
>             }
>         }
> +
> +      if (vec_series_p (op, &base, &step))
> +       {
> +         /* Only create a new series if we can simplify both parts.  In other
> +            cases this isn't really a simplification, and it's not necessarily
> +            a win to replace a vector operation with a scalar operation.  */
> +         scalar_mode inner_mode = GET_MODE_INNER (mode);
> +         base = simplify_unary_operation (NEG, inner_mode, base, inner_mode);
> +         if (base)
> +           {
> +             step = simplify_unary_operation (NEG, inner_mode,
> +                                              step, inner_mode);
> +             if (step)
> +               return gen_vec_series (mode, base, step);
> +           }
> +       }
>        break;
>
>      case TRUNCATE:
> @@ -2153,6 +2169,46 @@ simplify_binary_operation (enum rtx_code
>    return NULL_RTX;
>  }
>
> +/* Subroutine of simplify_binary_operation_1 that looks for cases in
> +   which OP0 and OP1 are both vector series or vector duplicates
> +   (which are really just series with a step of 0).  If so, try to
> +   form a new series by applying CODE to the bases and to the steps.
> +   Return null if no simplification is possible.
> +
> +   MODE is the mode of the operation and is known to be a vector
> +   integer mode.  */
> +
> +static rtx
> +simplify_binary_operation_series (rtx_code code, machine_mode mode,
> +                                 rtx op0, rtx op1)
> +{
> +  rtx base0, step0;
> +  if (vec_duplicate_p (op0, &base0))
> +    step0 = const0_rtx;
> +  else if (!vec_series_p (op0, &base0, &step0))
> +    return NULL_RTX;
> +
> +  rtx base1, step1;
> +  if (vec_duplicate_p (op1, &base1))
> +    step1 = const0_rtx;
> +  else if (!vec_series_p (op1, &base1, &step1))
> +    return NULL_RTX;
> +
> +  /* Only create a new series if we can simplify both parts.  In other
> +     cases this isn't really a simplification, and it's not necessarily
> +     a win to replace a vector operation with a scalar operation.  */
> +  scalar_mode inner_mode = GET_MODE_INNER (mode);
> +  rtx new_base = simplify_binary_operation (code, inner_mode, base0, base1);
> +  if (!new_base)
> +    return NULL_RTX;
> +
> +  rtx new_step = simplify_binary_operation (code, inner_mode, step0, step1);
> +  if (!new_step)
> +    return NULL_RTX;
> +
> +  return gen_vec_series (mode, new_base, new_step);
> +}
> +
>  /* Subroutine of simplify_binary_operation.  Simplify a binary operation
>     CODE with result mode MODE, operating on OP0 and OP1.  If OP0 and/or
>     OP1 are constant pool references, TRUEOP0 and TRUEOP1 represent the
> @@ -2333,6 +2389,14 @@ simplify_binary_operation_1 (enum rtx_co
>           if (tem)
>             return tem;
>         }
> +
> +      /* Handle vector series.  */
> +      if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT)
> +       {
> +         tem = simplify_binary_operation_series (code, mode, op0, op1);
> +         if (tem)
> +           return tem;
> +       }
>        break;
>
>      case COMPARE:
> @@ -2544,6 +2608,14 @@ simplify_binary_operation_1 (enum rtx_co
>               || plus_minus_operand_p (op1))
>           && (tem = simplify_plus_minus (code, mode, op0, op1)) != 0)
>         return tem;
> +
> +      /* Handle vector series.  */
> +      if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT)
> +       {
> +         tem = simplify_binary_operation_series (code, mode, op0, op1);
> +         if (tem)
> +           return tem;
> +       }
>        break;
>
>      case MULT:
> @@ -3495,6 +3567,11 @@ simplify_binary_operation_1 (enum rtx_co
>        /* ??? There are simplifications that can be done.  */
>        return 0;
>
> +    case VEC_SERIES:
> +      if (op1 == CONST0_RTX (GET_MODE_INNER (mode)))
> +       return gen_vec_duplicate (mode, op0);
> +      return 0;
> +
>      case VEC_SELECT:
>        if (!VECTOR_MODE_P (mode))
>         {
> @@ -6490,6 +6567,60 @@ test_vector_ops_duplicate (machine_mode
>      }
>  }
>
> +/* Test vector simplifications involving VEC_SERIES in which the
> +   operands and result have vector mode MODE.  SCALAR_REG is a pseudo
> +   register that holds one element of MODE.  */
> +
> +static void
> +test_vector_ops_series (machine_mode mode, rtx scalar_reg)
> +{
> +  /* Test unary cases with VEC_SERIES arguments.  */
> +  scalar_mode inner_mode = GET_MODE_INNER (mode);
> +  rtx duplicate = gen_rtx_VEC_DUPLICATE (mode, scalar_reg);
> +  rtx neg_scalar_reg = gen_rtx_NEG (inner_mode, scalar_reg);
> +  rtx series_0_r = gen_rtx_VEC_SERIES (mode, const0_rtx, scalar_reg);
> +  rtx series_0_nr = gen_rtx_VEC_SERIES (mode, const0_rtx, neg_scalar_reg);
> +  rtx series_nr_1 = gen_rtx_VEC_SERIES (mode, neg_scalar_reg, const1_rtx);
> +  rtx series_r_m1 = gen_rtx_VEC_SERIES (mode, scalar_reg, constm1_rtx);
> +  rtx series_r_r = gen_rtx_VEC_SERIES (mode, scalar_reg, scalar_reg);
> +  rtx series_nr_nr = gen_rtx_VEC_SERIES (mode, neg_scalar_reg,
> +                                        neg_scalar_reg);
> +  ASSERT_RTX_EQ (series_0_r,
> +                simplify_unary_operation (NEG, mode, series_0_nr, mode));
> +  ASSERT_RTX_EQ (series_r_m1,
> +                simplify_unary_operation (NEG, mode, series_nr_1, mode));
> +  ASSERT_RTX_EQ (series_r_r,
> +                simplify_unary_operation (NEG, mode, series_nr_nr, mode));
> +
> +  /* Test that a VEC_SERIES with a zero step is simplified away.  */
> +  ASSERT_RTX_EQ (duplicate,
> +                simplify_binary_operation (VEC_SERIES, mode,
> +                                           scalar_reg, const0_rtx));
> +
> +  /* Test PLUS and MINUS with VEC_SERIES.  */
> +  rtx series_0_1 = gen_const_vec_series (mode, const0_rtx, const1_rtx);
> +  rtx series_0_m1 = gen_const_vec_series (mode, const0_rtx, constm1_rtx);
> +  rtx series_r_1 = gen_rtx_VEC_SERIES (mode, scalar_reg, const1_rtx);
> +  ASSERT_RTX_EQ (series_r_r,
> +                simplify_binary_operation (PLUS, mode, series_0_r,
> +                                           duplicate));
> +  ASSERT_RTX_EQ (series_r_1,
> +                simplify_binary_operation (PLUS, mode, duplicate,
> +                                           series_0_1));
> +  ASSERT_RTX_EQ (series_r_m1,
> +                simplify_binary_operation (PLUS, mode, duplicate,
> +                                           series_0_m1));
> +  ASSERT_RTX_EQ (series_0_r,
> +                simplify_binary_operation (MINUS, mode, series_r_r,
> +                                           duplicate));
> +  ASSERT_RTX_EQ (series_r_m1,
> +                simplify_binary_operation (MINUS, mode, duplicate,
> +                                           series_0_1));
> +  ASSERT_RTX_EQ (series_r_1,
> +                simplify_binary_operation (MINUS, mode, duplicate,
> +                                           series_0_m1));
> +}
> +
>  /* Verify some simplifications involving vectors.  */
>
>  static void
> @@ -6502,6 +6633,9 @@ test_vector_ops ()
>         {
>           rtx scalar_reg = make_test_reg (GET_MODE_INNER (mode));
>           test_vector_ops_duplicate (mode, scalar_reg);
> +         if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT
> +             && GET_MODE_NUNITS (mode) > 2)
> +           test_vector_ops_series (mode, scalar_reg);
>         }
>      }
>  }
> Index: gcc/config/powerpcspe/altivec.md
> ===================================================================
> --- gcc/config/powerpcspe/altivec.md    2017-10-23 11:41:32.366050264 +0100
> +++ gcc/config/powerpcspe/altivec.md    2017-10-23 11:41:41.546050496 +0100
> @@ -2456,13 +2456,10 @@ (define_expand "altivec_lvsl"
>      emit_insn (gen_altivec_lvsl_direct (operands[0], operands[1]));
>    else
>      {
> -      int i;
> -      rtx mask, perm[16], constv, vperm;
> +      rtx mask, constv, vperm;
>        mask = gen_reg_rtx (V16QImode);
>        emit_insn (gen_altivec_lvsl_direct (mask, operands[1]));
> -      for (i = 0; i < 16; ++i)
> -        perm[i] = GEN_INT (i);
> -      constv = gen_rtx_CONST_VECTOR (V16QImode, gen_rtvec_v (16, perm));
> +      constv = gen_const_vec_series (V16QImode, const0_rtx, const1_rtx);
>        constv = force_reg (V16QImode, constv);
>        vperm = gen_rtx_UNSPEC (V16QImode, gen_rtvec (3, mask, mask, constv),
>                                UNSPEC_VPERM);
> @@ -2488,13 +2485,10 @@ (define_expand "altivec_lvsr"
>      emit_insn (gen_altivec_lvsr_direct (operands[0], operands[1]));
>    else
>      {
> -      int i;
> -      rtx mask, perm[16], constv, vperm;
> +      rtx mask, constv, vperm;
>        mask = gen_reg_rtx (V16QImode);
>        emit_insn (gen_altivec_lvsr_direct (mask, operands[1]));
> -      for (i = 0; i < 16; ++i)
> -        perm[i] = GEN_INT (i);
> -      constv = gen_rtx_CONST_VECTOR (V16QImode, gen_rtvec_v (16, perm));
> +      constv = gen_const_vec_series (V16QImode, const0_rtx, const1_rtx);
>        constv = force_reg (V16QImode, constv);
>        vperm = gen_rtx_UNSPEC (V16QImode, gen_rtvec (3, mask, mask, constv),
>                                UNSPEC_VPERM);
> Index: gcc/config/rs6000/altivec.md
> ===================================================================
> --- gcc/config/rs6000/altivec.md        2017-10-23 11:41:32.366050264 +0100
> +++ gcc/config/rs6000/altivec.md        2017-10-23 11:41:41.547050496 +0100
> @@ -2573,13 +2573,10 @@ (define_expand "altivec_lvsl"
>      emit_insn (gen_altivec_lvsl_direct (operands[0], operands[1]));
>    else
>      {
> -      int i;
> -      rtx mask, perm[16], constv, vperm;
> +      rtx mask, constv, vperm;
>        mask = gen_reg_rtx (V16QImode);
>        emit_insn (gen_altivec_lvsl_direct (mask, operands[1]));
> -      for (i = 0; i < 16; ++i)
> -        perm[i] = GEN_INT (i);
> -      constv = gen_rtx_CONST_VECTOR (V16QImode, gen_rtvec_v (16, perm));
> +      constv = gen_const_vec_series (V16QImode, const0_rtx, const1_rtx);
>        constv = force_reg (V16QImode, constv);
>        vperm = gen_rtx_UNSPEC (V16QImode, gen_rtvec (3, mask, mask, constv),
>                                UNSPEC_VPERM);
> @@ -2614,13 +2611,10 @@ (define_expand "altivec_lvsr"
>      emit_insn (gen_altivec_lvsr_direct (operands[0], operands[1]));
>    else
>      {
> -      int i;
> -      rtx mask, perm[16], constv, vperm;
> +      rtx mask, constv, vperm;
>        mask = gen_reg_rtx (V16QImode);
>        emit_insn (gen_altivec_lvsr_direct (mask, operands[1]));
> -      for (i = 0; i < 16; ++i)
> -        perm[i] = GEN_INT (i);
> -      constv = gen_rtx_CONST_VECTOR (V16QImode, gen_rtvec_v (16, perm));
> +      constv = gen_const_vec_series (V16QImode, const0_rtx, const1_rtx);
>        constv = force_reg (V16QImode, constv);
>        vperm = gen_rtx_UNSPEC (V16QImode, gen_rtvec (3, mask, mask, constv),
>                                UNSPEC_VPERM);
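
(Editorial note, not part of the thread: for concreteness, a VEC_SERIES
with BASE = 3 and STEP = 2 describes the vector {3, 5, 7, 9, ...}.
A minimal sketch using the generator added by this patch, with a
fixed-length V4SImode purely for illustration:

    rtx series = gen_const_vec_series (V4SImode, GEN_INT (3), GEN_INT (2));
    /* series is now the CONST_VECTOR {3, 5, 7, 9}.  */
)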

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [05/nn] Add VEC_DUPLICATE_{CST,EXPR} and associated optab
  2017-10-23 11:21 ` [05/nn] Add VEC_DUPLICATE_{CST,EXPR} and associated optab Richard Sandiford
@ 2017-10-26 11:53   ` Richard Biener
  2017-11-06 15:09     ` Richard Sandiford
  2017-12-15  0:29   ` Richard Sandiford
  1 sibling, 1 reply; 90+ messages in thread
From: Richard Biener @ 2017-10-26 11:53 UTC (permalink / raw)
  To: GCC Patches, Richard Sandiford

On Mon, Oct 23, 2017 at 1:20 PM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> SVE needs a way of broadcasting a scalar to a variable-length vector.
> This patch adds VEC_DUPLICATE_CST for when VECTOR_CST would be used for
> fixed-length vectors and VEC_DUPLICATE_EXPR for when CONSTRUCTOR would
> be used for fixed-length vectors.  VEC_DUPLICATE_EXPR is the tree
> equivalent of the existing rtl code VEC_DUPLICATE.
>
> Originally we had a single VEC_DUPLICATE_EXPR and used TREE_CONSTANT
> to mark constant nodes, but in response to last year's RFC, Richard B.
> suggested it would be better to have separate codes for the constant
> and non-constant cases.  This allows VEC_DUPLICATE_EXPR to be treated
> as a normal unary operation and avoids the previous need for treating
> it as a GIMPLE_SINGLE_RHS.
>
> It might make sense to use VEC_DUPLICATE_CST for all duplicated
> vector constants, since it's a bit more compact than VECTOR_CST
> in that case, and is potentially more efficient to process.
> However, the nice thing about keeping it restricted to variable-length
> vectors is that there is then no need to handle combinations of
> VECTOR_CST and VEC_DUPLICATE_CST; a vector type will always use
> VECTOR_CST or never use it.
>
> The patch also adds a vec_duplicate_optab to go with VEC_DUPLICATE_EXPR.

Index: gcc/tree-vect-generic.c
===================================================================
--- gcc/tree-vect-generic.c     2017-10-23 11:38:53.934094740 +0100
+++ gcc/tree-vect-generic.c     2017-10-23 11:41:51.773953100 +0100
@@ -1419,6 +1419,7 @@ lower_vec_perm (gimple_stmt_iterator *gs
 ssa_uniform_vector_p (tree op)
 {
   if (TREE_CODE (op) == VECTOR_CST
+      || TREE_CODE (op) == VEC_DUPLICATE_CST
       || TREE_CODE (op) == CONSTRUCTOR)
     return uniform_vector_p (op);

VEC_DUPLICATE_EXPR handling?  Looks like for VEC_DUPLICATE_CST
it could directly return true.

I didn't see uniform_vector_p being updated?

Can you add verification to either verify_expr or build_vec_duplicate_cst
that the type is one of variable size?  And amend tree.def docs
accordingly.  Because otherwise we miss a lot of cases in constant
folding (mixing VEC_DUPLICATE_CST and VECTOR_CST).
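
(Editorial sketch of the kind of verify_expr check being requested; the
predicate name below is a placeholder, not an existing function, since
the real variable-length test only lands later in the series:

    case VEC_DUPLICATE_CST:
      /* Placeholder check: VEC_DUPLICATE_CST is only for types whose
         element count isn't a compile-time constant.  */
      if (vector_nunits_are_constant_p (TREE_TYPE (t)))
        {
          error ("VEC_DUPLICATE_CST used for a fixed-length vector");
          return t;
        }
      break;
)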

Otherwise looks ok to me.

Thanks,
Richard.

>
> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
> 	    Alan Hayward  <alan.hayward@arm.com>
>             David Sherwood  <david.sherwood@arm.com>
>
> gcc/
>         * doc/generic.texi (VEC_DUPLICATE_CST, VEC_DUPLICATE_EXPR): Document.
>         (VEC_COND_EXPR): Add missing @tindex.
>         * doc/md.texi (vec_duplicate@var{m}): Document.
>         * tree.def (VEC_DUPLICATE_CST, VEC_DUPLICATE_EXPR): New tree codes.
>         * tree-core.h (tree_base): Document that u.nelts and TREE_OVERFLOW
>         are used for VEC_DUPLICATE_CST as well.
>         (tree_vector): Access base.n.nelts directly.
>         * tree.h (TREE_OVERFLOW): Add VEC_DUPLICATE_CST to the list of
>         valid codes.
>         (VEC_DUPLICATE_CST_ELT): New macro.
>         (build_vec_duplicate_cst): Declare.
>         * tree.c (tree_node_structure_for_code, tree_code_size, tree_size)
>         (integer_zerop, integer_onep, integer_all_onesp, integer_truep)
>         (real_zerop, real_onep, real_minus_onep, add_expr, initializer_zerop)
>         (walk_tree_1, drop_tree_overflow): Handle VEC_DUPLICATE_CST.
>         (build_vec_duplicate_cst): New function.
>         (uniform_vector_p): Handle the new codes.
>         (test_vec_duplicate_predicates_int): New function.
>         (test_vec_duplicate_predicates_float): Likewise.
>         (test_vec_duplicate_predicates): Likewise.
>         (tree_c_tests): Call test_vec_duplicate_predicates.
>         * cfgexpand.c (expand_debug_expr): Handle the new codes.
>         * tree-pretty-print.c (dump_generic_node): Likewise.
>         * dwarf2out.c (rtl_for_decl_init): Handle VEC_DUPLICATE_CST.
>         * gimple-expr.h (is_gimple_constant): Likewise.
>         * gimplify.c (gimplify_expr): Likewise.
>         * graphite-isl-ast-to-gimple.c
>         (translate_isl_ast_to_gimple::is_constant): Likewise.
>         * graphite-scop-detection.c (scan_tree_for_params): Likewise.
>         * ipa-icf-gimple.c (func_checker::compare_cst_or_decl): Likewise.
>         (func_checker::compare_operand): Likewise.
>         * ipa-icf.c (sem_item::add_expr, sem_variable::equals): Likewise.
>         * match.pd (negate_expr_p): Likewise.
>         * print-tree.c (print_node): Likewise.
>         * tree-chkp.c (chkp_find_bounds_1): Likewise.
>         * tree-loop-distribution.c (const_with_all_bytes_same): Likewise.
>         * tree-ssa-loop.c (for_each_index): Likewise.
>         * tree-ssa-pre.c (create_component_ref_by_pieces_1): Likewise.
>         * tree-ssa-sccvn.c (copy_reference_ops_from_ref): Likewise.
>         (ao_ref_init_from_vn_reference): Likewise.
>         * tree-vect-generic.c (ssa_uniform_vector_p): Likewise.
>         * varasm.c (const_hash_1, compare_constant): Likewise.
>         * fold-const.c (negate_expr_p, fold_negate_expr_1, const_binop)
>         (fold_convert_const, operand_equal_p, fold_view_convert_expr)
>         (exact_inverse, fold_checksum_tree): Likewise.
>         (const_unop): Likewise.  Fold VEC_DUPLICATE_EXPRs of a constant.
>         (test_vec_duplicate_folding): New function.
>         (fold_const_c_tests): Call it.
>         * optabs.def (vec_duplicate_optab): New optab.
>         * optabs-tree.c (optab_for_tree_code): Handle VEC_DUPLICATE_EXPR.
>         * optabs.h (expand_vector_broadcast): Declare.
>         * optabs.c (expand_vector_broadcast): Make non-static.  Try using
>         vec_duplicate_optab.
>         * expr.c (store_constructor): Try using vec_duplicate_optab for
>         uniform vectors.
>         (const_vector_element): New function, split out from...
>         (const_vector_from_tree): ...here.
>         (expand_expr_real_2): Handle VEC_DUPLICATE_EXPR.
>         (expand_expr_real_1): Handle VEC_DUPLICATE_CST.
>         * internal-fn.c (expand_vector_ubsan_overflow): Use CONSTANT_P
>         instead of checking for VECTOR_CST.
>         * tree-cfg.c (verify_gimple_assign_unary): Handle VEC_DUPLICATE_EXPR.
>         (verify_gimple_assign_single): Handle VEC_DUPLICATE_CST.
>         * tree-inline.c (estimate_operator_cost): Handle VEC_DUPLICATE_EXPR.
>
> Index: gcc/doc/generic.texi
> ===================================================================
> --- gcc/doc/generic.texi        2017-10-23 11:38:53.934094740 +0100
> +++ gcc/doc/generic.texi        2017-10-23 11:41:51.760448406 +0100
> @@ -1036,6 +1036,7 @@ As this example indicates, the operands
>  @tindex FIXED_CST
>  @tindex COMPLEX_CST
>  @tindex VECTOR_CST
> +@tindex VEC_DUPLICATE_CST
>  @tindex STRING_CST
>  @findex TREE_STRING_LENGTH
>  @findex TREE_STRING_POINTER
> @@ -1089,6 +1090,14 @@ constant nodes.  Each individual constan
>  double constant node.  The first operand is a @code{TREE_LIST} of the
>  constant nodes and is accessed through @code{TREE_VECTOR_CST_ELTS}.
>
> +@item VEC_DUPLICATE_CST
> +These nodes represent a vector constant in which every element has the
> +same scalar value.  At present only variable-length vectors use
> +@code{VEC_DUPLICATE_CST}; constant-length vectors use @code{VECTOR_CST}
> +instead.  The scalar element value is given by
> +@code{VEC_DUPLICATE_CST_ELT} and has the same restrictions as the
> +element of a @code{VECTOR_CST}.
> +
>  @item STRING_CST
>  These nodes represent string-constants.  The @code{TREE_STRING_LENGTH}
>  returns the length of the string, as an @code{int}.  The
> @@ -1692,6 +1701,7 @@ a value from @code{enum annot_expr_kind}
>
>  @node Vectors
>  @subsection Vectors
> +@tindex VEC_DUPLICATE_EXPR
>  @tindex VEC_LSHIFT_EXPR
>  @tindex VEC_RSHIFT_EXPR
>  @tindex VEC_WIDEN_MULT_HI_EXPR
> @@ -1703,9 +1713,14 @@ a value from @code{enum annot_expr_kind}
>  @tindex VEC_PACK_TRUNC_EXPR
>  @tindex VEC_PACK_SAT_EXPR
>  @tindex VEC_PACK_FIX_TRUNC_EXPR
> +@tindex VEC_COND_EXPR
>  @tindex SAD_EXPR
>
>  @table @code
> +@item VEC_DUPLICATE_EXPR
> +This node has a single operand and represents a vector in which every
> +element is equal to that operand.
> +
>  @item VEC_LSHIFT_EXPR
>  @itemx VEC_RSHIFT_EXPR
>  These nodes represent whole vector left and right shifts, respectively.
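
For concreteness, a minimal sketch (not part of the patch) of how the
two forms are built, matching the calls in the selftests further down;
the element value and element count are arbitrary:

    /* Constant form: the element is stored once, regardless of how
       many elements the vector type has.  */
    tree scalar = build_int_cst (integer_type_node, 5);
    tree vectype = build_vector_type (integer_type_node, 4);
    tree dup_cst = build_vec_duplicate_cst (vectype, scalar);

    /* Non-constant form: an ordinary unary tree expression.  */
    tree x = create_tmp_var_raw (integer_type_node, "x");
    tree dup_expr = build1 (VEC_DUPLICATE_EXPR, vectype, x);
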
> Index: gcc/doc/md.texi
> ===================================================================
> --- gcc/doc/md.texi     2017-10-23 11:41:22.189466342 +0100
> +++ gcc/doc/md.texi     2017-10-23 11:41:51.761413027 +0100
> @@ -4888,6 +4888,17 @@ and operand 1 is parallel containing val
>  the vector mode @var{m}, or a vector mode with the same element mode and
>  smaller number of elements.
>
> +@cindex @code{vec_duplicate@var{m}} instruction pattern
> +@item @samp{vec_duplicate@var{m}}
> +Initialize vector output operand 0 so that each element has the value given
> +by scalar input operand 1.  The vector has mode @var{m} and the scalar has
> +the mode appropriate for one element of @var{m}.
> +
> +This pattern only handles duplicates of non-constant inputs.  Constant
> +vectors go through the @code{mov@var{m}} pattern instead.
> +
> +This pattern is not allowed to @code{FAIL}.
> +
>  @cindex @code{vec_cmp@var{m}@var{n}} instruction pattern
>  @item @samp{vec_cmp@var{m}@var{n}}
>  Output a vector comparison.  Operand 0 of mode @var{n} is the destination for
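
As a sketch of how the middle end can tell whether a target implements
the new pattern (V4SImode is just an example mode, not something the
patch hard-codes):

    /* CODE_FOR_nothing means the target has no vec_duplicate pattern
       for this mode, so expansion falls back to vec_init.  */
    if (optab_handler (vec_duplicate_optab, V4SImode) != CODE_FOR_nothing)
      {
        /* A single broadcast instruction can be emitted.  */
      }
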
> Index: gcc/tree.def
> ===================================================================
> --- gcc/tree.def        2017-10-23 11:38:53.934094740 +0100
> +++ gcc/tree.def        2017-10-23 11:41:51.774917721 +0100
> @@ -304,6 +304,10 @@ DEFTREECODE (COMPLEX_CST, "complex_cst",
>  /* Contents are in VECTOR_CST_ELTS field.  */
>  DEFTREECODE (VECTOR_CST, "vector_cst", tcc_constant, 0)
>
> +/* Represents a vector constant in which every element is equal to
> +   VEC_DUPLICATE_CST_ELT.  */
> +DEFTREECODE (VEC_DUPLICATE_CST, "vec_duplicate_cst", tcc_constant, 0)
> +
>  /* Contents are TREE_STRING_LENGTH and the actual contents of the string.  */
>  DEFTREECODE (STRING_CST, "string_cst", tcc_constant, 0)
>
> @@ -534,6 +538,9 @@ DEFTREECODE (TARGET_EXPR, "target_expr",
>     1 and 2 are NULL.  The operands are then taken from the cfg edges. */
>  DEFTREECODE (COND_EXPR, "cond_expr", tcc_expression, 3)
>
> +/* Represents a vector in which every element is equal to operand 0.  */
> +DEFTREECODE (VEC_DUPLICATE_EXPR, "vec_duplicate_expr", tcc_unary, 1)
> +
>  /* Vector conditional expression. It is like COND_EXPR, but with
>     vector operands.
>
> Index: gcc/tree-core.h
> ===================================================================
> --- gcc/tree-core.h     2017-10-23 11:41:25.862065318 +0100
> +++ gcc/tree-core.h     2017-10-23 11:41:51.771059237 +0100
> @@ -975,7 +975,8 @@ struct GTY(()) tree_base {
>      /* VEC length.  This field is only used with TREE_VEC.  */
>      int length;
>
> -    /* Number of elements.  This field is only used with VECTOR_CST.  */
> +    /* Number of elements.  This field is only used with VECTOR_CST
> +       and VEC_DUPLICATE_CST.  It is always 1 for VEC_DUPLICATE_CST.  */
>      unsigned int nelts;
>
>      /* SSA version number.  This field is only used with SSA_NAME.  */
> @@ -1065,7 +1066,7 @@ struct GTY(()) tree_base {
>     public_flag:
>
>         TREE_OVERFLOW in
> -           INTEGER_CST, REAL_CST, COMPLEX_CST, VECTOR_CST
> +           INTEGER_CST, REAL_CST, COMPLEX_CST, VECTOR_CST, VEC_DUPLICATE_CST
>
>         TREE_PUBLIC in
>             VAR_DECL, FUNCTION_DECL
> @@ -1332,7 +1333,7 @@ struct GTY(()) tree_complex {
>
>  struct GTY(()) tree_vector {
>    struct tree_typed typed;
> -  tree GTY ((length ("VECTOR_CST_NELTS ((tree) &%h)"))) elts[1];
> +  tree GTY ((length ("((tree) &%h)->base.u.nelts"))) elts[1];
>  };
>
>  struct GTY(()) tree_identifier {
> Index: gcc/tree.h
> ===================================================================
> --- gcc/tree.h  2017-10-23 11:41:23.517482774 +0100
> +++ gcc/tree.h  2017-10-23 11:41:51.775882341 +0100
> @@ -730,8 +730,8 @@ #define TREE_SYMBOL_REFERENCED(NODE) \
>  #define TYPE_REF_CAN_ALIAS_ALL(NODE) \
>    (PTR_OR_REF_CHECK (NODE)->base.static_flag)
>
> -/* In an INTEGER_CST, REAL_CST, COMPLEX_CST, or VECTOR_CST, this means
> -   there was an overflow in folding.  */
> +/* In an INTEGER_CST, REAL_CST, COMPLEX_CST, VECTOR_CST or VEC_DUPLICATE_CST,
> +   this means there was an overflow in folding.  */
>
>  #define TREE_OVERFLOW(NODE) (CST_CHECK (NODE)->base.public_flag)
>
> @@ -1030,6 +1030,10 @@ #define VECTOR_CST_NELTS(NODE) (VECTOR_C
>  #define VECTOR_CST_ELTS(NODE) (VECTOR_CST_CHECK (NODE)->vector.elts)
>  #define VECTOR_CST_ELT(NODE,IDX) (VECTOR_CST_CHECK (NODE)->vector.elts[IDX])
>
> +/* In a VEC_DUPLICATE_CST node.  */
> +#define VEC_DUPLICATE_CST_ELT(NODE) \
> +  (VEC_DUPLICATE_CST_CHECK (NODE)->vector.elts[0])
> +
>  /* Define fields and accessors for some special-purpose tree nodes.  */
>
>  #define IDENTIFIER_LENGTH(NODE) \
> @@ -4025,6 +4029,7 @@ extern tree build_int_cst (tree, HOST_WI
>  extern tree build_int_cstu (tree type, unsigned HOST_WIDE_INT cst);
>  extern tree build_int_cst_type (tree, HOST_WIDE_INT);
>  extern tree make_vector (unsigned CXX_MEM_STAT_INFO);
> +extern tree build_vec_duplicate_cst (tree, tree CXX_MEM_STAT_INFO);
>  extern tree build_vector (tree, vec<tree> CXX_MEM_STAT_INFO);
>  extern tree build_vector_from_ctor (tree, vec<constructor_elt, va_gc> *);
>  extern tree build_vector_from_val (tree, tree);
> Index: gcc/tree.c
> ===================================================================
> --- gcc/tree.c  2017-10-23 11:41:23.515548300 +0100
> +++ gcc/tree.c  2017-10-23 11:41:51.774917721 +0100
> @@ -464,6 +464,7 @@ tree_node_structure_for_code (enum tree_
>      case FIXED_CST:            return TS_FIXED_CST;
>      case COMPLEX_CST:          return TS_COMPLEX;
>      case VECTOR_CST:           return TS_VECTOR;
> +    case VEC_DUPLICATE_CST:    return TS_VECTOR;
>      case STRING_CST:           return TS_STRING;
>        /* tcc_exceptional cases.  */
>      case ERROR_MARK:           return TS_COMMON;
> @@ -816,6 +817,7 @@ tree_code_size (enum tree_code code)
>         case FIXED_CST:         return sizeof (struct tree_fixed_cst);
>         case COMPLEX_CST:       return sizeof (struct tree_complex);
>         case VECTOR_CST:        return sizeof (struct tree_vector);
> +       case VEC_DUPLICATE_CST: return sizeof (struct tree_vector);
>         case STRING_CST:        gcc_unreachable ();
>         default:
>           return lang_hooks.tree_size (code);
> @@ -875,6 +877,9 @@ tree_size (const_tree node)
>        return (sizeof (struct tree_vector)
>               + (VECTOR_CST_NELTS (node) - 1) * sizeof (tree));
>
> +    case VEC_DUPLICATE_CST:
> +      return sizeof (struct tree_vector);
> +
>      case STRING_CST:
>        return TREE_STRING_LENGTH (node) + offsetof (struct tree_string, str) + 1;
>
> @@ -1682,6 +1687,30 @@ cst_and_fits_in_hwi (const_tree x)
>           && (tree_fits_shwi_p (x) || tree_fits_uhwi_p (x)));
>  }
>
> +/* Build a new VEC_DUPLICATE_CST with type TYPE and operand EXP.
> +
> +   Note that this function is only suitable for callers that specifically
> +   need a VEC_DUPLICATE_CST node.  Use build_vector_from_val to duplicate
> +   a general scalar into a general vector type.  */
> +
> +tree
> +build_vec_duplicate_cst (tree type, tree exp MEM_STAT_DECL)
> +{
> +  int length = sizeof (struct tree_vector);
> +
> +  record_node_allocation_statistics (VEC_DUPLICATE_CST, length);
> +
> +  tree t = ggc_alloc_cleared_tree_node_stat (length PASS_MEM_STAT);
> +
> +  TREE_SET_CODE (t, VEC_DUPLICATE_CST);
> +  TREE_TYPE (t) = type;
> +  t->base.u.nelts = 1;
> +  VEC_DUPLICATE_CST_ELT (t) = exp;
> +  TREE_CONSTANT (t) = 1;
> +
> +  return t;
> +}
> +
>  /* Build a newly constructed VECTOR_CST node of length LEN.  */
>
>  tree
> @@ -2343,6 +2372,8 @@ integer_zerop (const_tree expr)
>             return false;
>         return true;
>        }
> +    case VEC_DUPLICATE_CST:
> +      return integer_zerop (VEC_DUPLICATE_CST_ELT (expr));
>      default:
>        return false;
>      }
> @@ -2369,6 +2400,8 @@ integer_onep (const_tree expr)
>             return false;
>         return true;
>        }
> +    case VEC_DUPLICATE_CST:
> +      return integer_onep (VEC_DUPLICATE_CST_ELT (expr));
>      default:
>        return false;
>      }
> @@ -2407,6 +2440,9 @@ integer_all_onesp (const_tree expr)
>        return 1;
>      }
>
> +  else if (TREE_CODE (expr) == VEC_DUPLICATE_CST)
> +    return integer_all_onesp (VEC_DUPLICATE_CST_ELT (expr));
> +
>    else if (TREE_CODE (expr) != INTEGER_CST)
>      return 0;
>
> @@ -2463,7 +2499,7 @@ integer_nonzerop (const_tree expr)
>  int
>  integer_truep (const_tree expr)
>  {
> -  if (TREE_CODE (expr) == VECTOR_CST)
> +  if (TREE_CODE (expr) == VECTOR_CST || TREE_CODE (expr) == VEC_DUPLICATE_CST)
>      return integer_all_onesp (expr);
>    return integer_onep (expr);
>  }
> @@ -2634,6 +2670,8 @@ real_zerop (const_tree expr)
>             return false;
>         return true;
>        }
> +    case VEC_DUPLICATE_CST:
> +      return real_zerop (VEC_DUPLICATE_CST_ELT (expr));
>      default:
>        return false;
>      }
> @@ -2662,6 +2700,8 @@ real_onep (const_tree expr)
>             return false;
>         return true;
>        }
> +    case VEC_DUPLICATE_CST:
> +      return real_onep (VEC_DUPLICATE_CST_ELT (expr));
>      default:
>        return false;
>      }
> @@ -2689,6 +2729,8 @@ real_minus_onep (const_tree expr)
>             return false;
>         return true;
>        }
> +    case VEC_DUPLICATE_CST:
> +      return real_minus_onep (VEC_DUPLICATE_CST_ELT (expr));
>      default:
>        return false;
>      }
> @@ -7091,6 +7133,9 @@ add_expr (const_tree t, inchash::hash &h
>           inchash::add_expr (VECTOR_CST_ELT (t, i), hstate, flags);
>         return;
>        }
> +    case VEC_DUPLICATE_CST:
> +      inchash::add_expr (VEC_DUPLICATE_CST_ELT (t), hstate, flags);
> +      return;
>      case SSA_NAME:
>        /* We can just compare by pointer.  */
>        hstate.add_wide_int (SSA_NAME_VERSION (t));
> @@ -10345,6 +10390,9 @@ initializer_zerop (const_tree init)
>         return true;
>        }
>
> +    case VEC_DUPLICATE_CST:
> +      return initializer_zerop (VEC_DUPLICATE_CST_ELT (init));
> +
>      case CONSTRUCTOR:
>        {
>         unsigned HOST_WIDE_INT idx;
> @@ -10390,7 +10438,13 @@ uniform_vector_p (const_tree vec)
>
>    gcc_assert (VECTOR_TYPE_P (TREE_TYPE (vec)));
>
> -  if (TREE_CODE (vec) == VECTOR_CST)
> +  if (TREE_CODE (vec) == VEC_DUPLICATE_CST)
> +    return VEC_DUPLICATE_CST_ELT (vec);
> +
> +  else if (TREE_CODE (vec) == VEC_DUPLICATE_EXPR)
> +    return TREE_OPERAND (vec, 0);
> +
> +  else if (TREE_CODE (vec) == VECTOR_CST)
>      {
>        first = VECTOR_CST_ELT (vec, 0);
>        for (i = 1; i < VECTOR_CST_NELTS (vec); ++i)
> @@ -11095,6 +11149,7 @@ #define WALK_SUBTREE_TAIL(NODE)                         \
>      case REAL_CST:
>      case FIXED_CST:
>      case VECTOR_CST:
> +    case VEC_DUPLICATE_CST:
>      case STRING_CST:
>      case BLOCK:
>      case PLACEHOLDER_EXPR:
> @@ -12381,6 +12436,12 @@ drop_tree_overflow (tree t)
>             elt = drop_tree_overflow (elt);
>         }
>      }
> +  if (TREE_CODE (t) == VEC_DUPLICATE_CST)
> +    {
> +      tree *elt = &VEC_DUPLICATE_CST_ELT (t);
> +      if (TREE_OVERFLOW (*elt))
> +       *elt = drop_tree_overflow (*elt);
> +    }
>    return t;
>  }
>
> @@ -13798,6 +13859,92 @@ test_integer_constants ()
>    ASSERT_EQ (type, TREE_TYPE (zero));
>  }
>
> +/* Verify predicate handling of VEC_DUPLICATE_CSTs and VEC_DUPLICATE_EXPRs
> +   for integral type TYPE.  */
> +
> +static void
> +test_vec_duplicate_predicates_int (tree type)
> +{
> +  tree vec_type = build_vector_type (type, 4);
> +
> +  tree zero = build_zero_cst (type);
> +  tree vec_zero = build_vec_duplicate_cst (vec_type, zero);
> +  ASSERT_TRUE (integer_zerop (vec_zero));
> +  ASSERT_FALSE (integer_onep (vec_zero));
> +  ASSERT_FALSE (integer_minus_onep (vec_zero));
> +  ASSERT_FALSE (integer_all_onesp (vec_zero));
> +  ASSERT_FALSE (integer_truep (vec_zero));
> +  ASSERT_TRUE (initializer_zerop (vec_zero));
> +
> +  tree one = build_one_cst (type);
> +  tree vec_one = build_vec_duplicate_cst (vec_type, one);
> +  ASSERT_FALSE (integer_zerop (vec_one));
> +  ASSERT_TRUE (integer_onep (vec_one));
> +  ASSERT_FALSE (integer_minus_onep (vec_one));
> +  ASSERT_FALSE (integer_all_onesp (vec_one));
> +  ASSERT_FALSE (integer_truep (vec_one));
> +  ASSERT_FALSE (initializer_zerop (vec_one));
> +
> +  tree minus_one = build_minus_one_cst (type);
> +  tree vec_minus_one = build_vec_duplicate_cst (vec_type, minus_one);
> +  ASSERT_FALSE (integer_zerop (vec_minus_one));
> +  ASSERT_FALSE (integer_onep (vec_minus_one));
> +  ASSERT_TRUE (integer_minus_onep (vec_minus_one));
> +  ASSERT_TRUE (integer_all_onesp (vec_minus_one));
> +  ASSERT_TRUE (integer_truep (vec_minus_one));
> +  ASSERT_FALSE (initializer_zerop (vec_minus_one));
> +
> +  tree x = create_tmp_var_raw (type, "x");
> +  tree vec_x = build1 (VEC_DUPLICATE_EXPR, vec_type, x);
> +  ASSERT_EQ (uniform_vector_p (vec_zero), zero);
> +  ASSERT_EQ (uniform_vector_p (vec_one), one);
> +  ASSERT_EQ (uniform_vector_p (vec_minus_one), minus_one);
> +  ASSERT_EQ (uniform_vector_p (vec_x), x);
> +}
> +
> +/* Verify predicate handling of VEC_DUPLICATE_CSTs for floating-point
> +   type TYPE.  */
> +
> +static void
> +test_vec_duplicate_predicates_float (tree type)
> +{
> +  tree vec_type = build_vector_type (type, 4);
> +
> +  tree zero = build_zero_cst (type);
> +  tree vec_zero = build_vec_duplicate_cst (vec_type, zero);
> +  ASSERT_TRUE (real_zerop (vec_zero));
> +  ASSERT_FALSE (real_onep (vec_zero));
> +  ASSERT_FALSE (real_minus_onep (vec_zero));
> +  ASSERT_TRUE (initializer_zerop (vec_zero));
> +
> +  tree one = build_one_cst (type);
> +  tree vec_one = build_vec_duplicate_cst (vec_type, one);
> +  ASSERT_FALSE (real_zerop (vec_one));
> +  ASSERT_TRUE (real_onep (vec_one));
> +  ASSERT_FALSE (real_minus_onep (vec_one));
> +  ASSERT_FALSE (initializer_zerop (vec_one));
> +
> +  tree minus_one = build_minus_one_cst (type);
> +  tree vec_minus_one = build_vec_duplicate_cst (vec_type, minus_one);
> +  ASSERT_FALSE (real_zerop (vec_minus_one));
> +  ASSERT_FALSE (real_onep (vec_minus_one));
> +  ASSERT_TRUE (real_minus_onep (vec_minus_one));
> +  ASSERT_FALSE (initializer_zerop (vec_minus_one));
> +
> +  ASSERT_EQ (uniform_vector_p (vec_zero), zero);
> +  ASSERT_EQ (uniform_vector_p (vec_one), one);
> +  ASSERT_EQ (uniform_vector_p (vec_minus_one), minus_one);
> +}
> +
> +/* Verify predicate handling of VEC_DUPLICATE_CSTs and VEC_DUPLICATE_EXPRs.  */
> +
> +static void
> +test_vec_duplicate_predicates ()
> +{
> +  test_vec_duplicate_predicates_int (integer_type_node);
> +  test_vec_duplicate_predicates_float (float_type_node);
> +}
> +
>  /* Verify identifiers.  */
>
>  static void
> @@ -13826,6 +13973,7 @@ test_labels ()
>  tree_c_tests ()
>  {
>    test_integer_constants ();
> +  test_vec_duplicate_predicates ();
>    test_identifiers ();
>    test_labels ();
>  }
> Index: gcc/cfgexpand.c
> ===================================================================
> --- gcc/cfgexpand.c     2017-10-23 11:41:23.137358624 +0100
> +++ gcc/cfgexpand.c     2017-10-23 11:41:51.760448406 +0100
> @@ -5049,6 +5049,8 @@ expand_debug_expr (tree exp)
>      case VEC_WIDEN_LSHIFT_HI_EXPR:
>      case VEC_WIDEN_LSHIFT_LO_EXPR:
>      case VEC_PERM_EXPR:
> +    case VEC_DUPLICATE_CST:
> +    case VEC_DUPLICATE_EXPR:
>        return NULL;
>
>      /* Misc codes.  */
> Index: gcc/tree-pretty-print.c
> ===================================================================
> --- gcc/tree-pretty-print.c     2017-10-23 11:38:53.934094740 +0100
> +++ gcc/tree-pretty-print.c     2017-10-23 11:41:51.772023858 +0100
> @@ -1802,6 +1802,12 @@ dump_generic_node (pretty_printer *pp, t
>        }
>        break;
>
> +    case VEC_DUPLICATE_CST:
> +      pp_string (pp, "{ ");
> +      dump_generic_node (pp, VEC_DUPLICATE_CST_ELT (node), spc, flags, false);
> +      pp_string (pp, ", ... }");
> +      break;
> +
>      case FUNCTION_TYPE:
>      case METHOD_TYPE:
>        dump_generic_node (pp, TREE_TYPE (node), spc, flags, false);
> @@ -3231,6 +3237,15 @@ dump_generic_node (pretty_printer *pp, t
>        pp_string (pp, " > ");
>        break;
>
> +    case VEC_DUPLICATE_EXPR:
> +      pp_space (pp);
> +      for (str = get_tree_code_name (code); *str; str++)
> +       pp_character (pp, TOUPPER (*str));
> +      pp_string (pp, " < ");
> +      dump_generic_node (pp, TREE_OPERAND (node, 0), spc, flags, false);
> +      pp_string (pp, " > ");
> +      break;
> +
>      case VEC_UNPACK_HI_EXPR:
>        pp_string (pp, " VEC_UNPACK_HI_EXPR < ");
>        dump_generic_node (pp, TREE_OPERAND (node, 0), spc, flags, false);
> Index: gcc/dwarf2out.c
> ===================================================================
> --- gcc/dwarf2out.c     2017-10-23 11:41:24.407340836 +0100
> +++ gcc/dwarf2out.c     2017-10-23 11:41:51.763342269 +0100
> @@ -18862,6 +18862,7 @@ rtl_for_decl_init (tree init, tree type)
>         switch (TREE_CODE (init))
>           {
>           case VECTOR_CST:
> +         case VEC_DUPLICATE_CST:
>             break;
>           case CONSTRUCTOR:
>             if (TREE_CONSTANT (init))
> Index: gcc/gimple-expr.h
> ===================================================================
> --- gcc/gimple-expr.h   2017-10-23 11:38:53.934094740 +0100
> +++ gcc/gimple-expr.h   2017-10-23 11:41:51.765271511 +0100
> @@ -134,6 +134,7 @@ is_gimple_constant (const_tree t)
>      case FIXED_CST:
>      case COMPLEX_CST:
>      case VECTOR_CST:
> +    case VEC_DUPLICATE_CST:
>      case STRING_CST:
>        return true;
>
> Index: gcc/gimplify.c
> ===================================================================
> --- gcc/gimplify.c      2017-10-23 11:41:25.531270256 +0100
> +++ gcc/gimplify.c      2017-10-23 11:41:51.766236132 +0100
> @@ -11506,6 +11506,7 @@ gimplify_expr (tree *expr_p, gimple_seq
>         case STRING_CST:
>         case COMPLEX_CST:
>         case VECTOR_CST:
> +       case VEC_DUPLICATE_CST:
>           /* Drop the overflow flag on constants, we do not want
>              that in the GIMPLE IL.  */
>           if (TREE_OVERFLOW_P (*expr_p))
> Index: gcc/graphite-isl-ast-to-gimple.c
> ===================================================================
> --- gcc/graphite-isl-ast-to-gimple.c    2017-10-23 11:41:23.205065216 +0100
> +++ gcc/graphite-isl-ast-to-gimple.c    2017-10-23 11:41:51.767200753 +0100
> @@ -222,7 +222,8 @@ enum phi_node_kind
>      return TREE_CODE (op) == INTEGER_CST
>        || TREE_CODE (op) == REAL_CST
>        || TREE_CODE (op) == COMPLEX_CST
> -      || TREE_CODE (op) == VECTOR_CST;
> +      || TREE_CODE (op) == VECTOR_CST
> +      || TREE_CODE (op) == VEC_DUPLICATE_CST;
>    }
>
>  private:
> Index: gcc/graphite-scop-detection.c
> ===================================================================
> --- gcc/graphite-scop-detection.c       2017-10-23 11:41:25.533204730 +0100
> +++ gcc/graphite-scop-detection.c       2017-10-23 11:41:51.767200753 +0100
> @@ -1243,6 +1243,7 @@ scan_tree_for_params (sese_info_p s, tre
>      case REAL_CST:
>      case COMPLEX_CST:
>      case VECTOR_CST:
> +    case VEC_DUPLICATE_CST:
>        break;
>
>     default:
> Index: gcc/ipa-icf-gimple.c
> ===================================================================
> --- gcc/ipa-icf-gimple.c        2017-10-23 11:38:53.934094740 +0100
> +++ gcc/ipa-icf-gimple.c        2017-10-23 11:41:51.767200753 +0100
> @@ -333,6 +333,7 @@ func_checker::compare_cst_or_decl (tree
>      case INTEGER_CST:
>      case COMPLEX_CST:
>      case VECTOR_CST:
> +    case VEC_DUPLICATE_CST:
>      case STRING_CST:
>      case REAL_CST:
>        {
> @@ -528,6 +529,7 @@ func_checker::compare_operand (tree t1,
>      case INTEGER_CST:
>      case COMPLEX_CST:
>      case VECTOR_CST:
> +    case VEC_DUPLICATE_CST:
>      case STRING_CST:
>      case REAL_CST:
>      case FUNCTION_DECL:
> Index: gcc/ipa-icf.c
> ===================================================================
> --- gcc/ipa-icf.c       2017-10-23 11:41:25.874639400 +0100
> +++ gcc/ipa-icf.c       2017-10-23 11:41:51.768165374 +0100
> @@ -1478,6 +1478,7 @@ sem_item::add_expr (const_tree exp, inch
>      case STRING_CST:
>      case COMPLEX_CST:
>      case VECTOR_CST:
> +    case VEC_DUPLICATE_CST:
>        inchash::add_expr (exp, hstate);
>        break;
>      case CONSTRUCTOR:
> @@ -2030,6 +2031,9 @@ sem_variable::equals (tree t1, tree t2)
>
>         return 1;
>        }
> +    case VEC_DUPLICATE_CST:
> +      return sem_variable::equals (VEC_DUPLICATE_CST_ELT (t1),
> +                                  VEC_DUPLICATE_CST_ELT (t2));
>      case ARRAY_REF:
>      case ARRAY_RANGE_REF:
>        {
> Index: gcc/match.pd
> ===================================================================
> --- gcc/match.pd        2017-10-23 11:38:53.934094740 +0100
> +++ gcc/match.pd        2017-10-23 11:41:51.768165374 +0100
> @@ -958,6 +958,9 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>  (match negate_expr_p
>   VECTOR_CST
>   (if (FLOAT_TYPE_P (TREE_TYPE (type)) || TYPE_OVERFLOW_WRAPS (type))))
> +(match negate_expr_p
> + VEC_DUPLICATE_CST
> + (if (FLOAT_TYPE_P (TREE_TYPE (type)) || TYPE_OVERFLOW_WRAPS (type))))
>
>  /* (-A) * (-B) -> A * B  */
>  (simplify
> Index: gcc/print-tree.c
> ===================================================================
> --- gcc/print-tree.c    2017-10-23 11:38:53.934094740 +0100
> +++ gcc/print-tree.c    2017-10-23 11:41:51.769129995 +0100
> @@ -783,6 +783,10 @@ print_node (FILE *file, const char *pref
>           }
>           break;
>
> +       case VEC_DUPLICATE_CST:
> +         print_node (file, "elt", VEC_DUPLICATE_CST_ELT (node), indent + 4);
> +         break;
> +
>         case COMPLEX_CST:
>           print_node (file, "real", TREE_REALPART (node), indent + 4);
>           print_node (file, "imag", TREE_IMAGPART (node), indent + 4);
> Index: gcc/tree-chkp.c
> ===================================================================
> --- gcc/tree-chkp.c     2017-10-23 11:41:23.201196268 +0100
> +++ gcc/tree-chkp.c     2017-10-23 11:41:51.770094616 +0100
> @@ -3800,6 +3800,7 @@ chkp_find_bounds_1 (tree ptr, tree ptr_s
>      case INTEGER_CST:
>      case COMPLEX_CST:
>      case VECTOR_CST:
> +    case VEC_DUPLICATE_CST:
>        if (integer_zerop (ptr_src))
>         bounds = chkp_get_none_bounds ();
>        else
> Index: gcc/tree-loop-distribution.c
> ===================================================================
> --- gcc/tree-loop-distribution.c        2017-10-23 11:41:23.228278904 +0100
> +++ gcc/tree-loop-distribution.c        2017-10-23 11:41:51.771059237 +0100
> @@ -921,6 +921,9 @@ const_with_all_bytes_same (tree val)
>            && CONSTRUCTOR_NELTS (val) == 0))
>      return 0;
>
> +  if (TREE_CODE (val) == VEC_DUPLICATE_CST)
> +    return const_with_all_bytes_same (VEC_DUPLICATE_CST_ELT (val));
> +
>    if (real_zerop (val))
>      {
>        /* Only return 0 for +0.0, not for -0.0, which doesn't have
> Index: gcc/tree-ssa-loop.c
> ===================================================================
> --- gcc/tree-ssa-loop.c 2017-10-23 11:38:53.934094740 +0100
> +++ gcc/tree-ssa-loop.c 2017-10-23 11:41:51.772023858 +0100
> @@ -616,6 +616,7 @@ for_each_index (tree *addr_p, bool (*cbc
>         case STRING_CST:
>         case RESULT_DECL:
>         case VECTOR_CST:
> +       case VEC_DUPLICATE_CST:
>         case COMPLEX_CST:
>         case INTEGER_CST:
>         case REAL_CST:
> Index: gcc/tree-ssa-pre.c
> ===================================================================
> --- gcc/tree-ssa-pre.c  2017-10-23 11:41:25.549647760 +0100
> +++ gcc/tree-ssa-pre.c  2017-10-23 11:41:51.772023858 +0100
> @@ -2675,6 +2675,7 @@ create_component_ref_by_pieces_1 (basic_
>      case INTEGER_CST:
>      case COMPLEX_CST:
>      case VECTOR_CST:
> +    case VEC_DUPLICATE_CST:
>      case REAL_CST:
>      case CONSTRUCTOR:
>      case VAR_DECL:
> Index: gcc/tree-ssa-sccvn.c
> ===================================================================
> --- gcc/tree-ssa-sccvn.c        2017-10-23 11:38:53.934094740 +0100
> +++ gcc/tree-ssa-sccvn.c        2017-10-23 11:41:51.773953100 +0100
> @@ -858,6 +858,7 @@ copy_reference_ops_from_ref (tree ref, v
>         case INTEGER_CST:
>         case COMPLEX_CST:
>         case VECTOR_CST:
> +       case VEC_DUPLICATE_CST:
>         case REAL_CST:
>         case FIXED_CST:
>         case CONSTRUCTOR:
> @@ -1050,6 +1051,7 @@ ao_ref_init_from_vn_reference (ao_ref *r
>         case INTEGER_CST:
>         case COMPLEX_CST:
>         case VECTOR_CST:
> +       case VEC_DUPLICATE_CST:
>         case REAL_CST:
>         case CONSTRUCTOR:
>         case CONST_DECL:
> Index: gcc/tree-vect-generic.c
> ===================================================================
> --- gcc/tree-vect-generic.c     2017-10-23 11:38:53.934094740 +0100
> +++ gcc/tree-vect-generic.c     2017-10-23 11:41:51.773953100 +0100
> @@ -1419,6 +1419,7 @@ lower_vec_perm (gimple_stmt_iterator *gs
>  ssa_uniform_vector_p (tree op)
>  {
>    if (TREE_CODE (op) == VECTOR_CST
> +      || TREE_CODE (op) == VEC_DUPLICATE_CST
>        || TREE_CODE (op) == CONSTRUCTOR)
>      return uniform_vector_p (op);
>    if (TREE_CODE (op) == SSA_NAME)
> Index: gcc/varasm.c
> ===================================================================
> --- gcc/varasm.c        2017-10-23 11:41:25.822408600 +0100
> +++ gcc/varasm.c        2017-10-23 11:41:51.775882341 +0100
> @@ -3068,6 +3068,9 @@ const_hash_1 (const tree exp)
>      CASE_CONVERT:
>        return const_hash_1 (TREE_OPERAND (exp, 0)) * 7 + 2;
>
> +    case VEC_DUPLICATE_CST:
> +      return const_hash_1 (VEC_DUPLICATE_CST_ELT (exp)) * 7 + 3;
> +
>      default:
>        /* A language specific constant. Just hash the code.  */
>        return code;
> @@ -3158,6 +3161,10 @@ compare_constant (const tree t1, const t
>         return 1;
>        }
>
> +    case VEC_DUPLICATE_CST:
> +      return compare_constant (VEC_DUPLICATE_CST_ELT (t1),
> +                              VEC_DUPLICATE_CST_ELT (t2));
> +
>      case CONSTRUCTOR:
>        {
>         vec<constructor_elt, va_gc> *v1, *v2;
> Index: gcc/fold-const.c
> ===================================================================
> --- gcc/fold-const.c    2017-10-23 11:41:23.535860278 +0100
> +++ gcc/fold-const.c    2017-10-23 11:41:51.765271511 +0100
> @@ -418,6 +418,9 @@ negate_expr_p (tree t)
>         return true;
>        }
>
> +    case VEC_DUPLICATE_CST:
> +      return negate_expr_p (VEC_DUPLICATE_CST_ELT (t));
> +
>      case COMPLEX_EXPR:
>        return negate_expr_p (TREE_OPERAND (t, 0))
>              && negate_expr_p (TREE_OPERAND (t, 1));
> @@ -579,6 +582,14 @@ fold_negate_expr_1 (location_t loc, tree
>         return build_vector (type, elts);
>        }
>
> +    case VEC_DUPLICATE_CST:
> +      {
> +       tree sub = fold_negate_expr (loc, VEC_DUPLICATE_CST_ELT (t));
> +       if (!sub)
> +         return NULL_TREE;
> +       return build_vector_from_val (type, sub);
> +      }
> +
>      case COMPLEX_EXPR:
>        if (negate_expr_p (t))
>         return fold_build2_loc (loc, COMPLEX_EXPR, type,
> @@ -1436,6 +1447,16 @@ const_binop (enum tree_code code, tree a
>        return build_vector (type, elts);
>      }
>
> +  if (TREE_CODE (arg1) == VEC_DUPLICATE_CST
> +      && TREE_CODE (arg2) == VEC_DUPLICATE_CST)
> +    {
> +      tree sub = const_binop (code, VEC_DUPLICATE_CST_ELT (arg1),
> +                             VEC_DUPLICATE_CST_ELT (arg2));
> +      if (!sub)
> +       return NULL_TREE;
> +      return build_vector_from_val (TREE_TYPE (arg1), sub);
> +    }
> +
>    /* Shifts allow a scalar offset for a vector.  */
>    if (TREE_CODE (arg1) == VECTOR_CST
>        && TREE_CODE (arg2) == INTEGER_CST)
> @@ -1459,6 +1480,15 @@ const_binop (enum tree_code code, tree a
>
>        return build_vector (type, elts);
>      }
> +
> +  if (TREE_CODE (arg1) == VEC_DUPLICATE_CST
> +      && TREE_CODE (arg2) == INTEGER_CST)
> +    {
> +      tree sub = const_binop (code, VEC_DUPLICATE_CST_ELT (arg1), arg2);
> +      if (!sub)
> +       return NULL_TREE;
> +      return build_vector_from_val (TREE_TYPE (arg1), sub);
> +    }
>    return NULL_TREE;
>  }
>
> @@ -1652,6 +1682,13 @@ const_unop (enum tree_code code, tree ty
>           if (i == count)
>             return build_vector (type, elements);
>         }
> +      else if (TREE_CODE (arg0) == VEC_DUPLICATE_CST)
> +       {
> +         tree sub = const_unop (BIT_NOT_EXPR, TREE_TYPE (type),
> +                                VEC_DUPLICATE_CST_ELT (arg0));
> +         if (sub)
> +           return build_vector_from_val (type, sub);
> +       }
>        break;
>
>      case TRUTH_NOT_EXPR:
> @@ -1737,6 +1774,11 @@ const_unop (enum tree_code code, tree ty
>         return res;
>        }
>
> +    case VEC_DUPLICATE_EXPR:
> +      if (CONSTANT_CLASS_P (arg0))
> +       return build_vector_from_val (type, arg0);
> +      return NULL_TREE;
> +
>      default:
>        break;
>      }
> @@ -2167,6 +2209,15 @@ fold_convert_const (enum tree_code code,
>             }
>           return build_vector (type, v);
>         }
> +      if (TREE_CODE (arg1) == VEC_DUPLICATE_CST
> +         && (TYPE_VECTOR_SUBPARTS (type)
> +             == TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg1))))
> +       {
> +         tree sub = fold_convert_const (code, TREE_TYPE (type),
> +                                        VEC_DUPLICATE_CST_ELT (arg1));
> +         if (sub)
> +           return build_vector_from_val (type, sub);
> +       }
>      }
>    return NULL_TREE;
>  }
> @@ -2953,6 +3004,10 @@ operand_equal_p (const_tree arg0, const_
>           return 1;
>         }
>
> +      case VEC_DUPLICATE_CST:
> +       return operand_equal_p (VEC_DUPLICATE_CST_ELT (arg0),
> +                               VEC_DUPLICATE_CST_ELT (arg1), flags);
> +
>        case COMPLEX_CST:
>         return (operand_equal_p (TREE_REALPART (arg0), TREE_REALPART (arg1),
>                                  flags)
> @@ -7492,6 +7547,20 @@ can_native_interpret_type_p (tree type)
>  static tree
>  fold_view_convert_expr (tree type, tree expr)
>  {
> +  /* Recurse on duplicated vectors if the target type is also a vector
> +     and if the elements line up.  */
> +  tree expr_type = TREE_TYPE (expr);
> +  if (TREE_CODE (expr) == VEC_DUPLICATE_CST
> +      && VECTOR_TYPE_P (type)
> +      && TYPE_VECTOR_SUBPARTS (type) == TYPE_VECTOR_SUBPARTS (expr_type)
> +      && TYPE_SIZE (TREE_TYPE (type)) == TYPE_SIZE (TREE_TYPE (expr_type)))
> +    {
> +      tree sub = fold_view_convert_expr (TREE_TYPE (type),
> +                                        VEC_DUPLICATE_CST_ELT (expr));
> +      if (sub)
> +       return build_vector_from_val (type, sub);
> +    }
> +
>    /* We support up to 512-bit values (for V8DFmode).  */
>    unsigned char buffer[64];
>    int len;
> @@ -8891,6 +8960,15 @@ exact_inverse (tree type, tree cst)
>         return build_vector (type, elts);
>        }
>
> +    case VEC_DUPLICATE_CST:
> +      {
> +       tree sub = exact_inverse (TREE_TYPE (type),
> +                                 VEC_DUPLICATE_CST_ELT (cst));
> +       if (!sub)
> +         return NULL_TREE;
> +       return build_vector_from_val (type, sub);
> +      }
> +
>      default:
>        return NULL_TREE;
>      }
> @@ -11969,6 +12047,9 @@ fold_checksum_tree (const_tree expr, str
>           for (i = 0; i < (int) VECTOR_CST_NELTS (expr); ++i)
>             fold_checksum_tree (VECTOR_CST_ELT (expr, i), ctx, ht);
>           break;
> +       case VEC_DUPLICATE_CST:
> +         fold_checksum_tree (VEC_DUPLICATE_CST_ELT (expr), ctx, ht);
> +         break;
>         default:
>           break;
>         }
> @@ -14436,6 +14517,36 @@ test_vector_folding ()
>    ASSERT_FALSE (integer_nonzerop (fold_build2 (NE_EXPR, res_type, one, one)));
>  }
>
> +/* Verify folding of VEC_DUPLICATE_CSTs and VEC_DUPLICATE_EXPRs.  */
> +
> +static void
> +test_vec_duplicate_folding ()
> +{
> +  tree type = build_vector_type (ssizetype, 4);
> +  tree dup5 = build_vec_duplicate_cst (type, ssize_int (5));
> +  tree dup3 = build_vec_duplicate_cst (type, ssize_int (3));
> +
> +  tree neg_dup5 = fold_unary (NEGATE_EXPR, type, dup5);
> +  ASSERT_EQ (uniform_vector_p (neg_dup5), ssize_int (-5));
> +
> +  tree not_dup5 = fold_unary (BIT_NOT_EXPR, type, dup5);
> +  ASSERT_EQ (uniform_vector_p (not_dup5), ssize_int (-6));
> +
> +  tree dup5_plus_dup3 = fold_binary (PLUS_EXPR, type, dup5, dup3);
> +  ASSERT_EQ (uniform_vector_p (dup5_plus_dup3), ssize_int (8));
> +
> +  tree dup5_lsl_2 = fold_binary (LSHIFT_EXPR, type, dup5, ssize_int (2));
> +  ASSERT_EQ (uniform_vector_p (dup5_lsl_2), ssize_int (20));
> +
> +  tree size_vector = build_vector_type (sizetype, 4);
> +  tree size_dup5 = fold_convert (size_vector, dup5);
> +  ASSERT_EQ (uniform_vector_p (size_dup5), size_int (5));
> +
> +  tree dup5_expr = fold_unary (VEC_DUPLICATE_EXPR, type, ssize_int (5));
> +  tree dup5_cst = build_vector_from_val (type, ssize_int (5));
> +  ASSERT_TRUE (operand_equal_p (dup5_expr, dup5_cst, 0));
> +}
> +
>  /* Run all of the selftests within this file.  */
>
>  void
> @@ -14443,6 +14554,7 @@ fold_const_c_tests ()
>  {
>    test_arithmetic_folding ();
>    test_vector_folding ();
> +  test_vec_duplicate_folding ();
>  }
>
>  } // namespace selftest
> Index: gcc/optabs.def
> ===================================================================
> --- gcc/optabs.def      2017-10-23 11:38:53.934094740 +0100
> +++ gcc/optabs.def      2017-10-23 11:41:51.769129995 +0100
> @@ -364,3 +364,5 @@ OPTAB_D (atomic_xor_optab, "atomic_xor$I
>
>  OPTAB_D (get_thread_pointer_optab, "get_thread_pointer$I$a")
>  OPTAB_D (set_thread_pointer_optab, "set_thread_pointer$I$a")
> +
> +OPTAB_DC (vec_duplicate_optab, "vec_duplicate$a", VEC_DUPLICATE)
> Index: gcc/optabs-tree.c
> ===================================================================
> --- gcc/optabs-tree.c   2017-10-23 11:38:53.934094740 +0100
> +++ gcc/optabs-tree.c   2017-10-23 11:41:51.768165374 +0100
> @@ -210,6 +210,9 @@ optab_for_tree_code (enum tree_code code
>        return TYPE_UNSIGNED (type) ?
>         vec_pack_ufix_trunc_optab : vec_pack_sfix_trunc_optab;
>
> +    case VEC_DUPLICATE_EXPR:
> +      return vec_duplicate_optab;
> +
>      default:
>        break;
>      }
> Index: gcc/optabs.h
> ===================================================================
> --- gcc/optabs.h        2017-10-23 11:38:53.934094740 +0100
> +++ gcc/optabs.h        2017-10-23 11:41:51.769129995 +0100
> @@ -181,6 +181,7 @@ extern rtx simplify_expand_binop (machin
>                                   enum optab_methods methods);
>  extern bool force_expand_binop (machine_mode, optab, rtx, rtx, rtx, int,
>                                 enum optab_methods);
> +extern rtx expand_vector_broadcast (machine_mode, rtx);
>
>  /* Generate code for a simple binary or unary operation.  "Simple" in
>     this case means "can be unambiguously described by a (mode, code)
> Index: gcc/optabs.c
> ===================================================================
> --- gcc/optabs.c        2017-10-23 11:41:41.549050496 +0100
> +++ gcc/optabs.c        2017-10-23 11:41:51.769129995 +0100
> @@ -367,7 +367,7 @@ force_expand_binop (machine_mode mode, o
>     mode of OP must be the element mode of VMODE.  If OP is a constant,
>     then the return value will be a constant.  */
>
> -static rtx
> +rtx
>  expand_vector_broadcast (machine_mode vmode, rtx op)
>  {
>    enum insn_code icode;
> @@ -380,6 +380,16 @@ expand_vector_broadcast (machine_mode vm
>    if (CONSTANT_P (op))
>      return gen_const_vec_duplicate (vmode, op);
>
> +  icode = optab_handler (vec_duplicate_optab, vmode);
> +  if (icode != CODE_FOR_nothing)
> +    {
> +      struct expand_operand ops[2];
> +      create_output_operand (&ops[0], NULL_RTX, vmode);
> +      create_input_operand (&ops[1], op, GET_MODE (op));
> +      expand_insn (icode, 2, ops);
> +      return ops[0].value;
> +    }
> +
>    /* ??? If the target doesn't have a vec_init, then we have no easy way
>       of performing this operation.  Most of this sort of generic support
>       is hidden away in the vector lowering support in gimple.  */
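
A hypothetical caller of the now-exported function, for illustration
only (the modes are examples; real callers take them from the
expression being expanded):

    /* Broadcast a scalar register into a vector register.  */
    rtx scalar = gen_reg_rtx (SImode);
    rtx vec = expand_vector_broadcast (V4SImode, scalar);
    if (vec == NULL_RTX)
      {
        /* Neither vec_duplicate nor vec_init is available; the
           caller must use some other strategy.  */
      }
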
> Index: gcc/expr.c
> ===================================================================
> --- gcc/expr.c  2017-10-23 11:41:39.187050437 +0100
> +++ gcc/expr.c  2017-10-23 11:41:51.764306890 +0100
> @@ -6572,7 +6572,8 @@ store_constructor (tree exp, rtx target,
>         constructor_elt *ce;
>         int i;
>         int need_to_clear;
> -       int icode = CODE_FOR_nothing;
> +       insn_code icode = CODE_FOR_nothing;
> +       tree elt;
>         tree elttype = TREE_TYPE (type);
>         int elt_size = tree_to_uhwi (TYPE_SIZE (elttype));
>         machine_mode eltmode = TYPE_MODE (elttype);
> @@ -6582,13 +6583,30 @@ store_constructor (tree exp, rtx target,
>         unsigned n_elts;
>         alias_set_type alias;
>         bool vec_vec_init_p = false;
> +       machine_mode mode = GET_MODE (target);
>
>         gcc_assert (eltmode != BLKmode);
>
> +       /* Try using vec_duplicate_optab for uniform vectors.  */
> +       if (!TREE_SIDE_EFFECTS (exp)
> +           && VECTOR_MODE_P (mode)
> +           && eltmode == GET_MODE_INNER (mode)
> +           && ((icode = optab_handler (vec_duplicate_optab, mode))
> +               != CODE_FOR_nothing)
> +           && (elt = uniform_vector_p (exp)))
> +         {
> +           struct expand_operand ops[2];
> +           create_output_operand (&ops[0], target, mode);
> +           create_input_operand (&ops[1], expand_normal (elt), eltmode);
> +           expand_insn (icode, 2, ops);
> +           if (!rtx_equal_p (target, ops[0].value))
> +             emit_move_insn (target, ops[0].value);
> +           break;
> +         }
> +
>         n_elts = TYPE_VECTOR_SUBPARTS (type);
> -       if (REG_P (target) && VECTOR_MODE_P (GET_MODE (target)))
> +       if (REG_P (target) && VECTOR_MODE_P (mode))
>           {
> -           machine_mode mode = GET_MODE (target);
>             machine_mode emode = eltmode;
>
>             if (CONSTRUCTOR_NELTS (exp)
> @@ -6600,7 +6618,7 @@ store_constructor (tree exp, rtx target,
>                             == n_elts);
>                 emode = TYPE_MODE (etype);
>               }
> -           icode = (int) convert_optab_handler (vec_init_optab, mode, emode);
> +           icode = convert_optab_handler (vec_init_optab, mode, emode);
>             if (icode != CODE_FOR_nothing)
>               {
>                 unsigned int i, n = n_elts;
> @@ -6648,7 +6666,7 @@ store_constructor (tree exp, rtx target,
>         if (need_to_clear && size > 0 && !vector)
>           {
>             if (REG_P (target))
> -             emit_move_insn (target, CONST0_RTX (GET_MODE (target)));
> +             emit_move_insn (target, CONST0_RTX (mode));
>             else
>               clear_storage (target, GEN_INT (size), BLOCK_OP_NORMAL);
>             cleared = 1;
> @@ -6656,7 +6674,7 @@ store_constructor (tree exp, rtx target,
>
>         /* Inform later passes that the old value is dead.  */
>         if (!cleared && !vector && REG_P (target))
> -         emit_move_insn (target, CONST0_RTX (GET_MODE (target)));
> +         emit_move_insn (target, CONST0_RTX (mode));
>
>          if (MEM_P (target))
>           alias = MEM_ALIAS_SET (target);
> @@ -6707,8 +6725,7 @@ store_constructor (tree exp, rtx target,
>
>         if (vector)
>           emit_insn (GEN_FCN (icode) (target,
> -                                     gen_rtx_PARALLEL (GET_MODE (target),
> -                                                       vector)));
> +                                     gen_rtx_PARALLEL (mode, vector)));
>         break;
>        }
>
> @@ -7686,6 +7703,19 @@ expand_operands (tree exp0, tree exp1, r
>  }
>
>
> +/* Expand constant vector element ELT, which has mode MODE.  This is used
> +   for members of VECTOR_CST and VEC_DUPLICATE_CST.  */
> +
> +static rtx
> +const_vector_element (scalar_mode mode, const_tree elt)
> +{
> +  if (TREE_CODE (elt) == REAL_CST)
> +    return const_double_from_real_value (TREE_REAL_CST (elt), mode);
> +  if (TREE_CODE (elt) == FIXED_CST)
> +    return CONST_FIXED_FROM_FIXED_VALUE (TREE_FIXED_CST (elt), mode);
> +  return immed_wide_int_const (wi::to_wide (elt), mode);
> +}
> +
>  /* Return a MEM that contains constant EXP.  DEFER is as for
>     output_constant_def and MODIFIER is as for expand_expr.  */
>
> @@ -9551,6 +9581,12 @@ #define REDUCE_BIT_FIELD(expr)   (reduce_b
>        target = expand_vec_cond_expr (type, treeop0, treeop1, treeop2, target);
>        return target;
>
> +    case VEC_DUPLICATE_EXPR:
> +      op0 = expand_expr (treeop0, NULL_RTX, VOIDmode, modifier);
> +      target = expand_vector_broadcast (mode, op0);
> +      gcc_assert (target);
> +      return target;
> +
>      case BIT_INSERT_EXPR:
>        {
>         unsigned bitpos = tree_to_uhwi (treeop2);
> @@ -10003,6 +10039,11 @@ expand_expr_real_1 (tree exp, rtx target
>                             tmode, modifier);
>        }
>
> +    case VEC_DUPLICATE_CST:
> +      op0 = const_vector_element (GET_MODE_INNER (mode),
> +                                 VEC_DUPLICATE_CST_ELT (exp));
> +      return gen_const_vec_duplicate (mode, op0);
> +
>      case CONST_DECL:
>        if (modifier == EXPAND_WRITE)
>         {
> @@ -11764,8 +11805,7 @@ const_vector_from_tree (tree exp)
>  {
>    rtvec v;
>    unsigned i, units;
> -  tree elt;
> -  machine_mode inner, mode;
> +  machine_mode mode;
>
>    mode = TYPE_MODE (TREE_TYPE (exp));
>
> @@ -11776,23 +11816,12 @@ const_vector_from_tree (tree exp)
>      return const_vector_mask_from_tree (exp);
>
>    units = VECTOR_CST_NELTS (exp);
> -  inner = GET_MODE_INNER (mode);
>
>    v = rtvec_alloc (units);
>
>    for (i = 0; i < units; ++i)
> -    {
> -      elt = VECTOR_CST_ELT (exp, i);
> -
> -      if (TREE_CODE (elt) == REAL_CST)
> -       RTVEC_ELT (v, i) = const_double_from_real_value (TREE_REAL_CST (elt),
> -                                                        inner);
> -      else if (TREE_CODE (elt) == FIXED_CST)
> -       RTVEC_ELT (v, i) = CONST_FIXED_FROM_FIXED_VALUE (TREE_FIXED_CST (elt),
> -                                                        inner);
> -      else
> -       RTVEC_ELT (v, i) = immed_wide_int_const (wi::to_wide (elt), inner);
> -    }
> +    RTVEC_ELT (v, i) = const_vector_element (GET_MODE_INNER (mode),
> +                                            VECTOR_CST_ELT (exp, i));
>
>    return gen_rtx_CONST_VECTOR (mode, v);
>  }
> Index: gcc/internal-fn.c
> ===================================================================
> --- gcc/internal-fn.c   2017-10-23 11:41:23.529089619 +0100
> +++ gcc/internal-fn.c   2017-10-23 11:41:51.767200753 +0100
> @@ -1911,12 +1911,12 @@ expand_vector_ubsan_overflow (location_t
>        emit_move_insn (cntvar, const0_rtx);
>        emit_label (loop_lab);
>      }
> -  if (TREE_CODE (arg0) != VECTOR_CST)
> +  if (!CONSTANT_CLASS_P (arg0))
>      {
>        rtx arg0r = expand_normal (arg0);
>        arg0 = make_tree (TREE_TYPE (arg0), arg0r);
>      }
> -  if (TREE_CODE (arg1) != VECTOR_CST)
> +  if (!CONSTANT_CLASS_P (arg1))
>      {
>        rtx arg1r = expand_normal (arg1);
>        arg1 = make_tree (TREE_TYPE (arg1), arg1r);
> Index: gcc/tree-cfg.c
> ===================================================================
> --- gcc/tree-cfg.c      2017-10-23 11:41:25.864967029 +0100
> +++ gcc/tree-cfg.c      2017-10-23 11:41:51.770094616 +0100
> @@ -3803,6 +3803,17 @@ verify_gimple_assign_unary (gassign *stm
>      case CONJ_EXPR:
>        break;
>
> +    case VEC_DUPLICATE_EXPR:
> +      if (TREE_CODE (lhs_type) != VECTOR_TYPE
> +         || !useless_type_conversion_p (TREE_TYPE (lhs_type), rhs1_type))
> +       {
> +         error ("vec_duplicate should be from a scalar to a like vector");
> +         debug_generic_expr (lhs_type);
> +         debug_generic_expr (rhs1_type);
> +         return true;
> +       }
> +      return false;
> +
>      default:
>        gcc_unreachable ();
>      }
> @@ -4473,6 +4484,7 @@ verify_gimple_assign_single (gassign *st
>      case FIXED_CST:
>      case COMPLEX_CST:
>      case VECTOR_CST:
> +    case VEC_DUPLICATE_CST:
>      case STRING_CST:
>        return res;
>
> Index: gcc/tree-inline.c
> ===================================================================
> --- gcc/tree-inline.c   2017-10-23 11:41:25.833048208 +0100
> +++ gcc/tree-inline.c   2017-10-23 11:41:51.771059237 +0100
> @@ -4002,6 +4002,7 @@ estimate_operator_cost (enum tree_code c
>      case VEC_PACK_FIX_TRUNC_EXPR:
>      case VEC_WIDEN_LSHIFT_HI_EXPR:
>      case VEC_WIDEN_LSHIFT_LO_EXPR:
> +    case VEC_DUPLICATE_EXPR:
>
>        return 1;
>

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [08/nn] Add a fixed_size_mode class
  2017-10-23 11:22 ` [08/nn] Add a fixed_size_mode class Richard Sandiford
@ 2017-10-26 11:57   ` Richard Biener
  0 siblings, 0 replies; 90+ messages in thread
From: Richard Biener @ 2017-10-26 11:57 UTC (permalink / raw)
  To: GCC Patches, Richard Sandiford

On Mon, Oct 23, 2017 at 1:22 PM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> This patch adds a fixed_size_mode machine_mode wrapper
> for modes that are known to have a fixed size.  That applies
> to all current modes, but future patches will add support for
> variable-sized modes.
>
> The use of this class should be pretty restricted.  One important
> use case is to hold the mode of static data, which can never be
> variable-sized with current file formats.  Another is to hold
> the modes of registers involved in __builtin_apply and
> __builtin_return, since those interfaces don't cope well with
> variable-sized data.
>
> The class can also be useful when reinterpreting the contents of
> a fixed-length bit string as a different kind of value.

Ok.

Richard.

>
> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
>             Alan Hayward  <alan.hayward@arm.com>
>             David Sherwood  <david.sherwood@arm.com>
>
> gcc/
>         * machmode.h (fixed_size_mode): New class.
>         * rtl.h (get_pool_mode): Return fixed_size_mode.
>         * gengtype.c (main): Add fixed_size_mode.
>         * target.def (get_raw_result_mode): Return a fixed_size_mode.
>         (get_raw_arg_mode): Likewise.
>         * doc/tm.texi: Regenerate.
>         * targhooks.h (default_get_reg_raw_mode): Return a fixed_size_mode.
>         * targhooks.c (default_get_reg_raw_mode): Likewise.
>         * config/ia64/ia64.c (ia64_get_reg_raw_mode): Likewise.
>         * config/mips/mips.c (mips_get_reg_raw_mode): Likewise.
>         * config/msp430/msp430.c (msp430_get_raw_arg_mode): Likewise.
>         (msp430_get_raw_result_mode): Likewise.
>         * config/avr/avr-protos.h (regmask): Use as_a <fixed_size_mode>.
>         * dbxout.c (dbxout_parms): Require fixed-size modes.
>         * expr.c (copy_blkmode_from_reg, copy_blkmode_to_reg): Likewise.
>         * gimple-ssa-store-merging.c (encode_tree_to_bitpos): Likewise.
>         * omp-low.c (lower_oacc_reductions): Likewise.
>         * simplify-rtx.c (simplify_immed_subreg): Take fixed_size_modes.
>         (simplify_subreg): Update accordingly.
>         * varasm.c (constant_descriptor_rtx::mode): Change to fixed_size_mode.
>         (force_const_mem): Update accordingly.  Return NULL_RTX for modes
>         that aren't fixed-size.
>         (get_pool_mode): Return a fixed_size_mode.
>         (output_constant_pool_2): Take a fixed_size_mode.
>
> Index: gcc/machmode.h
> ===================================================================
> --- gcc/machmode.h      2017-09-15 14:47:33.184331588 +0100
> +++ gcc/machmode.h      2017-10-23 11:42:52.014721093 +0100
> @@ -652,6 +652,39 @@ GET_MODE_2XWIDER_MODE (const T &m)
>  extern const unsigned char mode_complex[NUM_MACHINE_MODES];
>  #define GET_MODE_COMPLEX_MODE(MODE) ((machine_mode) mode_complex[MODE])
>
> +/* Represents a machine mode that must have a fixed size.  The main
> +   use of this class is to represent the modes of objects that always
> +   have static storage duration, such as constant pool entries.
> +   (No current target supports the concept of variable-size static data.)  */
> +class fixed_size_mode
> +{
> +public:
> +  typedef mode_traits<fixed_size_mode>::from_int from_int;
> +
> +  ALWAYS_INLINE fixed_size_mode () {}
> +  ALWAYS_INLINE fixed_size_mode (from_int m) : m_mode (machine_mode (m)) {}
> +  ALWAYS_INLINE fixed_size_mode (const scalar_mode &m) : m_mode (m) {}
> +  ALWAYS_INLINE fixed_size_mode (const scalar_int_mode &m) : m_mode (m) {}
> +  ALWAYS_INLINE fixed_size_mode (const scalar_float_mode &m) : m_mode (m) {}
> +  ALWAYS_INLINE fixed_size_mode (const scalar_mode_pod &m) : m_mode (m) {}
> +  ALWAYS_INLINE fixed_size_mode (const scalar_int_mode_pod &m) : m_mode (m) {}
> +  ALWAYS_INLINE fixed_size_mode (const complex_mode &m) : m_mode (m) {}
> +  ALWAYS_INLINE operator machine_mode () const { return m_mode; }
> +
> +  static bool includes_p (machine_mode);
> +
> +protected:
> +  machine_mode m_mode;
> +};
> +
> +/* Return true if MODE has a fixed size.  */
> +
> +inline bool
> +fixed_size_mode::includes_p (machine_mode)
> +{
> +  return true;
> +}
> +
>  extern opt_machine_mode mode_for_size (unsigned int, enum mode_class, int);
>
>  /* Return the machine mode to use for a MODE_INT of SIZE bits, if one
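
The intended idiom, sketched here for illustration (it mirrors the
dbxout.c change below; `x' stands for some rtx whose size must be
known at compile time):

    fixed_size_mode fmode;
    if (!is_a <fixed_size_mode> (GET_MODE (x), &fmode))
      return false;  /* Possibly variable-sized: not representable.  */
    unsigned int size = GET_MODE_SIZE (fmode);
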
> Index: gcc/rtl.h
> ===================================================================
> --- gcc/rtl.h   2017-10-23 11:42:47.297720974 +0100
> +++ gcc/rtl.h   2017-10-23 11:42:52.015721094 +0100
> @@ -3020,7 +3020,7 @@ extern rtx force_const_mem (machine_mode
>  struct function;
>  extern rtx get_pool_constant (const_rtx);
>  extern rtx get_pool_constant_mark (rtx, bool *);
> -extern machine_mode get_pool_mode (const_rtx);
> +extern fixed_size_mode get_pool_mode (const_rtx);
>  extern rtx simplify_subtraction (rtx);
>  extern void decide_function_section (tree);
>
> Index: gcc/gengtype.c
> ===================================================================
> --- gcc/gengtype.c      2017-05-23 19:29:56.919436344 +0100
> +++ gcc/gengtype.c      2017-10-23 11:42:52.014721093 +0100
> @@ -5197,6 +5197,7 @@ #define POS_HERE(Call) do { pos.file = t
>        POS_HERE (do_scalar_typedef ("JCF_u2", &pos));
>        POS_HERE (do_scalar_typedef ("void", &pos));
>        POS_HERE (do_scalar_typedef ("machine_mode", &pos));
> +      POS_HERE (do_scalar_typedef ("fixed_size_mode", &pos));
>        POS_HERE (do_typedef ("PTR",
>                             create_pointer (resolve_typedef ("void", &pos)),
>                             &pos));
> Index: gcc/target.def
> ===================================================================
> --- gcc/target.def      2017-10-23 11:41:23.134456913 +0100
> +++ gcc/target.def      2017-10-23 11:42:52.017721094 +0100
> @@ -5021,7 +5021,7 @@ DEFHOOK
>   "This target hook returns the mode to be used when accessing raw return\
>   registers in @code{__builtin_return}.  Define this macro if the value\
>   in @var{reg_raw_mode} is not correct.",
> - machine_mode, (int regno),
> + fixed_size_mode, (int regno),
>   default_get_reg_raw_mode)
>
>  /* Return a mode wide enough to copy any argument value that might be
> @@ -5031,7 +5031,7 @@ DEFHOOK
>   "This target hook returns the mode to be used when accessing raw argument\
>   registers in @code{__builtin_apply_args}.  Define this macro if the value\
>   in @var{reg_raw_mode} is not correct.",
> - machine_mode, (int regno),
> + fixed_size_mode, (int regno),
>   default_get_reg_raw_mode)
>
>  HOOK_VECTOR_END (calls)
> Index: gcc/doc/tm.texi
> ===================================================================
> --- gcc/doc/tm.texi     2017-10-23 11:41:22.175925023 +0100
> +++ gcc/doc/tm.texi     2017-10-23 11:42:52.012721093 +0100
> @@ -4536,11 +4536,11 @@ This macro has effect in @option{-fpcc-s
>  nothing when you use @option{-freg-struct-return} mode.
>  @end defmac
>
> -@deftypefn {Target Hook} machine_mode TARGET_GET_RAW_RESULT_MODE (int @var{regno})
> +@deftypefn {Target Hook} fixed_size_mode TARGET_GET_RAW_RESULT_MODE (int @var{regno})
>  This target hook returns the mode to be used when accessing raw return registers in @code{__builtin_return}.  Define this macro if the value in @var{reg_raw_mode} is not correct.
>  @end deftypefn
>
> -@deftypefn {Target Hook} machine_mode TARGET_GET_RAW_ARG_MODE (int @var{regno})
> +@deftypefn {Target Hook} fixed_size_mode TARGET_GET_RAW_ARG_MODE (int @var{regno})
>  This target hook returns the mode to be used when accessing raw argument registers in @code{__builtin_apply_args}.  Define this macro if the value in @var{reg_raw_mode} is not correct.
>  @end deftypefn
>
> Index: gcc/targhooks.h
> ===================================================================
> --- gcc/targhooks.h     2017-10-02 09:08:43.318933786 +0100
> +++ gcc/targhooks.h     2017-10-23 11:42:52.017721094 +0100
> @@ -233,7 +233,7 @@ extern int default_jump_align_max_skip (
>  extern section * default_function_section(tree decl, enum node_frequency freq,
>                                           bool startup, bool exit);
>  extern machine_mode default_dwarf_frame_reg_mode (int);
> -extern machine_mode default_get_reg_raw_mode (int);
> +extern fixed_size_mode default_get_reg_raw_mode (int);
>  extern bool default_keep_leaf_when_profiled ();
>
>  extern void *default_get_pch_validity (size_t *);
> Index: gcc/targhooks.c
> ===================================================================
> --- gcc/targhooks.c     2017-10-23 11:41:23.195392846 +0100
> +++ gcc/targhooks.c     2017-10-23 11:42:52.017721094 +0100
> @@ -1834,10 +1834,12 @@ default_dwarf_frame_reg_mode (int regno)
>  /* To be used by targets where reg_raw_mode doesn't return the right
>     mode for registers used in apply_builtin_return and apply_builtin_arg.  */
>
> -machine_mode
> +fixed_size_mode
>  default_get_reg_raw_mode (int regno)
>  {
> -  return reg_raw_mode[regno];
> +  /* Targets must override this hook if the underlying register is
> +     variable-sized.  */
> +  return as_a <fixed_size_mode> (reg_raw_mode[regno]);
>  }
>
>  /* Return true if a leaf function should stay leaf even with profiling
> Index: gcc/config/ia64/ia64.c
> ===================================================================
> --- gcc/config/ia64/ia64.c      2017-10-23 11:41:32.363050263 +0100
> +++ gcc/config/ia64/ia64.c      2017-10-23 11:42:52.009721093 +0100
> @@ -329,7 +329,7 @@ static tree ia64_fold_builtin (tree, int
>  static tree ia64_builtin_decl (unsigned, bool);
>
>  static reg_class_t ia64_preferred_reload_class (rtx, reg_class_t);
> -static machine_mode ia64_get_reg_raw_mode (int regno);
> +static fixed_size_mode ia64_get_reg_raw_mode (int regno);
>  static section * ia64_hpux_function_section (tree, enum node_frequency,
>                                              bool, bool);
>
> @@ -11328,7 +11328,7 @@ ia64_dconst_0_375 (void)
>    return ia64_dconst_0_375_rtx;
>  }
>
> -static machine_mode
> +static fixed_size_mode
>  ia64_get_reg_raw_mode (int regno)
>  {
>    if (FR_REGNO_P (regno))
> Index: gcc/config/mips/mips.c
> ===================================================================
> --- gcc/config/mips/mips.c      2017-10-23 11:41:32.365050264 +0100
> +++ gcc/config/mips/mips.c      2017-10-23 11:42:52.010721093 +0100
> @@ -1132,7 +1132,6 @@ static rtx mips_find_pic_call_symbol (rt
>  static int mips_register_move_cost (machine_mode, reg_class_t,
>                                     reg_class_t);
>  static unsigned int mips_function_arg_boundary (machine_mode, const_tree);
> -static machine_mode mips_get_reg_raw_mode (int regno);
>  static rtx mips_gen_const_int_vector_shuffle (machine_mode, int);
>
>  /* This hash table keeps track of implicit "mips16" and "nomips16" attributes
> @@ -6111,7 +6110,7 @@ mips_function_arg_boundary (machine_mode
>
>  /* Implement TARGET_GET_RAW_RESULT_MODE and TARGET_GET_RAW_ARG_MODE.  */
>
> -static machine_mode
> +static fixed_size_mode
>  mips_get_reg_raw_mode (int regno)
>  {
>    if (TARGET_FLOATXX && FP_REG_P (regno))
> Index: gcc/config/msp430/msp430.c
> ===================================================================
> --- gcc/config/msp430/msp430.c  2017-10-23 11:41:23.047405581 +0100
> +++ gcc/config/msp430/msp430.c  2017-10-23 11:42:52.011721093 +0100
> @@ -1398,16 +1398,17 @@ msp430_return_in_memory (const_tree ret_
>  #undef  TARGET_GET_RAW_ARG_MODE
>  #define TARGET_GET_RAW_ARG_MODE msp430_get_raw_arg_mode
>
> -static machine_mode
> +static fixed_size_mode
>  msp430_get_raw_arg_mode (int regno)
>  {
> -  return (regno == ARG_POINTER_REGNUM) ? VOIDmode : Pmode;
> +  return as_a <fixed_size_mode> (regno == ARG_POINTER_REGNUM
> +                                ? VOIDmode : Pmode);
>  }
>
>  #undef  TARGET_GET_RAW_RESULT_MODE
>  #define TARGET_GET_RAW_RESULT_MODE msp430_get_raw_result_mode
>
> -static machine_mode
> +static fixed_size_mode
>  msp430_get_raw_result_mode (int regno ATTRIBUTE_UNUSED)
>  {
>    return Pmode;
> Index: gcc/config/avr/avr-protos.h
> ===================================================================
> --- gcc/config/avr/avr-protos.h 2017-10-23 11:41:22.812366984 +0100
> +++ gcc/config/avr/avr-protos.h 2017-10-23 11:42:52.007721093 +0100
> @@ -132,7 +132,7 @@ extern bool avr_casei_sequence_check_ope
>  static inline unsigned
>  regmask (machine_mode mode, unsigned regno)
>  {
> -  return ((1u << GET_MODE_SIZE (mode)) - 1) << regno;
> +  return ((1u << GET_MODE_SIZE (as_a <fixed_size_mode> (mode))) - 1) << regno;
>  }
>
>  extern void avr_fix_inputs (rtx*, unsigned, unsigned);
> Index: gcc/dbxout.c
> ===================================================================
> --- gcc/dbxout.c        2017-10-10 17:55:22.088175460 +0100
> +++ gcc/dbxout.c        2017-10-23 11:42:52.011721093 +0100
> @@ -3393,12 +3393,16 @@ dbxout_parms (tree parms)
>  {
>    ++debug_nesting;
>    emit_pending_bincls_if_required ();
> +  fixed_size_mode rtl_mode, type_mode;
>
>    for (; parms; parms = DECL_CHAIN (parms))
>      if (DECL_NAME (parms)
>         && TREE_TYPE (parms) != error_mark_node
>         && DECL_RTL_SET_P (parms)
> -       && DECL_INCOMING_RTL (parms))
> +       && DECL_INCOMING_RTL (parms)
> +       /* We can't represent variable-sized types in this format.  */
> +       && is_a <fixed_size_mode> (TYPE_MODE (TREE_TYPE (parms)), &type_mode)
> +       && is_a <fixed_size_mode> (GET_MODE (DECL_RTL (parms)), &rtl_mode))
>        {
>         tree eff_type;
>         char letter;
> @@ -3555,10 +3559,9 @@ dbxout_parms (tree parms)
>             /* Make a big endian correction if the mode of the type of the
>                parameter is not the same as the mode of the rtl.  */
>             if (BYTES_BIG_ENDIAN
> -               && TYPE_MODE (TREE_TYPE (parms)) != GET_MODE (DECL_RTL (parms))
> -               && GET_MODE_SIZE (TYPE_MODE (TREE_TYPE (parms))) < UNITS_PER_WORD)
> -             number += (GET_MODE_SIZE (GET_MODE (DECL_RTL (parms)))
> -                        - GET_MODE_SIZE (TYPE_MODE (TREE_TYPE (parms))));
> +               && type_mode != rtl_mode
> +               && GET_MODE_SIZE (type_mode) < UNITS_PER_WORD)
> +             number += GET_MODE_SIZE (rtl_mode) - GET_MODE_SIZE (type_mode);
>           }
>         else
>           /* ??? We don't know how to represent this argument.  */
> Index: gcc/expr.c
> ===================================================================
> --- gcc/expr.c  2017-10-23 11:42:34.915720660 +0100
> +++ gcc/expr.c  2017-10-23 11:42:52.013721093 +0100
> @@ -2628,9 +2628,10 @@ copy_blkmode_from_reg (rtx target, rtx s
>    rtx src = NULL, dst = NULL;
>    unsigned HOST_WIDE_INT bitsize = MIN (TYPE_ALIGN (type), BITS_PER_WORD);
>    unsigned HOST_WIDE_INT bitpos, xbitpos, padding_correction = 0;
> -  machine_mode mode = GET_MODE (srcreg);
> -  machine_mode tmode = GET_MODE (target);
> -  machine_mode copy_mode;
> +  /* No current ABI uses variable-sized modes to pass a BLKmode type.  */
> +  fixed_size_mode mode = as_a <fixed_size_mode> (GET_MODE (srcreg));
> +  fixed_size_mode tmode = as_a <fixed_size_mode> (GET_MODE (target));
> +  fixed_size_mode copy_mode;
>
>    /* BLKmode registers created in the back-end shouldn't have survived.  */
>    gcc_assert (mode != BLKmode);
> @@ -2728,19 +2729,21 @@ copy_blkmode_from_reg (rtx target, rtx s
>      }
>  }
>
> -/* Copy BLKmode value SRC into a register of mode MODE.  Return the
> +/* Copy BLKmode value SRC into a register of mode MODE_IN.  Return the
>     register if it contains any data, otherwise return null.
>
>     This is used on targets that return BLKmode values in registers.  */
>
>  rtx
> -copy_blkmode_to_reg (machine_mode mode, tree src)
> +copy_blkmode_to_reg (machine_mode mode_in, tree src)
>  {
>    int i, n_regs;
>    unsigned HOST_WIDE_INT bitpos, xbitpos, padding_correction = 0, bytes;
>    unsigned int bitsize;
>    rtx *dst_words, dst, x, src_word = NULL_RTX, dst_word = NULL_RTX;
> -  machine_mode dst_mode;
> +  /* No current ABI uses variable-sized modes to pass a BLKmode type.  */
> +  fixed_size_mode mode = as_a <fixed_size_mode> (mode_in);
> +  fixed_size_mode dst_mode;
>
>    gcc_assert (TYPE_MODE (TREE_TYPE (src)) == BLKmode);
>
> Index: gcc/gimple-ssa-store-merging.c
> ===================================================================
> --- gcc/gimple-ssa-store-merging.c      2017-10-09 11:50:52.446411111 +0100
> +++ gcc/gimple-ssa-store-merging.c      2017-10-23 11:42:52.014721093 +0100
> @@ -401,8 +401,11 @@ encode_tree_to_bitpos (tree expr, unsign
>      The awkwardness comes from the fact that bitpos is counted from the
>      most significant bit of a byte.  */
>
> +  /* We must be dealing with fixed-size data at this point, since the
> +     total size is also fixed.  */
> +  fixed_size_mode mode = as_a <fixed_size_mode> (TYPE_MODE (TREE_TYPE (expr)));
>    /* Allocate an extra byte so that we have space to shift into.  */
> -  unsigned int byte_size = GET_MODE_SIZE (TYPE_MODE (TREE_TYPE (expr))) + 1;
> +  unsigned int byte_size = GET_MODE_SIZE (mode) + 1;
>    unsigned char *tmpbuf = XALLOCAVEC (unsigned char, byte_size);
>    memset (tmpbuf, '\0', byte_size);
>    /* The store detection code should only have allowed constants that are
> Index: gcc/omp-low.c
> ===================================================================
> --- gcc/omp-low.c       2017-10-10 17:55:22.100175459 +0100
> +++ gcc/omp-low.c       2017-10-23 11:42:52.015721094 +0100
> @@ -5067,8 +5067,10 @@ lower_oacc_reductions (location_t loc, t
>           v1 = v2 = v3 = var;
>
>         /* Determine position in reduction buffer, which may be used
> -          by target.  */
> -       machine_mode mode = TYPE_MODE (TREE_TYPE (var));
> +          by target.  The parser has ensured that this is not a
> +          variable-sized type.  */
> +       fixed_size_mode mode
> +         = as_a <fixed_size_mode> (TYPE_MODE (TREE_TYPE (var)));
>         unsigned align = GET_MODE_ALIGNMENT (mode) /  BITS_PER_UNIT;
>         offset = (offset + align - 1) & ~(align - 1);
>         tree off = build_int_cst (sizetype, offset);
> Index: gcc/simplify-rtx.c
> ===================================================================
> --- gcc/simplify-rtx.c  2017-10-23 11:41:41.550050496 +0100
> +++ gcc/simplify-rtx.c  2017-10-23 11:42:52.016721094 +0100
> @@ -48,8 +48,6 @@ #define HWI_SIGN_EXTEND(low) \
>  static rtx neg_const_int (machine_mode, const_rtx);
>  static bool plus_minus_operand_p (const_rtx);
>  static rtx simplify_plus_minus (enum rtx_code, machine_mode, rtx, rtx);
> -static rtx simplify_immed_subreg (machine_mode, rtx, machine_mode,
> -                                 unsigned int);
>  static rtx simplify_associative_operation (enum rtx_code, machine_mode,
>                                            rtx, rtx);
>  static rtx simplify_relational_operation_1 (enum rtx_code, machine_mode,
> @@ -5802,8 +5800,8 @@ simplify_ternary_operation (enum rtx_cod
>     and then repacking them again for OUTERMODE.  */
>
>  static rtx
> -simplify_immed_subreg (machine_mode outermode, rtx op,
> -                      machine_mode innermode, unsigned int byte)
> +simplify_immed_subreg (fixed_size_mode outermode, rtx op,
> +                      fixed_size_mode innermode, unsigned int byte)
>  {
>    enum {
>      value_bit = 8,
> @@ -6171,7 +6169,18 @@ simplify_subreg (machine_mode outermode,
>        || CONST_DOUBLE_AS_FLOAT_P (op)
>        || GET_CODE (op) == CONST_FIXED
>        || GET_CODE (op) == CONST_VECTOR)
> -    return simplify_immed_subreg (outermode, op, innermode, byte);
> +    {
> +      /* simplify_immed_subreg deconstructs OP into bytes and constructs
> +        the result from bytes, so it only works if the sizes of the modes
> +        are known at compile time.  Cases that apply to general modes
> +        should be handled here before calling simplify_immed_subreg.  */
> +      fixed_size_mode fs_outermode, fs_innermode;
> +      if (is_a <fixed_size_mode> (outermode, &fs_outermode)
> +         && is_a <fixed_size_mode> (innermode, &fs_innermode))
> +       return simplify_immed_subreg (fs_outermode, op, fs_innermode, byte);
> +
> +      return NULL_RTX;
> +    }
>
>    /* Changing mode twice with SUBREG => just change it once,
>       or not at all if changing back op starting mode.  */
> Index: gcc/varasm.c
> ===================================================================
> --- gcc/varasm.c        2017-10-23 11:42:34.927720660 +0100
> +++ gcc/varasm.c        2017-10-23 11:42:52.018721094 +0100
> @@ -3584,7 +3584,7 @@ struct GTY((chain_next ("%h.next"), for_
>    rtx constant;
>    HOST_WIDE_INT offset;
>    hashval_t hash;
> -  machine_mode mode;
> +  fixed_size_mode mode;
>    unsigned int align;
>    int labelno;
>    int mark;
> @@ -3760,10 +3760,11 @@ simplify_subtraction (rtx x)
>  }
>
>  /* Given a constant rtx X, make (or find) a memory constant for its value
> -   and return a MEM rtx to refer to it in memory.  */
> +   and return a MEM rtx to refer to it in memory.  IN_MODE is the mode
> +   of X.  */
>
>  rtx
> -force_const_mem (machine_mode mode, rtx x)
> +force_const_mem (machine_mode in_mode, rtx x)
>  {
>    struct constant_descriptor_rtx *desc, tmp;
>    struct rtx_constant_pool *pool;
> @@ -3772,6 +3773,11 @@ force_const_mem (machine_mode mode, rtx
>    hashval_t hash;
>    unsigned int align;
>    constant_descriptor_rtx **slot;
> +  fixed_size_mode mode;
> +
> +  /* We can't force variable-sized objects to memory.  */
> +  if (!is_a <fixed_size_mode> (in_mode, &mode))
> +    return NULL_RTX;
>
>    /* If we're not allowed to drop X into the constant pool, don't.  */
>    if (targetm.cannot_force_const_mem (mode, x))
> @@ -3881,7 +3887,7 @@ get_pool_constant_mark (rtx addr, bool *
>
>  /* Similar, return the mode.  */
>
> -machine_mode
> +fixed_size_mode
>  get_pool_mode (const_rtx addr)
>  {
>    return SYMBOL_REF_CONSTANT (addr)->mode;
> @@ -3901,7 +3907,7 @@ constant_pool_empty_p (void)
>     in MODE with known alignment ALIGN.  */
>
>  static void
> -output_constant_pool_2 (machine_mode mode, rtx x, unsigned int align)
> +output_constant_pool_2 (fixed_size_mode mode, rtx x, unsigned int align)
>  {
>    switch (GET_MODE_CLASS (mode))
>      {

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [12/nn] Add an is_narrower_int_mode helper function
  2017-10-23 11:25 ` [12/nn] Add an is_narrower_int_mode helper function Richard Sandiford
@ 2017-10-26 11:59   ` Richard Biener
  0 siblings, 0 replies; 90+ messages in thread
From: Richard Biener @ 2017-10-26 11:59 UTC (permalink / raw)
  To: GCC Patches, Richard Sandiford

On Mon, Oct 23, 2017 at 1:24 PM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> This patch adds a function for testing whether an arbitrary mode X
> is an integer mode that is narrower than integer mode Y.  This is
> useful for code like expand_float and expand_fix that could in
> principle handle vectors as well as scalars.

Ok.

Richard.
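
As a standalone illustration of the semantics (a sketch with
simplified stand-ins, not GCC's real mode classes):

  #include <cassert>

  /* Simplified stand-ins for GCC's mode representation.  */
  enum mode_class { MODE_INT, MODE_FLOAT, MODE_VECTOR_INT };
  struct mode { mode_class cls; int precision; };

  /* Sketch of the helper: only a scalar integer mode can qualify;
     anything else answers "no" instead of being compared.  */
  static bool
  is_narrower_int_mode (mode m, mode limit)
  {
    return m.cls == MODE_INT && m.precision < limit.precision;
  }

  int
  main ()
  {
    mode qi = { MODE_INT, 8 }, si = { MODE_INT, 32 };
    mode v4si = { MODE_VECTOR_INT, 128 };
    assert (is_narrower_int_mode (qi, si));
    /* A vector is never "a narrower int mode", however wide.  */
    assert (!is_narrower_int_mode (v4si, si));
    return 0;
  }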

>
> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
>             Alan Hayward  <alan.hayward@arm.com>
>             David Sherwood  <david.sherwood@arm.com>
>
> gcc/
>         * machmode.h (is_narrower_int_mode): New function.
>         * optabs.c (expand_float, expand_fix): Use it.
>         * dwarf2out.c (rotate_loc_descriptor): Likewise.
>
> Index: gcc/machmode.h
> ===================================================================
> --- gcc/machmode.h      2017-10-23 11:44:06.561720156 +0100
> +++ gcc/machmode.h      2017-10-23 11:44:23.979432614 +0100
> @@ -893,6 +893,17 @@ is_complex_float_mode (machine_mode mode
>    return false;
>  }
>
> +/* Return true if MODE is a scalar integer mode with a precision
> +   smaller than LIMIT's precision.  */
> +
> +inline bool
> +is_narrower_int_mode (machine_mode mode, scalar_int_mode limit)
> +{
> +  scalar_int_mode int_mode;
> +  return (is_a <scalar_int_mode> (mode, &int_mode)
> +         && GET_MODE_PRECISION (int_mode) < GET_MODE_PRECISION (limit));
> +}
> +
>  namespace mode_iterator
>  {
>    /* Start mode iterator *ITER at the first mode in class MCLASS, if any.  */
> Index: gcc/optabs.c
> ===================================================================
> --- gcc/optabs.c        2017-10-23 11:44:07.732431531 +0100
> +++ gcc/optabs.c        2017-10-23 11:44:23.980398548 +0100
> @@ -4820,7 +4820,7 @@ expand_float (rtx to, rtx from, int unsi
>        rtx value;
>        convert_optab tab = unsignedp ? ufloat_optab : sfloat_optab;
>
> -      if (GET_MODE_PRECISION (GET_MODE (from)) < GET_MODE_PRECISION (SImode))
> +      if (is_narrower_int_mode (GET_MODE (from), SImode))
>         from = convert_to_mode (SImode, from, unsignedp);
>
>        libfunc = convert_optab_libfunc (tab, GET_MODE (to), GET_MODE (from));
> @@ -5002,7 +5002,7 @@ expand_fix (rtx to, rtx from, int unsign
>       that the mode of TO is at least as wide as SImode, since those are the
>       only library calls we know about.  */
>
> -  if (GET_MODE_PRECISION (GET_MODE (to)) < GET_MODE_PRECISION (SImode))
> +  if (is_narrower_int_mode (GET_MODE (to), SImode))
>      {
>        target = gen_reg_rtx (SImode);
>
> Index: gcc/dwarf2out.c
> ===================================================================
> --- gcc/dwarf2out.c     2017-10-23 11:44:05.684652559 +0100
> +++ gcc/dwarf2out.c     2017-10-23 11:44:23.979432614 +0100
> @@ -14530,8 +14530,7 @@ rotate_loc_descriptor (rtx rtl, scalar_i
>    dw_loc_descr_ref op0, op1, ret, mask[2] = { NULL, NULL };
>    int i;
>
> -  if (GET_MODE (rtlop1) != VOIDmode
> -      && GET_MODE_BITSIZE (GET_MODE (rtlop1)) < GET_MODE_BITSIZE (mode))
> +  if (is_narrower_int_mode (GET_MODE (rtlop1), mode))
>      rtlop1 = gen_rtx_ZERO_EXTEND (mode, rtlop1);
>    op0 = mem_loc_descriptor (XEXP (rtl, 0), mode, mem_mode,
>                             VAR_INIT_STATUS_INITIALIZED);

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [09/nn] Add a fixed_size_mode_pod class
  2017-10-23 11:23 ` [09/nn] Add a fixed_size_mode_pod class Richard Sandiford
@ 2017-10-26 11:59   ` Richard Biener
  2017-10-26 12:18     ` Richard Sandiford
  0 siblings, 1 reply; 90+ messages in thread
From: Richard Biener @ 2017-10-26 11:59 UTC (permalink / raw)
  To: GCC Patches, Richard Sandiford

On Mon, Oct 23, 2017 at 1:22 PM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> This patch adds a POD version of fixed_size_mode.  The only current use
> is for storing the __builtin_apply and __builtin_return register modes,
> which were made fixed_size_modes by the previous patch.

Bah - can we update our host compiler to C++11/14 please ...?
(maybe requiring that the build works with GCC 4.8 as the host
compiler; GCC 4.3 has -std=c++0x, but I'm quite sure that's not enough).

Ok.

Thanks,
Richard.
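
As a standalone sketch of why the POD variant is needed (simplified;
GCC's real pod_mode differs in detail): a wrapper class with a
user-declared constructor can't sit in the plain, zero-initialized
target structures, so the POD twin stores the raw enum and converts
to and from the wrapper:

  /* Simplified stand-ins, not GCC's actual classes.  */
  enum machine_mode_enum { E_VOIDmode, E_SImode, E_DImode };

  struct fixed_size_mode_sketch
  {
    /* The user-declared constructor makes this a non-POD type.  */
    explicit fixed_size_mode_sketch (machine_mode_enum m) : m_mode (m) {}
    machine_mode_enum m_mode;
  };

  struct fixed_size_mode_pod_sketch
  {
    /* No constructors, so arrays of this can live inside plain
       zero-initialized structures such as target_builtins.  */
    machine_mode_enum m_mode;
    operator fixed_size_mode_sketch () const
    { return fixed_size_mode_sketch (m_mode); }
    fixed_size_mode_pod_sketch &operator= (fixed_size_mode_sketch m)
    { m_mode = m.m_mode; return *this; }
  };

  struct target_builtins_sketch
  {
    fixed_size_mode_pod_sketch x_apply_args_mode[4];
  };

  /* Static zero-initialization is fine for a POD aggregate.  */
  static target_builtins_sketch default_target_sketch;

  int
  main ()
  {
    default_target_sketch.x_apply_args_mode[0]
      = fixed_size_mode_sketch (E_SImode);
    return 0;
  }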

>
> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
>             Alan Hayward  <alan.hayward@arm.com>
>             David Sherwood  <david.sherwood@arm.com>
>
> gcc/
>         * coretypes.h (fixed_size_mode): Declare.
>         (fixed_size_mode_pod): New typedef.
>         * builtins.h (target_builtins::x_apply_args_mode)
>         (target_builtins::x_apply_result_mode): Change type to
>         fixed_size_mode_pod.
>         * builtins.c (apply_args_size, apply_result_size, result_vector)
>         (expand_builtin_apply_args_1, expand_builtin_apply)
>         (expand_builtin_return): Update accordingly.
>
> Index: gcc/coretypes.h
> ===================================================================
> --- gcc/coretypes.h     2017-09-11 17:10:58.656085547 +0100
> +++ gcc/coretypes.h     2017-10-23 11:42:57.592545063 +0100
> @@ -59,6 +59,7 @@ typedef const struct rtx_def *const_rtx;
>  class scalar_int_mode;
>  class scalar_float_mode;
>  class complex_mode;
> +class fixed_size_mode;
>  template<typename> class opt_mode;
>  typedef opt_mode<scalar_mode> opt_scalar_mode;
>  typedef opt_mode<scalar_int_mode> opt_scalar_int_mode;
> @@ -66,6 +67,7 @@ typedef opt_mode<scalar_float_mode> opt_
>  template<typename> class pod_mode;
>  typedef pod_mode<scalar_mode> scalar_mode_pod;
>  typedef pod_mode<scalar_int_mode> scalar_int_mode_pod;
> +typedef pod_mode<fixed_size_mode> fixed_size_mode_pod;
>
>  /* Subclasses of rtx_def, using indentation to show the class
>     hierarchy, along with the relevant invariant.
> Index: gcc/builtins.h
> ===================================================================
> --- gcc/builtins.h      2017-08-30 12:18:46.602740973 +0100
> +++ gcc/builtins.h      2017-10-23 11:42:57.592545063 +0100
> @@ -29,14 +29,14 @@ struct target_builtins {
>       the register is not used for calling a function.  If the machine
>       has register windows, this gives only the outbound registers.
>       INCOMING_REGNO gives the corresponding inbound register.  */
> -  machine_mode x_apply_args_mode[FIRST_PSEUDO_REGISTER];
> +  fixed_size_mode_pod x_apply_args_mode[FIRST_PSEUDO_REGISTER];
>
>    /* For each register that may be used for returning values, this gives
>       a mode used to copy the register's value.  VOIDmode indicates the
>       register is not used for returning values.  If the machine has
>       register windows, this gives only the outbound registers.
>       INCOMING_REGNO gives the corresponding inbound register.  */
> -  machine_mode x_apply_result_mode[FIRST_PSEUDO_REGISTER];
> +  fixed_size_mode_pod x_apply_result_mode[FIRST_PSEUDO_REGISTER];
>  };
>
>  extern struct target_builtins default_target_builtins;
> Index: gcc/builtins.c
> ===================================================================
> --- gcc/builtins.c      2017-10-23 11:41:23.140260335 +0100
> +++ gcc/builtins.c      2017-10-23 11:42:57.592545063 +0100
> @@ -1358,7 +1358,6 @@ apply_args_size (void)
>    static int size = -1;
>    int align;
>    unsigned int regno;
> -  machine_mode mode;
>
>    /* The values computed by this function never change.  */
>    if (size < 0)
> @@ -1374,7 +1373,7 @@ apply_args_size (void)
>        for (regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
>         if (FUNCTION_ARG_REGNO_P (regno))
>           {
> -           mode = targetm.calls.get_raw_arg_mode (regno);
> +           fixed_size_mode mode = targetm.calls.get_raw_arg_mode (regno);
>
>             gcc_assert (mode != VOIDmode);
>
> @@ -1386,7 +1385,7 @@ apply_args_size (void)
>           }
>         else
>           {
> -           apply_args_mode[regno] = VOIDmode;
> +           apply_args_mode[regno] = as_a <fixed_size_mode> (VOIDmode);
>           }
>      }
>    return size;
> @@ -1400,7 +1399,6 @@ apply_result_size (void)
>  {
>    static int size = -1;
>    int align, regno;
> -  machine_mode mode;
>
>    /* The values computed by this function never change.  */
>    if (size < 0)
> @@ -1410,7 +1408,7 @@ apply_result_size (void)
>        for (regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
>         if (targetm.calls.function_value_regno_p (regno))
>           {
> -           mode = targetm.calls.get_raw_result_mode (regno);
> +           fixed_size_mode mode = targetm.calls.get_raw_result_mode (regno);
>
>             gcc_assert (mode != VOIDmode);
>
> @@ -1421,7 +1419,7 @@ apply_result_size (void)
>             apply_result_mode[regno] = mode;
>           }
>         else
> -         apply_result_mode[regno] = VOIDmode;
> +         apply_result_mode[regno] = as_a <fixed_size_mode> (VOIDmode);
>
>        /* Allow targets that use untyped_call and untyped_return to override
>          the size so that machine-specific information can be stored here.  */
> @@ -1440,7 +1438,7 @@ apply_result_size (void)
>  result_vector (int savep, rtx result)
>  {
>    int regno, size, align, nelts;
> -  machine_mode mode;
> +  fixed_size_mode mode;
>    rtx reg, mem;
>    rtx *savevec = XALLOCAVEC (rtx, FIRST_PSEUDO_REGISTER);
>
> @@ -1469,7 +1467,7 @@ expand_builtin_apply_args_1 (void)
>  {
>    rtx registers, tem;
>    int size, align, regno;
> -  machine_mode mode;
> +  fixed_size_mode mode;
>    rtx struct_incoming_value = targetm.calls.struct_value_rtx (cfun ? TREE_TYPE (cfun->decl) : 0, 1);
>
>    /* Create a block where the arg-pointer, structure value address,
> @@ -1573,7 +1571,7 @@ expand_builtin_apply_args (void)
>  expand_builtin_apply (rtx function, rtx arguments, rtx argsize)
>  {
>    int size, align, regno;
> -  machine_mode mode;
> +  fixed_size_mode mode;
>    rtx incoming_args, result, reg, dest, src;
>    rtx_call_insn *call_insn;
>    rtx old_stack_level = 0;
> @@ -1734,7 +1732,7 @@ expand_builtin_apply (rtx function, rtx
>  expand_builtin_return (rtx result)
>  {
>    int size, align, regno;
> -  machine_mode mode;
> +  fixed_size_mode mode;
>    rtx reg;
>    rtx_insn *call_fusage = 0;
>

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [13/nn] More is_a <scalar_int_mode>
  2017-10-23 11:25 ` [13/nn] More is_a <scalar_int_mode> Richard Sandiford
@ 2017-10-26 12:03   ` Richard Biener
  0 siblings, 0 replies; 90+ messages in thread
From: Richard Biener @ 2017-10-26 12:03 UTC (permalink / raw)
  To: GCC Patches, Richard Sandiford

On Mon, Oct 23, 2017 at 1:25 PM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> alias.c:find_base_term and find_base_value checked:
>
>       if (GET_MODE_SIZE (GET_MODE (src)) < GET_MODE_SIZE (Pmode))
>
> but (a) comparing the precision seems more correct, since it's possible
> for modes to have the same memory size as Pmode but fewer bits and
> (b) the functions are called on arbitrary rtl, so there's no guarantee
> that we're handling an integer truncation.
>
> Since there's no point processing truncations of anything other than an
> integer, this patch checks that first.

Ok.

Richard.
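
A standalone sketch of why precision is the right thing to compare
(simplified stand-ins; the 20-bit case is in the spirit of
partial-integer modes such as msp430's PSImode):

  #include <cassert>

  /* Simplified stand-in: a mode has a storage size and a precision.  */
  struct mode { int size_bytes; int precision; };

  int
  main ()
  {
    /* A partial-integer mode: 20 meaningful bits in 4 bytes.  */
    mode psi = { 4, 20 };
    mode si = { 4, 32 };

    /* The old size-based test claims no truncation happens here...  */
    assert (!(psi.size_bytes < si.size_bytes));
    /* ...while the precision-based test correctly sees that it does.  */
    assert (psi.precision < si.precision);
    return 0;
  }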

>
> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
>             Alan Hayward  <alan.hayward@arm.com>
>             David Sherwood  <david.sherwood@arm.com>
>
> gcc/
>         * alias.c (find_base_value, find_base_term): Only process integer
>         truncations.  Check the precision rather than the size.
>
> Index: gcc/alias.c
> ===================================================================
> --- gcc/alias.c 2017-10-23 11:41:25.511925516 +0100
> +++ gcc/alias.c 2017-10-23 11:44:27.544693078 +0100
> @@ -1349,6 +1349,7 @@ known_base_value_p (rtx x)
>  find_base_value (rtx src)
>  {
>    unsigned int regno;
> +  scalar_int_mode int_mode;
>
>  #if defined (FIND_BASE_TERM)
>    /* Try machine-dependent ways to find the base term.  */
> @@ -1475,7 +1476,8 @@ find_base_value (rtx src)
>          address modes depending on the address space.  */
>        if (!target_default_pointer_address_modes_p ())
>         break;
> -      if (GET_MODE_SIZE (GET_MODE (src)) < GET_MODE_SIZE (Pmode))
> +      if (!is_a <scalar_int_mode> (GET_MODE (src), &int_mode)
> +         || GET_MODE_PRECISION (int_mode) < GET_MODE_PRECISION (Pmode))
>         break;
>        /* Fall through.  */
>      case HIGH:
> @@ -1876,6 +1878,7 @@ find_base_term (rtx x)
>    cselib_val *val;
>    struct elt_loc_list *l, *f;
>    rtx ret;
> +  scalar_int_mode int_mode;
>
>  #if defined (FIND_BASE_TERM)
>    /* Try machine-dependent ways to find the base term.  */
> @@ -1893,7 +1896,8 @@ find_base_term (rtx x)
>          address modes depending on the address space.  */
>        if (!target_default_pointer_address_modes_p ())
>         return 0;
> -      if (GET_MODE_SIZE (GET_MODE (x)) < GET_MODE_SIZE (Pmode))
> +      if (!is_a <scalar_int_mode> (GET_MODE (x), &int_mode)
> +         || GET_MODE_PRECISION (int_mode) < GET_MODE_PRECISION (Pmode))
>         return 0;
>        /* Fall through.  */
>      case HIGH:

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [14/nn] Add helpers for shift count modes
  2017-10-23 11:26 ` [14/nn] Add helpers for shift count modes Richard Sandiford
@ 2017-10-26 12:07   ` Richard Biener
  2017-10-26 12:07     ` Richard Biener
  2017-10-30 15:03     ` Jeff Law
  0 siblings, 2 replies; 90+ messages in thread
From: Richard Biener @ 2017-10-26 12:07 UTC (permalink / raw)
  To: GCC Patches, Richard Sandiford

On Mon, Oct 23, 2017 at 1:25 PM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> This patch adds a stub helper routine to provide the mode
> of a scalar shift amount, given the mode of the values
> being shifted.
>
> One long-standing problem has been to decide what this mode
> should be for arbitrary rtxes (as opposed to those directly
> tied to a target pattern).  Is it the mode of the shifted
> elements?  Is it word_mode?  Or maybe QImode?  Is it whatever
> the corresponding target pattern says?  (In which case what
> should the mode be when the target doesn't have a pattern?)
>
> For now the patch picks word_mode, which should be safe on
> all targets but could perhaps become suboptimal if the helper
> routine is used more often than it is in this patch.  As it
> stands the patch does not change the generated code.
>
> The patch also adds a helper function that constructs rtxes
> for constant shift amounts, again given the mode of the value
> being shifted.  As well as helping with the SVE patches, this
> is one step towards allowing CONST_INTs to have a real mode.

I think get_shift_amount_mode is flawed: while encapsulating
constant shift amount RTX generation into gen_int_shift_amount
looks good to me, I'd rather have that ??? comment in this function
(and I'd use the mode of the RTX being shifted, not word_mode...).

In the end it's up to insn recognition to convert the op to the
expected mode, and for generic RTL it's us that should decide
on the mode -- on GENERIC the shift amount has to be an
integer, so why not simply use a mode that is large enough to
make the constant fit?

Just throwing in some comments here, RTL isn't my primary
expertise.

Richard.
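
For reference, a standalone sketch of the idiom under discussion
(simplified stand-ins, not GCC's rtx machinery): the point is that
every constant shift amount is built through one helper, so the
choice of shift-amount mode, whatever it ends up being, lives in a
single place:

  /* Simplified stand-ins.  */
  enum machine_mode_enum { E_QImode, E_SImode, E_DImode, E_word_mode };
  struct rtx_sketch { machine_mode_enum mode; long value; };

  /* The open question above: which mode should this return?  The
     patch picks word_mode as a universally safe default for now.  */
  static machine_mode_enum
  get_shift_amount_mode (machine_mode_enum)
  {
    return E_word_mode;
  }

  /* Every constant shift amount is keyed off the mode of the value
     being shifted, instead of coming from a bare GEN_INT.  */
  static rtx_sketch
  gen_int_shift_amount (machine_mode_enum shifted_mode, long value)
  {
    rtx_sketch amount = { get_shift_amount_mode (shifted_mode), value };
    return amount;
  }

  int
  main ()
  {
    /* Shift a DImode value by 3: the amount's mode comes from the
       helper, not from the shifted value's mode.  */
    rtx_sketch amount = gen_int_shift_amount (E_DImode, 3);
    return amount.mode == E_word_mode ? 0 : 1;
  }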

>
> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
>             Alan Hayward  <alan.hayward@arm.com>
>             David Sherwood  <david.sherwood@arm.com>
>
> gcc/
>         * target.h (get_shift_amount_mode): New function.
>         * emit-rtl.h (gen_int_shift_amount): Declare.
>         * emit-rtl.c (gen_int_shift_amount): New function.
>         * asan.c (asan_emit_stack_protection): Use gen_int_shift_amount
>         instead of GEN_INT.
>         * calls.c (shift_return_value): Likewise.
>         * cse.c (fold_rtx): Likewise.
>         * dse.c (find_shift_sequence): Likewise.
>         * expmed.c (init_expmed_one_mode, store_bit_field_1, expand_shift_1)
>         (expand_shift, expand_smod_pow2): Likewise.
>         * lower-subreg.c (shift_cost): Likewise.
>         * simplify-rtx.c (simplify_unary_operation_1): Likewise.
>         (simplify_binary_operation_1): Likewise.
>         * combine.c (try_combine, find_split_point, force_int_to_mode)
>         (simplify_shift_const_1, simplify_shift_const): Likewise.
>         (change_zero_ext): Likewise.  Use simplify_gen_binary.
>         * optabs.c (expand_superword_shift, expand_doubleword_mult)
>         (expand_unop): Use gen_int_shift_amount instead of GEN_INT.
>         (expand_binop): Likewise.  Use get_shift_amount_mode instead
>         of word_mode as the mode of a CONST_INT shift amount.
>         (shift_amt_for_vec_perm_mask): Add a machine_mode argument.
>         Use gen_int_shift_amount instead of GEN_INT.
>         (expand_vec_perm): Update caller accordingly.  Use
>         gen_int_shift_amount instead of GEN_INT.
>
> Index: gcc/target.h
> ===================================================================
> --- gcc/target.h        2017-10-23 11:47:06.643477568 +0100
> +++ gcc/target.h        2017-10-23 11:47:11.277288162 +0100
> @@ -209,6 +209,17 @@ #define HOOKSTRUCT(FRAGMENT) FRAGMENT
>
>  extern struct gcc_target targetm;
>
> +/* Return the mode that should be used to hold a scalar shift amount
> +   when shifting values of the given mode.  */
> +/* ??? This could in principle be generated automatically from the .md
> +   shift patterns, but for now word_mode should be universally OK.  */
> +
> +inline scalar_int_mode
> +get_shift_amount_mode (machine_mode)
> +{
> +  return word_mode;
> +}
> +
>  #ifdef GCC_TM_H
>
>  #ifndef CUMULATIVE_ARGS_MAGIC
> Index: gcc/emit-rtl.h
> ===================================================================
> --- gcc/emit-rtl.h      2017-10-23 11:47:06.643477568 +0100
> +++ gcc/emit-rtl.h      2017-10-23 11:47:11.274393237 +0100
> @@ -369,6 +369,7 @@ extern void set_reg_attrs_for_parm (rtx,
>  extern void set_reg_attrs_for_decl_rtl (tree t, rtx x);
>  extern void adjust_reg_mode (rtx, machine_mode);
>  extern int mem_expr_equal_p (const_tree, const_tree);
> +extern rtx gen_int_shift_amount (machine_mode, HOST_WIDE_INT);
>
>  extern bool need_atomic_barrier_p (enum memmodel, bool);
>
> Index: gcc/emit-rtl.c
> ===================================================================
> --- gcc/emit-rtl.c      2017-10-23 11:47:06.643477568 +0100
> +++ gcc/emit-rtl.c      2017-10-23 11:47:11.273428262 +0100
> @@ -6478,6 +6478,15 @@ need_atomic_barrier_p (enum memmodel mod
>      }
>  }
>
> +/* Return a constant shift amount for shifting a value of mode MODE
> +   by VALUE bits.  */
> +
> +rtx
> +gen_int_shift_amount (machine_mode mode, HOST_WIDE_INT value)
> +{
> +  return gen_int_mode (value, get_shift_amount_mode (mode));
> +}
> +
>  /* Initialize fields of rtl_data related to stack alignment.  */
>
>  void
> Index: gcc/asan.c
> ===================================================================
> --- gcc/asan.c  2017-10-23 11:47:06.643477568 +0100
> +++ gcc/asan.c  2017-10-23 11:47:11.270533336 +0100
> @@ -1388,7 +1388,7 @@ asan_emit_stack_protection (rtx base, rt
>    TREE_ASM_WRITTEN (id) = 1;
>    emit_move_insn (mem, expand_normal (build_fold_addr_expr (decl)));
>    shadow_base = expand_binop (Pmode, lshr_optab, base,
> -                             GEN_INT (ASAN_SHADOW_SHIFT),
> +                             gen_int_shift_amount (Pmode, ASAN_SHADOW_SHIFT),
>                               NULL_RTX, 1, OPTAB_DIRECT);
>    shadow_base
>      = plus_constant (Pmode, shadow_base,
> Index: gcc/calls.c
> ===================================================================
> --- gcc/calls.c 2017-10-23 11:47:06.643477568 +0100
> +++ gcc/calls.c 2017-10-23 11:47:11.270533336 +0100
> @@ -2749,15 +2749,17 @@ shift_return_value (machine_mode mode, b
>    HOST_WIDE_INT shift;
>
>    gcc_assert (REG_P (value) && HARD_REGISTER_P (value));
> -  shift = GET_MODE_BITSIZE (GET_MODE (value)) - GET_MODE_BITSIZE (mode);
> +  machine_mode value_mode = GET_MODE (value);
> +  shift = GET_MODE_BITSIZE (value_mode) - GET_MODE_BITSIZE (mode);
>    if (shift == 0)
>      return false;
>
>    /* Use ashr rather than lshr for right shifts.  This is for the benefit
>       of the MIPS port, which requires SImode values to be sign-extended
>       when stored in 64-bit registers.  */
> -  if (!force_expand_binop (GET_MODE (value), left_p ? ashl_optab : ashr_optab,
> -                          value, GEN_INT (shift), value, 1, OPTAB_WIDEN))
> +  if (!force_expand_binop (value_mode, left_p ? ashl_optab : ashr_optab,
> +                          value, gen_int_shift_amount (value_mode, shift),
> +                          value, 1, OPTAB_WIDEN))
>      gcc_unreachable ();
>    return true;
>  }
> Index: gcc/cse.c
> ===================================================================
> --- gcc/cse.c   2017-10-23 11:47:03.707058235 +0100
> +++ gcc/cse.c   2017-10-23 11:47:11.273428262 +0100
> @@ -3611,9 +3611,9 @@ fold_rtx (rtx x, rtx_insn *insn)
>                       || INTVAL (const_arg1) < 0))
>                 {
>                   if (SHIFT_COUNT_TRUNCATED)
> -                   canon_const_arg1 = GEN_INT (INTVAL (const_arg1)
> -                                               & (GET_MODE_UNIT_BITSIZE (mode)
> -                                                  - 1));
> +                   canon_const_arg1 = gen_int_shift_amount
> +                     (mode, (INTVAL (const_arg1)
> +                             & (GET_MODE_UNIT_BITSIZE (mode) - 1)));
>                   else
>                     break;
>                 }
> @@ -3660,9 +3660,9 @@ fold_rtx (rtx x, rtx_insn *insn)
>                       || INTVAL (inner_const) < 0))
>                 {
>                   if (SHIFT_COUNT_TRUNCATED)
> -                   inner_const = GEN_INT (INTVAL (inner_const)
> -                                          & (GET_MODE_UNIT_BITSIZE (mode)
> -                                             - 1));
> +                   inner_const = gen_int_shift_amount
> +                     (mode, (INTVAL (inner_const)
> +                             & (GET_MODE_UNIT_BITSIZE (mode) - 1)));
>                   else
>                     break;
>                 }
> @@ -3692,7 +3692,8 @@ fold_rtx (rtx x, rtx_insn *insn)
>                   /* As an exception, we can turn an ASHIFTRT of this
>                      form into a shift of the number of bits - 1.  */
>                   if (code == ASHIFTRT)
> -                   new_const = GEN_INT (GET_MODE_UNIT_BITSIZE (mode) - 1);
> +                   new_const = gen_int_shift_amount
> +                     (mode, GET_MODE_UNIT_BITSIZE (mode) - 1);
>                   else if (!side_effects_p (XEXP (y, 0)))
>                     return CONST0_RTX (mode);
>                   else
> Index: gcc/dse.c
> ===================================================================
> --- gcc/dse.c   2017-10-23 11:47:06.643477568 +0100
> +++ gcc/dse.c   2017-10-23 11:47:11.273428262 +0100
> @@ -1605,8 +1605,9 @@ find_shift_sequence (int access_size,
>                                      store_mode, byte);
>           if (ret && CONSTANT_P (ret))
>             {
> +             rtx shift_rtx = gen_int_shift_amount (new_mode, shift);
>               ret = simplify_const_binary_operation (LSHIFTRT, new_mode,
> -                                                    ret, GEN_INT (shift));
> +                                                    ret, shift_rtx);
>               if (ret && CONSTANT_P (ret))
>                 {
>                   byte = subreg_lowpart_offset (read_mode, new_mode);
> @@ -1642,7 +1643,8 @@ find_shift_sequence (int access_size,
>          of one dsp where the cost of these two was not the same.  But
>          this really is a rare case anyway.  */
>        target = expand_binop (new_mode, lshr_optab, new_reg,
> -                            GEN_INT (shift), new_reg, 1, OPTAB_DIRECT);
> +                            gen_int_shift_amount (new_mode, shift),
> +                            new_reg, 1, OPTAB_DIRECT);
>
>        shift_seq = get_insns ();
>        end_sequence ();
> Index: gcc/expmed.c
> ===================================================================
> --- gcc/expmed.c        2017-10-23 11:47:06.643477568 +0100
> +++ gcc/expmed.c        2017-10-23 11:47:11.274393237 +0100
> @@ -222,7 +222,8 @@ init_expmed_one_mode (struct init_expmed
>           PUT_MODE (all->zext, wider_mode);
>           PUT_MODE (all->wide_mult, wider_mode);
>           PUT_MODE (all->wide_lshr, wider_mode);
> -         XEXP (all->wide_lshr, 1) = GEN_INT (mode_bitsize);
> +         XEXP (all->wide_lshr, 1)
> +           = gen_int_shift_amount (wider_mode, mode_bitsize);
>
>           set_mul_widen_cost (speed, wider_mode,
>                               set_src_cost (all->wide_mult, wider_mode, speed));
> @@ -908,12 +909,14 @@ store_bit_field_1 (rtx str_rtx, unsigned
>              to make sure that for big-endian machines the higher order
>              bits are used.  */
>           if (new_bitsize < BITS_PER_WORD && BYTES_BIG_ENDIAN && !backwards)
> -           value_word = simplify_expand_binop (word_mode, lshr_optab,
> -                                               value_word,
> -                                               GEN_INT (BITS_PER_WORD
> -                                                        - new_bitsize),
> -                                               NULL_RTX, true,
> -                                               OPTAB_LIB_WIDEN);
> +           {
> +             int shift = BITS_PER_WORD - new_bitsize;
> +             rtx shift_rtx = gen_int_shift_amount (word_mode, shift);
> +             value_word = simplify_expand_binop (word_mode, lshr_optab,
> +                                                 value_word, shift_rtx,
> +                                                 NULL_RTX, true,
> +                                                 OPTAB_LIB_WIDEN);
> +           }
>
>           if (!store_bit_field_1 (op0, new_bitsize,
>                                   bitnum + bit_offset,
> @@ -2366,8 +2369,9 @@ expand_shift_1 (enum tree_code code, mac
>        if (CONST_INT_P (op1)
>           && ((unsigned HOST_WIDE_INT) INTVAL (op1) >=
>               (unsigned HOST_WIDE_INT) GET_MODE_BITSIZE (scalar_mode)))
> -       op1 = GEN_INT ((unsigned HOST_WIDE_INT) INTVAL (op1)
> -                      % GET_MODE_BITSIZE (scalar_mode));
> +       op1 = gen_int_shift_amount (mode,
> +                                   (unsigned HOST_WIDE_INT) INTVAL (op1)
> +                                   % GET_MODE_BITSIZE (scalar_mode));
>        else if (GET_CODE (op1) == SUBREG
>                && subreg_lowpart_p (op1)
>                && SCALAR_INT_MODE_P (GET_MODE (SUBREG_REG (op1)))
> @@ -2384,7 +2388,8 @@ expand_shift_1 (enum tree_code code, mac
>        && IN_RANGE (INTVAL (op1), GET_MODE_BITSIZE (scalar_mode) / 2 + left,
>                    GET_MODE_BITSIZE (scalar_mode) - 1))
>      {
> -      op1 = GEN_INT (GET_MODE_BITSIZE (scalar_mode) - INTVAL (op1));
> +      op1 = gen_int_shift_amount (mode, (GET_MODE_BITSIZE (scalar_mode)
> +                                        - INTVAL (op1)));
>        left = !left;
>        code = left ? LROTATE_EXPR : RROTATE_EXPR;
>      }
> @@ -2464,8 +2469,8 @@ expand_shift_1 (enum tree_code code, mac
>               if (op1 == const0_rtx)
>                 return shifted;
>               else if (CONST_INT_P (op1))
> -               other_amount = GEN_INT (GET_MODE_BITSIZE (scalar_mode)
> -                                       - INTVAL (op1));
> +               other_amount = gen_int_shift_amount
> +                 (mode, GET_MODE_BITSIZE (scalar_mode) - INTVAL (op1));
>               else
>                 {
>                   other_amount
> @@ -2538,8 +2543,9 @@ expand_shift_1 (enum tree_code code, mac
>  expand_shift (enum tree_code code, machine_mode mode, rtx shifted,
>               int amount, rtx target, int unsignedp)
>  {
> -  return expand_shift_1 (code, mode,
> -                        shifted, GEN_INT (amount), target, unsignedp);
> +  return expand_shift_1 (code, mode, shifted,
> +                        gen_int_shift_amount (mode, amount),
> +                        target, unsignedp);
>  }
>
>  /* Likewise, but return 0 if that cannot be done.  */
> @@ -3855,7 +3861,7 @@ expand_smod_pow2 (scalar_int_mode mode,
>         {
>           HOST_WIDE_INT masklow = (HOST_WIDE_INT_1 << logd) - 1;
>           signmask = force_reg (mode, signmask);
> -         shift = GEN_INT (GET_MODE_BITSIZE (mode) - logd);
> +         shift = gen_int_shift_amount (mode, GET_MODE_BITSIZE (mode) - logd);
>
>           /* Use the rtx_cost of a LSHIFTRT instruction to determine
>              which instruction sequence to use.  If logical right shifts
> Index: gcc/lower-subreg.c
> ===================================================================
> --- gcc/lower-subreg.c  2017-10-23 11:47:06.643477568 +0100
> +++ gcc/lower-subreg.c  2017-10-23 11:47:11.274393237 +0100
> @@ -129,7 +129,7 @@ shift_cost (bool speed_p, struct cost_rt
>    PUT_CODE (rtxes->shift, code);
>    PUT_MODE (rtxes->shift, mode);
>    PUT_MODE (rtxes->source, mode);
> -  XEXP (rtxes->shift, 1) = GEN_INT (op1);
> +  XEXP (rtxes->shift, 1) = gen_int_shift_amount (mode, op1);
>    return set_src_cost (rtxes->shift, mode, speed_p);
>  }
>
> Index: gcc/simplify-rtx.c
> ===================================================================
> --- gcc/simplify-rtx.c  2017-10-23 11:47:06.643477568 +0100
> +++ gcc/simplify-rtx.c  2017-10-23 11:47:11.277288162 +0100
> @@ -1165,7 +1165,8 @@ simplify_unary_operation_1 (enum rtx_cod
>           if (STORE_FLAG_VALUE == 1)
>             {
>               temp = simplify_gen_binary (ASHIFTRT, inner, XEXP (op, 0),
> -                                         GEN_INT (isize - 1));
> +                                         gen_int_shift_amount (inner,
> +                                                               isize - 1));
>               if (int_mode == inner)
>                 return temp;
>               if (GET_MODE_PRECISION (int_mode) > isize)
> @@ -1175,7 +1176,8 @@ simplify_unary_operation_1 (enum rtx_cod
>           else if (STORE_FLAG_VALUE == -1)
>             {
>               temp = simplify_gen_binary (LSHIFTRT, inner, XEXP (op, 0),
> -                                         GEN_INT (isize - 1));
> +                                         gen_int_shift_amount (inner,
> +                                                               isize - 1));
>               if (int_mode == inner)
>                 return temp;
>               if (GET_MODE_PRECISION (int_mode) > isize)
> @@ -2679,7 +2681,8 @@ simplify_binary_operation_1 (enum rtx_co
>         {
>           val = wi::exact_log2 (rtx_mode_t (trueop1, mode));
>           if (val >= 0)
> -           return simplify_gen_binary (ASHIFT, mode, op0, GEN_INT (val));
> +           return simplify_gen_binary (ASHIFT, mode, op0,
> +                                       gen_int_shift_amount (mode, val));
>         }
>
>        /* x*2 is x+x and x*(-1) is -x */
> @@ -3303,7 +3306,8 @@ simplify_binary_operation_1 (enum rtx_co
>        /* Convert divide by power of two into shift.  */
>        if (CONST_INT_P (trueop1)
>           && (val = exact_log2 (UINTVAL (trueop1))) > 0)
> -       return simplify_gen_binary (LSHIFTRT, mode, op0, GEN_INT (val));
> +       return simplify_gen_binary (LSHIFTRT, mode, op0,
> +                                   gen_int_shift_amount (mode, val));
>        break;
>
>      case DIV:
> @@ -3423,10 +3427,12 @@ simplify_binary_operation_1 (enum rtx_co
>           && IN_RANGE (INTVAL (trueop1),
>                        GET_MODE_UNIT_PRECISION (mode) / 2 + (code == ROTATE),
>                        GET_MODE_UNIT_PRECISION (mode) - 1))
> -       return simplify_gen_binary (code == ROTATE ? ROTATERT : ROTATE,
> -                                   mode, op0,
> -                                   GEN_INT (GET_MODE_UNIT_PRECISION (mode)
> -                                            - INTVAL (trueop1)));
> +       {
> +         int new_amount = GET_MODE_UNIT_PRECISION (mode) - INTVAL (trueop1);
> +         rtx new_amount_rtx = gen_int_shift_amount (mode, new_amount);
> +         return simplify_gen_binary (code == ROTATE ? ROTATERT : ROTATE,
> +                                     mode, op0, new_amount_rtx);
> +       }
>  #endif
>        /* FALLTHRU */
>      case ASHIFTRT:
> @@ -3466,8 +3472,8 @@ simplify_binary_operation_1 (enum rtx_co
>               == GET_MODE_BITSIZE (inner_mode) - GET_MODE_BITSIZE (int_mode))
>           && subreg_lowpart_p (op0))
>         {
> -         rtx tmp = GEN_INT (INTVAL (XEXP (SUBREG_REG (op0), 1))
> -                            + INTVAL (op1));
> +         rtx tmp = gen_int_shift_amount
> +           (inner_mode, INTVAL (XEXP (SUBREG_REG (op0), 1)) + INTVAL (op1));
>           tmp = simplify_gen_binary (code, inner_mode,
>                                      XEXP (SUBREG_REG (op0), 0),
>                                      tmp);
> @@ -3478,7 +3484,8 @@ simplify_binary_operation_1 (enum rtx_co
>         {
>           val = INTVAL (op1) & (GET_MODE_UNIT_PRECISION (mode) - 1);
>           if (val != INTVAL (op1))
> -           return simplify_gen_binary (code, mode, op0, GEN_INT (val));
> +           return simplify_gen_binary (code, mode, op0,
> +                                       gen_int_shift_amount (mode, val));
>         }
>        break;
>
> Index: gcc/combine.c
> ===================================================================
> --- gcc/combine.c       2017-10-23 11:47:06.643477568 +0100
> +++ gcc/combine.c       2017-10-23 11:47:11.272463287 +0100
> @@ -3773,8 +3773,9 @@ try_combine (rtx_insn *i3, rtx_insn *i2,
>               && INTVAL (XEXP (*split, 1)) > 0
>               && (i = exact_log2 (UINTVAL (XEXP (*split, 1)))) >= 0)
>             {
> +             rtx i_rtx = gen_int_shift_amount (split_mode, i);
>               SUBST (*split, gen_rtx_ASHIFT (split_mode,
> -                                            XEXP (*split, 0), GEN_INT (i)));
> +                                            XEXP (*split, 0), i_rtx));
>               /* Update split_code because we may not have a multiply
>                  anymore.  */
>               split_code = GET_CODE (*split);
> @@ -3788,8 +3789,10 @@ try_combine (rtx_insn *i3, rtx_insn *i2,
>               && (i = exact_log2 (UINTVAL (XEXP (XEXP (*split, 0), 1)))) >= 0)
>             {
>               rtx nsplit = XEXP (*split, 0);
> +             rtx i_rtx = gen_int_shift_amount (GET_MODE (nsplit), i);
>               SUBST (XEXP (*split, 0), gen_rtx_ASHIFT (GET_MODE (nsplit),
> -                                            XEXP (nsplit, 0), GEN_INT (i)));
> +                                                      XEXP (nsplit, 0),
> +                                                      i_rtx));
>               /* Update split_code because we may not have a multiply
>                  anymore.  */
>               split_code = GET_CODE (*split);
> @@ -5057,12 +5060,12 @@ find_split_point (rtx *loc, rtx_insn *in
>                                       GET_MODE (XEXP (SET_SRC (x), 0))))))
>             {
>               machine_mode mode = GET_MODE (XEXP (SET_SRC (x), 0));
> -
> +             rtx pos_rtx = gen_int_shift_amount (mode, pos);
>               SUBST (SET_SRC (x),
>                      gen_rtx_NEG (mode,
>                                   gen_rtx_LSHIFTRT (mode,
>                                                     XEXP (SET_SRC (x), 0),
> -                                                   GEN_INT (pos))));
> +                                                   pos_rtx)));
>
>               split = find_split_point (&SET_SRC (x), insn, true);
>               if (split && split != &SET_SRC (x))
> @@ -5120,11 +5123,11 @@ find_split_point (rtx *loc, rtx_insn *in
>             {
>               unsigned HOST_WIDE_INT mask
>                 = (HOST_WIDE_INT_1U << len) - 1;
> +             rtx pos_rtx = gen_int_shift_amount (mode, pos);
>               SUBST (SET_SRC (x),
>                      gen_rtx_AND (mode,
>                                   gen_rtx_LSHIFTRT
> -                                 (mode, gen_lowpart (mode, inner),
> -                                  GEN_INT (pos)),
> +                                 (mode, gen_lowpart (mode, inner), pos_rtx),
>                                   gen_int_mode (mask, mode)));
>
>               split = find_split_point (&SET_SRC (x), insn, true);
> @@ -5133,14 +5136,15 @@ find_split_point (rtx *loc, rtx_insn *in
>             }
>           else
>             {
> +             int left_bits = GET_MODE_PRECISION (mode) - len - pos;
> +             int right_bits = GET_MODE_PRECISION (mode) - len;
>               SUBST (SET_SRC (x),
>                      gen_rtx_fmt_ee
>                      (unsignedp ? LSHIFTRT : ASHIFTRT, mode,
>                       gen_rtx_ASHIFT (mode,
>                                       gen_lowpart (mode, inner),
> -                                     GEN_INT (GET_MODE_PRECISION (mode)
> -                                              - len - pos)),
> -                     GEN_INT (GET_MODE_PRECISION (mode) - len)));
> +                                     gen_int_shift_amount (mode, left_bits)),
> +                     gen_int_shift_amount (mode, right_bits)));
>
>               split = find_split_point (&SET_SRC (x), insn, true);
>               if (split && split != &SET_SRC (x))
> @@ -8915,10 +8919,11 @@ force_int_to_mode (rtx x, scalar_int_mod
>           /* Must be more sign bit copies than the mask needs.  */
>           && ((int) num_sign_bit_copies (XEXP (x, 0), GET_MODE (XEXP (x, 0)))
>               >= exact_log2 (mask + 1)))
> -       x = simplify_gen_binary (LSHIFTRT, xmode, XEXP (x, 0),
> -                                GEN_INT (GET_MODE_PRECISION (xmode)
> -                                         - exact_log2 (mask + 1)));
> -
> +       {
> +         int nbits = GET_MODE_PRECISION (xmode) - exact_log2 (mask + 1);
> +         x = simplify_gen_binary (LSHIFTRT, xmode, XEXP (x, 0),
> +                                  gen_int_shift_amount (xmode, nbits));
> +       }
>        goto shiftrt;
>
>      case ASHIFTRT:
> @@ -10415,7 +10420,7 @@ simplify_shift_const_1 (enum rtx_code co
>  {
>    enum rtx_code orig_code = code;
>    rtx orig_varop = varop;
> -  int count;
> +  int count, log2;
>    machine_mode mode = result_mode;
>    machine_mode shift_mode;
>    scalar_int_mode tmode, inner_mode, int_mode, int_varop_mode, int_result_mode;
> @@ -10618,13 +10623,11 @@ simplify_shift_const_1 (enum rtx_code co
>              is cheaper.  But it is still better on those machines to
>              merge two shifts into one.  */
>           if (CONST_INT_P (XEXP (varop, 1))
> -             && exact_log2 (UINTVAL (XEXP (varop, 1))) >= 0)
> +             && (log2 = exact_log2 (UINTVAL (XEXP (varop, 1)))) >= 0)
>             {
> -             varop
> -               = simplify_gen_binary (ASHIFT, GET_MODE (varop),
> -                                      XEXP (varop, 0),
> -                                      GEN_INT (exact_log2 (
> -                                               UINTVAL (XEXP (varop, 1)))));
> +             rtx log2_rtx = gen_int_shift_amount (GET_MODE (varop), log2);
> +             varop = simplify_gen_binary (ASHIFT, GET_MODE (varop),
> +                                          XEXP (varop, 0), log2_rtx);
>               continue;
>             }
>           break;
> @@ -10632,13 +10635,11 @@ simplify_shift_const_1 (enum rtx_code co
>         case UDIV:
>           /* Similar, for when divides are cheaper.  */
>           if (CONST_INT_P (XEXP (varop, 1))
> -             && exact_log2 (UINTVAL (XEXP (varop, 1))) >= 0)
> +             && (log2 = exact_log2 (UINTVAL (XEXP (varop, 1)))) >= 0)
>             {
> -             varop
> -               = simplify_gen_binary (LSHIFTRT, GET_MODE (varop),
> -                                      XEXP (varop, 0),
> -                                      GEN_INT (exact_log2 (
> -                                               UINTVAL (XEXP (varop, 1)))));
> +             rtx log2_rtx = gen_int_shift_amount (GET_MODE (varop), log2);
> +             varop = simplify_gen_binary (LSHIFTRT, GET_MODE (varop),
> +                                          XEXP (varop, 0), log2_rtx);
>               continue;
>             }
>           break;
> @@ -10773,10 +10774,10 @@ simplify_shift_const_1 (enum rtx_code co
>
>               mask_rtx = gen_int_mode (nonzero_bits (varop, int_varop_mode),
>                                        int_result_mode);
> -
> +             rtx count_rtx = gen_int_shift_amount (int_result_mode, count);
>               mask_rtx
>                 = simplify_const_binary_operation (code, int_result_mode,
> -                                                  mask_rtx, GEN_INT (count));
> +                                                  mask_rtx, count_rtx);
>
>               /* Give up if we can't compute an outer operation to use.  */
>               if (mask_rtx == 0
> @@ -10832,9 +10833,10 @@ simplify_shift_const_1 (enum rtx_code co
>               if (code == ASHIFTRT && int_mode != int_result_mode)
>                 break;
>
> +             rtx count_rtx = gen_int_shift_amount (int_result_mode, count);
>               rtx new_rtx = simplify_const_binary_operation (code, int_mode,
>                                                              XEXP (varop, 0),
> -                                                            GEN_INT (count));
> +                                                            count_rtx);
>               varop = gen_rtx_fmt_ee (code, int_mode, new_rtx, XEXP (varop, 1));
>               count = 0;
>               continue;
> @@ -10900,7 +10902,7 @@ simplify_shift_const_1 (enum rtx_code co
>               && (new_rtx = simplify_const_binary_operation
>                   (code, int_result_mode,
>                    gen_int_mode (INTVAL (XEXP (varop, 1)), int_result_mode),
> -                  GEN_INT (count))) != 0
> +                  gen_int_shift_amount (int_result_mode, count))) != 0
>               && CONST_INT_P (new_rtx)
>               && merge_outer_ops (&outer_op, &outer_const, GET_CODE (varop),
>                                   INTVAL (new_rtx), int_result_mode,
> @@ -11043,7 +11045,7 @@ simplify_shift_const_1 (enum rtx_code co
>               && (new_rtx = simplify_const_binary_operation
>                   (ASHIFT, int_result_mode,
>                    gen_int_mode (INTVAL (XEXP (varop, 1)), int_result_mode),
> -                  GEN_INT (count))) != 0
> +                  gen_int_shift_amount (int_result_mode, count))) != 0
>               && CONST_INT_P (new_rtx)
>               && merge_outer_ops (&outer_op, &outer_const, PLUS,
>                                   INTVAL (new_rtx), int_result_mode,
> @@ -11064,7 +11066,7 @@ simplify_shift_const_1 (enum rtx_code co
>               && (new_rtx = simplify_const_binary_operation
>                   (code, int_result_mode,
>                    gen_int_mode (INTVAL (XEXP (varop, 1)), int_result_mode),
> -                  GEN_INT (count))) != 0
> +                  gen_int_shift_amount (int_result_mode, count))) != 0
>               && CONST_INT_P (new_rtx)
>               && merge_outer_ops (&outer_op, &outer_const, XOR,
>                                   INTVAL (new_rtx), int_result_mode,
> @@ -11119,12 +11121,12 @@ simplify_shift_const_1 (enum rtx_code co
>                       - GET_MODE_UNIT_PRECISION (GET_MODE (varop)))))
>             {
>               rtx varop_inner = XEXP (varop, 0);
> -
> -             varop_inner
> -               = gen_rtx_LSHIFTRT (GET_MODE (varop_inner),
> -                                   XEXP (varop_inner, 0),
> -                                   GEN_INT
> -                                   (count + INTVAL (XEXP (varop_inner, 1))));
> +             int new_count = count + INTVAL (XEXP (varop_inner, 1));
> +             rtx new_count_rtx = gen_int_shift_amount (GET_MODE (varop_inner),
> +                                                       new_count);
> +             varop_inner = gen_rtx_LSHIFTRT (GET_MODE (varop_inner),
> +                                             XEXP (varop_inner, 0),
> +                                             new_count_rtx);
>               varop = gen_rtx_TRUNCATE (GET_MODE (varop), varop_inner);
>               count = 0;
>               continue;
> @@ -11176,7 +11178,8 @@ simplify_shift_const_1 (enum rtx_code co
>      x = NULL_RTX;
>
>    if (x == NULL_RTX)
> -    x = simplify_gen_binary (code, shift_mode, varop, GEN_INT (count));
> +    x = simplify_gen_binary (code, shift_mode, varop,
> +                            gen_int_shift_amount (shift_mode, count));
>
>    /* If we were doing an LSHIFTRT in a wider mode than it was originally,
>       turn off all the bits that the shift would have turned off.  */
> @@ -11238,7 +11241,8 @@ simplify_shift_const (rtx x, enum rtx_co
>      return tem;
>
>    if (!x)
> -    x = simplify_gen_binary (code, GET_MODE (varop), varop, GEN_INT (count));
> +    x = simplify_gen_binary (code, GET_MODE (varop), varop,
> +                            gen_int_shift_amount (GET_MODE (varop), count));
>    if (GET_MODE (x) != result_mode)
>      x = gen_lowpart (result_mode, x);
>    return x;
> @@ -11429,8 +11433,9 @@ change_zero_ext (rtx pat)
>           if (BITS_BIG_ENDIAN)
>             start = GET_MODE_PRECISION (inner_mode) - size - start;
>
> -         if (start)
> -           x = gen_rtx_LSHIFTRT (inner_mode, XEXP (x, 0), GEN_INT (start));
> +         if (start != 0)
> +           x = gen_rtx_LSHIFTRT (inner_mode, XEXP (x, 0),
> +                                 gen_int_shift_amount (inner_mode, start));
>           else
>             x = XEXP (x, 0);
>           if (mode != inner_mode)
> Index: gcc/optabs.c
> ===================================================================
> --- gcc/optabs.c        2017-10-23 11:47:06.643477568 +0100
> +++ gcc/optabs.c        2017-10-23 11:47:11.276323187 +0100
> @@ -431,8 +431,9 @@ expand_superword_shift (optab binoptab,
>        if (binoptab != ashr_optab)
>         emit_move_insn (outof_target, CONST0_RTX (word_mode));
>        else
> -       if (!force_expand_binop (word_mode, binoptab,
> -                                outof_input, GEN_INT (BITS_PER_WORD - 1),
> +       if (!force_expand_binop (word_mode, binoptab, outof_input,
> +                                gen_int_shift_amount (word_mode,
> +                                                      BITS_PER_WORD - 1),
>                                  outof_target, unsignedp, methods))
>           return false;
>      }
> @@ -789,7 +790,8 @@ expand_doubleword_mult (machine_mode mod
>  {
>    int low = (WORDS_BIG_ENDIAN ? 1 : 0);
>    int high = (WORDS_BIG_ENDIAN ? 0 : 1);
> -  rtx wordm1 = umulp ? NULL_RTX : GEN_INT (BITS_PER_WORD - 1);
> +  rtx wordm1 = (umulp ? NULL_RTX
> +               : gen_int_shift_amount (word_mode, BITS_PER_WORD - 1));
>    rtx product, adjust, product_high, temp;
>
>    rtx op0_high = operand_subword_force (op0, high, mode);
> @@ -1185,7 +1187,7 @@ expand_binop (machine_mode mode, optab b
>        unsigned int bits = GET_MODE_PRECISION (int_mode);
>
>        if (CONST_INT_P (op1))
> -        newop1 = GEN_INT (bits - INTVAL (op1));
> +       newop1 = gen_int_shift_amount (int_mode, bits - INTVAL (op1));
>        else if (targetm.shift_truncation_mask (int_mode) == bits - 1)
>          newop1 = negate_rtx (GET_MODE (op1), op1);
>        else
> @@ -1399,11 +1401,11 @@ expand_binop (machine_mode mode, optab b
>        shift_mask = targetm.shift_truncation_mask (word_mode);
>        op1_mode = (GET_MODE (op1) != VOIDmode
>                   ? as_a <scalar_int_mode> (GET_MODE (op1))
> -                 : word_mode);
> +                 : get_shift_amount_mode (word_mode));
>
>        /* Apply the truncation to constant shifts.  */
>        if (double_shift_mask > 0 && CONST_INT_P (op1))
> -       op1 = GEN_INT (INTVAL (op1) & double_shift_mask);
> +       op1 = gen_int_mode (INTVAL (op1) & double_shift_mask, op1_mode);
>
>        if (op1 == CONST0_RTX (op1_mode))
>         return op0;
> @@ -1513,7 +1515,7 @@ expand_binop (machine_mode mode, optab b
>        else
>         {
>           rtx into_temp1, into_temp2, outof_temp1, outof_temp2;
> -         rtx first_shift_count, second_shift_count;
> +         HOST_WIDE_INT first_shift_count, second_shift_count;
>           optab reverse_unsigned_shift, unsigned_shift;
>
>           reverse_unsigned_shift = (left_shift ^ (shift_count < BITS_PER_WORD)
> @@ -1524,20 +1526,24 @@ expand_binop (machine_mode mode, optab b
>
>           if (shift_count > BITS_PER_WORD)
>             {
> -             first_shift_count = GEN_INT (shift_count - BITS_PER_WORD);
> -             second_shift_count = GEN_INT (2 * BITS_PER_WORD - shift_count);
> +             first_shift_count = shift_count - BITS_PER_WORD;
> +             second_shift_count = 2 * BITS_PER_WORD - shift_count;
>             }
>           else
>             {
> -             first_shift_count = GEN_INT (BITS_PER_WORD - shift_count);
> -             second_shift_count = GEN_INT (shift_count);
> +             first_shift_count = BITS_PER_WORD - shift_count;
> +             second_shift_count = shift_count;
>             }
> +         rtx first_shift_count_rtx
> +           = gen_int_shift_amount (word_mode, first_shift_count);
> +         rtx second_shift_count_rtx
> +           = gen_int_shift_amount (word_mode, second_shift_count);
>
>           into_temp1 = expand_binop (word_mode, unsigned_shift,
> -                                    outof_input, first_shift_count,
> +                                    outof_input, first_shift_count_rtx,
>                                      NULL_RTX, unsignedp, next_methods);
>           into_temp2 = expand_binop (word_mode, reverse_unsigned_shift,
> -                                    into_input, second_shift_count,
> +                                    into_input, second_shift_count_rtx,
>                                      NULL_RTX, unsignedp, next_methods);
>
>           if (into_temp1 != 0 && into_temp2 != 0)
> @@ -1550,10 +1556,10 @@ expand_binop (machine_mode mode, optab b
>             emit_move_insn (into_target, inter);
>
>           outof_temp1 = expand_binop (word_mode, unsigned_shift,
> -                                     into_input, first_shift_count,
> +                                     into_input, first_shift_count_rtx,
>                                       NULL_RTX, unsignedp, next_methods);
>           outof_temp2 = expand_binop (word_mode, reverse_unsigned_shift,
> -                                     outof_input, second_shift_count,
> +                                     outof_input, second_shift_count_rtx,
>                                       NULL_RTX, unsignedp, next_methods);
>
>           if (inter != 0 && outof_temp1 != 0 && outof_temp2 != 0)
> @@ -2793,25 +2799,29 @@ expand_unop (machine_mode mode, optab un
>
>           if (optab_handler (rotl_optab, mode) != CODE_FOR_nothing)
>             {
> -             temp = expand_binop (mode, rotl_optab, op0, GEN_INT (8), target,
> -                                  unsignedp, OPTAB_DIRECT);
> +             temp = expand_binop (mode, rotl_optab, op0,
> +                                  gen_int_shift_amount (mode, 8),
> +                                  target, unsignedp, OPTAB_DIRECT);
>               if (temp)
>                 return temp;
>              }
>
>           if (optab_handler (rotr_optab, mode) != CODE_FOR_nothing)
>             {
> -             temp = expand_binop (mode, rotr_optab, op0, GEN_INT (8), target,
> -                                  unsignedp, OPTAB_DIRECT);
> +             temp = expand_binop (mode, rotr_optab, op0,
> +                                  gen_int_shift_amount (mode, 8),
> +                                  target, unsignedp, OPTAB_DIRECT);
>               if (temp)
>                 return temp;
>             }
>
>           last = get_last_insn ();
>
> -         temp1 = expand_binop (mode, ashl_optab, op0, GEN_INT (8), NULL_RTX,
> +         temp1 = expand_binop (mode, ashl_optab, op0,
> +                               gen_int_shift_amount (mode, 8), NULL_RTX,
>                                 unsignedp, OPTAB_WIDEN);
> -         temp2 = expand_binop (mode, lshr_optab, op0, GEN_INT (8), NULL_RTX,
> +         temp2 = expand_binop (mode, lshr_optab, op0,
> +                               gen_int_shift_amount (mode, 8), NULL_RTX,
>                                 unsignedp, OPTAB_WIDEN);
>           if (temp1 && temp2)
>             {
> @@ -5369,11 +5379,11 @@ vector_compare_rtx (machine_mode cmp_mod
>  }
>
>  /* Checks if vec_perm mask SEL is a constant equivalent to a shift of the first
> -   vec_perm operand, assuming the second operand is a constant vector of zeroes.
> -   Return the shift distance in bits if so, or NULL_RTX if the vec_perm is not a
> -   shift.  */
> +   vec_perm operand (which has mode OP0_MODE), assuming the second
> +   operand is a constant vector of zeroes.  Return the shift distance in
> +   bits if so, or NULL_RTX if the vec_perm is not a shift.  */
>  static rtx
> -shift_amt_for_vec_perm_mask (rtx sel)
> +shift_amt_for_vec_perm_mask (machine_mode op0_mode, rtx sel)
>  {
>    unsigned int i, first, nelt = GET_MODE_NUNITS (GET_MODE (sel));
>    unsigned int bitsize = GET_MODE_UNIT_BITSIZE (GET_MODE (sel));
> @@ -5393,7 +5403,7 @@ shift_amt_for_vec_perm_mask (rtx sel)
>         return NULL_RTX;
>      }
>
> -  return GEN_INT (first * bitsize);
> +  return gen_int_shift_amount (op0_mode, first * bitsize);
>  }
>
>  /* A subroutine of expand_vec_perm for expanding one vec_perm insn.  */
> @@ -5473,7 +5483,7 @@ expand_vec_perm (machine_mode mode, rtx
>           && (shift_code != CODE_FOR_nothing
>               || shift_code_qi != CODE_FOR_nothing))
>         {
> -         shift_amt = shift_amt_for_vec_perm_mask (sel);
> +         shift_amt = shift_amt_for_vec_perm_mask (mode, sel);
>           if (shift_amt)
>             {
>               struct expand_operand ops[3];
> @@ -5563,7 +5573,8 @@ expand_vec_perm (machine_mode mode, rtx
>                                    NULL, 0, OPTAB_DIRECT);
>        else
>         sel = expand_simple_binop (selmode, ASHIFT, sel,
> -                                  GEN_INT (exact_log2 (u)),
> +                                  gen_int_shift_amount (selmode,
> +                                                        exact_log2 (u)),
>                                    NULL, 0, OPTAB_DIRECT);
>        gcc_assert (sel != NULL);
>

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [14/nn] Add helpers for shift count modes
  2017-10-26 12:07   ` Richard Biener
@ 2017-10-26 12:07     ` Richard Biener
  2017-11-20 21:04       ` Richard Sandiford
  2017-10-30 15:03     ` Jeff Law
  1 sibling, 1 reply; 90+ messages in thread
From: Richard Biener @ 2017-10-26 12:07 UTC (permalink / raw)
  To: GCC Patches, Richard Sandiford

On Thu, Oct 26, 2017 at 2:06 PM, Richard Biener
<richard.guenther@gmail.com> wrote:
> On Mon, Oct 23, 2017 at 1:25 PM, Richard Sandiford
> <richard.sandiford@linaro.org> wrote:
>> This patch adds a stub helper routine to provide the mode
>> of a scalar shift amount, given the mode of the values
>> being shifted.
>>
>> One long-standing problem has been to decide what this mode
>> should be for arbitrary rtxes (as opposed to those directly
>> tied to a target pattern).  Is it the mode of the shifted
>> elements?  Is it word_mode?  Or maybe QImode?  Is it whatever
>> the corresponding target pattern says?  (In which case what
>> should the mode be when the target doesn't have a pattern?)
>>
>> For now the patch picks word_mode, which should be safe on
>> all targets but could perhaps become suboptimal if the helper
>> routine is used more often than it is in this patch.  As it
>> stands the patch does not change the generated code.
>>
>> The patch also adds a helper function that constructs rtxes
>> for constant shift amounts, again given the mode of the value
>> being shifted.  As well as helping with the SVE patches, this
>> is one step towards allowing CONST_INTs to have a real mode.
>
> I think get_shift_amount_mode is flawed.  While encapsulating
> constant shift amount RTX generation into a gen_int_shift_amount
> helper looks good to me, I'd rather have that ??? comment in this
> function (and I'd use the mode of the RTX being shifted, not
> word_mode...).
>
> In the end it's up to insn recognition to convert the op to the
> expected mode, and for generic RTL it's us who should decide
> on the mode -- on GENERIC the shift amount has to be an
> integer, so why not simply use a mode that is large enough to
> make the constant fit?
>
> Just throwing in some comments here; RTL isn't my primary
> expertise.

To add a little bit - shift amounts are maybe the only(?) place
where a modeless CONST_INT makes sense!  So "fixing"
that first sounds backwards.
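
To illustrate what I mean (a completely untested sketch, not
something I've tried to build): the stub could derive the mode from
the value being shifted, and only fall back to word_mode when that
isn't a scalar integer:

  /* Untested sketch: prefer the mode of the shifted value (or its
     element mode for vectors) over a hard-coded word_mode.  */
  inline scalar_int_mode
  get_shift_amount_mode (machine_mode mode)
  {
    machine_mode inner = GET_MODE_INNER (mode);
    if (SCALAR_INT_MODE_P (inner))
      return as_a <scalar_int_mode> (inner);
    return word_mode;
  }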

Richard.

> Richard.
>
>>
>> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
>>             Alan Hayward  <alan.hayward@arm.com>
>>             David Sherwood  <david.sherwood@arm.com>
>>
>> gcc/
>>         * target.h (get_shift_amount_mode): New function.
>>         * emit-rtl.h (gen_int_shift_amount): Declare.
>>         * emit-rtl.c (gen_int_shift_amount): New function.
>>         * asan.c (asan_emit_stack_protection): Use gen_int_shift_amount
>>         instead of GEN_INT.
>>         * calls.c (shift_return_value): Likewise.
>>         * cse.c (fold_rtx): Likewise.
>>         * dse.c (find_shift_sequence): Likewise.
>>         * expmed.c (init_expmed_one_mode, store_bit_field_1, expand_shift_1)
>>         (expand_shift, expand_smod_pow2): Likewise.
>>         * lower-subreg.c (shift_cost): Likewise.
>>         * simplify-rtx.c (simplify_unary_operation_1): Likewise.
>>         (simplify_binary_operation_1): Likewise.
>>         * combine.c (try_combine, find_split_point, force_int_to_mode)
>>         (simplify_shift_const_1, simplify_shift_const): Likewise.
>>         (change_zero_ext): Likewise.  Use simplify_gen_binary.
>>         * optabs.c (expand_superword_shift, expand_doubleword_mult)
>>         (expand_unop): Use gen_int_shift_amount instead of GEN_INT.
>>         (expand_binop): Likewise.  Use get_shift_amount_mode instead
>>         of word_mode as the mode of a CONST_INT shift amount.
>>         (shift_amt_for_vec_perm_mask): Add a machine_mode argument.
>>         Use gen_int_shift_amount instead of GEN_INT.
>>         (expand_vec_perm): Update caller accordingly.  Use
>>         gen_int_shift_amount instead of GEN_INT.
>>
>> Index: gcc/target.h
>> ===================================================================
>> --- gcc/target.h        2017-10-23 11:47:06.643477568 +0100
>> +++ gcc/target.h        2017-10-23 11:47:11.277288162 +0100
>> @@ -209,6 +209,17 @@ #define HOOKSTRUCT(FRAGMENT) FRAGMENT
>>
>>  extern struct gcc_target targetm;
>>
>> +/* Return the mode that should be used to hold a scalar shift amount
>> +   when shifting values of the given mode.  */
>> +/* ??? This could in principle be generated automatically from the .md
>> +   shift patterns, but for now word_mode should be universally OK.  */
>> +
>> +inline scalar_int_mode
>> +get_shift_amount_mode (machine_mode)
>> +{
>> +  return word_mode;
>> +}
>> +
>>  #ifdef GCC_TM_H
>>
>>  #ifndef CUMULATIVE_ARGS_MAGIC
>> Index: gcc/emit-rtl.h
>> ===================================================================
>> --- gcc/emit-rtl.h      2017-10-23 11:47:06.643477568 +0100
>> +++ gcc/emit-rtl.h      2017-10-23 11:47:11.274393237 +0100
>> @@ -369,6 +369,7 @@ extern void set_reg_attrs_for_parm (rtx,
>>  extern void set_reg_attrs_for_decl_rtl (tree t, rtx x);
>>  extern void adjust_reg_mode (rtx, machine_mode);
>>  extern int mem_expr_equal_p (const_tree, const_tree);
>> +extern rtx gen_int_shift_amount (machine_mode, HOST_WIDE_INT);
>>
>>  extern bool need_atomic_barrier_p (enum memmodel, bool);
>>
>> Index: gcc/emit-rtl.c
>> ===================================================================
>> --- gcc/emit-rtl.c      2017-10-23 11:47:06.643477568 +0100
>> +++ gcc/emit-rtl.c      2017-10-23 11:47:11.273428262 +0100
>> @@ -6478,6 +6478,15 @@ need_atomic_barrier_p (enum memmodel mod
>>      }
>>  }
>>
>> +/* Return a constant shift amount for shifting a value of mode MODE
>> +   by VALUE bits.  */
>> +
>> +rtx
>> +gen_int_shift_amount (machine_mode mode, HOST_WIDE_INT value)
>> +{
>> +  return gen_int_mode (value, get_shift_amount_mode (mode));
>> +}
>> +
>>  /* Initialize fields of rtl_data related to stack alignment.  */
>>
>>  void
>> Index: gcc/asan.c
>> ===================================================================
>> --- gcc/asan.c  2017-10-23 11:47:06.643477568 +0100
>> +++ gcc/asan.c  2017-10-23 11:47:11.270533336 +0100
>> @@ -1388,7 +1388,7 @@ asan_emit_stack_protection (rtx base, rt
>>    TREE_ASM_WRITTEN (id) = 1;
>>    emit_move_insn (mem, expand_normal (build_fold_addr_expr (decl)));
>>    shadow_base = expand_binop (Pmode, lshr_optab, base,
>> -                             GEN_INT (ASAN_SHADOW_SHIFT),
>> +                             gen_int_shift_amount (Pmode, ASAN_SHADOW_SHIFT),
>>                               NULL_RTX, 1, OPTAB_DIRECT);
>>    shadow_base
>>      = plus_constant (Pmode, shadow_base,
>> Index: gcc/calls.c
>> ===================================================================
>> --- gcc/calls.c 2017-10-23 11:47:06.643477568 +0100
>> +++ gcc/calls.c 2017-10-23 11:47:11.270533336 +0100
>> @@ -2749,15 +2749,17 @@ shift_return_value (machine_mode mode, b
>>    HOST_WIDE_INT shift;
>>
>>    gcc_assert (REG_P (value) && HARD_REGISTER_P (value));
>> -  shift = GET_MODE_BITSIZE (GET_MODE (value)) - GET_MODE_BITSIZE (mode);
>> +  machine_mode value_mode = GET_MODE (value);
>> +  shift = GET_MODE_BITSIZE (value_mode) - GET_MODE_BITSIZE (mode);
>>    if (shift == 0)
>>      return false;
>>
>>    /* Use ashr rather than lshr for right shifts.  This is for the benefit
>>       of the MIPS port, which requires SImode values to be sign-extended
>>       when stored in 64-bit registers.  */
>> -  if (!force_expand_binop (GET_MODE (value), left_p ? ashl_optab : ashr_optab,
>> -                          value, GEN_INT (shift), value, 1, OPTAB_WIDEN))
>> +  if (!force_expand_binop (value_mode, left_p ? ashl_optab : ashr_optab,
>> +                          value, gen_int_shift_amount (value_mode, shift),
>> +                          value, 1, OPTAB_WIDEN))
>>      gcc_unreachable ();
>>    return true;
>>  }
>> Index: gcc/cse.c
>> ===================================================================
>> --- gcc/cse.c   2017-10-23 11:47:03.707058235 +0100
>> +++ gcc/cse.c   2017-10-23 11:47:11.273428262 +0100
>> @@ -3611,9 +3611,9 @@ fold_rtx (rtx x, rtx_insn *insn)
>>                       || INTVAL (const_arg1) < 0))
>>                 {
>>                   if (SHIFT_COUNT_TRUNCATED)
>> -                   canon_const_arg1 = GEN_INT (INTVAL (const_arg1)
>> -                                               & (GET_MODE_UNIT_BITSIZE (mode)
>> -                                                  - 1));
>> +                   canon_const_arg1 = gen_int_shift_amount
>> +                     (mode, (INTVAL (const_arg1)
>> +                             & (GET_MODE_UNIT_BITSIZE (mode) - 1)));
>>                   else
>>                     break;
>>                 }
>> @@ -3660,9 +3660,9 @@ fold_rtx (rtx x, rtx_insn *insn)
>>                       || INTVAL (inner_const) < 0))
>>                 {
>>                   if (SHIFT_COUNT_TRUNCATED)
>> -                   inner_const = GEN_INT (INTVAL (inner_const)
>> -                                          & (GET_MODE_UNIT_BITSIZE (mode)
>> -                                             - 1));
>> +                   inner_const = gen_int_shift_amount
>> +                     (mode, (INTVAL (inner_const)
>> +                             & (GET_MODE_UNIT_BITSIZE (mode) - 1)));
>>                   else
>>                     break;
>>                 }
>> @@ -3692,7 +3692,8 @@ fold_rtx (rtx x, rtx_insn *insn)
>>                   /* As an exception, we can turn an ASHIFTRT of this
>>                      form into a shift of the number of bits - 1.  */
>>                   if (code == ASHIFTRT)
>> -                   new_const = GEN_INT (GET_MODE_UNIT_BITSIZE (mode) - 1);
>> +                   new_const = gen_int_shift_amount
>> +                     (mode, GET_MODE_UNIT_BITSIZE (mode) - 1);
>>                   else if (!side_effects_p (XEXP (y, 0)))
>>                     return CONST0_RTX (mode);
>>                   else
>> Index: gcc/dse.c
>> ===================================================================
>> --- gcc/dse.c   2017-10-23 11:47:06.643477568 +0100
>> +++ gcc/dse.c   2017-10-23 11:47:11.273428262 +0100
>> @@ -1605,8 +1605,9 @@ find_shift_sequence (int access_size,
>>                                      store_mode, byte);
>>           if (ret && CONSTANT_P (ret))
>>             {
>> +             rtx shift_rtx = gen_int_shift_amount (new_mode, shift);
>>               ret = simplify_const_binary_operation (LSHIFTRT, new_mode,
>> -                                                    ret, GEN_INT (shift));
>> +                                                    ret, shift_rtx);
>>               if (ret && CONSTANT_P (ret))
>>                 {
>>                   byte = subreg_lowpart_offset (read_mode, new_mode);
>> @@ -1642,7 +1643,8 @@ find_shift_sequence (int access_size,
>>          of one dsp where the cost of these two was not the same.  But
>>          this really is a rare case anyway.  */
>>        target = expand_binop (new_mode, lshr_optab, new_reg,
>> -                            GEN_INT (shift), new_reg, 1, OPTAB_DIRECT);
>> +                            gen_int_shift_amount (new_mode, shift),
>> +                            new_reg, 1, OPTAB_DIRECT);
>>
>>        shift_seq = get_insns ();
>>        end_sequence ();
>> Index: gcc/expmed.c
>> ===================================================================
>> --- gcc/expmed.c        2017-10-23 11:47:06.643477568 +0100
>> +++ gcc/expmed.c        2017-10-23 11:47:11.274393237 +0100
>> @@ -222,7 +222,8 @@ init_expmed_one_mode (struct init_expmed
>>           PUT_MODE (all->zext, wider_mode);
>>           PUT_MODE (all->wide_mult, wider_mode);
>>           PUT_MODE (all->wide_lshr, wider_mode);
>> -         XEXP (all->wide_lshr, 1) = GEN_INT (mode_bitsize);
>> +         XEXP (all->wide_lshr, 1)
>> +           = gen_int_shift_amount (wider_mode, mode_bitsize);
>>
>>           set_mul_widen_cost (speed, wider_mode,
>>                               set_src_cost (all->wide_mult, wider_mode, speed));
>> @@ -908,12 +909,14 @@ store_bit_field_1 (rtx str_rtx, unsigned
>>              to make sure that for big-endian machines the higher order
>>              bits are used.  */
>>           if (new_bitsize < BITS_PER_WORD && BYTES_BIG_ENDIAN && !backwards)
>> -           value_word = simplify_expand_binop (word_mode, lshr_optab,
>> -                                               value_word,
>> -                                               GEN_INT (BITS_PER_WORD
>> -                                                        - new_bitsize),
>> -                                               NULL_RTX, true,
>> -                                               OPTAB_LIB_WIDEN);
>> +           {
>> +             int shift = BITS_PER_WORD - new_bitsize;
>> +             rtx shift_rtx = gen_int_shift_amount (word_mode, shift);
>> +             value_word = simplify_expand_binop (word_mode, lshr_optab,
>> +                                                 value_word, shift_rtx,
>> +                                                 NULL_RTX, true,
>> +                                                 OPTAB_LIB_WIDEN);
>> +           }
>>
>>           if (!store_bit_field_1 (op0, new_bitsize,
>>                                   bitnum + bit_offset,
>> @@ -2366,8 +2369,9 @@ expand_shift_1 (enum tree_code code, mac
>>        if (CONST_INT_P (op1)
>>           && ((unsigned HOST_WIDE_INT) INTVAL (op1) >=
>>               (unsigned HOST_WIDE_INT) GET_MODE_BITSIZE (scalar_mode)))
>> -       op1 = GEN_INT ((unsigned HOST_WIDE_INT) INTVAL (op1)
>> -                      % GET_MODE_BITSIZE (scalar_mode));
>> +       op1 = gen_int_shift_amount (mode,
>> +                                   (unsigned HOST_WIDE_INT) INTVAL (op1)
>> +                                   % GET_MODE_BITSIZE (scalar_mode));
>>        else if (GET_CODE (op1) == SUBREG
>>                && subreg_lowpart_p (op1)
>>                && SCALAR_INT_MODE_P (GET_MODE (SUBREG_REG (op1)))
>> @@ -2384,7 +2388,8 @@ expand_shift_1 (enum tree_code code, mac
>>        && IN_RANGE (INTVAL (op1), GET_MODE_BITSIZE (scalar_mode) / 2 + left,
>>                    GET_MODE_BITSIZE (scalar_mode) - 1))
>>      {
>> -      op1 = GEN_INT (GET_MODE_BITSIZE (scalar_mode) - INTVAL (op1));
>> +      op1 = gen_int_shift_amount (mode, (GET_MODE_BITSIZE (scalar_mode)
>> +                                        - INTVAL (op1)));
>>        left = !left;
>>        code = left ? LROTATE_EXPR : RROTATE_EXPR;
>>      }
>> @@ -2464,8 +2469,8 @@ expand_shift_1 (enum tree_code code, mac
>>               if (op1 == const0_rtx)
>>                 return shifted;
>>               else if (CONST_INT_P (op1))
>> -               other_amount = GEN_INT (GET_MODE_BITSIZE (scalar_mode)
>> -                                       - INTVAL (op1));
>> +               other_amount = gen_int_shift_amount
>> +                 (mode, GET_MODE_BITSIZE (scalar_mode) - INTVAL (op1));
>>               else
>>                 {
>>                   other_amount
>> @@ -2538,8 +2543,9 @@ expand_shift_1 (enum tree_code code, mac
>>  expand_shift (enum tree_code code, machine_mode mode, rtx shifted,
>>               int amount, rtx target, int unsignedp)
>>  {
>> -  return expand_shift_1 (code, mode,
>> -                        shifted, GEN_INT (amount), target, unsignedp);
>> +  return expand_shift_1 (code, mode, shifted,
>> +                        gen_int_shift_amount (mode, amount),
>> +                        target, unsignedp);
>>  }
>>
>>  /* Likewise, but return 0 if that cannot be done.  */
>> @@ -3855,7 +3861,7 @@ expand_smod_pow2 (scalar_int_mode mode,
>>         {
>>           HOST_WIDE_INT masklow = (HOST_WIDE_INT_1 << logd) - 1;
>>           signmask = force_reg (mode, signmask);
>> -         shift = GEN_INT (GET_MODE_BITSIZE (mode) - logd);
>> +         shift = gen_int_shift_amount (mode, GET_MODE_BITSIZE (mode) - logd);
>>
>>           /* Use the rtx_cost of a LSHIFTRT instruction to determine
>>              which instruction sequence to use.  If logical right shifts
>> Index: gcc/lower-subreg.c
>> ===================================================================
>> --- gcc/lower-subreg.c  2017-10-23 11:47:06.643477568 +0100
>> +++ gcc/lower-subreg.c  2017-10-23 11:47:11.274393237 +0100
>> @@ -129,7 +129,7 @@ shift_cost (bool speed_p, struct cost_rt
>>    PUT_CODE (rtxes->shift, code);
>>    PUT_MODE (rtxes->shift, mode);
>>    PUT_MODE (rtxes->source, mode);
>> -  XEXP (rtxes->shift, 1) = GEN_INT (op1);
>> +  XEXP (rtxes->shift, 1) = gen_int_shift_amount (mode, op1);
>>    return set_src_cost (rtxes->shift, mode, speed_p);
>>  }
>>
>> Index: gcc/simplify-rtx.c
>> ===================================================================
>> --- gcc/simplify-rtx.c  2017-10-23 11:47:06.643477568 +0100
>> +++ gcc/simplify-rtx.c  2017-10-23 11:47:11.277288162 +0100
>> @@ -1165,7 +1165,8 @@ simplify_unary_operation_1 (enum rtx_cod
>>           if (STORE_FLAG_VALUE == 1)
>>             {
>>               temp = simplify_gen_binary (ASHIFTRT, inner, XEXP (op, 0),
>> -                                         GEN_INT (isize - 1));
>> +                                         gen_int_shift_amount (inner,
>> +                                                               isize - 1));
>>               if (int_mode == inner)
>>                 return temp;
>>               if (GET_MODE_PRECISION (int_mode) > isize)
>> @@ -1175,7 +1176,8 @@ simplify_unary_operation_1 (enum rtx_cod
>>           else if (STORE_FLAG_VALUE == -1)
>>             {
>>               temp = simplify_gen_binary (LSHIFTRT, inner, XEXP (op, 0),
>> -                                         GEN_INT (isize - 1));
>> +                                         gen_int_shift_amount (inner,
>> +                                                               isize - 1));
>>               if (int_mode == inner)
>>                 return temp;
>>               if (GET_MODE_PRECISION (int_mode) > isize)
>> @@ -2679,7 +2681,8 @@ simplify_binary_operation_1 (enum rtx_co
>>         {
>>           val = wi::exact_log2 (rtx_mode_t (trueop1, mode));
>>           if (val >= 0)
>> -           return simplify_gen_binary (ASHIFT, mode, op0, GEN_INT (val));
>> +           return simplify_gen_binary (ASHIFT, mode, op0,
>> +                                       gen_int_shift_amount (mode, val));
>>         }
>>
>>        /* x*2 is x+x and x*(-1) is -x */
>> @@ -3303,7 +3306,8 @@ simplify_binary_operation_1 (enum rtx_co
>>        /* Convert divide by power of two into shift.  */
>>        if (CONST_INT_P (trueop1)
>>           && (val = exact_log2 (UINTVAL (trueop1))) > 0)
>> -       return simplify_gen_binary (LSHIFTRT, mode, op0, GEN_INT (val));
>> +       return simplify_gen_binary (LSHIFTRT, mode, op0,
>> +                                   gen_int_shift_amount (mode, val));
>>        break;
>>
>>      case DIV:
>> @@ -3423,10 +3427,12 @@ simplify_binary_operation_1 (enum rtx_co
>>           && IN_RANGE (INTVAL (trueop1),
>>                        GET_MODE_UNIT_PRECISION (mode) / 2 + (code == ROTATE),
>>                        GET_MODE_UNIT_PRECISION (mode) - 1))
>> -       return simplify_gen_binary (code == ROTATE ? ROTATERT : ROTATE,
>> -                                   mode, op0,
>> -                                   GEN_INT (GET_MODE_UNIT_PRECISION (mode)
>> -                                            - INTVAL (trueop1)));
>> +       {
>> +         int new_amount = GET_MODE_UNIT_PRECISION (mode) - INTVAL (trueop1);
>> +         rtx new_amount_rtx = gen_int_shift_amount (mode, new_amount);
>> +         return simplify_gen_binary (code == ROTATE ? ROTATERT : ROTATE,
>> +                                     mode, op0, new_amount_rtx);
>> +       }
>>  #endif
>>        /* FALLTHRU */
>>      case ASHIFTRT:
>> @@ -3466,8 +3472,8 @@ simplify_binary_operation_1 (enum rtx_co
>>               == GET_MODE_BITSIZE (inner_mode) - GET_MODE_BITSIZE (int_mode))
>>           && subreg_lowpart_p (op0))
>>         {
>> -         rtx tmp = GEN_INT (INTVAL (XEXP (SUBREG_REG (op0), 1))
>> -                            + INTVAL (op1));
>> +         rtx tmp = gen_int_shift_amount
>> +           (inner_mode, INTVAL (XEXP (SUBREG_REG (op0), 1)) + INTVAL (op1));
>>           tmp = simplify_gen_binary (code, inner_mode,
>>                                      XEXP (SUBREG_REG (op0), 0),
>>                                      tmp);
>> @@ -3478,7 +3484,8 @@ simplify_binary_operation_1 (enum rtx_co
>>         {
>>           val = INTVAL (op1) & (GET_MODE_UNIT_PRECISION (mode) - 1);
>>           if (val != INTVAL (op1))
>> -           return simplify_gen_binary (code, mode, op0, GEN_INT (val));
>> +           return simplify_gen_binary (code, mode, op0,
>> +                                       gen_int_shift_amount (mode, val));
>>         }
>>        break;
>>
>> Index: gcc/combine.c
>> ===================================================================
>> --- gcc/combine.c       2017-10-23 11:47:06.643477568 +0100
>> +++ gcc/combine.c       2017-10-23 11:47:11.272463287 +0100
>> @@ -3773,8 +3773,9 @@ try_combine (rtx_insn *i3, rtx_insn *i2,
>>               && INTVAL (XEXP (*split, 1)) > 0
>>               && (i = exact_log2 (UINTVAL (XEXP (*split, 1)))) >= 0)
>>             {
>> +             rtx i_rtx = gen_int_shift_amount (split_mode, i);
>>               SUBST (*split, gen_rtx_ASHIFT (split_mode,
>> -                                            XEXP (*split, 0), GEN_INT (i)));
>> +                                            XEXP (*split, 0), i_rtx));
>>               /* Update split_code because we may not have a multiply
>>                  anymore.  */
>>               split_code = GET_CODE (*split);
>> @@ -3788,8 +3789,10 @@ try_combine (rtx_insn *i3, rtx_insn *i2,
>>               && (i = exact_log2 (UINTVAL (XEXP (XEXP (*split, 0), 1)))) >= 0)
>>             {
>>               rtx nsplit = XEXP (*split, 0);
>> +             rtx i_rtx = gen_int_shift_amount (GET_MODE (nsplit), i);
>>               SUBST (XEXP (*split, 0), gen_rtx_ASHIFT (GET_MODE (nsplit),
>> -                                            XEXP (nsplit, 0), GEN_INT (i)));
>> +                                                      XEXP (nsplit, 0),
>> +                                                      i_rtx));
>>               /* Update split_code because we may not have a multiply
>>                  anymore.  */
>>               split_code = GET_CODE (*split);
>> @@ -5057,12 +5060,12 @@ find_split_point (rtx *loc, rtx_insn *in
>>                                       GET_MODE (XEXP (SET_SRC (x), 0))))))
>>             {
>>               machine_mode mode = GET_MODE (XEXP (SET_SRC (x), 0));
>> -
>> +             rtx pos_rtx = gen_int_shift_amount (mode, pos);
>>               SUBST (SET_SRC (x),
>>                      gen_rtx_NEG (mode,
>>                                   gen_rtx_LSHIFTRT (mode,
>>                                                     XEXP (SET_SRC (x), 0),
>> -                                                   GEN_INT (pos))));
>> +                                                   pos_rtx)));
>>
>>               split = find_split_point (&SET_SRC (x), insn, true);
>>               if (split && split != &SET_SRC (x))
>> @@ -5120,11 +5123,11 @@ find_split_point (rtx *loc, rtx_insn *in
>>             {
>>               unsigned HOST_WIDE_INT mask
>>                 = (HOST_WIDE_INT_1U << len) - 1;
>> +             rtx pos_rtx = gen_int_shift_amount (mode, pos);
>>               SUBST (SET_SRC (x),
>>                      gen_rtx_AND (mode,
>>                                   gen_rtx_LSHIFTRT
>> -                                 (mode, gen_lowpart (mode, inner),
>> -                                  GEN_INT (pos)),
>> +                                 (mode, gen_lowpart (mode, inner), pos_rtx),
>>                                   gen_int_mode (mask, mode)));
>>
>>               split = find_split_point (&SET_SRC (x), insn, true);
>> @@ -5133,14 +5136,15 @@ find_split_point (rtx *loc, rtx_insn *in
>>             }
>>           else
>>             {
>> +             int left_bits = GET_MODE_PRECISION (mode) - len - pos;
>> +             int right_bits = GET_MODE_PRECISION (mode) - len;
>>               SUBST (SET_SRC (x),
>>                      gen_rtx_fmt_ee
>>                      (unsignedp ? LSHIFTRT : ASHIFTRT, mode,
>>                       gen_rtx_ASHIFT (mode,
>>                                       gen_lowpart (mode, inner),
>> -                                     GEN_INT (GET_MODE_PRECISION (mode)
>> -                                              - len - pos)),
>> -                     GEN_INT (GET_MODE_PRECISION (mode) - len)));
>> +                                     gen_int_shift_amount (mode, left_bits)),
>> +                     gen_int_shift_amount (mode, right_bits)));
>>
>>               split = find_split_point (&SET_SRC (x), insn, true);
>>               if (split && split != &SET_SRC (x))
>> @@ -8915,10 +8919,11 @@ force_int_to_mode (rtx x, scalar_int_mod
>>           /* Must be more sign bit copies than the mask needs.  */
>>           && ((int) num_sign_bit_copies (XEXP (x, 0), GET_MODE (XEXP (x, 0)))
>>               >= exact_log2 (mask + 1)))
>> -       x = simplify_gen_binary (LSHIFTRT, xmode, XEXP (x, 0),
>> -                                GEN_INT (GET_MODE_PRECISION (xmode)
>> -                                         - exact_log2 (mask + 1)));
>> -
>> +       {
>> +         int nbits = GET_MODE_PRECISION (xmode) - exact_log2 (mask + 1);
>> +         x = simplify_gen_binary (LSHIFTRT, xmode, XEXP (x, 0),
>> +                                  gen_int_shift_amount (xmode, nbits));
>> +       }
>>        goto shiftrt;
>>
>>      case ASHIFTRT:
>> @@ -10415,7 +10420,7 @@ simplify_shift_const_1 (enum rtx_code co
>>  {
>>    enum rtx_code orig_code = code;
>>    rtx orig_varop = varop;
>> -  int count;
>> +  int count, log2;
>>    machine_mode mode = result_mode;
>>    machine_mode shift_mode;
>>    scalar_int_mode tmode, inner_mode, int_mode, int_varop_mode, int_result_mode;
>> @@ -10618,13 +10623,11 @@ simplify_shift_const_1 (enum rtx_code co
>>              is cheaper.  But it is still better on those machines to
>>              merge two shifts into one.  */
>>           if (CONST_INT_P (XEXP (varop, 1))
>> -             && exact_log2 (UINTVAL (XEXP (varop, 1))) >= 0)
>> +             && (log2 = exact_log2 (UINTVAL (XEXP (varop, 1)))) >= 0)
>>             {
>> -             varop
>> -               = simplify_gen_binary (ASHIFT, GET_MODE (varop),
>> -                                      XEXP (varop, 0),
>> -                                      GEN_INT (exact_log2 (
>> -                                               UINTVAL (XEXP (varop, 1)))));
>> +             rtx log2_rtx = gen_int_shift_amount (GET_MODE (varop), log2);
>> +             varop = simplify_gen_binary (ASHIFT, GET_MODE (varop),
>> +                                          XEXP (varop, 0), log2_rtx);
>>               continue;
>>             }
>>           break;
>> @@ -10632,13 +10635,11 @@ simplify_shift_const_1 (enum rtx_code co
>>         case UDIV:
>>           /* Similar, for when divides are cheaper.  */
>>           if (CONST_INT_P (XEXP (varop, 1))
>> -             && exact_log2 (UINTVAL (XEXP (varop, 1))) >= 0)
>> +             && (log2 = exact_log2 (UINTVAL (XEXP (varop, 1)))) >= 0)
>>             {
>> -             varop
>> -               = simplify_gen_binary (LSHIFTRT, GET_MODE (varop),
>> -                                      XEXP (varop, 0),
>> -                                      GEN_INT (exact_log2 (
>> -                                               UINTVAL (XEXP (varop, 1)))));
>> +             rtx log2_rtx = gen_int_shift_amount (GET_MODE (varop), log2);
>> +             varop = simplify_gen_binary (LSHIFTRT, GET_MODE (varop),
>> +                                          XEXP (varop, 0), log2_rtx);
>>               continue;
>>             }
>>           break;
>> @@ -10773,10 +10774,10 @@ simplify_shift_const_1 (enum rtx_code co
>>
>>               mask_rtx = gen_int_mode (nonzero_bits (varop, int_varop_mode),
>>                                        int_result_mode);
>> -
>> +             rtx count_rtx = gen_int_shift_amount (int_result_mode, count);
>>               mask_rtx
>>                 = simplify_const_binary_operation (code, int_result_mode,
>> -                                                  mask_rtx, GEN_INT (count));
>> +                                                  mask_rtx, count_rtx);
>>
>>               /* Give up if we can't compute an outer operation to use.  */
>>               if (mask_rtx == 0
>> @@ -10832,9 +10833,10 @@ simplify_shift_const_1 (enum rtx_code co
>>               if (code == ASHIFTRT && int_mode != int_result_mode)
>>                 break;
>>
>> +             rtx count_rtx = gen_int_shift_amount (int_result_mode, count);
>>               rtx new_rtx = simplify_const_binary_operation (code, int_mode,
>>                                                              XEXP (varop, 0),
>> -                                                            GEN_INT (count));
>> +                                                            count_rtx);
>>               varop = gen_rtx_fmt_ee (code, int_mode, new_rtx, XEXP (varop, 1));
>>               count = 0;
>>               continue;
>> @@ -10900,7 +10902,7 @@ simplify_shift_const_1 (enum rtx_code co
>>               && (new_rtx = simplify_const_binary_operation
>>                   (code, int_result_mode,
>>                    gen_int_mode (INTVAL (XEXP (varop, 1)), int_result_mode),
>> -                  GEN_INT (count))) != 0
>> +                  gen_int_shift_amount (int_result_mode, count))) != 0
>>               && CONST_INT_P (new_rtx)
>>               && merge_outer_ops (&outer_op, &outer_const, GET_CODE (varop),
>>                                   INTVAL (new_rtx), int_result_mode,
>> @@ -11043,7 +11045,7 @@ simplify_shift_const_1 (enum rtx_code co
>>               && (new_rtx = simplify_const_binary_operation
>>                   (ASHIFT, int_result_mode,
>>                    gen_int_mode (INTVAL (XEXP (varop, 1)), int_result_mode),
>> -                  GEN_INT (count))) != 0
>> +                  gen_int_shift_amount (int_result_mode, count))) != 0
>>               && CONST_INT_P (new_rtx)
>>               && merge_outer_ops (&outer_op, &outer_const, PLUS,
>>                                   INTVAL (new_rtx), int_result_mode,
>> @@ -11064,7 +11066,7 @@ simplify_shift_const_1 (enum rtx_code co
>>               && (new_rtx = simplify_const_binary_operation
>>                   (code, int_result_mode,
>>                    gen_int_mode (INTVAL (XEXP (varop, 1)), int_result_mode),
>> -                  GEN_INT (count))) != 0
>> +                  gen_int_shift_amount (int_result_mode, count))) != 0
>>               && CONST_INT_P (new_rtx)
>>               && merge_outer_ops (&outer_op, &outer_const, XOR,
>>                                   INTVAL (new_rtx), int_result_mode,
>> @@ -11119,12 +11121,12 @@ simplify_shift_const_1 (enum rtx_code co
>>                       - GET_MODE_UNIT_PRECISION (GET_MODE (varop)))))
>>             {
>>               rtx varop_inner = XEXP (varop, 0);
>> -
>> -             varop_inner
>> -               = gen_rtx_LSHIFTRT (GET_MODE (varop_inner),
>> -                                   XEXP (varop_inner, 0),
>> -                                   GEN_INT
>> -                                   (count + INTVAL (XEXP (varop_inner, 1))));
>> +             int new_count = count + INTVAL (XEXP (varop_inner, 1));
>> +             rtx new_count_rtx = gen_int_shift_amount (GET_MODE (varop_inner),
>> +                                                       new_count);
>> +             varop_inner = gen_rtx_LSHIFTRT (GET_MODE (varop_inner),
>> +                                             XEXP (varop_inner, 0),
>> +                                             new_count_rtx);
>>               varop = gen_rtx_TRUNCATE (GET_MODE (varop), varop_inner);
>>               count = 0;
>>               continue;
>> @@ -11176,7 +11178,8 @@ simplify_shift_const_1 (enum rtx_code co
>>      x = NULL_RTX;
>>
>>    if (x == NULL_RTX)
>> -    x = simplify_gen_binary (code, shift_mode, varop, GEN_INT (count));
>> +    x = simplify_gen_binary (code, shift_mode, varop,
>> +                            gen_int_shift_amount (shift_mode, count));
>>
>>    /* If we were doing an LSHIFTRT in a wider mode than it was originally,
>>       turn off all the bits that the shift would have turned off.  */
>> @@ -11238,7 +11241,8 @@ simplify_shift_const (rtx x, enum rtx_co
>>      return tem;
>>
>>    if (!x)
>> -    x = simplify_gen_binary (code, GET_MODE (varop), varop, GEN_INT (count));
>> +    x = simplify_gen_binary (code, GET_MODE (varop), varop,
>> +                            gen_int_shift_amount (GET_MODE (varop), count));
>>    if (GET_MODE (x) != result_mode)
>>      x = gen_lowpart (result_mode, x);
>>    return x;
>> @@ -11429,8 +11433,9 @@ change_zero_ext (rtx pat)
>>           if (BITS_BIG_ENDIAN)
>>             start = GET_MODE_PRECISION (inner_mode) - size - start;
>>
>> -         if (start)
>> -           x = gen_rtx_LSHIFTRT (inner_mode, XEXP (x, 0), GEN_INT (start));
>> +         if (start != 0)
>> +           x = gen_rtx_LSHIFTRT (inner_mode, XEXP (x, 0),
>> +                                 gen_int_shift_amount (inner_mode, start));
>>           else
>>             x = XEXP (x, 0);
>>           if (mode != inner_mode)
>> Index: gcc/optabs.c
>> ===================================================================
>> --- gcc/optabs.c        2017-10-23 11:47:06.643477568 +0100
>> +++ gcc/optabs.c        2017-10-23 11:47:11.276323187 +0100
>> @@ -431,8 +431,9 @@ expand_superword_shift (optab binoptab,
>>        if (binoptab != ashr_optab)
>>         emit_move_insn (outof_target, CONST0_RTX (word_mode));
>>        else
>> -       if (!force_expand_binop (word_mode, binoptab,
>> -                                outof_input, GEN_INT (BITS_PER_WORD - 1),
>> +       if (!force_expand_binop (word_mode, binoptab, outof_input,
>> +                                gen_int_shift_amount (word_mode,
>> +                                                      BITS_PER_WORD - 1),
>>                                  outof_target, unsignedp, methods))
>>           return false;
>>      }
>> @@ -789,7 +790,8 @@ expand_doubleword_mult (machine_mode mod
>>  {
>>    int low = (WORDS_BIG_ENDIAN ? 1 : 0);
>>    int high = (WORDS_BIG_ENDIAN ? 0 : 1);
>> -  rtx wordm1 = umulp ? NULL_RTX : GEN_INT (BITS_PER_WORD - 1);
>> +  rtx wordm1 = (umulp ? NULL_RTX
>> +               : gen_int_shift_amount (word_mode, BITS_PER_WORD - 1));
>>    rtx product, adjust, product_high, temp;
>>
>>    rtx op0_high = operand_subword_force (op0, high, mode);
>> @@ -1185,7 +1187,7 @@ expand_binop (machine_mode mode, optab b
>>        unsigned int bits = GET_MODE_PRECISION (int_mode);
>>
>>        if (CONST_INT_P (op1))
>> -        newop1 = GEN_INT (bits - INTVAL (op1));
>> +       newop1 = gen_int_shift_amount (int_mode, bits - INTVAL (op1));
>>        else if (targetm.shift_truncation_mask (int_mode) == bits - 1)
>>          newop1 = negate_rtx (GET_MODE (op1), op1);
>>        else
>> @@ -1399,11 +1401,11 @@ expand_binop (machine_mode mode, optab b
>>        shift_mask = targetm.shift_truncation_mask (word_mode);
>>        op1_mode = (GET_MODE (op1) != VOIDmode
>>                   ? as_a <scalar_int_mode> (GET_MODE (op1))
>> -                 : word_mode);
>> +                 : get_shift_amount_mode (word_mode));
>>
>>        /* Apply the truncation to constant shifts.  */
>>        if (double_shift_mask > 0 && CONST_INT_P (op1))
>> -       op1 = GEN_INT (INTVAL (op1) & double_shift_mask);
>> +       op1 = gen_int_mode (INTVAL (op1) & double_shift_mask, op1_mode);
>>
>>        if (op1 == CONST0_RTX (op1_mode))
>>         return op0;
>> @@ -1513,7 +1515,7 @@ expand_binop (machine_mode mode, optab b
>>        else
>>         {
>>           rtx into_temp1, into_temp2, outof_temp1, outof_temp2;
>> -         rtx first_shift_count, second_shift_count;
>> +         HOST_WIDE_INT first_shift_count, second_shift_count;
>>           optab reverse_unsigned_shift, unsigned_shift;
>>
>>           reverse_unsigned_shift = (left_shift ^ (shift_count < BITS_PER_WORD)
>> @@ -1524,20 +1526,24 @@ expand_binop (machine_mode mode, optab b
>>
>>           if (shift_count > BITS_PER_WORD)
>>             {
>> -             first_shift_count = GEN_INT (shift_count - BITS_PER_WORD);
>> -             second_shift_count = GEN_INT (2 * BITS_PER_WORD - shift_count);
>> +             first_shift_count = shift_count - BITS_PER_WORD;
>> +             second_shift_count = 2 * BITS_PER_WORD - shift_count;
>>             }
>>           else
>>             {
>> -             first_shift_count = GEN_INT (BITS_PER_WORD - shift_count);
>> -             second_shift_count = GEN_INT (shift_count);
>> +             first_shift_count = BITS_PER_WORD - shift_count;
>> +             second_shift_count = shift_count;
>>             }
>> +         rtx first_shift_count_rtx
>> +           = gen_int_shift_amount (word_mode, first_shift_count);
>> +         rtx second_shift_count_rtx
>> +           = gen_int_shift_amount (word_mode, second_shift_count);
>>
>>           into_temp1 = expand_binop (word_mode, unsigned_shift,
>> -                                    outof_input, first_shift_count,
>> +                                    outof_input, first_shift_count_rtx,
>>                                      NULL_RTX, unsignedp, next_methods);
>>           into_temp2 = expand_binop (word_mode, reverse_unsigned_shift,
>> -                                    into_input, second_shift_count,
>> +                                    into_input, second_shift_count_rtx,
>>                                      NULL_RTX, unsignedp, next_methods);
>>
>>           if (into_temp1 != 0 && into_temp2 != 0)
>> @@ -1550,10 +1556,10 @@ expand_binop (machine_mode mode, optab b
>>             emit_move_insn (into_target, inter);
>>
>>           outof_temp1 = expand_binop (word_mode, unsigned_shift,
>> -                                     into_input, first_shift_count,
>> +                                     into_input, first_shift_count_rtx,
>>                                       NULL_RTX, unsignedp, next_methods);
>>           outof_temp2 = expand_binop (word_mode, reverse_unsigned_shift,
>> -                                     outof_input, second_shift_count,
>> +                                     outof_input, second_shift_count_rtx,
>>                                       NULL_RTX, unsignedp, next_methods);
>>
>>           if (inter != 0 && outof_temp1 != 0 && outof_temp2 != 0)
>> @@ -2793,25 +2799,29 @@ expand_unop (machine_mode mode, optab un
>>
>>           if (optab_handler (rotl_optab, mode) != CODE_FOR_nothing)
>>             {
>> -             temp = expand_binop (mode, rotl_optab, op0, GEN_INT (8), target,
>> -                                  unsignedp, OPTAB_DIRECT);
>> +             temp = expand_binop (mode, rotl_optab, op0,
>> +                                  gen_int_shift_amount (mode, 8),
>> +                                  target, unsignedp, OPTAB_DIRECT);
>>               if (temp)
>>                 return temp;
>>              }
>>
>>           if (optab_handler (rotr_optab, mode) != CODE_FOR_nothing)
>>             {
>> -             temp = expand_binop (mode, rotr_optab, op0, GEN_INT (8), target,
>> -                                  unsignedp, OPTAB_DIRECT);
>> +             temp = expand_binop (mode, rotr_optab, op0,
>> +                                  gen_int_shift_amount (mode, 8),
>> +                                  target, unsignedp, OPTAB_DIRECT);
>>               if (temp)
>>                 return temp;
>>             }
>>
>>           last = get_last_insn ();
>>
>> -         temp1 = expand_binop (mode, ashl_optab, op0, GEN_INT (8), NULL_RTX,
>> +         temp1 = expand_binop (mode, ashl_optab, op0,
>> +                               gen_int_shift_amount (mode, 8), NULL_RTX,
>>                                 unsignedp, OPTAB_WIDEN);
>> -         temp2 = expand_binop (mode, lshr_optab, op0, GEN_INT (8), NULL_RTX,
>> +         temp2 = expand_binop (mode, lshr_optab, op0,
>> +                               gen_int_shift_amount (mode, 8), NULL_RTX,
>>                                 unsignedp, OPTAB_WIDEN);
>>           if (temp1 && temp2)
>>             {
>> @@ -5369,11 +5379,11 @@ vector_compare_rtx (machine_mode cmp_mod
>>  }
>>
>>  /* Checks if vec_perm mask SEL is a constant equivalent to a shift of the first
>> -   vec_perm operand, assuming the second operand is a constant vector of zeroes.
>> -   Return the shift distance in bits if so, or NULL_RTX if the vec_perm is not a
>> -   shift.  */
>> +   vec_perm operand (which has mode OP0_MODE), assuming the second
>> +   operand is a constant vector of zeroes.  Return the shift distance in
>> +   bits if so, or NULL_RTX if the vec_perm is not a shift.  */
>>  static rtx
>> -shift_amt_for_vec_perm_mask (rtx sel)
>> +shift_amt_for_vec_perm_mask (machine_mode op0_mode, rtx sel)
>>  {
>>    unsigned int i, first, nelt = GET_MODE_NUNITS (GET_MODE (sel));
>>    unsigned int bitsize = GET_MODE_UNIT_BITSIZE (GET_MODE (sel));
>> @@ -5393,7 +5403,7 @@ shift_amt_for_vec_perm_mask (rtx sel)
>>         return NULL_RTX;
>>      }
>>
>> -  return GEN_INT (first * bitsize);
>> +  return gen_int_shift_amount (op0_mode, first * bitsize);
>>  }
>>
>>  /* A subroutine of expand_vec_perm for expanding one vec_perm insn.  */
>> @@ -5473,7 +5483,7 @@ expand_vec_perm (machine_mode mode, rtx
>>           && (shift_code != CODE_FOR_nothing
>>               || shift_code_qi != CODE_FOR_nothing))
>>         {
>> -         shift_amt = shift_amt_for_vec_perm_mask (sel);
>> +         shift_amt = shift_amt_for_vec_perm_mask (mode, sel);
>>           if (shift_amt)
>>             {
>>               struct expand_operand ops[3];
>> @@ -5563,7 +5573,8 @@ expand_vec_perm (machine_mode mode, rtx
>>                                    NULL, 0, OPTAB_DIRECT);
>>        else
>>         sel = expand_simple_binop (selmode, ASHIFT, sel,
>> -                                  GEN_INT (exact_log2 (u)),
>> +                                  gen_int_shift_amount (selmode,
>> +                                                        exact_log2 (u)),
>>                                    NULL, 0, OPTAB_DIRECT);
>>        gcc_assert (sel != NULL);
>>

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [15/nn] Use more specific hash functions in rtlhash.c
  2017-10-23 11:27 ` [15/nn] Use more specific hash functions in rtlhash.c Richard Sandiford
@ 2017-10-26 12:08   ` Richard Biener
  0 siblings, 0 replies; 90+ messages in thread
From: Richard Biener @ 2017-10-26 12:08 UTC (permalink / raw)
  To: GCC Patches, Richard Sandiford

On Mon, Oct 23, 2017 at 1:26 PM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> Avoid using add_object when we have more specific routines available.

Ok.
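
For reference, the generic add_object boils down to hashing the raw
bytes of the object, roughly:

  template <typename T>
  void add_object (T &obj) { add (&obj, sizeof (T)); }

so using add_hwi for 'w' and add_int for 'i' makes the intent
explicit at the call site instead of going through the byte-wise
fallback.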

>
> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
>             Alan Hayward  <alan.hayward@arm.com>
>             David Sherwood  <david.sherwood@arm.com>
>
> gcc/
>         * rtlhash.c (add_rtx): Use add_hwi for 'w' and add_int for 'i'.
>
> Index: gcc/rtlhash.c
> ===================================================================
> --- gcc/rtlhash.c       2017-02-23 19:54:03.000000000 +0000
> +++ gcc/rtlhash.c       2017-10-23 11:47:20.120201389 +0100
> @@ -77,11 +77,11 @@ add_rtx (const_rtx x, hash &hstate)
>      switch (fmt[i])
>        {
>        case 'w':
> -       hstate.add_object (XWINT (x, i));
> +       hstate.add_hwi (XWINT (x, i));
>         break;
>        case 'n':
>        case 'i':
> -       hstate.add_object (XINT (x, i));
> +       hstate.add_int (XINT (x, i));
>         break;
>        case 'V':
>        case 'E':

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [16/nn] Factor out the mode handling in lower-subreg.c
  2017-10-23 11:27 ` [16/nn] Factor out the mode handling in lower-subreg.c Richard Sandiford
@ 2017-10-26 12:09   ` Richard Biener
  0 siblings, 0 replies; 90+ messages in thread
From: Richard Biener @ 2017-10-26 12:09 UTC (permalink / raw)
  To: GCC Patches, Richard Sandiford

On Mon, Oct 23, 2017 at 1:27 PM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> This patch adds a helper routine (interesting_mode_p) to lower-subreg.c
> to decide whether a mode can be split and, if so, to calculate the
> number of bytes and words in the mode.  At present this function
> always returns true; a later patch will add cases in which it can
> return false.

Ok.
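
Presumably the "later patch" makes this return false for modes whose
size isn't a compile-time constant.  A rough, untested guess at what
that ends up looking like once the variable-sized-mode infrastructure
is in (hypothetical code, not from the submitted series):

  /* Hypothetical follow-up: reject modes with a non-constant size.  */
  static inline bool
  interesting_mode_p (machine_mode mode, unsigned int *bytes,
                      unsigned int *words)
  {
    if (!GET_MODE_SIZE (mode).is_constant (bytes))
      return false;
    *words = CEIL (*bytes, UNITS_PER_WORD);
    return true;
  }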

Richard.

>
> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
>             Alan Hayward  <alan.hayward@arm.com>
>             David Sherwood  <david.sherwood@arm.com>
>
> gcc/
>         * lower-subreg.c (interesting_mode_p): New function.
>         (compute_costs, find_decomposable_subregs, decompose_register)
>         (simplify_subreg_concatn, can_decompose_p, resolve_simple_move)
>         (resolve_clobber, dump_choices): Use it.
>
> Index: gcc/lower-subreg.c
> ===================================================================
> --- gcc/lower-subreg.c  2017-10-23 11:47:11.274393237 +0100
> +++ gcc/lower-subreg.c  2017-10-23 11:47:23.555013148 +0100
> @@ -103,6 +103,18 @@ #define twice_word_mode \
>  #define choices \
>    this_target_lower_subreg->x_choices
>
> +/* Return true if MODE is a mode we know how to lower.  When returning true,
> +   store its byte size in *BYTES and its word size in *WORDS.  */
> +
> +static inline bool
> +interesting_mode_p (machine_mode mode, unsigned int *bytes,
> +                   unsigned int *words)
> +{
> +  *bytes = GET_MODE_SIZE (mode);
> +  *words = CEIL (*bytes, UNITS_PER_WORD);
> +  return true;
> +}
> +
>  /* RTXes used while computing costs.  */
>  struct cost_rtxes {
>    /* Source and target registers.  */
> @@ -199,10 +211,10 @@ compute_costs (bool speed_p, struct cost
>    for (i = 0; i < MAX_MACHINE_MODE; i++)
>      {
>        machine_mode mode = (machine_mode) i;
> -      int factor = GET_MODE_SIZE (mode) / UNITS_PER_WORD;
> -      if (factor > 1)
> +      unsigned int size, factor;
> +      if (interesting_mode_p (mode, &size, &factor) && factor > 1)
>         {
> -         int mode_move_cost;
> +         unsigned int mode_move_cost;
>
>           PUT_MODE (rtxes->target, mode);
>           PUT_MODE (rtxes->source, mode);
> @@ -469,10 +481,10 @@ find_decomposable_subregs (rtx *loc, enu
>               continue;
>             }
>
> -         outer_size = GET_MODE_SIZE (GET_MODE (x));
> -         inner_size = GET_MODE_SIZE (GET_MODE (inner));
> -         outer_words = (outer_size + UNITS_PER_WORD - 1) / UNITS_PER_WORD;
> -         inner_words = (inner_size + UNITS_PER_WORD - 1) / UNITS_PER_WORD;
> +         if (!interesting_mode_p (GET_MODE (x), &outer_size, &outer_words)
> +             || !interesting_mode_p (GET_MODE (inner), &inner_size,
> +                                     &inner_words))
> +           continue;
>
>           /* We only try to decompose single word subregs of multi-word
>              registers.  When we find one, we return -1 to avoid iterating
> @@ -507,7 +519,7 @@ find_decomposable_subregs (rtx *loc, enu
>         }
>        else if (REG_P (x))
>         {
> -         unsigned int regno;
> +         unsigned int regno, size, words;
>
>           /* We will see an outer SUBREG before we see the inner REG, so
>              when we see a plain REG here it means a direct reference to
> @@ -527,7 +539,8 @@ find_decomposable_subregs (rtx *loc, enu
>
>           regno = REGNO (x);
>           if (!HARD_REGISTER_NUM_P (regno)
> -             && GET_MODE_SIZE (GET_MODE (x)) > UNITS_PER_WORD)
> +             && interesting_mode_p (GET_MODE (x), &size, &words)
> +             && words > 1)
>             {
>               switch (*pcmi)
>                 {
> @@ -567,15 +580,15 @@ find_decomposable_subregs (rtx *loc, enu
>  decompose_register (unsigned int regno)
>  {
>    rtx reg;
> -  unsigned int words, i;
> +  unsigned int size, words, i;
>    rtvec v;
>
>    reg = regno_reg_rtx[regno];
>
>    regno_reg_rtx[regno] = NULL_RTX;
>
> -  words = GET_MODE_SIZE (GET_MODE (reg));
> -  words = (words + UNITS_PER_WORD - 1) / UNITS_PER_WORD;
> +  if (!interesting_mode_p (GET_MODE (reg), &size, &words))
> +    gcc_unreachable ();
>
>    v = rtvec_alloc (words);
>    for (i = 0; i < words; ++i)
> @@ -599,25 +612,29 @@ decompose_register (unsigned int regno)
>  simplify_subreg_concatn (machine_mode outermode, rtx op,
>                          unsigned int byte)
>  {
> -  unsigned int inner_size;
> +  unsigned int outer_size, outer_words, inner_size, inner_words;
>    machine_mode innermode, partmode;
>    rtx part;
>    unsigned int final_offset;
>
> +  innermode = GET_MODE (op);
> +  if (!interesting_mode_p (outermode, &outer_size, &outer_words)
> +      || !interesting_mode_p (innermode, &inner_size, &inner_words))
> +    gcc_unreachable ();
> +
>    gcc_assert (GET_CODE (op) == CONCATN);
> -  gcc_assert (byte % GET_MODE_SIZE (outermode) == 0);
> +  gcc_assert (byte % outer_size == 0);
>
> -  innermode = GET_MODE (op);
> -  gcc_assert (byte < GET_MODE_SIZE (innermode));
> -  if (GET_MODE_SIZE (outermode) > GET_MODE_SIZE (innermode))
> +  gcc_assert (byte < inner_size);
> +  if (outer_size > inner_size)
>      return NULL_RTX;
>
> -  inner_size = GET_MODE_SIZE (innermode) / XVECLEN (op, 0);
> +  inner_size /= XVECLEN (op, 0);
>    part = XVECEXP (op, 0, byte / inner_size);
>    partmode = GET_MODE (part);
>
>    final_offset = byte % inner_size;
> -  if (final_offset + GET_MODE_SIZE (outermode) > inner_size)
> +  if (final_offset + outer_size > inner_size)
>      return NULL_RTX;
>
>    /* VECTOR_CSTs in debug expressions are expanded into CONCATN instead of
> @@ -801,9 +818,10 @@ can_decompose_p (rtx x)
>
>        if (HARD_REGISTER_NUM_P (regno))
>         {
> -         unsigned int byte, num_bytes;
> +         unsigned int byte, num_bytes, num_words;
>
> -         num_bytes = GET_MODE_SIZE (GET_MODE (x));
> +         if (!interesting_mode_p (GET_MODE (x), &num_bytes, &num_words))
> +           return false;
>           for (byte = 0; byte < num_bytes; byte += UNITS_PER_WORD)
>             if (simplify_subreg_regno (regno, GET_MODE (x), byte, word_mode) < 0)
>               return false;
> @@ -826,14 +844,15 @@ resolve_simple_move (rtx set, rtx_insn *
>    rtx src, dest, real_dest;
>    rtx_insn *insns;
>    machine_mode orig_mode, dest_mode;
> -  unsigned int words;
> +  unsigned int orig_size, words;
>    bool pushing;
>
>    src = SET_SRC (set);
>    dest = SET_DEST (set);
>    orig_mode = GET_MODE (dest);
>
> -  words = (GET_MODE_SIZE (orig_mode) + UNITS_PER_WORD - 1) / UNITS_PER_WORD;
> +  if (!interesting_mode_p (orig_mode, &orig_size, &words))
> +    gcc_unreachable ();
>    gcc_assert (words > 1);
>
>    start_sequence ();
> @@ -964,7 +983,7 @@ resolve_simple_move (rtx set, rtx_insn *
>      {
>        unsigned int i, j, jinc;
>
> -      gcc_assert (GET_MODE_SIZE (orig_mode) % UNITS_PER_WORD == 0);
> +      gcc_assert (orig_size % UNITS_PER_WORD == 0);
>        gcc_assert (GET_CODE (XEXP (dest, 0)) != PRE_MODIFY);
>        gcc_assert (GET_CODE (XEXP (dest, 0)) != POST_MODIFY);
>
> @@ -1059,7 +1078,7 @@ resolve_clobber (rtx pat, rtx_insn *insn
>  {
>    rtx reg;
>    machine_mode orig_mode;
> -  unsigned int words, i;
> +  unsigned int orig_size, words, i;
>    int ret;
>
>    reg = XEXP (pat, 0);
> @@ -1067,8 +1086,8 @@ resolve_clobber (rtx pat, rtx_insn *insn
>      return false;
>
>    orig_mode = GET_MODE (reg);
> -  words = GET_MODE_SIZE (orig_mode);
> -  words = (words + UNITS_PER_WORD - 1) / UNITS_PER_WORD;
> +  if (!interesting_mode_p (orig_mode, &orig_size, &words))
> +    gcc_unreachable ();
>
>    ret = validate_change (NULL_RTX, &XEXP (pat, 0),
>                          simplify_gen_subreg_concatn (word_mode, reg,
> @@ -1332,12 +1351,13 @@ dump_shift_choices (enum rtx_code code,
>  static void
>  dump_choices (bool speed_p, const char *description)
>  {
> -  unsigned int i;
> +  unsigned int size, factor, i;
>
>    fprintf (dump_file, "Choices when optimizing for %s:\n", description);
>
>    for (i = 0; i < MAX_MACHINE_MODE; i++)
> -    if (GET_MODE_SIZE ((machine_mode) i) > UNITS_PER_WORD)
> +    if (interesting_mode_p ((machine_mode) i, &size, &factor)
> +       && factor > 1)
>        fprintf (dump_file, "  %s mode %s for copy lowering.\n",
>                choices[speed_p].move_modes_to_split[i]
>                ? "Splitting"

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [17/nn] Turn var-tracking.c:INT_MEM_OFFSET into a function
  2017-10-23 11:28 ` [17/nn] Turn var-tracking.c:INT_MEM_OFFSET into a function Richard Sandiford
@ 2017-10-26 12:10   ` Richard Biener
  0 siblings, 0 replies; 90+ messages in thread
From: Richard Biener @ 2017-10-26 12:10 UTC (permalink / raw)
  To: GCC Patches, Richard Sandiford

On Mon, Oct 23, 2017 at 1:27 PM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> This avoids the double evaluation mentioned in the comments and
> simplifies the change to make MEM_OFFSET variable.
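>
> As a reminder of why the macro form was fragile, a hypothetical call
> like the one below (get_next_mem is made up for the example) would
> have evaluated its argument twice, once for MEM_OFFSET_KNOWN_P and
> once for MEM_OFFSET:
>
>   HOST_WIDE_INT off = INT_MEM_OFFSET (get_next_mem ());
>
> The inline replacement evaluates its argument exactly once.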

Ok.

Richard.

>
> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
>             Alan Hayward  <alan.hayward@arm.com>
>             David Sherwood  <david.sherwood@arm.com>
>
> gcc/
>         * var-tracking.c (INT_MEM_OFFSET): Replace with...
>         (int_mem_offset): ...this new function.
>         (var_mem_set, var_mem_delete_and_set, var_mem_delete)
>         (find_mem_expr_in_1pdv, dataflow_set_preserve_mem_locs)
>         (same_variable_part_p, use_type, add_stores, vt_get_decl_and_offset):
>         Update accordingly.
>
> Index: gcc/var-tracking.c
> ===================================================================
> --- gcc/var-tracking.c  2017-09-12 14:28:56.401824826 +0100
> +++ gcc/var-tracking.c  2017-10-23 11:47:27.197231712 +0100
> @@ -390,8 +390,15 @@ struct variable
>  /* Pointer to the BB's information specific to variable tracking pass.  */
>  #define VTI(BB) ((variable_tracking_info *) (BB)->aux)
>
> -/* Macro to access MEM_OFFSET as an HOST_WIDE_INT.  Evaluates MEM twice.  */
> -#define INT_MEM_OFFSET(mem) (MEM_OFFSET_KNOWN_P (mem) ? MEM_OFFSET (mem) : 0)
> +/* Return MEM_OFFSET (MEM) as a HOST_WIDE_INT, or 0 if we can't.  */
> +
> +static inline HOST_WIDE_INT
> +int_mem_offset (const_rtx mem)
> +{
> +  if (MEM_OFFSET_KNOWN_P (mem))
> +    return MEM_OFFSET (mem);
> +  return 0;
> +}
>
>  #if CHECKING_P && (GCC_VERSION >= 2007)
>
> @@ -2336,7 +2343,7 @@ var_mem_set (dataflow_set *set, rtx loc,
>              rtx set_src)
>  {
>    tree decl = MEM_EXPR (loc);
> -  HOST_WIDE_INT offset = INT_MEM_OFFSET (loc);
> +  HOST_WIDE_INT offset = int_mem_offset (loc);
>
>    var_mem_decl_set (set, loc, initialized,
>                     dv_from_decl (decl), offset, set_src, INSERT);
> @@ -2354,7 +2361,7 @@ var_mem_delete_and_set (dataflow_set *se
>                         enum var_init_status initialized, rtx set_src)
>  {
>    tree decl = MEM_EXPR (loc);
> -  HOST_WIDE_INT offset = INT_MEM_OFFSET (loc);
> +  HOST_WIDE_INT offset = int_mem_offset (loc);
>
>    clobber_overlapping_mems (set, loc);
>    decl = var_debug_decl (decl);
> @@ -2375,7 +2382,7 @@ var_mem_delete_and_set (dataflow_set *se
>  var_mem_delete (dataflow_set *set, rtx loc, bool clobber)
>  {
>    tree decl = MEM_EXPR (loc);
> -  HOST_WIDE_INT offset = INT_MEM_OFFSET (loc);
> +  HOST_WIDE_INT offset = int_mem_offset (loc);
>
>    clobber_overlapping_mems (set, loc);
>    decl = var_debug_decl (decl);
> @@ -4618,7 +4625,7 @@ find_mem_expr_in_1pdv (tree expr, rtx va
>    for (node = var->var_part[0].loc_chain; node; node = node->next)
>      if (MEM_P (node->loc)
>         && MEM_EXPR (node->loc) == expr
> -       && INT_MEM_OFFSET (node->loc) == 0)
> +       && int_mem_offset (node->loc) == 0)
>        {
>         where = node;
>         break;
> @@ -4683,7 +4690,7 @@ dataflow_set_preserve_mem_locs (variable
>               /* We want to remove dying MEMs that don't refer to DECL.  */
>               if (GET_CODE (loc->loc) == MEM
>                   && (MEM_EXPR (loc->loc) != decl
> -                     || INT_MEM_OFFSET (loc->loc) != 0)
> +                     || int_mem_offset (loc->loc) != 0)
>                   && mem_dies_at_call (loc->loc))
>                 break;
>               /* We want to move here MEMs that do refer to DECL.  */
> @@ -4727,7 +4734,7 @@ dataflow_set_preserve_mem_locs (variable
>
>           if (GET_CODE (loc->loc) != MEM
>               || (MEM_EXPR (loc->loc) == decl
> -                 && INT_MEM_OFFSET (loc->loc) == 0)
> +                 && int_mem_offset (loc->loc) == 0)
>               || !mem_dies_at_call (loc->loc))
>             {
>               if (old_loc != loc->loc && emit_notes)
> @@ -5254,7 +5261,7 @@ same_variable_part_p (rtx loc, tree expr
>    else if (MEM_P (loc))
>      {
>        expr2 = MEM_EXPR (loc);
> -      offset2 = INT_MEM_OFFSET (loc);
> +      offset2 = int_mem_offset (loc);
>      }
>    else
>      return false;
> @@ -5522,7 +5529,7 @@ use_type (rtx loc, struct count_use_info
>         return MO_CLOBBER;
>        else if (target_for_debug_bind (var_debug_decl (expr)))
>         return MO_CLOBBER;
> -      else if (track_loc_p (loc, expr, INT_MEM_OFFSET (loc),
> +      else if (track_loc_p (loc, expr, int_mem_offset (loc),
>                             false, modep, NULL)
>                /* Multi-part variables shouldn't refer to one-part
>                   variable names such as VALUEs (never happens) or
> @@ -6017,7 +6024,7 @@ add_stores (rtx loc, const_rtx expr, voi
>               rtx xexpr = gen_rtx_SET (loc, src);
>               if (same_variable_part_p (SET_SRC (xexpr),
>                                         MEM_EXPR (loc),
> -                                       INT_MEM_OFFSET (loc)))
> +                                       int_mem_offset (loc)))
>                 mo.type = MO_COPY;
>               else
>                 mo.type = MO_SET;
> @@ -9579,7 +9586,7 @@ vt_get_decl_and_offset (rtx rtl, tree *d
>        if (MEM_ATTRS (rtl))
>         {
>           *declp = MEM_EXPR (rtl);
> -         *offsetp = INT_MEM_OFFSET (rtl);
> +         *offsetp = int_mem_offset (rtl);
>           return true;
>         }
>      }

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [18/nn] Use (CONST_VECTOR|GET_MODE)_NUNITS in simplify-rtx.c
  2017-10-23 11:29 ` [18/nn] Use (CONST_VECTOR|GET_MODE)_NUNITS in simplify-rtx.c Richard Sandiford
@ 2017-10-26 12:13   ` Richard Biener
  0 siblings, 0 replies; 90+ messages in thread
From: Richard Biener @ 2017-10-26 12:13 UTC (permalink / raw)
  To: GCC Patches, Richard Sandiford

On Mon, Oct 23, 2017 at 1:28 PM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> This patch avoids some calculations of the form:
>
>   GET_MODE_SIZE (vector_mode) / GET_MODE_SIZE (element_mode)
>
> in simplify-rtx.c.  If we're dealing with CONST_VECTORs, it's better
> to use CONST_VECTOR_NUNITS, since that remains constant even after the
> SVE patches.  In other cases we can get the number from GET_MODE_NUNITS.
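>
> As a worked example (V4SImode is used purely for illustration):
>
>   GET_MODE_SIZE (V4SImode) / GET_MODE_UNIT_SIZE (V4SImode)
>     == 16 / 4 == 4 == GET_MODE_NUNITS (V4SImode)
>
> so the two forms agree for fixed-length vectors, but only the
> GET_MODE_NUNITS form stays well-defined once the vector size is no
> longer a compile-time constant.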

Ok.

Richard.

>
> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
>             Alan Hayward  <alan.hayward@arm.com>
>             David Sherwood  <david.sherwood@arm.com>
>
> gcc/
>         * simplify-rtx.c (simplify_const_unary_operation): Use GET_MODE_NUNITS
>         and CONST_VECTOR_NUNITS instead of computing the number of units from
>         the byte sizes of the vector and element.
>         (simplify_binary_operation_1): Likewise.
>         (simplify_const_binary_operation): Likewise.
>         (simplify_ternary_operation): Likewise.
>
> Index: gcc/simplify-rtx.c
> ===================================================================
> --- gcc/simplify-rtx.c  2017-10-23 11:47:11.277288162 +0100
> +++ gcc/simplify-rtx.c  2017-10-23 11:47:32.868935554 +0100
> @@ -1752,18 +1752,12 @@ simplify_const_unary_operation (enum rtx
>         return gen_const_vec_duplicate (mode, op);
>        if (GET_CODE (op) == CONST_VECTOR)
>         {
> -         int elt_size = GET_MODE_UNIT_SIZE (mode);
> -          unsigned n_elts = (GET_MODE_SIZE (mode) / elt_size);
> -         rtvec v = rtvec_alloc (n_elts);
> -         unsigned int i;
> -
> -         machine_mode inmode = GET_MODE (op);
> -         int in_elt_size = GET_MODE_UNIT_SIZE (inmode);
> -         unsigned in_n_elts = (GET_MODE_SIZE (inmode) / in_elt_size);
> -
> +         unsigned int n_elts = GET_MODE_NUNITS (mode);
> +         unsigned int in_n_elts = CONST_VECTOR_NUNITS (op);
>           gcc_assert (in_n_elts < n_elts);
>           gcc_assert ((n_elts % in_n_elts) == 0);
> -         for (i = 0; i < n_elts; i++)
> +         rtvec v = rtvec_alloc (n_elts);
> +         for (unsigned i = 0; i < n_elts; i++)
>             RTVEC_ELT (v, i) = CONST_VECTOR_ELT (op, i % in_n_elts);
>           return gen_rtx_CONST_VECTOR (mode, v);
>         }
> @@ -3608,9 +3602,7 @@ simplify_binary_operation_1 (enum rtx_co
>               rtx op0 = XEXP (trueop0, 0);
>               rtx op1 = XEXP (trueop0, 1);
>
> -             machine_mode opmode = GET_MODE (op0);
> -             int elt_size = GET_MODE_UNIT_SIZE (opmode);
> -             int n_elts = GET_MODE_SIZE (opmode) / elt_size;
> +             int n_elts = GET_MODE_NUNITS (GET_MODE (op0));
>
>               int i = INTVAL (XVECEXP (trueop1, 0, 0));
>               int elem;
> @@ -3637,21 +3629,8 @@ simplify_binary_operation_1 (enum rtx_co
>                   mode01 = GET_MODE (op01);
>
>                   /* Find out number of elements of each operand.  */
> -                 if (VECTOR_MODE_P (mode00))
> -                   {
> -                     elt_size = GET_MODE_UNIT_SIZE (mode00);
> -                     n_elts00 = GET_MODE_SIZE (mode00) / elt_size;
> -                   }
> -                 else
> -                   n_elts00 = 1;
> -
> -                 if (VECTOR_MODE_P (mode01))
> -                   {
> -                     elt_size = GET_MODE_UNIT_SIZE (mode01);
> -                     n_elts01 = GET_MODE_SIZE (mode01) / elt_size;
> -                   }
> -                 else
> -                   n_elts01 = 1;
> +                 n_elts00 = GET_MODE_NUNITS (mode00);
> +                 n_elts01 = GET_MODE_NUNITS (mode01);
>
>                   gcc_assert (n_elts == n_elts00 + n_elts01);
>
> @@ -3771,9 +3750,8 @@ simplify_binary_operation_1 (enum rtx_co
>               rtx subop1 = XEXP (trueop0, 1);
>               machine_mode mode0 = GET_MODE (subop0);
>               machine_mode mode1 = GET_MODE (subop1);
> -             int li = GET_MODE_UNIT_SIZE (mode0);
> -             int l0 = GET_MODE_SIZE (mode0) / li;
> -             int l1 = GET_MODE_SIZE (mode1) / li;
> +             int l0 = GET_MODE_NUNITS (mode0);
> +             int l1 = GET_MODE_NUNITS (mode1);
>               int i0 = INTVAL (XVECEXP (trueop1, 0, 0));
>               if (i0 == 0 && !side_effects_p (op1) && mode == mode0)
>                 {
> @@ -3931,14 +3909,10 @@ simplify_binary_operation_1 (enum rtx_co
>                 || CONST_SCALAR_INT_P (trueop1)
>                 || CONST_DOUBLE_AS_FLOAT_P (trueop1)))
>           {
> -           int elt_size = GET_MODE_UNIT_SIZE (mode);
> -           unsigned n_elts = (GET_MODE_SIZE (mode) / elt_size);
> +           unsigned n_elts = GET_MODE_NUNITS (mode);
> +           unsigned in_n_elts = GET_MODE_NUNITS (op0_mode);
>             rtvec v = rtvec_alloc (n_elts);
>             unsigned int i;
> -           unsigned in_n_elts = 1;
> -
> -           if (VECTOR_MODE_P (op0_mode))
> -             in_n_elts = (GET_MODE_SIZE (op0_mode) / elt_size);
>             for (i = 0; i < n_elts; i++)
>               {
>                 if (i < in_n_elts)
> @@ -4026,16 +4000,12 @@ simplify_const_binary_operation (enum rt
>        && GET_CODE (op0) == CONST_VECTOR
>        && GET_CODE (op1) == CONST_VECTOR)
>      {
> -      unsigned n_elts = GET_MODE_NUNITS (mode);
> -      machine_mode op0mode = GET_MODE (op0);
> -      unsigned op0_n_elts = GET_MODE_NUNITS (op0mode);
> -      machine_mode op1mode = GET_MODE (op1);
> -      unsigned op1_n_elts = GET_MODE_NUNITS (op1mode);
> +      unsigned int n_elts = CONST_VECTOR_NUNITS (op0);
> +      gcc_assert (n_elts == (unsigned int) CONST_VECTOR_NUNITS (op1));
> +      gcc_assert (n_elts == GET_MODE_NUNITS (mode));
>        rtvec v = rtvec_alloc (n_elts);
>        unsigned int i;
>
> -      gcc_assert (op0_n_elts == n_elts);
> -      gcc_assert (op1_n_elts == n_elts);
>        for (i = 0; i < n_elts; i++)
>         {
>           rtx x = simplify_binary_operation (code, GET_MODE_INNER (mode),
> @@ -5712,8 +5682,7 @@ simplify_ternary_operation (enum rtx_cod
>        trueop2 = avoid_constant_pool_reference (op2);
>        if (CONST_INT_P (trueop2))
>         {
> -         int elt_size = GET_MODE_UNIT_SIZE (mode);
> -         unsigned n_elts = (GET_MODE_SIZE (mode) / elt_size);
> +         unsigned n_elts = GET_MODE_NUNITS (mode);
>           unsigned HOST_WIDE_INT sel = UINTVAL (trueop2);
>           unsigned HOST_WIDE_INT mask;
>           if (n_elts == HOST_BITS_PER_WIDE_INT)

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [19/nn] Don't treat zero-sized ranges as overlapping
  2017-10-23 11:29 ` [19/nn] Don't treat zero-sized ranges as overlapping Richard Sandiford
@ 2017-10-26 12:14   ` Richard Biener
  0 siblings, 0 replies; 90+ messages in thread
From: Richard Biener @ 2017-10-26 12:14 UTC (permalink / raw)
  To: GCC Patches, Richard Sandiford

On Mon, Oct 23, 2017 at 1:29 PM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> Most GCC ranges seem to be represented as an offset and a size (rather
> than a start and inclusive end or start and exclusive end).  The usual
> test for whether X is in a range is of course:
>
>   x >= start && x < start + size
> or:
>   x >= start && x - start < size
>
> which means that an empty range of size 0 contains nothing.  But other
> range tests aren't as obvious.
>
> The usual test for whether one range is contained within another
> range is:
>
>   start1 >= start2 && start1 + size1 <= start2 + size2
>
> while the test for whether two ranges overlap (from ranges_overlap_p) is:
>
>      (start1 >= start2 && start1 < start2 + size2)
>   || (start2 >= start1 && start2 < start1 + size1)
>
> i.e. the ranges overlap if one range contains the start of the other
> range.  This leads to strange results like:
>
>   (start X, size 0) is a subrange of (start X, size 0) but
>   (start X, size 0) does not overlap (start X, size 0)
>
> Similarly:
>
>   (start 4, size 0) is a subrange of (start 2, size 2) but
>   (start 4, size 0) does not overlap (start 2, size 2)
>
> It seems like "X is a subrange of Y" should imply "X overlaps Y".
>
> This becomes harder to ignore with the runtime sizes and offsets
> added for SVE.  The most obvious fix seemed to be to say that
> an empty range does not overlap anything, and is therefore not
> a subrange of anything.
>
> Using the new definition of subranges didn't seem to cause any
> codegen differences in the testsuite.  But there was one change
> with the new definition of overlapping ranges.  strncpy-chk.c has:
>
>   memset (dst, 0, sizeof (dst));
>   if (strncpy (dst, src, 0) != dst || strcmp (dst, ""))
>     abort();
>
> The strncpy is detected as a zero-size write, and so with the new
> definition of overlapping ranges, we treat the strncpy as having
> no effect on the strcmp (which is true).  The reaching definition
> is the memset instead.
>
> This patch makes ranges_overlap_p return false for zero-sized
> ranges, even if the other range has an unknown size.
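>
> A minimal self-contained sketch of the anomaly, written as plain C
> rather than taken from the GCC sources:
>
>   /* Overlap test as ranges_overlap_p used to define it.  */
>   static int
>   old_overlap_p (int start1, int size1, int start2, int size2)
>   {
>     return (start1 >= start2 && start1 < start2 + size2)
>            || (start2 >= start1 && start2 < start1 + size1);
>   }
>
>   /* old_overlap_p (4, 0, 2, 2) returns 0 even though [4, 4) is a
>      subrange of [2, 4); with the patch, a zero-sized range neither
>      overlaps nor is a subrange of anything.  */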

Ok.

Thanks,
Richard.

>
> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
>
> gcc/
>         * tree-ssa-alias.h (ranges_overlap_p): Return false if either
>         range is known to be empty.
>
> Index: gcc/tree-ssa-alias.h
> ===================================================================
> --- gcc/tree-ssa-alias.h        2017-03-28 16:19:22.000000000 +0100
> +++ gcc/tree-ssa-alias.h        2017-10-23 11:47:38.181155696 +0100
> @@ -171,6 +171,8 @@ ranges_overlap_p (HOST_WIDE_INT pos1,
>                   HOST_WIDE_INT pos2,
>                   unsigned HOST_WIDE_INT size2)
>  {
> +  if (size1 == 0 || size2 == 0)
> +    return false;
>    if (pos1 >= pos2
>        && (size2 == (unsigned HOST_WIDE_INT)-1
>           || pos1 < (pos2 + (HOST_WIDE_INT) size2)))

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [22/nn] Make dse.c use offset/width instead of start/end
  2017-10-23 11:45 ` [22/nn] Make dse.c use offset/width instead of start/end Richard Sandiford
@ 2017-10-26 12:18   ` Richard Biener
  0 siblings, 0 replies; 90+ messages in thread
From: Richard Biener @ 2017-10-26 12:18 UTC (permalink / raw)
  To: GCC Patches, Richard Sandiford

On Mon, Oct 23, 2017 at 1:30 PM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> store_info and read_info_type in dse.c represented the ranges as
> start/end, but a lot of the internal code used offset/width instead.
> Using offset/width throughout fits better with the poly_int.h
> range-checking functions.
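>
> For illustration, with offset/width pairs the containment check
> becomes a single range-helper call (a sketch of the usage pattern,
> matching the calls in the patch below):
>
>   /* Is [offset, offset + width) contained in
>      [s_info->offset, s_info->offset + s_info->width)?  */
>   if (known_subrange_p (offset, width, s_info->offset, s_info->width))
>     ...
>
> instead of explicit start/end comparisons such as
> "offset >= s_info->begin && offset + width <= s_info->end".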

Ok.

Richard.

>
> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
>             Alan Hayward  <alan.hayward@arm.com>
>             David Sherwood  <david.sherwood@arm.com>
>
> gcc/
>         * dse.c (store_info, read_info_type): Replace begin and end with
>         offset and width.
>         (print_range): New function.
>         (set_all_positions_unneeded, any_positions_needed_p)
>         (check_mem_read_rtx, scan_stores, scan_reads, dse_step5): Update
>         accordingly.
>         (record_store): Likewise.  Optimize the case in which all positions
>         are unneeded.
>         (get_stored_val): Replace read_begin and read_end with read_offset
>         and read_width.
>         (replace_read): Update call accordingly.
>
> Index: gcc/dse.c
> ===================================================================
> --- gcc/dse.c   2017-10-23 11:47:11.273428262 +0100
> +++ gcc/dse.c   2017-10-23 11:47:48.294155952 +0100
> @@ -243,9 +243,12 @@ struct store_info
>    /* Canonized MEM address for use by canon_true_dependence.  */
>    rtx mem_addr;
>
> -  /* The offset of the first and byte before the last byte associated
> -     with the operation.  */
> -  HOST_WIDE_INT begin, end;
> +  /* The offset of the first byte associated with the operation.  */
> +  HOST_WIDE_INT offset;
> +
> +  /* The number of bytes covered by the operation.  This is always exact
> +     and known (rather than -1).  */
> +  HOST_WIDE_INT width;
>
>    union
>      {
> @@ -261,7 +264,7 @@ struct store_info
>           bitmap bmap;
>
>           /* Number of set bits (i.e. unneeded bytes) in BITMAP.  If it is
> -            equal to END - BEGIN, the whole store is unused.  */
> +            equal to WIDTH, the whole store is unused.  */
>           int count;
>         } large;
>      } positions_needed;
> @@ -304,10 +307,11 @@ struct read_info_type
>    /* The id of the mem group of the base address.  */
>    int group_id;
>
> -  /* The offset of the first and byte after the last byte associated
> -     with the operation.  If begin == end == 0, the read did not have
> -     a constant offset.  */
> -  int begin, end;
> +  /* The offset of the first byte associated with the operation.  */
> +  HOST_WIDE_INT offset;
> +
> +  /* The number of bytes covered by the operation, or -1 if not known.  */
> +  HOST_WIDE_INT width;
>
>    /* The mem being read.  */
>    rtx mem;
> @@ -586,6 +590,18 @@ static deferred_change *deferred_change_
>
>  /* The number of bits used in the global bitmaps.  */
>  static unsigned int current_position;
> +
> +/* Print offset range [OFFSET, OFFSET + WIDTH) to FILE.  */
> +
> +static void
> +print_range (FILE *file, poly_int64 offset, poly_int64 width)
> +{
> +  fprintf (file, "[");
> +  print_dec (offset, file, SIGNED);
> +  fprintf (file, "..");
> +  print_dec (offset + width, file, SIGNED);
> +  fprintf (file, ")");
> +}
>
>  /*----------------------------------------------------------------------------
>     Zeroth step.
> @@ -1212,10 +1228,9 @@ set_all_positions_unneeded (store_info *
>  {
>    if (__builtin_expect (s_info->is_large, false))
>      {
> -      int pos, end = s_info->end - s_info->begin;
> -      for (pos = 0; pos < end; pos++)
> -       bitmap_set_bit (s_info->positions_needed.large.bmap, pos);
> -      s_info->positions_needed.large.count = end;
> +      bitmap_set_range (s_info->positions_needed.large.bmap,
> +                       0, s_info->width);
> +      s_info->positions_needed.large.count = s_info->width;
>      }
>    else
>      s_info->positions_needed.small_bitmask = HOST_WIDE_INT_0U;
> @@ -1227,8 +1242,7 @@ set_all_positions_unneeded (store_info *
>  any_positions_needed_p (store_info *s_info)
>  {
>    if (__builtin_expect (s_info->is_large, false))
> -    return (s_info->positions_needed.large.count
> -           < s_info->end - s_info->begin);
> +    return s_info->positions_needed.large.count < s_info->width;
>    else
>      return (s_info->positions_needed.small_bitmask != HOST_WIDE_INT_0U);
>  }
> @@ -1355,8 +1369,12 @@ record_store (rtx body, bb_info_t bb_inf
>        set_usage_bits (group, offset, width, expr);
>
>        if (dump_file && (dump_flags & TDF_DETAILS))
> -       fprintf (dump_file, " processing const base store gid=%d[%d..%d)\n",
> -                group_id, (int)offset, (int)(offset+width));
> +       {
> +         fprintf (dump_file, " processing const base store gid=%d",
> +                  group_id);
> +         print_range (dump_file, offset, width);
> +         fprintf (dump_file, "\n");
> +       }
>      }
>    else
>      {
> @@ -1368,8 +1386,11 @@ record_store (rtx body, bb_info_t bb_inf
>        group_id = -1;
>
>        if (dump_file && (dump_flags & TDF_DETAILS))
> -       fprintf (dump_file, " processing cselib store [%d..%d)\n",
> -                (int)offset, (int)(offset+width));
> +       {
> +         fprintf (dump_file, " processing cselib store ");
> +         print_range (dump_file, offset, width);
> +         fprintf (dump_file, "\n");
> +       }
>      }
>
>    const_rhs = rhs = NULL_RTX;
> @@ -1435,18 +1456,21 @@ record_store (rtx body, bb_info_t bb_inf
>         {
>           HOST_WIDE_INT i;
>           if (dump_file && (dump_flags & TDF_DETAILS))
> -           fprintf (dump_file, "    trying store in insn=%d gid=%d[%d..%d)\n",
> -                    INSN_UID (ptr->insn), s_info->group_id,
> -                    (int)s_info->begin, (int)s_info->end);
> +           {
> +             fprintf (dump_file, "    trying store in insn=%d gid=%d",
> +                      INSN_UID (ptr->insn), s_info->group_id);
> +             print_range (dump_file, s_info->offset, s_info->width);
> +             fprintf (dump_file, "\n");
> +           }
>
>           /* Even if PTR won't be eliminated as unneeded, if both
>              PTR and this insn store the same constant value, we might
>              eliminate this insn instead.  */
>           if (s_info->const_rhs
>               && const_rhs
> -             && offset >= s_info->begin
> -             && offset + width <= s_info->end
> -             && all_positions_needed_p (s_info, offset - s_info->begin,
> +             && known_subrange_p (offset, width,
> +                                  s_info->offset, s_info->width)
> +             && all_positions_needed_p (s_info, offset - s_info->offset,
>                                          width))
>             {
>               if (GET_MODE (mem) == BLKmode)
> @@ -1462,8 +1486,7 @@ record_store (rtx body, bb_info_t bb_inf
>                 {
>                   rtx val;
>                   start_sequence ();
> -                 val = get_stored_val (s_info, GET_MODE (mem),
> -                                       offset, offset + width,
> +                 val = get_stored_val (s_info, GET_MODE (mem), offset, width,
>                                         BLOCK_FOR_INSN (insn_info->insn),
>                                         true);
>                   if (get_insns () != NULL)
> @@ -1474,10 +1497,18 @@ record_store (rtx body, bb_info_t bb_inf
>                 }
>             }
>
> -         for (i = MAX (offset, s_info->begin);
> -              i < offset + width && i < s_info->end;
> -              i++)
> -           set_position_unneeded (s_info, i - s_info->begin);
> +         if (known_subrange_p (s_info->offset, s_info->width, offset, width))
> +           /* The new store touches every byte that S_INFO does.  */
> +           set_all_positions_unneeded (s_info);
> +         else
> +           {
> +             HOST_WIDE_INT begin_unneeded = offset - s_info->offset;
> +             HOST_WIDE_INT end_unneeded = begin_unneeded + width;
> +             begin_unneeded = MAX (begin_unneeded, 0);
> +             end_unneeded = MIN (end_unneeded, s_info->width);
> +             for (i = begin_unneeded; i < end_unneeded; ++i)
> +               set_position_unneeded (s_info, i);
> +           }
>         }
>        else if (s_info->rhs)
>         /* Need to see if it is possible for this store to overwrite
> @@ -1535,8 +1566,8 @@ record_store (rtx body, bb_info_t bb_inf
>        store_info->positions_needed.small_bitmask = lowpart_bitmask (width);
>      }
>    store_info->group_id = group_id;
> -  store_info->begin = offset;
> -  store_info->end = offset + width;
> +  store_info->offset = offset;
> +  store_info->width = width;
>    store_info->is_set = GET_CODE (body) == SET;
>    store_info->rhs = rhs;
>    store_info->const_rhs = const_rhs;
> @@ -1700,39 +1731,38 @@ look_for_hardregs (rtx x, const_rtx pat
>  }
>
>  /* Helper function for replace_read and record_store.
> -   Attempt to return a value stored in STORE_INFO, from READ_BEGIN
> -   to one before READ_END bytes read in READ_MODE.  Return NULL
> +   Attempt to return a value of mode READ_MODE stored in STORE_INFO,
> +   consisting of READ_WIDTH bytes starting from READ_OFFSET.  Return NULL
>     if not successful.  If REQUIRE_CST is true, return always constant.  */
>
>  static rtx
>  get_stored_val (store_info *store_info, machine_mode read_mode,
> -               HOST_WIDE_INT read_begin, HOST_WIDE_INT read_end,
> +               HOST_WIDE_INT read_offset, HOST_WIDE_INT read_width,
>                 basic_block bb, bool require_cst)
>  {
>    machine_mode store_mode = GET_MODE (store_info->mem);
> -  int shift;
> -  int access_size; /* In bytes.  */
> +  HOST_WIDE_INT gap;
>    rtx read_reg;
>
>    /* To get here the read is within the boundaries of the write so
>       shift will never be negative.  Start out with the shift being in
>       bytes.  */
>    if (store_mode == BLKmode)
> -    shift = 0;
> +    gap = 0;
>    else if (BYTES_BIG_ENDIAN)
> -    shift = store_info->end - read_end;
> +    gap = ((store_info->offset + store_info->width)
> +          - (read_offset + read_width));
>    else
> -    shift = read_begin - store_info->begin;
> -
> -  access_size = shift + GET_MODE_SIZE (read_mode);
> -
> -  /* From now on it is bits.  */
> -  shift *= BITS_PER_UNIT;
> +    gap = read_offset - store_info->offset;
>
> -  if (shift)
> -    read_reg = find_shift_sequence (access_size, store_info, read_mode, shift,
> -                                   optimize_bb_for_speed_p (bb),
> -                                   require_cst);
> +  if (gap != 0)
> +    {
> +      HOST_WIDE_INT shift = gap * BITS_PER_UNIT;
> +      HOST_WIDE_INT access_size = GET_MODE_SIZE (read_mode) + gap;
> +      read_reg = find_shift_sequence (access_size, store_info, read_mode,
> +                                     shift, optimize_bb_for_speed_p (bb),
> +                                     require_cst);
> +    }
>    else if (store_mode == BLKmode)
>      {
>        /* The store is a memset (addr, const_val, const_size).  */
> @@ -1835,7 +1865,7 @@ replace_read (store_info *store_info, in
>    start_sequence ();
>    bb = BLOCK_FOR_INSN (read_insn->insn);
>    read_reg = get_stored_val (store_info,
> -                            read_mode, read_info->begin, read_info->end,
> +                            read_mode, read_info->offset, read_info->width,
>                              bb, false);
>    if (read_reg == NULL_RTX)
>      {
> @@ -1986,8 +2016,8 @@ check_mem_read_rtx (rtx *loc, bb_info_t
>    read_info = read_info_type_pool.allocate ();
>    read_info->group_id = group_id;
>    read_info->mem = mem;
> -  read_info->begin = offset;
> -  read_info->end = offset + width;
> +  read_info->offset = offset;
> +  read_info->width = width;
>    read_info->next = insn_info->read_rec;
>    insn_info->read_rec = read_info;
>    if (group_id < 0)
> @@ -2013,8 +2043,11 @@ check_mem_read_rtx (rtx *loc, bb_info_t
>             fprintf (dump_file, " processing const load gid=%d[BLK]\n",
>                      group_id);
>           else
> -           fprintf (dump_file, " processing const load gid=%d[%d..%d)\n",
> -                    group_id, (int)offset, (int)(offset+width));
> +           {
> +             fprintf (dump_file, " processing const load gid=%d", group_id);
> +             print_range (dump_file, offset, width);
> +             fprintf (dump_file, "\n");
> +           }
>         }
>
>        while (i_ptr)
> @@ -2052,19 +2085,19 @@ check_mem_read_rtx (rtx *loc, bb_info_t
>               else
>                 {
>                   if (store_info->rhs
> -                     && offset >= store_info->begin
> -                     && offset + width <= store_info->end
> +                     && known_subrange_p (offset, width, store_info->offset,
> +                                          store_info->width)
>                       && all_positions_needed_p (store_info,
> -                                                offset - store_info->begin,
> +                                                offset - store_info->offset,
>                                                  width)
>                       && replace_read (store_info, i_ptr, read_info,
>                                        insn_info, loc, bb_info->regs_live))
>                     return;
>
>                   /* The bases are the same, just see if the offsets
> -                    overlap.  */
> -                 if ((offset < store_info->end)
> -                     && (offset + width > store_info->begin))
> +                    could overlap.  */
> +                 if (ranges_may_overlap_p (offset, width, store_info->offset,
> +                                           store_info->width))
>                     remove = true;
>                 }
>             }
> @@ -2119,11 +2152,10 @@ check_mem_read_rtx (rtx *loc, bb_info_t
>           if (store_info->rhs
>               && store_info->group_id == -1
>               && store_info->cse_base == base
> -             && width != -1
> -             && offset >= store_info->begin
> -             && offset + width <= store_info->end
> +             && known_subrange_p (offset, width, store_info->offset,
> +                                  store_info->width)
>               && all_positions_needed_p (store_info,
> -                                        offset - store_info->begin, width)
> +                                        offset - store_info->offset, width)
>               && replace_read (store_info, i_ptr,  read_info, insn_info, loc,
>                                bb_info->regs_live))
>             return;
> @@ -2775,16 +2807,19 @@ scan_stores (store_info *store_info, bit
>        group_info *group_info
>         = rtx_group_vec[store_info->group_id];
>        if (group_info->process_globally)
> -       for (i = store_info->begin; i < store_info->end; i++)
> -         {
> -           int index = get_bitmap_index (group_info, i);
> -           if (index != 0)
> -             {
> -               bitmap_set_bit (gen, index);
> -               if (kill)
> -                 bitmap_clear_bit (kill, index);
> -             }
> -         }
> +       {
> +         HOST_WIDE_INT end = store_info->offset + store_info->width;
> +         for (i = store_info->offset; i < end; i++)
> +           {
> +             int index = get_bitmap_index (group_info, i);
> +             if (index != 0)
> +               {
> +                 bitmap_set_bit (gen, index);
> +                 if (kill)
> +                   bitmap_clear_bit (kill, index);
> +               }
> +           }
> +       }
>        store_info = store_info->next;
>      }
>  }
> @@ -2834,9 +2869,9 @@ scan_reads (insn_info_t insn_info, bitma
>             {
>               if (i == read_info->group_id)
>                 {
> -                 if (read_info->begin > read_info->end)
> +                 if (!known_size_p (read_info->width))
>                     {
> -                     /* Begin > end for block mode reads.  */
> +                     /* Handle block mode reads.  */
>                       if (kill)
>                         bitmap_ior_into (kill, group->group_kill);
>                       bitmap_and_compl_into (gen, group->group_kill);
> @@ -2846,7 +2881,8 @@ scan_reads (insn_info_t insn_info, bitma
>                       /* The groups are the same, just process the
>                          offsets.  */
>                       HOST_WIDE_INT j;
> -                     for (j = read_info->begin; j < read_info->end; j++)
> +                     HOST_WIDE_INT end = read_info->offset + read_info->width;
> +                     for (j = read_info->offset; j < end; j++)
>                         {
>                           int index = get_bitmap_index (group, j);
>                           if (index != 0)
> @@ -3265,7 +3301,8 @@ dse_step5 (void)
>               HOST_WIDE_INT i;
>               group_info *group_info = rtx_group_vec[store_info->group_id];
>
> -             for (i = store_info->begin; i < store_info->end; i++)
> +             HOST_WIDE_INT end = store_info->offset + store_info->width;
> +             for (i = store_info->offset; i < end; i++)
>                 {
>                   int index = get_bitmap_index (group_info, i);
>

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [09/nn] Add a fixed_size_mode_pod class
  2017-10-26 11:59   ` Richard Biener
@ 2017-10-26 12:18     ` Richard Sandiford
  2017-10-26 12:46       ` Richard Biener
  0 siblings, 1 reply; 90+ messages in thread
From: Richard Sandiford @ 2017-10-26 12:18 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Patches

Richard Biener <richard.guenther@gmail.com> writes:
> On Mon, Oct 23, 2017 at 1:22 PM, Richard Sandiford
> <richard.sandiford@linaro.org> wrote:
>> This patch adds a POD version of fixed_size_mode.  The only current use
>> is for storing the __builtin_apply and __builtin_result register modes,
>> which were made fixed_size_modes by the previous patch.
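>>
>> A minimal sketch of the motivation: the mode wrapper classes have
>> constructors, so under the current C++03 host-compiler requirement
>> they are not PODs and cannot be members of the target structures,
>> which are treated as plain blocks of memory; the pod_mode wrapper
>> restores POD-ness (target_example below is made up for the
>> illustration):
>>
>>   struct target_example {
>>     fixed_size_mode_pod x_mode[FIRST_PSEUDO_REGISTER];  /* A POD.  */
>>   };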
>
> Bah - can we update our host compiler to C++11/14 please ...?
> (maybe requiring that building with GCC 4.8 as host compiler works;
> GCC 4.3 has -std=c++0x, but I'm quite sure that's not enough).

That'd be great :-)  It would avoid all the poly_int_pod stuff too,
and allow some clean-up of wide-int.h.

Thanks for the reviews,
Richard


>
> Ok.
>
> Thanks,
> Richard.
>
>>
>> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
>>             Alan Hayward  <alan.hayward@arm.com>
>>             David Sherwood  <david.sherwood@arm.com>
>>
>> gcc/
>>         * coretypes.h (fixed_size_mode): Declare.
>>         (fixed_size_mode_pod): New typedef.
>>         * builtins.h (target_builtins::x_apply_args_mode)
>>         (target_builtins::x_apply_result_mode): Change type to
>>         fixed_size_mode_pod.
>>         * builtins.c (apply_args_size, apply_result_size, result_vector)
>>         (expand_builtin_apply_args_1, expand_builtin_apply)
>>         (expand_builtin_return): Update accordingly.
>>
>> Index: gcc/coretypes.h
>> ===================================================================
>> --- gcc/coretypes.h     2017-09-11 17:10:58.656085547 +0100
>> +++ gcc/coretypes.h     2017-10-23 11:42:57.592545063 +0100
>> @@ -59,6 +59,7 @@ typedef const struct rtx_def *const_rtx;
>>  class scalar_int_mode;
>>  class scalar_float_mode;
>>  class complex_mode;
>> +class fixed_size_mode;
>>  template<typename> class opt_mode;
>>  typedef opt_mode<scalar_mode> opt_scalar_mode;
>>  typedef opt_mode<scalar_int_mode> opt_scalar_int_mode;
>> @@ -66,6 +67,7 @@ typedef opt_mode<scalar_float_mode> opt_
>>  template<typename> class pod_mode;
>>  typedef pod_mode<scalar_mode> scalar_mode_pod;
>>  typedef pod_mode<scalar_int_mode> scalar_int_mode_pod;
>> +typedef pod_mode<fixed_size_mode> fixed_size_mode_pod;
>>
>>  /* Subclasses of rtx_def, using indentation to show the class
>>     hierarchy, along with the relevant invariant.
>> Index: gcc/builtins.h
>> ===================================================================
>> --- gcc/builtins.h      2017-08-30 12:18:46.602740973 +0100
>> +++ gcc/builtins.h      2017-10-23 11:42:57.592545063 +0100
>> @@ -29,14 +29,14 @@ struct target_builtins {
>>       the register is not used for calling a function.  If the machine
>>       has register windows, this gives only the outbound registers.
>>       INCOMING_REGNO gives the corresponding inbound register.  */
>> -  machine_mode x_apply_args_mode[FIRST_PSEUDO_REGISTER];
>> +  fixed_size_mode_pod x_apply_args_mode[FIRST_PSEUDO_REGISTER];
>>
>>    /* For each register that may be used for returning values, this gives
>>       a mode used to copy the register's value.  VOIDmode indicates the
>>       register is not used for returning values.  If the machine has
>>       register windows, this gives only the outbound registers.
>>       INCOMING_REGNO gives the corresponding inbound register.  */
>> -  machine_mode x_apply_result_mode[FIRST_PSEUDO_REGISTER];
>> +  fixed_size_mode_pod x_apply_result_mode[FIRST_PSEUDO_REGISTER];
>>  };
>>
>>  extern struct target_builtins default_target_builtins;
>> Index: gcc/builtins.c
>> ===================================================================
>> --- gcc/builtins.c      2017-10-23 11:41:23.140260335 +0100
>> +++ gcc/builtins.c      2017-10-23 11:42:57.592545063 +0100
>> @@ -1358,7 +1358,6 @@ apply_args_size (void)
>>    static int size = -1;
>>    int align;
>>    unsigned int regno;
>> -  machine_mode mode;
>>
>>    /* The values computed by this function never change.  */
>>    if (size < 0)
>> @@ -1374,7 +1373,7 @@ apply_args_size (void)
>>        for (regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
>>         if (FUNCTION_ARG_REGNO_P (regno))
>>           {
>> -           mode = targetm.calls.get_raw_arg_mode (regno);
>> +           fixed_size_mode mode = targetm.calls.get_raw_arg_mode (regno);
>>
>>             gcc_assert (mode != VOIDmode);
>>
>> @@ -1386,7 +1385,7 @@ apply_args_size (void)
>>           }
>>         else
>>           {
>> -           apply_args_mode[regno] = VOIDmode;
>> +           apply_args_mode[regno] = as_a <fixed_size_mode> (VOIDmode);
>>           }
>>      }
>>    return size;
>> @@ -1400,7 +1399,6 @@ apply_result_size (void)
>>  {
>>    static int size = -1;
>>    int align, regno;
>> -  machine_mode mode;
>>
>>    /* The values computed by this function never change.  */
>>    if (size < 0)
>> @@ -1410,7 +1408,7 @@ apply_result_size (void)
>>        for (regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
>>         if (targetm.calls.function_value_regno_p (regno))
>>           {
>> -           mode = targetm.calls.get_raw_result_mode (regno);
>> +           fixed_size_mode mode = targetm.calls.get_raw_result_mode (regno);
>>
>>             gcc_assert (mode != VOIDmode);
>>
>> @@ -1421,7 +1419,7 @@ apply_result_size (void)
>>             apply_result_mode[regno] = mode;
>>           }
>>         else
>> -         apply_result_mode[regno] = VOIDmode;
>> +         apply_result_mode[regno] = as_a <fixed_size_mode> (VOIDmode);
>>
>>        /* Allow targets that use untyped_call and untyped_return to override
>>          the size so that machine-specific information can be stored here.  */
>> @@ -1440,7 +1438,7 @@ apply_result_size (void)
>>  result_vector (int savep, rtx result)
>>  {
>>    int regno, size, align, nelts;
>> -  machine_mode mode;
>> +  fixed_size_mode mode;
>>    rtx reg, mem;
>>    rtx *savevec = XALLOCAVEC (rtx, FIRST_PSEUDO_REGISTER);
>>
>> @@ -1469,7 +1467,7 @@ expand_builtin_apply_args_1 (void)
>>  {
>>    rtx registers, tem;
>>    int size, align, regno;
>> -  machine_mode mode;
>> +  fixed_size_mode mode;
>>    rtx struct_incoming_value = targetm.calls.struct_value_rtx (cfun ? TREE_TYPE (cfun->decl) : 0, 1);
>>
>>    /* Create a block where the arg-pointer, structure value address,
>> @@ -1573,7 +1571,7 @@ expand_builtin_apply_args (void)
>>  expand_builtin_apply (rtx function, rtx arguments, rtx argsize)
>>  {
>>    int size, align, regno;
>> -  machine_mode mode;
>> +  fixed_size_mode mode;
>>    rtx incoming_args, result, reg, dest, src;
>>    rtx_call_insn *call_insn;
>>    rtx old_stack_level = 0;
>> @@ -1734,7 +1732,7 @@ expand_builtin_apply (rtx function, rtx
>>  expand_builtin_return (rtx result)
>>  {
>>    int size, align, regno;
>> -  machine_mode mode;
>> +  fixed_size_mode mode;
>>    rtx reg;
>>    rtx_insn *call_fusage = 0;
>>

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [21/nn] Minor vn_reference_lookup_3 tweak
  2017-10-23 11:31 ` [21/nn] Minor vn_reference_lookup_3 tweak Richard Sandiford
@ 2017-10-26 12:18   ` Richard Biener
  0 siblings, 0 replies; 90+ messages in thread
From: Richard Biener @ 2017-10-26 12:18 UTC (permalink / raw)
  To: GCC Patches, Richard Sandiford

On Mon, Oct 23, 2017 at 1:30 PM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> The repeated checks for MEM_REF made this code hard to convert to
> poly_ints as-is.  Hopefully the new structure also makes it clearer
> at a glance what the two cases are.
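>
> In outline, the two cases are (a sketch, not the exact source):
>
>   if (TREE_CODE (base) == MEM_REF)
>     {
>       /* MEM_REF base: require a constant offset and add it to AT.  */
>     }
>   else
>     {
>       /* DECL base: require LHS to be an ADDR_EXPR of the same decl.  */
>     }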
>
>
> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
>             Alan Hayward  <alan.hayward@arm.com>
>             David Sherwood  <david.sherwood@arm.com>
>
> gcc/
>         * tree-ssa-sccvn.c (vn_reference_lookup_3): Avoid repeated
>         checks for MEM_REF.
>
> Index: gcc/tree-ssa-sccvn.c
> ===================================================================
> --- gcc/tree-ssa-sccvn.c        2017-10-23 11:47:03.852769480 +0100
> +++ gcc/tree-ssa-sccvn.c        2017-10-23 11:47:44.596155858 +0100
> @@ -2234,6 +2234,7 @@ vn_reference_lookup_3 (ao_ref *ref, tree
>           || offset % BITS_PER_UNIT != 0
>           || ref->size % BITS_PER_UNIT != 0)
>         return (void *)-1;
> +      at = offset / BITS_PER_UNIT;

can you move this just

>        /* Extract a pointer base and an offset for the destination.  */
>        lhs = gimple_call_arg (def_stmt, 0);
> @@ -2301,19 +2302,18 @@ vn_reference_lookup_3 (ao_ref *ref, tree
>        copy_size = tree_to_uhwi (gimple_call_arg (def_stmt, 2));
>
>        /* The bases of the destination and the references have to agree.  */

here? Ok with that change.

Richard.

> -      if ((TREE_CODE (base) != MEM_REF
> -          && !DECL_P (base))
> -         || (TREE_CODE (base) == MEM_REF
> -             && (TREE_OPERAND (base, 0) != lhs
> -                 || !tree_fits_uhwi_p (TREE_OPERAND (base, 1))))
> -         || (DECL_P (base)
> -             && (TREE_CODE (lhs) != ADDR_EXPR
> -                 || TREE_OPERAND (lhs, 0) != base)))
> +      if (TREE_CODE (base) == MEM_REF)
> +       {
> +         if (TREE_OPERAND (base, 0) != lhs
> +             || !tree_fits_uhwi_p (TREE_OPERAND (base, 1)))
> +           return (void *) -1;
> +         at += tree_to_uhwi (TREE_OPERAND (base, 1));
> +       }
> +      else if (!DECL_P (base)
> +              || TREE_CODE (lhs) != ADDR_EXPR
> +              || TREE_OPERAND (lhs, 0) != base)
>         return (void *)-1;
>
> -      at = offset / BITS_PER_UNIT;
> -      if (TREE_CODE (base) == MEM_REF)
> -       at += tree_to_uhwi (TREE_OPERAND (base, 1));
>        /* If the access is completely outside of the memcpy destination
>          area there is no aliasing.  */
>        if (lhs_offset >= at + maxsize / BITS_PER_UNIT

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [06/nn] Add VEC_SERIES_{CST,EXPR} and associated optab
  2017-10-23 11:22 ` [06/nn] Add VEC_SERIES_{CST,EXPR} and associated optab Richard Sandiford
@ 2017-10-26 12:26   ` Richard Biener
  2017-10-26 12:43     ` Richard Biener
  2017-12-15  0:34   ` Richard Sandiford
  1 sibling, 1 reply; 90+ messages in thread
From: Richard Biener @ 2017-10-26 12:26 UTC (permalink / raw)
  To: GCC Patches, Richard Sandiford

On Mon, Oct 23, 2017 at 1:20 PM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> Similarly to the VEC_DUPLICATE_{CST,EXPR}, this patch adds two
> tree code equivalents of the VEC_SERIES rtx code.  VEC_SERIES_EXPR
> is for non-constant inputs and is a normal tcc_binary.  VEC_SERIES_CST
> is a tcc_constant.
>
> Like VEC_DUPLICATE_CST, VEC_SERIES_CST is only used for variable-length
> vectors.  This avoids the need to handle combinations of VECTOR_CST
> and VEC_SERIES_CST.
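>
> As a worked example, a series with base 3 and step 2 represents:
>
>   VEC_SERIES_EXPR (3, 2)  ==>  { 3, 5, 7, 9, ... }
>
> i.e. element i has the value 3 + i * 2, for however many elements
> the (possibly variable-length) vector mode provides.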

As with the other patch, can you document and verify that VEC_SERIES_CST
is only used on variable-length vectors?

Ok with that change.

Thanks,
Richard.

>
> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
>             Alan Hayward  <alan.hayward@arm.com>
>             David Sherwood  <david.sherwood@arm.com>
>
> gcc/
>         * doc/generic.texi (VEC_SERIES_CST, VEC_SERIES_EXPR): Document.
>         * doc/md.texi (vec_series@var{m}): Document.
>         * tree.def (VEC_SERIES_CST, VEC_SERIES_EXPR): New tree codes.
>         * tree.h (TREE_OVERFLOW): Add VEC_SERIES_CST to the list of valid
>         codes.
>         (VEC_SERIES_CST_BASE, VEC_SERIES_CST_STEP): New macros.
>         (build_vec_series_cst, build_vec_series): Declare.
>         * tree.c (tree_node_structure_for_code, tree_code_size, tree_size)
>         (add_expr, walk_tree_1, drop_tree_overflow): Handle VEC_SERIES_CST.
>         (build_vec_series_cst, build_vec_series): New functions.
>         * cfgexpand.c (expand_debug_expr): Handle the new codes.
>         * tree-pretty-print.c (dump_generic_node): Likewise.
>         * dwarf2out.c (rtl_for_decl_init): Handle VEC_SERIES_CST.
>         * gimple-expr.h (is_gimple_constant): Likewise.
>         * gimplify.c (gimplify_expr): Likewise.
>         * graphite-scop-detection.c (scan_tree_for_params): Likewise.
>         * ipa-icf-gimple.c (func_checker::compare_cst_or_decl): Likewise.
>         (func_checker::compare_operand): Likewise.
>         * ipa-icf.c (sem_item::add_expr, sem_variable::equals): Likewise.
>         * print-tree.c (print_node): Likewise.
>         * tree-ssa-loop.c (for_each_index): Likewise.
>         * tree-ssa-pre.c (create_component_ref_by_pieces_1): Likewise.
>         * tree-ssa-sccvn.c (copy_reference_ops_from_ref): Likewise.
>         (ao_ref_init_from_vn_reference): Likewise.
>         * varasm.c (const_hash_1, compare_constant): Likewise.
>         * fold-const.c (negate_expr_p, fold_negate_expr_1, operand_equal_p)
>         (fold_checksum_tree): Likewise.
>         (vec_series_equivalent_p): New function.
>         (const_binop): Use it.  Fold VEC_SERIES_EXPRs of constants.
>         * expmed.c (make_tree): Handle VEC_SERIES.
>         * gimple-pretty-print.c (dump_binary_rhs): Likewise.
>         * tree-inline.c (estimate_operator_cost): Likewise.
>         * expr.c (const_vector_element): Include VEC_SERIES_CST in comment.
>         (expand_expr_real_2): Handle VEC_SERIES_EXPR.
>         (expand_expr_real_1): Handle VEC_SERIES_CST.
>         * optabs.def (vec_series_optab): New optab.
>         * optabs.h (expand_vec_series_expr): Declare.
>         * optabs.c (expand_vec_series_expr): New function.
>         * optabs-tree.c (optab_for_tree_code): Handle VEC_SERIES_EXPR.
>         * tree-cfg.c (verify_gimple_assign_binary): Handle VEC_SERIES_EXPR.
>         (verify_gimple_assign_single): Handle VEC_SERIES_CST.
>         * tree-vect-generic.c (expand_vector_operations_1): Check that
>         the operands also have vector type.
>
> Index: gcc/doc/generic.texi
> ===================================================================
> --- gcc/doc/generic.texi        2017-10-23 11:41:51.760448406 +0100
> +++ gcc/doc/generic.texi        2017-10-23 11:42:34.910720660 +0100
> @@ -1037,6 +1037,7 @@ As this example indicates, the operands
>  @tindex COMPLEX_CST
>  @tindex VECTOR_CST
>  @tindex VEC_DUPLICATE_CST
> +@tindex VEC_SERIES_CST
>  @tindex STRING_CST
>  @findex TREE_STRING_LENGTH
>  @findex TREE_STRING_POINTER
> @@ -1098,6 +1099,16 @@ instead.  The scalar element value is gi
>  @code{VEC_DUPLICATE_CST_ELT} and has the same restrictions as the
>  element of a @code{VECTOR_CST}.
>
> +@item VEC_SERIES_CST
> +These nodes represent a vector constant in which element @var{i}
> +has the value @samp{@var{base} + @var{i} * @var{step}}, for some
> +constant @var{base} and @var{step}.  The value of @var{base} is
> +given by @code{VEC_SERIES_CST_BASE} and the value of @var{step} is
> +given by @code{VEC_SERIES_CST_STEP}.
> +
> +These nodes are restricted to integral types, in order to avoid
> +specifying the rounding behavior for floating-point types.
> +
>  @item STRING_CST
>  These nodes represent string-constants.  The @code{TREE_STRING_LENGTH}
>  returns the length of the string, as an @code{int}.  The
> @@ -1702,6 +1713,7 @@ a value from @code{enum annot_expr_kind}
>  @node Vectors
>  @subsection Vectors
>  @tindex VEC_DUPLICATE_EXPR
> +@tindex VEC_SERIES_EXPR
>  @tindex VEC_LSHIFT_EXPR
>  @tindex VEC_RSHIFT_EXPR
>  @tindex VEC_WIDEN_MULT_HI_EXPR
> @@ -1721,6 +1733,14 @@ a value from @code{enum annot_expr_kind}
>  This node has a single operand and represents a vector in which every
>  element is equal to that operand.
>
> +@item VEC_SERIES_EXPR
> +This node represents a vector formed from a scalar base and step,
> +given as the first and second operands respectively.  Element @var{i}
> +of the result is equal to @samp{@var{base} + @var{i}*@var{step}}.
> +
> +This node is restricted to integral types, in order to avoid
> +specifying the rounding behavior for floating-point types.
> +
>  @item VEC_LSHIFT_EXPR
>  @itemx VEC_RSHIFT_EXPR
>  These nodes represent whole vector left and right shifts, respectively.
> Index: gcc/doc/md.texi
> ===================================================================
> --- gcc/doc/md.texi     2017-10-23 11:41:51.761413027 +0100
> +++ gcc/doc/md.texi     2017-10-23 11:42:34.911720660 +0100
> @@ -4899,6 +4899,19 @@ vectors go through the @code{mov@var{m}}
>
>  This pattern is not allowed to @code{FAIL}.
>
> +@cindex @code{vec_series@var{m}} instruction pattern
> +@item @samp{vec_series@var{m}}
> +Initialize vector output operand 0 so that element @var{i} is equal to
> +operand 1 plus @var{i} times operand 2.  In other words, create a linear
> +series whose base value is operand 1 and whose step is operand 2.
> +
> +The vector output has mode @var{m} and the scalar inputs have the mode
> +appropriate for one element of @var{m}.  This pattern is not used for
> +floating-point vectors, in order to avoid having to specify the
> +rounding behavior for @var{i} > 1.
> +
> +This pattern is not allowed to @code{FAIL}.
> +
>  @cindex @code{vec_cmp@var{m}@var{n}} instruction pattern
>  @item @samp{vec_cmp@var{m}@var{n}}
>  Output a vector comparison.  Operand 0 of mode @var{n} is the destination for
> Index: gcc/tree.def
> ===================================================================
> --- gcc/tree.def        2017-10-23 11:41:51.774917721 +0100
> +++ gcc/tree.def        2017-10-23 11:42:34.924720660 +0100
> @@ -308,6 +308,10 @@ DEFTREECODE (VECTOR_CST, "vector_cst", t
>     VEC_DUPLICATE_CST_ELT.  */
>  DEFTREECODE (VEC_DUPLICATE_CST, "vec_duplicate_cst", tcc_constant, 0)
>
> +/* Represents a vector constant in which element i is equal to
> +   VEC_SERIES_CST_BASE + i * VEC_SERIES_CST_STEP.  */
> +DEFTREECODE (VEC_SERIES_CST, "vec_series_cst", tcc_constant, 0)
> +
>  /* Contents are TREE_STRING_LENGTH and the actual contents of the string.  */
>  DEFTREECODE (STRING_CST, "string_cst", tcc_constant, 0)
>
> @@ -541,6 +545,16 @@ DEFTREECODE (COND_EXPR, "cond_expr", tcc
>  /* Represents a vector in which every element is equal to operand 0.  */
>  DEFTREECODE (VEC_DUPLICATE_EXPR, "vec_duplicate_expr", tcc_unary, 1)
>
> +/* Vector series created from a start (base) value and a step.
> +
> +   A = VEC_SERIES_EXPR (B, C)
> +
> +   means
> +
> +   for (i = 0; i < N; i++)
> +     A[i] = B + C * i;  */
> +DEFTREECODE (VEC_SERIES_EXPR, "vec_series_expr", tcc_binary, 2)
> +
>  /* Vector conditional expression. It is like COND_EXPR, but with
>     vector operands.
>
> Index: gcc/tree.h
> ===================================================================
> --- gcc/tree.h  2017-10-23 11:41:51.775882341 +0100
> +++ gcc/tree.h  2017-10-23 11:42:34.925720660 +0100
> @@ -730,8 +730,8 @@ #define TREE_SYMBOL_REFERENCED(NODE) \
>  #define TYPE_REF_CAN_ALIAS_ALL(NODE) \
>    (PTR_OR_REF_CHECK (NODE)->base.static_flag)
>
> -/* In an INTEGER_CST, REAL_CST, COMPLEX_CST, VECTOR_CST or VEC_DUPLICATE_CST,
> -   this means there was an overflow in folding.  */
> +/* In an INTEGER_CST, REAL_CST, COMPLEX_CST, VECTOR_CST, VEC_DUPLICATE_CST
> +   or VEC_SERIES_CST, this means there was an overflow in folding.  */
>
>  #define TREE_OVERFLOW(NODE) (CST_CHECK (NODE)->base.public_flag)
>
> @@ -1034,6 +1034,12 @@ #define VECTOR_CST_ELT(NODE,IDX) (VECTOR
>  #define VEC_DUPLICATE_CST_ELT(NODE) \
>    (VEC_DUPLICATE_CST_CHECK (NODE)->vector.elts[0])
>
> +/* In a VEC_SERIES_CST node.  */
> +#define VEC_SERIES_CST_BASE(NODE) \
> +  (VEC_SERIES_CST_CHECK (NODE)->vector.elts[0])
> +#define VEC_SERIES_CST_STEP(NODE) \
> +  (VEC_SERIES_CST_CHECK (NODE)->vector.elts[1])
> +
>  /* Define fields and accessors for some special-purpose tree nodes.  */
>
>  #define IDENTIFIER_LENGTH(NODE) \
> @@ -4030,9 +4036,11 @@ extern tree build_int_cstu (tree type, u
>  extern tree build_int_cst_type (tree, HOST_WIDE_INT);
>  extern tree make_vector (unsigned CXX_MEM_STAT_INFO);
>  extern tree build_vec_duplicate_cst (tree, tree CXX_MEM_STAT_INFO);
> +extern tree build_vec_series_cst (tree, tree, tree CXX_MEM_STAT_INFO);
>  extern tree build_vector (tree, vec<tree> CXX_MEM_STAT_INFO);
>  extern tree build_vector_from_ctor (tree, vec<constructor_elt, va_gc> *);
>  extern tree build_vector_from_val (tree, tree);
> +extern tree build_vec_series (tree, tree, tree);
>  extern void recompute_constructor_flags (tree);
>  extern void verify_constructor_flags (tree);
>  extern tree build_constructor (tree, vec<constructor_elt, va_gc> *);
> Index: gcc/tree.c
> ===================================================================
> --- gcc/tree.c  2017-10-23 11:41:51.774917721 +0100
> +++ gcc/tree.c  2017-10-23 11:42:34.924720660 +0100
> @@ -465,6 +465,7 @@ tree_node_structure_for_code (enum tree_
>      case COMPLEX_CST:          return TS_COMPLEX;
>      case VECTOR_CST:           return TS_VECTOR;
>      case VEC_DUPLICATE_CST:    return TS_VECTOR;
> +    case VEC_SERIES_CST:       return TS_VECTOR;
>      case STRING_CST:           return TS_STRING;
>        /* tcc_exceptional cases.  */
>      case ERROR_MARK:           return TS_COMMON;
> @@ -818,6 +819,8 @@ tree_code_size (enum tree_code code)
>         case COMPLEX_CST:       return sizeof (struct tree_complex);
>         case VECTOR_CST:        return sizeof (struct tree_vector);
>         case VEC_DUPLICATE_CST: return sizeof (struct tree_vector);
> +       case VEC_SERIES_CST:
> +         return sizeof (struct tree_vector) + sizeof (tree);
>         case STRING_CST:        gcc_unreachable ();
>         default:
>           return lang_hooks.tree_size (code);
> @@ -880,6 +883,9 @@ tree_size (const_tree node)
>      case VEC_DUPLICATE_CST:
>        return sizeof (struct tree_vector);
>
> +    case VEC_SERIES_CST:
> +      return sizeof (struct tree_vector) + sizeof (tree);
> +
>      case STRING_CST:
>        return TREE_STRING_LENGTH (node) + offsetof (struct tree_string, str) + 1;
>
> @@ -1711,6 +1717,31 @@ build_vec_duplicate_cst (tree type, tree
>    return t;
>  }
>
> +/* Build a new VEC_SERIES_CST with type TYPE, base BASE and step STEP.
> +
> +   Note that this function is only suitable for callers that specifically
> +   need a VEC_SERIES_CST node.  Use build_vec_series to build a general
> +   series vector from a general base and step.  */
> +
> +tree
> +build_vec_series_cst (tree type, tree base, tree step MEM_STAT_DECL)
> +{
> +  int length = sizeof (struct tree_vector) + sizeof (tree);
> +
> +  record_node_allocation_statistics (VEC_SERIES_CST, length);
> +
> +  tree t = ggc_alloc_cleared_tree_node_stat (length PASS_MEM_STAT);
> +
> +  TREE_SET_CODE (t, VEC_SERIES_CST);
> +  TREE_TYPE (t) = type;
> +  t->base.u.nelts = 2;
> +  VEC_SERIES_CST_BASE (t) = base;
> +  VEC_SERIES_CST_STEP (t) = step;
> +  TREE_CONSTANT (t) = 1;
> +
> +  return t;
> +}
> +
>  /* Build a newly constructed VECTOR_CST node of length LEN.  */
>
>  tree
> @@ -1821,6 +1852,19 @@ build_vector_from_val (tree vectype, tre
>      }
>  }
>
> +/* Build a vector series of type TYPE in which element I has the value
> +   BASE + I * STEP.  */
> +
> +tree
> +build_vec_series (tree type, tree base, tree step)
> +{
> +  if (integer_zerop (step))
> +    return build_vector_from_val (type, base);
> +  if (CONSTANT_CLASS_P (base) && CONSTANT_CLASS_P (step))
> +    return build_vec_series_cst (type, base, step);
> +  return build2 (VEC_SERIES_EXPR, type, base, step);
> +}
> +
>  /* Something has messed with the elements of CONSTRUCTOR C after it was built;
>     calculate TREE_CONSTANT and TREE_SIDE_EFFECTS.  */
>
> @@ -7136,6 +7180,10 @@ add_expr (const_tree t, inchash::hash &h
>      case VEC_DUPLICATE_CST:
>        inchash::add_expr (VEC_DUPLICATE_CST_ELT (t), hstate);
>        return;
> +    case VEC_SERIES_CST:
> +      inchash::add_expr (VEC_SERIES_CST_BASE (t), hstate);
> +      inchash::add_expr (VEC_SERIES_CST_STEP (t), hstate);
> +      return;
>      case SSA_NAME:
>        /* We can just compare by pointer.  */
>        hstate.add_wide_int (SSA_NAME_VERSION (t));
> @@ -11150,6 +11198,7 @@ #define WALK_SUBTREE_TAIL(NODE)                         \
>      case FIXED_CST:
>      case VECTOR_CST:
>      case VEC_DUPLICATE_CST:
> +    case VEC_SERIES_CST:
>      case STRING_CST:
>      case BLOCK:
>      case PLACEHOLDER_EXPR:
> @@ -12442,6 +12491,15 @@ drop_tree_overflow (tree t)
>        if (TREE_OVERFLOW (*elt))
>         *elt = drop_tree_overflow (*elt);
>      }
> +  if (TREE_CODE (t) == VEC_SERIES_CST)
> +    {
> +      tree *elt = &VEC_SERIES_CST_BASE (t);
> +      if (TREE_OVERFLOW (*elt))
> +       *elt = drop_tree_overflow (*elt);
> +      elt = &VEC_SERIES_CST_STEP (t);
> +      if (TREE_OVERFLOW (*elt))
> +       *elt = drop_tree_overflow (*elt);
> +    }
>    return t;
>  }
>
> Index: gcc/cfgexpand.c
> ===================================================================
> --- gcc/cfgexpand.c     2017-10-23 11:41:51.760448406 +0100
> +++ gcc/cfgexpand.c     2017-10-23 11:42:34.909720660 +0100
> @@ -5051,6 +5051,8 @@ expand_debug_expr (tree exp)
>      case VEC_PERM_EXPR:
>      case VEC_DUPLICATE_CST:
>      case VEC_DUPLICATE_EXPR:
> +    case VEC_SERIES_CST:
> +    case VEC_SERIES_EXPR:
>        return NULL;
>
>      /* Misc codes.  */
> Index: gcc/tree-pretty-print.c
> ===================================================================
> --- gcc/tree-pretty-print.c     2017-10-23 11:41:51.772023858 +0100
> +++ gcc/tree-pretty-print.c     2017-10-23 11:42:34.921720660 +0100
> @@ -1808,6 +1808,14 @@ dump_generic_node (pretty_printer *pp, t
>        pp_string (pp, ", ... }");
>        break;
>
> +    case VEC_SERIES_CST:
> +      pp_string (pp, "{ ");
> +      dump_generic_node (pp, VEC_SERIES_CST_BASE (node), spc, flags, false);
> +      pp_string (pp, ", +, ");
> +      dump_generic_node (pp, VEC_SERIES_CST_STEP (node), spc, flags, false);
> +      pp_string (pp, "}");
> +      break;
> +
>      case FUNCTION_TYPE:
>      case METHOD_TYPE:
>        dump_generic_node (pp, TREE_TYPE (node), spc, flags, false);
> @@ -3221,6 +3229,7 @@ dump_generic_node (pretty_printer *pp, t
>        pp_string (pp, " > ");
>        break;
>
> +    case VEC_SERIES_EXPR:
>      case VEC_WIDEN_MULT_HI_EXPR:
>      case VEC_WIDEN_MULT_LO_EXPR:
>      case VEC_WIDEN_MULT_EVEN_EXPR:
> Index: gcc/dwarf2out.c
> ===================================================================
> --- gcc/dwarf2out.c     2017-10-23 11:41:51.763342269 +0100
> +++ gcc/dwarf2out.c     2017-10-23 11:42:34.913720660 +0100
> @@ -18863,6 +18863,7 @@ rtl_for_decl_init (tree init, tree type)
>           {
>           case VECTOR_CST:
>           case VEC_DUPLICATE_CST:
> +         case VEC_SERIES_CST:
>             break;
>           case CONSTRUCTOR:
>             if (TREE_CONSTANT (init))
> Index: gcc/gimple-expr.h
> ===================================================================
> --- gcc/gimple-expr.h   2017-10-23 11:41:51.765271511 +0100
> +++ gcc/gimple-expr.h   2017-10-23 11:42:34.916720660 +0100
> @@ -135,6 +135,7 @@ is_gimple_constant (const_tree t)
>      case COMPLEX_CST:
>      case VECTOR_CST:
>      case VEC_DUPLICATE_CST:
> +    case VEC_SERIES_CST:
>      case STRING_CST:
>        return true;
>
> Index: gcc/gimplify.c
> ===================================================================
> --- gcc/gimplify.c      2017-10-23 11:41:51.766236132 +0100
> +++ gcc/gimplify.c      2017-10-23 11:42:34.917720660 +0100
> @@ -11507,6 +11507,7 @@ gimplify_expr (tree *expr_p, gimple_seq
>         case COMPLEX_CST:
>         case VECTOR_CST:
>         case VEC_DUPLICATE_CST:
> +       case VEC_SERIES_CST:
>           /* Drop the overflow flag on constants, we do not want
>              that in the GIMPLE IL.  */
>           if (TREE_OVERFLOW_P (*expr_p))
> Index: gcc/graphite-scop-detection.c
> ===================================================================
> --- gcc/graphite-scop-detection.c       2017-10-23 11:41:51.767200753 +0100
> +++ gcc/graphite-scop-detection.c       2017-10-23 11:42:34.917720660 +0100
> @@ -1244,6 +1244,7 @@ scan_tree_for_params (sese_info_p s, tre
>      case COMPLEX_CST:
>      case VECTOR_CST:
>      case VEC_DUPLICATE_CST:
> +    case VEC_SERIES_CST:
>        break;
>
>     default:
> Index: gcc/ipa-icf-gimple.c
> ===================================================================
> --- gcc/ipa-icf-gimple.c        2017-10-23 11:41:51.767200753 +0100
> +++ gcc/ipa-icf-gimple.c        2017-10-23 11:42:34.917720660 +0100
> @@ -334,6 +334,7 @@ func_checker::compare_cst_or_decl (tree
>      case COMPLEX_CST:
>      case VECTOR_CST:
>      case VEC_DUPLICATE_CST:
> +    case VEC_SERIES_CST:
>      case STRING_CST:
>      case REAL_CST:
>        {
> @@ -530,6 +531,7 @@ func_checker::compare_operand (tree t1,
>      case COMPLEX_CST:
>      case VECTOR_CST:
>      case VEC_DUPLICATE_CST:
> +    case VEC_SERIES_CST:
>      case STRING_CST:
>      case REAL_CST:
>      case FUNCTION_DECL:
> Index: gcc/ipa-icf.c
> ===================================================================
> --- gcc/ipa-icf.c       2017-10-23 11:41:51.768165374 +0100
> +++ gcc/ipa-icf.c       2017-10-23 11:42:34.918720660 +0100
> @@ -1479,6 +1479,7 @@ sem_item::add_expr (const_tree exp, inch
>      case COMPLEX_CST:
>      case VECTOR_CST:
>      case VEC_DUPLICATE_CST:
> +    case VEC_SERIES_CST:
>        inchash::add_expr (exp, hstate);
>        break;
>      case CONSTRUCTOR:
> @@ -2034,6 +2035,11 @@ sem_variable::equals (tree t1, tree t2)
>      case VEC_DUPLICATE_CST:
>        return sem_variable::equals (VEC_DUPLICATE_CST_ELT (t1),
>                                    VEC_DUPLICATE_CST_ELT (t2));
> +     case VEC_SERIES_CST:
> +       return (sem_variable::equals (VEC_SERIES_CST_BASE (t1),
> +                                    VEC_SERIES_CST_BASE (t2))
> +              && sem_variable::equals (VEC_SERIES_CST_STEP (t1),
> +                                       VEC_SERIES_CST_STEP (t2)));
>      case ARRAY_REF:
>      case ARRAY_RANGE_REF:
>        {
> Index: gcc/print-tree.c
> ===================================================================
> --- gcc/print-tree.c    2017-10-23 11:41:51.769129995 +0100
> +++ gcc/print-tree.c    2017-10-23 11:42:34.919720660 +0100
> @@ -787,6 +787,11 @@ print_node (FILE *file, const char *pref
>           print_node (file, "elt", VEC_DUPLICATE_CST_ELT (node), indent + 4);
>           break;
>
> +       case VEC_SERIES_CST:
> +         print_node (file, "base", VEC_SERIES_CST_BASE (node), indent + 4);
> +         print_node (file, "step", VEC_SERIES_CST_STEP (node), indent + 4);
> +         break;
> +
>         case COMPLEX_CST:
>           print_node (file, "real", TREE_REALPART (node), indent + 4);
>           print_node (file, "imag", TREE_IMAGPART (node), indent + 4);
> Index: gcc/tree-ssa-loop.c
> ===================================================================
> --- gcc/tree-ssa-loop.c 2017-10-23 11:41:51.772023858 +0100
> +++ gcc/tree-ssa-loop.c 2017-10-23 11:42:34.921720660 +0100
> @@ -617,6 +617,7 @@ for_each_index (tree *addr_p, bool (*cbc
>         case RESULT_DECL:
>         case VECTOR_CST:
>         case VEC_DUPLICATE_CST:
> +       case VEC_SERIES_CST:
>         case COMPLEX_CST:
>         case INTEGER_CST:
>         case REAL_CST:
> Index: gcc/tree-ssa-pre.c
> ===================================================================
> --- gcc/tree-ssa-pre.c  2017-10-23 11:41:51.772023858 +0100
> +++ gcc/tree-ssa-pre.c  2017-10-23 11:42:34.922720660 +0100
> @@ -2676,6 +2676,7 @@ create_component_ref_by_pieces_1 (basic_
>      case COMPLEX_CST:
>      case VECTOR_CST:
>      case VEC_DUPLICATE_CST:
> +    case VEC_SERIES_CST:
>      case REAL_CST:
>      case CONSTRUCTOR:
>      case VAR_DECL:
> Index: gcc/tree-ssa-sccvn.c
> ===================================================================
> --- gcc/tree-ssa-sccvn.c        2017-10-23 11:41:51.773953100 +0100
> +++ gcc/tree-ssa-sccvn.c        2017-10-23 11:42:34.922720660 +0100
> @@ -859,6 +859,7 @@ copy_reference_ops_from_ref (tree ref, v
>         case COMPLEX_CST:
>         case VECTOR_CST:
>         case VEC_DUPLICATE_CST:
> +       case VEC_SERIES_CST:
>         case REAL_CST:
>         case FIXED_CST:
>         case CONSTRUCTOR:
> @@ -1052,6 +1053,7 @@ ao_ref_init_from_vn_reference (ao_ref *r
>         case COMPLEX_CST:
>         case VECTOR_CST:
>         case VEC_DUPLICATE_CST:
> +       case VEC_SERIES_CST:
>         case REAL_CST:
>         case CONSTRUCTOR:
>         case CONST_DECL:
> Index: gcc/varasm.c
> ===================================================================
> --- gcc/varasm.c        2017-10-23 11:41:51.775882341 +0100
> +++ gcc/varasm.c        2017-10-23 11:42:34.927720660 +0100
> @@ -3065,6 +3065,10 @@ const_hash_1 (const tree exp)
>        return (const_hash_1 (TREE_OPERAND (exp, 0)) * 9
>               + const_hash_1 (TREE_OPERAND (exp, 1)));
>
> +    case VEC_SERIES_CST:
> +      return (const_hash_1 (VEC_SERIES_CST_BASE (exp)) * 11
> +             + const_hash_1 (VEC_SERIES_CST_STEP (exp)));
> +
>      CASE_CONVERT:
>        return const_hash_1 (TREE_OPERAND (exp, 0)) * 7 + 2;
>
> @@ -3165,6 +3169,12 @@ compare_constant (const tree t1, const t
>        return compare_constant (VEC_DUPLICATE_CST_ELT (t1),
>                                VEC_DUPLICATE_CST_ELT (t2));
>
> +    case VEC_SERIES_CST:
> +      return (compare_constant (VEC_SERIES_CST_BASE (t1),
> +                               VEC_SERIES_CST_BASE (t2))
> +             && compare_constant (VEC_SERIES_CST_STEP (t1),
> +                                  VEC_SERIES_CST_STEP (t2)));
> +
>      case CONSTRUCTOR:
>        {
>         vec<constructor_elt, va_gc> *v1, *v2;
> Index: gcc/fold-const.c
> ===================================================================
> --- gcc/fold-const.c    2017-10-23 11:41:51.765271511 +0100
> +++ gcc/fold-const.c    2017-10-23 11:42:34.916720660 +0100
> @@ -421,6 +421,10 @@ negate_expr_p (tree t)
>      case VEC_DUPLICATE_CST:
>        return negate_expr_p (VEC_DUPLICATE_CST_ELT (t));
>
> +    case VEC_SERIES_CST:
> +      return (negate_expr_p (VEC_SERIES_CST_BASE (t))
> +             && negate_expr_p (VEC_SERIES_CST_STEP (t)));
> +
>      case COMPLEX_EXPR:
>        return negate_expr_p (TREE_OPERAND (t, 0))
>              && negate_expr_p (TREE_OPERAND (t, 1));
> @@ -590,6 +594,17 @@ fold_negate_expr_1 (location_t loc, tree
>         return build_vector_from_val (type, sub);
>        }
>
> +    case VEC_SERIES_CST:
> +      {
> +       tree neg_base = fold_negate_expr (loc, VEC_SERIES_CST_BASE (t));
> +       if (!neg_base)
> +         return NULL_TREE;
> +       tree neg_step = fold_negate_expr (loc, VEC_SERIES_CST_STEP (t));
> +       if (!neg_step)
> +         return NULL_TREE;
> +       return build_vec_series (type, neg_base, neg_step);
> +      }
> +
>      case COMPLEX_EXPR:
>        if (negate_expr_p (t))
>         return fold_build2_loc (loc, COMPLEX_EXPR, type,
> @@ -1131,6 +1146,28 @@ int_const_binop (enum tree_code code, co
>    return int_const_binop_1 (code, arg1, arg2, 1);
>  }
>
> +/* Return true if EXP is a VEC_DUPLICATE_CST or a VEC_SERIES_CST,
> +   and if so express it as a linear series in *BASE_OUT and *STEP_OUT.
> +   The step will be zero for VEC_DUPLICATE_CST.  */
> +
> +static bool
> +vec_series_equivalent_p (const_tree exp, tree *base_out, tree *step_out)
> +{
> +  if (TREE_CODE (exp) == VEC_SERIES_CST)
> +    {
> +      *base_out = VEC_SERIES_CST_BASE (exp);
> +      *step_out = VEC_SERIES_CST_STEP (exp);
> +      return true;
> +    }
> +  if (TREE_CODE (exp) == VEC_DUPLICATE_CST)
> +    {
> +      *base_out = VEC_DUPLICATE_CST_ELT (exp);
> +      *step_out = build_zero_cst (TREE_TYPE (*base_out));
> +      return true;
> +    }
> +  return false;
> +}
> +
>  /* Combine two constants ARG1 and ARG2 under operation CODE to produce a new
>     constant.  We assume ARG1 and ARG2 have the same data type, or at least
>     are the same kind of constant and the same machine mode.  Return zero if
> @@ -1457,6 +1494,20 @@ const_binop (enum tree_code code, tree a
>        return build_vector_from_val (TREE_TYPE (arg1), sub);
>      }
>
> +  tree base1, step1, base2, step2;
> +  if ((code == PLUS_EXPR || code == MINUS_EXPR)
> +      && vec_series_equivalent_p (arg1, &base1, &step1)
> +      && vec_series_equivalent_p (arg2, &base2, &step2))
> +    {
> +      tree new_base = const_binop (code, base1, base2);
> +      if (!new_base)
> +       return NULL_TREE;
> +      tree new_step = const_binop (code, step1, step2);
> +      if (!new_step)
> +       return NULL_TREE;
> +      return build_vec_series (TREE_TYPE (arg1), new_base, new_step);
> +    }
> +
>    /* Shifts allow a scalar offset for a vector.  */
>    if (TREE_CODE (arg1) == VECTOR_CST
>        && TREE_CODE (arg2) == INTEGER_CST)
> @@ -1505,6 +1556,12 @@ const_binop (enum tree_code code, tree t
>       result as argument put those cases that need it here.  */
>    switch (code)
>      {
> +    case VEC_SERIES_EXPR:
> +      if (CONSTANT_CLASS_P (arg1)
> +         && CONSTANT_CLASS_P (arg2))
> +       return build_vec_series (type, arg1, arg2);
> +      return NULL_TREE;
> +
>      case COMPLEX_EXPR:
>        if ((TREE_CODE (arg1) == REAL_CST
>            && TREE_CODE (arg2) == REAL_CST)
> @@ -3008,6 +3065,12 @@ operand_equal_p (const_tree arg0, const_
>         return operand_equal_p (VEC_DUPLICATE_CST_ELT (arg0),
>                                 VEC_DUPLICATE_CST_ELT (arg1), flags);
>
> +      case VEC_SERIES_CST:
> +       return (operand_equal_p (VEC_SERIES_CST_BASE (arg0),
> +                                VEC_SERIES_CST_BASE (arg1), flags)
> +               && operand_equal_p (VEC_SERIES_CST_STEP (arg0),
> +                                   VEC_SERIES_CST_STEP (arg1), flags));
> +
>        case COMPLEX_CST:
>         return (operand_equal_p (TREE_REALPART (arg0), TREE_REALPART (arg1),
>                                  flags)
> @@ -12050,6 +12113,10 @@ fold_checksum_tree (const_tree expr, str
>         case VEC_DUPLICATE_CST:
>           fold_checksum_tree (VEC_DUPLICATE_CST_ELT (expr), ctx, ht);
>           break;
> +       case VEC_SERIES_CST:
> +         fold_checksum_tree (VEC_SERIES_CST_BASE (expr), ctx, ht);
> +         fold_checksum_tree (VEC_SERIES_CST_STEP (expr), ctx, ht);
> +         break;
>         default:
>           break;
>         }
> Index: gcc/expmed.c
> ===================================================================
> --- gcc/expmed.c        2017-10-23 11:41:39.186050437 +0100
> +++ gcc/expmed.c        2017-10-23 11:42:34.914720660 +0100
> @@ -5253,6 +5253,13 @@ make_tree (tree type, rtx x)
>             tree elt_tree = make_tree (TREE_TYPE (type), XEXP (op, 0));
>             return build_vector_from_val (type, elt_tree);
>           }
> +       if (GET_CODE (op) == VEC_SERIES)
> +         {
> +           tree itype = TREE_TYPE (type);
> +           tree base_tree = make_tree (itype, XEXP (op, 0));
> +           tree step_tree = make_tree (itype, XEXP (op, 1));
> +           return build_vec_series (type, base_tree, step_tree);
> +         }
>         return make_tree (type, op);
>        }
>
> Index: gcc/gimple-pretty-print.c
> ===================================================================
> --- gcc/gimple-pretty-print.c   2017-10-23 11:41:25.500318672 +0100
> +++ gcc/gimple-pretty-print.c   2017-10-23 11:42:34.916720660 +0100
> @@ -438,6 +438,7 @@ dump_binary_rhs (pretty_printer *buffer,
>      case VEC_PACK_FIX_TRUNC_EXPR:
>      case VEC_WIDEN_LSHIFT_HI_EXPR:
>      case VEC_WIDEN_LSHIFT_LO_EXPR:
> +    case VEC_SERIES_EXPR:
>        for (p = get_tree_code_name (code); *p; p++)
>         pp_character (buffer, TOUPPER (*p));
>        pp_string (buffer, " <");
> Index: gcc/tree-inline.c
> ===================================================================
> --- gcc/tree-inline.c   2017-10-23 11:41:51.771059237 +0100
> +++ gcc/tree-inline.c   2017-10-23 11:42:34.921720660 +0100
> @@ -4003,6 +4003,7 @@ estimate_operator_cost (enum tree_code c
>      case VEC_WIDEN_LSHIFT_HI_EXPR:
>      case VEC_WIDEN_LSHIFT_LO_EXPR:
>      case VEC_DUPLICATE_EXPR:
> +    case VEC_SERIES_EXPR:
>
>        return 1;
>
> Index: gcc/expr.c
> ===================================================================
> --- gcc/expr.c  2017-10-23 11:41:51.764306890 +0100
> +++ gcc/expr.c  2017-10-23 11:42:34.915720660 +0100
> @@ -7704,7 +7704,7 @@ expand_operands (tree exp0, tree exp1, r
>
>
>  /* Expand constant vector element ELT, which has mode MODE.  This is used
> -   for members of VECTOR_CST and VEC_DUPLICATE_CST.  */
> +   for members of VECTOR_CST, VEC_DUPLICATE_CST and VEC_SERIES_CST.  */
>
>  static rtx
>  const_vector_element (scalar_mode mode, const_tree elt)
> @@ -9587,6 +9587,10 @@ #define REDUCE_BIT_FIELD(expr)   (reduce_b
>        gcc_assert (target);
>        return target;
>
> +    case VEC_SERIES_EXPR:
> +      expand_operands (treeop0, treeop1, NULL_RTX, &op0, &op1, modifier);
> +      return expand_vec_series_expr (mode, op0, op1, target);
> +
>      case BIT_INSERT_EXPR:
>        {
>         unsigned bitpos = tree_to_uhwi (treeop2);
> @@ -10044,6 +10048,13 @@ expand_expr_real_1 (tree exp, rtx target
>                                   VEC_DUPLICATE_CST_ELT (exp));
>        return gen_const_vec_duplicate (mode, op0);
>
> +    case VEC_SERIES_CST:
> +      op0 = const_vector_element (GET_MODE_INNER (mode),
> +                                 VEC_SERIES_CST_BASE (exp));
> +      op1 = const_vector_element (GET_MODE_INNER (mode),
> +                                 VEC_SERIES_CST_STEP (exp));
> +      return gen_const_vec_series (mode, op0, op1);
> +
>      case CONST_DECL:
>        if (modifier == EXPAND_WRITE)
>         {
> Index: gcc/optabs.def
> ===================================================================
> --- gcc/optabs.def      2017-10-23 11:41:51.769129995 +0100
> +++ gcc/optabs.def      2017-10-23 11:42:34.919720660 +0100
> @@ -366,3 +366,4 @@ OPTAB_D (get_thread_pointer_optab, "get_
>  OPTAB_D (set_thread_pointer_optab, "set_thread_pointer$I$a")
>
>  OPTAB_DC (vec_duplicate_optab, "vec_duplicate$a", VEC_DUPLICATE)
> +OPTAB_DC (vec_series_optab, "vec_series$a", VEC_SERIES)
> Index: gcc/optabs.h
> ===================================================================
> --- gcc/optabs.h        2017-10-23 11:41:51.769129995 +0100
> +++ gcc/optabs.h        2017-10-23 11:42:34.919720660 +0100
> @@ -316,6 +316,9 @@ extern rtx expand_vec_cmp_expr (tree, tr
>  /* Generate code for VEC_COND_EXPR.  */
>  extern rtx expand_vec_cond_expr (tree, tree, tree, tree, rtx);
>
> +/* Generate code for VEC_SERIES_EXPR.  */
> +extern rtx expand_vec_series_expr (machine_mode, rtx, rtx, rtx);
> +
>  /* Generate code for MULT_HIGHPART_EXPR.  */
>  extern rtx expand_mult_highpart (machine_mode, rtx, rtx, rtx, bool);
>
> Index: gcc/optabs.c
> ===================================================================
> --- gcc/optabs.c        2017-10-23 11:41:51.769129995 +0100
> +++ gcc/optabs.c        2017-10-23 11:42:34.919720660 +0100
> @@ -5693,6 +5693,27 @@ expand_vec_cond_expr (tree vec_cond_type
>    return ops[0].value;
>  }
>
> +/* Generate VEC_SERIES_EXPR <OP0, OP1>, returning a value of mode VMODE.
> +   Use TARGET for the result if nonnull and convenient.  */
> +
> +rtx
> +expand_vec_series_expr (machine_mode vmode, rtx op0, rtx op1, rtx target)
> +{
> +  struct expand_operand ops[3];
> +  enum insn_code icode;
> +  machine_mode emode = GET_MODE_INNER (vmode);
> +
> +  icode = direct_optab_handler (vec_series_optab, vmode);
> +  gcc_assert (icode != CODE_FOR_nothing);
> +
> +  create_output_operand (&ops[0], target, vmode);
> +  create_input_operand (&ops[1], op0, emode);
> +  create_input_operand (&ops[2], op1, emode);
> +
> +  expand_insn (icode, 3, ops);
> +  return ops[0].value;
> +}
> +
>  /* Generate insns for a vector comparison into a mask.  */
>
>  rtx
> Index: gcc/optabs-tree.c
> ===================================================================
> --- gcc/optabs-tree.c   2017-10-23 11:41:51.768165374 +0100
> +++ gcc/optabs-tree.c   2017-10-23 11:42:34.918720660 +0100
> @@ -213,6 +213,9 @@ optab_for_tree_code (enum tree_code code
>      case VEC_DUPLICATE_EXPR:
>        return vec_duplicate_optab;
>
> +    case VEC_SERIES_EXPR:
> +      return vec_series_optab;
> +
>      default:
>        break;
>      }
> Index: gcc/tree-cfg.c
> ===================================================================
> --- gcc/tree-cfg.c      2017-10-23 11:41:51.770094616 +0100
> +++ gcc/tree-cfg.c      2017-10-23 11:42:34.920720660 +0100
> @@ -4119,6 +4119,23 @@ verify_gimple_assign_binary (gassign *st
>        /* Continue with generic binary expression handling.  */
>        break;
>
> +    case VEC_SERIES_EXPR:
> +      if (!useless_type_conversion_p (rhs1_type, rhs2_type))
> +       {
> +         error ("type mismatch in series expression");
> +         debug_generic_expr (rhs1_type);
> +         debug_generic_expr (rhs2_type);
> +         return true;
> +       }
> +      if (TREE_CODE (lhs_type) != VECTOR_TYPE
> +         || !useless_type_conversion_p (TREE_TYPE (lhs_type), rhs1_type))
> +       {
> +         error ("vector type expected in series expression");
> +         debug_generic_expr (lhs_type);
> +         return true;
> +       }
> +      return false;
> +
>      default:
>        gcc_unreachable ();
>      }
> @@ -4485,6 +4502,7 @@ verify_gimple_assign_single (gassign *st
>      case COMPLEX_CST:
>      case VECTOR_CST:
>      case VEC_DUPLICATE_CST:
> +    case VEC_SERIES_CST:
>      case STRING_CST:
>        return res;
>
> Index: gcc/tree-vect-generic.c
> ===================================================================
> --- gcc/tree-vect-generic.c     2017-10-23 11:41:51.773953100 +0100
> +++ gcc/tree-vect-generic.c     2017-10-23 11:42:34.922720660 +0100
> @@ -1595,7 +1595,8 @@ expand_vector_operations_1 (gimple_stmt_
>    if (rhs_class == GIMPLE_BINARY_RHS)
>      rhs2 = gimple_assign_rhs2 (stmt);
>
> -  if (TREE_CODE (type) != VECTOR_TYPE)
> +  if (!VECTOR_TYPE_P (type)
> +      || !VECTOR_TYPE_P (TREE_TYPE (rhs1)))
>      return;
>
>    /* If the vector operation is operating on all same vector elements

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [06/nn] Add VEC_SERIES_{CST,EXPR} and associated optab
  2017-10-26 12:26   ` Richard Biener
@ 2017-10-26 12:43     ` Richard Biener
  2017-11-06 15:21       ` Richard Sandiford
  0 siblings, 1 reply; 90+ messages in thread
From: Richard Biener @ 2017-10-26 12:43 UTC (permalink / raw)
  To: GCC Patches, Richard Sandiford

On Thu, Oct 26, 2017 at 2:23 PM, Richard Biener
<richard.guenther@gmail.com> wrote:
> On Mon, Oct 23, 2017 at 1:20 PM, Richard Sandiford
> <richard.sandiford@linaro.org> wrote:
>> Similarly to VEC_DUPLICATE_{CST,EXPR}, this patch adds two
>> tree code equivalents of the VEC_SERIES rtx code.  VEC_SERIES_EXPR
>> is for non-constant inputs and is a normal tcc_binary.  VEC_SERIES_CST
>> is a tcc_constant.
>>
>> Like VEC_DUPLICATE_CST, VEC_SERIES_CST is only used for variable-length
>> vectors.  This avoids the need to handle combinations of VECTOR_CST
>> and VEC_SERIES_CST.
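
(As a concrete picture -- this just restates the patch, nothing new: on
GIMPLE the non-constant form dumps as

   vect_iv = VEC_SERIES_EXPR <base_3, step_4>;

with base_3/step_4 standing in for whatever SSA names feed it, and the
constant form dumps as { 0, +, 1}, i.e. element i is 0 + i * 1.)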
>
> Similar to the other patch, can you document and verify that VEC_SERIES_CST
> is only used on variable-length vectors?
>
> Ok with that change.
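
For the verify part a checking assert when building the node would do --
minimal sketch only, with VECTOR_SUBPARTS_VARIABLE_P standing in for
whatever predicate the variable-length work ends up providing:

    /* In build_vec_series_cst, before allocating the node: fixed-length
       series constants should stay VECTOR_CSTs.  */
    gcc_checking_assert (VECTOR_SUBPARTS_VARIABLE_P (type));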

Btw, did you think of merging VEC_DUPLICATE_CST with VEC_SERIES_CST
via setting step == 0?  I think you can do {1, 1, 1, 1, ...} + {1, 2, 3, 4, 5}
constant folding but you don't implement that.  Propagation can also turn
VEC_SERIES_EXPR into VEC_SERIES_CST and VEC_DUPLICATE_EXPR
into VEC_DUPLICATE_CST (didn't see the former, don't remember the latter).
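
In (base, step) terms that fold is just element-wise; a sketch, reusing
vec_series_equivalent_p from the patch rather than inventing new API:

    tree base1, step1, base2, step2;
    if ((code == PLUS_EXPR || code == MINUS_EXPR)
        && vec_series_equivalent_p (arg1, &base1, &step1)
        && vec_series_equivalent_p (arg2, &base2, &step2))
      {
        /* E.g. {1, 1, 1, 1, ...} + {1, 2, 3, 4, ...}
           = series (1 + 1, 0 + 1) = {2, 3, 4, 5, ...}.  */
        tree base = const_binop (code, base1, base2);
        tree step = const_binop (code, step1, step2);
        if (base && step)
          return build_vec_series (TREE_TYPE (arg1), base, step);
      }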

Richard.

> Thanks,
> Richard.
>
>>
>> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
>>             Alan Hayward  <alan.hayward@arm.com>
>>             David Sherwood  <david.sherwood@arm.com>
>>
>> gcc/
>>         * doc/generic.texi (VEC_SERIES_CST, VEC_SERIES_EXPR): Document.
>>         * doc/md.texi (vec_series@var{m}): Document.
>>         * tree.def (VEC_SERIES_CST, VEC_SERIES_EXPR): New tree codes.
>>         * tree.h (TREE_OVERFLOW): Add VEC_SERIES_CST to the list of valid
>>         codes.
>>         (VEC_SERIES_CST_BASE, VEC_SERIES_CST_STEP): New macros.
>>         (build_vec_series_cst, build_vec_series): Declare.
>>         * tree.c (tree_node_structure_for_code, tree_code_size, tree_size)
>>         (add_expr, walk_tree_1, drop_tree_overflow): Handle VEC_SERIES_CST.
>>         (build_vec_series_cst, build_vec_series): New functions.
>>         * cfgexpand.c (expand_debug_expr): Handle the new codes.
>>         * tree-pretty-print.c (dump_generic_node): Likewise.
>>         * dwarf2out.c (rtl_for_decl_init): Handle VEC_SERIES_CST.
>>         * gimple-expr.h (is_gimple_constant): Likewise.
>>         * gimplify.c (gimplify_expr): Likewise.
>>         * graphite-scop-detection.c (scan_tree_for_params): Likewise.
>>         * ipa-icf-gimple.c (func_checker::compare_cst_or_decl): Likewise.
>>         (func_checker::compare_operand): Likewise.
>>         * ipa-icf.c (sem_item::add_expr, sem_variable::equals): Likewise.
>>         * print-tree.c (print_node): Likewise.
>>         * tree-ssa-loop.c (for_each_index): Likewise.
>>         * tree-ssa-pre.c (create_component_ref_by_pieces_1): Likewise.
>>         * tree-ssa-sccvn.c (copy_reference_ops_from_ref): Likewise.
>>         (ao_ref_init_from_vn_reference): Likewise.
>>         * varasm.c (const_hash_1, compare_constant): Likewise.
>>         * fold-const.c (negate_expr_p, fold_negate_expr_1, operand_equal_p)
>>         (fold_checksum_tree): Likewise.
>>         (vec_series_equivalent_p): New function.
>>         (const_binop): Use it.  Fold VEC_SERIES_EXPRs of constants.
>>         * expmed.c (make_tree): Handle VEC_SERIES.
>>         * gimple-pretty-print.c (dump_binary_rhs): Likewise.
>>         * tree-inline.c (estimate_operator_cost): Likewise.
>>         * expr.c (const_vector_element): Include VEC_SERIES_CST in comment.
>>         (expand_expr_real_2): Handle VEC_SERIES_EXPR.
>>         (expand_expr_real_1): Handle VEC_SERIES_CST.
>>         * optabs.def (vec_series_optab): New optab.
>>         * optabs.h (expand_vec_series_expr): Declare.
>>         * optabs.c (expand_vec_series_expr): New function.
>>         * optabs-tree.c (optab_for_tree_code): Handle VEC_SERIES_EXPR.
>>         * tree-cfg.c (verify_gimple_assign_binary): Handle VEC_SERIES_EXPR.
>>         (verify_gimple_assign_single): Handle VEC_SERIES_CST.
>>         * tree-vect-generic.c (expand_vector_operations_1): Check that
>>         the operands also have vector type.
>>
>> Index: gcc/doc/generic.texi
>> ===================================================================
>> --- gcc/doc/generic.texi        2017-10-23 11:41:51.760448406 +0100
>> +++ gcc/doc/generic.texi        2017-10-23 11:42:34.910720660 +0100
>> @@ -1037,6 +1037,7 @@ As this example indicates, the operands
>>  @tindex COMPLEX_CST
>>  @tindex VECTOR_CST
>>  @tindex VEC_DUPLICATE_CST
>> +@tindex VEC_SERIES_CST
>>  @tindex STRING_CST
>>  @findex TREE_STRING_LENGTH
>>  @findex TREE_STRING_POINTER
>> @@ -1098,6 +1099,16 @@ instead.  The scalar element value is gi
>>  @code{VEC_DUPLICATE_CST_ELT} and has the same restrictions as the
>>  element of a @code{VECTOR_CST}.
>>
>> +@item VEC_SERIES_CST
>> +These nodes represent a vector constant in which element @var{i}
>> +has the value @samp{@var{base} + @var{i} * @var{step}}, for some
>> +constant @var{base} and @var{step}.  The value of @var{base} is
>> +given by @code{VEC_SERIES_CST_BASE} and the value of @var{step} is
>> +given by @code{VEC_SERIES_CST_STEP}.
>> +
>> +These nodes are restricted to integral types, in order to avoid
>> +specifying the rounding behavior for floating-point types.
>> +
>>  @item STRING_CST
>>  These nodes represent string-constants.  The @code{TREE_STRING_LENGTH}
>>  returns the length of the string, as an @code{int}.  The
>> @@ -1702,6 +1713,7 @@ a value from @code{enum annot_expr_kind}
>>  @node Vectors
>>  @subsection Vectors
>>  @tindex VEC_DUPLICATE_EXPR
>> +@tindex VEC_SERIES_EXPR
>>  @tindex VEC_LSHIFT_EXPR
>>  @tindex VEC_RSHIFT_EXPR
>>  @tindex VEC_WIDEN_MULT_HI_EXPR
>> @@ -1721,6 +1733,14 @@ a value from @code{enum annot_expr_kind}
>>  This node has a single operand and represents a vector in which every
>>  element is equal to that operand.
>>
>> +@item VEC_SERIES_EXPR
>> +This node represents a vector formed from a scalar base and step,
>> +given as the first and second operands respectively.  Element @var{i}
>> +of the result is equal to @samp{@var{base} + @var{i}*@var{step}}.
>> +
>> +This node is restricted to integral types, in order to avoid
>> +specifying the rounding behavior for floating-point types.
>> +
>>  @item VEC_LSHIFT_EXPR
>>  @itemx VEC_RSHIFT_EXPR
>>  These nodes represent whole vector left and right shifts, respectively.
>> Index: gcc/doc/md.texi
>> ===================================================================
>> --- gcc/doc/md.texi     2017-10-23 11:41:51.761413027 +0100
>> +++ gcc/doc/md.texi     2017-10-23 11:42:34.911720660 +0100
>> @@ -4899,6 +4899,19 @@ vectors go through the @code{mov@var{m}}
>>
>>  This pattern is not allowed to @code{FAIL}.
>>
>> +@cindex @code{vec_series@var{m}} instruction pattern
>> +@item @samp{vec_series@var{m}}
>> +Initialize vector output operand 0 so that element @var{i} is equal to
>> +operand 1 plus @var{i} times operand 2.  In other words, create a linear
>> +series whose base value is operand 1 and whose step is operand 2.
>> +
>> +The vector output has mode @var{m} and the scalar inputs have the mode
>> +appropriate for one element of @var{m}.  This pattern is not used for
>> +floating-point vectors, in order to avoid having to specify the
>> +rounding behavior for @var{i} > 1.
>> +
>> +This pattern is not allowed to @code{FAIL}.
>> +
>>  @cindex @code{vec_cmp@var{m}@var{n}} instruction pattern
>>  @item @samp{vec_cmp@var{m}@var{n}}
>>  Output a vector comparison.  Operand 0 of mode @var{n} is the destination for
>> Index: gcc/tree.def
>> ===================================================================
>> --- gcc/tree.def        2017-10-23 11:41:51.774917721 +0100
>> +++ gcc/tree.def        2017-10-23 11:42:34.924720660 +0100
>> @@ -308,6 +308,10 @@ DEFTREECODE (VECTOR_CST, "vector_cst", t
>>     VEC_DUPLICATE_CST_ELT.  */
>>  DEFTREECODE (VEC_DUPLICATE_CST, "vec_duplicate_cst", tcc_constant, 0)
>>
>> +/* Represents a vector constant in which element i is equal to
>> +   VEC_SERIES_CST_BASE + i * VEC_SERIES_CST_STEP.  */
>> +DEFTREECODE (VEC_SERIES_CST, "vec_series_cst", tcc_constant, 0)
>> +
>>  /* Contents are TREE_STRING_LENGTH and the actual contents of the string.  */
>>  DEFTREECODE (STRING_CST, "string_cst", tcc_constant, 0)
>>
>> @@ -541,6 +545,16 @@ DEFTREECODE (COND_EXPR, "cond_expr", tcc
>>  /* Represents a vector in which every element is equal to operand 0.  */
>>  DEFTREECODE (VEC_DUPLICATE_EXPR, "vec_duplicate_expr", tcc_unary, 1)
>>
>> +/* Vector series created from a start (base) value and a step.
>> +
>> +   A = VEC_SERIES_EXPR (B, C)
>> +
>> +   means
>> +
>> +   for (i = 0; i < N; i++)
>> +     A[i] = B + C * i;  */
>> +DEFTREECODE (VEC_SERIES_EXPR, "vec_series_expr", tcc_binary, 2)
>> +
>>  /* Vector conditional expression. It is like COND_EXPR, but with
>>     vector operands.
>>
>> Index: gcc/tree.h
>> ===================================================================
>> --- gcc/tree.h  2017-10-23 11:41:51.775882341 +0100
>> +++ gcc/tree.h  2017-10-23 11:42:34.925720660 +0100
>> @@ -730,8 +730,8 @@ #define TREE_SYMBOL_REFERENCED(NODE) \
>>  #define TYPE_REF_CAN_ALIAS_ALL(NODE) \
>>    (PTR_OR_REF_CHECK (NODE)->base.static_flag)
>>
>> -/* In an INTEGER_CST, REAL_CST, COMPLEX_CST, VECTOR_CST or VEC_DUPLICATE_CST,
>> -   this means there was an overflow in folding.  */
>> +/* In an INTEGER_CST, REAL_CST, COMPLEX_CST, VECTOR_CST, VEC_DUPLICATE_CST
>> +   or VEC_SERIES_CST, this means there was an overflow in folding.  */
>>
>>  #define TREE_OVERFLOW(NODE) (CST_CHECK (NODE)->base.public_flag)
>>
>> @@ -1034,6 +1034,12 @@ #define VECTOR_CST_ELT(NODE,IDX) (VECTOR
>>  #define VEC_DUPLICATE_CST_ELT(NODE) \
>>    (VEC_DUPLICATE_CST_CHECK (NODE)->vector.elts[0])
>>
>> +/* In a VEC_SERIES_CST node.  */
>> +#define VEC_SERIES_CST_BASE(NODE) \
>> +  (VEC_SERIES_CST_CHECK (NODE)->vector.elts[0])
>> +#define VEC_SERIES_CST_STEP(NODE) \
>> +  (VEC_SERIES_CST_CHECK (NODE)->vector.elts[1])
>> +
>>  /* Define fields and accessors for some special-purpose tree nodes.  */
>>
>>  #define IDENTIFIER_LENGTH(NODE) \
>> @@ -4030,9 +4036,11 @@ extern tree build_int_cstu (tree type, u
>>  extern tree build_int_cst_type (tree, HOST_WIDE_INT);
>>  extern tree make_vector (unsigned CXX_MEM_STAT_INFO);
>>  extern tree build_vec_duplicate_cst (tree, tree CXX_MEM_STAT_INFO);
>> +extern tree build_vec_series_cst (tree, tree, tree CXX_MEM_STAT_INFO);
>>  extern tree build_vector (tree, vec<tree> CXX_MEM_STAT_INFO);
>>  extern tree build_vector_from_ctor (tree, vec<constructor_elt, va_gc> *);
>>  extern tree build_vector_from_val (tree, tree);
>> +extern tree build_vec_series (tree, tree, tree);
>>  extern void recompute_constructor_flags (tree);
>>  extern void verify_constructor_flags (tree);
>>  extern tree build_constructor (tree, vec<constructor_elt, va_gc> *);
>> Index: gcc/tree.c
>> ===================================================================
>> --- gcc/tree.c  2017-10-23 11:41:51.774917721 +0100
>> +++ gcc/tree.c  2017-10-23 11:42:34.924720660 +0100
>> @@ -465,6 +465,7 @@ tree_node_structure_for_code (enum tree_
>>      case COMPLEX_CST:          return TS_COMPLEX;
>>      case VECTOR_CST:           return TS_VECTOR;
>>      case VEC_DUPLICATE_CST:    return TS_VECTOR;
>> +    case VEC_SERIES_CST:       return TS_VECTOR;
>>      case STRING_CST:           return TS_STRING;
>>        /* tcc_exceptional cases.  */
>>      case ERROR_MARK:           return TS_COMMON;
>> @@ -818,6 +819,8 @@ tree_code_size (enum tree_code code)
>>         case COMPLEX_CST:       return sizeof (struct tree_complex);
>>         case VECTOR_CST:        return sizeof (struct tree_vector);
>>         case VEC_DUPLICATE_CST: return sizeof (struct tree_vector);
>> +       case VEC_SERIES_CST:
>> +         return sizeof (struct tree_vector) + sizeof (tree);
>>         case STRING_CST:        gcc_unreachable ();
>>         default:
>>           return lang_hooks.tree_size (code);
>> @@ -880,6 +883,9 @@ tree_size (const_tree node)
>>      case VEC_DUPLICATE_CST:
>>        return sizeof (struct tree_vector);
>>
>> +    case VEC_SERIES_CST:
>> +      return sizeof (struct tree_vector) + sizeof (tree);
>> +
>>      case STRING_CST:
>>        return TREE_STRING_LENGTH (node) + offsetof (struct tree_string, str) + 1;
>>
>> @@ -1711,6 +1717,31 @@ build_vec_duplicate_cst (tree type, tree
>>    return t;
>>  }
>>
>> +/* Build a new VEC_SERIES_CST with type TYPE, base BASE and step STEP.
>> +
>> +   Note that this function is only suitable for callers that specifically
>> +   need a VEC_SERIES_CST node.  Use build_vec_series to build a general
>> +   series vector from a general base and step.  */
>> +
>> +tree
>> +build_vec_series_cst (tree type, tree base, tree step MEM_STAT_DECL)
>> +{
>> +  int length = sizeof (struct tree_vector) + sizeof (tree);
>> +
>> +  record_node_allocation_statistics (VEC_SERIES_CST, length);
>> +
>> +  tree t = ggc_alloc_cleared_tree_node_stat (length PASS_MEM_STAT);
>> +
>> +  TREE_SET_CODE (t, VEC_SERIES_CST);
>> +  TREE_TYPE (t) = type;
>> +  t->base.u.nelts = 2;
>> +  VEC_SERIES_CST_BASE (t) = base;
>> +  VEC_SERIES_CST_STEP (t) = step;
>> +  TREE_CONSTANT (t) = 1;
>> +
>> +  return t;
>> +}
>> +
>>  /* Build a newly constructed VECTOR_CST node of length LEN.  */
>>
>>  tree
>> @@ -1821,6 +1852,19 @@ build_vector_from_val (tree vectype, tre
>>      }
>>  }
>>
>> +/* Build a vector series of type TYPE in which element I has the value
>> +   BASE + I * STEP.  */
>> +
>> +tree
>> +build_vec_series (tree type, tree base, tree step)
>> +{
>> +  if (integer_zerop (step))
>> +    return build_vector_from_val (type, base);
>> +  if (CONSTANT_CLASS_P (base) && CONSTANT_CLASS_P (step))
>> +    return build_vec_series_cst (type, base, step);
>> +  return build2 (VEC_SERIES_EXPR, type, base, step);
>> +}
>> +
>>  /* Something has messed with the elements of CONSTRUCTOR C after it was built;
>>     calculate TREE_CONSTANT and TREE_SIDE_EFFECTS.  */
>>
>> @@ -7136,6 +7180,10 @@ add_expr (const_tree t, inchash::hash &h
>>      case VEC_DUPLICATE_CST:
>>        inchash::add_expr (VEC_DUPLICATE_CST_ELT (t), hstate);
>>        return;
>> +    case VEC_SERIES_CST:
>> +      inchash::add_expr (VEC_SERIES_CST_BASE (t), hstate);
>> +      inchash::add_expr (VEC_SERIES_CST_STEP (t), hstate);
>> +      return;
>>      case SSA_NAME:
>>        /* We can just compare by pointer.  */
>>        hstate.add_wide_int (SSA_NAME_VERSION (t));
>> @@ -11150,6 +11198,7 @@ #define WALK_SUBTREE_TAIL(NODE)                         \
>>      case FIXED_CST:
>>      case VECTOR_CST:
>>      case VEC_DUPLICATE_CST:
>> +    case VEC_SERIES_CST:
>>      case STRING_CST:
>>      case BLOCK:
>>      case PLACEHOLDER_EXPR:
>> @@ -12442,6 +12491,15 @@ drop_tree_overflow (tree t)
>>        if (TREE_OVERFLOW (*elt))
>>         *elt = drop_tree_overflow (*elt);
>>      }
>> +  if (TREE_CODE (t) == VEC_SERIES_CST)
>> +    {
>> +      tree *elt = &VEC_SERIES_CST_BASE (t);
>> +      if (TREE_OVERFLOW (*elt))
>> +       *elt = drop_tree_overflow (*elt);
>> +      elt = &VEC_SERIES_CST_STEP (t);
>> +      if (TREE_OVERFLOW (*elt))
>> +       *elt = drop_tree_overflow (*elt);
>> +    }
>>    return t;
>>  }
>>
>> Index: gcc/cfgexpand.c
>> ===================================================================
>> --- gcc/cfgexpand.c     2017-10-23 11:41:51.760448406 +0100
>> +++ gcc/cfgexpand.c     2017-10-23 11:42:34.909720660 +0100
>> @@ -5051,6 +5051,8 @@ expand_debug_expr (tree exp)
>>      case VEC_PERM_EXPR:
>>      case VEC_DUPLICATE_CST:
>>      case VEC_DUPLICATE_EXPR:
>> +    case VEC_SERIES_CST:
>> +    case VEC_SERIES_EXPR:
>>        return NULL;
>>
>>      /* Misc codes.  */
>> Index: gcc/tree-pretty-print.c
>> ===================================================================
>> --- gcc/tree-pretty-print.c     2017-10-23 11:41:51.772023858 +0100
>> +++ gcc/tree-pretty-print.c     2017-10-23 11:42:34.921720660 +0100
>> @@ -1808,6 +1808,14 @@ dump_generic_node (pretty_printer *pp, t
>>        pp_string (pp, ", ... }");
>>        break;
>>
>> +    case VEC_SERIES_CST:
>> +      pp_string (pp, "{ ");
>> +      dump_generic_node (pp, VEC_SERIES_CST_BASE (node), spc, flags, false);
>> +      pp_string (pp, ", +, ");
>> +      dump_generic_node (pp, VEC_SERIES_CST_STEP (node), spc, flags, false);
>> +      pp_string (pp, "}");
>> +      break;
>> +
>>      case FUNCTION_TYPE:
>>      case METHOD_TYPE:
>>        dump_generic_node (pp, TREE_TYPE (node), spc, flags, false);
>> @@ -3221,6 +3229,7 @@ dump_generic_node (pretty_printer *pp, t
>>        pp_string (pp, " > ");
>>        break;
>>
>> +    case VEC_SERIES_EXPR:
>>      case VEC_WIDEN_MULT_HI_EXPR:
>>      case VEC_WIDEN_MULT_LO_EXPR:
>>      case VEC_WIDEN_MULT_EVEN_EXPR:
>> Index: gcc/dwarf2out.c
>> ===================================================================
>> --- gcc/dwarf2out.c     2017-10-23 11:41:51.763342269 +0100
>> +++ gcc/dwarf2out.c     2017-10-23 11:42:34.913720660 +0100
>> @@ -18863,6 +18863,7 @@ rtl_for_decl_init (tree init, tree type)
>>           {
>>           case VECTOR_CST:
>>           case VEC_DUPLICATE_CST:
>> +         case VEC_SERIES_CST:
>>             break;
>>           case CONSTRUCTOR:
>>             if (TREE_CONSTANT (init))
>> Index: gcc/gimple-expr.h
>> ===================================================================
>> --- gcc/gimple-expr.h   2017-10-23 11:41:51.765271511 +0100
>> +++ gcc/gimple-expr.h   2017-10-23 11:42:34.916720660 +0100
>> @@ -135,6 +135,7 @@ is_gimple_constant (const_tree t)
>>      case COMPLEX_CST:
>>      case VECTOR_CST:
>>      case VEC_DUPLICATE_CST:
>> +    case VEC_SERIES_CST:
>>      case STRING_CST:
>>        return true;
>>
>> Index: gcc/gimplify.c
>> ===================================================================
>> --- gcc/gimplify.c      2017-10-23 11:41:51.766236132 +0100
>> +++ gcc/gimplify.c      2017-10-23 11:42:34.917720660 +0100
>> @@ -11507,6 +11507,7 @@ gimplify_expr (tree *expr_p, gimple_seq
>>         case COMPLEX_CST:
>>         case VECTOR_CST:
>>         case VEC_DUPLICATE_CST:
>> +       case VEC_SERIES_CST:
>>           /* Drop the overflow flag on constants, we do not want
>>              that in the GIMPLE IL.  */
>>           if (TREE_OVERFLOW_P (*expr_p))
>> Index: gcc/graphite-scop-detection.c
>> ===================================================================
>> --- gcc/graphite-scop-detection.c       2017-10-23 11:41:51.767200753 +0100
>> +++ gcc/graphite-scop-detection.c       2017-10-23 11:42:34.917720660 +0100
>> @@ -1244,6 +1244,7 @@ scan_tree_for_params (sese_info_p s, tre
>>      case COMPLEX_CST:
>>      case VECTOR_CST:
>>      case VEC_DUPLICATE_CST:
>> +    case VEC_SERIES_CST:
>>        break;
>>
>>     default:
>> Index: gcc/ipa-icf-gimple.c
>> ===================================================================
>> --- gcc/ipa-icf-gimple.c        2017-10-23 11:41:51.767200753 +0100
>> +++ gcc/ipa-icf-gimple.c        2017-10-23 11:42:34.917720660 +0100
>> @@ -334,6 +334,7 @@ func_checker::compare_cst_or_decl (tree
>>      case COMPLEX_CST:
>>      case VECTOR_CST:
>>      case VEC_DUPLICATE_CST:
>> +    case VEC_SERIES_CST:
>>      case STRING_CST:
>>      case REAL_CST:
>>        {
>> @@ -530,6 +531,7 @@ func_checker::compare_operand (tree t1,
>>      case COMPLEX_CST:
>>      case VECTOR_CST:
>>      case VEC_DUPLICATE_CST:
>> +    case VEC_SERIES_CST:
>>      case STRING_CST:
>>      case REAL_CST:
>>      case FUNCTION_DECL:
>> Index: gcc/ipa-icf.c
>> ===================================================================
>> --- gcc/ipa-icf.c       2017-10-23 11:41:51.768165374 +0100
>> +++ gcc/ipa-icf.c       2017-10-23 11:42:34.918720660 +0100
>> @@ -1479,6 +1479,7 @@ sem_item::add_expr (const_tree exp, inch
>>      case COMPLEX_CST:
>>      case VECTOR_CST:
>>      case VEC_DUPLICATE_CST:
>> +    case VEC_SERIES_CST:
>>        inchash::add_expr (exp, hstate);
>>        break;
>>      case CONSTRUCTOR:
>> @@ -2034,6 +2035,11 @@ sem_variable::equals (tree t1, tree t2)
>>      case VEC_DUPLICATE_CST:
>>        return sem_variable::equals (VEC_DUPLICATE_CST_ELT (t1),
>>                                    VEC_DUPLICATE_CST_ELT (t2));
>> +     case VEC_SERIES_CST:
>> +       return (sem_variable::equals (VEC_SERIES_CST_BASE (t1),
>> +                                    VEC_SERIES_CST_BASE (t2))
>> +              && sem_variable::equals (VEC_SERIES_CST_STEP (t1),
>> +                                       VEC_SERIES_CST_STEP (t2)));
>>      case ARRAY_REF:
>>      case ARRAY_RANGE_REF:
>>        {
>> Index: gcc/print-tree.c
>> ===================================================================
>> --- gcc/print-tree.c    2017-10-23 11:41:51.769129995 +0100
>> +++ gcc/print-tree.c    2017-10-23 11:42:34.919720660 +0100
>> @@ -787,6 +787,11 @@ print_node (FILE *file, const char *pref
>>           print_node (file, "elt", VEC_DUPLICATE_CST_ELT (node), indent + 4);
>>           break;
>>
>> +       case VEC_SERIES_CST:
>> +         print_node (file, "base", VEC_SERIES_CST_BASE (node), indent + 4);
>> +         print_node (file, "step", VEC_SERIES_CST_STEP (node), indent + 4);
>> +         break;
>> +
>>         case COMPLEX_CST:
>>           print_node (file, "real", TREE_REALPART (node), indent + 4);
>>           print_node (file, "imag", TREE_IMAGPART (node), indent + 4);
>> Index: gcc/tree-ssa-loop.c
>> ===================================================================
>> --- gcc/tree-ssa-loop.c 2017-10-23 11:41:51.772023858 +0100
>> +++ gcc/tree-ssa-loop.c 2017-10-23 11:42:34.921720660 +0100
>> @@ -617,6 +617,7 @@ for_each_index (tree *addr_p, bool (*cbc
>>         case RESULT_DECL:
>>         case VECTOR_CST:
>>         case VEC_DUPLICATE_CST:
>> +       case VEC_SERIES_CST:
>>         case COMPLEX_CST:
>>         case INTEGER_CST:
>>         case REAL_CST:
>> Index: gcc/tree-ssa-pre.c
>> ===================================================================
>> --- gcc/tree-ssa-pre.c  2017-10-23 11:41:51.772023858 +0100
>> +++ gcc/tree-ssa-pre.c  2017-10-23 11:42:34.922720660 +0100
>> @@ -2676,6 +2676,7 @@ create_component_ref_by_pieces_1 (basic_
>>      case COMPLEX_CST:
>>      case VECTOR_CST:
>>      case VEC_DUPLICATE_CST:
>> +    case VEC_SERIES_CST:
>>      case REAL_CST:
>>      case CONSTRUCTOR:
>>      case VAR_DECL:
>> Index: gcc/tree-ssa-sccvn.c
>> ===================================================================
>> --- gcc/tree-ssa-sccvn.c        2017-10-23 11:41:51.773953100 +0100
>> +++ gcc/tree-ssa-sccvn.c        2017-10-23 11:42:34.922720660 +0100
>> @@ -859,6 +859,7 @@ copy_reference_ops_from_ref (tree ref, v
>>         case COMPLEX_CST:
>>         case VECTOR_CST:
>>         case VEC_DUPLICATE_CST:
>> +       case VEC_SERIES_CST:
>>         case REAL_CST:
>>         case FIXED_CST:
>>         case CONSTRUCTOR:
>> @@ -1052,6 +1053,7 @@ ao_ref_init_from_vn_reference (ao_ref *r
>>         case COMPLEX_CST:
>>         case VECTOR_CST:
>>         case VEC_DUPLICATE_CST:
>> +       case VEC_SERIES_CST:
>>         case REAL_CST:
>>         case CONSTRUCTOR:
>>         case CONST_DECL:
>> Index: gcc/varasm.c
>> ===================================================================
>> --- gcc/varasm.c        2017-10-23 11:41:51.775882341 +0100
>> +++ gcc/varasm.c        2017-10-23 11:42:34.927720660 +0100
>> @@ -3065,6 +3065,10 @@ const_hash_1 (const tree exp)
>>        return (const_hash_1 (TREE_OPERAND (exp, 0)) * 9
>>               + const_hash_1 (TREE_OPERAND (exp, 1)));
>>
>> +    case VEC_SERIES_CST:
>> +      return (const_hash_1 (VEC_SERIES_CST_BASE (exp)) * 11
>> +             + const_hash_1 (VEC_SERIES_CST_STEP (exp)));
>> +
>>      CASE_CONVERT:
>>        return const_hash_1 (TREE_OPERAND (exp, 0)) * 7 + 2;
>>
>> @@ -3165,6 +3169,12 @@ compare_constant (const tree t1, const t
>>        return compare_constant (VEC_DUPLICATE_CST_ELT (t1),
>>                                VEC_DUPLICATE_CST_ELT (t2));
>>
>> +    case VEC_SERIES_CST:
>> +      return (compare_constant (VEC_SERIES_CST_BASE (t1),
>> +                               VEC_SERIES_CST_BASE (t2))
>> +             && compare_constant (VEC_SERIES_CST_STEP (t1),
>> +                                  VEC_SERIES_CST_STEP (t2)));
>> +
>>      case CONSTRUCTOR:
>>        {
>>         vec<constructor_elt, va_gc> *v1, *v2;
>> Index: gcc/fold-const.c
>> ===================================================================
>> --- gcc/fold-const.c    2017-10-23 11:41:51.765271511 +0100
>> +++ gcc/fold-const.c    2017-10-23 11:42:34.916720660 +0100
>> @@ -421,6 +421,10 @@ negate_expr_p (tree t)
>>      case VEC_DUPLICATE_CST:
>>        return negate_expr_p (VEC_DUPLICATE_CST_ELT (t));
>>
>> +    case VEC_SERIES_CST:
>> +      return (negate_expr_p (VEC_SERIES_CST_BASE (t))
>> +             && negate_expr_p (VEC_SERIES_CST_STEP (t)));
>> +
>>      case COMPLEX_EXPR:
>>        return negate_expr_p (TREE_OPERAND (t, 0))
>>              && negate_expr_p (TREE_OPERAND (t, 1));
>> @@ -590,6 +594,17 @@ fold_negate_expr_1 (location_t loc, tree
>>         return build_vector_from_val (type, sub);
>>        }
>>
>> +    case VEC_SERIES_CST:
>> +      {
>> +       tree neg_base = fold_negate_expr (loc, VEC_SERIES_CST_BASE (t));
>> +       if (!neg_base)
>> +         return NULL_TREE;
>> +       tree neg_step = fold_negate_expr (loc, VEC_SERIES_CST_STEP (t));
>> +       if (!neg_step)
>> +         return NULL_TREE;
>> +       return build_vec_series (type, neg_base, neg_step);
>> +      }
>> +
>>      case COMPLEX_EXPR:
>>        if (negate_expr_p (t))
>>         return fold_build2_loc (loc, COMPLEX_EXPR, type,
>> @@ -1131,6 +1146,28 @@ int_const_binop (enum tree_code code, co
>>    return int_const_binop_1 (code, arg1, arg2, 1);
>>  }
>>
>> +/* Return true if EXP is a VEC_DUPLICATE_CST or a VEC_SERIES_CST,
>> +   and if so express it as a linear series in *BASE_OUT and *STEP_OUT.
>> +   The step will be zero for VEC_DUPLICATE_CST.  */
>> +
>> +static bool
>> +vec_series_equivalent_p (const_tree exp, tree *base_out, tree *step_out)
>> +{
>> +  if (TREE_CODE (exp) == VEC_SERIES_CST)
>> +    {
>> +      *base_out = VEC_SERIES_CST_BASE (exp);
>> +      *step_out = VEC_SERIES_CST_STEP (exp);
>> +      return true;
>> +    }
>> +  if (TREE_CODE (exp) == VEC_DUPLICATE_CST)
>> +    {
>> +      *base_out = VEC_DUPLICATE_CST_ELT (exp);
>> +      *step_out = build_zero_cst (TREE_TYPE (*base_out));
>> +      return true;
>> +    }
>> +  return false;
>> +}
>> +
>>  /* Combine two constants ARG1 and ARG2 under operation CODE to produce a new
>>     constant.  We assume ARG1 and ARG2 have the same data type, or at least
>>     are the same kind of constant and the same machine mode.  Return zero if
>> @@ -1457,6 +1494,20 @@ const_binop (enum tree_code code, tree a
>>        return build_vector_from_val (TREE_TYPE (arg1), sub);
>>      }
>>
>> +  tree base1, step1, base2, step2;
>> +  if ((code == PLUS_EXPR || code == MINUS_EXPR)
>> +      && vec_series_equivalent_p (arg1, &base1, &step1)
>> +      && vec_series_equivalent_p (arg2, &base2, &step2))
>> +    {
>> +      tree new_base = const_binop (code, base1, base2);
>> +      if (!new_base)
>> +       return NULL_TREE;
>> +      tree new_step = const_binop (code, step1, step2);
>> +      if (!new_step)
>> +       return NULL_TREE;
>> +      return build_vec_series (TREE_TYPE (arg1), new_base, new_step);
>> +    }
>> +
>>    /* Shifts allow a scalar offset for a vector.  */
>>    if (TREE_CODE (arg1) == VECTOR_CST
>>        && TREE_CODE (arg2) == INTEGER_CST)
>> @@ -1505,6 +1556,12 @@ const_binop (enum tree_code code, tree t
>>       result as argument put those cases that need it here.  */
>>    switch (code)
>>      {
>> +    case VEC_SERIES_EXPR:
>> +      if (CONSTANT_CLASS_P (arg1)
>> +         && CONSTANT_CLASS_P (arg2))
>> +       return build_vec_series (type, arg1, arg2);
>> +      return NULL_TREE;
>> +
>>      case COMPLEX_EXPR:
>>        if ((TREE_CODE (arg1) == REAL_CST
>>            && TREE_CODE (arg2) == REAL_CST)
>> @@ -3008,6 +3065,12 @@ operand_equal_p (const_tree arg0, const_
>>         return operand_equal_p (VEC_DUPLICATE_CST_ELT (arg0),
>>                                 VEC_DUPLICATE_CST_ELT (arg1), flags);
>>
>> +      case VEC_SERIES_CST:
>> +       return (operand_equal_p (VEC_SERIES_CST_BASE (arg0),
>> +                                VEC_SERIES_CST_BASE (arg1), flags)
>> +               && operand_equal_p (VEC_SERIES_CST_STEP (arg0),
>> +                                   VEC_SERIES_CST_STEP (arg1), flags));
>> +
>>        case COMPLEX_CST:
>>         return (operand_equal_p (TREE_REALPART (arg0), TREE_REALPART (arg1),
>>                                  flags)
>> @@ -12050,6 +12113,10 @@ fold_checksum_tree (const_tree expr, str
>>         case VEC_DUPLICATE_CST:
>>           fold_checksum_tree (VEC_DUPLICATE_CST_ELT (expr), ctx, ht);
>>           break;
>> +       case VEC_SERIES_CST:
>> +         fold_checksum_tree (VEC_SERIES_CST_BASE (expr), ctx, ht);
>> +         fold_checksum_tree (VEC_SERIES_CST_STEP (expr), ctx, ht);
>> +         break;
>>         default:
>>           break;
>>         }
>> Index: gcc/expmed.c
>> ===================================================================
>> --- gcc/expmed.c        2017-10-23 11:41:39.186050437 +0100
>> +++ gcc/expmed.c        2017-10-23 11:42:34.914720660 +0100
>> @@ -5253,6 +5253,13 @@ make_tree (tree type, rtx x)
>>             tree elt_tree = make_tree (TREE_TYPE (type), XEXP (op, 0));
>>             return build_vector_from_val (type, elt_tree);
>>           }
>> +       if (GET_CODE (op) == VEC_SERIES)
>> +         {
>> +           tree itype = TREE_TYPE (type);
>> +           tree base_tree = make_tree (itype, XEXP (op, 0));
>> +           tree step_tree = make_tree (itype, XEXP (op, 1));
>> +           return build_vec_series (type, base_tree, step_tree);
>> +         }
>>         return make_tree (type, op);
>>        }
>>
>> Index: gcc/gimple-pretty-print.c
>> ===================================================================
>> --- gcc/gimple-pretty-print.c   2017-10-23 11:41:25.500318672 +0100
>> +++ gcc/gimple-pretty-print.c   2017-10-23 11:42:34.916720660 +0100
>> @@ -438,6 +438,7 @@ dump_binary_rhs (pretty_printer *buffer,
>>      case VEC_PACK_FIX_TRUNC_EXPR:
>>      case VEC_WIDEN_LSHIFT_HI_EXPR:
>>      case VEC_WIDEN_LSHIFT_LO_EXPR:
>> +    case VEC_SERIES_EXPR:
>>        for (p = get_tree_code_name (code); *p; p++)
>>         pp_character (buffer, TOUPPER (*p));
>>        pp_string (buffer, " <");
>> Index: gcc/tree-inline.c
>> ===================================================================
>> --- gcc/tree-inline.c   2017-10-23 11:41:51.771059237 +0100
>> +++ gcc/tree-inline.c   2017-10-23 11:42:34.921720660 +0100
>> @@ -4003,6 +4003,7 @@ estimate_operator_cost (enum tree_code c
>>      case VEC_WIDEN_LSHIFT_HI_EXPR:
>>      case VEC_WIDEN_LSHIFT_LO_EXPR:
>>      case VEC_DUPLICATE_EXPR:
>> +    case VEC_SERIES_EXPR:
>>
>>        return 1;
>>
>> Index: gcc/expr.c
>> ===================================================================
>> --- gcc/expr.c  2017-10-23 11:41:51.764306890 +0100
>> +++ gcc/expr.c  2017-10-23 11:42:34.915720660 +0100
>> @@ -7704,7 +7704,7 @@ expand_operands (tree exp0, tree exp1, r
>>
>>
>>  /* Expand constant vector element ELT, which has mode MODE.  This is used
>> -   for members of VECTOR_CST and VEC_DUPLICATE_CST.  */
>> +   for members of VECTOR_CST, VEC_DUPLICATE_CST and VEC_SERIES_CST.  */
>>
>>  static rtx
>>  const_vector_element (scalar_mode mode, const_tree elt)
>> @@ -9587,6 +9587,10 @@ #define REDUCE_BIT_FIELD(expr)   (reduce_b
>>        gcc_assert (target);
>>        return target;
>>
>> +    case VEC_SERIES_EXPR:
>> +      expand_operands (treeop0, treeop1, NULL_RTX, &op0, &op1, modifier);
>> +      return expand_vec_series_expr (mode, op0, op1, target);
>> +
>>      case BIT_INSERT_EXPR:
>>        {
>>         unsigned bitpos = tree_to_uhwi (treeop2);
>> @@ -10044,6 +10048,13 @@ expand_expr_real_1 (tree exp, rtx target
>>                                   VEC_DUPLICATE_CST_ELT (exp));
>>        return gen_const_vec_duplicate (mode, op0);
>>
>> +    case VEC_SERIES_CST:
>> +      op0 = const_vector_element (GET_MODE_INNER (mode),
>> +                                 VEC_SERIES_CST_BASE (exp));
>> +      op1 = const_vector_element (GET_MODE_INNER (mode),
>> +                                 VEC_SERIES_CST_STEP (exp));
>> +      return gen_const_vec_series (mode, op0, op1);
>> +
>>      case CONST_DECL:
>>        if (modifier == EXPAND_WRITE)
>>         {
>> Index: gcc/optabs.def
>> ===================================================================
>> --- gcc/optabs.def      2017-10-23 11:41:51.769129995 +0100
>> +++ gcc/optabs.def      2017-10-23 11:42:34.919720660 +0100
>> @@ -366,3 +366,4 @@ OPTAB_D (get_thread_pointer_optab, "get_
>>  OPTAB_D (set_thread_pointer_optab, "set_thread_pointer$I$a")
>>
>>  OPTAB_DC (vec_duplicate_optab, "vec_duplicate$a", VEC_DUPLICATE)
>> +OPTAB_DC (vec_series_optab, "vec_series$a", VEC_SERIES)
>> Index: gcc/optabs.h
>> ===================================================================
>> --- gcc/optabs.h        2017-10-23 11:41:51.769129995 +0100
>> +++ gcc/optabs.h        2017-10-23 11:42:34.919720660 +0100
>> @@ -316,6 +316,9 @@ extern rtx expand_vec_cmp_expr (tree, tr
>>  /* Generate code for VEC_COND_EXPR.  */
>>  extern rtx expand_vec_cond_expr (tree, tree, tree, tree, rtx);
>>
>> +/* Generate code for VEC_SERIES_EXPR.  */
>> +extern rtx expand_vec_series_expr (machine_mode, rtx, rtx, rtx);
>> +
>>  /* Generate code for MULT_HIGHPART_EXPR.  */
>>  extern rtx expand_mult_highpart (machine_mode, rtx, rtx, rtx, bool);
>>
>> Index: gcc/optabs.c
>> ===================================================================
>> --- gcc/optabs.c        2017-10-23 11:41:51.769129995 +0100
>> +++ gcc/optabs.c        2017-10-23 11:42:34.919720660 +0100
>> @@ -5693,6 +5693,27 @@ expand_vec_cond_expr (tree vec_cond_type
>>    return ops[0].value;
>>  }
>>
>> +/* Generate VEC_SERIES_EXPR <OP0, OP1>, returning a value of mode VMODE.
>> +   Use TARGET for the result if nonnull and convenient.  */
>> +
>> +rtx
>> +expand_vec_series_expr (machine_mode vmode, rtx op0, rtx op1, rtx target)
>> +{
>> +  struct expand_operand ops[3];
>> +  enum insn_code icode;
>> +  machine_mode emode = GET_MODE_INNER (vmode);
>> +
>> +  icode = direct_optab_handler (vec_series_optab, vmode);
>> +  gcc_assert (icode != CODE_FOR_nothing);
>> +
>> +  create_output_operand (&ops[0], target, vmode);
>> +  create_input_operand (&ops[1], op0, emode);
>> +  create_input_operand (&ops[2], op1, emode);
>> +
>> +  expand_insn (icode, 3, ops);
>> +  return ops[0].value;
>> +}
>> +
>>  /* Generate insns for a vector comparison into a mask.  */
>>
>>  rtx
>> Index: gcc/optabs-tree.c
>> ===================================================================
>> --- gcc/optabs-tree.c   2017-10-23 11:41:51.768165374 +0100
>> +++ gcc/optabs-tree.c   2017-10-23 11:42:34.918720660 +0100
>> @@ -213,6 +213,9 @@ optab_for_tree_code (enum tree_code code
>>      case VEC_DUPLICATE_EXPR:
>>        return vec_duplicate_optab;
>>
>> +    case VEC_SERIES_EXPR:
>> +      return vec_series_optab;
>> +
>>      default:
>>        break;
>>      }
>> Index: gcc/tree-cfg.c
>> ===================================================================
>> --- gcc/tree-cfg.c      2017-10-23 11:41:51.770094616 +0100
>> +++ gcc/tree-cfg.c      2017-10-23 11:42:34.920720660 +0100
>> @@ -4119,6 +4119,23 @@ verify_gimple_assign_binary (gassign *st
>>        /* Continue with generic binary expression handling.  */
>>        break;
>>
>> +    case VEC_SERIES_EXPR:
>> +      if (!useless_type_conversion_p (rhs1_type, rhs2_type))
>> +       {
>> +         error ("type mismatch in series expression");
>> +         debug_generic_expr (rhs1_type);
>> +         debug_generic_expr (rhs2_type);
>> +         return true;
>> +       }
>> +      if (TREE_CODE (lhs_type) != VECTOR_TYPE
>> +         || !useless_type_conversion_p (TREE_TYPE (lhs_type), rhs1_type))
>> +       {
>> +         error ("vector type expected in series expression");
>> +         debug_generic_expr (lhs_type);
>> +         return true;
>> +       }
>> +      return false;
>> +
>>      default:
>>        gcc_unreachable ();
>>      }
>> @@ -4485,6 +4502,7 @@ verify_gimple_assign_single (gassign *st
>>      case COMPLEX_CST:
>>      case VECTOR_CST:
>>      case VEC_DUPLICATE_CST:
>> +    case VEC_SERIES_CST:
>>      case STRING_CST:
>>        return res;
>>
>> Index: gcc/tree-vect-generic.c
>> ===================================================================
>> --- gcc/tree-vect-generic.c     2017-10-23 11:41:51.773953100 +0100
>> +++ gcc/tree-vect-generic.c     2017-10-23 11:42:34.922720660 +0100
>> @@ -1595,7 +1595,8 @@ expand_vector_operations_1 (gimple_stmt_
>>    if (rhs_class == GIMPLE_BINARY_RHS)
>>      rhs2 = gimple_assign_rhs2 (stmt);
>>
>> -  if (TREE_CODE (type) != VECTOR_TYPE)
>> +  if (!VECTOR_TYPE_P (type)
>> +      || !VECTOR_TYPE_P (TREE_TYPE (rhs1)))
>>      return;
>>
>>    /* If the vector operation is operating on all same vector elements
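
Just to make sure I follow the new folding rules: element i of a
VEC_SERIES_CST is base + i * step, a VEC_DUPLICATE_CST is the step-0
special case, and PLUS/MINUS distribute over the bases and steps.
A stand-alone model of that (hand-written illustration with made-up
types, not GCC code):

#include <cstdio>

/* Element i of the vector is base + i * step.  */
struct series { long base, step; };

/* Models vec_series_equivalent_p: a duplicate is a step-0 series.  */
static series
from_duplicate (long elt)
{
  return series { elt, 0 };
}

/* Models the const_binop rule: PLUS of two series folds to the series
   of the summed bases and steps.  */
static series
fold_plus (series a, series b)
{
  return series { a.base + b.base, a.step + b.step };
}

int
main ()
{
  series s = fold_plus (series { 1, 2 }, from_duplicate (10));
  for (int i = 0; i < 4; ++i)
    printf ("%ld ", s.base + i * s.step);   /* prints: 11 13 15 17 */
  printf ("\n");
  return 0;
}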

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [09/nn] Add a fixed_size_mode_pod class
  2017-10-26 12:18     ` Richard Sandiford
@ 2017-10-26 12:46       ` Richard Biener
  2017-10-26 19:42         ` Eric Botcazou
                           ` (2 more replies)
  0 siblings, 3 replies; 90+ messages in thread
From: Richard Biener @ 2017-10-26 12:46 UTC (permalink / raw)
  To: Richard Biener, GCC Patches, Richard Sandiford

On Thu, Oct 26, 2017 at 2:18 PM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> Richard Biener <richard.guenther@gmail.com> writes:
>> On Mon, Oct 23, 2017 at 1:22 PM, Richard Sandiford
>> <richard.sandiford@linaro.org> wrote:
>>> This patch adds a POD version of fixed_size_mode.  The only current use
>>> is for storing the __builtin_apply and __builtin_result register modes,
>>> which were made fixed_size_modes by the previous patch.
>>
>> Bah - can we update our host compiler to C++11/14 please ...?
>> (maybe requiring that build with GCC 4.8 as host compiler works,
>> GCC 4.3 has -std=c++0x, but I'm quite sure that's not enough).
>
> That'd be great :-)  It would avoid all the poly_int_pod stuff too,
> and allow some clean-up of wide-int.h.

Can you figure what oldest GCC release supports the C++11/14 POD handling
that would be required?

Richard.

> Thanks for the reviews,
> Richard
>
>
>>
>> Ok.
>>
>> Thanks,
>> Richard.
>>
>>>
>>> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
>>>             Alan Hayward  <alan.hayward@arm.com>
>>>             David Sherwood  <david.sherwood@arm.com>
>>>
>>> gcc/
>>>         * coretypes.h (fixed_size_mode): Declare.
>>>         (fixed_size_mode_pod): New typedef.
>>>         * builtins.h (target_builtins::x_apply_args_mode)
>>>         (target_builtins::x_apply_result_mode): Change type to
>>>         fixed_size_mode_pod.
>>>         * builtins.c (apply_args_size, apply_result_size, result_vector)
>>>         (expand_builtin_apply_args_1, expand_builtin_apply)
>>>         (expand_builtin_return): Update accordingly.
>>>
>>> Index: gcc/coretypes.h
>>> ===================================================================
>>> --- gcc/coretypes.h     2017-09-11 17:10:58.656085547 +0100
>>> +++ gcc/coretypes.h     2017-10-23 11:42:57.592545063 +0100
>>> @@ -59,6 +59,7 @@ typedef const struct rtx_def *const_rtx;
>>>  class scalar_int_mode;
>>>  class scalar_float_mode;
>>>  class complex_mode;
>>> +class fixed_size_mode;
>>>  template<typename> class opt_mode;
>>>  typedef opt_mode<scalar_mode> opt_scalar_mode;
>>>  typedef opt_mode<scalar_int_mode> opt_scalar_int_mode;
>>> @@ -66,6 +67,7 @@ typedef opt_mode<scalar_float_mode> opt_
>>>  template<typename> class pod_mode;
>>>  typedef pod_mode<scalar_mode> scalar_mode_pod;
>>>  typedef pod_mode<scalar_int_mode> scalar_int_mode_pod;
>>> +typedef pod_mode<fixed_size_mode> fixed_size_mode_pod;
>>>
>>>  /* Subclasses of rtx_def, using indentation to show the class
>>>     hierarchy, along with the relevant invariant.
>>> Index: gcc/builtins.h
>>> ===================================================================
>>> --- gcc/builtins.h      2017-08-30 12:18:46.602740973 +0100
>>> +++ gcc/builtins.h      2017-10-23 11:42:57.592545063 +0100
>>> @@ -29,14 +29,14 @@ struct target_builtins {
>>>       the register is not used for calling a function.  If the machine
>>>       has register windows, this gives only the outbound registers.
>>>       INCOMING_REGNO gives the corresponding inbound register.  */
>>> -  machine_mode x_apply_args_mode[FIRST_PSEUDO_REGISTER];
>>> +  fixed_size_mode_pod x_apply_args_mode[FIRST_PSEUDO_REGISTER];
>>>
>>>    /* For each register that may be used for returning values, this gives
>>>       a mode used to copy the register's value.  VOIDmode indicates the
>>>       register is not used for returning values.  If the machine has
>>>       register windows, this gives only the outbound registers.
>>>       INCOMING_REGNO gives the corresponding inbound register.  */
>>> -  machine_mode x_apply_result_mode[FIRST_PSEUDO_REGISTER];
>>> +  fixed_size_mode_pod x_apply_result_mode[FIRST_PSEUDO_REGISTER];
>>>  };
>>>
>>>  extern struct target_builtins default_target_builtins;
>>> Index: gcc/builtins.c
>>> ===================================================================
>>> --- gcc/builtins.c      2017-10-23 11:41:23.140260335 +0100
>>> +++ gcc/builtins.c      2017-10-23 11:42:57.592545063 +0100
>>> @@ -1358,7 +1358,6 @@ apply_args_size (void)
>>>    static int size = -1;
>>>    int align;
>>>    unsigned int regno;
>>> -  machine_mode mode;
>>>
>>>    /* The values computed by this function never change.  */
>>>    if (size < 0)
>>> @@ -1374,7 +1373,7 @@ apply_args_size (void)
>>>        for (regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
>>>         if (FUNCTION_ARG_REGNO_P (regno))
>>>           {
>>> -           mode = targetm.calls.get_raw_arg_mode (regno);
>>> +           fixed_size_mode mode = targetm.calls.get_raw_arg_mode (regno);
>>>
>>>             gcc_assert (mode != VOIDmode);
>>>
>>> @@ -1386,7 +1385,7 @@ apply_args_size (void)
>>>           }
>>>         else
>>>           {
>>> -           apply_args_mode[regno] = VOIDmode;
>>> +           apply_args_mode[regno] = as_a <fixed_size_mode> (VOIDmode);
>>>           }
>>>      }
>>>    return size;
>>> @@ -1400,7 +1399,6 @@ apply_result_size (void)
>>>  {
>>>    static int size = -1;
>>>    int align, regno;
>>> -  machine_mode mode;
>>>
>>>    /* The values computed by this function never change.  */
>>>    if (size < 0)
>>> @@ -1410,7 +1408,7 @@ apply_result_size (void)
>>>        for (regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
>>>         if (targetm.calls.function_value_regno_p (regno))
>>>           {
>>> -           mode = targetm.calls.get_raw_result_mode (regno);
>>> +           fixed_size_mode mode = targetm.calls.get_raw_result_mode (regno);
>>>
>>>             gcc_assert (mode != VOIDmode);
>>>
>>> @@ -1421,7 +1419,7 @@ apply_result_size (void)
>>>             apply_result_mode[regno] = mode;
>>>           }
>>>         else
>>> -         apply_result_mode[regno] = VOIDmode;
>>> +         apply_result_mode[regno] = as_a <fixed_size_mode> (VOIDmode);
>>>
>>>        /* Allow targets that use untyped_call and untyped_return to override
>>>          the size so that machine-specific information can be stored here.  */
>>> @@ -1440,7 +1438,7 @@ apply_result_size (void)
>>>  result_vector (int savep, rtx result)
>>>  {
>>>    int regno, size, align, nelts;
>>> -  machine_mode mode;
>>> +  fixed_size_mode mode;
>>>    rtx reg, mem;
>>>    rtx *savevec = XALLOCAVEC (rtx, FIRST_PSEUDO_REGISTER);
>>>
>>> @@ -1469,7 +1467,7 @@ expand_builtin_apply_args_1 (void)
>>>  {
>>>    rtx registers, tem;
>>>    int size, align, regno;
>>> -  machine_mode mode;
>>> +  fixed_size_mode mode;
>>>    rtx struct_incoming_value = targetm.calls.struct_value_rtx (cfun ? TREE_TYPE (cfun->decl) : 0, 1);
>>>
>>>    /* Create a block where the arg-pointer, structure value address,
>>> @@ -1573,7 +1571,7 @@ expand_builtin_apply_args (void)
>>>  expand_builtin_apply (rtx function, rtx arguments, rtx argsize)
>>>  {
>>>    int size, align, regno;
>>> -  machine_mode mode;
>>> +  fixed_size_mode mode;
>>>    rtx incoming_args, result, reg, dest, src;
>>>    rtx_call_insn *call_insn;
>>>    rtx old_stack_level = 0;
>>> @@ -1734,7 +1732,7 @@ expand_builtin_apply (rtx function, rtx
>>>  expand_builtin_return (rtx result)
>>>  {
>>>    int size, align, regno;
>>> -  machine_mode mode;
>>> +  fixed_size_mode mode;
>>>    rtx reg;
>>>    rtx_insn *call_fusage = 0;
>>>

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [09/nn] Add a fixed_size_mode_pod class
  2017-10-26 12:46       ` Richard Biener
@ 2017-10-26 19:42         ` Eric Botcazou
  2017-10-27  8:34           ` Richard Biener
  2017-10-30  3:14           ` Trevor Saunders
  2017-10-26 19:44         ` Richard Sandiford
  2017-10-26 19:45         ` Jakub Jelinek
  2 siblings, 2 replies; 90+ messages in thread
From: Eric Botcazou @ 2017-10-26 19:42 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, Richard Sandiford

> Can you figure what oldest GCC release supports the C++11/14 POD handling
> that would be required?

GCC needs to be buildable by other compilers than itself though.

-- 
Eric Botcazou

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [09/nn] Add a fixed_size_mode_pod class
  2017-10-26 12:46       ` Richard Biener
  2017-10-26 19:42         ` Eric Botcazou
@ 2017-10-26 19:44         ` Richard Sandiford
  2017-10-26 19:45         ` Jakub Jelinek
  2 siblings, 0 replies; 90+ messages in thread
From: Richard Sandiford @ 2017-10-26 19:44 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Patches

Richard Biener <richard.guenther@gmail.com> writes:
> On Thu, Oct 26, 2017 at 2:18 PM, Richard Sandiford
> <richard.sandiford@linaro.org> wrote:
>> Richard Biener <richard.guenther@gmail.com> writes:
>>> On Mon, Oct 23, 2017 at 1:22 PM, Richard Sandiford
>>> <richard.sandiford@linaro.org> wrote:
>>>> This patch adds a POD version of fixed_size_mode.  The only current use
>>>> is for storing the __builtin_apply and __builtin_result register modes,
>>>> which were made fixed_size_modes by the previous patch.
>>>
>>> Bah - can we update our host compiler to C++11/14 please ...?
>>> (maybe requiring that build with GCC 4.8 as host compiler works,
>>> GCC 4.3 has -std=c++0x, but I'm quite sure that's not enough).
>>
>> That'd be great :-)  It would avoid all the poly_int_pod stuff too,
>> and allow some clean-up of wide-int.h.
>
> Can you figure what oldest GCC release supports the C++11/14 POD handling
> that would be required?

Looks like GCC 4.7, which was also the first to support -std=c++11
as an option.  I could bootstrap with that (after s/std=gnu..98/std=c++11/
in configure) without all the POD types.  It also supports "= default"
and template using, which would get rid of some wide-int.h ugliness.

Being able to construct poly_int::coeffs directly might also allow
some optimisations, and should help avoid the POLY_SET_COEFF bug that
Martin found, but I haven't looked at that yet.
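
As a rough illustration of what that buys us (a hand-written sketch,
not the real wide-int.h or poly-int.h code; both types below are made
up): in C++03 a user-provided constructor makes a type non-POD, forcing
a separate *_pod twin for unions and statically-initialized data,
whereas in C++11 a defaulted default constructor keeps the one class
trivial:

#include <type_traits>

/* The C++03-era twin: plain aggregate, no constructors.  */
struct mode_pod { unsigned char m_mode; };

/* C++11: "= default" keeps the class trivial, so it can live in
   unions and static target data directly; no twin needed.  */
class mode_wrapper
{
public:
  mode_wrapper () = default;
  explicit mode_wrapper (unsigned char m) : m_mode (m) {}
  unsigned char raw () const { return m_mode; }
private:
  unsigned char m_mode;
};

static_assert (std::is_trivial<mode_wrapper>::value,
               "usable wherever the POD twin used to be required");

int main () { return mode_wrapper (3).raw () == 3 ? 0 : 1; }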

Thanks,
Richard

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [09/nn] Add a fixed_size_mode_pod class
  2017-10-26 12:46       ` Richard Biener
  2017-10-26 19:42         ` Eric Botcazou
  2017-10-26 19:44         ` Richard Sandiford
@ 2017-10-26 19:45         ` Jakub Jelinek
  2017-10-27  8:43           ` Richard Biener
  2 siblings, 1 reply; 90+ messages in thread
From: Jakub Jelinek @ 2017-10-26 19:45 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Patches, Richard Sandiford

On Thu, Oct 26, 2017 at 02:43:55PM +0200, Richard Biener wrote:
> On Thu, Oct 26, 2017 at 2:18 PM, Richard Sandiford
> <richard.sandiford@linaro.org> wrote:
> > Richard Biener <richard.guenther@gmail.com> writes:
> >> On Mon, Oct 23, 2017 at 1:22 PM, Richard Sandiford
> >> <richard.sandiford@linaro.org> wrote:
> >>> This patch adds a POD version of fixed_size_mode.  The only current use
> >>> is for storing the __builtin_apply and __builtin_result register modes,
> >>> which were made fixed_size_modes by the previous patch.
> >>
> >> Bah - can we update our host compiler to C++11/14 please ...?
> >> (maybe requiring that build with GCC 4.8 as host compiler works,
> >> GCC 4.3 has -std=c++0x, but I'm quite sure that's not enough).
> >
> > That'd be great :-)  It would avoid all the poly_int_pod stuff too,
> > and allow some clean-up of wide-int.h.
> 
> Can you figure what oldest GCC release supports the C++11/14 POD handling
> that would be required?

I think it is too early for that, we aren't LLVM or Rust that don't really
care about what build requirements they impose on users.

	Jakub

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [09/nn] Add a fixed_size_mode_pod class
  2017-10-26 19:42         ` Eric Botcazou
@ 2017-10-27  8:34           ` Richard Biener
  2017-10-27  9:28             ` Eric Botcazou
  2017-10-30  3:14           ` Trevor Saunders
  1 sibling, 1 reply; 90+ messages in thread
From: Richard Biener @ 2017-10-27  8:34 UTC (permalink / raw)
  To: Eric Botcazou; +Cc: GCC Patches, Richard Sandiford

On Thu, Oct 26, 2017 at 9:37 PM, Eric Botcazou <ebotcazou@adacore.com> wrote:
>> Can you figure what oldest GCC release supports the C++11/14 POD handling
>> that would be required?
>
> GCC needs to be buildable by other compilers than itself though.

There's always the possibility of building GCC 4.8 with the other compiler and
then GCC 9+ (?) with GCC 4.8.

What's the list of other compilers people routinely use?  I see various comments
on other compilers in install.texi, but those already say that such compilers
cannot be used to build GCC directly and that you need to build an older GCC
first (like xlc or the HP compiler).

Richard.

> --
> Eric Botcazou

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [09/nn] Add a fixed_size_mode_pod class
  2017-10-26 19:45         ` Jakub Jelinek
@ 2017-10-27  8:43           ` Richard Biener
  2017-10-27  8:45             ` Jakub Jelinek
                               ` (2 more replies)
  0 siblings, 3 replies; 90+ messages in thread
From: Richard Biener @ 2017-10-27  8:43 UTC (permalink / raw)
  To: Jakub Jelinek; +Cc: GCC Patches, Richard Sandiford

On Thu, Oct 26, 2017 at 9:43 PM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Thu, Oct 26, 2017 at 02:43:55PM +0200, Richard Biener wrote:
>> On Thu, Oct 26, 2017 at 2:18 PM, Richard Sandiford
>> <richard.sandiford@linaro.org> wrote:
>> > Richard Biener <richard.guenther@gmail.com> writes:
>> >> On Mon, Oct 23, 2017 at 1:22 PM, Richard Sandiford
>> >> <richard.sandiford@linaro.org> wrote:
>> >>> This patch adds a POD version of fixed_size_mode.  The only current use
>> >>> is for storing the __builtin_apply and __builtin_result register modes,
>> >>> which were made fixed_size_modes by the previous patch.
>> >>
>> >> Bah - can we update our host compiler to C++11/14 please ...?
>> >> (maybe requiring that build with GCC 4.8 as host compiler works,
>> >> GCC 4.3 has -std=c++0x, but I'm quite sure that's not enough).
>> >
>> > That'd be great :-)  It would avoid all the poly_int_pod stuff too,
>> > and allow some clean-up of wide-int.h.
>>
>> Can you figure what oldest GCC release supports the C++11/14 POD handling
>> that would be required?
>
> I think it is too early for that, we aren't LLVM or Rust that don't really
> care about what build requirements they impose on users.

That's true, which is why I asked.  For me requiring sth newer than GCC 4.8
would be a blocker given that's the system compiler on our latest server
(and "stable" OSS) product.

I guess it depends on the amount of pain we have going forward with C++
use in GCC.  Given that gdb already requires C++11 people building
GCC are likely already experiencing the "issue".

Richard.

>         Jakub

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [09/nn] Add a fixed_size_mode_pod class
  2017-10-27  8:43           ` Richard Biener
@ 2017-10-27  8:45             ` Jakub Jelinek
  2017-10-27 10:19             ` Pedro Alves
  2017-10-27 15:23             ` Jeff Law
  2 siblings, 0 replies; 90+ messages in thread
From: Jakub Jelinek @ 2017-10-27  8:45 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Patches, Richard Sandiford

On Fri, Oct 27, 2017 at 10:35:56AM +0200, Richard Biener wrote:
> > I think it is too early for that, we aren't LLVM or Rust that don't really
> > care about what build requirements they impose on users.
> 
> That's true, which is why I asked.  For me requiring sth newer than GCC 4.8
> would be a blocker given that's the system compiler on our latest server
> (and "stable" OSS) product.
> 
> I guess it depends on the amount of pain we have going forward with C++
> use in GCC.  Given that gdb already requires C++11 people building
> GCC are likely already experiencing the "issue".

Well, they can always start by building a new GCC and then build GDB with
it.  If they'd need to build an intermediate, already unsupported, GCC in
between as well, it might be a bigger pain.
GCC 4.8 as system compiler certainly needs to be supported, it is still
heavily used in the wild, but I'd say even e.g. GCC 4.4 or 4.3 isn't
something that can be ignored.  And there are also non-GCC system compilers
we need to cope with.

	Jakub

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [09/nn] Add a fixed_size_mode_pod class
  2017-10-27  8:34           ` Richard Biener
@ 2017-10-27  9:28             ` Eric Botcazou
  0 siblings, 0 replies; 90+ messages in thread
From: Eric Botcazou @ 2017-10-27  9:28 UTC (permalink / raw)
  To: Richard Biener; +Cc: gcc-patches, Richard Sandiford

> There's always the possibility of building GCC 4.8 with the other compiler
> and then GCC 9+ (?) with GCC 4.8.

What a user-friendly solution...

> What's the list of other compilers people routinely use?  I see various
> comments on other compilers in install.texi, but those already say that such
> compilers cannot be used to build GCC directly and that you need to build an
> older GCC first (like xlc or the HP compiler).

I read the opposite for XLC:

"GCC can bootstrap with recent versions of IBM XLC, but bootstrapping with an 
earlier release of GCC is recommended."

I think that the major supported compilers are IBM, Sun/Oracle and LLVM.

-- 
Eric Botcazou

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [09/nn] Add a fixed_size_mode_pod class
  2017-10-27  8:43           ` Richard Biener
  2017-10-27  8:45             ` Jakub Jelinek
@ 2017-10-27 10:19             ` Pedro Alves
  2017-10-27 15:23             ` Jeff Law
  2 siblings, 0 replies; 90+ messages in thread
From: Pedro Alves @ 2017-10-27 10:19 UTC (permalink / raw)
  To: Richard Biener, Jakub Jelinek; +Cc: GCC Patches, Richard Sandiford

On 10/27/2017 09:35 AM, Richard Biener wrote:
> On Thu, Oct 26, 2017 at 9:43 PM, Jakub Jelinek <jakub@redhat.com> wrote:
>> On Thu, Oct 26, 2017 at 02:43:55PM +0200, Richard Biener wrote:

>>> Can you figure what oldest GCC release supports the C++11/14 POD handling
>>> that would be required?
>>
>> I think it is too early for that, we aren't LLVM or Rust that don't really
>> care about what build requirements they impose on users.
> 
> That's true, which is why I asked.  For me requiring sth newer than GCC 4.8
> would be a blocker given that's the system compiler on our latest server
> (and "stable" OSS) product.
> 
> I guess it depends on the amount of pain we have going forward with C++
> use in GCC.  Given that gdb already requires C++11 people building
> GCC are likely already experiencing the "issue".

Right, GDB's baseline is GCC 4.8 too.  When GDB was deciding whether
to start requiring full C++11 (about a year ago), we looked at the
latest stable release of all the "big" distros to see whether:

#1 - the system compiler was new enough (gcc >= 4.8), or failing
     that,
#2 - there's an easy-to-install package providing a
     new-enough compiler.

and it turns out that that was true for all.  Meanwhile another year
has passed and there have been no complaints.

Thanks,
Pedro Alves

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [09/nn] Add a fixed_size_mode_pod class
  2017-10-27  8:43           ` Richard Biener
  2017-10-27  8:45             ` Jakub Jelinek
  2017-10-27 10:19             ` Pedro Alves
@ 2017-10-27 15:23             ` Jeff Law
  2 siblings, 0 replies; 90+ messages in thread
From: Jeff Law @ 2017-10-27 15:23 UTC (permalink / raw)
  To: Richard Biener, Jakub Jelinek; +Cc: GCC Patches, Richard Sandiford

On 10/27/2017 02:35 AM, Richard Biener wrote:
> On Thu, Oct 26, 2017 at 9:43 PM, Jakub Jelinek <jakub@redhat.com> wrote:
>> On Thu, Oct 26, 2017 at 02:43:55PM +0200, Richard Biener wrote:
>>> On Thu, Oct 26, 2017 at 2:18 PM, Richard Sandiford
>>> <richard.sandiford@linaro.org> wrote:
>>>> Richard Biener <richard.guenther@gmail.com> writes:
>>>>> On Mon, Oct 23, 2017 at 1:22 PM, Richard Sandiford
>>>>> <richard.sandiford@linaro.org> wrote:
>>>>>> This patch adds a POD version of fixed_size_mode.  The only current use
>>>>>> is for storing the __builtin_apply and __builtin_result register modes,
>>>>>> which were made fixed_size_modes by the previous patch.
>>>>>
>>>>> Bah - can we update our host compiler to C++11/14 please ...?
>>>>> (maybe requiring that build with GCC 4.8 as host compiler works,
>>>>> GCC 4.3 has -std=c++0x, but I'm quite sure that's not enough).
>>>>
>>>> That'd be great :-)  It would avoid all the poly_int_pod stuff too,
>>>> and allow some clean-up of wide-int.h.
>>>
>>> Can you figure what oldest GCC release supports the C++11/14 POD handling
>>> that would be required?
>>
>> I think it is too early for that, we aren't LLVM or Rust that don't really
>> care about what build requirements they impose on users.
> 
> That's true, which is why I asked.  For me requiring sth newer than GCC 4.8
> would be a blocker given that's the system compiler on our latest server
> (and "stable" OSS) product.
> 
> I guess it depends on the amount of pain we have going forward with C++
> use in GCC.  Given that gdb already requires C++11 people building
> GCC are likely already experiencing the "issue".
It's always going to be a balancing act.  Clearly we don't want to go to
something like the Rust model.  But we also don't want to limit
ourselves to such old tools that we end up hacking around compiler bugs,
avoiding features that can make the codebase easier to maintain and
improve, or depending on dusty corners of C++98/C++03
implementations that nobody else uses/tests anymore because they've
moved on to C++11.


To be more concrete, if I had to put a stake in the ground, I'd want to
pick a semi-recent version of Sun, IBM and Clang/LLVM as well as GCC.
Ideally it'd be something that supports C++11 as a language, even if the
runtime isn't fully compliant.  I suspect anything older than GCC 4.8
wouldn't have enough C++11, and anything newer wouldn't work well for the
distros (Red Hat included).

Jeff

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [07/nn] Add unique CONSTs
  2017-10-23 11:22 ` [07/nn] Add unique CONSTs Richard Sandiford
@ 2017-10-27 15:51   ` Jeff Law
  2017-10-27 15:58     ` Richard Sandiford
  0 siblings, 1 reply; 90+ messages in thread
From: Jeff Law @ 2017-10-27 15:51 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

On 10/23/2017 05:21 AM, Richard Sandiford wrote:
> This patch adds a way of treating certain kinds of CONST as unique,
> so that pointer equality is equivalent to value equality.  For now it
> is restricted to VEC_DUPLICATE and VEC_SERIES, although the code to
> generate them remains in the else arm of an "if (1)" until a later
> patch.
> 
> This is needed so that (const (vec_duplicate xx)) can be used as the
> CONSTxx_RTX of a variable-length vector.
You're brave :-)  I know we looked at making CONST_INTs behave in this
manner eons ago in an effort to reduce memory consumption and it was
just plain painful.   There may still be comments from that project
littering the source code.

I do wonder if we might want to revisit this again as we have better
infrastructure in place.


> 
> 
> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
> 	    Alan Hayward  <alan.hayward@arm.com>
> 	    David Sherwood  <david.sherwood@arm.com>
> 
> gcc/
> 	* rtl.h (unique_const_p): New function.
> 	(gen_rtx_CONST): Declare.
> 	* emit-rtl.c (const_hasher): New struct.
> 	(const_htab): New variable.
> 	(init_emit_once): Initialize it.
> 	(const_hasher::hash, const_hasher::equal): New functions.
> 	(gen_rtx_CONST): New function.
> 	(spare_vec_duplicate, spare_vec_series): New variables.
> 	(gen_const_vec_duplicate_1): Add code for use (const (vec_duplicate)),
> 	but disable it for now.
> 	(gen_const_vec_series): Likewise (const (vec_series)).
> 	* gengenrtl.c (special_rtx): Return true for CONST.
> 	* rtl.c (shared_const_p): Return true if unique_const_p.
ISTM that you need an update the rtl.texi's structure sharing
assumptions section to describe the new rules around CONSTs.

So what's the purpose of the spare_vec_* stuff that you're going to use
in the future?  It looks like a single-element cache to me.  Am I
missing something?

jeff

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [07/nn] Add unique CONSTs
  2017-10-27 15:51   ` Jeff Law
@ 2017-10-27 15:58     ` Richard Sandiford
  2017-10-30 14:49       ` Jeff Law
  0 siblings, 1 reply; 90+ messages in thread
From: Richard Sandiford @ 2017-10-27 15:58 UTC (permalink / raw)
  To: Jeff Law; +Cc: gcc-patches

Jeff Law <law@redhat.com> writes:
> On 10/23/2017 05:21 AM, Richard Sandiford wrote:
>> This patch adds a way of treating certain kinds of CONST as unique,
>> so that pointer equality is equivalent to value equality.  For now it
>> is restricted to VEC_DUPLICATE and VEC_SERIES, although the code to
>> generate them remains in the else arm of an "if (1)" until a later
>> patch.
>> 
>> This is needed so that (const (vec_duplicate xx)) can be used as the
>> CONSTxx_RTX of a variable-length vector.
> You're brave :-)  I know we looked at making CONST_INTs behave in this
> manner eons ago in an effort to reduce memory consumption and it was
> just plain painful.   There may still be comments from that project
> littering the source code.
>
> I do wonder if we might want to revisit this again as we have better
> infrastructure in place.

For vectors it isn't so bad, since we already do the same thing
for CONST_VECTOR.  Fortunately CONST_VECTOR and CONST always have
a mode, so there's no awkward sharing between modes...

>> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
>> 	    Alan Hayward  <alan.hayward@arm.com>
>> 	    David Sherwood  <david.sherwood@arm.com>
>> 
>> gcc/
>> 	* rtl.h (unique_const_p): New function.
>> 	(gen_rtx_CONST): Declare.
>> 	* emit-rtl.c (const_hasher): New struct.
>> 	(const_htab): New variable.
>> 	(init_emit_once): Initialize it.
>> 	(const_hasher::hash, const_hasher::equal): New functions.
>> 	(gen_rtx_CONST): New function.
>> 	(spare_vec_duplicate, spare_vec_series): New variables.
>> 	(gen_const_vec_duplicate_1): Add code for use (const (vec_duplicate)),
>> 	but disable it for now.
>> 	(gen_const_vec_series): Likewise (const (vec_series)).
>> 	* gengenrtl.c (special_rtx): Return true for CONST.
>> 	* rtl.c (shared_const_p): Return true if unique_const_p.
> ISTM that you need an update the rtl.texi's structure sharing
> assumptions section to describe the new rules around CONSTs.

Oops, yeah.  How about the attached?

> So what's the purpose of the spare_vec_* stuff that you're going to use
> in the future?  It looks like a single-element cache to me.  Am I
> missing something?

No, that's right.  When looking up the const for (vec_duplicate x), say,
it's easier to create the vec_duplicate rtx first.  But if the lookup
succeeds (and so we already have an rtx with that value), we keep the
discarded vec_duplicate around so that we can reuse it for the next
lookup.
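
In case it helps to see the shape outside rtl: here's a stand-alone
model of that lookup (hand-written illustration with made-up types,
not the hash_table/rtx code from the patch):

#include <cassert>
#include <unordered_set>

/* Stand-in for a unique (const (vec_duplicate ...)): one instance per
   (mode, elt) pair, so pointer equality is value equality.  */
struct node { int mode; long elt; };

struct node_hash
{
  size_t operator() (const node *n) const
  { return std::hash<long> () (n->elt) * 31 + n->mode; }
};

struct node_eq
{
  bool operator() (const node *a, const node *b) const
  { return a->mode == b->mode && a->elt == b->elt; }
};

static std::unordered_set<const node *, node_hash, node_eq> htab;
static node *spare;   /* candidate recycled from a previous hit */

static const node *
intern (int mode, long elt)
{
  /* Build (or reuse) the candidate first, as described above.  */
  if (!spare)
    spare = new node;
  spare->mode = mode;
  spare->elt = elt;
  auto it = htab.find (spare);
  if (it != htab.end ())
    return *it;          /* hit: keep the spare for the next lookup */
  const node *res = spare;
  spare = nullptr;       /* miss: the candidate becomes the entry */
  htab.insert (res);
  return res;
}

int
main ()
{
  const node *a = intern (1, 42);
  assert (intern (1, 42) == a);   /* pointer equality == value equality */
  assert (intern (2, 42) != a);
}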

Thanks for the reviews,

Richard


2017-10-27  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* doc/rtl.texi: Document rtl sharing rules.
	* rtl.h (unique_const_p): New function.
	(gen_rtx_CONST): Declare.
	* emit-rtl.c (const_hasher): New struct.
	(const_htab): New variable.
	(init_emit_once): Initialize it.
	(const_hasher::hash, const_hasher::equal): New functions.
	(gen_rtx_CONST): New function.
	(spare_vec_duplicate, spare_vec_series): New variables.
	(gen_const_vec_duplicate_1): Add code for use (const (vec_duplicate)),
	but disable it for now.
	(gen_const_vec_series): Likewise (const (vec_series)).
	* gengenrtl.c (special_rtx): Return true for CONST.
	* rtl.c (shared_const_p): Return true if unique_const_p.

Index: gcc/doc/rtl.texi
===================================================================
--- gcc/doc/rtl.texi	2017-10-27 16:48:35.827706696 +0100
+++ gcc/doc/rtl.texi	2017-10-27 16:48:37.617270148 +0100
@@ -4197,6 +4197,20 @@ There is only one @code{pc} expression.
 @item
 There is only one @code{cc0} expression.
 
+@cindex @code{const}, RTL sharing
+@item
+There is only one instance of the following structures for a given
+@var{m}, @var{x} and @var{y}:
+@example
+(const:@var{m} (vec_duplicate:@var{m} @var{x}))
+(const:@var{m} (vec_series:@var{m} @var{x} @var{y}))
+@end example
+This means, for example, that for a given @var{n} there is only ever a
+single instance of an expression like:
+@example
+(const:V@var{n}DI (vec_duplicate:V@var{n}DI (const_int 0)))
+@end example
+
 @cindex @code{const_double}, RTL sharing
 @item
 There is only one @code{const_double} expression with value 0 for
Index: gcc/rtl.h
===================================================================
--- gcc/rtl.h	2017-10-27 16:48:37.433286940 +0100
+++ gcc/rtl.h	2017-10-27 16:48:37.619280894 +0100
@@ -2861,6 +2861,23 @@ vec_series_p (const_rtx x, rtx *base_out
   return const_vec_series_p (x, base_out, step_out);
 }
 
+/* Return true if there should only ever be one instance of (const X),
+   so that constants of this type can be compared using pointer equality.  */
+
+inline bool
+unique_const_p (const_rtx x)
+{
+  switch (GET_CODE (x))
+    {
+    case VEC_DUPLICATE:
+    case VEC_SERIES:
+      return true;
+
+    default:
+      return false;
+    }
+}
+
 /* Return the unpromoted (outer) mode of SUBREG_PROMOTED_VAR_P subreg X.  */
 
 inline scalar_int_mode
@@ -3560,6 +3577,7 @@ extern rtx_insn_list *gen_rtx_INSN_LIST
 gen_rtx_INSN (machine_mode mode, rtx_insn *prev_insn, rtx_insn *next_insn,
 	      basic_block bb, rtx pattern, int location, int code,
 	      rtx reg_notes);
+extern rtx gen_rtx_CONST (machine_mode, rtx);
 extern rtx gen_rtx_CONST_INT (machine_mode, HOST_WIDE_INT);
 extern rtx gen_rtx_CONST_VECTOR (machine_mode, rtvec);
 extern void set_mode_and_regno (rtx, machine_mode, unsigned int);
Index: gcc/emit-rtl.c
===================================================================
--- gcc/emit-rtl.c	2017-10-27 16:48:37.433286940 +0100
+++ gcc/emit-rtl.c	2017-10-27 16:48:37.618275521 +0100
@@ -175,6 +175,15 @@ struct const_fixed_hasher : ggc_cache_pt
 
 static GTY ((cache)) hash_table<const_fixed_hasher> *const_fixed_htab;
 
+/* A hash table storing unique CONSTs.  */
+struct const_hasher : ggc_cache_ptr_hash<rtx_def>
+{
+  static hashval_t hash (rtx x);
+  static bool equal (rtx x, rtx y);
+};
+
+static GTY ((cache)) hash_table<const_hasher> *const_htab;
+
 #define cur_insn_uid (crtl->emit.x_cur_insn_uid)
 #define cur_debug_insn_uid (crtl->emit.x_cur_debug_insn_uid)
 #define first_label_num (crtl->emit.x_first_label_num)
@@ -310,6 +319,28 @@ const_fixed_hasher::equal (rtx x, rtx y)
   return fixed_identical (CONST_FIXED_VALUE (a), CONST_FIXED_VALUE (b));
 }
 
+/* Returns a hash code for X (which is either an existing unique CONST
+   or an operand to gen_rtx_CONST).  */
+
+hashval_t
+const_hasher::hash (rtx x)
+{
+  if (GET_CODE (x) == CONST)
+    x = XEXP (x, 0);
+
+  int do_not_record_p = 0;
+  return hash_rtx (x, GET_MODE (x), &do_not_record_p, NULL, false);
+}
+
+/* Returns true if the operand of unique CONST X is equal to Y.  */
+
+bool
+const_hasher::equal (rtx x, rtx y)
+{
+  gcc_checking_assert (GET_CODE (x) == CONST);
+  return rtx_equal_p (XEXP (x, 0), y);
+}
+
 /* Return true if the given memory attributes are equal.  */
 
 bool
@@ -5772,16 +5803,55 @@ init_emit (void)
 #endif
 }
 
+rtx
+gen_rtx_CONST (machine_mode mode, rtx val)
+{
+  if (unique_const_p (val))
+    {
+      /* Look up the CONST in the hash table.  */
+      rtx *slot = const_htab->find_slot (val, INSERT);
+      if (*slot == 0)
+	*slot = gen_rtx_raw_CONST (mode, val);
+      return *slot;
+    }
+
+  return gen_rtx_raw_CONST (mode, val);
+}
+
+/* Temporary rtx used by gen_const_vec_duplicate_1.  */
+static GTY((deletable)) rtx spare_vec_duplicate;
+
 /* Like gen_const_vec_duplicate, but ignore const_tiny_rtx.  */
 
 static rtx
 gen_const_vec_duplicate_1 (machine_mode mode, rtx el)
 {
   int nunits = GET_MODE_NUNITS (mode);
-  rtvec v = rtvec_alloc (nunits);
-  for (int i = 0; i < nunits; ++i)
-    RTVEC_ELT (v, i) = el;
-  return gen_rtx_raw_CONST_VECTOR (mode, v);
+  if (1)
+    {
+      rtvec v = rtvec_alloc (nunits);
+
+      for (int i = 0; i < nunits; ++i)
+	RTVEC_ELT (v, i) = el;
+
+      return gen_rtx_raw_CONST_VECTOR (mode, v);
+    }
+  else
+    {
+      if (spare_vec_duplicate)
+	{
+	  PUT_MODE (spare_vec_duplicate, mode);
+	  XEXP (spare_vec_duplicate, 0) = el;
+	}
+      else
+	spare_vec_duplicate = gen_rtx_VEC_DUPLICATE (mode, el);
+
+      rtx res = gen_rtx_CONST (mode, spare_vec_duplicate);
+      if (XEXP (res, 0) == spare_vec_duplicate)
+	spare_vec_duplicate = NULL_RTX;
+
+      return res;
+    }
 }
 
 /* Generate a vector constant of mode MODE in which every element has
@@ -5843,6 +5913,9 @@ const_vec_series_p_1 (const_rtx x, rtx *
   return true;
 }
 
+/* Temporary rtx used by gen_const_vec_series.  */
+static GTY((deletable)) rtx spare_vec_series;
+
 /* Generate a vector constant of mode MODE in which element I has
    the value BASE + I * STEP.  */
 
@@ -5852,13 +5925,33 @@ gen_const_vec_series (machine_mode mode,
   gcc_assert (CONSTANT_P (base) && CONSTANT_P (step));
 
   int nunits = GET_MODE_NUNITS (mode);
-  rtvec v = rtvec_alloc (nunits);
-  scalar_mode inner_mode = GET_MODE_INNER (mode);
-  RTVEC_ELT (v, 0) = base;
-  for (int i = 1; i < nunits; ++i)
-    RTVEC_ELT (v, i) = simplify_gen_binary (PLUS, inner_mode,
-					    RTVEC_ELT (v, i - 1), step);
-  return gen_rtx_raw_CONST_VECTOR (mode, v);
+  if (1)
+    {
+      rtvec v = rtvec_alloc (nunits);
+      scalar_mode inner_mode = GET_MODE_INNER (mode);
+      RTVEC_ELT (v, 0) = base;
+      for (int i = 1; i < nunits; ++i)
+	RTVEC_ELT (v, i) = simplify_gen_binary (PLUS, inner_mode,
+						RTVEC_ELT (v, i - 1), step);
+      return gen_rtx_raw_CONST_VECTOR (mode, v);
+    }
+  else
+    {
+      if (spare_vec_series)
+	{
+	  PUT_MODE (spare_vec_series, mode);
+	  XEXP (spare_vec_series, 0) = base;
+	  XEXP (spare_vec_series, 1) = step;
+	}
+      else
+	spare_vec_series = gen_rtx_VEC_SERIES (mode, base, step);
+
+      rtx res = gen_rtx_CONST (mode, spare_vec_series);
+      if (XEXP (res, 0) == spare_vec_series)
+	spare_vec_series = NULL_RTX;
+
+      return res;
+    }
 }
 
 /* Generate a vector of mode MODE in which element I has the value
@@ -6016,6 +6109,8 @@ init_emit_once (void)
 
   reg_attrs_htab = hash_table<reg_attr_hasher>::create_ggc (37);
 
+  const_htab = hash_table<const_hasher>::create_ggc (37);
+
 #ifdef INIT_EXPANDERS
   /* This is to initialize {init|mark|free}_machine_status before the first
      call to push_function_context_to.  This is needed by the Chill front
Index: gcc/gengenrtl.c
===================================================================
--- gcc/gengenrtl.c	2017-10-27 16:48:37.433286940 +0100
+++ gcc/gengenrtl.c	2017-10-27 16:48:37.618275521 +0100
@@ -143,7 +143,8 @@ special_rtx (int idx)
 	  || strcmp (defs[idx].enumname, "CC0") == 0
 	  || strcmp (defs[idx].enumname, "RETURN") == 0
 	  || strcmp (defs[idx].enumname, "SIMPLE_RETURN") == 0
-	  || strcmp (defs[idx].enumname, "CONST_VECTOR") == 0);
+	  || strcmp (defs[idx].enumname, "CONST_VECTOR") == 0
+	  || strcmp (defs[idx].enumname, "CONST") == 0);
 }
 
 /* Return nonzero if the RTL code given by index IDX is one that we should
Index: gcc/rtl.c
===================================================================
--- gcc/rtl.c	2017-10-27 16:48:37.433286940 +0100
+++ gcc/rtl.c	2017-10-27 16:48:37.618275521 +0100
@@ -252,6 +252,9 @@ shared_const_p (const_rtx orig)
 {
   gcc_assert (GET_CODE (orig) == CONST);
 
+  if (unique_const_p (XEXP (orig, 0)))
+    return true;
+
   /* CONST can be shared if it contains a SYMBOL_REF.  If it contains
      a LABEL_REF, it isn't sharable.  */
   return (GET_CODE (XEXP (orig, 0)) == PLUS

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [01/nn] Add gen_(const_)vec_duplicate helpers
  2017-10-25 16:29   ` Jeff Law
@ 2017-10-27 16:12     ` Richard Sandiford
  0 siblings, 0 replies; 90+ messages in thread
From: Richard Sandiford @ 2017-10-27 16:12 UTC (permalink / raw)
  To: Jeff Law; +Cc: gcc-patches

Jeff Law <law@redhat.com> writes:
> On 10/23/2017 05:16 AM, Richard Sandiford wrote:
>> This patch adds helper functions for generating constant and
>> non-constant vector duplicates.  These routines help with SVE because
>> it is then easier to use:
>> 
>>    (const:M (vec_duplicate:M X))
>> 
>> for a broadcast of X, even if the number of elements in M isn't known
>> at compile time.  It also makes it easier for general rtx code to treat
>> constant and non-constant duplicates in the same way.
>> 
>> In the target code, the patch uses gen_vec_duplicate instead of
>> gen_rtx_VEC_DUPLICATE if handling constants correctly is potentially
>> useful.  It might be that some or all of the call sites only handle
>> non-constants in practice, in which case the change is a harmless
>> no-op (and a saving of a few characters).
>> 
>> Otherwise, the target changes use gen_const_vec_duplicate instead
>> of gen_rtx_CONST_VECTOR if the constant is obviously a duplicate.
>> They also include some changes to use CONSTxx_RTX for easy global
>> constants.
>> 
>> 
>> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
>> 	    Alan Hayward  <alan.hayward@arm.com>
>> 	    David Sherwood  <david.sherwood@arm.com>
>> 
>> gcc/
>> 	* emit-rtl.h (gen_const_vec_duplicate): Declare.
>> 	(gen_vec_duplicate): Likewise.
>> 	* emit-rtl.c (gen_const_vec_duplicate_1): New function, split
>> 	out from...
>> 	(gen_const_vector): ...here.
>> 	(gen_const_vec_duplicate, gen_vec_duplicate): New functions.
>> 	(gen_rtx_CONST_VECTOR): Use gen_const_vec_duplicate for constants
>> 	whose elements are all equal.
>> 	* optabs.c (expand_vector_broadcast): Use gen_const_vec_duplicate.
>> 	* simplify-rtx.c (simplify_const_unary_operation): Likewise.
>> 	(simplify_relational_operation): Likewise.
>> 	* config/aarch64/aarch64.c (aarch64_simd_gen_const_vector_dup):
>> 	Likewise.
>> 	(aarch64_simd_dup_constant): Use gen_vec_duplicate.
>> 	(aarch64_expand_vector_init): Likewise.
>> 	* config/arm/arm.c (neon_vdup_constant): Likewise.
>> 	(neon_expand_vector_init): Likewise.
>> 	(arm_expand_vec_perm): Use gen_const_vec_duplicate.
>> 	(arm_block_set_unaligned_vect): Likewise.
>> 	(arm_block_set_aligned_vect): Likewise.
>> 	* config/arm/neon.md (neon_copysignf<mode>): Likewise.
>> 	* config/i386/i386.c (ix86_expand_vec_perm): Likewise.
>> 	(expand_vec_perm_even_odd_pack): Likewise.
>> 	(ix86_vector_duplicate_value): Use gen_vec_duplicate.
>> 	* config/i386/sse.md (one_cmpl<mode>2): Use CONSTM1_RTX.
>> 	* config/ia64/ia64.c (ia64_expand_vecint_compare): Use
>> 	gen_const_vec_duplicate.
>> 	* config/ia64/vect.md (addv2sf3, subv2sf3): Use CONST1_RTX.
>> 	* config/mips/mips.c (mips_gen_const_int_vector): Use
>> 	gen_const_vec_duplicate.
>> 	(mips_expand_vector_init): Use CONST0_RTX.
>> 	* config/powerpcspe/altivec.md (abs<mode>2, nabs<mode>2): Likewise.
>> 	(define_split): Use gen_const_vec_duplicate.
>> 	* config/rs6000/altivec.md (abs<mode>2, nabs<mode>2): Use CONST0_RTX.
>> 	(define_split): Use gen_const_vec_duplicate.
>> 	* config/s390/vx-builtins.md (vec_genmask<mode>): Likewise.
>> 	(vec_ctd_s64, vec_ctd_u64, vec_ctsl, vec_ctul): Likewise.
>> 	* config/spu/spu.c (spu_const): Likewise.
> I'd started looking at this a couple times when it was originally
> submitted, but never seemed to get through it.  It seems like a nice
> cleanup.
>
> So in gen_const_vector we had an assert to verify that const_tiny_rtx
> was set up.  That seems to have been lost.  It's probably not a big
> deal, but I mention it in case the loss was unintentional.

This morphed into:

+static rtx
+gen_const_vector (machine_mode mode, int constant)
+{
+  machine_mode inner = GET_MODE_INNER (mode);
+
+  gcc_assert (!DECIMAL_FLOAT_MODE_P (inner));
+
+  rtx el = const_tiny_rtx[constant][(int) inner];
+  gcc_assert (el);

but it wasn't obvious due to the way the unified diff mixed up the
functions.  I should have posted that one as context, sorry...

Richard

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [03/nn] Allow vector CONSTs
  2017-10-25 16:59   ` Jeff Law
@ 2017-10-27 16:19     ` Richard Sandiford
  0 siblings, 0 replies; 90+ messages in thread
From: Richard Sandiford @ 2017-10-27 16:19 UTC (permalink / raw)
  To: Jeff Law; +Cc: gcc-patches

Jeff Law <law@redhat.com> writes:
> On 10/23/2017 05:18 AM, Richard Sandiford wrote:
>> This patch allows (const ...) wrappers to be used for rtx vector
>> constants, as an alternative to const_vector.  This is useful
>> for SVE, where the number of elements isn't known until runtime.
> Right.  It's constant, but not knowable at compile time.  That seems an
> exact match for how we've used CONST.
>
>> 
>> It could also be useful in future for fixed-length vectors, to
>> reduce the amount of memory needed to represent simple constants
>> with high element counts.  However, one nice thing about keeping
>> it restricted to variable-length vectors is that there is never
>> any need to handle combinations of (const ...) and CONST_VECTOR.
> Yea, but is the memory consumption of these large vectors a real
> problem?  I suspect that, relative to other memory issues, they're in the noise.

Yeah, maybe not, especially since the elements themselves are shared.

>> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
>> 	    Alan Hayward  <alan.hayward@arm.com>
>> 	    David Sherwood  <david.sherwood@arm.com>
>> 
>> gcc/
>> 	* doc/rtl.texi (const): Update description of address constants.
>> 	Say that vector constants are allowed too.
>> 	* common.md (E, F): Use CONSTANT_P instead of checking for
>> 	CONST_VECTOR.
>> 	* emit-rtl.c (gen_lowpart_common): Use const_vec_p instead of
>> 	checking for CONST_VECTOR.
>> 	* expmed.c (make_tree): Use build_vector_from_val for a CONST
>> 	VEC_DUPLICATE.
>> 	* expr.c (expand_expr_real_2): Check for vector modes instead
>> 	of checking for CONST_VECTOR.
>> 	* rtl.h (const_vec_p): New function.
>> 	(const_vec_duplicate_p): Check for a CONST VEC_DUPLICATE.
>> 	(unwrap_const_vec_duplicate): Handle them here too.
> My only worry here is code that is a bit loose in checking for a CONST
> but not its innards, and perhaps isn't prepared for the new forms
> that appear inside the CONST.
>
> If we have such problems I'd expect it's in the targets, as the targets
> have traditionally had to validate the innards of a CONST to ensure
> it could be handled by the assembler/linker.  Hmm, that may save the
> targets, since they'd likely need an update to LEGITIMATE_CONSTANT_P to
> ever see these new forms.
>
> Presumably an aarch64 specific patch to recognize these as valid
> constants in LEGITIMATE_CONSTANT_P is in the works?

Yeah, via the const_vec_duplicate_p helper.  For the default
variable-length mode of SVE we use the (const ...) while for the
fixed-length mode we use (const_vector ...) as normal.  Advanced SIMD
always uses (const_vector ...).
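
FWIW, the target side then has roughly this shape (a hand-written
sketch, not the actual aarch64 change; example_legitimate_constant_p
and example_sve_immediate_p are placeholder names for whatever
per-target test actually applies):

/* Sketch of a target legitimate-constant check that accepts the new
   (const (vec_duplicate ...)) form via const_vec_duplicate_p.  */
static bool
example_legitimate_constant_p (machine_mode mode, rtx x)
{
  rtx elt;
  if (const_vec_duplicate_p (x, &elt))
    /* A uniform vector is OK if its element is an acceptable
       immediate; example_sve_immediate_p is hypothetical.  */
    return example_sve_immediate_p (GET_MODE_INNER (mode), elt);
  return default_legitimate_constant_p (mode, x);
}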

> OK for the trunk.
>
> jeff

Thanks,
Richard

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [09/nn] Add a fixed_size_mode_pod class
  2017-10-26 19:42         ` Eric Botcazou
  2017-10-27  8:34           ` Richard Biener
@ 2017-10-30  3:14           ` Trevor Saunders
  2017-10-30  8:52             ` Richard Sandiford
  2017-10-30 10:13             ` Eric Botcazou
  1 sibling, 2 replies; 90+ messages in thread
From: Trevor Saunders @ 2017-10-30  3:14 UTC (permalink / raw)
  To: Eric Botcazou; +Cc: Richard Biener, gcc-patches, Richard Sandiford

On Thu, Oct 26, 2017 at 09:37:31PM +0200, Eric Botcazou wrote:
> > Can you figure what oldest GCC release supports the C++11/14 POD handling
> > that would be required?
> 
> GCC needs to be buildable by other compilers than itself though.

It sounds like people are mostly concerned about Sun Studio and xlc?  It
doesn't seem that hard to provide precompiled binaries for those two
platforms, and maybe 4.8 binaries for people who want to compile their
own gcc from source.  If that would be enough to deal with people's
concerns, it seems doable by next stage 1?

Trev

> 
> -- 
> Eric Botcazou

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [09/nn] Add a fixed_size_mode_pod class
  2017-10-30  3:14           ` Trevor Saunders
@ 2017-10-30  8:52             ` Richard Sandiford
  2017-10-30 10:13             ` Eric Botcazou
  1 sibling, 0 replies; 90+ messages in thread
From: Richard Sandiford @ 2017-10-30  8:52 UTC (permalink / raw)
  To: Trevor Saunders; +Cc: Eric Botcazou, Richard Biener, gcc-patches

Trevor Saunders <tbsaunde@tbsaunde.org> writes:
> On Thu, Oct 26, 2017 at 09:37:31PM +0200, Eric Botcazou wrote:
>> > Can you figure what oldest GCC release supports the C++11/14 POD handling
>> > that would be required?
>> 
>> GCC needs to be buildable by other compilers than itself though.
>
> It sounds like people are mostly concerned about Sun Studio and xlc?  It
> doesn't seem that hard to provide precompiled binaries for those two
> platforms, and maybe 4.8 binaries for people who want to compile their
> own gcc from source.  If that would be enough to deal with people's
> concerns, it seems doable by next stage 1?

Would it be worth supporting a 4-stage bootstrap, with stage 0 being
built from older gcc sources?  We could include a contrib/ script that
downloads sources for gcc-4.7 or whatever and patches it to build with
modern as well as old compilers.  (When I tried gcc-4.7 last week,
I needed a couple of tweaks to get it to build.)

Not that I'd have time to try that before GCC 9...

Thanks,
Richard

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [09/nn] Add a fixed_size_mode_pod class
  2017-10-30  3:14           ` Trevor Saunders
  2017-10-30  8:52             ` Richard Sandiford
@ 2017-10-30 10:13             ` Eric Botcazou
  2017-10-31 10:39               ` Trevor Saunders
  1 sibling, 1 reply; 90+ messages in thread
From: Eric Botcazou @ 2017-10-30 10:13 UTC (permalink / raw)
  To: Trevor Saunders; +Cc: gcc-patches, Richard Biener, Richard Sandiford

> It sounds like people are mostly concerned about Sun Studio and xlc?  It
> doesn't seem that hard to provide precompiled binaries for those two
> platforms, and maybe 4.8 binaries for people who want to compile their
> own gcc from source.

I'm not sure that we want to enter the business of precompiled binaries.
Moreover, if we want people to contribute to GCC's development, especially 
occasionally to fix a couple of bugs, we need to make it easier to build the 
compiler, not the other way around.

-- 
Eric Botcazou

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [07/nn] Add unique CONSTs
  2017-10-27 15:58     ` Richard Sandiford
@ 2017-10-30 14:49       ` Jeff Law
  0 siblings, 0 replies; 90+ messages in thread
From: Jeff Law @ 2017-10-30 14:49 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

On 10/27/2017 09:56 AM, Richard Sandiford wrote:
> Jeff Law <law@redhat.com> writes:
>> On 10/23/2017 05:21 AM, Richard Sandiford wrote:
>>> This patch adds a way of treating certain kinds of CONST as unique,
>>> so that pointer equality is equivalent to value equality.  For now it
>>> is restricted to VEC_DUPLICATE and VEC_SERIES, although the code to
>>> generate them remains in the else arm of an "if (1)" until a later
>>> patch.
>>>
>>> This is needed so that (const (vec_duplicate xx)) can used as the
>>> CONSTxx_RTX of a variable-length vector.
>> You're brave :-)  I know we looked at making CONST_INTs behave in this
>> manner eons ago in an effort to reduce memory consumption and it was
>> just plain painful.   There may still be comments from that project
>> littering the source code.
>>
>> I do wonder if we might want to revisit this again as we have better
>> infrastructure in place.
> 
> For vectors it isn't so bad, since we already do the same thing
> for CONST_VECTOR.  Fortunately CONST_VECTOR and CONST always have
> a mode, so there's no awkward sharing between modes...
> 
>>> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
>>> 	    Alan Hayward  <alan.hayward@arm.com>
>>> 	    David Sherwood  <david.sherwood@arm.com>
>>>
>>> gcc/
>>> 	* rtl.h (unique_const_p): New function.
>>> 	(gen_rtx_CONST): Declare.
>>> 	* emit-rtl.c (const_hasher): New struct.
>>> 	(const_htab): New variable.
>>> 	(init_emit_once): Initialize it.
>>> 	(const_hasher::hash, const_hasher::equal): New functions.
>>> 	(gen_rtx_CONST): New function.
>>> 	(spare_vec_duplicate, spare_vec_series): New variables.
>>> 	(gen_const_vec_duplicate_1): Add code for use (const (vec_duplicate)),
>>> 	but disable it for now.
>>> 	(gen_const_vec_series): Likewise (const (vec_series)).
>>> 	* gengenrtl.c (special_rtx): Return true for CONST.
>>> 	* rtl.c (shared_const_p): Return true if unique_const_p.
>> ISTM that you need an update the rtl.texi's structure sharing
>> assumptions section to describe the new rules around CONSTs.
> 
> Oops, yeah.  How about the attached?
OK.

> 
>> So what's the purpose of the spare_vec_* stuff that you're going to use
>> in the future?  It looks like a single-element cache to me.  Am I
>> missing something?
> 
> No, that's right.  When looking up the const for (vec_duplicate x), say,
> it's easier to create the vec_duplicate rtx first.  But if the lookup
> succeeds (and so we already have an rtx with that value), we keep the
> discarded vec_duplicate around so that we can reuse it for the next
> lookup.
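
In other words, the pattern is roughly as follows -- a minimal sketch
with made-up helper names (find_or_insert_const stands in for the real
const_htab lookup), not the actual emit-rtl.c code:

/* Single-element cache: a discarded VEC_DUPLICATE left over from a
   previous lookup, if any.  */
static rtx spare_vec_duplicate;

static rtx
lookup_const_vec_duplicate (machine_mode mode, rtx elt)
{
  if (spare_vec_duplicate)
    {
      /* Reuse the cached rtx as the probe for this lookup.  */
      PUT_MODE (spare_vec_duplicate, mode);
      XEXP (spare_vec_duplicate, 0) = elt;
    }
  else
    spare_vec_duplicate = gen_rtx_VEC_DUPLICATE (mode, elt);

  rtx c = find_or_insert_const (spare_vec_duplicate);
  if (XEXP (c, 0) == spare_vec_duplicate)
    /* The probe was inserted, so the hash table now owns it; build
       a fresh spare for the next lookup.  */
    spare_vec_duplicate = NULL_RTX;
  /* Otherwise an equal rtx already existed; keep ours for reuse.  */
  return c;
}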
OK.

Jeff

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [14/nn] Add helpers for shift count modes
  2017-10-26 12:07   ` Richard Biener
  2017-10-26 12:07     ` Richard Biener
@ 2017-10-30 15:03     ` Jeff Law
  1 sibling, 0 replies; 90+ messages in thread
From: Jeff Law @ 2017-10-30 15:03 UTC (permalink / raw)
  To: Richard Biener, GCC Patches, Richard Sandiford

On 10/26/2017 06:06 AM, Richard Biener wrote:
> On Mon, Oct 23, 2017 at 1:25 PM, Richard Sandiford
> <richard.sandiford@linaro.org> wrote:
>> This patch adds a stub helper routine to provide the mode
>> of a scalar shift amount, given the mode of the values
>> being shifted.
>>
>> One long-standing problem has been to decide what this mode
>> should be for arbitrary rtxes (as opposed to those directly
>> tied to a target pattern).  Is it the mode of the shifted
>> elements?  Is it word_mode?  Or maybe QImode?  Is it whatever
>> the corresponding target pattern says?  (In which case what
>> should the mode be when the target doesn't have a pattern?)
>>
>> For now the patch picks word_mode, which should be safe on
>> all targets but could perhaps become suboptimal if the helper
>> routine is used more often than it is in this patch.  As it
>> stands the patch does not change the generated code.
>>
>> The patch also adds a helper function that constructs rtxes
>> for constant shift amounts, again given the mode of the value
>> being shifted.  As well as helping with the SVE patches, this
>> is one step towards allowing CONST_INTs to have a real mode.
> 
> I think gen_shift_amount_mode is flawed and while encapsulating
> constant shift amount RTX generation into a gen_int_shift_amount
> looks good to me I'd rather have that ??? in this function (and
> I'd use the mode of the RTX shifted, not word_mode...).
> 
> In the end it's up to insn recognizing to convert the op to the
> expected mode and for generic RTL it's us that should decide
> on the mode -- on GENERIC the shift amount has to be an
> integer so why not simply use a mode that is large enough to
> make the constant fit?
> 
> Just throwing in some comments here, RTL isn't my primary
> expertise.
I wonder if encapsulation + a target hook to specify the mode would be
better?  We'd then have to argue over word_mode vs. QImode vs. something
else for the default, but at least we'd have a way for the target to
specify the mode that is generally best when working on shift counts.

In the end I doubt there's a single definition that is overall better,
largely because I suspect there are times when the narrowest mode is
best, or the mode of the operand being shifted.

So thoughts on doing the encapsulation with a target hook to specify the
desired mode?  Does that get us what we need for SVE and does it provide
us a path forward on this issue if we were to try to move towards
CONST_INTs with modes?
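
Concretely, I'm imagining something like the following -- purely a
sketch, with shift_amount_mode being a made-up hook name and word_mode
kept as the backwards-compatible default:

/* Hypothetical default for the new hook: the scalar mode that shift
   amounts should have when the shifted value has mode MODE.  */
static machine_mode
default_shift_amount_mode (machine_mode)
{
  return word_mode;
}

/* Encapsulated generation of a constant shift amount, so that callers
   never hard-code the choice of mode themselves.  */
rtx
gen_int_shift_amount (machine_mode shifted_mode, HOST_WIDE_INT amount)
{
  return gen_int_mode (amount, targetm.shift_amount_mode (shifted_mode));
}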

jeff

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [11/nn] Add narrower_subreg_mode helper function
  2017-10-23 11:24 ` [11/nn] Add narrower_subreg_mode helper function Richard Sandiford
@ 2017-10-30 15:06   ` Jeff Law
  0 siblings, 0 replies; 90+ messages in thread
From: Jeff Law @ 2017-10-30 15:06 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

On 10/23/2017 05:24 AM, Richard Sandiford wrote:
> This patch adds a narrowing equivalent of wider_subreg_mode.  At present
> there is only one user.
> 
> 
> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
> 	    Alan Hayward  <alan.hayward@arm.com>
> 	    David Sherwood  <david.sherwood@arm.com>
> 
> gcc/
> 	* rtl.h (narrower_subreg_mode): New function.
> 	* ira-color.c (update_costs_from_allocno): Use it.
OK.  I'm going to assume further uses will show up :-)

jeff

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [20/nn] Make tree-ssa-dse.c:normalize_ref return a bool
  2017-10-23 11:30 ` [20/nn] Make tree-ssa-dse.c:normalize_ref return a bool Richard Sandiford
@ 2017-10-30 17:49   ` Jeff Law
  0 siblings, 0 replies; 90+ messages in thread
From: Jeff Law @ 2017-10-30 17:49 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

On 10/23/2017 05:29 AM, Richard Sandiford wrote:
> This patch moves the check for an overlapping byte to normalize_ref
> from its callers, so that it's easier to convert to poly_ints later.
> It's not really worth it on its own.
> 
> 
> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
> 
> gcc/
> 	* tree-ssa-dse.c (normalize_ref): Check whether the ranges overlap
> 	and return false if not.
> 	(clear_bytes_written_by, live_bytes_read): Update accordingly.
OK.
jeff

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [10/nn] Widening optab cleanup
  2017-10-23 11:24 ` [10/nn] Widening optab cleanup Richard Sandiford
@ 2017-10-30 18:32   ` Jeff Law
  0 siblings, 0 replies; 90+ messages in thread
From: Jeff Law @ 2017-10-30 18:32 UTC (permalink / raw)
  To: gcc-patches, richard.sandiford

On 10/23/2017 05:23 AM, Richard Sandiford wrote:
> widening_optab_handler had the comment:
> 
>       /* ??? Why does find_widening_optab_handler_and_mode attempt to
>          widen things that can't be widened?  E.g. add_optab... */
>       if (op > LAST_CONV_OPTAB)
>         return CODE_FOR_nothing;
> 
> I think it comes from expand_binop using
> find_widening_optab_handler_and_mode for two things: to test whether
> a "normal" optab like add_optab is supported for a standard binary
> operation and to test whether a "convert" optab is supported for a
> widening operation like umul_widen_optab.  In the former case from_mode
> and to_mode must be the same, in the latter from_mode must be narrower
> than to_mode.
> 
> For the former case, find_widening_optab_handler_and_mode is only really
> testing the modes that are passed in.  permit_non_widening must be true
> here.
> 
> For the latter case, find_widening_optab_handler_and_mode should only
> really consider new from_modes that are wider than the original
> from_mode and narrower than the original to_mode.  Logically
> permit_non_widening should be false, since widening optabs aren't
> supposed to take operands that are the same width as the destination.
> We get away with permit_non_widening being true because no target
> would/should define a widening .md pattern with matching modes.
> 
> But really, it seems better for expand_binop to handle these two
> cases itself rather than pushing them down.  With that change,
> find_widening_optab_handler_and_mode is only ever called with
> permit_non_widening set to false and is only ever called with
> a "proper" convert optab.  We then no longer need widening_optab_handler,
> we can just use convert_optab_handler directly.
> 
> The patch also passes the instruction code down to expand_binop_directly.
> This should be more efficient and removes an extra call to
> find_widening_optab_handler_and_mode.
> 
> 
> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
> 	    Alan Hayward  <alan.hayward@arm.com>
> 	    David Sherwood  <david.sherwood@arm.com>
> 
> gcc/
> 	* optabs-query.h (convert_optab_p): New function, split out from...
> 	(convert_optab_handler): ...here.
> 	(widening_optab_handler): Delete.
> 	(find_widening_optab_handler): Remove permit_non_widening parameter.
> 	(find_widening_optab_handler_and_mode): Likewise.  Provide an
> 	override that operates on mode class wrappers.
> 	* optabs-query.c (widening_optab_handler): Delete.
> 	(find_widening_optab_handler_and_mode): Remove permit_non_widening
> 	parameter.  Assert that the two modes are the same class and that
> 	the "from" mode is narrower than the "to" mode.  Use
> 	convert_optab_handler instead of widening_optab_handler.
> 	* expmed.c (expmed_mult_highpart_optab): Use convert_optab_handler
> 	instead of widening_optab_handler.
> 	* expr.c (expand_expr_real_2): Update calls to
> 	find_widening_optab_handler.
> 	* optabs.c (expand_widen_pattern_expr): Likewise.
> 	(expand_binop_directly): Take the insn_code as a parameter.
> 	(expand_binop): Only call find_widening_optab_handler for
> 	conversion optabs; use optab_handler otherwise.  Update calls
> 	to find_widening_optab_handler and expand_binop_directly.
> 	Use convert_optab_handler instead of widening_optab_handler.
> 	* tree-ssa-math-opts.c (convert_mult_to_widen): Update calls to
> 	find_widening_optab_handler and use scalar_mode rather than
> 	machine_mode.
> 	(convert_plusminus_to_widen): Likewise.
OK.
jeff

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [09/nn] Add a fixed_size_mode_pod class
  2017-10-30 10:13             ` Eric Botcazou
@ 2017-10-31 10:39               ` Trevor Saunders
  2017-10-31 17:29                 ` Eric Botcazou
  0 siblings, 1 reply; 90+ messages in thread
From: Trevor Saunders @ 2017-10-31 10:39 UTC (permalink / raw)
  To: Eric Botcazou; +Cc: gcc-patches, Richard Biener, Richard Sandiford

On Mon, Oct 30, 2017 at 11:11:12AM +0100, Eric Botcazou wrote:
> > It sounds like people are mostly concerned about sun studio and xlc? It
> > doesn't seem that hard to provide precompiled binaries for those two
> > platforms, and maybe 4.8 binaries for people who want to compile their
> > own gcc from source.
> 
> I'm not sure that we want to enter the business of precompiled binaries.

I don't see a reason not to other than a pretty small amount of work
each time we make a release.

> Moreover, if we want people to contribute to GCC's development, especially 
> occasionally to fix a couple of bugs, we need to make it easier to build the 
> compiler, not the other way around.

Well, first, this would only matter to the 0.01% of people who want to do
that on AIX or Solaris machines, not the vast majority of possible
contributors who already use clang or gcc as their system compiler.
Secondly, downloading a tarball isn't very difficult, and arguably
providing them makes it easier for people to test gcc on those systems
without having to build it themselves.
Thirdly, making it easier to work on the compiler and understand it makes
things easier for those possible contributors, so if being able to use
C++11 advances that goal, things could be better overall for possible
contributors with different system compilers.

Trev

> 
> -- 
> Eric Botcazou

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [09/nn] Add a fixed_size_mode_pod class
  2017-10-31 10:39               ` Trevor Saunders
@ 2017-10-31 17:29                 ` Eric Botcazou
  2017-10-31 17:57                   ` Jeff Law
  0 siblings, 1 reply; 90+ messages in thread
From: Eric Botcazou @ 2017-10-31 17:29 UTC (permalink / raw)
  To: Trevor Saunders; +Cc: gcc-patches, Richard Biener, Richard Sandiford

> I don't see a reason not to other than a pretty small amount of work
> each time we make a release.

I'm not sure it would be so small an amount of work, especially on non-Linux 
platforms, so this would IMO divert our resources for little benefit.

> Well, first, this would only matter to the 0.01% of people who want to do
> that on AIX or Solaris machines, not the vast majority of possible
> contributors who already use clang or gcc as their system compiler.

Yes, but we're GCC, not Clang, and we support more than Linux and Darwin.

> Thirdly, making it easier to work on the compiler and understand it makes
> things easier for those possible contributors, so if being able to use
> C++11 advances that goal, things could be better overall for possible
> contributors with different system compilers.

I don't buy this at all.  You don't need bleeding edge C++ features to build a 
compiler and people don't work on compilers to use bleeding edge C++.  Using a 
narrow and sensible set of C++ features was one of the conditions under which 
the switch to C++ as implementation language was accepted at the time.

-- 
Eric Botcazou

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [09/nn] Add a fixed_size_mode_pod class
  2017-10-31 17:29                 ` Eric Botcazou
@ 2017-10-31 17:57                   ` Jeff Law
  2017-11-01  2:50                     ` Trevor Saunders
  0 siblings, 1 reply; 90+ messages in thread
From: Jeff Law @ 2017-10-31 17:57 UTC (permalink / raw)
  To: Eric Botcazou, Trevor Saunders
  Cc: gcc-patches, Richard Biener, Richard Sandiford

On 10/31/2017 11:22 AM, Eric Botcazou wrote:
>> I don't see a reason not to other than a pretty small amount of work
>> each time we make a release.
> 
> I'm not sure it would be so small an amount of work, especially on non-Linux 
> platforms, so this would IMO divert our resources for little benefit.
Having done this for years on HPUX, yes, it takes more time than one
could imagine.  Then I went to work for a company that did this for
hpux, solaris, aix, irix and others, and well, it was very painful.

> 
>> Well, first, this would only matter to the 0.01% of people who want to do
>> that on AIX or Solaris machines, not the vast majority of possible
>> contributors who already use clang or gcc as their system compiler.
> 
> Yes, but we're GCC, not Clang, and we support more than Linux and Darwin.
Very true.

> 
>> Thirdly, making it easier to work on the compiler and understand it makes
>> things easier for those possible contributors, so if being able to use
>> C++11 advances that goal, things could be better overall for possible
>> contributors with different system compilers.
> 
> I don't buy this at all.  You don't need bleeding edge C++ features to build a 
> compiler and people don't work on compilers to use bleeding edge C++.  Using a 
> narrow and sensible set of C++ features was one of the conditions under which 
> the switch to C++ as implementation language was accepted at the time.
Agreed that we need to stick with a sensible set of features.  But the
sensible set isn't necessarily fixed forever.

Jeff

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [09/nn] Add a fixed_size_mode_pod class
  2017-10-31 17:57                   ` Jeff Law
@ 2017-11-01  2:50                     ` Trevor Saunders
  2017-11-01 16:30                       ` Jeff Law
  0 siblings, 1 reply; 90+ messages in thread
From: Trevor Saunders @ 2017-11-01  2:50 UTC (permalink / raw)
  To: Jeff Law; +Cc: Eric Botcazou, gcc-patches, Richard Biener, Richard Sandiford

On Tue, Oct 31, 2017 at 11:38:48AM -0600, Jeff Law wrote:
> On 10/31/2017 11:22 AM, Eric Botcazou wrote:
> >> I don't see a reason not to other than a pretty small amount of work
> >> each time we make a release.
> > 
> > I'm not sure it would be so small an amount of work, especially on non-Linux 
> > platforms, so this would IMO divert our resources for little benefit.
> Having done this for years on HPUX, yes, it takes more time than one
> could imagine.  Then I went to work for a company that did this for
> hpux, solaris, aix, irix and others, and well, it was very painful.

I'm sure it's a project one can spend arbitrary amounts of time on if one
wishes or is paid to do so.  That said, I'm considering the scope here
limited to running configure / make / make install with the defaults
and tarring up the result.  I'll admit I've only done that on linux,
where it was easy, but people do keep AIX and Solaris building and they
really are supposed to be buildable in a release.  However, at some point
it can be less work to do this than to beat C++98 into doing what is
desired.

> >> Well, first, this would only matter to the 0.01% of people who want to do
> >> that on AIX or Solaris machines, not the vast majority of possible
> >> contributors who already use clang or gcc as their system compiler.
> > 
> > Yes, but we're GCC, not Clang, and we support more than Linux and Darwin.
> Very true.

Certainly, but I think it makes sense to understand how many people
might be negatively affected by a change, and to what degree, before
making that decision.

> >> Thirdly, making it easier to work on the compiler and understand it makes
> >> things easier for those possible contributors, so if being able to use
> >> C++11 advances that goal, things could be better overall for possible
> >> contributors with different system compilers.
> > 
> > I don't buy this at all.  You don't need bleeding edge C++ features to build a 
> > compiler and people don't work on compilers to use bleeding edge C++.  Using a 
> > narrow and sensible set of C++ features was one of the conditions under which 
> > the switch to C++ as implementation language was accepted at the time.
> Agreed that we need to stick with a sensible set of features.  But the
> sensible set isn't necessarily fixed forever.

Also, as a counterexample, what brought this thread up is Richard wanting
to use something from C++11.  So in that particular case it probably
would make something better.

thanks

Trev

> 
> Jeff

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [09/nn] Add a fixed_size_mode_pod class
  2017-11-01  2:50                     ` Trevor Saunders
@ 2017-11-01 16:30                       ` Jeff Law
  2017-11-02  4:28                         ` Trevor Saunders
  0 siblings, 1 reply; 90+ messages in thread
From: Jeff Law @ 2017-11-01 16:30 UTC (permalink / raw)
  To: Trevor Saunders
  Cc: Eric Botcazou, gcc-patches, Richard Biener, Richard Sandiford

On 10/31/2017 08:47 PM, Trevor Saunders wrote:
> On Tue, Oct 31, 2017 at 11:38:48AM -0600, Jeff Law wrote:
>> On 10/31/2017 11:22 AM, Eric Botcazou wrote:
>>>> I don't see a reason not to other than a pretty small amount of work
>>>> each time we make a release.
>>>
>>> I'm not sure it would be so small an amount of work, especially on non-Linux
>>> platforms, so this would IMO divert our resources for little benefit.
>> Having done this for years on HPUX, yes, it takes more time than one
>> could imagine.  Then I went to work for a company that did this for
>> hpux, solaris, aix, irix and others, and well, it was very painful.
> 
> I'm sure it's a project one can spend arbitrary amounts of time on if one
> wishes or is paid to do so.  That said, I'm considering the scope here
> limited to running configure / make / make install with the defaults
> and tarring up the result.  I'll admit I've only done that on linux,
> where it was easy, but people do keep AIX and Solaris building and they
> really are supposed to be buildable in a release.  However, at some point
> it can be less work to do this than to beat C++98 into doing what is
> desired.
It sounds so easy, but it does get more complex than just building and
tarring the result up.  How (for example) do you handle DSOs that may or
may not be on the system where the bits get installed?  Do you embed them
or tell the user to go get them?  That's just one example of a gotcha;
there are many.

It's really not something I'd suggest we pursue all that deeply.  Been
there, done that, wouldn't want to do it again.

> > > > > Thirdly, making it easier to work on the compiler and understand it makes
> > > > > things easier for those possible contributors, so if being able to use
> > > > > C++11 advances that goal, things could be better overall for possible
> > > > > contributors with different system compilers.
>>>
>>> I don't buy this at all.  You don't need bleeding edge C++ features to build a
>>> compiler and people don't work on compilers to use bleeding edge C++.  Using a
>>> narrow and sensible set of C++ features was one of the conditions under which
>>> the switch to C++ as implementation language was accepted at the time.
>> Agreed that we need to stick with a sensible set of features.  But the
>> sensible set isn't necessarily fixed forever.
> 
> Also, as a counterexample, what brought this thread up is Richard wanting
> to use something from C++11.  So in that particular case it probably
> would make something better.
In my particular case I could use certain C++11 features to make the 
code cleaner/easier to prove right -- particularly rvalue references and 
move semantics.  I've got an object with a chunk of allocated memory.  I 
want to move ownership of the memory to another object.

C++11 handles this cleanly and gracefully and in doing so makes it very 
hard to get it wrong.
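
For the record, the shape of what C++11 buys here -- a generic
illustration of the pattern, not the actual code in question:

#include <cstddef>

struct buffer
{
  char *data;
  std::size_t size;

  explicit buffer (std::size_t n) : data (new char[n]), size (n) {}
  ~buffer () { delete[] data; }

  /* Not copyable: a copy would mean two owners and a double free.  */
  buffer (const buffer &) = delete;
  buffer &operator= (const buffer &) = delete;

  /* Movable: ownership transfers and the source is left empty, so the
     compiler enforces the single-owner rule.  */
  buffer (buffer &&other) : data (other.data), size (other.size)
  {
    other.data = nullptr;
    other.size = 0;
  }
};

buffer
make_buffer ()
{
  buffer b (128);
  return b;	/* moved out, not copied */
}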

However, I don't think  my case, in and of itself, is enough to push us 
into the C++11 world.  Nor am I convinced that the aggregate of these 
things is enough to push us into the C++11 world.  But I do think we'll 
be there at some point.

jeff

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [09/nn] Add a fixed_size_mode_pod class
  2017-11-01 16:30                       ` Jeff Law
@ 2017-11-02  4:28                         ` Trevor Saunders
  0 siblings, 0 replies; 90+ messages in thread
From: Trevor Saunders @ 2017-11-02  4:28 UTC (permalink / raw)
  To: Jeff Law; +Cc: Eric Botcazou, gcc-patches, Richard Biener, Richard Sandiford

On Wed, Nov 01, 2017 at 10:30:29AM -0600, Jeff Law wrote:
> On 10/31/2017 08:47 PM, Trevor Saunders wrote:
> > On Tue, Oct 31, 2017 at 11:38:48AM -0600, Jeff Law wrote:
> > > On 10/31/2017 11:22 AM, Eric Botcazou wrote:
> > > > > I don't see a reason not to other than a pretty small amount of work
> > > > > each time we make a release.
> > > > 
> > > > I'm not sure it would be so small an amount of work, especially on non-Linux
> > > > platforms, so this would IMO divert our resources for little benefit.
> > > Having done this for years on HPUX, yes, it takes more time than one
> > > could imagine.  Then I went to work for a company that did this for
> > > hpux, solaris, aix, irix and others, and well, it was very painful.
> > 
> > I'm sure it's a project one can spend arbitrary amounts of time on if one
> > wishes or is paid to do so.  That said, I'm considering the scope here
> > limited to running configure / make / make install with the defaults
> > and tarring up the result.  I'll admit I've only done that on linux,
> > where it was easy, but people do keep AIX and Solaris building and they
> > really are supposed to be buildable in a release.  However, at some point
> > it can be less work to do this than to beat C++98 into doing what is
> > desired.
> It sounds so easy, but it does get more complex than just building and
> tarring the result up.  How (for example) do you handle DSOs that may or
> may not be on the system where the bits get installed?  Do you embed them
> or tell the user to go get them?  That's just one example of a gotcha;
> there are many.
> 
> It's really not something I'd suggest we pursue all that deeply.  Been
> there, done that, wouldn't want to do it again.
> 
> > > > > Thirdly, making it easier to work on the compiler and understand it makes
> > > > > things easier for those possible contributors, so if being able to use
> > > > > C++11 advances that goal, things could be better overall for possible
> > > > > contributors with different system compilers.
> > > > 
> > > > I don't buy this at all.  You don't need bleeding edge C++ features to build a
> > > > compiler and people don't work on compilers to use bleeding edge C++.  Using a
> > > > narrow and sensible set of C++ features was one of the conditions under which
> > > > the switch to C++ as implementation language was accepted at the time.
> > > Agreed that we need to stick with a sensible set of features.  But the
> > > sensible set isn't necessarily fixed forever.
> > 
> > Also, as a counterexample, what brought this thread up is Richard wanting
> > to use something from C++11.  So in that particular case it probably
> > would make something better.
> In my particular case I could use certain C++11 features to make the code
> cleaner/easier to prove right -- particularly rvalue references and move
> semantics.  I've got an object with a chunk of allocated memory.  I want to
> move ownership of the memory to another object.
> 
> C++11 handles this cleanly and gracefully and in doing so makes it very hard
> to get it wrong.

You may want to look at how the unique_ptr shim deals with that, though
maybe you don't want to copy the ifdef hackery to actually use rval refs
when possible.
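
Roughly the kind of shim meant, for anyone following along -- an
illustration only, not the actual unique-ptr.h:

class owning_ptr
{
  int *m_ptr;

public:
  explicit owning_ptr (int *p) : m_ptr (p) {}
  ~owning_ptr () { delete m_ptr; }

#if __cplusplus >= 201103L
  /* Real move semantics when building with a C++11 compiler.  */
  owning_ptr (owning_ptr &&other) : m_ptr (other.m_ptr)
  { other.m_ptr = 0; }
#else
  /* C++98 fallback: a "move" disguised as a copy, auto_ptr-style.  */
  owning_ptr (owning_ptr &other) : m_ptr (other.m_ptr)
  { other.m_ptr = 0; }
#endif
};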

Trev

> 
> However, I don't think  my case, in and of itself, is enough to push us into
> the C++11 world.  Nor am I convinced that the aggregate of these things is
> enough to push us into the C++11 world.  But I do think we'll be there at
> some point.
> 
> jeff

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [05/nn] Add VEC_DUPLICATE_{CST,EXPR} and associated optab
  2017-10-26 11:53   ` Richard Biener
@ 2017-11-06 15:09     ` Richard Sandiford
  2017-11-07 10:37       ` Richard Biener
  0 siblings, 1 reply; 90+ messages in thread
From: Richard Sandiford @ 2017-11-06 15:09 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Patches

Richard Biener <richard.guenther@gmail.com> writes:
> On Mon, Oct 23, 2017 at 1:20 PM, Richard Sandiford
> <richard.sandiford@linaro.org> wrote:
>> SVE needs a way of broadcasting a scalar to a variable-length vector.
>> This patch adds VEC_DUPLICATE_CST for when VECTOR_CST would be used for
>> fixed-length vectors and VEC_DUPLICATE_EXPR for when CONSTRUCTOR would
>> be used for fixed-length vectors.  VEC_DUPLICATE_EXPR is the tree
>> equivalent of the existing rtl code VEC_DUPLICATE.
>>
>> Originally we had a single VEC_DUPLICATE_EXPR and used TREE_CONSTANT
>> to mark constant nodes, but in response to last year's RFC, Richard B.
>> suggested it would be better to have separate codes for the constant
>> and non-constant cases.  This allows VEC_DUPLICATE_EXPR to be treated
>> as a normal unary operation and avoids the previous need for treating
>> it as a GIMPLE_SINGLE_RHS.
>>
>> It might make sense to use VEC_DUPLICATE_CST for all duplicated
>> vector constants, since it's a bit more compact than VECTOR_CST
>> in that case, and is potentially more efficient to process.
>> However, the nice thing about keeping it restricted to variable-length
>> vectors is that there is then no need to handle combinations of
>> VECTOR_CST and VEC_DUPLICATE_CST; a vector type will always use
>> VECTOR_CST or never use it.
>>
>> The patch also adds a vec_duplicate_optab to go with VEC_DUPLICATE_EXPR.
>
> Index: gcc/tree-vect-generic.c
> ===================================================================
> --- gcc/tree-vect-generic.c     2017-10-23 11:38:53.934094740 +0100
> +++ gcc/tree-vect-generic.c     2017-10-23 11:41:51.773953100 +0100
> @@ -1419,6 +1419,7 @@ lower_vec_perm (gimple_stmt_iterator *gs
>  ssa_uniform_vector_p (tree op)
>  {
>    if (TREE_CODE (op) == VECTOR_CST
> +      || TREE_CODE (op) == VEC_DUPLICATE_CST
>        || TREE_CODE (op) == CONSTRUCTOR)
>      return uniform_vector_p (op);
>
> VEC_DUPLICATE_EXPR handling?

Oops, yeah.  I could have sworn it was there at one time...

> Looks like for VEC_DUPLICATE_CST it could directly return true.

The function is a bit misnamed: it returns the duplicated tree value
rather than a bool.
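
That is, callers use it along the lines of:

  /* Returns the duplicated element if OP is a uniform vector,
     or NULL_TREE otherwise.  */
  tree elt = uniform_vector_p (op);
  if (elt != NULL_TREE)
    {
      /* ... treat OP as a splat of ELT ...  */
    }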

> I didn't see uniform_vector_p being updated?

That part was there FWIW (for tree.c).

> Can you add verification to either verify_expr or build_vec_duplicate_cst
> that the type is one of variable size?  And amend tree.def docs
> accordingly.  Because otherwise we miss a lot of cases in constant
> folding (mixing VEC_DUPLICATE_CST and VECTOR_CST).

OK, done in the patch below with a gcc_unreachable () bomb in
build_vec_duplicate_cst, which becomes a gcc_assert when variable-length
vectors are added.  This meant changing the selftests to use
build_vector_from_val rather than build_vec_duplicate_cst,
but to still get testing of VEC_DUPLICATE_*, we then need to use
the target's preferred vector length instead of always using 4.

Tested as before.  OK (given the slightly different selftests)?

Thanks,
Richard


2017-11-06  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* doc/generic.texi (VEC_DUPLICATE_CST, VEC_DUPLICATE_EXPR): Document.
	(VEC_COND_EXPR): Add missing @tindex.
	* doc/md.texi (vec_duplicate@var{m}): Document.
	* tree.def (VEC_DUPLICATE_CST, VEC_DUPLICATE_EXPR): New tree codes.
	* tree-core.h (tree_base): Document that u.nelts and TREE_OVERFLOW
	are used for VEC_DUPLICATE_CST as well.
	(tree_vector): Access base.n.nelts directly.
	* tree.h (TREE_OVERFLOW): Add VEC_DUPLICATE_CST to the list of
	valid codes.
	(VEC_DUPLICATE_CST_ELT): New macro.
	* tree.c (tree_node_structure_for_code, tree_code_size, tree_size)
	(integer_zerop, integer_onep, integer_all_onesp, integer_truep)
	(real_zerop, real_onep, real_minus_onep, add_expr, initializer_zerop)
	(walk_tree_1, drop_tree_overflow): Handle VEC_DUPLICATE_CST.
	(build_vec_duplicate_cst): New function.
	(build_vector_from_val): Add stubbed-out handling of variable-length
	vectors, using build_vec_duplicate_cst and VEC_DUPLICATE_EXPR.
	(uniform_vector_p): Handle the new codes.
	(test_vec_duplicate_predicates_int): New function.
	(test_vec_duplicate_predicates_float): Likewise.
	(test_vec_duplicate_predicates): Likewise.
	(tree_c_tests): Call test_vec_duplicate_predicates.
	* cfgexpand.c (expand_debug_expr): Handle the new codes.
	* tree-pretty-print.c (dump_generic_node): Likewise.
	* tree-vect-generic.c (ssa_uniform_vector_p): Likewise.
	* dwarf2out.c (rtl_for_decl_init): Handle VEC_DUPLICATE_CST.
	* gimple-expr.h (is_gimple_constant): Likewise.
	* gimplify.c (gimplify_expr): Likewise.
	* graphite-isl-ast-to-gimple.c
	(translate_isl_ast_to_gimple::is_constant): Likewise.
	* graphite-scop-detection.c (scan_tree_for_params): Likewise.
	* ipa-icf-gimple.c (func_checker::compare_cst_or_decl): Likewise.
	(func_checker::compare_operand): Likewise.
	* ipa-icf.c (sem_item::add_expr, sem_variable::equals): Likewise.
	* match.pd (negate_expr_p): Likewise.
	* print-tree.c (print_node): Likewise.
	* tree-chkp.c (chkp_find_bounds_1): Likewise.
	* tree-loop-distribution.c (const_with_all_bytes_same): Likewise.
	* tree-ssa-loop.c (for_each_index): Likewise.
	* tree-ssa-pre.c (create_component_ref_by_pieces_1): Likewise.
	* tree-ssa-sccvn.c (copy_reference_ops_from_ref): Likewise.
	(ao_ref_init_from_vn_reference): Likewise.
	* varasm.c (const_hash_1, compare_constant): Likewise.
	* fold-const.c (negate_expr_p, fold_negate_expr_1, const_binop)
	(fold_convert_const, operand_equal_p, fold_view_convert_expr)
	(exact_inverse, fold_checksum_tree): Likewise.
	(const_unop): Likewise.  Fold VEC_DUPLICATE_EXPRs of a constant.
	(test_vec_duplicate_folding): New function.
	(fold_const_c_tests): Call it.
	* optabs.def (vec_duplicate_optab): New optab.
	* optabs-tree.c (optab_for_tree_code): Handle VEC_DUPLICATE_EXPR.
	* optabs.h (expand_vector_broadcast): Declare.
	* optabs.c (expand_vector_broadcast): Make non-static.  Try using
	vec_duplicate_optab.
	* expr.c (store_constructor): Try using vec_duplicate_optab for
	uniform vectors.
	(const_vector_element): New function, split out from...
	(const_vector_from_tree): ...here.
	(expand_expr_real_2): Handle VEC_DUPLICATE_EXPR.
	(expand_expr_real_1): Handle VEC_DUPLICATE_CST.
	* internal-fn.c (expand_vector_ubsan_overflow): Use CONSTANT_P
	instead of checking for VECTOR_CST.
	* tree-cfg.c (verify_gimple_assign_unary): Handle VEC_DUPLICATE_EXPR.
	(verify_gimple_assign_single): Handle VEC_DUPLICATE_CST.
	* tree-inline.c (estimate_operator_cost): Handle VEC_DUPLICATE_EXPR.

Index: gcc/doc/generic.texi
===================================================================
--- gcc/doc/generic.texi	2017-11-06 12:40:39.845713389 +0000
+++ gcc/doc/generic.texi	2017-11-06 12:40:40.277637153 +0000
@@ -1036,6 +1036,7 @@ As this example indicates, the operands
 @tindex FIXED_CST
 @tindex COMPLEX_CST
 @tindex VECTOR_CST
+@tindex VEC_DUPLICATE_CST
 @tindex STRING_CST
 @findex TREE_STRING_LENGTH
 @findex TREE_STRING_POINTER
@@ -1089,6 +1090,14 @@ constant nodes.  Each individual constan
 double constant node.  The first operand is a @code{TREE_LIST} of the
 constant nodes and is accessed through @code{TREE_VECTOR_CST_ELTS}.
 
+@item VEC_DUPLICATE_CST
+These nodes represent a vector constant in which every element has the
+same scalar value.  At present only variable-length vectors use
+@code{VEC_DUPLICATE_CST}; constant-length vectors use @code{VECTOR_CST}
+instead.  The scalar element value is given by
+@code{VEC_DUPLICATE_CST_ELT} and has the same restrictions as the
+element of a @code{VECTOR_CST}.
+
 @item STRING_CST
 These nodes represent string-constants.  The @code{TREE_STRING_LENGTH}
 returns the length of the string, as an @code{int}.  The
@@ -1692,6 +1701,7 @@ a value from @code{enum annot_expr_kind}
 
 @node Vectors
 @subsection Vectors
+@tindex VEC_DUPLICATE_EXPR
 @tindex VEC_LSHIFT_EXPR
 @tindex VEC_RSHIFT_EXPR
 @tindex VEC_WIDEN_MULT_HI_EXPR
@@ -1703,9 +1713,14 @@ a value from @code{enum annot_expr_kind}
 @tindex VEC_PACK_TRUNC_EXPR
 @tindex VEC_PACK_SAT_EXPR
 @tindex VEC_PACK_FIX_TRUNC_EXPR
+@tindex VEC_COND_EXPR
 @tindex SAD_EXPR
 
 @table @code
+@item VEC_DUPLICATE_EXPR
+This node has a single operand and represents a vector in which every
+element is equal to that operand.
+
 @item VEC_LSHIFT_EXPR
 @itemx VEC_RSHIFT_EXPR
 These nodes represent whole vector left and right shifts, respectively.
Index: gcc/doc/md.texi
===================================================================
--- gcc/doc/md.texi	2017-11-06 12:40:39.845713389 +0000
+++ gcc/doc/md.texi	2017-11-06 12:40:40.278630081 +0000
@@ -4888,6 +4888,17 @@ and operand 1 is parallel containing val
 the vector mode @var{m}, or a vector mode with the same element mode and
 smaller number of elements.
 
+@cindex @code{vec_duplicate@var{m}} instruction pattern
+@item @samp{vec_duplicate@var{m}}
+Initialize vector output operand 0 so that each element has the value given
+by scalar input operand 1.  The vector has mode @var{m} and the scalar has
+the mode appropriate for one element of @var{m}.
+
+This pattern only handles duplicates of non-constant inputs.  Constant
+vectors go through the @code{mov@var{m}} pattern instead.
+
+This pattern is not allowed to @code{FAIL}.
+
 @cindex @code{vec_cmp@var{m}@var{n}} instruction pattern
 @item @samp{vec_cmp@var{m}@var{n}}
 Output a vector comparison.  Operand 0 of mode @var{n} is the destination for
Index: gcc/tree.def
===================================================================
--- gcc/tree.def	2017-11-06 12:40:39.845713389 +0000
+++ gcc/tree.def	2017-11-06 12:40:40.292531076 +0000
@@ -304,6 +304,11 @@ DEFTREECODE (COMPLEX_CST, "complex_cst",
 /* Contents are in VECTOR_CST_ELTS field.  */
 DEFTREECODE (VECTOR_CST, "vector_cst", tcc_constant, 0)
 
+/* Represents a vector constant in which every element is equal to
+   VEC_DUPLICATE_CST_ELT.  This is only ever used for variable-length
+   vectors; fixed-length vectors must use VECTOR_CST instead.  */
+DEFTREECODE (VEC_DUPLICATE_CST, "vec_duplicate_cst", tcc_constant, 0)
+
 /* Contents are TREE_STRING_LENGTH and the actual contents of the string.  */
 DEFTREECODE (STRING_CST, "string_cst", tcc_constant, 0)
 
@@ -534,6 +539,9 @@ DEFTREECODE (TARGET_EXPR, "target_expr",
    1 and 2 are NULL.  The operands are then taken from the cfg edges. */
 DEFTREECODE (COND_EXPR, "cond_expr", tcc_expression, 3)
 
+/* Represents a vector in which every element is equal to operand 0.  */
+DEFTREECODE (VEC_DUPLICATE_EXPR, "vec_duplicate_expr", tcc_unary, 1)
+
 /* Vector conditional expression. It is like COND_EXPR, but with
    vector operands.
 
Index: gcc/tree-core.h
===================================================================
--- gcc/tree-core.h	2017-11-06 12:40:39.845713389 +0000
+++ gcc/tree-core.h	2017-11-06 12:40:40.288559363 +0000
@@ -975,7 +975,8 @@ struct GTY(()) tree_base {
     /* VEC length.  This field is only used with TREE_VEC.  */
     int length;
 
-    /* Number of elements.  This field is only used with VECTOR_CST.  */
+    /* Number of elements.  This field is only used with VECTOR_CST
+       and VEC_DUPLICATE_CST.  It is always 1 for VEC_DUPLICATE_CST.  */
     unsigned int nelts;
 
     /* SSA version number.  This field is only used with SSA_NAME.  */
@@ -1065,7 +1066,7 @@ struct GTY(()) tree_base {
    public_flag:
 
        TREE_OVERFLOW in
-           INTEGER_CST, REAL_CST, COMPLEX_CST, VECTOR_CST
+           INTEGER_CST, REAL_CST, COMPLEX_CST, VECTOR_CST, VEC_DUPLICATE_CST
 
        TREE_PUBLIC in
            VAR_DECL, FUNCTION_DECL
@@ -1332,7 +1333,7 @@ struct GTY(()) tree_complex {
 
 struct GTY(()) tree_vector {
   struct tree_typed typed;
-  tree GTY ((length ("VECTOR_CST_NELTS ((tree) &%h)"))) elts[1];
+  tree GTY ((length ("((tree) &%h)->base.u.nelts"))) elts[1];
 };
 
 struct GTY(()) tree_identifier {
Index: gcc/tree.h
===================================================================
--- gcc/tree.h	2017-11-06 12:40:39.845713389 +0000
+++ gcc/tree.h	2017-11-06 12:40:40.293524004 +0000
@@ -709,8 +709,8 @@ #define TREE_SYMBOL_REFERENCED(NODE) \
 #define TYPE_REF_CAN_ALIAS_ALL(NODE) \
   (PTR_OR_REF_CHECK (NODE)->base.static_flag)
 
-/* In an INTEGER_CST, REAL_CST, COMPLEX_CST, or VECTOR_CST, this means
-   there was an overflow in folding.  */
+/* In an INTEGER_CST, REAL_CST, COMPLEX_CST, VECTOR_CST or VEC_DUPLICATE_CST,
+   this means there was an overflow in folding.  */
 
 #define TREE_OVERFLOW(NODE) (CST_CHECK (NODE)->base.public_flag)
 
@@ -1009,6 +1009,10 @@ #define VECTOR_CST_NELTS(NODE) (VECTOR_C
 #define VECTOR_CST_ELTS(NODE) (VECTOR_CST_CHECK (NODE)->vector.elts)
 #define VECTOR_CST_ELT(NODE,IDX) (VECTOR_CST_CHECK (NODE)->vector.elts[IDX])
 
+/* In a VEC_DUPLICATE_CST node.  */
+#define VEC_DUPLICATE_CST_ELT(NODE) \
+  (VEC_DUPLICATE_CST_CHECK (NODE)->vector.elts[0])
+
 /* Define fields and accessors for some special-purpose tree nodes.  */
 
 #define IDENTIFIER_LENGTH(NODE) \
Index: gcc/tree.c
===================================================================
--- gcc/tree.c	2017-11-06 12:40:39.845713389 +0000
+++ gcc/tree.c	2017-11-06 12:40:40.292531076 +0000
@@ -464,6 +464,7 @@ tree_node_structure_for_code (enum tree_
     case FIXED_CST:		return TS_FIXED_CST;
     case COMPLEX_CST:		return TS_COMPLEX;
     case VECTOR_CST:		return TS_VECTOR;
+    case VEC_DUPLICATE_CST:	return TS_VECTOR;
     case STRING_CST:		return TS_STRING;
       /* tcc_exceptional cases.  */
     case ERROR_MARK:		return TS_COMMON;
@@ -829,6 +830,7 @@ tree_code_size (enum tree_code code)
 	case FIXED_CST:		return sizeof (tree_fixed_cst);
 	case COMPLEX_CST:	return sizeof (tree_complex);
 	case VECTOR_CST:	return sizeof (tree_vector);
+	case VEC_DUPLICATE_CST:	return sizeof (tree_vector);
 	case STRING_CST:	gcc_unreachable ();
 	default:
 	  gcc_checking_assert (code >= NUM_TREE_CODES);
@@ -890,6 +892,9 @@ tree_size (const_tree node)
       return (sizeof (struct tree_vector)
 	      + (VECTOR_CST_NELTS (node) - 1) * sizeof (tree));
 
+    case VEC_DUPLICATE_CST:
+      return sizeof (struct tree_vector);
+
     case STRING_CST:
       return TREE_STRING_LENGTH (node) + offsetof (struct tree_string, str) + 1;
 
@@ -1697,6 +1702,34 @@ cst_and_fits_in_hwi (const_tree x)
 	  && (tree_fits_shwi_p (x) || tree_fits_uhwi_p (x)));
 }
 
+/* Build a new VEC_DUPLICATE_CST with type TYPE and operand EXP.
+
+   This function is only suitable for callers that know TYPE is a
+   variable-length vector and specifically need a VEC_DUPLICATE_CST node.
+   Use build_vector_from_val to duplicate a general scalar into a general
+   vector type.  */
+
+static tree
+build_vec_duplicate_cst (tree type, tree exp MEM_STAT_DECL)
+{
+  /* Shouldn't be used until we have variable-length vectors.  */
+  gcc_unreachable ();
+
+  int length = sizeof (struct tree_vector);
+
+  record_node_allocation_statistics (VEC_DUPLICATE_CST, length);
+
+  tree t = ggc_alloc_cleared_tree_node_stat (length PASS_MEM_STAT);
+
+  TREE_SET_CODE (t, VEC_DUPLICATE_CST);
+  TREE_TYPE (t) = type;
+  t->base.u.nelts = 1;
+  VEC_DUPLICATE_CST_ELT (t) = exp;
+  TREE_CONSTANT (t) = 1;
+
+  return t;
+}
+
 /* Build a newly constructed VECTOR_CST node of length LEN.  */
 
 tree
@@ -1790,6 +1823,13 @@ build_vector_from_val (tree vectype, tre
   gcc_checking_assert (types_compatible_p (TYPE_MAIN_VARIANT (TREE_TYPE (sc)),
 					   TREE_TYPE (vectype)));
 
+  if (0)
+    {
+      if (CONSTANT_CLASS_P (sc))
+	return build_vec_duplicate_cst (vectype, sc);
+      return fold_build1 (VEC_DUPLICATE_EXPR, vectype, sc);
+    }
+
   if (CONSTANT_CLASS_P (sc))
     {
       auto_vec<tree, 32> v (nunits);
@@ -2358,6 +2398,8 @@ integer_zerop (const_tree expr)
 	    return false;
 	return true;
       }
+    case VEC_DUPLICATE_CST:
+      return integer_zerop (VEC_DUPLICATE_CST_ELT (expr));
     default:
       return false;
     }
@@ -2384,6 +2426,8 @@ integer_onep (const_tree expr)
 	    return false;
 	return true;
       }
+    case VEC_DUPLICATE_CST:
+      return integer_onep (VEC_DUPLICATE_CST_ELT (expr));
     default:
       return false;
     }
@@ -2422,6 +2466,9 @@ integer_all_onesp (const_tree expr)
       return 1;
     }
 
+  else if (TREE_CODE (expr) == VEC_DUPLICATE_CST)
+    return integer_all_onesp (VEC_DUPLICATE_CST_ELT (expr));
+
   else if (TREE_CODE (expr) != INTEGER_CST)
     return 0;
 
@@ -2478,7 +2525,7 @@ integer_nonzerop (const_tree expr)
 int
 integer_truep (const_tree expr)
 {
-  if (TREE_CODE (expr) == VECTOR_CST)
+  if (TREE_CODE (expr) == VECTOR_CST || TREE_CODE (expr) == VEC_DUPLICATE_CST)
     return integer_all_onesp (expr);
   return integer_onep (expr);
 }
@@ -2649,6 +2696,8 @@ real_zerop (const_tree expr)
 	    return false;
 	return true;
       }
+    case VEC_DUPLICATE_CST:
+      return real_zerop (VEC_DUPLICATE_CST_ELT (expr));
     default:
       return false;
     }
@@ -2677,6 +2726,8 @@ real_onep (const_tree expr)
 	    return false;
 	return true;
       }
+    case VEC_DUPLICATE_CST:
+      return real_onep (VEC_DUPLICATE_CST_ELT (expr));
     default:
       return false;
     }
@@ -2704,6 +2755,8 @@ real_minus_onep (const_tree expr)
 	    return false;
 	return true;
       }
+    case VEC_DUPLICATE_CST:
+      return real_minus_onep (VEC_DUPLICATE_CST_ELT (expr));
     default:
       return false;
     }
@@ -7106,6 +7159,9 @@ add_expr (const_tree t, inchash::hash &h
 	  inchash::add_expr (VECTOR_CST_ELT (t, i), hstate, flags);
 	return;
       }
+    case VEC_DUPLICATE_CST:
+      inchash::add_expr (VEC_DUPLICATE_CST_ELT (t), hstate);
+      return;
     case SSA_NAME:
       /* We can just compare by pointer.  */
       hstate.add_hwi (SSA_NAME_VERSION (t));
@@ -10367,6 +10423,9 @@ initializer_zerop (const_tree init)
 	return true;
       }
 
+    case VEC_DUPLICATE_CST:
+      return initializer_zerop (VEC_DUPLICATE_CST_ELT (init));
+
     case CONSTRUCTOR:
       {
 	unsigned HOST_WIDE_INT idx;
@@ -10412,7 +10471,13 @@ uniform_vector_p (const_tree vec)
 
   gcc_assert (VECTOR_TYPE_P (TREE_TYPE (vec)));
 
-  if (TREE_CODE (vec) == VECTOR_CST)
+  if (TREE_CODE (vec) == VEC_DUPLICATE_CST)
+    return VEC_DUPLICATE_CST_ELT (vec);
+
+  else if (TREE_CODE (vec) == VEC_DUPLICATE_EXPR)
+    return TREE_OPERAND (vec, 0);
+
+  else if (TREE_CODE (vec) == VECTOR_CST)
     {
       first = VECTOR_CST_ELT (vec, 0);
       for (i = 1; i < VECTOR_CST_NELTS (vec); ++i)
@@ -11144,6 +11209,7 @@ #define WALK_SUBTREE_TAIL(NODE)				\
     case REAL_CST:
     case FIXED_CST:
     case VECTOR_CST:
+    case VEC_DUPLICATE_CST:
     case STRING_CST:
     case BLOCK:
     case PLACEHOLDER_EXPR:
@@ -12430,6 +12496,12 @@ drop_tree_overflow (tree t)
 	    elt = drop_tree_overflow (elt);
 	}
     }
+  if (TREE_CODE (t) == VEC_DUPLICATE_CST)
+    {
+      tree *elt = &VEC_DUPLICATE_CST_ELT (t);
+      if (TREE_OVERFLOW (*elt))
+	*elt = drop_tree_overflow (*elt);
+    }
   return t;
 }
 
@@ -13850,6 +13922,102 @@ test_integer_constants ()
   ASSERT_EQ (type, TREE_TYPE (zero));
 }
 
+/* Verify predicate handling of VEC_DUPLICATE_CSTs and VEC_DUPLICATE_EXPRs
+   for integral type TYPE.  */
+
+static void
+test_vec_duplicate_predicates_int (tree type)
+{
+  scalar_int_mode int_mode = SCALAR_INT_TYPE_MODE (type);
+  machine_mode vec_mode = targetm.vectorize.preferred_simd_mode (int_mode);
+  /* This will be 1 if VEC_MODE isn't a vector mode.  */
+  unsigned int nunits = GET_MODE_NUNITS (vec_mode);
+
+  tree vec_type = build_vector_type (type, nunits);
+
+  tree zero = build_zero_cst (type);
+  tree vec_zero = build_vector_from_val (vec_type, zero);
+  ASSERT_TRUE (integer_zerop (vec_zero));
+  ASSERT_FALSE (integer_onep (vec_zero));
+  ASSERT_FALSE (integer_minus_onep (vec_zero));
+  ASSERT_FALSE (integer_all_onesp (vec_zero));
+  ASSERT_FALSE (integer_truep (vec_zero));
+  ASSERT_TRUE (initializer_zerop (vec_zero));
+
+  tree one = build_one_cst (type);
+  tree vec_one = build_vector_from_val (vec_type, one);
+  ASSERT_FALSE (integer_zerop (vec_one));
+  ASSERT_TRUE (integer_onep (vec_one));
+  ASSERT_FALSE (integer_minus_onep (vec_one));
+  ASSERT_FALSE (integer_all_onesp (vec_one));
+  ASSERT_FALSE (integer_truep (vec_one));
+  ASSERT_FALSE (initializer_zerop (vec_one));
+
+  tree minus_one = build_minus_one_cst (type);
+  tree vec_minus_one = build_vector_from_val (vec_type, minus_one);
+  ASSERT_FALSE (integer_zerop (vec_minus_one));
+  ASSERT_FALSE (integer_onep (vec_minus_one));
+  ASSERT_TRUE (integer_minus_onep (vec_minus_one));
+  ASSERT_TRUE (integer_all_onesp (vec_minus_one));
+  ASSERT_TRUE (integer_truep (vec_minus_one));
+  ASSERT_FALSE (initializer_zerop (vec_minus_one));
+
+  tree x = create_tmp_var_raw (type, "x");
+  tree vec_x = build1 (VEC_DUPLICATE_EXPR, vec_type, x);
+  ASSERT_EQ (uniform_vector_p (vec_zero), zero);
+  ASSERT_EQ (uniform_vector_p (vec_one), one);
+  ASSERT_EQ (uniform_vector_p (vec_minus_one), minus_one);
+  ASSERT_EQ (uniform_vector_p (vec_x), x);
+}
+
+/* Verify predicate handling of VEC_DUPLICATE_CSTs for floating-point
+   type TYPE.  */
+
+static void
+test_vec_duplicate_predicates_float (tree type)
+{
+  scalar_float_mode float_mode = SCALAR_FLOAT_TYPE_MODE (type);
+  machine_mode vec_mode = targetm.vectorize.preferred_simd_mode (float_mode);
+  /* This will be 1 if VEC_MODE isn't a vector mode.  */
+  unsigned int nunits = GET_MODE_NUNITS (vec_mode);
+
+  tree vec_type = build_vector_type (type, nunits);
+
+  tree zero = build_zero_cst (type);
+  tree vec_zero = build_vector_from_val (vec_type, zero);
+  ASSERT_TRUE (real_zerop (vec_zero));
+  ASSERT_FALSE (real_onep (vec_zero));
+  ASSERT_FALSE (real_minus_onep (vec_zero));
+  ASSERT_TRUE (initializer_zerop (vec_zero));
+
+  tree one = build_one_cst (type);
+  tree vec_one = build_vector_from_val (vec_type, one);
+  ASSERT_FALSE (real_zerop (vec_one));
+  ASSERT_TRUE (real_onep (vec_one));
+  ASSERT_FALSE (real_minus_onep (vec_one));
+  ASSERT_FALSE (initializer_zerop (vec_one));
+
+  tree minus_one = build_minus_one_cst (type);
+  tree vec_minus_one = build_vector_from_val (vec_type, minus_one);
+  ASSERT_FALSE (real_zerop (vec_minus_one));
+  ASSERT_FALSE (real_onep (vec_minus_one));
+  ASSERT_TRUE (real_minus_onep (vec_minus_one));
+  ASSERT_FALSE (initializer_zerop (vec_minus_one));
+
+  ASSERT_EQ (uniform_vector_p (vec_zero), zero);
+  ASSERT_EQ (uniform_vector_p (vec_one), one);
+  ASSERT_EQ (uniform_vector_p (vec_minus_one), minus_one);
+}
+
+/* Verify predicate handling of VEC_DUPLICATE_CSTs and VEC_DUPLICATE_EXPRs.  */
+
+static void
+test_vec_duplicate_predicates ()
+{
+  test_vec_duplicate_predicates_int (integer_type_node);
+  test_vec_duplicate_predicates_float (float_type_node);
+}
+
 /* Verify identifiers.  */
 
 static void
@@ -13878,6 +14046,7 @@ test_labels ()
 tree_c_tests ()
 {
   test_integer_constants ();
+  test_vec_duplicate_predicates ();
   test_identifiers ();
   test_labels ();
 }
Index: gcc/cfgexpand.c
===================================================================
--- gcc/cfgexpand.c	2017-11-06 12:40:39.845713389 +0000
+++ gcc/cfgexpand.c	2017-11-06 12:40:40.276644225 +0000
@@ -5068,6 +5068,8 @@ expand_debug_expr (tree exp)
     case VEC_WIDEN_LSHIFT_HI_EXPR:
     case VEC_WIDEN_LSHIFT_LO_EXPR:
     case VEC_PERM_EXPR:
+    case VEC_DUPLICATE_CST:
+    case VEC_DUPLICATE_EXPR:
       return NULL;
 
     /* Misc codes.  */
Index: gcc/tree-pretty-print.c
===================================================================
--- gcc/tree-pretty-print.c	2017-11-06 12:40:39.845713389 +0000
+++ gcc/tree-pretty-print.c	2017-11-06 12:40:40.289552291 +0000
@@ -1802,6 +1802,12 @@ dump_generic_node (pretty_printer *pp, t
       }
       break;
 
+    case VEC_DUPLICATE_CST:
+      pp_string (pp, "{ ");
+      dump_generic_node (pp, VEC_DUPLICATE_CST_ELT (node), spc, flags, false);
+      pp_string (pp, ", ... }");
+      break;
+
     case FUNCTION_TYPE:
     case METHOD_TYPE:
       dump_generic_node (pp, TREE_TYPE (node), spc, flags, false);
@@ -3231,6 +3237,15 @@ dump_generic_node (pretty_printer *pp, t
       pp_string (pp, " > ");
       break;
 
+    case VEC_DUPLICATE_EXPR:
+      pp_space (pp);
+      for (str = get_tree_code_name (code); *str; str++)
+	pp_character (pp, TOUPPER (*str));
+      pp_string (pp, " < ");
+      dump_generic_node (pp, TREE_OPERAND (node, 0), spc, flags, false);
+      pp_string (pp, " > ");
+      break;
+
     case VEC_UNPACK_HI_EXPR:
       pp_string (pp, " VEC_UNPACK_HI_EXPR < ");
       dump_generic_node (pp, TREE_OPERAND (node, 0), spc, flags, false);
Index: gcc/tree-vect-generic.c
===================================================================
--- gcc/tree-vect-generic.c	2017-11-06 12:40:39.845713389 +0000
+++ gcc/tree-vect-generic.c	2017-11-06 12:40:40.291538147 +0000
@@ -1419,6 +1419,8 @@ lower_vec_perm (gimple_stmt_iterator *gs
 ssa_uniform_vector_p (tree op)
 {
   if (TREE_CODE (op) == VECTOR_CST
+      || TREE_CODE (op) == VEC_DUPLICATE_CST
+      || TREE_CODE (op) == VEC_DUPLICATE_EXPR
       || TREE_CODE (op) == CONSTRUCTOR)
     return uniform_vector_p (op);
   if (TREE_CODE (op) == SSA_NAME)
Index: gcc/dwarf2out.c
===================================================================
--- gcc/dwarf2out.c	2017-11-06 12:40:39.845713389 +0000
+++ gcc/dwarf2out.c	2017-11-06 12:40:40.280615937 +0000
@@ -18878,6 +18878,7 @@ rtl_for_decl_init (tree init, tree type)
 	switch (TREE_CODE (init))
 	  {
 	  case VECTOR_CST:
+	  case VEC_DUPLICATE_CST:
 	    break;
 	  case CONSTRUCTOR:
 	    if (TREE_CONSTANT (init))
Index: gcc/gimple-expr.h
===================================================================
--- gcc/gimple-expr.h	2017-11-06 12:40:39.845713389 +0000
+++ gcc/gimple-expr.h	2017-11-06 12:40:40.282601794 +0000
@@ -134,6 +134,7 @@ is_gimple_constant (const_tree t)
     case FIXED_CST:
     case COMPLEX_CST:
     case VECTOR_CST:
+    case VEC_DUPLICATE_CST:
     case STRING_CST:
       return true;
 
Index: gcc/gimplify.c
===================================================================
--- gcc/gimplify.c	2017-11-06 12:40:39.845713389 +0000
+++ gcc/gimplify.c	2017-11-06 12:40:40.283594722 +0000
@@ -11507,6 +11507,7 @@ gimplify_expr (tree *expr_p, gimple_seq
 	case STRING_CST:
 	case COMPLEX_CST:
 	case VECTOR_CST:
+	case VEC_DUPLICATE_CST:
 	  /* Drop the overflow flag on constants, we do not want
 	     that in the GIMPLE IL.  */
 	  if (TREE_OVERFLOW_P (*expr_p))
Index: gcc/graphite-isl-ast-to-gimple.c
===================================================================
--- gcc/graphite-isl-ast-to-gimple.c	2017-11-06 12:40:39.845713389 +0000
+++ gcc/graphite-isl-ast-to-gimple.c	2017-11-06 12:40:40.284587650 +0000
@@ -211,7 +211,8 @@ enum phi_node_kind
     return TREE_CODE (op) == INTEGER_CST
       || TREE_CODE (op) == REAL_CST
       || TREE_CODE (op) == COMPLEX_CST
-      || TREE_CODE (op) == VECTOR_CST;
+      || TREE_CODE (op) == VECTOR_CST
+      || TREE_CODE (op) == VEC_DUPLICATE_CST;
   }
 
 private:
Index: gcc/graphite-scop-detection.c
===================================================================
--- gcc/graphite-scop-detection.c	2017-11-06 12:40:39.845713389 +0000
+++ gcc/graphite-scop-detection.c	2017-11-06 12:40:40.284587650 +0000
@@ -1212,6 +1212,7 @@ scan_tree_for_params (sese_info_p s, tre
     case REAL_CST:
     case COMPLEX_CST:
     case VECTOR_CST:
+    case VEC_DUPLICATE_CST:
       break;
 
    default:
Index: gcc/ipa-icf-gimple.c
===================================================================
--- gcc/ipa-icf-gimple.c	2017-11-06 12:40:39.845713389 +0000
+++ gcc/ipa-icf-gimple.c	2017-11-06 12:40:40.285580578 +0000
@@ -333,6 +333,7 @@ func_checker::compare_cst_or_decl (tree
     case INTEGER_CST:
     case COMPLEX_CST:
     case VECTOR_CST:
+    case VEC_DUPLICATE_CST:
     case STRING_CST:
     case REAL_CST:
       {
@@ -528,6 +529,7 @@ func_checker::compare_operand (tree t1,
     case INTEGER_CST:
     case COMPLEX_CST:
     case VECTOR_CST:
+    case VEC_DUPLICATE_CST:
     case STRING_CST:
     case REAL_CST:
     case FUNCTION_DECL:
Index: gcc/ipa-icf.c
===================================================================
--- gcc/ipa-icf.c	2017-11-06 12:40:39.845713389 +0000
+++ gcc/ipa-icf.c	2017-11-06 12:40:40.285580578 +0000
@@ -1479,6 +1479,7 @@ sem_item::add_expr (const_tree exp, inch
     case STRING_CST:
     case COMPLEX_CST:
     case VECTOR_CST:
+    case VEC_DUPLICATE_CST:
       inchash::add_expr (exp, hstate);
       break;
     case CONSTRUCTOR:
@@ -2036,6 +2037,9 @@ sem_variable::equals (tree t1, tree t2)
 
 	return 1;
       }
+    case VEC_DUPLICATE_CST:
+      return sem_variable::equals (VEC_DUPLICATE_CST_ELT (t1),
+				   VEC_DUPLICATE_CST_ELT (t2));
     case ARRAY_REF:
     case ARRAY_RANGE_REF:
       {
Index: gcc/match.pd
===================================================================
--- gcc/match.pd	2017-11-06 12:40:39.845713389 +0000
+++ gcc/match.pd	2017-11-06 12:40:40.285580578 +0000
@@ -958,6 +958,9 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 (match negate_expr_p
  VECTOR_CST
  (if (FLOAT_TYPE_P (TREE_TYPE (type)) || TYPE_OVERFLOW_WRAPS (type))))
+(match negate_expr_p
+ VEC_DUPLICATE_CST
+ (if (FLOAT_TYPE_P (TREE_TYPE (type)) || TYPE_OVERFLOW_WRAPS (type))))
 
 /* (-A) * (-B) -> A * B  */
 (simplify
Index: gcc/print-tree.c
===================================================================
--- gcc/print-tree.c	2017-11-06 12:40:39.845713389 +0000
+++ gcc/print-tree.c	2017-11-06 12:40:40.287566435 +0000
@@ -783,6 +783,10 @@ print_node (FILE *file, const char *pref
 	  }
 	  break;
 
+	case VEC_DUPLICATE_CST:
+	  print_node (file, "elt", VEC_DUPLICATE_CST_ELT (node), indent + 4);
+	  break;
+
 	case COMPLEX_CST:
 	  print_node (file, "real", TREE_REALPART (node), indent + 4);
 	  print_node (file, "imag", TREE_IMAGPART (node), indent + 4);
Index: gcc/tree-chkp.c
===================================================================
--- gcc/tree-chkp.c	2017-11-06 12:40:39.845713389 +0000
+++ gcc/tree-chkp.c	2017-11-06 12:40:40.288559363 +0000
@@ -3799,6 +3799,7 @@ chkp_find_bounds_1 (tree ptr, tree ptr_s
     case INTEGER_CST:
     case COMPLEX_CST:
     case VECTOR_CST:
+    case VEC_DUPLICATE_CST:
       if (integer_zerop (ptr_src))
 	bounds = chkp_get_none_bounds ();
       else
Index: gcc/tree-loop-distribution.c
===================================================================
--- gcc/tree-loop-distribution.c	2017-11-06 12:40:39.845713389 +0000
+++ gcc/tree-loop-distribution.c	2017-11-06 12:40:40.289552291 +0000
@@ -927,6 +927,9 @@ const_with_all_bytes_same (tree val)
           && CONSTRUCTOR_NELTS (val) == 0))
     return 0;
 
+  if (TREE_CODE (val) == VEC_DUPLICATE_CST)
+    return const_with_all_bytes_same (VEC_DUPLICATE_CST_ELT (val));
+
   if (real_zerop (val))
     {
       /* Only return 0 for +0.0, not for -0.0, which doesn't have
Index: gcc/tree-ssa-loop.c
===================================================================
--- gcc/tree-ssa-loop.c	2017-11-06 12:40:39.845713389 +0000
+++ gcc/tree-ssa-loop.c	2017-11-06 12:40:40.290545219 +0000
@@ -616,6 +616,7 @@ for_each_index (tree *addr_p, bool (*cbc
 	case STRING_CST:
 	case RESULT_DECL:
 	case VECTOR_CST:
+	case VEC_DUPLICATE_CST:
 	case COMPLEX_CST:
 	case INTEGER_CST:
 	case REAL_CST:
Index: gcc/tree-ssa-pre.c
===================================================================
--- gcc/tree-ssa-pre.c	2017-11-06 12:40:39.845713389 +0000
+++ gcc/tree-ssa-pre.c	2017-11-06 12:40:40.290545219 +0000
@@ -2627,6 +2627,7 @@ create_component_ref_by_pieces_1 (basic_
     case INTEGER_CST:
     case COMPLEX_CST:
     case VECTOR_CST:
+    case VEC_DUPLICATE_CST:
     case REAL_CST:
     case CONSTRUCTOR:
     case VAR_DECL:
Index: gcc/tree-ssa-sccvn.c
===================================================================
--- gcc/tree-ssa-sccvn.c	2017-11-06 12:40:39.845713389 +0000
+++ gcc/tree-ssa-sccvn.c	2017-11-06 12:40:40.291538147 +0000
@@ -866,6 +866,7 @@ copy_reference_ops_from_ref (tree ref, v
 	case INTEGER_CST:
 	case COMPLEX_CST:
 	case VECTOR_CST:
+	case VEC_DUPLICATE_CST:
 	case REAL_CST:
 	case FIXED_CST:
 	case CONSTRUCTOR:
@@ -1058,6 +1059,7 @@ ao_ref_init_from_vn_reference (ao_ref *r
 	case INTEGER_CST:
 	case COMPLEX_CST:
 	case VECTOR_CST:
+	case VEC_DUPLICATE_CST:
 	case REAL_CST:
 	case CONSTRUCTOR:
 	case CONST_DECL:
Index: gcc/varasm.c
===================================================================
--- gcc/varasm.c	2017-11-06 12:40:39.845713389 +0000
+++ gcc/varasm.c	2017-11-06 12:40:40.293524004 +0000
@@ -3068,6 +3068,9 @@ const_hash_1 (const tree exp)
     CASE_CONVERT:
       return const_hash_1 (TREE_OPERAND (exp, 0)) * 7 + 2;
 
+    case VEC_DUPLICATE_CST:
+      return const_hash_1 (VEC_DUPLICATE_CST_ELT (exp)) * 7 + 3;
+
     default:
       /* A language specific constant. Just hash the code.  */
       return code;
@@ -3158,6 +3161,10 @@ compare_constant (const tree t1, const t
 	return 1;
       }
 
+    case VEC_DUPLICATE_CST:
+      return compare_constant (VEC_DUPLICATE_CST_ELT (t1),
+			       VEC_DUPLICATE_CST_ELT (t2));
+
     case CONSTRUCTOR:
       {
 	vec<constructor_elt, va_gc> *v1, *v2;
Index: gcc/fold-const.c
===================================================================
--- gcc/fold-const.c	2017-11-06 12:40:39.845713389 +0000
+++ gcc/fold-const.c	2017-11-06 12:40:40.282601794 +0000
@@ -418,6 +418,9 @@ negate_expr_p (tree t)
 	return true;
       }
 
+    case VEC_DUPLICATE_CST:
+      return negate_expr_p (VEC_DUPLICATE_CST_ELT (t));
+
     case COMPLEX_EXPR:
       return negate_expr_p (TREE_OPERAND (t, 0))
 	     && negate_expr_p (TREE_OPERAND (t, 1));
@@ -579,6 +582,14 @@ fold_negate_expr_1 (location_t loc, tree
 	return build_vector (type, elts);
       }
 
+    case VEC_DUPLICATE_CST:
+      {
+	tree sub = fold_negate_expr (loc, VEC_DUPLICATE_CST_ELT (t));
+	if (!sub)
+	  return NULL_TREE;
+	return build_vector_from_val (type, sub);
+      }
+
     case COMPLEX_EXPR:
       if (negate_expr_p (t))
 	return fold_build2_loc (loc, COMPLEX_EXPR, type,
@@ -1436,6 +1447,16 @@ const_binop (enum tree_code code, tree a
       return build_vector (type, elts);
     }
 
+  if (TREE_CODE (arg1) == VEC_DUPLICATE_CST
+      && TREE_CODE (arg2) == VEC_DUPLICATE_CST)
+    {
+      tree sub = const_binop (code, VEC_DUPLICATE_CST_ELT (arg1),
+			      VEC_DUPLICATE_CST_ELT (arg2));
+      if (!sub)
+	return NULL_TREE;
+      return build_vector_from_val (TREE_TYPE (arg1), sub);
+    }
+
   /* Shifts allow a scalar offset for a vector.  */
   if (TREE_CODE (arg1) == VECTOR_CST
       && TREE_CODE (arg2) == INTEGER_CST)
@@ -1459,6 +1480,15 @@ const_binop (enum tree_code code, tree a
 
       return build_vector (type, elts);
     }
+
+  if (TREE_CODE (arg1) == VEC_DUPLICATE_CST
+      && TREE_CODE (arg2) == INTEGER_CST)
+    {
+      tree sub = const_binop (code, VEC_DUPLICATE_CST_ELT (arg1), arg2);
+      if (!sub)
+	return NULL_TREE;
+      return build_vector_from_val (TREE_TYPE (arg1), sub);
+    }
   return NULL_TREE;
 }
 
@@ -1652,6 +1682,13 @@ const_unop (enum tree_code code, tree ty
 	  if (i == count)
 	    return build_vector (type, elements);
 	}
+      else if (TREE_CODE (arg0) == VEC_DUPLICATE_CST)
+	{
+	  tree sub = const_unop (BIT_NOT_EXPR, TREE_TYPE (type),
+				 VEC_DUPLICATE_CST_ELT (arg0));
+	  if (sub)
+	    return build_vector_from_val (type, sub);
+	}
       break;
 
     case TRUTH_NOT_EXPR:
@@ -1737,6 +1774,11 @@ const_unop (enum tree_code code, tree ty
 	return res;
       }
 
+    case VEC_DUPLICATE_EXPR:
+      if (CONSTANT_CLASS_P (arg0))
+	return build_vector_from_val (type, arg0);
+      return NULL_TREE;
+
     default:
       break;
     }
@@ -2167,6 +2209,15 @@ fold_convert_const (enum tree_code code,
 	    }
 	  return build_vector (type, v);
 	}
+      if (TREE_CODE (arg1) == VEC_DUPLICATE_CST
+	  && (TYPE_VECTOR_SUBPARTS (type)
+	      == TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg1))))
+	{
+	  tree sub = fold_convert_const (code, TREE_TYPE (type),
+					 VEC_DUPLICATE_CST_ELT (arg1));
+	  if (sub)
+	    return build_vector_from_val (type, sub);
+	}
     }
   return NULL_TREE;
 }
@@ -2953,6 +3004,10 @@ operand_equal_p (const_tree arg0, const_
 	  return 1;
 	}
 
+      case VEC_DUPLICATE_CST:
+	return operand_equal_p (VEC_DUPLICATE_CST_ELT (arg0),
+				VEC_DUPLICATE_CST_ELT (arg1), flags);
+
       case COMPLEX_CST:
 	return (operand_equal_p (TREE_REALPART (arg0), TREE_REALPART (arg1),
 				 flags)
@@ -7475,6 +7530,20 @@ can_native_interpret_type_p (tree type)
 static tree
 fold_view_convert_expr (tree type, tree expr)
 {
+  /* Recurse on duplicated vectors if the target type is also a vector
+     and if the elements line up.  */
+  tree expr_type = TREE_TYPE (expr);
+  if (TREE_CODE (expr) == VEC_DUPLICATE_CST
+      && VECTOR_TYPE_P (type)
+      && TYPE_VECTOR_SUBPARTS (type) == TYPE_VECTOR_SUBPARTS (expr_type)
+      && TYPE_SIZE (TREE_TYPE (type)) == TYPE_SIZE (TREE_TYPE (expr_type)))
+    {
+      tree sub = fold_view_convert_expr (TREE_TYPE (type),
+					 VEC_DUPLICATE_CST_ELT (expr));
+      if (sub)
+	return build_vector_from_val (type, sub);
+    }
+
   /* We support up to 512-bit values (for V8DFmode).  */
   unsigned char buffer[64];
   int len;
@@ -8874,6 +8943,15 @@ exact_inverse (tree type, tree cst)
 	return build_vector (type, elts);
       }
 
+    case VEC_DUPLICATE_CST:
+      {
+	tree sub = exact_inverse (TREE_TYPE (type),
+				  VEC_DUPLICATE_CST_ELT (cst));
+	if (!sub)
+	  return NULL_TREE;
+	return build_vector_from_val (type, sub);
+      }
+
     default:
       return NULL_TREE;
     }
@@ -11939,6 +12017,9 @@ fold_checksum_tree (const_tree expr, str
 	  for (i = 0; i < (int) VECTOR_CST_NELTS (expr); ++i)
 	    fold_checksum_tree (VECTOR_CST_ELT (expr, i), ctx, ht);
 	  break;
+	case VEC_DUPLICATE_CST:
+	  fold_checksum_tree (VEC_DUPLICATE_CST_ELT (expr), ctx, ht);
+	  break;
 	default:
 	  break;
 	}
@@ -14412,6 +14493,41 @@ test_vector_folding ()
   ASSERT_FALSE (integer_nonzerop (fold_build2 (NE_EXPR, res_type, one, one)));
 }
 
+/* Verify folding of VEC_DUPLICATE_CSTs and VEC_DUPLICATE_EXPRs.  */
+
+static void
+test_vec_duplicate_folding ()
+{
+  scalar_int_mode int_mode = SCALAR_INT_TYPE_MODE (ssizetype);
+  machine_mode vec_mode = targetm.vectorize.preferred_simd_mode (int_mode);
+  /* This will be 1 if VEC_MODE isn't a vector mode.  */
+  unsigned int nunits = GET_MODE_NUNITS (vec_mode);
+
+  tree type = build_vector_type (ssizetype, nunits);
+  tree dup5 = build_vector_from_val (type, ssize_int (5));
+  tree dup3 = build_vector_from_val (type, ssize_int (3));
+
+  tree neg_dup5 = fold_unary (NEGATE_EXPR, type, dup5);
+  ASSERT_EQ (uniform_vector_p (neg_dup5), ssize_int (-5));
+
+  tree not_dup5 = fold_unary (BIT_NOT_EXPR, type, dup5);
+  ASSERT_EQ (uniform_vector_p (not_dup5), ssize_int (-6));
+
+  tree dup5_plus_dup3 = fold_binary (PLUS_EXPR, type, dup5, dup3);
+  ASSERT_EQ (uniform_vector_p (dup5_plus_dup3), ssize_int (8));
+
+  tree dup5_lsl_2 = fold_binary (LSHIFT_EXPR, type, dup5, ssize_int (2));
+  ASSERT_EQ (uniform_vector_p (dup5_lsl_2), ssize_int (20));
+
+  tree size_vector = build_vector_type (sizetype, nunits);
+  tree size_dup5 = fold_convert (size_vector, dup5);
+  ASSERT_EQ (uniform_vector_p (size_dup5), size_int (5));
+
+  tree dup5_expr = fold_unary (VEC_DUPLICATE_EXPR, type, ssize_int (5));
+  tree dup5_cst = build_vector_from_val (type, ssize_int (5));
+  ASSERT_TRUE (operand_equal_p (dup5_expr, dup5_cst, 0));
+}
+
 /* Run all of the selftests within this file.  */
 
 void
@@ -14419,6 +14535,7 @@ fold_const_c_tests ()
 {
   test_arithmetic_folding ();
   test_vector_folding ();
+  test_vec_duplicate_folding ();
 }
 
 } // namespace selftest
Index: gcc/optabs.def
===================================================================
--- gcc/optabs.def	2017-11-06 12:40:39.845713389 +0000
+++ gcc/optabs.def	2017-11-06 12:40:40.286573506 +0000
@@ -364,3 +364,5 @@ OPTAB_D (atomic_xor_optab, "atomic_xor$I
 
 OPTAB_D (get_thread_pointer_optab, "get_thread_pointer$I$a")
 OPTAB_D (set_thread_pointer_optab, "set_thread_pointer$I$a")
+
+OPTAB_DC (vec_duplicate_optab, "vec_duplicate$a", VEC_DUPLICATE)
Index: gcc/optabs-tree.c
===================================================================
--- gcc/optabs-tree.c	2017-11-06 12:40:39.845713389 +0000
+++ gcc/optabs-tree.c	2017-11-06 12:40:40.286573506 +0000
@@ -210,6 +210,9 @@ optab_for_tree_code (enum tree_code code
       return TYPE_UNSIGNED (type) ?
 	vec_pack_ufix_trunc_optab : vec_pack_sfix_trunc_optab;
 
+    case VEC_DUPLICATE_EXPR:
+      return vec_duplicate_optab;
+
     default:
       break;
     }
Index: gcc/optabs.h
===================================================================
--- gcc/optabs.h	2017-11-06 12:40:39.845713389 +0000
+++ gcc/optabs.h	2017-11-06 12:40:40.287566435 +0000
@@ -181,6 +181,7 @@ extern rtx simplify_expand_binop (machin
 				  enum optab_methods methods);
 extern bool force_expand_binop (machine_mode, optab, rtx, rtx, rtx, int,
 				enum optab_methods);
+extern rtx expand_vector_broadcast (machine_mode, rtx);
 
 /* Generate code for a simple binary or unary operation.  "Simple" in
    this case means "can be unambiguously described by a (mode, code)
Index: gcc/optabs.c
===================================================================
--- gcc/optabs.c	2017-11-06 12:40:39.845713389 +0000
+++ gcc/optabs.c	2017-11-06 12:40:40.286573506 +0000
@@ -367,7 +367,7 @@ force_expand_binop (machine_mode mode, o
    mode of OP must be the element mode of VMODE.  If OP is a constant,
    then the return value will be a constant.  */
 
-static rtx
+rtx
 expand_vector_broadcast (machine_mode vmode, rtx op)
 {
   enum insn_code icode;
@@ -380,6 +380,16 @@ expand_vector_broadcast (machine_mode vm
   if (valid_for_const_vec_duplicate_p (vmode, op))
     return gen_const_vec_duplicate (vmode, op);
 
+  icode = optab_handler (vec_duplicate_optab, vmode);
+  if (icode != CODE_FOR_nothing)
+    {
+      struct expand_operand ops[2];
+      create_output_operand (&ops[0], NULL_RTX, vmode);
+      create_input_operand (&ops[1], op, GET_MODE (op));
+      expand_insn (icode, 2, ops);
+      return ops[0].value;
+    }
+
   /* ??? If the target doesn't have a vec_init, then we have no easy way
      of performing this operation.  Most of this sort of generic support
      is hidden away in the vector lowering support in gimple.  */
Index: gcc/expr.c
===================================================================
--- gcc/expr.c	2017-11-06 12:40:39.845713389 +0000
+++ gcc/expr.c	2017-11-06 12:40:40.281608865 +0000
@@ -6576,7 +6576,8 @@ store_constructor (tree exp, rtx target,
 	constructor_elt *ce;
 	int i;
 	int need_to_clear;
-	int icode = CODE_FOR_nothing;
+	insn_code icode = CODE_FOR_nothing;
+	tree elt;
 	tree elttype = TREE_TYPE (type);
 	int elt_size = tree_to_uhwi (TYPE_SIZE (elttype));
 	machine_mode eltmode = TYPE_MODE (elttype);
@@ -6586,13 +6587,30 @@ store_constructor (tree exp, rtx target,
 	unsigned n_elts;
 	alias_set_type alias;
 	bool vec_vec_init_p = false;
+	machine_mode mode = GET_MODE (target);
 
 	gcc_assert (eltmode != BLKmode);
 
+	/* Try using vec_duplicate_optab for uniform vectors.  */
+	if (!TREE_SIDE_EFFECTS (exp)
+	    && VECTOR_MODE_P (mode)
+	    && eltmode == GET_MODE_INNER (mode)
+	    && ((icode = optab_handler (vec_duplicate_optab, mode))
+		!= CODE_FOR_nothing)
+	    && (elt = uniform_vector_p (exp)))
+	  {
+	    struct expand_operand ops[2];
+	    create_output_operand (&ops[0], target, mode);
+	    create_input_operand (&ops[1], expand_normal (elt), eltmode);
+	    expand_insn (icode, 2, ops);
+	    if (!rtx_equal_p (target, ops[0].value))
+	      emit_move_insn (target, ops[0].value);
+	    break;
+	  }
+
 	n_elts = TYPE_VECTOR_SUBPARTS (type);
-	if (REG_P (target) && VECTOR_MODE_P (GET_MODE (target)))
+	if (REG_P (target) && VECTOR_MODE_P (mode))
 	  {
-	    machine_mode mode = GET_MODE (target);
 	    machine_mode emode = eltmode;
 
 	    if (CONSTRUCTOR_NELTS (exp)
@@ -6604,7 +6622,7 @@ store_constructor (tree exp, rtx target,
 			    == n_elts);
 		emode = TYPE_MODE (etype);
 	      }
-	    icode = (int) convert_optab_handler (vec_init_optab, mode, emode);
+	    icode = convert_optab_handler (vec_init_optab, mode, emode);
 	    if (icode != CODE_FOR_nothing)
 	      {
 		unsigned int i, n = n_elts;
@@ -6652,7 +6670,7 @@ store_constructor (tree exp, rtx target,
 	if (need_to_clear && size > 0 && !vector)
 	  {
 	    if (REG_P (target))
-	      emit_move_insn (target, CONST0_RTX (GET_MODE (target)));
+	      emit_move_insn (target, CONST0_RTX (mode));
 	    else
 	      clear_storage (target, GEN_INT (size), BLOCK_OP_NORMAL);
 	    cleared = 1;
@@ -6660,7 +6678,7 @@ store_constructor (tree exp, rtx target,
 
 	/* Inform later passes that the old value is dead.  */
 	if (!cleared && !vector && REG_P (target))
-	  emit_move_insn (target, CONST0_RTX (GET_MODE (target)));
+	  emit_move_insn (target, CONST0_RTX (mode));
 
         if (MEM_P (target))
 	  alias = MEM_ALIAS_SET (target);
@@ -6711,8 +6729,7 @@ store_constructor (tree exp, rtx target,
 
 	if (vector)
 	  emit_insn (GEN_FCN (icode) (target,
-				      gen_rtx_PARALLEL (GET_MODE (target),
-							vector)));
+				      gen_rtx_PARALLEL (mode, vector)));
 	break;
       }
 
@@ -7690,6 +7707,19 @@ expand_operands (tree exp0, tree exp1, r
 }
 
 \f
+/* Expand constant vector element ELT, which has mode MODE.  This is used
+   for members of VECTOR_CST and VEC_DUPLICATE_CST.  */
+
+static rtx
+const_vector_element (scalar_mode mode, const_tree elt)
+{
+  if (TREE_CODE (elt) == REAL_CST)
+    return const_double_from_real_value (TREE_REAL_CST (elt), mode);
+  if (TREE_CODE (elt) == FIXED_CST)
+    return CONST_FIXED_FROM_FIXED_VALUE (TREE_FIXED_CST (elt), mode);
+  return immed_wide_int_const (wi::to_wide (elt), mode);
+}
+
 /* Return a MEM that contains constant EXP.  DEFER is as for
    output_constant_def and MODIFIER is as for expand_expr.  */
 
@@ -9555,6 +9585,12 @@ #define REDUCE_BIT_FIELD(expr)	(reduce_b
       target = expand_vec_cond_expr (type, treeop0, treeop1, treeop2, target);
       return target;
 
+    case VEC_DUPLICATE_EXPR:
+      op0 = expand_expr (treeop0, NULL_RTX, VOIDmode, modifier);
+      target = expand_vector_broadcast (mode, op0);
+      gcc_assert (target);
+      return target;
+
     case BIT_INSERT_EXPR:
       {
 	unsigned bitpos = tree_to_uhwi (treeop2);
@@ -9988,6 +10024,11 @@ expand_expr_real_1 (tree exp, rtx target
 			    tmode, modifier);
       }
 
+    case VEC_DUPLICATE_CST:
+      op0 = const_vector_element (GET_MODE_INNER (mode),
+				  VEC_DUPLICATE_CST_ELT (exp));
+      return gen_const_vec_duplicate (mode, op0);
+
     case CONST_DECL:
       if (modifier == EXPAND_WRITE)
 	{
@@ -11749,8 +11790,7 @@ const_vector_from_tree (tree exp)
 {
   rtvec v;
   unsigned i, units;
-  tree elt;
-  machine_mode inner, mode;
+  machine_mode mode;
 
   mode = TYPE_MODE (TREE_TYPE (exp));
 
@@ -11761,23 +11801,12 @@ const_vector_from_tree (tree exp)
     return const_vector_mask_from_tree (exp);
 
   units = VECTOR_CST_NELTS (exp);
-  inner = GET_MODE_INNER (mode);
 
   v = rtvec_alloc (units);
 
   for (i = 0; i < units; ++i)
-    {
-      elt = VECTOR_CST_ELT (exp, i);
-
-      if (TREE_CODE (elt) == REAL_CST)
-	RTVEC_ELT (v, i) = const_double_from_real_value (TREE_REAL_CST (elt),
-							 inner);
-      else if (TREE_CODE (elt) == FIXED_CST)
-	RTVEC_ELT (v, i) = CONST_FIXED_FROM_FIXED_VALUE (TREE_FIXED_CST (elt),
-							 inner);
-      else
-	RTVEC_ELT (v, i) = immed_wide_int_const (wi::to_wide (elt), inner);
-    }
+    RTVEC_ELT (v, i) = const_vector_element (GET_MODE_INNER (mode),
+					     VECTOR_CST_ELT (exp, i));
 
   return gen_rtx_CONST_VECTOR (mode, v);
 }
Index: gcc/internal-fn.c
===================================================================
--- gcc/internal-fn.c	2017-11-06 12:40:39.845713389 +0000
+++ gcc/internal-fn.c	2017-11-06 12:40:40.284587650 +0000
@@ -1911,12 +1911,12 @@ expand_vector_ubsan_overflow (location_t
       emit_move_insn (cntvar, const0_rtx);
       emit_label (loop_lab);
     }
-  if (TREE_CODE (arg0) != VECTOR_CST)
+  if (!CONSTANT_CLASS_P (arg0))
     {
       rtx arg0r = expand_normal (arg0);
       arg0 = make_tree (TREE_TYPE (arg0), arg0r);
     }
-  if (TREE_CODE (arg1) != VECTOR_CST)
+  if (!CONSTANT_CLASS_P (arg1))
     {
       rtx arg1r = expand_normal (arg1);
       arg1 = make_tree (TREE_TYPE (arg1), arg1r);
Index: gcc/tree-cfg.c
===================================================================
--- gcc/tree-cfg.c	2017-11-06 12:40:39.845713389 +0000
+++ gcc/tree-cfg.c	2017-11-06 12:40:40.287566435 +0000
@@ -3798,6 +3798,17 @@ verify_gimple_assign_unary (gassign *stm
     case CONJ_EXPR:
       break;
 
+    case VEC_DUPLICATE_EXPR:
+      if (TREE_CODE (lhs_type) != VECTOR_TYPE
+	  || !useless_type_conversion_p (TREE_TYPE (lhs_type), rhs1_type))
+	{
+	  error ("vec_duplicate should be from a scalar to a like vector");
+	  debug_generic_expr (lhs_type);
+	  debug_generic_expr (rhs1_type);
+	  return true;
+	}
+      return false;
+
     default:
       gcc_unreachable ();
     }
@@ -4468,6 +4479,7 @@ verify_gimple_assign_single (gassign *st
     case FIXED_CST:
     case COMPLEX_CST:
     case VECTOR_CST:
+    case VEC_DUPLICATE_CST:
     case STRING_CST:
       return res;
 
Index: gcc/tree-inline.c
===================================================================
--- gcc/tree-inline.c	2017-11-06 12:40:39.845713389 +0000
+++ gcc/tree-inline.c	2017-11-06 12:40:40.289552291 +0000
@@ -3930,6 +3930,7 @@ estimate_operator_cost (enum tree_code c
     case VEC_PACK_FIX_TRUNC_EXPR:
     case VEC_WIDEN_LSHIFT_HI_EXPR:
     case VEC_WIDEN_LSHIFT_LO_EXPR:
+    case VEC_DUPLICATE_EXPR:
 
       return 1;
 

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [06/nn] Add VEC_SERIES_{CST,EXPR} and associated optab
  2017-10-26 12:43     ` Richard Biener
@ 2017-11-06 15:21       ` Richard Sandiford
  2017-11-07 10:38         ` Richard Biener
  0 siblings, 1 reply; 90+ messages in thread
From: Richard Sandiford @ 2017-11-06 15:21 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Patches

Richard Biener <richard.guenther@gmail.com> writes:
> On Thu, Oct 26, 2017 at 2:23 PM, Richard Biener
> <richard.guenther@gmail.com> wrote:
>> On Mon, Oct 23, 2017 at 1:20 PM, Richard Sandiford
>> <richard.sandiford@linaro.org> wrote:
>>> Similarly to the VEC_DUPLICATE_{CST,EXPR}, this patch adds two
>>> tree code equivalents of the VEC_SERIES rtx code.  VEC_SERIES_EXPR
>>> is for non-constant inputs and is a normal tcc_binary.  VEC_SERIES_CST
>>> is a tcc_constant.
>>>
>>> Like VEC_DUPLICATE_CST, VEC_SERIES_CST is only used for variable-length
>>> vectors.  This avoids the need to handle combinations of VECTOR_CST
>>> and VEC_SERIES_CST.
>>
>> Similar to the other patch can you document and verify that VEC_SERIES_CST
>> is only used on variable length vectors?

OK, done with the below, which also makes build_vec_series create
a VECTOR_CST for fixed-length vectors.  I also added some selftests.
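
For example (a sketch of the new behaviour rather than a quote from the
patch), with a fixed-length vector type the constant case now folds all
the way to a VECTOR_CST:

    tree type = build_vector_type (ssizetype, 4);
    tree x = build_vec_series (type, ssize_int (1), ssize_int (2));
    /* x is the VECTOR_CST { 1, 3, 5, 7 }; only a variable-length type
       would ever produce a VEC_SERIES_CST.  */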

>> Ok with that change.
>
> Btw, did you think of merging VEC_DUPLICATE_CST with VEC_SERIES_CST
> via setting step == 0?  I think you can do {1, 1, 1, 1... } + {1, 2,3
> ,4,5 } constant
> folding but you don't implement that.

That was done via vec_series_equivalent_p.

The problem with using VEC_SERIES with a step of zero is that we'd
then have to define VEC_SERIES for floats too (even in strict math
modes), but probably only for the special case of a zero step.
I think that'd end up being more complicated overall.
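
To make the rounding concern concrete (an illustration, not code from
the patch): the two natural evaluation orders for a float series can
disagree under IEEE rounding, so the semantics would have to pick one.

    /* Element 3 of a hypothetical float series with base 1.0f, step 0.1f.  */
    float a = 1.0f + 3 * 0.1f;                /* one scaled addition */
    float b = ((1.0f + 0.1f) + 0.1f) + 0.1f;  /* repeated addition */
    /* a and b need not compare equal.  */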

> Propagation can also turn
> VEC_SERIES_EXPR into VEC_SERIES_CST and VEC_DUPLICATE_EXPR
> into VEC_DUPLICATE_CST (didn't see the former, don't remember the latter).

VEC_SERIES_EXPR -> VEC_SERIES_CST/VECTOR_CST was done by const_binop.
And yeah, VEC_DUPLICATE_EXPR -> VEC_DUPLICATE_CST/VECTOR_CST was done 
by const_unop in the VEC_DUPLICATE patch.

Tested as before.  OK to install?

Thanks,
Richard


2017-11-06  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* doc/generic.texi (VEC_SERIES_CST, VEC_SERIES_EXPR): Document.
	* doc/md.texi (vec_series@var{m}): Document.
	* tree.def (VEC_SERIES_CST, VEC_SERIES_EXPR): New tree codes.
	* tree.h (TREE_OVERFLOW): Add VEC_SERIES_CST to the list of valid
	codes.
	(VEC_SERIES_CST_BASE, VEC_SERIES_CST_STEP): New macros.
	(build_vec_series): Declare.
	* tree.c (tree_node_structure_for_code, tree_code_size, tree_size)
	(add_expr, walk_tree_1, drop_tree_overflow): Handle VEC_SERIES_CST.
	(build_vec_series_cst, build_vec_series): New functions.
	* cfgexpand.c (expand_debug_expr): Handle the new codes.
	* tree-pretty-print.c (dump_generic_node): Likewise.
	* dwarf2out.c (rtl_for_decl_init): Handle VEC_SERIES_CST.
	* gimple-expr.h (is_gimple_constant): Likewise.
	* gimplify.c (gimplify_expr): Likewise.
	* graphite-scop-detection.c (scan_tree_for_params): Likewise.
	* ipa-icf-gimple.c (func_checker::compare_cst_or_decl): Likewise.
	(func_checker::compare_operand): Likewise.
	* ipa-icf.c (sem_item::add_expr, sem_variable::equals): Likewise.
	* print-tree.c (print_node): Likewise.
	* tree-ssa-loop.c (for_each_index): Likewise.
	* tree-ssa-pre.c (create_component_ref_by_pieces_1): Likewise.
	* tree-ssa-sccvn.c (copy_reference_ops_from_ref): Likewise.
	(ao_ref_init_from_vn_reference): Likewise.
	* varasm.c (const_hash_1, compare_constant): Likewise.
	* fold-const.c (negate_expr_p, fold_negate_expr_1, operand_equal_p)
	(fold_checksum_tree): Likewise.
	(vec_series_equivalent_p): New function.
	(const_binop): Use it.  Fold VEC_SERIES_EXPRs of constants.
	(test_vec_series_folding): New function.
	(fold_const_c_tests): Call it.
	* expmed.c (make_tree): Handle VEC_SERIES.
	* gimple-pretty-print.c (dump_binary_rhs): Likewise.
	* tree-inline.c (estimate_operator_cost): Likewise.
	* expr.c (const_vector_element): Include VEC_SERIES_CST in comment.
	(expand_expr_real_2): Handle VEC_SERIES_EXPR.
	(expand_expr_real_1): Handle VEC_SERIES_CST.
	* optabs.def (vec_series_optab): New optab.
	* optabs.h (expand_vec_series_expr): Declare.
	* optabs.c (expand_vec_series_expr): New function.
	* optabs-tree.c (optab_for_tree_code): Handle VEC_SERIES_EXPR.
	* tree-cfg.c (verify_gimple_assign_binary): Handle VEC_SERIES_EXPR.
	(verify_gimple_assign_single): Handle VEC_SERIES_CST.
	* tree-vect-generic.c (expand_vector_operations_1): Check that
	the operands also have vector type.

Index: gcc/doc/generic.texi
===================================================================
--- gcc/doc/generic.texi	2017-11-06 12:20:31.075167123 +0000
+++ gcc/doc/generic.texi	2017-11-06 12:21:29.321209826 +0000
@@ -1037,6 +1037,7 @@ As this example indicates, the operands
 @tindex COMPLEX_CST
 @tindex VECTOR_CST
 @tindex VEC_DUPLICATE_CST
+@tindex VEC_SERIES_CST
 @tindex STRING_CST
 @findex TREE_STRING_LENGTH
 @findex TREE_STRING_POINTER
@@ -1098,6 +1099,18 @@ instead.  The scalar element value is gi
 @code{VEC_DUPLICATE_CST_ELT} and has the same restrictions as the
 element of a @code{VECTOR_CST}.
 
+@item VEC_SERIES_CST
+These nodes represent a vector constant in which element @var{i}
+has the value @samp{@var{base} + @var{i} * @var{step}}, for some
+constant @var{base} and @var{step}.  The value of @var{base} is
+given by @code{VEC_SERIES_CST_BASE} and the value of @var{step} is
+given by @code{VEC_SERIES_CST_STEP}.
+
+At present only variable-length vectors use @code{VEC_SERIES_CST};
+constant-length vectors use @code{VECTOR_CST} instead.  The nodes
+are also restricted to integral types, in order to avoid specifying
+the rounding behavior for floating-point types.
+
 @item STRING_CST
 These nodes represent string-constants.  The @code{TREE_STRING_LENGTH}
 returns the length of the string, as an @code{int}.  The
@@ -1702,6 +1715,7 @@ a value from @code{enum annot_expr_kind}
 @node Vectors
 @subsection Vectors
 @tindex VEC_DUPLICATE_EXPR
+@tindex VEC_SERIES_EXPR
 @tindex VEC_LSHIFT_EXPR
 @tindex VEC_RSHIFT_EXPR
 @tindex VEC_WIDEN_MULT_HI_EXPR
@@ -1721,6 +1735,14 @@ a value from @code{enum annot_expr_kind}
 This node has a single operand and represents a vector in which every
 element is equal to that operand.
 
+@item VEC_SERIES_EXPR
+This node represents a vector formed from a scalar base and step,
+given as the first and second operands respectively.  Element @var{i}
+of the result is equal to @samp{@var{base} + @var{i}*@var{step}}.
+
+This node is restricted to integral types, in order to avoid
+specifying the rounding behavior for floating-point types.
+
 @item VEC_LSHIFT_EXPR
 @itemx VEC_RSHIFT_EXPR
 These nodes represent whole vector left and right shifts, respectively.
Index: gcc/doc/md.texi
===================================================================
--- gcc/doc/md.texi	2017-11-06 12:20:31.076995065 +0000
+++ gcc/doc/md.texi	2017-11-06 12:21:29.322209826 +0000
@@ -4899,6 +4899,19 @@ vectors go through the @code{mov@var{m}}
 
 This pattern is not allowed to @code{FAIL}.
 
+@cindex @code{vec_series@var{m}} instruction pattern
+@item @samp{vec_series@var{m}}
+Initialize vector output operand 0 so that element @var{i} is equal to
+operand 1 plus @var{i} times operand 2.  In other words, create a linear
+series whose base value is operand 1 and whose step is operand 2.
+
+The vector output has mode @var{m} and the scalar inputs have the mode
+appropriate for one element of @var{m}.  This pattern is not used for
+floating-point vectors, in order to avoid having to specify the
+rounding behavior for @var{i} > 1.
+
+This pattern is not allowed to @code{FAIL}.
+
 @cindex @code{vec_cmp@var{m}@var{n}} instruction pattern
 @item @samp{vec_cmp@var{m}@var{n}}
 Output a vector comparison.  Operand 0 of mode @var{n} is the destination for
Index: gcc/tree.def
===================================================================
--- gcc/tree.def	2017-11-06 12:20:31.098930366 +0000
+++ gcc/tree.def	2017-11-06 12:21:29.335209826 +0000
@@ -309,6 +309,12 @@ DEFTREECODE (VECTOR_CST, "vector_cst", t
    vectors; fixed-length vectors must use VECTOR_CST instead.  */
 DEFTREECODE (VEC_DUPLICATE_CST, "vec_duplicate_cst", tcc_constant, 0)
 
+/* Represents a vector constant in which element i is equal to
+   VEC_SERIES_CST_BASE + i * VEC_SERIES_CST_STEP.  This is only ever
+   used for variable-length vectors; fixed-length vectors must use
+   VECTOR_CST instead.  */
+DEFTREECODE (VEC_SERIES_CST, "vec_series_cst", tcc_constant, 0)
+
 /* Contents are TREE_STRING_LENGTH and the actual contents of the string.  */
 DEFTREECODE (STRING_CST, "string_cst", tcc_constant, 0)
 
@@ -542,6 +548,16 @@ DEFTREECODE (COND_EXPR, "cond_expr", tcc
 /* Represents a vector in which every element is equal to operand 0.  */
 DEFTREECODE (VEC_DUPLICATE_EXPR, "vec_duplicate_expr", tcc_unary, 1)
 
+/* Vector series created from a start (base) value and a step.
+
+   A = VEC_SERIES_EXPR (B, C)
+
+   means
+
+   for (i = 0; i < N; i++)
+     A[i] = B + C * i;  */
+DEFTREECODE (VEC_SERIES_EXPR, "vec_series_expr", tcc_binary, 2)
+
 /* Vector conditional expression. It is like COND_EXPR, but with
    vector operands.
 
Index: gcc/tree.h
===================================================================
--- gcc/tree.h	2017-11-06 12:20:31.099844337 +0000
+++ gcc/tree.h	2017-11-06 12:21:29.336209826 +0000
@@ -709,8 +709,8 @@ #define TREE_SYMBOL_REFERENCED(NODE) \
 #define TYPE_REF_CAN_ALIAS_ALL(NODE) \
   (PTR_OR_REF_CHECK (NODE)->base.static_flag)
 
-/* In an INTEGER_CST, REAL_CST, COMPLEX_CST, VECTOR_CST or VEC_DUPLICATE_CST,
-   this means there was an overflow in folding.  */
+/* In an INTEGER_CST, REAL_CST, COMPLEX_CST, VECTOR_CST, VEC_DUPLICATE_CST
+   or VEC_SERIES_CST, this means there was an overflow in folding.  */
 
 #define TREE_OVERFLOW(NODE) (CST_CHECK (NODE)->base.public_flag)
 
@@ -1013,6 +1013,12 @@ #define VECTOR_CST_ELT(NODE,IDX) (VECTOR
 #define VEC_DUPLICATE_CST_ELT(NODE) \
   (VEC_DUPLICATE_CST_CHECK (NODE)->vector.elts[0])
 
+/* In a VEC_SERIES_CST node.  */
+#define VEC_SERIES_CST_BASE(NODE) \
+  (VEC_SERIES_CST_CHECK (NODE)->vector.elts[0])
+#define VEC_SERIES_CST_STEP(NODE) \
+  (VEC_SERIES_CST_CHECK (NODE)->vector.elts[1])
+
 /* Define fields and accessors for some special-purpose tree nodes.  */
 
 #define IDENTIFIER_LENGTH(NODE) \
@@ -4017,6 +4023,7 @@ extern tree make_vector (unsigned CXX_ME
 extern tree build_vector (tree, vec<tree> CXX_MEM_STAT_INFO);
 extern tree build_vector_from_ctor (tree, vec<constructor_elt, va_gc> *);
 extern tree build_vector_from_val (tree, tree);
+extern tree build_vec_series (tree, tree, tree);
 extern void recompute_constructor_flags (tree);
 extern void verify_constructor_flags (tree);
 extern tree build_constructor (tree, vec<constructor_elt, va_gc> *);
Index: gcc/tree.c
===================================================================
--- gcc/tree.c	2017-11-06 12:20:31.098930366 +0000
+++ gcc/tree.c	2017-11-06 12:21:29.335209826 +0000
@@ -465,6 +465,7 @@ tree_node_structure_for_code (enum tree_
     case COMPLEX_CST:		return TS_COMPLEX;
     case VECTOR_CST:		return TS_VECTOR;
     case VEC_DUPLICATE_CST:	return TS_VECTOR;
+    case VEC_SERIES_CST:	return TS_VECTOR;
     case STRING_CST:		return TS_STRING;
       /* tcc_exceptional cases.  */
     case ERROR_MARK:		return TS_COMMON;
@@ -831,6 +832,7 @@ tree_code_size (enum tree_code code)
 	case COMPLEX_CST:	return sizeof (tree_complex);
 	case VECTOR_CST:	return sizeof (tree_vector);
 	case VEC_DUPLICATE_CST:	return sizeof (tree_vector);
+	case VEC_SERIES_CST:	return sizeof (tree_vector) + sizeof (tree);
 	case STRING_CST:	gcc_unreachable ();
 	default:
 	  gcc_checking_assert (code >= NUM_TREE_CODES);
@@ -895,6 +897,9 @@ tree_size (const_tree node)
     case VEC_DUPLICATE_CST:
       return sizeof (struct tree_vector);
 
+    case VEC_SERIES_CST:
+      return sizeof (struct tree_vector) + sizeof (tree);
+
     case STRING_CST:
       return TREE_STRING_LENGTH (node) + offsetof (struct tree_string, str) + 1;
 
@@ -1730,6 +1735,34 @@ build_vec_duplicate_cst (tree type, tree
   return t;
 }
 
+/* Build a new VEC_SERIES_CST with type TYPE, base BASE and step STEP.
+
+   Note that this function is only suitable for callers that specifically
+   need a VEC_SERIES_CST node.  Use build_vec_series to build a general
+   series vector from a general base and step.  */
+
+static tree
+build_vec_series_cst (tree type, tree base, tree step MEM_STAT_DECL)
+{
+  /* Shouldn't be used until we have variable-length vectors.  */
+  gcc_unreachable ();
+
+  int length = sizeof (struct tree_vector) + sizeof (tree);
+
+  record_node_allocation_statistics (VEC_SERIES_CST, length);
+
+  tree t = ggc_alloc_cleared_tree_node_stat (length PASS_MEM_STAT);
+
+  TREE_SET_CODE (t, VEC_SERIES_CST);
+  TREE_TYPE (t) = type;
+  t->base.u.nelts = 2;
+  VEC_SERIES_CST_BASE (t) = base;
+  VEC_SERIES_CST_STEP (t) = step;
+  TREE_CONSTANT (t) = 1;
+
+  return t;
+}
+
 /* Build a newly constructed VECTOR_CST node of length LEN.  */
 
 tree
@@ -1847,6 +1880,33 @@ build_vector_from_val (tree vectype, tre
     }
 }
 
+/* Build a vector series of type TYPE in which element I has the value
+   BASE + I * STEP.  The result is a constant if BASE and STEP are constant
+   and a VEC_SERIES_EXPR otherwise.  */
+
+tree
+build_vec_series (tree type, tree base, tree step)
+{
+  if (integer_zerop (step))
+    return build_vector_from_val (type, base);
+  if (CONSTANT_CLASS_P (base) && CONSTANT_CLASS_P (step))
+    {
+      unsigned int nunits = TYPE_VECTOR_SUBPARTS (type);
+      if (0)
+	return build_vec_series_cst (type, base, step);
+
+      auto_vec<tree, 32> v (nunits);
+      v.quick_push (base);
+      for (unsigned int i = 1; i < nunits; ++i)
+	{
+	  base = const_binop (PLUS_EXPR, TREE_TYPE (base), base, step);
+	  v.quick_push (base);
+	}
+      return build_vector (type, v);
+    }
+  return build2 (VEC_SERIES_EXPR, type, base, step);
+}
+
 /* Something has messed with the elements of CONSTRUCTOR C after it was built;
    calculate TREE_CONSTANT and TREE_SIDE_EFFECTS.  */
 
@@ -7162,6 +7222,10 @@ add_expr (const_tree t, inchash::hash &h
     case VEC_DUPLICATE_CST:
       inchash::add_expr (VEC_DUPLICATE_CST_ELT (t), hstate);
       return;
+    case VEC_SERIES_CST:
+      inchash::add_expr (VEC_SERIES_CST_BASE (t), hstate);
+      inchash::add_expr (VEC_SERIES_CST_STEP (t), hstate);
+      return;
     case SSA_NAME:
       /* We can just compare by pointer.  */
       hstate.add_hwi (SSA_NAME_VERSION (t));
@@ -11210,6 +11274,7 @@ #define WALK_SUBTREE_TAIL(NODE)				\
     case FIXED_CST:
     case VECTOR_CST:
     case VEC_DUPLICATE_CST:
+    case VEC_SERIES_CST:
     case STRING_CST:
     case BLOCK:
     case PLACEHOLDER_EXPR:
@@ -12502,6 +12567,15 @@ drop_tree_overflow (tree t)
       if (TREE_OVERFLOW (*elt))
 	*elt = drop_tree_overflow (*elt);
     }
+  if (TREE_CODE (t) == VEC_SERIES_CST)
+    {
+      tree *elt = &VEC_SERIES_CST_BASE (t);
+      if (TREE_OVERFLOW (*elt))
+	*elt = drop_tree_overflow (*elt);
+      elt = &VEC_SERIES_CST_STEP (t);
+      if (TREE_OVERFLOW (*elt))
+	*elt = drop_tree_overflow (*elt);
+    }
   return t;
 }
 
Index: gcc/cfgexpand.c
===================================================================
--- gcc/cfgexpand.c	2017-11-06 12:20:31.074253152 +0000
+++ gcc/cfgexpand.c	2017-11-06 12:21:29.321209826 +0000
@@ -5070,6 +5070,8 @@ expand_debug_expr (tree exp)
     case VEC_PERM_EXPR:
     case VEC_DUPLICATE_CST:
     case VEC_DUPLICATE_EXPR:
+    case VEC_SERIES_CST:
+    case VEC_SERIES_EXPR:
       return NULL;
 
     /* Misc codes.  */
Index: gcc/tree-pretty-print.c
===================================================================
--- gcc/tree-pretty-print.c	2017-11-06 12:20:31.093446541 +0000
+++ gcc/tree-pretty-print.c	2017-11-06 12:21:29.333209826 +0000
@@ -1808,6 +1808,14 @@ dump_generic_node (pretty_printer *pp, t
       pp_string (pp, ", ... }");
       break;
 
+    case VEC_SERIES_CST:
+      pp_string (pp, "{ ");
+      dump_generic_node (pp, VEC_SERIES_CST_BASE (node), spc, flags, false);
+      pp_string (pp, ", +, ");
+      dump_generic_node (pp, VEC_SERIES_CST_STEP (node), spc, flags, false);
+      pp_string (pp, "}");
+      break;
+
     case FUNCTION_TYPE:
     case METHOD_TYPE:
       dump_generic_node (pp, TREE_TYPE (node), spc, flags, false);
@@ -3221,6 +3229,7 @@ dump_generic_node (pretty_printer *pp, t
       pp_string (pp, " > ");
       break;
 
+    case VEC_SERIES_EXPR:
     case VEC_WIDEN_MULT_HI_EXPR:
     case VEC_WIDEN_MULT_LO_EXPR:
     case VEC_WIDEN_MULT_EVEN_EXPR:
Index: gcc/dwarf2out.c
===================================================================
--- gcc/dwarf2out.c	2017-11-06 12:20:31.080650948 +0000
+++ gcc/dwarf2out.c	2017-11-06 12:21:29.325209826 +0000
@@ -18879,6 +18879,7 @@ rtl_for_decl_init (tree init, tree type)
 	  {
 	  case VECTOR_CST:
 	  case VEC_DUPLICATE_CST:
+	  case VEC_SERIES_CST:
 	    break;
 	  case CONSTRUCTOR:
 	    if (TREE_CONSTANT (init))
Index: gcc/gimple-expr.h
===================================================================
--- gcc/gimple-expr.h	2017-11-06 12:20:31.087048745 +0000
+++ gcc/gimple-expr.h	2017-11-06 12:21:29.328209826 +0000
@@ -135,6 +135,7 @@ is_gimple_constant (const_tree t)
     case COMPLEX_CST:
     case VECTOR_CST:
     case VEC_DUPLICATE_CST:
+    case VEC_SERIES_CST:
     case STRING_CST:
       return true;
 
Index: gcc/gimplify.c
===================================================================
--- gcc/gimplify.c	2017-11-06 12:20:31.088876686 +0000
+++ gcc/gimplify.c	2017-11-06 12:21:29.329209826 +0000
@@ -11508,6 +11508,7 @@ gimplify_expr (tree *expr_p, gimple_seq
 	case COMPLEX_CST:
 	case VECTOR_CST:
 	case VEC_DUPLICATE_CST:
+	case VEC_SERIES_CST:
 	  /* Drop the overflow flag on constants, we do not want
 	     that in the GIMPLE IL.  */
 	  if (TREE_OVERFLOW_P (*expr_p))
Index: gcc/graphite-scop-detection.c
===================================================================
--- gcc/graphite-scop-detection.c	2017-11-06 12:20:31.088876686 +0000
+++ gcc/graphite-scop-detection.c	2017-11-06 12:21:29.329209826 +0000
@@ -1213,6 +1213,7 @@ scan_tree_for_params (sese_info_p s, tre
     case COMPLEX_CST:
     case VECTOR_CST:
     case VEC_DUPLICATE_CST:
+    case VEC_SERIES_CST:
       break;
 
    default:
Index: gcc/ipa-icf-gimple.c
===================================================================
--- gcc/ipa-icf-gimple.c	2017-11-06 12:20:31.088876686 +0000
+++ gcc/ipa-icf-gimple.c	2017-11-06 12:21:29.329209826 +0000
@@ -334,6 +334,7 @@ func_checker::compare_cst_or_decl (tree
     case COMPLEX_CST:
     case VECTOR_CST:
     case VEC_DUPLICATE_CST:
+    case VEC_SERIES_CST:
     case STRING_CST:
     case REAL_CST:
       {
@@ -530,6 +531,7 @@ func_checker::compare_operand (tree t1,
     case COMPLEX_CST:
     case VECTOR_CST:
     case VEC_DUPLICATE_CST:
+    case VEC_SERIES_CST:
     case STRING_CST:
     case REAL_CST:
     case FUNCTION_DECL:
Index: gcc/ipa-icf.c
===================================================================
--- gcc/ipa-icf.c	2017-11-06 12:20:31.089790657 +0000
+++ gcc/ipa-icf.c	2017-11-06 12:21:29.330209826 +0000
@@ -1480,6 +1480,7 @@ sem_item::add_expr (const_tree exp, inch
     case COMPLEX_CST:
     case VECTOR_CST:
     case VEC_DUPLICATE_CST:
+    case VEC_SERIES_CST:
       inchash::add_expr (exp, hstate);
       break;
     case CONSTRUCTOR:
@@ -2040,6 +2041,11 @@ sem_variable::equals (tree t1, tree t2)
     case VEC_DUPLICATE_CST:
       return sem_variable::equals (VEC_DUPLICATE_CST_ELT (t1),
 				   VEC_DUPLICATE_CST_ELT (t2));
+     case VEC_SERIES_CST:
+       return (sem_variable::equals (VEC_SERIES_CST_BASE (t1),
+				     VEC_SERIES_CST_BASE (t2))
+	       && sem_variable::equals (VEC_SERIES_CST_STEP (t1),
+					VEC_SERIES_CST_STEP (t2)));
     case ARRAY_REF:
     case ARRAY_RANGE_REF:
       {
Index: gcc/print-tree.c
===================================================================
--- gcc/print-tree.c	2017-11-06 12:20:31.090704628 +0000
+++ gcc/print-tree.c	2017-11-06 12:21:29.331209826 +0000
@@ -787,6 +787,11 @@ print_node (FILE *file, const char *pref
 	  print_node (file, "elt", VEC_DUPLICATE_CST_ELT (node), indent + 4);
 	  break;
 
+	case VEC_SERIES_CST:
+	  print_node (file, "base", VEC_SERIES_CST_BASE (node), indent + 4);
+	  print_node (file, "step", VEC_SERIES_CST_STEP (node), indent + 4);
+	  break;
+
 	case COMPLEX_CST:
 	  print_node (file, "real", TREE_REALPART (node), indent + 4);
 	  print_node (file, "imag", TREE_IMAGPART (node), indent + 4);
Index: gcc/tree-ssa-loop.c
===================================================================
--- gcc/tree-ssa-loop.c	2017-11-06 12:20:31.093446541 +0000
+++ gcc/tree-ssa-loop.c	2017-11-06 12:21:29.333209826 +0000
@@ -617,6 +617,7 @@ for_each_index (tree *addr_p, bool (*cbc
 	case RESULT_DECL:
 	case VECTOR_CST:
 	case VEC_DUPLICATE_CST:
+	case VEC_SERIES_CST:
 	case COMPLEX_CST:
 	case INTEGER_CST:
 	case REAL_CST:
Index: gcc/tree-ssa-pre.c
===================================================================
--- gcc/tree-ssa-pre.c	2017-11-06 12:20:31.093446541 +0000
+++ gcc/tree-ssa-pre.c	2017-11-06 12:21:29.333209826 +0000
@@ -2628,6 +2628,7 @@ create_component_ref_by_pieces_1 (basic_
     case COMPLEX_CST:
     case VECTOR_CST:
     case VEC_DUPLICATE_CST:
+    case VEC_SERIES_CST:
     case REAL_CST:
     case CONSTRUCTOR:
     case VAR_DECL:
Index: gcc/tree-ssa-sccvn.c
===================================================================
--- gcc/tree-ssa-sccvn.c	2017-11-06 12:20:31.094360512 +0000
+++ gcc/tree-ssa-sccvn.c	2017-11-06 12:21:29.334209826 +0000
@@ -867,6 +867,7 @@ copy_reference_ops_from_ref (tree ref, v
 	case COMPLEX_CST:
 	case VECTOR_CST:
 	case VEC_DUPLICATE_CST:
+	case VEC_SERIES_CST:
 	case REAL_CST:
 	case FIXED_CST:
 	case CONSTRUCTOR:
@@ -1060,6 +1061,7 @@ ao_ref_init_from_vn_reference (ao_ref *r
 	case COMPLEX_CST:
 	case VECTOR_CST:
 	case VEC_DUPLICATE_CST:
+	case VEC_SERIES_CST:
 	case REAL_CST:
 	case CONSTRUCTOR:
 	case CONST_DECL:
Index: gcc/varasm.c
===================================================================
--- gcc/varasm.c	2017-11-06 12:20:31.100758308 +0000
+++ gcc/varasm.c	2017-11-06 12:21:29.337209826 +0000
@@ -3065,6 +3065,10 @@ const_hash_1 (const tree exp)
       return (const_hash_1 (TREE_OPERAND (exp, 0)) * 9
 	      + const_hash_1 (TREE_OPERAND (exp, 1)));
 
+    case VEC_SERIES_CST:
+      return (const_hash_1 (VEC_SERIES_CST_BASE (exp)) * 11
+	      + const_hash_1 (VEC_SERIES_CST_STEP (exp)));
+
     CASE_CONVERT:
       return const_hash_1 (TREE_OPERAND (exp, 0)) * 7 + 2;
 
@@ -3165,6 +3169,12 @@ compare_constant (const tree t1, const t
       return compare_constant (VEC_DUPLICATE_CST_ELT (t1),
 			       VEC_DUPLICATE_CST_ELT (t2));
 
+    case VEC_SERIES_CST:
+      return (compare_constant (VEC_SERIES_CST_BASE (t1),
+				VEC_SERIES_CST_BASE (t2))
+	      && compare_constant (VEC_SERIES_CST_STEP (t1),
+				   VEC_SERIES_CST_STEP (t2)));
+
     case CONSTRUCTOR:
       {
 	vec<constructor_elt, va_gc> *v1, *v2;
Index: gcc/fold-const.c
===================================================================
--- gcc/fold-const.c	2017-11-06 12:20:31.087048745 +0000
+++ gcc/fold-const.c	2017-11-06 12:21:29.328209826 +0000
@@ -421,6 +421,10 @@ negate_expr_p (tree t)
     case VEC_DUPLICATE_CST:
       return negate_expr_p (VEC_DUPLICATE_CST_ELT (t));
 
+    case VEC_SERIES_CST:
+      return (negate_expr_p (VEC_SERIES_CST_BASE (t))
+	      && negate_expr_p (VEC_SERIES_CST_STEP (t)));
+
     case COMPLEX_EXPR:
       return negate_expr_p (TREE_OPERAND (t, 0))
 	     && negate_expr_p (TREE_OPERAND (t, 1));
@@ -590,6 +594,17 @@ fold_negate_expr_1 (location_t loc, tree
 	return build_vector_from_val (type, sub);
       }
 
+    case VEC_SERIES_CST:
+      {
+	tree neg_base = fold_negate_expr (loc, VEC_SERIES_CST_BASE (t));
+	if (!neg_base)
+	  return NULL_TREE;
+	tree neg_step = fold_negate_expr (loc, VEC_SERIES_CST_STEP (t));
+	if (!neg_step)
+	  return NULL_TREE;
+	return build_vec_series (type, neg_base, neg_step);
+      }
+
     case COMPLEX_EXPR:
       if (negate_expr_p (t))
 	return fold_build2_loc (loc, COMPLEX_EXPR, type,
@@ -1131,6 +1146,28 @@ int_const_binop (enum tree_code code, co
   return int_const_binop_1 (code, arg1, arg2, 1);
 }
 
+/* Return true if EXP is a VEC_DUPLICATE_CST or a VEC_SERIES_CST,
+   and if so express it as a linear series in *BASE_OUT and *STEP_OUT.
+   The step will be zero for VEC_DUPLICATE_CST.  */
+
+static bool
+vec_series_equivalent_p (const_tree exp, tree *base_out, tree *step_out)
+{
+  if (TREE_CODE (exp) == VEC_SERIES_CST)
+    {
+      *base_out = VEC_SERIES_CST_BASE (exp);
+      *step_out = VEC_SERIES_CST_STEP (exp);
+      return true;
+    }
+  if (TREE_CODE (exp) == VEC_DUPLICATE_CST)
+    {
+      *base_out = VEC_DUPLICATE_CST_ELT (exp);
+      *step_out = build_zero_cst (TREE_TYPE (*base_out));
+      return true;
+    }
+  return false;
+}
+
 /* Combine two constants ARG1 and ARG2 under operation CODE to produce a new
    constant.  We assume ARG1 and ARG2 have the same data type, or at least
    are the same kind of constant and the same machine mode.  Return zero if
@@ -1457,6 +1494,20 @@ const_binop (enum tree_code code, tree a
       return build_vector_from_val (TREE_TYPE (arg1), sub);
     }
 
+  tree base1, step1, base2, step2;
+  if ((code == PLUS_EXPR || code == MINUS_EXPR)
+      && vec_series_equivalent_p (arg1, &base1, &step1)
+      && vec_series_equivalent_p (arg2, &base2, &step2))
+    {
+      tree new_base = const_binop (code, base1, base2);
+      if (!new_base)
+	return NULL_TREE;
+      tree new_step = const_binop (code, step1, step2);
+      if (!new_step)
+	return NULL_TREE;
+      return build_vec_series (TREE_TYPE (arg1), new_base, new_step);
+    }
+
   /* Shifts allow a scalar offset for a vector.  */
   if (TREE_CODE (arg1) == VECTOR_CST
       && TREE_CODE (arg2) == INTEGER_CST)
@@ -1505,6 +1556,12 @@ const_binop (enum tree_code code, tree t
      result as argument put those cases that need it here.  */
   switch (code)
     {
+    case VEC_SERIES_EXPR:
+      if (CONSTANT_CLASS_P (arg1)
+	  && CONSTANT_CLASS_P (arg2))
+	return build_vec_series (type, arg1, arg2);
+      return NULL_TREE;
+
     case COMPLEX_EXPR:
       if ((TREE_CODE (arg1) == REAL_CST
 	   && TREE_CODE (arg2) == REAL_CST)
@@ -3008,6 +3065,12 @@ operand_equal_p (const_tree arg0, const_
 	return operand_equal_p (VEC_DUPLICATE_CST_ELT (arg0),
 				VEC_DUPLICATE_CST_ELT (arg1), flags);
 
+      case VEC_SERIES_CST:
+	return (operand_equal_p (VEC_SERIES_CST_BASE (arg0),
+				 VEC_SERIES_CST_BASE (arg1), flags)
+		&& operand_equal_p (VEC_SERIES_CST_STEP (arg0),
+				    VEC_SERIES_CST_STEP (arg1), flags));
+
       case COMPLEX_CST:
 	return (operand_equal_p (TREE_REALPART (arg0), TREE_REALPART (arg1),
 				 flags)
@@ -12020,6 +12083,10 @@ fold_checksum_tree (const_tree expr, str
 	case VEC_DUPLICATE_CST:
 	  fold_checksum_tree (VEC_DUPLICATE_CST_ELT (expr), ctx, ht);
 	  break;
+	case VEC_SERIES_CST:
+	  fold_checksum_tree (VEC_SERIES_CST_BASE (expr), ctx, ht);
+	  fold_checksum_tree (VEC_SERIES_CST_STEP (expr), ctx, ht);
+	  break;
 	default:
 	  break;
 	}
@@ -14528,6 +14595,54 @@ test_vec_duplicate_folding ()
   ASSERT_TRUE (operand_equal_p (dup5_expr, dup5_cst, 0));
 }
 
+/* Verify folding of VEC_SERIES_CSTs and VEC_SERIES_EXPRs.  */
+
+static void
+test_vec_series_folding ()
+{
+  scalar_int_mode int_mode = SCALAR_INT_TYPE_MODE (ssizetype);
+  machine_mode vec_mode = targetm.vectorize.preferred_simd_mode (int_mode);
+  unsigned int nunits = GET_MODE_NUNITS (vec_mode);
+  if (nunits == 1)
+    nunits = 4;
+
+  tree type = build_vector_type (ssizetype, nunits);
+  tree s5_4 = build_vec_series (type, ssize_int (5), ssize_int (4));
+  tree s3_9 = build_vec_series (type, ssize_int (3), ssize_int (9));
+
+  tree neg_s5_4_a = fold_unary (NEGATE_EXPR, type, s5_4);
+  tree neg_s5_4_b = build_vec_series (type, ssize_int (-5), ssize_int (-4));
+  ASSERT_TRUE (operand_equal_p (neg_s5_4_a, neg_s5_4_b, 0));
+
+  tree s8_s13_a = fold_binary (PLUS_EXPR, type, s5_4, s3_9);
+  tree s8_s13_b = build_vec_series (type, ssize_int (8), ssize_int (13));
+  ASSERT_TRUE (operand_equal_p (s8_s13_a, s8_s13_b, 0));
+
+  tree s2_m5_a = fold_binary (MINUS_EXPR, type, s5_4, s3_9);
+  tree s2_m5_b = build_vec_series (type, ssize_int (2), ssize_int (-5));
+  ASSERT_TRUE (operand_equal_p (s2_m5_a, s2_m5_b, 0));
+
+  tree s11 = build_vector_from_val (type, ssize_int (11));
+  tree s16_4_a = fold_binary (PLUS_EXPR, type, s5_4, s11);
+  tree s16_4_b = fold_binary (PLUS_EXPR, type, s11, s5_4);
+  tree s16_4_c = build_vec_series (type, ssize_int (16), ssize_int (4));
+  ASSERT_TRUE (operand_equal_p (s16_4_a, s16_4_c, 0));
+  ASSERT_TRUE (operand_equal_p (s16_4_b, s16_4_c, 0));
+
+  tree sm6_4_a = fold_binary (MINUS_EXPR, type, s5_4, s11);
+  tree sm6_4_b = build_vec_series (type, ssize_int (-6), ssize_int (4));
+  ASSERT_TRUE (operand_equal_p (sm6_4_a, sm6_4_b, 0));
+
+  tree s6_m4_a = fold_binary (MINUS_EXPR, type, s11, s5_4);
+  tree s6_m4_b = build_vec_series (type, ssize_int (6), ssize_int (-4));
+  ASSERT_TRUE (operand_equal_p (s6_m4_a, s6_m4_b, 0));
+
+  tree s5_4_expr = fold_binary (VEC_SERIES_EXPR, type,
+				ssize_int (5), ssize_int (4));
+  ASSERT_TRUE (operand_equal_p (s5_4_expr, s5_4, 0));
+  ASSERT_FALSE (operand_equal_p (s5_4_expr, s3_9, 0));
+}
+
 /* Run all of the selftests within this file.  */
 
 void
@@ -14536,6 +14651,7 @@ fold_const_c_tests ()
   test_arithmetic_folding ();
   test_vector_folding ();
   test_vec_duplicate_folding ();
+  test_vec_series_folding ();
 }
 
 } // namespace selftest
Index: gcc/expmed.c
===================================================================
--- gcc/expmed.c	2017-11-06 12:20:31.081564919 +0000
+++ gcc/expmed.c	2017-11-06 12:21:29.325209826 +0000
@@ -5252,6 +5252,13 @@ make_tree (tree type, rtx x)
 	    tree elt_tree = make_tree (TREE_TYPE (type), XEXP (op, 0));
 	    return build_vector_from_val (type, elt_tree);
 	  }
+	if (GET_CODE (op) == VEC_SERIES)
+	  {
+	    tree itype = TREE_TYPE (type);
+	    tree base_tree = make_tree (itype, XEXP (op, 0));
+	    tree step_tree = make_tree (itype, XEXP (op, 1));
+	    return build_vec_series (type, base_tree, step_tree);
+	  }
 	return make_tree (type, op);
       }
 
Index: gcc/gimple-pretty-print.c
===================================================================
--- gcc/gimple-pretty-print.c	2017-11-06 12:20:31.087048745 +0000
+++ gcc/gimple-pretty-print.c	2017-11-06 12:21:29.328209826 +0000
@@ -431,6 +431,7 @@ dump_binary_rhs (pretty_printer *buffer,
     case VEC_PACK_FIX_TRUNC_EXPR:
     case VEC_WIDEN_LSHIFT_HI_EXPR:
     case VEC_WIDEN_LSHIFT_LO_EXPR:
+    case VEC_SERIES_EXPR:
       for (p = get_tree_code_name (code); *p; p++)
 	pp_character (buffer, TOUPPER (*p));
       pp_string (buffer, " <");
Index: gcc/tree-inline.c
===================================================================
--- gcc/tree-inline.c	2017-11-06 12:20:31.092532570 +0000
+++ gcc/tree-inline.c	2017-11-06 12:21:29.332209826 +0000
@@ -3931,6 +3931,7 @@ estimate_operator_cost (enum tree_code c
     case VEC_WIDEN_LSHIFT_HI_EXPR:
     case VEC_WIDEN_LSHIFT_LO_EXPR:
     case VEC_DUPLICATE_EXPR:
+    case VEC_SERIES_EXPR:
 
       return 1;
 
Index: gcc/expr.c
===================================================================
--- gcc/expr.c	2017-11-06 12:20:31.082478890 +0000
+++ gcc/expr.c	2017-11-06 12:21:29.326209826 +0000
@@ -7708,7 +7708,7 @@ expand_operands (tree exp0, tree exp1, r
 
 \f
 /* Expand constant vector element ELT, which has mode MODE.  This is used
-   for members of VECTOR_CST and VEC_DUPLICATE_CST.  */
+   for members of VECTOR_CST, VEC_DUPLICATE_CST and VEC_SERIES_CST.  */
 
 static rtx
 const_vector_element (scalar_mode mode, const_tree elt)
@@ -9591,6 +9591,10 @@ #define REDUCE_BIT_FIELD(expr)	(reduce_b
       gcc_assert (target);
       return target;
 
+    case VEC_SERIES_EXPR:
+      expand_operands (treeop0, treeop1, NULL_RTX, &op0, &op1, modifier);
+      return expand_vec_series_expr (mode, op0, op1, target);
+
     case BIT_INSERT_EXPR:
       {
 	unsigned bitpos = tree_to_uhwi (treeop2);
@@ -10029,6 +10033,13 @@ expand_expr_real_1 (tree exp, rtx target
 				  VEC_DUPLICATE_CST_ELT (exp));
       return gen_const_vec_duplicate (mode, op0);
 
+    case VEC_SERIES_CST:
+      op0 = const_vector_element (GET_MODE_INNER (mode),
+				  VEC_SERIES_CST_BASE (exp));
+      op1 = const_vector_element (GET_MODE_INNER (mode),
+				  VEC_SERIES_CST_STEP (exp));
+      return gen_const_vec_series (mode, op0, op1);
+
     case CONST_DECL:
       if (modifier == EXPAND_WRITE)
 	{
Index: gcc/optabs.def
===================================================================
--- gcc/optabs.def	2017-11-06 12:20:31.090704628 +0000
+++ gcc/optabs.def	2017-11-06 12:21:29.331209826 +0000
@@ -366,3 +366,4 @@ OPTAB_D (get_thread_pointer_optab, "get_
 OPTAB_D (set_thread_pointer_optab, "set_thread_pointer$I$a")
 
 OPTAB_DC (vec_duplicate_optab, "vec_duplicate$a", VEC_DUPLICATE)
+OPTAB_DC (vec_series_optab, "vec_series$a", VEC_SERIES)
Index: gcc/optabs.h
===================================================================
--- gcc/optabs.h	2017-11-06 12:20:31.090704628 +0000
+++ gcc/optabs.h	2017-11-06 12:21:29.331209826 +0000
@@ -316,6 +316,9 @@ extern rtx expand_vec_cmp_expr (tree, tr
 /* Generate code for VEC_COND_EXPR.  */
 extern rtx expand_vec_cond_expr (tree, tree, tree, tree, rtx);
 
+/* Generate code for VEC_SERIES_EXPR.  */
+extern rtx expand_vec_series_expr (machine_mode, rtx, rtx, rtx);
+
 /* Generate code for MULT_HIGHPART_EXPR.  */
 extern rtx expand_mult_highpart (machine_mode, rtx, rtx, rtx, bool);
 
Index: gcc/optabs.c
===================================================================
--- gcc/optabs.c	2017-11-06 12:20:31.090704628 +0000
+++ gcc/optabs.c	2017-11-06 12:21:29.330209826 +0000
@@ -5703,6 +5703,27 @@ expand_vec_cond_expr (tree vec_cond_type
   return ops[0].value;
 }
 
+/* Generate VEC_SERIES_EXPR <OP0, OP1>, returning a value of mode VMODE.
+   Use TARGET for the result if nonnull and convenient.  */
+
+rtx
+expand_vec_series_expr (machine_mode vmode, rtx op0, rtx op1, rtx target)
+{
+  struct expand_operand ops[3];
+  enum insn_code icode;
+  machine_mode emode = GET_MODE_INNER (vmode);
+
+  icode = direct_optab_handler (vec_series_optab, vmode);
+  gcc_assert (icode != CODE_FOR_nothing);
+
+  create_output_operand (&ops[0], target, vmode);
+  create_input_operand (&ops[1], op0, emode);
+  create_input_operand (&ops[2], op1, emode);
+
+  expand_insn (icode, 3, ops);
+  return ops[0].value;
+}
+
 /* Generate insns for a vector comparison into a mask.  */
 
 rtx
Index: gcc/optabs-tree.c
===================================================================
--- gcc/optabs-tree.c	2017-11-06 12:20:31.089790657 +0000
+++ gcc/optabs-tree.c	2017-11-06 12:21:29.330209826 +0000
@@ -213,6 +213,9 @@ optab_for_tree_code (enum tree_code code
     case VEC_DUPLICATE_EXPR:
       return vec_duplicate_optab;
 
+    case VEC_SERIES_EXPR:
+      return vec_series_optab;
+
     default:
       break;
     }
Index: gcc/tree-cfg.c
===================================================================
--- gcc/tree-cfg.c	2017-11-06 12:20:31.091618599 +0000
+++ gcc/tree-cfg.c	2017-11-06 12:21:29.332209826 +0000
@@ -4114,6 +4114,23 @@ verify_gimple_assign_binary (gassign *st
       /* Continue with generic binary expression handling.  */
       break;
 
+    case VEC_SERIES_EXPR:
+      if (!useless_type_conversion_p (rhs1_type, rhs2_type))
+	{
+	  error ("type mismatch in series expression");
+	  debug_generic_expr (rhs1_type);
+	  debug_generic_expr (rhs2_type);
+	  return true;
+	}
+      if (TREE_CODE (lhs_type) != VECTOR_TYPE
+	  || !useless_type_conversion_p (TREE_TYPE (lhs_type), rhs1_type))
+	{
+	  error ("vector type expected in series expression");
+	  debug_generic_expr (lhs_type);
+	  return true;
+	}
+      return false;
+
     default:
       gcc_unreachable ();
     }
@@ -4480,6 +4497,7 @@ verify_gimple_assign_single (gassign *st
     case COMPLEX_CST:
     case VECTOR_CST:
     case VEC_DUPLICATE_CST:
+    case VEC_SERIES_CST:
     case STRING_CST:
       return res;
 
Index: gcc/tree-vect-generic.c
===================================================================
--- gcc/tree-vect-generic.c	2017-11-06 12:20:31.094360512 +0000
+++ gcc/tree-vect-generic.c	2017-11-06 12:21:29.334209826 +0000
@@ -1596,7 +1596,8 @@ expand_vector_operations_1 (gimple_stmt_
   if (rhs_class == GIMPLE_BINARY_RHS)
     rhs2 = gimple_assign_rhs2 (stmt);
 
-  if (TREE_CODE (type) != VECTOR_TYPE)
+  if (!VECTOR_TYPE_P (type)
+      || !VECTOR_TYPE_P (TREE_TYPE (rhs1)))
     return;
 
   /* If the vector operation is operating on all same vector elements

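For reference, the element-wise semantics that vec_series and VEC_SERIES_EXPR
are intended to have can be sketched in plain C.  This is only an
illustration; the function and names below are invented for the example and
are not GCC internals:

    #include <stdio.h>

    /* Illustrative only: VEC_SERIES_EXPR <BASE, STEP> builds the vector
       { BASE, BASE + STEP, BASE + 2*STEP, ... }.  */
    static void
    vec_series (long *out, long base, long step, unsigned n)
    {
      for (unsigned i = 0; i < n; ++i)
        out[i] = base + (long) i * step;
    }

    int
    main (void)
    {
      long v[4];
      vec_series (v, 5, 3, 4);
      for (unsigned i = 0; i < 4; ++i)
        printf ("%ld ", v[i]);   /* prints: 5 8 11 14 */
      printf ("\n");
      return 0;
    }
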
^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [05/nn] Add VEC_DUPLICATE_{CST,EXPR} and associated optab
  2017-11-06 15:09     ` Richard Sandiford
@ 2017-11-07 10:37       ` Richard Biener
  0 siblings, 0 replies; 90+ messages in thread
From: Richard Biener @ 2017-11-07 10:37 UTC (permalink / raw)
  To: Richard Biener, GCC Patches, Richard Sandiford

On Mon, Nov 6, 2017 at 4:09 PM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> Richard Biener <richard.guenther@gmail.com> writes:
>> On Mon, Oct 23, 2017 at 1:20 PM, Richard Sandiford
>> <richard.sandiford@linaro.org> wrote:
>>> SVE needs a way of broadcasting a scalar to a variable-length vector.
>>> This patch adds VEC_DUPLICATE_CST for when VECTOR_CST would be used for
>>> fixed-length vectors and VEC_DUPLICATE_EXPR for when CONSTRUCTOR would
>>> be used for fixed-length vectors.  VEC_DUPLICATE_EXPR is the tree
>>> equivalent of the existing rtl code VEC_DUPLICATE.
>>>
>>> Originally we had a single VEC_DUPLICATE_EXPR and used TREE_CONSTANT
>>> to mark constant nodes, but in response to last year's RFC, Richard B.
>>> suggested it would be better to have separate codes for the constant
>>> and non-constant cases.  This allows VEC_DUPLICATE_EXPR to be treated
>>> as a normal unary operation and avoids the previous need for treating
>>> it as a GIMPLE_SINGLE_RHS.
>>>
>>> It might make sense to use VEC_DUPLICATE_CST for all duplicated
>>> vector constants, since it's a bit more compact than VECTOR_CST
>>> in that case, and is potentially more efficient to process.
>>> However, the nice thing about keeping it restricted to variable-length
>>> vectors is that there is then no need to handle combinations of
>>> VECTOR_CST and VEC_DUPLICATE_CST; a vector type will always use
>>> VECTOR_CST or never use it.
>>>
>>> The patch also adds a vec_duplicate_optab to go with VEC_DUPLICATE_EXPR.
>>
>> Index: gcc/tree-vect-generic.c
>> ===================================================================
>> --- gcc/tree-vect-generic.c     2017-10-23 11:38:53.934094740 +0100
>> +++ gcc/tree-vect-generic.c     2017-10-23 11:41:51.773953100 +0100
>> @@ -1419,6 +1419,7 @@ lower_vec_perm (gimple_stmt_iterator *gs
>>  ssa_uniform_vector_p (tree op)
>>  {
>>    if (TREE_CODE (op) == VECTOR_CST
>> +      || TREE_CODE (op) == VEC_DUPLICATE_CST
>>        || TREE_CODE (op) == CONSTRUCTOR)
>>      return uniform_vector_p (op);
>>
>> VEC_DUPLICATE_EXPR handling?
>
> Oops, yeah.  I could have sworn it was there at one time...
>
>> Looks like for VEC_DUPLICATE_CST it could directly return true.
>
> The function is a bit misnamed: it returns the duplicated tree value
> rather than a bool.
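
For reference, that contract can be sketched in plain C with a toy array
standing in for a GCC vector tree (the names here are invented for the
example, not GCC internals):

    #include <stddef.h>
    #include <stdio.h>

    /* Sketch of the uniform_vector_p contract: return the repeated
       element if all elements are equal, otherwise NULL.  The result
       is the value itself, not a boolean.  */
    static const long *
    toy_uniform_vector_p (const long *elts, unsigned n)
    {
      for (unsigned i = 1; i < n; ++i)
        if (elts[i] != elts[0])
          return NULL;
      return n ? &elts[0] : NULL;
    }

    int
    main (void)
    {
      long a[] = { 3, 3, 3, 3 }, b[] = { 3, 4, 3, 3 };
      printf ("%d %d\n",
              toy_uniform_vector_p (a, 4) != NULL,    /* 1 */
              toy_uniform_vector_p (b, 4) != NULL);   /* 0 */
      return 0;
    }
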
>
>> I didn't see uniform_vector_p being updated?
>
> That part was there FWIW (for tree.c).
>
>> Can you add verification to either verify_expr or build_vec_duplicate_cst
>> that the type is one of variable size?  And amend tree.def docs
>> accordingly.  Because otherwise we miss a lot of cases in constant
>> folding (mixing VEC_DUPLICATE_CST and VECTOR_CST).
>
> OK, done in the patch below with a gcc_unreachable () bomb in
> build_vec_duplicate_cst, which becomes a gcc_assert when variable-length
> vectors are added.  This meant changing the selftests to use
> build_vector_from_val rather than build_vec_duplicate_cst,
> but to still get testing of VEC_DUPLICATE_*, we then need to use
> the target's preferred vector length instead of always using 4.
>
> Tested as before.  OK (given the slightly different selftests)?

Ok.  I'll leave the missed constant foldings to you to figure out.

Richard.

> Thanks,
> Richard
>
>
> 2017-11-06  Richard Sandiford  <richard.sandiford@linaro.org>
>             Alan Hayward  <alan.hayward@arm.com>
>             David Sherwood  <david.sherwood@arm.com>
>
> gcc/
>         * doc/generic.texi (VEC_DUPLICATE_CST, VEC_DUPLICATE_EXPR): Document.
>         (VEC_COND_EXPR): Add missing @tindex.
>         * doc/md.texi (vec_duplicate@var{m}): Document.
>         * tree.def (VEC_DUPLICATE_CST, VEC_DUPLICATE_EXPR): New tree codes.
>         * tree-core.h (tree_base): Document that u.nelts and TREE_OVERFLOW
>         are used for VEC_DUPLICATE_CST as well.
>         (tree_vector): Access base.n.nelts directly.
>         * tree.h (TREE_OVERFLOW): Add VEC_DUPLICATE_CST to the list of
>         valid codes.
>         (VEC_DUPLICATE_CST_ELT): New macro.
>         * tree.c (tree_node_structure_for_code, tree_code_size, tree_size)
>         (integer_zerop, integer_onep, integer_all_onesp, integer_truep)
>         (real_zerop, real_onep, real_minus_onep, add_expr, initializer_zerop)
>         (walk_tree_1, drop_tree_overflow): Handle VEC_DUPLICATE_CST.
>         (build_vec_duplicate_cst): New function.
>         (build_vector_from_val): Add stubbed-out handling of variable-length
>         vectors, using build_vec_duplicate_cst and VEC_DUPLICATE_EXPR.
>         (uniform_vector_p): Handle the new codes.
>         (test_vec_duplicate_predicates_int): New function.
>         (test_vec_duplicate_predicates_float): Likewise.
>         (test_vec_duplicate_predicates): Likewise.
>         (tree_c_tests): Call test_vec_duplicate_predicates.
>         * cfgexpand.c (expand_debug_expr): Handle the new codes.
>         * tree-pretty-print.c (dump_generic_node): Likewise.
>         * tree-vect-generic.c (ssa_uniform_vector_p): Likewise.
>         * dwarf2out.c (rtl_for_decl_init): Handle VEC_DUPLICATE_CST.
>         * gimple-expr.h (is_gimple_constant): Likewise.
>         * gimplify.c (gimplify_expr): Likewise.
>         * graphite-isl-ast-to-gimple.c
>         (translate_isl_ast_to_gimple::is_constant): Likewise.
>         * graphite-scop-detection.c (scan_tree_for_params): Likewise.
>         * ipa-icf-gimple.c (func_checker::compare_cst_or_decl): Likewise.
>         (func_checker::compare_operand): Likewise.
>         * ipa-icf.c (sem_item::add_expr, sem_variable::equals): Likewise.
>         * match.pd (negate_expr_p): Likewise.
>         * print-tree.c (print_node): Likewise.
>         * tree-chkp.c (chkp_find_bounds_1): Likewise.
>         * tree-loop-distribution.c (const_with_all_bytes_same): Likewise.
>         * tree-ssa-loop.c (for_each_index): Likewise.
>         * tree-ssa-pre.c (create_component_ref_by_pieces_1): Likewise.
>         * tree-ssa-sccvn.c (copy_reference_ops_from_ref): Likewise.
>         (ao_ref_init_from_vn_reference): Likewise.
>         * varasm.c (const_hash_1, compare_constant): Likewise.
>         * fold-const.c (negate_expr_p, fold_negate_expr_1, const_binop)
>         (fold_convert_const, operand_equal_p, fold_view_convert_expr)
>         (exact_inverse, fold_checksum_tree): Likewise.
>         (const_unop): Likewise.  Fold VEC_DUPLICATE_EXPRs of a constant.
>         (test_vec_duplicate_folding): New function.
>         (fold_const_c_tests): Call it.
>         * optabs.def (vec_duplicate_optab): New optab.
>         * optabs-tree.c (optab_for_tree_code): Handle VEC_DUPLICATE_EXPR.
>         * optabs.h (expand_vector_broadcast): Declare.
>         * optabs.c (expand_vector_broadcast): Make non-static.  Try using
>         vec_duplicate_optab.
>         * expr.c (store_constructor): Try using vec_duplicate_optab for
>         uniform vectors.
>         (const_vector_element): New function, split out from...
>         (const_vector_from_tree): ...here.
>         (expand_expr_real_2): Handle VEC_DUPLICATE_EXPR.
>         (expand_expr_real_1): Handle VEC_DUPLICATE_CST.
>         * internal-fn.c (expand_vector_ubsan_overflow): Use CONSTANT_P
>         instead of checking for VECTOR_CST.
>         * tree-cfg.c (verify_gimple_assign_unary): Handle VEC_DUPLICATE_EXPR.
>         (verify_gimple_assign_single): Handle VEC_DUPLICATE_CST.
>         * tree-inline.c (estimate_operator_cost): Handle VEC_DUPLICATE_EXPR.
>
> Index: gcc/doc/generic.texi
> ===================================================================
> --- gcc/doc/generic.texi        2017-11-06 12:40:39.845713389 +0000
> +++ gcc/doc/generic.texi        2017-11-06 12:40:40.277637153 +0000
> @@ -1036,6 +1036,7 @@ As this example indicates, the operands
>  @tindex FIXED_CST
>  @tindex COMPLEX_CST
>  @tindex VECTOR_CST
> +@tindex VEC_DUPLICATE_CST
>  @tindex STRING_CST
>  @findex TREE_STRING_LENGTH
>  @findex TREE_STRING_POINTER
> @@ -1089,6 +1090,14 @@ constant nodes.  Each individual constan
>  double constant node.  The first operand is a @code{TREE_LIST} of the
>  constant nodes and is accessed through @code{TREE_VECTOR_CST_ELTS}.
>
> +@item VEC_DUPLICATE_CST
> +These nodes represent a vector constant in which every element has the
> +same scalar value.  At present only variable-length vectors use
> +@code{VEC_DUPLICATE_CST}; constant-length vectors use @code{VECTOR_CST}
> +instead.  The scalar element value is given by
> +@code{VEC_DUPLICATE_CST_ELT} and has the same restrictions as the
> +element of a @code{VECTOR_CST}.
> +
>  @item STRING_CST
>  These nodes represent string-constants.  The @code{TREE_STRING_LENGTH}
>  returns the length of the string, as an @code{int}.  The
> @@ -1692,6 +1701,7 @@ a value from @code{enum annot_expr_kind}
>
>  @node Vectors
>  @subsection Vectors
> +@tindex VEC_DUPLICATE_EXPR
>  @tindex VEC_LSHIFT_EXPR
>  @tindex VEC_RSHIFT_EXPR
>  @tindex VEC_WIDEN_MULT_HI_EXPR
> @@ -1703,9 +1713,14 @@ a value from @code{enum annot_expr_kind}
>  @tindex VEC_PACK_TRUNC_EXPR
>  @tindex VEC_PACK_SAT_EXPR
>  @tindex VEC_PACK_FIX_TRUNC_EXPR
> +@tindex VEC_COND_EXPR
>  @tindex SAD_EXPR
>
>  @table @code
> +@item VEC_DUPLICATE_EXPR
> +This node has a single operand and represents a vector in which every
> +element is equal to that operand.
> +
>  @item VEC_LSHIFT_EXPR
>  @itemx VEC_RSHIFT_EXPR
>  These nodes represent whole vector left and right shifts, respectively.
> Index: gcc/doc/md.texi
> ===================================================================
> --- gcc/doc/md.texi     2017-11-06 12:40:39.845713389 +0000
> +++ gcc/doc/md.texi     2017-11-06 12:40:40.278630081 +0000
> @@ -4888,6 +4888,17 @@ and operand 1 is parallel containing val
>  the vector mode @var{m}, or a vector mode with the same element mode and
>  smaller number of elements.
>
> +@cindex @code{vec_duplicate@var{m}} instruction pattern
> +@item @samp{vec_duplicate@var{m}}
> +Initialize vector output operand 0 so that each element has the value given
> +by scalar input operand 1.  The vector has mode @var{m} and the scalar has
> +the mode appropriate for one element of @var{m}.
> +
> +This pattern only handles duplicates of non-constant inputs.  Constant
> +vectors go through the @code{mov@var{m}} pattern instead.
> +
> +This pattern is not allowed to @code{FAIL}.
> +
>  @cindex @code{vec_cmp@var{m}@var{n}} instruction pattern
>  @item @samp{vec_cmp@var{m}@var{n}}
>  Output a vector comparison.  Operand 0 of mode @var{n} is the destination for
> Index: gcc/tree.def
> ===================================================================
> --- gcc/tree.def        2017-11-06 12:40:39.845713389 +0000
> +++ gcc/tree.def        2017-11-06 12:40:40.292531076 +0000
> @@ -304,6 +304,11 @@ DEFTREECODE (COMPLEX_CST, "complex_cst",
>  /* Contents are in VECTOR_CST_ELTS field.  */
>  DEFTREECODE (VECTOR_CST, "vector_cst", tcc_constant, 0)
>
> +/* Represents a vector constant in which every element is equal to
> +   VEC_DUPLICATE_CST_ELT.  This is only ever used for variable-length
> +   vectors; fixed-length vectors must use VECTOR_CST instead.  */
> +DEFTREECODE (VEC_DUPLICATE_CST, "vec_duplicate_cst", tcc_constant, 0)
> +
>  /* Contents are TREE_STRING_LENGTH and the actual contents of the string.  */
>  DEFTREECODE (STRING_CST, "string_cst", tcc_constant, 0)
>
> @@ -534,6 +539,9 @@ DEFTREECODE (TARGET_EXPR, "target_expr",
>     1 and 2 are NULL.  The operands are then taken from the cfg edges. */
>  DEFTREECODE (COND_EXPR, "cond_expr", tcc_expression, 3)
>
> +/* Represents a vector in which every element is equal to operand 0.  */
> +DEFTREECODE (VEC_DUPLICATE_EXPR, "vec_duplicate_expr", tcc_unary, 1)
> +
>  /* Vector conditional expression. It is like COND_EXPR, but with
>     vector operands.
>
> Index: gcc/tree-core.h
> ===================================================================
> --- gcc/tree-core.h     2017-11-06 12:40:39.845713389 +0000
> +++ gcc/tree-core.h     2017-11-06 12:40:40.288559363 +0000
> @@ -975,7 +975,8 @@ struct GTY(()) tree_base {
>      /* VEC length.  This field is only used with TREE_VEC.  */
>      int length;
>
> -    /* Number of elements.  This field is only used with VECTOR_CST.  */
> +    /* Number of elements.  This field is only used with VECTOR_CST
> +       and VEC_DUPLICATE_CST.  It is always 1 for VEC_DUPLICATE_CST.  */
>      unsigned int nelts;
>
>      /* SSA version number.  This field is only used with SSA_NAME.  */
> @@ -1065,7 +1066,7 @@ struct GTY(()) tree_base {
>     public_flag:
>
>         TREE_OVERFLOW in
> -           INTEGER_CST, REAL_CST, COMPLEX_CST, VECTOR_CST
> +           INTEGER_CST, REAL_CST, COMPLEX_CST, VECTOR_CST, VEC_DUPLICATE_CST
>
>         TREE_PUBLIC in
>             VAR_DECL, FUNCTION_DECL
> @@ -1332,7 +1333,7 @@ struct GTY(()) tree_complex {
>
>  struct GTY(()) tree_vector {
>    struct tree_typed typed;
> -  tree GTY ((length ("VECTOR_CST_NELTS ((tree) &%h)"))) elts[1];
> +  tree GTY ((length ("((tree) &%h)->base.u.nelts"))) elts[1];
>  };
>
>  struct GTY(()) tree_identifier {
> Index: gcc/tree.h
> ===================================================================
> --- gcc/tree.h  2017-11-06 12:40:39.845713389 +0000
> +++ gcc/tree.h  2017-11-06 12:40:40.293524004 +0000
> @@ -709,8 +709,8 @@ #define TREE_SYMBOL_REFERENCED(NODE) \
>  #define TYPE_REF_CAN_ALIAS_ALL(NODE) \
>    (PTR_OR_REF_CHECK (NODE)->base.static_flag)
>
> -/* In an INTEGER_CST, REAL_CST, COMPLEX_CST, or VECTOR_CST, this means
> -   there was an overflow in folding.  */
> +/* In an INTEGER_CST, REAL_CST, COMPLEX_CST, VECTOR_CST or VEC_DUPLICATE_CST,
> +   this means there was an overflow in folding.  */
>
>  #define TREE_OVERFLOW(NODE) (CST_CHECK (NODE)->base.public_flag)
>
> @@ -1009,6 +1009,10 @@ #define VECTOR_CST_NELTS(NODE) (VECTOR_C
>  #define VECTOR_CST_ELTS(NODE) (VECTOR_CST_CHECK (NODE)->vector.elts)
>  #define VECTOR_CST_ELT(NODE,IDX) (VECTOR_CST_CHECK (NODE)->vector.elts[IDX])
>
> +/* In a VEC_DUPLICATE_CST node.  */
> +#define VEC_DUPLICATE_CST_ELT(NODE) \
> +  (VEC_DUPLICATE_CST_CHECK (NODE)->vector.elts[0])
> +
>  /* Define fields and accessors for some special-purpose tree nodes.  */
>
>  #define IDENTIFIER_LENGTH(NODE) \
> Index: gcc/tree.c
> ===================================================================
> --- gcc/tree.c  2017-11-06 12:40:39.845713389 +0000
> +++ gcc/tree.c  2017-11-06 12:40:40.292531076 +0000
> @@ -464,6 +464,7 @@ tree_node_structure_for_code (enum tree_
>      case FIXED_CST:            return TS_FIXED_CST;
>      case COMPLEX_CST:          return TS_COMPLEX;
>      case VECTOR_CST:           return TS_VECTOR;
> +    case VEC_DUPLICATE_CST:    return TS_VECTOR;
>      case STRING_CST:           return TS_STRING;
>        /* tcc_exceptional cases.  */
>      case ERROR_MARK:           return TS_COMMON;
> @@ -829,6 +830,7 @@ tree_code_size (enum tree_code code)
>         case FIXED_CST:         return sizeof (tree_fixed_cst);
>         case COMPLEX_CST:       return sizeof (tree_complex);
>         case VECTOR_CST:        return sizeof (tree_vector);
> +       case VEC_DUPLICATE_CST: return sizeof (tree_vector);
>         case STRING_CST:        gcc_unreachable ();
>         default:
>           gcc_checking_assert (code >= NUM_TREE_CODES);
> @@ -890,6 +892,9 @@ tree_size (const_tree node)
>        return (sizeof (struct tree_vector)
>               + (VECTOR_CST_NELTS (node) - 1) * sizeof (tree));
>
> +    case VEC_DUPLICATE_CST:
> +      return sizeof (struct tree_vector);
> +
>      case STRING_CST:
>        return TREE_STRING_LENGTH (node) + offsetof (struct tree_string, str) + 1;
>
> @@ -1697,6 +1702,34 @@ cst_and_fits_in_hwi (const_tree x)
>           && (tree_fits_shwi_p (x) || tree_fits_uhwi_p (x)));
>  }
>
> +/* Build a new VEC_DUPLICATE_CST with type TYPE and operand EXP.
> +
> +   This function is only suitable for callers that know TYPE is a
> +   variable-length vector and specifically need a VEC_DUPLICATE_CST node.
> +   Use build_vector_from_val to duplicate a general scalar into a general
> +   vector type.  */
> +
> +static tree
> +build_vec_duplicate_cst (tree type, tree exp MEM_STAT_DECL)
> +{
> +  /* Shouldn't be used until we have variable-length vectors.  */
> +  gcc_unreachable ();
> +
> +  int length = sizeof (struct tree_vector);
> +
> +  record_node_allocation_statistics (VEC_DUPLICATE_CST, length);
> +
> +  tree t = ggc_alloc_cleared_tree_node_stat (length PASS_MEM_STAT);
> +
> +  TREE_SET_CODE (t, VEC_DUPLICATE_CST);
> +  TREE_TYPE (t) = type;
> +  t->base.u.nelts = 1;
> +  VEC_DUPLICATE_CST_ELT (t) = exp;
> +  TREE_CONSTANT (t) = 1;
> +
> +  return t;
> +}
> +
>  /* Build a newly constructed VECTOR_CST node of length LEN.  */
>
>  tree
> @@ -1790,6 +1823,13 @@ build_vector_from_val (tree vectype, tre
>    gcc_checking_assert (types_compatible_p (TYPE_MAIN_VARIANT (TREE_TYPE (sc)),
>                                            TREE_TYPE (vectype)));
>
> +  if (0)
> +    {
> +      if (CONSTANT_CLASS_P (sc))
> +       return build_vec_duplicate_cst (vectype, sc);
> +      return fold_build1 (VEC_DUPLICATE_EXPR, vectype, sc);
> +    }
> +
>    if (CONSTANT_CLASS_P (sc))
>      {
>        auto_vec<tree, 32> v (nunits);
> @@ -2358,6 +2398,8 @@ integer_zerop (const_tree expr)
>             return false;
>         return true;
>        }
> +    case VEC_DUPLICATE_CST:
> +      return integer_zerop (VEC_DUPLICATE_CST_ELT (expr));
>      default:
>        return false;
>      }
> @@ -2384,6 +2426,8 @@ integer_onep (const_tree expr)
>             return false;
>         return true;
>        }
> +    case VEC_DUPLICATE_CST:
> +      return integer_onep (VEC_DUPLICATE_CST_ELT (expr));
>      default:
>        return false;
>      }
> @@ -2422,6 +2466,9 @@ integer_all_onesp (const_tree expr)
>        return 1;
>      }
>
> +  else if (TREE_CODE (expr) == VEC_DUPLICATE_CST)
> +    return integer_all_onesp (VEC_DUPLICATE_CST_ELT (expr));
> +
>    else if (TREE_CODE (expr) != INTEGER_CST)
>      return 0;
>
> @@ -2478,7 +2525,7 @@ integer_nonzerop (const_tree expr)
>  int
>  integer_truep (const_tree expr)
>  {
> -  if (TREE_CODE (expr) == VECTOR_CST)
> +  if (TREE_CODE (expr) == VECTOR_CST || TREE_CODE (expr) == VEC_DUPLICATE_CST)
>      return integer_all_onesp (expr);
>    return integer_onep (expr);
>  }
> @@ -2649,6 +2696,8 @@ real_zerop (const_tree expr)
>             return false;
>         return true;
>        }
> +    case VEC_DUPLICATE_CST:
> +      return real_zerop (VEC_DUPLICATE_CST_ELT (expr));
>      default:
>        return false;
>      }
> @@ -2677,6 +2726,8 @@ real_onep (const_tree expr)
>             return false;
>         return true;
>        }
> +    case VEC_DUPLICATE_CST:
> +      return real_onep (VEC_DUPLICATE_CST_ELT (expr));
>      default:
>        return false;
>      }
> @@ -2704,6 +2755,8 @@ real_minus_onep (const_tree expr)
>             return false;
>         return true;
>        }
> +    case VEC_DUPLICATE_CST:
> +      return real_minus_onep (VEC_DUPLICATE_CST_ELT (expr));
>      default:
>        return false;
>      }
> @@ -7106,6 +7159,9 @@ add_expr (const_tree t, inchash::hash &h
>           inchash::add_expr (VECTOR_CST_ELT (t, i), hstate, flags);
>         return;
>        }
> +    case VEC_DUPLICATE_CST:
> +      inchash::add_expr (VEC_DUPLICATE_CST_ELT (t), hstate);
> +      return;
>      case SSA_NAME:
>        /* We can just compare by pointer.  */
>        hstate.add_hwi (SSA_NAME_VERSION (t));
> @@ -10367,6 +10423,9 @@ initializer_zerop (const_tree init)
>         return true;
>        }
>
> +    case VEC_DUPLICATE_CST:
> +      return initializer_zerop (VEC_DUPLICATE_CST_ELT (init));
> +
>      case CONSTRUCTOR:
>        {
>         unsigned HOST_WIDE_INT idx;
> @@ -10412,7 +10471,13 @@ uniform_vector_p (const_tree vec)
>
>    gcc_assert (VECTOR_TYPE_P (TREE_TYPE (vec)));
>
> -  if (TREE_CODE (vec) == VECTOR_CST)
> +  if (TREE_CODE (vec) == VEC_DUPLICATE_CST)
> +    return VEC_DUPLICATE_CST_ELT (vec);
> +
> +  else if (TREE_CODE (vec) == VEC_DUPLICATE_EXPR)
> +    return TREE_OPERAND (vec, 0);
> +
> +  else if (TREE_CODE (vec) == VECTOR_CST)
>      {
>        first = VECTOR_CST_ELT (vec, 0);
>        for (i = 1; i < VECTOR_CST_NELTS (vec); ++i)
> @@ -11144,6 +11209,7 @@ #define WALK_SUBTREE_TAIL(NODE)                         \
>      case REAL_CST:
>      case FIXED_CST:
>      case VECTOR_CST:
> +    case VEC_DUPLICATE_CST:
>      case STRING_CST:
>      case BLOCK:
>      case PLACEHOLDER_EXPR:
> @@ -12430,6 +12496,12 @@ drop_tree_overflow (tree t)
>             elt = drop_tree_overflow (elt);
>         }
>      }
> +  if (TREE_CODE (t) == VEC_DUPLICATE_CST)
> +    {
> +      tree *elt = &VEC_DUPLICATE_CST_ELT (t);
> +      if (TREE_OVERFLOW (*elt))
> +       *elt = drop_tree_overflow (*elt);
> +    }
>    return t;
>  }
>
> @@ -13850,6 +13922,102 @@ test_integer_constants ()
>    ASSERT_EQ (type, TREE_TYPE (zero));
>  }
>
> +/* Verify predicate handling of VEC_DUPLICATE_CSTs and VEC_DUPLICATE_EXPRs
> +   for integral type TYPE.  */
> +
> +static void
> +test_vec_duplicate_predicates_int (tree type)
> +{
> +  scalar_int_mode int_mode = SCALAR_INT_TYPE_MODE (type);
> +  machine_mode vec_mode = targetm.vectorize.preferred_simd_mode (int_mode);
> +  /* This will be 1 if VEC_MODE isn't a vector mode.  */
> +  unsigned int nunits = GET_MODE_NUNITS (vec_mode);
> +
> +  tree vec_type = build_vector_type (type, nunits);
> +
> +  tree zero = build_zero_cst (type);
> +  tree vec_zero = build_vector_from_val (vec_type, zero);
> +  ASSERT_TRUE (integer_zerop (vec_zero));
> +  ASSERT_FALSE (integer_onep (vec_zero));
> +  ASSERT_FALSE (integer_minus_onep (vec_zero));
> +  ASSERT_FALSE (integer_all_onesp (vec_zero));
> +  ASSERT_FALSE (integer_truep (vec_zero));
> +  ASSERT_TRUE (initializer_zerop (vec_zero));
> +
> +  tree one = build_one_cst (type);
> +  tree vec_one = build_vector_from_val (vec_type, one);
> +  ASSERT_FALSE (integer_zerop (vec_one));
> +  ASSERT_TRUE (integer_onep (vec_one));
> +  ASSERT_FALSE (integer_minus_onep (vec_one));
> +  ASSERT_FALSE (integer_all_onesp (vec_one));
> +  ASSERT_FALSE (integer_truep (vec_one));
> +  ASSERT_FALSE (initializer_zerop (vec_one));
> +
> +  tree minus_one = build_minus_one_cst (type);
> +  tree vec_minus_one = build_vector_from_val (vec_type, minus_one);
> +  ASSERT_FALSE (integer_zerop (vec_minus_one));
> +  ASSERT_FALSE (integer_onep (vec_minus_one));
> +  ASSERT_TRUE (integer_minus_onep (vec_minus_one));
> +  ASSERT_TRUE (integer_all_onesp (vec_minus_one));
> +  ASSERT_TRUE (integer_truep (vec_minus_one));
> +  ASSERT_FALSE (initializer_zerop (vec_minus_one));
> +
> +  tree x = create_tmp_var_raw (type, "x");
> +  tree vec_x = build1 (VEC_DUPLICATE_EXPR, vec_type, x);
> +  ASSERT_EQ (uniform_vector_p (vec_zero), zero);
> +  ASSERT_EQ (uniform_vector_p (vec_one), one);
> +  ASSERT_EQ (uniform_vector_p (vec_minus_one), minus_one);
> +  ASSERT_EQ (uniform_vector_p (vec_x), x);
> +}
> +
> +/* Verify predicate handling of VEC_DUPLICATE_CSTs for floating-point
> +   type TYPE.  */
> +
> +static void
> +test_vec_duplicate_predicates_float (tree type)
> +{
> +  scalar_float_mode float_mode = SCALAR_FLOAT_TYPE_MODE (type);
> +  machine_mode vec_mode = targetm.vectorize.preferred_simd_mode (float_mode);
> +  /* This will be 1 if VEC_MODE isn't a vector mode.  */
> +  unsigned int nunits = GET_MODE_NUNITS (vec_mode);
> +
> +  tree vec_type = build_vector_type (type, nunits);
> +
> +  tree zero = build_zero_cst (type);
> +  tree vec_zero = build_vector_from_val (vec_type, zero);
> +  ASSERT_TRUE (real_zerop (vec_zero));
> +  ASSERT_FALSE (real_onep (vec_zero));
> +  ASSERT_FALSE (real_minus_onep (vec_zero));
> +  ASSERT_TRUE (initializer_zerop (vec_zero));
> +
> +  tree one = build_one_cst (type);
> +  tree vec_one = build_vector_from_val (vec_type, one);
> +  ASSERT_FALSE (real_zerop (vec_one));
> +  ASSERT_TRUE (real_onep (vec_one));
> +  ASSERT_FALSE (real_minus_onep (vec_one));
> +  ASSERT_FALSE (initializer_zerop (vec_one));
> +
> +  tree minus_one = build_minus_one_cst (type);
> +  tree vec_minus_one = build_vector_from_val (vec_type, minus_one);
> +  ASSERT_FALSE (real_zerop (vec_minus_one));
> +  ASSERT_FALSE (real_onep (vec_minus_one));
> +  ASSERT_TRUE (real_minus_onep (vec_minus_one));
> +  ASSERT_FALSE (initializer_zerop (vec_minus_one));
> +
> +  ASSERT_EQ (uniform_vector_p (vec_zero), zero);
> +  ASSERT_EQ (uniform_vector_p (vec_one), one);
> +  ASSERT_EQ (uniform_vector_p (vec_minus_one), minus_one);
> +}
> +
> +/* Verify predicate handling of VEC_DUPLICATE_CSTs and VEC_DUPLICATE_EXPRs.  */
> +
> +static void
> +test_vec_duplicate_predicates ()
> +{
> +  test_vec_duplicate_predicates_int (integer_type_node);
> +  test_vec_duplicate_predicates_float (float_type_node);
> +}
> +
>  /* Verify identifiers.  */
>
>  static void
> @@ -13878,6 +14046,7 @@ test_labels ()
>  tree_c_tests ()
>  {
>    test_integer_constants ();
> +  test_vec_duplicate_predicates ();
>    test_identifiers ();
>    test_labels ();
>  }
> Index: gcc/cfgexpand.c
> ===================================================================
> --- gcc/cfgexpand.c     2017-11-06 12:40:39.845713389 +0000
> +++ gcc/cfgexpand.c     2017-11-06 12:40:40.276644225 +0000
> @@ -5068,6 +5068,8 @@ expand_debug_expr (tree exp)
>      case VEC_WIDEN_LSHIFT_HI_EXPR:
>      case VEC_WIDEN_LSHIFT_LO_EXPR:
>      case VEC_PERM_EXPR:
> +    case VEC_DUPLICATE_CST:
> +    case VEC_DUPLICATE_EXPR:
>        return NULL;
>
>      /* Misc codes.  */
> Index: gcc/tree-pretty-print.c
> ===================================================================
> --- gcc/tree-pretty-print.c     2017-11-06 12:40:39.845713389 +0000
> +++ gcc/tree-pretty-print.c     2017-11-06 12:40:40.289552291 +0000
> @@ -1802,6 +1802,12 @@ dump_generic_node (pretty_printer *pp, t
>        }
>        break;
>
> +    case VEC_DUPLICATE_CST:
> +      pp_string (pp, "{ ");
> +      dump_generic_node (pp, VEC_DUPLICATE_CST_ELT (node), spc, flags, false);
> +      pp_string (pp, ", ... }");
> +      break;
> +
>      case FUNCTION_TYPE:
>      case METHOD_TYPE:
>        dump_generic_node (pp, TREE_TYPE (node), spc, flags, false);
> @@ -3231,6 +3237,15 @@ dump_generic_node (pretty_printer *pp, t
>        pp_string (pp, " > ");
>        break;
>
> +    case VEC_DUPLICATE_EXPR:
> +      pp_space (pp);
> +      for (str = get_tree_code_name (code); *str; str++)
> +       pp_character (pp, TOUPPER (*str));
> +      pp_string (pp, " < ");
> +      dump_generic_node (pp, TREE_OPERAND (node, 0), spc, flags, false);
> +      pp_string (pp, " > ");
> +      break;
> +
>      case VEC_UNPACK_HI_EXPR:
>        pp_string (pp, " VEC_UNPACK_HI_EXPR < ");
>        dump_generic_node (pp, TREE_OPERAND (node, 0), spc, flags, false);
> Index: gcc/tree-vect-generic.c
> ===================================================================
> --- gcc/tree-vect-generic.c     2017-11-06 12:40:39.845713389 +0000
> +++ gcc/tree-vect-generic.c     2017-11-06 12:40:40.291538147 +0000
> @@ -1419,6 +1419,8 @@ lower_vec_perm (gimple_stmt_iterator *gs
>  ssa_uniform_vector_p (tree op)
>  {
>    if (TREE_CODE (op) == VECTOR_CST
> +      || TREE_CODE (op) == VEC_DUPLICATE_CST
> +      || TREE_CODE (op) == VEC_DUPLICATE_EXPR
>        || TREE_CODE (op) == CONSTRUCTOR)
>      return uniform_vector_p (op);
>    if (TREE_CODE (op) == SSA_NAME)
> Index: gcc/dwarf2out.c
> ===================================================================
> --- gcc/dwarf2out.c     2017-11-06 12:40:39.845713389 +0000
> +++ gcc/dwarf2out.c     2017-11-06 12:40:40.280615937 +0000
> @@ -18878,6 +18878,7 @@ rtl_for_decl_init (tree init, tree type)
>         switch (TREE_CODE (init))
>           {
>           case VECTOR_CST:
> +         case VEC_DUPLICATE_CST:
>             break;
>           case CONSTRUCTOR:
>             if (TREE_CONSTANT (init))
> Index: gcc/gimple-expr.h
> ===================================================================
> --- gcc/gimple-expr.h   2017-11-06 12:40:39.845713389 +0000
> +++ gcc/gimple-expr.h   2017-11-06 12:40:40.282601794 +0000
> @@ -134,6 +134,7 @@ is_gimple_constant (const_tree t)
>      case FIXED_CST:
>      case COMPLEX_CST:
>      case VECTOR_CST:
> +    case VEC_DUPLICATE_CST:
>      case STRING_CST:
>        return true;
>
> Index: gcc/gimplify.c
> ===================================================================
> --- gcc/gimplify.c      2017-11-06 12:40:39.845713389 +0000
> +++ gcc/gimplify.c      2017-11-06 12:40:40.283594722 +0000
> @@ -11507,6 +11507,7 @@ gimplify_expr (tree *expr_p, gimple_seq
>         case STRING_CST:
>         case COMPLEX_CST:
>         case VECTOR_CST:
> +       case VEC_DUPLICATE_CST:
>           /* Drop the overflow flag on constants, we do not want
>              that in the GIMPLE IL.  */
>           if (TREE_OVERFLOW_P (*expr_p))
> Index: gcc/graphite-isl-ast-to-gimple.c
> ===================================================================
> --- gcc/graphite-isl-ast-to-gimple.c    2017-11-06 12:40:39.845713389 +0000
> +++ gcc/graphite-isl-ast-to-gimple.c    2017-11-06 12:40:40.284587650 +0000
> @@ -211,7 +211,8 @@ enum phi_node_kind
>      return TREE_CODE (op) == INTEGER_CST
>        || TREE_CODE (op) == REAL_CST
>        || TREE_CODE (op) == COMPLEX_CST
> -      || TREE_CODE (op) == VECTOR_CST;
> +      || TREE_CODE (op) == VECTOR_CST
> +      || TREE_CODE (op) == VEC_DUPLICATE_CST;
>    }
>
>  private:
> Index: gcc/graphite-scop-detection.c
> ===================================================================
> --- gcc/graphite-scop-detection.c       2017-11-06 12:40:39.845713389 +0000
> +++ gcc/graphite-scop-detection.c       2017-11-06 12:40:40.284587650 +0000
> @@ -1212,6 +1212,7 @@ scan_tree_for_params (sese_info_p s, tre
>      case REAL_CST:
>      case COMPLEX_CST:
>      case VECTOR_CST:
> +    case VEC_DUPLICATE_CST:
>        break;
>
>     default:
> Index: gcc/ipa-icf-gimple.c
> ===================================================================
> --- gcc/ipa-icf-gimple.c        2017-11-06 12:40:39.845713389 +0000
> +++ gcc/ipa-icf-gimple.c        2017-11-06 12:40:40.285580578 +0000
> @@ -333,6 +333,7 @@ func_checker::compare_cst_or_decl (tree
>      case INTEGER_CST:
>      case COMPLEX_CST:
>      case VECTOR_CST:
> +    case VEC_DUPLICATE_CST:
>      case STRING_CST:
>      case REAL_CST:
>        {
> @@ -528,6 +529,7 @@ func_checker::compare_operand (tree t1,
>      case INTEGER_CST:
>      case COMPLEX_CST:
>      case VECTOR_CST:
> +    case VEC_DUPLICATE_CST:
>      case STRING_CST:
>      case REAL_CST:
>      case FUNCTION_DECL:
> Index: gcc/ipa-icf.c
> ===================================================================
> --- gcc/ipa-icf.c       2017-11-06 12:40:39.845713389 +0000
> +++ gcc/ipa-icf.c       2017-11-06 12:40:40.285580578 +0000
> @@ -1479,6 +1479,7 @@ sem_item::add_expr (const_tree exp, inch
>      case STRING_CST:
>      case COMPLEX_CST:
>      case VECTOR_CST:
> +    case VEC_DUPLICATE_CST:
>        inchash::add_expr (exp, hstate);
>        break;
>      case CONSTRUCTOR:
> @@ -2036,6 +2037,9 @@ sem_variable::equals (tree t1, tree t2)
>
>         return 1;
>        }
> +    case VEC_DUPLICATE_CST:
> +      return sem_variable::equals (VEC_DUPLICATE_CST_ELT (t1),
> +                                  VEC_DUPLICATE_CST_ELT (t2));
>      case ARRAY_REF:
>      case ARRAY_RANGE_REF:
>        {
> Index: gcc/match.pd
> ===================================================================
> --- gcc/match.pd        2017-11-06 12:40:39.845713389 +0000
> +++ gcc/match.pd        2017-11-06 12:40:40.285580578 +0000
> @@ -958,6 +958,9 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>  (match negate_expr_p
>   VECTOR_CST
>   (if (FLOAT_TYPE_P (TREE_TYPE (type)) || TYPE_OVERFLOW_WRAPS (type))))
> +(match negate_expr_p
> + VEC_DUPLICATE_CST
> + (if (FLOAT_TYPE_P (TREE_TYPE (type)) || TYPE_OVERFLOW_WRAPS (type))))
>
>  /* (-A) * (-B) -> A * B  */
>  (simplify
> Index: gcc/print-tree.c
> ===================================================================
> --- gcc/print-tree.c    2017-11-06 12:40:39.845713389 +0000
> +++ gcc/print-tree.c    2017-11-06 12:40:40.287566435 +0000
> @@ -783,6 +783,10 @@ print_node (FILE *file, const char *pref
>           }
>           break;
>
> +       case VEC_DUPLICATE_CST:
> +         print_node (file, "elt", VEC_DUPLICATE_CST_ELT (node), indent + 4);
> +         break;
> +
>         case COMPLEX_CST:
>           print_node (file, "real", TREE_REALPART (node), indent + 4);
>           print_node (file, "imag", TREE_IMAGPART (node), indent + 4);
> Index: gcc/tree-chkp.c
> ===================================================================
> --- gcc/tree-chkp.c     2017-11-06 12:40:39.845713389 +0000
> +++ gcc/tree-chkp.c     2017-11-06 12:40:40.288559363 +0000
> @@ -3799,6 +3799,7 @@ chkp_find_bounds_1 (tree ptr, tree ptr_s
>      case INTEGER_CST:
>      case COMPLEX_CST:
>      case VECTOR_CST:
> +    case VEC_DUPLICATE_CST:
>        if (integer_zerop (ptr_src))
>         bounds = chkp_get_none_bounds ();
>        else
> Index: gcc/tree-loop-distribution.c
> ===================================================================
> --- gcc/tree-loop-distribution.c        2017-11-06 12:40:39.845713389 +0000
> +++ gcc/tree-loop-distribution.c        2017-11-06 12:40:40.289552291 +0000
> @@ -927,6 +927,9 @@ const_with_all_bytes_same (tree val)
>            && CONSTRUCTOR_NELTS (val) == 0))
>      return 0;
>
> +  if (TREE_CODE (val) == VEC_DUPLICATE_CST)
> +    return const_with_all_bytes_same (VEC_DUPLICATE_CST_ELT (val));
> +
>    if (real_zerop (val))
>      {
>        /* Only return 0 for +0.0, not for -0.0, which doesn't have
> Index: gcc/tree-ssa-loop.c
> ===================================================================
> --- gcc/tree-ssa-loop.c 2017-11-06 12:40:39.845713389 +0000
> +++ gcc/tree-ssa-loop.c 2017-11-06 12:40:40.290545219 +0000
> @@ -616,6 +616,7 @@ for_each_index (tree *addr_p, bool (*cbc
>         case STRING_CST:
>         case RESULT_DECL:
>         case VECTOR_CST:
> +       case VEC_DUPLICATE_CST:
>         case COMPLEX_CST:
>         case INTEGER_CST:
>         case REAL_CST:
> Index: gcc/tree-ssa-pre.c
> ===================================================================
> --- gcc/tree-ssa-pre.c  2017-11-06 12:40:39.845713389 +0000
> +++ gcc/tree-ssa-pre.c  2017-11-06 12:40:40.290545219 +0000
> @@ -2627,6 +2627,7 @@ create_component_ref_by_pieces_1 (basic_
>      case INTEGER_CST:
>      case COMPLEX_CST:
>      case VECTOR_CST:
> +    case VEC_DUPLICATE_CST:
>      case REAL_CST:
>      case CONSTRUCTOR:
>      case VAR_DECL:
> Index: gcc/tree-ssa-sccvn.c
> ===================================================================
> --- gcc/tree-ssa-sccvn.c        2017-11-06 12:40:39.845713389 +0000
> +++ gcc/tree-ssa-sccvn.c        2017-11-06 12:40:40.291538147 +0000
> @@ -866,6 +866,7 @@ copy_reference_ops_from_ref (tree ref, v
>         case INTEGER_CST:
>         case COMPLEX_CST:
>         case VECTOR_CST:
> +       case VEC_DUPLICATE_CST:
>         case REAL_CST:
>         case FIXED_CST:
>         case CONSTRUCTOR:
> @@ -1058,6 +1059,7 @@ ao_ref_init_from_vn_reference (ao_ref *r
>         case INTEGER_CST:
>         case COMPLEX_CST:
>         case VECTOR_CST:
> +       case VEC_DUPLICATE_CST:
>         case REAL_CST:
>         case CONSTRUCTOR:
>         case CONST_DECL:
> Index: gcc/varasm.c
> ===================================================================
> --- gcc/varasm.c        2017-11-06 12:40:39.845713389 +0000
> +++ gcc/varasm.c        2017-11-06 12:40:40.293524004 +0000
> @@ -3068,6 +3068,9 @@ const_hash_1 (const tree exp)
>      CASE_CONVERT:
>        return const_hash_1 (TREE_OPERAND (exp, 0)) * 7 + 2;
>
> +    case VEC_DUPLICATE_CST:
> +      return const_hash_1 (VEC_DUPLICATE_CST_ELT (exp)) * 7 + 3;
> +
>      default:
>        /* A language specific constant. Just hash the code.  */
>        return code;
> @@ -3158,6 +3161,10 @@ compare_constant (const tree t1, const t
>         return 1;
>        }
>
> +    case VEC_DUPLICATE_CST:
> +      return compare_constant (VEC_DUPLICATE_CST_ELT (t1),
> +                              VEC_DUPLICATE_CST_ELT (t2));
> +
>      case CONSTRUCTOR:
>        {
>         vec<constructor_elt, va_gc> *v1, *v2;
> Index: gcc/fold-const.c
> ===================================================================
> --- gcc/fold-const.c    2017-11-06 12:40:39.845713389 +0000
> +++ gcc/fold-const.c    2017-11-06 12:40:40.282601794 +0000
> @@ -418,6 +418,9 @@ negate_expr_p (tree t)
>         return true;
>        }
>
> +    case VEC_DUPLICATE_CST:
> +      return negate_expr_p (VEC_DUPLICATE_CST_ELT (t));
> +
>      case COMPLEX_EXPR:
>        return negate_expr_p (TREE_OPERAND (t, 0))
>              && negate_expr_p (TREE_OPERAND (t, 1));
> @@ -579,6 +582,14 @@ fold_negate_expr_1 (location_t loc, tree
>         return build_vector (type, elts);
>        }
>
> +    case VEC_DUPLICATE_CST:
> +      {
> +       tree sub = fold_negate_expr (loc, VEC_DUPLICATE_CST_ELT (t));
> +       if (!sub)
> +         return NULL_TREE;
> +       return build_vector_from_val (type, sub);
> +      }
> +
>      case COMPLEX_EXPR:
>        if (negate_expr_p (t))
>         return fold_build2_loc (loc, COMPLEX_EXPR, type,
> @@ -1436,6 +1447,16 @@ const_binop (enum tree_code code, tree a
>        return build_vector (type, elts);
>      }
>
> +  if (TREE_CODE (arg1) == VEC_DUPLICATE_CST
> +      && TREE_CODE (arg2) == VEC_DUPLICATE_CST)
> +    {
> +      tree sub = const_binop (code, VEC_DUPLICATE_CST_ELT (arg1),
> +                             VEC_DUPLICATE_CST_ELT (arg2));
> +      if (!sub)
> +       return NULL_TREE;
> +      return build_vector_from_val (TREE_TYPE (arg1), sub);
> +    }
> +
>    /* Shifts allow a scalar offset for a vector.  */
>    if (TREE_CODE (arg1) == VECTOR_CST
>        && TREE_CODE (arg2) == INTEGER_CST)
> @@ -1459,6 +1480,15 @@ const_binop (enum tree_code code, tree a
>
>        return build_vector (type, elts);
>      }
> +
> +  if (TREE_CODE (arg1) == VEC_DUPLICATE_CST
> +      && TREE_CODE (arg2) == INTEGER_CST)
> +    {
> +      tree sub = const_binop (code, VEC_DUPLICATE_CST_ELT (arg1), arg2);
> +      if (!sub)
> +       return NULL_TREE;
> +      return build_vector_from_val (TREE_TYPE (arg1), sub);
> +    }
>    return NULL_TREE;
>  }
>
> @@ -1652,6 +1682,13 @@ const_unop (enum tree_code code, tree ty
>           if (i == count)
>             return build_vector (type, elements);
>         }
> +      else if (TREE_CODE (arg0) == VEC_DUPLICATE_CST)
> +       {
> +         tree sub = const_unop (BIT_NOT_EXPR, TREE_TYPE (type),
> +                                VEC_DUPLICATE_CST_ELT (arg0));
> +         if (sub)
> +           return build_vector_from_val (type, sub);
> +       }
>        break;
>
>      case TRUTH_NOT_EXPR:
> @@ -1737,6 +1774,11 @@ const_unop (enum tree_code code, tree ty
>         return res;
>        }
>
> +    case VEC_DUPLICATE_EXPR:
> +      if (CONSTANT_CLASS_P (arg0))
> +       return build_vector_from_val (type, arg0);
> +      return NULL_TREE;
> +
>      default:
>        break;
>      }
> @@ -2167,6 +2209,15 @@ fold_convert_const (enum tree_code code,
>             }
>           return build_vector (type, v);
>         }
> +      if (TREE_CODE (arg1) == VEC_DUPLICATE_CST
> +         && (TYPE_VECTOR_SUBPARTS (type)
> +             == TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg1))))
> +       {
> +         tree sub = fold_convert_const (code, TREE_TYPE (type),
> +                                        VEC_DUPLICATE_CST_ELT (arg1));
> +         if (sub)
> +           return build_vector_from_val (type, sub);
> +       }
>      }
>    return NULL_TREE;
>  }
> @@ -2953,6 +3004,10 @@ operand_equal_p (const_tree arg0, const_
>           return 1;
>         }
>
> +      case VEC_DUPLICATE_CST:
> +       return operand_equal_p (VEC_DUPLICATE_CST_ELT (arg0),
> +                               VEC_DUPLICATE_CST_ELT (arg1), flags);
> +
>        case COMPLEX_CST:
>         return (operand_equal_p (TREE_REALPART (arg0), TREE_REALPART (arg1),
>                                  flags)
> @@ -7475,6 +7530,20 @@ can_native_interpret_type_p (tree type)
>  static tree
>  fold_view_convert_expr (tree type, tree expr)
>  {
> +  /* Recurse on duplicated vectors if the target type is also a vector
> +     and if the elements line up.  */
> +  tree expr_type = TREE_TYPE (expr);
> +  if (TREE_CODE (expr) == VEC_DUPLICATE_CST
> +      && VECTOR_TYPE_P (type)
> +      && TYPE_VECTOR_SUBPARTS (type) == TYPE_VECTOR_SUBPARTS (expr_type)
> +      && TYPE_SIZE (TREE_TYPE (type)) == TYPE_SIZE (TREE_TYPE (expr_type)))
> +    {
> +      tree sub = fold_view_convert_expr (TREE_TYPE (type),
> +                                        VEC_DUPLICATE_CST_ELT (expr));
> +      if (sub)
> +       return build_vector_from_val (type, sub);
> +    }
> +
>    /* We support up to 512-bit values (for V8DFmode).  */
>    unsigned char buffer[64];
>    int len;
> @@ -8874,6 +8943,15 @@ exact_inverse (tree type, tree cst)
>         return build_vector (type, elts);
>        }
>
> +    case VEC_DUPLICATE_CST:
> +      {
> +       tree sub = exact_inverse (TREE_TYPE (type),
> +                                 VEC_DUPLICATE_CST_ELT (cst));
> +       if (!sub)
> +         return NULL_TREE;
> +       return build_vector_from_val (type, sub);
> +      }
> +
>      default:
>        return NULL_TREE;
>      }
> @@ -11939,6 +12017,9 @@ fold_checksum_tree (const_tree expr, str
>           for (i = 0; i < (int) VECTOR_CST_NELTS (expr); ++i)
>             fold_checksum_tree (VECTOR_CST_ELT (expr, i), ctx, ht);
>           break;
> +       case VEC_DUPLICATE_CST:
> +         fold_checksum_tree (VEC_DUPLICATE_CST_ELT (expr), ctx, ht);
> +         break;
>         default:
>           break;
>         }
> @@ -14412,6 +14493,41 @@ test_vector_folding ()
>    ASSERT_FALSE (integer_nonzerop (fold_build2 (NE_EXPR, res_type, one, one)));
>  }
>
> +/* Verify folding of VEC_DUPLICATE_CSTs and VEC_DUPLICATE_EXPRs.  */
> +
> +static void
> +test_vec_duplicate_folding ()
> +{
> +  scalar_int_mode int_mode = SCALAR_INT_TYPE_MODE (ssizetype);
> +  machine_mode vec_mode = targetm.vectorize.preferred_simd_mode (int_mode);
> +  /* This will be 1 if VEC_MODE isn't a vector mode.  */
> +  unsigned int nunits = GET_MODE_NUNITS (vec_mode);
> +
> +  tree type = build_vector_type (ssizetype, nunits);
> +  tree dup5 = build_vector_from_val (type, ssize_int (5));
> +  tree dup3 = build_vector_from_val (type, ssize_int (3));
> +
> +  tree neg_dup5 = fold_unary (NEGATE_EXPR, type, dup5);
> +  ASSERT_EQ (uniform_vector_p (neg_dup5), ssize_int (-5));
> +
> +  tree not_dup5 = fold_unary (BIT_NOT_EXPR, type, dup5);
> +  ASSERT_EQ (uniform_vector_p (not_dup5), ssize_int (-6));
> +
> +  tree dup5_plus_dup3 = fold_binary (PLUS_EXPR, type, dup5, dup3);
> +  ASSERT_EQ (uniform_vector_p (dup5_plus_dup3), ssize_int (8));
> +
> +  tree dup5_lsl_2 = fold_binary (LSHIFT_EXPR, type, dup5, ssize_int (2));
> +  ASSERT_EQ (uniform_vector_p (dup5_lsl_2), ssize_int (20));
> +
> +  tree size_vector = build_vector_type (sizetype, nunits);
> +  tree size_dup5 = fold_convert (size_vector, dup5);
> +  ASSERT_EQ (uniform_vector_p (size_dup5), size_int (5));
> +
> +  tree dup5_expr = fold_unary (VEC_DUPLICATE_EXPR, type, ssize_int (5));
> +  tree dup5_cst = build_vector_from_val (type, ssize_int (5));
> +  ASSERT_TRUE (operand_equal_p (dup5_expr, dup5_cst, 0));
> +}
> +
>  /* Run all of the selftests within this file.  */
>
>  void
> @@ -14419,6 +14535,7 @@ fold_const_c_tests ()
>  {
>    test_arithmetic_folding ();
>    test_vector_folding ();
> +  test_vec_duplicate_folding ();
>  }
>
>  } // namespace selftest
> Index: gcc/optabs.def
> ===================================================================
> --- gcc/optabs.def      2017-11-06 12:40:39.845713389 +0000
> +++ gcc/optabs.def      2017-11-06 12:40:40.286573506 +0000
> @@ -364,3 +364,5 @@ OPTAB_D (atomic_xor_optab, "atomic_xor$I
>
>  OPTAB_D (get_thread_pointer_optab, "get_thread_pointer$I$a")
>  OPTAB_D (set_thread_pointer_optab, "set_thread_pointer$I$a")
> +
> +OPTAB_DC (vec_duplicate_optab, "vec_duplicate$a", VEC_DUPLICATE)
> Index: gcc/optabs-tree.c
> ===================================================================
> --- gcc/optabs-tree.c   2017-11-06 12:40:39.845713389 +0000
> +++ gcc/optabs-tree.c   2017-11-06 12:40:40.286573506 +0000
> @@ -210,6 +210,9 @@ optab_for_tree_code (enum tree_code code
>        return TYPE_UNSIGNED (type) ?
>         vec_pack_ufix_trunc_optab : vec_pack_sfix_trunc_optab;
>
> +    case VEC_DUPLICATE_EXPR:
> +      return vec_duplicate_optab;
> +
>      default:
>        break;
>      }
> Index: gcc/optabs.h
> ===================================================================
> --- gcc/optabs.h        2017-11-06 12:40:39.845713389 +0000
> +++ gcc/optabs.h        2017-11-06 12:40:40.287566435 +0000
> @@ -181,6 +181,7 @@ extern rtx simplify_expand_binop (machin
>                                   enum optab_methods methods);
>  extern bool force_expand_binop (machine_mode, optab, rtx, rtx, rtx, int,
>                                 enum optab_methods);
> +extern rtx expand_vector_broadcast (machine_mode, rtx);
>
>  /* Generate code for a simple binary or unary operation.  "Simple" in
>     this case means "can be unambiguously described by a (mode, code)
> Index: gcc/optabs.c
> ===================================================================
> --- gcc/optabs.c        2017-11-06 12:40:39.845713389 +0000
> +++ gcc/optabs.c        2017-11-06 12:40:40.286573506 +0000
> @@ -367,7 +367,7 @@ force_expand_binop (machine_mode mode, o
>     mode of OP must be the element mode of VMODE.  If OP is a constant,
>     then the return value will be a constant.  */
>
> -static rtx
> +rtx
>  expand_vector_broadcast (machine_mode vmode, rtx op)
>  {
>    enum insn_code icode;
> @@ -380,6 +380,16 @@ expand_vector_broadcast (machine_mode vm
>    if (valid_for_const_vec_duplicate_p (vmode, op))
>      return gen_const_vec_duplicate (vmode, op);
>
> +  icode = optab_handler (vec_duplicate_optab, vmode);
> +  if (icode != CODE_FOR_nothing)
> +    {
> +      struct expand_operand ops[2];
> +      create_output_operand (&ops[0], NULL_RTX, vmode);
> +      create_input_operand (&ops[1], op, GET_MODE (op));
> +      expand_insn (icode, 2, ops);
> +      return ops[0].value;
> +    }
> +
>    /* ??? If the target doesn't have a vec_init, then we have no easy way
>       of performing this operation.  Most of this sort of generic support
>       is hidden away in the vector lowering support in gimple.  */
> Index: gcc/expr.c
> ===================================================================
> --- gcc/expr.c  2017-11-06 12:40:39.845713389 +0000
> +++ gcc/expr.c  2017-11-06 12:40:40.281608865 +0000
> @@ -6576,7 +6576,8 @@ store_constructor (tree exp, rtx target,
>         constructor_elt *ce;
>         int i;
>         int need_to_clear;
> -       int icode = CODE_FOR_nothing;
> +       insn_code icode = CODE_FOR_nothing;
> +       tree elt;
>         tree elttype = TREE_TYPE (type);
>         int elt_size = tree_to_uhwi (TYPE_SIZE (elttype));
>         machine_mode eltmode = TYPE_MODE (elttype);
> @@ -6586,13 +6587,30 @@ store_constructor (tree exp, rtx target,
>         unsigned n_elts;
>         alias_set_type alias;
>         bool vec_vec_init_p = false;
> +       machine_mode mode = GET_MODE (target);
>
>         gcc_assert (eltmode != BLKmode);
>
> +       /* Try using vec_duplicate_optab for uniform vectors.  */
> +       if (!TREE_SIDE_EFFECTS (exp)
> +           && VECTOR_MODE_P (mode)
> +           && eltmode == GET_MODE_INNER (mode)
> +           && ((icode = optab_handler (vec_duplicate_optab, mode))
> +               != CODE_FOR_nothing)
> +           && (elt = uniform_vector_p (exp)))
> +         {
> +           struct expand_operand ops[2];
> +           create_output_operand (&ops[0], target, mode);
> +           create_input_operand (&ops[1], expand_normal (elt), eltmode);
> +           expand_insn (icode, 2, ops);
> +           if (!rtx_equal_p (target, ops[0].value))
> +             emit_move_insn (target, ops[0].value);
> +           break;
> +         }
> +
>         n_elts = TYPE_VECTOR_SUBPARTS (type);
> -       if (REG_P (target) && VECTOR_MODE_P (GET_MODE (target)))
> +       if (REG_P (target) && VECTOR_MODE_P (mode))
>           {
> -           machine_mode mode = GET_MODE (target);
>             machine_mode emode = eltmode;
>
>             if (CONSTRUCTOR_NELTS (exp)
> @@ -6604,7 +6622,7 @@ store_constructor (tree exp, rtx target,
>                             == n_elts);
>                 emode = TYPE_MODE (etype);
>               }
> -           icode = (int) convert_optab_handler (vec_init_optab, mode, emode);
> +           icode = convert_optab_handler (vec_init_optab, mode, emode);
>             if (icode != CODE_FOR_nothing)
>               {
>                 unsigned int i, n = n_elts;
> @@ -6652,7 +6670,7 @@ store_constructor (tree exp, rtx target,
>         if (need_to_clear && size > 0 && !vector)
>           {
>             if (REG_P (target))
> -             emit_move_insn (target, CONST0_RTX (GET_MODE (target)));
> +             emit_move_insn (target, CONST0_RTX (mode));
>             else
>               clear_storage (target, GEN_INT (size), BLOCK_OP_NORMAL);
>             cleared = 1;
> @@ -6660,7 +6678,7 @@ store_constructor (tree exp, rtx target,
>
>         /* Inform later passes that the old value is dead.  */
>         if (!cleared && !vector && REG_P (target))
> -         emit_move_insn (target, CONST0_RTX (GET_MODE (target)));
> +         emit_move_insn (target, CONST0_RTX (mode));
>
>          if (MEM_P (target))
>           alias = MEM_ALIAS_SET (target);
> @@ -6711,8 +6729,7 @@ store_constructor (tree exp, rtx target,
>
>         if (vector)
>           emit_insn (GEN_FCN (icode) (target,
> -                                     gen_rtx_PARALLEL (GET_MODE (target),
> -                                                       vector)));
> +                                     gen_rtx_PARALLEL (mode, vector)));
>         break;
>        }
>
> @@ -7690,6 +7707,19 @@ expand_operands (tree exp0, tree exp1, r
>  }
>
>
> +/* Expand constant vector element ELT, which has mode MODE.  This is used
> +   for members of VECTOR_CST and VEC_DUPLICATE_CST.  */
> +
> +static rtx
> +const_vector_element (scalar_mode mode, const_tree elt)
> +{
> +  if (TREE_CODE (elt) == REAL_CST)
> +    return const_double_from_real_value (TREE_REAL_CST (elt), mode);
> +  if (TREE_CODE (elt) == FIXED_CST)
> +    return CONST_FIXED_FROM_FIXED_VALUE (TREE_FIXED_CST (elt), mode);
> +  return immed_wide_int_const (wi::to_wide (elt), mode);
> +}
> +
>  /* Return a MEM that contains constant EXP.  DEFER is as for
>     output_constant_def and MODIFIER is as for expand_expr.  */
>
> @@ -9555,6 +9585,12 @@ #define REDUCE_BIT_FIELD(expr)   (reduce_b
>        target = expand_vec_cond_expr (type, treeop0, treeop1, treeop2, target);
>        return target;
>
> +    case VEC_DUPLICATE_EXPR:
> +      op0 = expand_expr (treeop0, NULL_RTX, VOIDmode, modifier);
> +      target = expand_vector_broadcast (mode, op0);
> +      gcc_assert (target);
> +      return target;
> +
>      case BIT_INSERT_EXPR:
>        {
>         unsigned bitpos = tree_to_uhwi (treeop2);
> @@ -9988,6 +10024,11 @@ expand_expr_real_1 (tree exp, rtx target
>                             tmode, modifier);
>        }
>
> +    case VEC_DUPLICATE_CST:
> +      op0 = const_vector_element (GET_MODE_INNER (mode),
> +                                 VEC_DUPLICATE_CST_ELT (exp));
> +      return gen_const_vec_duplicate (mode, op0);
> +
>      case CONST_DECL:
>        if (modifier == EXPAND_WRITE)
>         {
> @@ -11749,8 +11790,7 @@ const_vector_from_tree (tree exp)
>  {
>    rtvec v;
>    unsigned i, units;
> -  tree elt;
> -  machine_mode inner, mode;
> +  machine_mode mode;
>
>    mode = TYPE_MODE (TREE_TYPE (exp));
>
> @@ -11761,23 +11801,12 @@ const_vector_from_tree (tree exp)
>      return const_vector_mask_from_tree (exp);
>
>    units = VECTOR_CST_NELTS (exp);
> -  inner = GET_MODE_INNER (mode);
>
>    v = rtvec_alloc (units);
>
>    for (i = 0; i < units; ++i)
> -    {
> -      elt = VECTOR_CST_ELT (exp, i);
> -
> -      if (TREE_CODE (elt) == REAL_CST)
> -       RTVEC_ELT (v, i) = const_double_from_real_value (TREE_REAL_CST (elt),
> -                                                        inner);
> -      else if (TREE_CODE (elt) == FIXED_CST)
> -       RTVEC_ELT (v, i) = CONST_FIXED_FROM_FIXED_VALUE (TREE_FIXED_CST (elt),
> -                                                        inner);
> -      else
> -       RTVEC_ELT (v, i) = immed_wide_int_const (wi::to_wide (elt), inner);
> -    }
> +    RTVEC_ELT (v, i) = const_vector_element (GET_MODE_INNER (mode),
> +                                            VECTOR_CST_ELT (exp, i));
>
>    return gen_rtx_CONST_VECTOR (mode, v);
>  }
> Index: gcc/internal-fn.c
> ===================================================================
> --- gcc/internal-fn.c   2017-11-06 12:40:39.845713389 +0000
> +++ gcc/internal-fn.c   2017-11-06 12:40:40.284587650 +0000
> @@ -1911,12 +1911,12 @@ expand_vector_ubsan_overflow (location_t
>        emit_move_insn (cntvar, const0_rtx);
>        emit_label (loop_lab);
>      }
> -  if (TREE_CODE (arg0) != VECTOR_CST)
> +  if (!CONSTANT_CLASS_P (arg0))
>      {
>        rtx arg0r = expand_normal (arg0);
>        arg0 = make_tree (TREE_TYPE (arg0), arg0r);
>      }
> -  if (TREE_CODE (arg1) != VECTOR_CST)
> +  if (!CONSTANT_CLASS_P (arg1))
>      {
>        rtx arg1r = expand_normal (arg1);
>        arg1 = make_tree (TREE_TYPE (arg1), arg1r);
> Index: gcc/tree-cfg.c
> ===================================================================
> --- gcc/tree-cfg.c      2017-11-06 12:40:39.845713389 +0000
> +++ gcc/tree-cfg.c      2017-11-06 12:40:40.287566435 +0000
> @@ -3798,6 +3798,17 @@ verify_gimple_assign_unary (gassign *stm
>      case CONJ_EXPR:
>        break;
>
> +    case VEC_DUPLICATE_EXPR:
> +      if (TREE_CODE (lhs_type) != VECTOR_TYPE
> +         || !useless_type_conversion_p (TREE_TYPE (lhs_type), rhs1_type))
> +       {
> +         error ("vec_duplicate should be from a scalar to a like vector");
> +         debug_generic_expr (lhs_type);
> +         debug_generic_expr (rhs1_type);
> +         return true;
> +       }
> +      return false;
> +
>      default:
>        gcc_unreachable ();
>      }
> @@ -4468,6 +4479,7 @@ verify_gimple_assign_single (gassign *st
>      case FIXED_CST:
>      case COMPLEX_CST:
>      case VECTOR_CST:
> +    case VEC_DUPLICATE_CST:
>      case STRING_CST:
>        return res;
>
> Index: gcc/tree-inline.c
> ===================================================================
> --- gcc/tree-inline.c   2017-11-06 12:40:39.845713389 +0000
> +++ gcc/tree-inline.c   2017-11-06 12:40:40.289552291 +0000
> @@ -3930,6 +3930,7 @@ estimate_operator_cost (enum tree_code c
>      case VEC_PACK_FIX_TRUNC_EXPR:
>      case VEC_WIDEN_LSHIFT_HI_EXPR:
>      case VEC_WIDEN_LSHIFT_LO_EXPR:
> +    case VEC_DUPLICATE_EXPR:
>
>        return 1;
>
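
The intended element-wise effect of the vec_duplicate optab and
VEC_DUPLICATE_EXPR is easy to state: broadcast one scalar into every element
of the vector.  A minimal plain-C sketch of that semantics (illustrative
names only, not GCC internals):

    #include <stdio.h>

    /* Illustrative only: VEC_DUPLICATE_EXPR <X> sets every element to X.  */
    static void
    vec_duplicate (long *out, long x, unsigned n)
    {
      for (unsigned i = 0; i < n; ++i)
        out[i] = x;
    }

    int
    main (void)
    {
      long v[4];
      vec_duplicate (v, 7, 4);
      for (unsigned i = 0; i < 4; ++i)
        printf ("%ld ", v[i]);   /* prints: 7 7 7 7 */
      printf ("\n");
      return 0;
    }
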

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [06/nn] Add VEC_SERIES_{CST,EXPR} and associated optab
  2017-11-06 15:21       ` Richard Sandiford
@ 2017-11-07 10:38         ` Richard Biener
  0 siblings, 0 replies; 90+ messages in thread
From: Richard Biener @ 2017-11-07 10:38 UTC (permalink / raw)
  To: Richard Biener, GCC Patches, Richard Sandiford

On Mon, Nov 6, 2017 at 4:21 PM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> Richard Biener <richard.guenther@gmail.com> writes:
>> On Thu, Oct 26, 2017 at 2:23 PM, Richard Biener
>> <richard.guenther@gmail.com> wrote:
>>> On Mon, Oct 23, 2017 at 1:20 PM, Richard Sandiford
>>> <richard.sandiford@linaro.org> wrote:
>>>> Similarly to VEC_DUPLICATE_{CST,EXPR}, this patch adds two
>>>> tree code equivalents of the VEC_SERIES rtx code.  VEC_SERIES_EXPR
>>>> is for non-constant inputs and is a normal tcc_binary.  VEC_SERIES_CST
>>>> is a tcc_constant.
>>>>
>>>> Like VEC_DUPLICATE_CST, VEC_SERIES_CST is only used for variable-length
>>>> vectors.  This avoids the need to handle combinations of VECTOR_CST
>>>> and VEC_SERIES_CST.
>>>
>>> Similar to the other patch, can you document and verify that VEC_SERIES_CST
>>> is only used on variable length vectors?
>
> OK, done with the below, which also makes build_vec_series create
> a VECTOR_CST for fixed-length vectors.  I also added some selftests.
>
>>> Ok with that change.
>>
>> Btw, did you think of merging VEC_DUPLICATE_CST with VEC_SERIES_CST
>> via setting step == 0?  I think you can do {1, 1, 1, 1, ...} + {1, 2, 3, 4, 5}
>> constant folding but you don't implement that.
>
> That was done via vec_series_equivalent_p.

Constant folding of VEC_DUPLICATE_CST + VEC_SERIES_CST?  Didn't see that.

> The problem with using VEC_SERIES with a step of zero is that we'd
> then have to define VEC_SERIES for floats too (even in strict math
> modes), but probably only for the special case of a zero step.
> I think that'd end up being more complicated overall.
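
(Right - to spell out the rounding problem with a concrete case:

     element 2 of {b, +, s} is either b + 2*s, with a single rounding,
     or (b + s) + s, with a rounding after each addition,

and for float b and s the two can differ.  So the semantics would need
defining even though a zero step never exercises them.)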
>
>> Propagation can also turn
>> VEC_SERIES_EXPR into VEC_SERIES_CST and VEC_DUPLICATE_EXPR
>> into VEC_DUPLICATE_CST (didn't see the former, don't remember the latter).
>
> VEC_SERIES_EXPR -> VEC_SERIES_CST/VECTOR_CST was done by const_binop.

Ok, must have missed that.  Would be nice to add comments before the
"transform".

> And yeah, VEC_DUPLICATE_EXPR -> VEC_DUPLICATE_CST/VECTOR_CST was done
> by const_unop in the VEC_DUPLICATE patch.
>
> Tested as before.  OK to install?

Ok.

Thanks,
Richard.

> Thanks,
> Richard
>
>
> 2017-11-06  Richard Sandiford  <richard.sandiford@linaro.org>
>             Alan Hayward  <alan.hayward@arm.com>
>             David Sherwood  <david.sherwood@arm.com>
>
> gcc/
>         * doc/generic.texi (VEC_SERIES_CST, VEC_SERIES_EXPR): Document.
>         * doc/md.texi (vec_series@var{m}): Document.
>         * tree.def (VEC_SERIES_CST, VEC_SERIES_EXPR): New tree codes.
>         * tree.h (TREE_OVERFLOW): Add VEC_SERIES_CST to the list of valid
>         codes.
>         (VEC_SERIES_CST_BASE, VEC_SERIES_CST_STEP): New macros.
>         (build_vec_series): Declare.
>         * tree.c (tree_node_structure_for_code, tree_code_size, tree_size)
>         (add_expr, walk_tree_1, drop_tree_overflow): Handle VEC_SERIES_CST.
>         (build_vec_series_cst, build_vec_series): New functions.
>         * cfgexpand.c (expand_debug_expr): Handle the new codes.
>         * tree-pretty-print.c (dump_generic_node): Likewise.
>         * dwarf2out.c (rtl_for_decl_init): Handle VEC_SERIES_CST.
>         * gimple-expr.h (is_gimple_constant): Likewise.
>         * gimplify.c (gimplify_expr): Likewise.
>         * graphite-scop-detection.c (scan_tree_for_params): Likewise.
>         * ipa-icf-gimple.c (func_checker::compare_cst_or_decl): Likewise.
>         (func_checker::compare_operand): Likewise.
>         * ipa-icf.c (sem_item::add_expr, sem_variable::equals): Likewise.
>         * print-tree.c (print_node): Likewise.
>         * tree-ssa-loop.c (for_each_index): Likewise.
>         * tree-ssa-pre.c (create_component_ref_by_pieces_1): Likewise.
>         * tree-ssa-sccvn.c (copy_reference_ops_from_ref): Likewise.
>         (ao_ref_init_from_vn_reference): Likewise.
>         * varasm.c (const_hash_1, compare_constant): Likewise.
>         * fold-const.c (negate_expr_p, fold_negate_expr_1, operand_equal_p)
>         (fold_checksum_tree): Likewise.
>         (vec_series_equivalent_p): New function.
>         (const_binop): Use it.  Fold VEC_SERIES_EXPRs of constants.
>         (test_vec_series_folding): New function.
>         (fold_const_c_tests): Call it.
>         * expmed.c (make_tree): Handle VEC_SERIES.
>         * gimple-pretty-print.c (dump_binary_rhs): Likewise.
>         * tree-inline.c (estimate_operator_cost): Likewise.
>         * expr.c (const_vector_element): Include VEC_SERIES_CST in comment.
>         (expand_expr_real_2): Handle VEC_SERIES_EXPR.
>         (expand_expr_real_1): Handle VEC_SERIES_CST.
>         * optabs.def (vec_series_optab): New optab.
>         * optabs.h (expand_vec_series_expr): Declare.
>         * optabs.c (expand_vec_series_expr): New function.
>         * optabs-tree.c (optab_for_tree_code): Handle VEC_SERIES_EXPR.
>         * tree-cfg.c (verify_gimple_assign_binary): Handle VEC_SERIES_EXPR.
>         (verify_gimple_assign_single): Handle VEC_SERIES_CST.
>         * tree-vect-generic.c (expand_vector_operations_1): Check that
>         the operands also have vector type.
>
> Index: gcc/doc/generic.texi
> ===================================================================
> --- gcc/doc/generic.texi        2017-11-06 12:20:31.075167123 +0000
> +++ gcc/doc/generic.texi        2017-11-06 12:21:29.321209826 +0000
> @@ -1037,6 +1037,7 @@ As this example indicates, the operands
>  @tindex COMPLEX_CST
>  @tindex VECTOR_CST
>  @tindex VEC_DUPLICATE_CST
> +@tindex VEC_SERIES_CST
>  @tindex STRING_CST
>  @findex TREE_STRING_LENGTH
>  @findex TREE_STRING_POINTER
> @@ -1098,6 +1099,18 @@ instead.  The scalar element value is gi
>  @code{VEC_DUPLICATE_CST_ELT} and has the same restrictions as the
>  element of a @code{VECTOR_CST}.
>
> +@item VEC_SERIES_CST
> +These nodes represent a vector constant in which element @var{i}
> +has the value @samp{@var{base} + @var{i} * @var{step}}, for some
> +constant @var{base} and @var{step}.  The value of @var{base} is
> +given by @code{VEC_SERIES_CST_BASE} and the value of @var{step} is
> +given by @code{VEC_SERIES_CST_STEP}.
> +
> +At present only variable-length vectors use @code{VEC_SERIES_CST};
> +constant-length vectors use @code{VECTOR_CST} instead.  The nodes
> +are also restricted to integral types, in order to avoid specifying
> +the rounding behavior for floating-point types.
> +
>  @item STRING_CST
>  These nodes represent string-constants.  The @code{TREE_STRING_LENGTH}
>  returns the length of the string, as an @code{int}.  The
> @@ -1702,6 +1715,7 @@ a value from @code{enum annot_expr_kind}
>  @node Vectors
>  @subsection Vectors
>  @tindex VEC_DUPLICATE_EXPR
> +@tindex VEC_SERIES_EXPR
>  @tindex VEC_LSHIFT_EXPR
>  @tindex VEC_RSHIFT_EXPR
>  @tindex VEC_WIDEN_MULT_HI_EXPR
> @@ -1721,6 +1735,14 @@ a value from @code{enum annot_expr_kind}
>  This node has a single operand and represents a vector in which every
>  element is equal to that operand.
>
> +@item VEC_SERIES_EXPR
> +This node represents a vector formed from a scalar base and step,
> +given as the first and second operands respectively.  Element @var{i}
> +of the result is equal to @samp{@var{base} + @var{i}*@var{step}}.
> +
> +This node is restricted to integral types, in order to avoid
> +specifying the rounding behavior for floating-point types.
> +
>  @item VEC_LSHIFT_EXPR
>  @itemx VEC_RSHIFT_EXPR
>  These nodes represent whole vector left and right shifts, respectively.
> Index: gcc/doc/md.texi
> ===================================================================
> --- gcc/doc/md.texi     2017-11-06 12:20:31.076995065 +0000
> +++ gcc/doc/md.texi     2017-11-06 12:21:29.322209826 +0000
> @@ -4899,6 +4899,19 @@ vectors go through the @code{mov@var{m}}
>
>  This pattern is not allowed to @code{FAIL}.
>
> +@cindex @code{vec_series@var{m}} instruction pattern
> +@item @samp{vec_series@var{m}}
> +Initialize vector output operand 0 so that element @var{i} is equal to
> +operand 1 plus @var{i} times operand 2.  In other words, create a linear
> +series whose base value is operand 1 and whose step is operand 2.
> +
> +The vector output has mode @var{m} and the scalar inputs have the mode
> +appropriate for one element of @var{m}.  This pattern is not used for
> +floating-point vectors, in order to avoid having to specify the
> +rounding behavior for @var{i} > 1.
> +
> +This pattern is not allowed to @code{FAIL}.
> +
>  @cindex @code{vec_cmp@var{m}@var{n}} instruction pattern
>  @item @samp{vec_cmp@var{m}@var{n}}
>  Output a vector comparison.  Operand 0 of mode @var{n} is the destination for
> Index: gcc/tree.def
> ===================================================================
> --- gcc/tree.def        2017-11-06 12:20:31.098930366 +0000
> +++ gcc/tree.def        2017-11-06 12:21:29.335209826 +0000
> @@ -309,6 +309,12 @@ DEFTREECODE (VECTOR_CST, "vector_cst", t
>     vectors; fixed-length vectors must use VECTOR_CST instead.  */
>  DEFTREECODE (VEC_DUPLICATE_CST, "vec_duplicate_cst", tcc_constant, 0)
>
> +/* Represents a vector constant in which element i is equal to
> +   VEC_SERIES_CST_BASE + i * VEC_SERIES_CST_STEP.  This is only ever
> +   used for variable-length vectors; fixed-length vectors must use
> +   VECTOR_CST instead.  */
> +DEFTREECODE (VEC_SERIES_CST, "vec_series_cst", tcc_constant, 0)
> +
>  /* Contents are TREE_STRING_LENGTH and the actual contents of the string.  */
>  DEFTREECODE (STRING_CST, "string_cst", tcc_constant, 0)
>
> @@ -542,6 +548,16 @@ DEFTREECODE (COND_EXPR, "cond_expr", tcc
>  /* Represents a vector in which every element is equal to operand 0.  */
>  DEFTREECODE (VEC_DUPLICATE_EXPR, "vec_duplicate_expr", tcc_unary, 1)
>
> +/* Vector series created from a start (base) value and a step.
> +
> +   A = VEC_SERIES_EXPR (B, C)
> +
> +   means
> +
> +   for (i = 0; i < N; i++)
> +     A[i] = B + C * i;  */
> +DEFTREECODE (VEC_SERIES_EXPR, "vec_series_expr", tcc_binary, 2)
> +
>  /* Vector conditional expression. It is like COND_EXPR, but with
>     vector operands.
>
> Index: gcc/tree.h
> ===================================================================
> --- gcc/tree.h  2017-11-06 12:20:31.099844337 +0000
> +++ gcc/tree.h  2017-11-06 12:21:29.336209826 +0000
> @@ -709,8 +709,8 @@ #define TREE_SYMBOL_REFERENCED(NODE) \
>  #define TYPE_REF_CAN_ALIAS_ALL(NODE) \
>    (PTR_OR_REF_CHECK (NODE)->base.static_flag)
>
> -/* In an INTEGER_CST, REAL_CST, COMPLEX_CST, VECTOR_CST or VEC_DUPLICATE_CST,
> -   this means there was an overflow in folding.  */
> +/* In an INTEGER_CST, REAL_CST, COMPLEX_CST, VECTOR_CST, VEC_DUPLICATE_CST
> +   or VEC_SERIES_CST, this means there was an overflow in folding.  */
>
>  #define TREE_OVERFLOW(NODE) (CST_CHECK (NODE)->base.public_flag)
>
> @@ -1013,6 +1013,12 @@ #define VECTOR_CST_ELT(NODE,IDX) (VECTOR
>  #define VEC_DUPLICATE_CST_ELT(NODE) \
>    (VEC_DUPLICATE_CST_CHECK (NODE)->vector.elts[0])
>
> +/* In a VEC_SERIES_CST node.  */
> +#define VEC_SERIES_CST_BASE(NODE) \
> +  (VEC_SERIES_CST_CHECK (NODE)->vector.elts[0])
> +#define VEC_SERIES_CST_STEP(NODE) \
> +  (VEC_SERIES_CST_CHECK (NODE)->vector.elts[1])
> +
>  /* Define fields and accessors for some special-purpose tree nodes.  */
>
>  #define IDENTIFIER_LENGTH(NODE) \
> @@ -4017,6 +4023,7 @@ extern tree make_vector (unsigned CXX_ME
>  extern tree build_vector (tree, vec<tree> CXX_MEM_STAT_INFO);
>  extern tree build_vector_from_ctor (tree, vec<constructor_elt, va_gc> *);
>  extern tree build_vector_from_val (tree, tree);
> +extern tree build_vec_series (tree, tree, tree);
>  extern void recompute_constructor_flags (tree);
>  extern void verify_constructor_flags (tree);
>  extern tree build_constructor (tree, vec<constructor_elt, va_gc> *);
> Index: gcc/tree.c
> ===================================================================
> --- gcc/tree.c  2017-11-06 12:20:31.098930366 +0000
> +++ gcc/tree.c  2017-11-06 12:21:29.335209826 +0000
> @@ -465,6 +465,7 @@ tree_node_structure_for_code (enum tree_
>      case COMPLEX_CST:          return TS_COMPLEX;
>      case VECTOR_CST:           return TS_VECTOR;
>      case VEC_DUPLICATE_CST:    return TS_VECTOR;
> +    case VEC_SERIES_CST:       return TS_VECTOR;
>      case STRING_CST:           return TS_STRING;
>        /* tcc_exceptional cases.  */
>      case ERROR_MARK:           return TS_COMMON;
> @@ -831,6 +832,7 @@ tree_code_size (enum tree_code code)
>         case COMPLEX_CST:       return sizeof (tree_complex);
>         case VECTOR_CST:        return sizeof (tree_vector);
>         case VEC_DUPLICATE_CST: return sizeof (tree_vector);
> +       case VEC_SERIES_CST:    return sizeof (tree_vector) + sizeof (tree);
>         case STRING_CST:        gcc_unreachable ();
>         default:
>           gcc_checking_assert (code >= NUM_TREE_CODES);
> @@ -895,6 +897,9 @@ tree_size (const_tree node)
>      case VEC_DUPLICATE_CST:
>        return sizeof (struct tree_vector);
>
> +    case VEC_SERIES_CST:
> +      return sizeof (struct tree_vector) + sizeof (tree);
> +
>      case STRING_CST:
>        return TREE_STRING_LENGTH (node) + offsetof (struct tree_string, str) + 1;
>
> @@ -1730,6 +1735,34 @@ build_vec_duplicate_cst (tree type, tree
>    return t;
>  }
>
> +/* Build a new VEC_SERIES_CST with type TYPE, base BASE and step STEP.
> +
> +   Note that this function is only suitable for callers that specifically
> +   need a VEC_SERIES_CST node.  Use build_vec_series to build a general
> +   series vector from a general base and step.  */
> +
> +static tree
> +build_vec_series_cst (tree type, tree base, tree step MEM_STAT_DECL)
> +{
> +  /* Shouldn't be used until we have variable-length vectors.  */
> +  gcc_unreachable ();
> +
> +  int length = sizeof (struct tree_vector) + sizeof (tree);
> +
> +  record_node_allocation_statistics (VEC_SERIES_CST, length);
> +
> +  tree t = ggc_alloc_cleared_tree_node_stat (length PASS_MEM_STAT);
> +
> +  TREE_SET_CODE (t, VEC_SERIES_CST);
> +  TREE_TYPE (t) = type;
> +  t->base.u.nelts = 2;
> +  VEC_SERIES_CST_BASE (t) = base;
> +  VEC_SERIES_CST_STEP (t) = step;
> +  TREE_CONSTANT (t) = 1;
> +
> +  return t;
> +}
> +
>  /* Build a newly constructed VECTOR_CST node of length LEN.  */
>
>  tree
> @@ -1847,6 +1880,33 @@ build_vector_from_val (tree vectype, tre
>      }
>  }
>
> +/* Build a vector series of type TYPE in which element I has the value
> +   BASE + I * STEP.  The result is a constant if BASE and STEP are constant
> +   and a VEC_SERIES_EXPR otherwise.  */
> +
> +tree
> +build_vec_series (tree type, tree base, tree step)
> +{
> +  if (integer_zerop (step))
> +    return build_vector_from_val (type, base);
> +  if (CONSTANT_CLASS_P (base) && CONSTANT_CLASS_P (step))
> +    {
> +      unsigned int nunits = TYPE_VECTOR_SUBPARTS (type);
> +      if (0)
> +       return build_vec_series_cst (type, base, step);
> +
> +      auto_vec<tree, 32> v (nunits);
> +      v.quick_push (base);
> +      for (unsigned int i = 1; i < nunits; ++i)
> +       {
> +         base = const_binop (PLUS_EXPR, TREE_TYPE (base), base, step);
> +         v.quick_push (base);
> +       }
> +      return build_vector (type, v);
> +    }
> +  return build2 (VEC_SERIES_EXPR, type, base, step);
> +}
> +
>  /* Something has messed with the elements of CONSTRUCTOR C after it was built;
>     calculate TREE_CONSTANT and TREE_SIDE_EFFECTS.  */
>
> @@ -7162,6 +7222,10 @@ add_expr (const_tree t, inchash::hash &h
>      case VEC_DUPLICATE_CST:
>        inchash::add_expr (VEC_DUPLICATE_CST_ELT (t), hstate);
>        return;
> +    case VEC_SERIES_CST:
> +      inchash::add_expr (VEC_SERIES_CST_BASE (t), hstate);
> +      inchash::add_expr (VEC_SERIES_CST_STEP (t), hstate);
> +      return;
>      case SSA_NAME:
>        /* We can just compare by pointer.  */
>        hstate.add_hwi (SSA_NAME_VERSION (t));
> @@ -11210,6 +11274,7 @@ #define WALK_SUBTREE_TAIL(NODE)                         \
>      case FIXED_CST:
>      case VECTOR_CST:
>      case VEC_DUPLICATE_CST:
> +    case VEC_SERIES_CST:
>      case STRING_CST:
>      case BLOCK:
>      case PLACEHOLDER_EXPR:
> @@ -12502,6 +12567,15 @@ drop_tree_overflow (tree t)
>        if (TREE_OVERFLOW (*elt))
>         *elt = drop_tree_overflow (*elt);
>      }
> +  if (TREE_CODE (t) == VEC_SERIES_CST)
> +    {
> +      tree *elt = &VEC_SERIES_CST_BASE (t);
> +      if (TREE_OVERFLOW (*elt))
> +       *elt = drop_tree_overflow (*elt);
> +      elt = &VEC_SERIES_CST_STEP (t);
> +      if (TREE_OVERFLOW (*elt))
> +       *elt = drop_tree_overflow (*elt);
> +    }
>    return t;
>  }
>
> Index: gcc/cfgexpand.c
> ===================================================================
> --- gcc/cfgexpand.c     2017-11-06 12:20:31.074253152 +0000
> +++ gcc/cfgexpand.c     2017-11-06 12:21:29.321209826 +0000
> @@ -5070,6 +5070,8 @@ expand_debug_expr (tree exp)
>      case VEC_PERM_EXPR:
>      case VEC_DUPLICATE_CST:
>      case VEC_DUPLICATE_EXPR:
> +    case VEC_SERIES_CST:
> +    case VEC_SERIES_EXPR:
>        return NULL;
>
>      /* Misc codes.  */
> Index: gcc/tree-pretty-print.c
> ===================================================================
> --- gcc/tree-pretty-print.c     2017-11-06 12:20:31.093446541 +0000
> +++ gcc/tree-pretty-print.c     2017-11-06 12:21:29.333209826 +0000
> @@ -1808,6 +1808,14 @@ dump_generic_node (pretty_printer *pp, t
>        pp_string (pp, ", ... }");
>        break;
>
> +    case VEC_SERIES_CST:
> +      pp_string (pp, "{ ");
> +      dump_generic_node (pp, VEC_SERIES_CST_BASE (node), spc, flags, false);
> +      pp_string (pp, ", +, ");
> +      dump_generic_node (pp, VEC_SERIES_CST_STEP (node), spc, flags, false);
> +      pp_string (pp, "}");
> +      break;
> +
>      case FUNCTION_TYPE:
>      case METHOD_TYPE:
>        dump_generic_node (pp, TREE_TYPE (node), spc, flags, false);
> @@ -3221,6 +3229,7 @@ dump_generic_node (pretty_printer *pp, t
>        pp_string (pp, " > ");
>        break;
>
> +    case VEC_SERIES_EXPR:
>      case VEC_WIDEN_MULT_HI_EXPR:
>      case VEC_WIDEN_MULT_LO_EXPR:
>      case VEC_WIDEN_MULT_EVEN_EXPR:
> Index: gcc/dwarf2out.c
> ===================================================================
> --- gcc/dwarf2out.c     2017-11-06 12:20:31.080650948 +0000
> +++ gcc/dwarf2out.c     2017-11-06 12:21:29.325209826 +0000
> @@ -18879,6 +18879,7 @@ rtl_for_decl_init (tree init, tree type)
>           {
>           case VECTOR_CST:
>           case VEC_DUPLICATE_CST:
> +         case VEC_SERIES_CST:
>             break;
>           case CONSTRUCTOR:
>             if (TREE_CONSTANT (init))
> Index: gcc/gimple-expr.h
> ===================================================================
> --- gcc/gimple-expr.h   2017-11-06 12:20:31.087048745 +0000
> +++ gcc/gimple-expr.h   2017-11-06 12:21:29.328209826 +0000
> @@ -135,6 +135,7 @@ is_gimple_constant (const_tree t)
>      case COMPLEX_CST:
>      case VECTOR_CST:
>      case VEC_DUPLICATE_CST:
> +    case VEC_SERIES_CST:
>      case STRING_CST:
>        return true;
>
> Index: gcc/gimplify.c
> ===================================================================
> --- gcc/gimplify.c      2017-11-06 12:20:31.088876686 +0000
> +++ gcc/gimplify.c      2017-11-06 12:21:29.329209826 +0000
> @@ -11508,6 +11508,7 @@ gimplify_expr (tree *expr_p, gimple_seq
>         case COMPLEX_CST:
>         case VECTOR_CST:
>         case VEC_DUPLICATE_CST:
> +       case VEC_SERIES_CST:
>           /* Drop the overflow flag on constants, we do not want
>              that in the GIMPLE IL.  */
>           if (TREE_OVERFLOW_P (*expr_p))
> Index: gcc/graphite-scop-detection.c
> ===================================================================
> --- gcc/graphite-scop-detection.c       2017-11-06 12:20:31.088876686 +0000
> +++ gcc/graphite-scop-detection.c       2017-11-06 12:21:29.329209826 +0000
> @@ -1213,6 +1213,7 @@ scan_tree_for_params (sese_info_p s, tre
>      case COMPLEX_CST:
>      case VECTOR_CST:
>      case VEC_DUPLICATE_CST:
> +    case VEC_SERIES_CST:
>        break;
>
>     default:
> Index: gcc/ipa-icf-gimple.c
> ===================================================================
> --- gcc/ipa-icf-gimple.c        2017-11-06 12:20:31.088876686 +0000
> +++ gcc/ipa-icf-gimple.c        2017-11-06 12:21:29.329209826 +0000
> @@ -334,6 +334,7 @@ func_checker::compare_cst_or_decl (tree
>      case COMPLEX_CST:
>      case VECTOR_CST:
>      case VEC_DUPLICATE_CST:
> +    case VEC_SERIES_CST:
>      case STRING_CST:
>      case REAL_CST:
>        {
> @@ -530,6 +531,7 @@ func_checker::compare_operand (tree t1,
>      case COMPLEX_CST:
>      case VECTOR_CST:
>      case VEC_DUPLICATE_CST:
> +    case VEC_SERIES_CST:
>      case STRING_CST:
>      case REAL_CST:
>      case FUNCTION_DECL:
> Index: gcc/ipa-icf.c
> ===================================================================
> --- gcc/ipa-icf.c       2017-11-06 12:20:31.089790657 +0000
> +++ gcc/ipa-icf.c       2017-11-06 12:21:29.330209826 +0000
> @@ -1480,6 +1480,7 @@ sem_item::add_expr (const_tree exp, inch
>      case COMPLEX_CST:
>      case VECTOR_CST:
>      case VEC_DUPLICATE_CST:
> +    case VEC_SERIES_CST:
>        inchash::add_expr (exp, hstate);
>        break;
>      case CONSTRUCTOR:
> @@ -2040,6 +2041,11 @@ sem_variable::equals (tree t1, tree t2)
>      case VEC_DUPLICATE_CST:
>        return sem_variable::equals (VEC_DUPLICATE_CST_ELT (t1),
>                                    VEC_DUPLICATE_CST_ELT (t2));
> +     case VEC_SERIES_CST:
> +       return (sem_variable::equals (VEC_SERIES_CST_BASE (t1),
> +                                    VEC_SERIES_CST_BASE (t2))
> +              && sem_variable::equals (VEC_SERIES_CST_STEP (t1),
> +                                       VEC_SERIES_CST_STEP (t2)));
>      case ARRAY_REF:
>      case ARRAY_RANGE_REF:
>        {
> Index: gcc/print-tree.c
> ===================================================================
> --- gcc/print-tree.c    2017-11-06 12:20:31.090704628 +0000
> +++ gcc/print-tree.c    2017-11-06 12:21:29.331209826 +0000
> @@ -787,6 +787,11 @@ print_node (FILE *file, const char *pref
>           print_node (file, "elt", VEC_DUPLICATE_CST_ELT (node), indent + 4);
>           break;
>
> +       case VEC_SERIES_CST:
> +         print_node (file, "base", VEC_SERIES_CST_BASE (node), indent + 4);
> +         print_node (file, "step", VEC_SERIES_CST_STEP (node), indent + 4);
> +         break;
> +
>         case COMPLEX_CST:
>           print_node (file, "real", TREE_REALPART (node), indent + 4);
>           print_node (file, "imag", TREE_IMAGPART (node), indent + 4);
> Index: gcc/tree-ssa-loop.c
> ===================================================================
> --- gcc/tree-ssa-loop.c 2017-11-06 12:20:31.093446541 +0000
> +++ gcc/tree-ssa-loop.c 2017-11-06 12:21:29.333209826 +0000
> @@ -617,6 +617,7 @@ for_each_index (tree *addr_p, bool (*cbc
>         case RESULT_DECL:
>         case VECTOR_CST:
>         case VEC_DUPLICATE_CST:
> +       case VEC_SERIES_CST:
>         case COMPLEX_CST:
>         case INTEGER_CST:
>         case REAL_CST:
> Index: gcc/tree-ssa-pre.c
> ===================================================================
> --- gcc/tree-ssa-pre.c  2017-11-06 12:20:31.093446541 +0000
> +++ gcc/tree-ssa-pre.c  2017-11-06 12:21:29.333209826 +0000
> @@ -2628,6 +2628,7 @@ create_component_ref_by_pieces_1 (basic_
>      case COMPLEX_CST:
>      case VECTOR_CST:
>      case VEC_DUPLICATE_CST:
> +    case VEC_SERIES_CST:
>      case REAL_CST:
>      case CONSTRUCTOR:
>      case VAR_DECL:
> Index: gcc/tree-ssa-sccvn.c
> ===================================================================
> --- gcc/tree-ssa-sccvn.c        2017-11-06 12:20:31.094360512 +0000
> +++ gcc/tree-ssa-sccvn.c        2017-11-06 12:21:29.334209826 +0000
> @@ -867,6 +867,7 @@ copy_reference_ops_from_ref (tree ref, v
>         case COMPLEX_CST:
>         case VECTOR_CST:
>         case VEC_DUPLICATE_CST:
> +       case VEC_SERIES_CST:
>         case REAL_CST:
>         case FIXED_CST:
>         case CONSTRUCTOR:
> @@ -1060,6 +1061,7 @@ ao_ref_init_from_vn_reference (ao_ref *r
>         case COMPLEX_CST:
>         case VECTOR_CST:
>         case VEC_DUPLICATE_CST:
> +       case VEC_SERIES_CST:
>         case REAL_CST:
>         case CONSTRUCTOR:
>         case CONST_DECL:
> Index: gcc/varasm.c
> ===================================================================
> --- gcc/varasm.c        2017-11-06 12:20:31.100758308 +0000
> +++ gcc/varasm.c        2017-11-06 12:21:29.337209826 +0000
> @@ -3065,6 +3065,10 @@ const_hash_1 (const tree exp)
>        return (const_hash_1 (TREE_OPERAND (exp, 0)) * 9
>               + const_hash_1 (TREE_OPERAND (exp, 1)));
>
> +    case VEC_SERIES_CST:
> +      return (const_hash_1 (VEC_SERIES_CST_BASE (exp)) * 11
> +             + const_hash_1 (VEC_SERIES_CST_STEP (exp)));
> +
>      CASE_CONVERT:
>        return const_hash_1 (TREE_OPERAND (exp, 0)) * 7 + 2;
>
> @@ -3165,6 +3169,12 @@ compare_constant (const tree t1, const t
>        return compare_constant (VEC_DUPLICATE_CST_ELT (t1),
>                                VEC_DUPLICATE_CST_ELT (t2));
>
> +    case VEC_SERIES_CST:
> +      return (compare_constant (VEC_SERIES_CST_BASE (t1),
> +                               VEC_SERIES_CST_BASE (t2))
> +             && compare_constant (VEC_SERIES_CST_STEP (t1),
> +                                  VEC_SERIES_CST_STEP (t2)));
> +
>      case CONSTRUCTOR:
>        {
>         vec<constructor_elt, va_gc> *v1, *v2;
> Index: gcc/fold-const.c
> ===================================================================
> --- gcc/fold-const.c    2017-11-06 12:20:31.087048745 +0000
> +++ gcc/fold-const.c    2017-11-06 12:21:29.328209826 +0000
> @@ -421,6 +421,10 @@ negate_expr_p (tree t)
>      case VEC_DUPLICATE_CST:
>        return negate_expr_p (VEC_DUPLICATE_CST_ELT (t));
>
> +    case VEC_SERIES_CST:
> +      return (negate_expr_p (VEC_SERIES_CST_BASE (t))
> +             && negate_expr_p (VEC_SERIES_CST_STEP (t)));
> +
>      case COMPLEX_EXPR:
>        return negate_expr_p (TREE_OPERAND (t, 0))
>              && negate_expr_p (TREE_OPERAND (t, 1));
> @@ -590,6 +594,17 @@ fold_negate_expr_1 (location_t loc, tree
>         return build_vector_from_val (type, sub);
>        }
>
> +    case VEC_SERIES_CST:
> +      {
> +       tree neg_base = fold_negate_expr (loc, VEC_SERIES_CST_BASE (t));
> +       if (!neg_base)
> +         return NULL_TREE;
> +       tree neg_step = fold_negate_expr (loc, VEC_SERIES_CST_STEP (t));
> +       if (!neg_step)
> +         return NULL_TREE;
> +       return build_vec_series (type, neg_base, neg_step);
> +      }
> +
>      case COMPLEX_EXPR:
>        if (negate_expr_p (t))
>         return fold_build2_loc (loc, COMPLEX_EXPR, type,
> @@ -1131,6 +1146,28 @@ int_const_binop (enum tree_code code, co
>    return int_const_binop_1 (code, arg1, arg2, 1);
>  }
>
> +/* Return true if EXP is a VEC_DUPLICATE_CST or a VEC_SERIES_CST,
> +   and if so express it as a linear series in *BASE_OUT and *STEP_OUT.
> +   The step will be zero for VEC_DUPLICATE_CST.  */
> +
> +static bool
> +vec_series_equivalent_p (const_tree exp, tree *base_out, tree *step_out)
> +{
> +  if (TREE_CODE (exp) == VEC_SERIES_CST)
> +    {
> +      *base_out = VEC_SERIES_CST_BASE (exp);
> +      *step_out = VEC_SERIES_CST_STEP (exp);
> +      return true;
> +    }
> +  if (TREE_CODE (exp) == VEC_DUPLICATE_CST)
> +    {
> +      *base_out = VEC_DUPLICATE_CST_ELT (exp);
> +      *step_out = build_zero_cst (TREE_TYPE (*base_out));
> +      return true;
> +    }
> +  return false;
> +}
> +
>  /* Combine two constants ARG1 and ARG2 under operation CODE to produce a new
>     constant.  We assume ARG1 and ARG2 have the same data type, or at least
>     are the same kind of constant and the same machine mode.  Return zero if
> @@ -1457,6 +1494,20 @@ const_binop (enum tree_code code, tree a
>        return build_vector_from_val (TREE_TYPE (arg1), sub);
>      }
>
> +  tree base1, step1, base2, step2;
> +  if ((code == PLUS_EXPR || code == MINUS_EXPR)
> +      && vec_series_equivalent_p (arg1, &base1, &step1)
> +      && vec_series_equivalent_p (arg2, &base2, &step2))
> +    {
> +      tree new_base = const_binop (code, base1, base2);
> +      if (!new_base)
> +       return NULL_TREE;
> +      tree new_step = const_binop (code, step1, step2);
> +      if (!new_step)
> +       return NULL_TREE;
> +      return build_vec_series (TREE_TYPE (arg1), new_base, new_step);
> +    }
> +
>    /* Shifts allow a scalar offset for a vector.  */
>    if (TREE_CODE (arg1) == VECTOR_CST
>        && TREE_CODE (arg2) == INTEGER_CST)
> @@ -1505,6 +1556,12 @@ const_binop (enum tree_code code, tree t
>       result as argument put those cases that need it here.  */
>    switch (code)
>      {
> +    case VEC_SERIES_EXPR:
> +      if (CONSTANT_CLASS_P (arg1)
> +         && CONSTANT_CLASS_P (arg2))
> +       return build_vec_series (type, arg1, arg2);
> +      return NULL_TREE;
> +
>      case COMPLEX_EXPR:
>        if ((TREE_CODE (arg1) == REAL_CST
>            && TREE_CODE (arg2) == REAL_CST)
> @@ -3008,6 +3065,12 @@ operand_equal_p (const_tree arg0, const_
>         return operand_equal_p (VEC_DUPLICATE_CST_ELT (arg0),
>                                 VEC_DUPLICATE_CST_ELT (arg1), flags);
>
> +      case VEC_SERIES_CST:
> +       return (operand_equal_p (VEC_SERIES_CST_BASE (arg0),
> +                                VEC_SERIES_CST_BASE (arg1), flags)
> +               && operand_equal_p (VEC_SERIES_CST_STEP (arg0),
> +                                   VEC_SERIES_CST_STEP (arg1), flags));
> +
>        case COMPLEX_CST:
>         return (operand_equal_p (TREE_REALPART (arg0), TREE_REALPART (arg1),
>                                  flags)
> @@ -12020,6 +12083,10 @@ fold_checksum_tree (const_tree expr, str
>         case VEC_DUPLICATE_CST:
>           fold_checksum_tree (VEC_DUPLICATE_CST_ELT (expr), ctx, ht);
>           break;
> +       case VEC_SERIES_CST:
> +         fold_checksum_tree (VEC_SERIES_CST_BASE (expr), ctx, ht);
> +         fold_checksum_tree (VEC_SERIES_CST_STEP (expr), ctx, ht);
> +         break;
>         default:
>           break;
>         }
> @@ -14528,6 +14595,54 @@ test_vec_duplicate_folding ()
>    ASSERT_TRUE (operand_equal_p (dup5_expr, dup5_cst, 0));
>  }
>
> +/* Verify folding of VEC_SERIES_CSTs and VEC_SERIES_EXPRs.  */
> +
> +static void
> +test_vec_series_folding ()
> +{
> +  scalar_int_mode int_mode = SCALAR_INT_TYPE_MODE (ssizetype);
> +  machine_mode vec_mode = targetm.vectorize.preferred_simd_mode (int_mode);
> +  unsigned int nunits = GET_MODE_NUNITS (vec_mode);
> +  if (nunits == 1)
> +    nunits = 4;
> +
> +  tree type = build_vector_type (ssizetype, nunits);
> +  tree s5_4 = build_vec_series (type, ssize_int (5), ssize_int (4));
> +  tree s3_9 = build_vec_series (type, ssize_int (3), ssize_int (9));
> +
> +  tree neg_s5_4_a = fold_unary (NEGATE_EXPR, type, s5_4);
> +  tree neg_s5_4_b = build_vec_series (type, ssize_int (-5), ssize_int (-4));
> +  ASSERT_TRUE (operand_equal_p (neg_s5_4_a, neg_s5_4_b, 0));
> +
> +  tree s8_s13_a = fold_binary (PLUS_EXPR, type, s5_4, s3_9);
> +  tree s8_s13_b = build_vec_series (type, ssize_int (8), ssize_int (13));
> +  ASSERT_TRUE (operand_equal_p (s8_s13_a, s8_s13_b, 0));
> +
> +  tree s2_m5_a = fold_binary (MINUS_EXPR, type, s5_4, s3_9);
> +  tree s2_m5_b = build_vec_series (type, ssize_int (2), ssize_int (-5));
> +  ASSERT_TRUE (operand_equal_p (s2_m5_a, s2_m5_b, 0));
> +
> +  tree s11 = build_vector_from_val (type, ssize_int (11));
> +  tree s16_4_a = fold_binary (PLUS_EXPR, type, s5_4, s11);
> +  tree s16_4_b = fold_binary (PLUS_EXPR, type, s11, s5_4);
> +  tree s16_4_c = build_vec_series (type, ssize_int (16), ssize_int (4));
> +  ASSERT_TRUE (operand_equal_p (s16_4_a, s16_4_c, 0));
> +  ASSERT_TRUE (operand_equal_p (s16_4_b, s16_4_c, 0));
> +
> +  tree sm6_4_a = fold_binary (MINUS_EXPR, type, s5_4, s11);
> +  tree sm6_4_b = build_vec_series (type, ssize_int (-6), ssize_int (4));
> +  ASSERT_TRUE (operand_equal_p (sm6_4_a, sm6_4_b, 0));
> +
> +  tree s6_m4_a = fold_binary (MINUS_EXPR, type, s11, s5_4);
> +  tree s6_m4_b = build_vec_series (type, ssize_int (6), ssize_int (-4));
> +  ASSERT_TRUE (operand_equal_p (s6_m4_a, s6_m4_b, 0));
> +
> +  tree s5_4_expr = fold_binary (VEC_SERIES_EXPR, type,
> +                               ssize_int (5), ssize_int (4));
> +  ASSERT_TRUE (operand_equal_p (s5_4_expr, s5_4, 0));
> +  ASSERT_FALSE (operand_equal_p (s5_4_expr, s3_9, 0));
> +}
> +
>  /* Run all of the selftests within this file.  */
>
>  void
> @@ -14536,6 +14651,7 @@ fold_const_c_tests ()
>    test_arithmetic_folding ();
>    test_vector_folding ();
>    test_vec_duplicate_folding ();
> +  test_vec_series_folding ();
>  }
>
>  } // namespace selftest
> Index: gcc/expmed.c
> ===================================================================
> --- gcc/expmed.c        2017-11-06 12:20:31.081564919 +0000
> +++ gcc/expmed.c        2017-11-06 12:21:29.325209826 +0000
> @@ -5252,6 +5252,13 @@ make_tree (tree type, rtx x)
>             tree elt_tree = make_tree (TREE_TYPE (type), XEXP (op, 0));
>             return build_vector_from_val (type, elt_tree);
>           }
> +       if (GET_CODE (op) == VEC_SERIES)
> +         {
> +           tree itype = TREE_TYPE (type);
> +           tree base_tree = make_tree (itype, XEXP (op, 0));
> +           tree step_tree = make_tree (itype, XEXP (op, 1));
> +           return build_vec_series (type, base_tree, step_tree);
> +         }
>         return make_tree (type, op);
>        }
>
> Index: gcc/gimple-pretty-print.c
> ===================================================================
> --- gcc/gimple-pretty-print.c   2017-11-06 12:20:31.087048745 +0000
> +++ gcc/gimple-pretty-print.c   2017-11-06 12:21:29.328209826 +0000
> @@ -431,6 +431,7 @@ dump_binary_rhs (pretty_printer *buffer,
>      case VEC_PACK_FIX_TRUNC_EXPR:
>      case VEC_WIDEN_LSHIFT_HI_EXPR:
>      case VEC_WIDEN_LSHIFT_LO_EXPR:
> +    case VEC_SERIES_EXPR:
>        for (p = get_tree_code_name (code); *p; p++)
>         pp_character (buffer, TOUPPER (*p));
>        pp_string (buffer, " <");
> Index: gcc/tree-inline.c
> ===================================================================
> --- gcc/tree-inline.c   2017-11-06 12:20:31.092532570 +0000
> +++ gcc/tree-inline.c   2017-11-06 12:21:29.332209826 +0000
> @@ -3931,6 +3931,7 @@ estimate_operator_cost (enum tree_code c
>      case VEC_WIDEN_LSHIFT_HI_EXPR:
>      case VEC_WIDEN_LSHIFT_LO_EXPR:
>      case VEC_DUPLICATE_EXPR:
> +    case VEC_SERIES_EXPR:
>
>        return 1;
>
> Index: gcc/expr.c
> ===================================================================
> --- gcc/expr.c  2017-11-06 12:20:31.082478890 +0000
> +++ gcc/expr.c  2017-11-06 12:21:29.326209826 +0000
> @@ -7708,7 +7708,7 @@ expand_operands (tree exp0, tree exp1, r
>
>
>  /* Expand constant vector element ELT, which has mode MODE.  This is used
> -   for members of VECTOR_CST and VEC_DUPLICATE_CST.  */
> +   for members of VECTOR_CST, VEC_DUPLICATE_CST and VEC_SERIES_CST.  */
>
>  static rtx
>  const_vector_element (scalar_mode mode, const_tree elt)
> @@ -9591,6 +9591,10 @@ #define REDUCE_BIT_FIELD(expr)   (reduce_b
>        gcc_assert (target);
>        return target;
>
> +    case VEC_SERIES_EXPR:
> +      expand_operands (treeop0, treeop1, NULL_RTX, &op0, &op1, modifier);
> +      return expand_vec_series_expr (mode, op0, op1, target);
> +
>      case BIT_INSERT_EXPR:
>        {
>         unsigned bitpos = tree_to_uhwi (treeop2);
> @@ -10029,6 +10033,13 @@ expand_expr_real_1 (tree exp, rtx target
>                                   VEC_DUPLICATE_CST_ELT (exp));
>        return gen_const_vec_duplicate (mode, op0);
>
> +    case VEC_SERIES_CST:
> +      op0 = const_vector_element (GET_MODE_INNER (mode),
> +                                 VEC_SERIES_CST_BASE (exp));
> +      op1 = const_vector_element (GET_MODE_INNER (mode),
> +                                 VEC_SERIES_CST_STEP (exp));
> +      return gen_const_vec_series (mode, op0, op1);
> +
>      case CONST_DECL:
>        if (modifier == EXPAND_WRITE)
>         {
> Index: gcc/optabs.def
> ===================================================================
> --- gcc/optabs.def      2017-11-06 12:20:31.090704628 +0000
> +++ gcc/optabs.def      2017-11-06 12:21:29.331209826 +0000
> @@ -366,3 +366,4 @@ OPTAB_D (get_thread_pointer_optab, "get_
>  OPTAB_D (set_thread_pointer_optab, "set_thread_pointer$I$a")
>
>  OPTAB_DC (vec_duplicate_optab, "vec_duplicate$a", VEC_DUPLICATE)
> +OPTAB_DC (vec_series_optab, "vec_series$a", VEC_SERIES)
> Index: gcc/optabs.h
> ===================================================================
> --- gcc/optabs.h        2017-11-06 12:20:31.090704628 +0000
> +++ gcc/optabs.h        2017-11-06 12:21:29.331209826 +0000
> @@ -316,6 +316,9 @@ extern rtx expand_vec_cmp_expr (tree, tr
>  /* Generate code for VEC_COND_EXPR.  */
>  extern rtx expand_vec_cond_expr (tree, tree, tree, tree, rtx);
>
> +/* Generate code for VEC_SERIES_EXPR.  */
> +extern rtx expand_vec_series_expr (machine_mode, rtx, rtx, rtx);
> +
>  /* Generate code for MULT_HIGHPART_EXPR.  */
>  extern rtx expand_mult_highpart (machine_mode, rtx, rtx, rtx, bool);
>
> Index: gcc/optabs.c
> ===================================================================
> --- gcc/optabs.c        2017-11-06 12:20:31.090704628 +0000
> +++ gcc/optabs.c        2017-11-06 12:21:29.330209826 +0000
> @@ -5703,6 +5703,27 @@ expand_vec_cond_expr (tree vec_cond_type
>    return ops[0].value;
>  }
>
> +/* Generate VEC_SERIES_EXPR <OP0, OP1>, returning a value of mode VMODE.
> +   Use TARGET for the result if nonnull and convenient.  */
> +
> +rtx
> +expand_vec_series_expr (machine_mode vmode, rtx op0, rtx op1, rtx target)
> +{
> +  struct expand_operand ops[3];
> +  enum insn_code icode;
> +  machine_mode emode = GET_MODE_INNER (vmode);
> +
> +  icode = direct_optab_handler (vec_series_optab, vmode);
> +  gcc_assert (icode != CODE_FOR_nothing);
> +
> +  create_output_operand (&ops[0], target, vmode);
> +  create_input_operand (&ops[1], op0, emode);
> +  create_input_operand (&ops[2], op1, emode);
> +
> +  expand_insn (icode, 3, ops);
> +  return ops[0].value;
> +}
> +
>  /* Generate insns for a vector comparison into a mask.  */
>
>  rtx
> Index: gcc/optabs-tree.c
> ===================================================================
> --- gcc/optabs-tree.c   2017-11-06 12:20:31.089790657 +0000
> +++ gcc/optabs-tree.c   2017-11-06 12:21:29.330209826 +0000
> @@ -213,6 +213,9 @@ optab_for_tree_code (enum tree_code code
>      case VEC_DUPLICATE_EXPR:
>        return vec_duplicate_optab;
>
> +    case VEC_SERIES_EXPR:
> +      return vec_series_optab;
> +
>      default:
>        break;
>      }
> Index: gcc/tree-cfg.c
> ===================================================================
> --- gcc/tree-cfg.c      2017-11-06 12:20:31.091618599 +0000
> +++ gcc/tree-cfg.c      2017-11-06 12:21:29.332209826 +0000
> @@ -4114,6 +4114,23 @@ verify_gimple_assign_binary (gassign *st
>        /* Continue with generic binary expression handling.  */
>        break;
>
> +    case VEC_SERIES_EXPR:
> +      if (!useless_type_conversion_p (rhs1_type, rhs2_type))
> +       {
> +         error ("type mismatch in series expression");
> +         debug_generic_expr (rhs1_type);
> +         debug_generic_expr (rhs2_type);
> +         return true;
> +       }
> +      if (TREE_CODE (lhs_type) != VECTOR_TYPE
> +         || !useless_type_conversion_p (TREE_TYPE (lhs_type), rhs1_type))
> +       {
> +         error ("vector type expected in series expression");
> +         debug_generic_expr (lhs_type);
> +         return true;
> +       }
> +      return false;
> +
>      default:
>        gcc_unreachable ();
>      }
> @@ -4480,6 +4497,7 @@ verify_gimple_assign_single (gassign *st
>      case COMPLEX_CST:
>      case VECTOR_CST:
>      case VEC_DUPLICATE_CST:
> +    case VEC_SERIES_CST:
>      case STRING_CST:
>        return res;
>
> Index: gcc/tree-vect-generic.c
> ===================================================================
> --- gcc/tree-vect-generic.c     2017-11-06 12:20:31.094360512 +0000
> +++ gcc/tree-vect-generic.c     2017-11-06 12:21:29.334209826 +0000
> @@ -1596,7 +1596,8 @@ expand_vector_operations_1 (gimple_stmt_
>    if (rhs_class == GIMPLE_BINARY_RHS)
>      rhs2 = gimple_assign_rhs2 (stmt);
>
> -  if (TREE_CODE (type) != VECTOR_TYPE)
> +  if (!VECTOR_TYPE_P (type)
> +      || !VECTOR_TYPE_P (TREE_TYPE (rhs1)))
>      return;
>
>    /* If the vector operation is operating on all same vector elements

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [02/nn] Add more vec_duplicate simplifications
  2017-10-25 16:35   ` Jeff Law
@ 2017-11-10  9:42     ` Christophe Lyon
  0 siblings, 0 replies; 90+ messages in thread
From: Christophe Lyon @ 2017-11-10  9:42 UTC (permalink / raw)
  To: Jeff Law; +Cc: gcc-patches, Richard Sandiford

On 25 October 2017 at 18:29, Jeff Law <law@redhat.com> wrote:
> On 10/23/2017 05:17 AM, Richard Sandiford wrote:
>> This patch adds a vec_duplicate_p helper that tests for constant
>> or non-constant vector duplicates.  Together with the existing
>> const_vec_duplicate_p, this complements the gen_vec_duplicate
>> and gen_const_vec_duplicate added by a previous patch.
>>
>> The patch uses the new routines to add more rtx simplifications
>> involving vector duplicates.  These mirror simplifications that
>> we already do for CONST_VECTOR broadcasts and are needed for
>> variable-length SVE, which uses:
>>
>>   (const:M (vec_duplicate:M X))
>>
>> to represent constant broadcasts instead.  The simplifications do
>> trigger on the testsuite for variable duplicates too, and in each
>> case I saw the change was an improvement.  E.g.:
>>
> [ snip ]
>
>>
>> The best way of testing the new simplifications seemed to be
>> via selftests.  The patch cribs part of David's patch here:
>> https://gcc.gnu.org/ml/gcc-patches/2016-07/msg00270.html .
> Cool.  I really wish I had more time to promote David's work by adding
> selftests to various things.  There's certainly cases where it's the
> most direct and useful way to test certain bits of lower level
> infrastructure we have.  Glad to see you found it useful here.
>
>
>
>>
>>
>> 2017-10-23  Richard Sandiford  <richard.sandiford@linaro.org>
>>           David Malcolm  <dmalcolm@redhat.com>
>>           Alan Hayward  <alan.hayward@arm.com>
>>           David Sherwood  <david.sherwood@arm.com>
>>
>> gcc/
>>       * rtl.h (vec_duplicate_p): New function.
>>       * selftest-rtl.c (assert_rtx_eq_at): New function.
>>       * selftest-rtl.h (ASSERT_RTX_EQ): New macro.
>>       (assert_rtx_eq_at): Declare.
>>       * selftest.h (selftest::simplify_rtx_c_tests): Declare.
>>       * selftest-run-tests.c (selftest::run_tests): Call it.
>>       * simplify-rtx.c: Include selftest.h and selftest-rtl.h.
>>       (simplify_unary_operation_1): Recursively handle vector duplicates.
>>       (simplify_binary_operation_1): Likewise.  Handle VEC_SELECTs of
>>       vector duplicates.
>>       (simplify_subreg): Handle subregs of vector duplicates.
>>       (make_test_reg, test_vector_ops_duplicate, test_vector_ops)
>>       (selftest::simplify_rtx_c_tests): New functions.

Hi Richard,

I've noticed that this patch (r254294) causes
FAIL: gcc.dg/vect/vect-126.c (internal compiler error)
FAIL: gcc.dg/vect/vect-126.c -flto -ffat-lto-objects (internal compiler error)
on arm* targets.
Sorry if this has been reported before; I've restarted validations
only recently, so the process is still catching up.

gcc.log has this:
spawn -ignore SIGHUP
/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-linux-gnueabihf/gcc3/gcc/xgcc
-B/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-linux-gnueabihf/gcc3/gcc/
/gcc/testsuite/gcc.dg/vect/vect-126.c -fno-diagnostics-show-caret
-fdiagnostics-color=never -ffast-math -ftree-vectorize
-fno-vect-cost-model -fno-common -O2 -fdump-tree-vect-details -S -o
vect-126.s
during RTL pass: combine
/gcc/testsuite/gcc.dg/vect/vect-126.c: In function 'f5':
/gcc/testsuite/gcc.dg/vect/vect-126.c:53:1: internal compiler error:
in neon_valid_immediate, at config/arm/arm.c:11850
0xf3e6c8 neon_valid_immediate
        /gcc/config/arm/arm.c:11850
0xf3ea9a neon_immediate_valid_for_move(rtx_def*, machine_mode, rtx_def**, int*)
        /gcc/config/arm/arm.c:11968
0xf40a20 arm_rtx_costs_internal
        /gcc/config/arm/arm.c:10695
0xf40a20 arm_rtx_costs
        /gcc/config/arm/arm.c:10946
0xb113ef rtx_cost(rtx_def*, machine_mode, rtx_code, int, bool)
        /gcc/rtlanal.c:4187
0xb1169f set_src_cost
        /gcc/rtl.h:2700
0xb1169f pattern_cost(rtx_def*, bool)
        /gcc/rtlanal.c:5315
0x128bb3b combine_validate_cost
        /gcc/combine.c:893
0x128bb3b try_combine
        /gcc/combine.c:4113
0x12923d5 combine_instructions
        /gcc/combine.c:1452
0x12926ed rest_of_handle_combine
        /gcc/combine.c:14795
0x12926ed execute
        /gcc/combine.c:14840
Please submit a full bug report,


Thanks,

Christophe

> Thanks for the examples of how this affects various targets.  Seems like
> it ought to be a consistent win when they trigger.
>
> jeff

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [14/nn] Add helpers for shift count modes
  2017-10-26 12:07     ` Richard Biener
@ 2017-11-20 21:04       ` Richard Sandiford
  2017-11-21 15:00         ` Richard Biener
  0 siblings, 1 reply; 90+ messages in thread
From: Richard Sandiford @ 2017-11-20 21:04 UTC (permalink / raw)
  To: Richard Biener, Jeff Law; +Cc: GCC Patches

Richard Biener <richard.guenther@gmail.com> writes:
> On Thu, Oct 26, 2017 at 2:06 PM, Richard Biener
> <richard.guenther@gmail.com> wrote:
>> On Mon, Oct 23, 2017 at 1:25 PM, Richard Sandiford
>> <richard.sandiford@linaro.org> wrote:
>>> This patch adds a stub helper routine to provide the mode
>>> of a scalar shift amount, given the mode of the values
>>> being shifted.
>>>
>>> One long-standing problem has been to decide what this mode
>>> should be for arbitrary rtxes (as opposed to those directly
>>> tied to a target pattern).  Is it the mode of the shifted
>>> elements?  Is it word_mode?  Or maybe QImode?  Is it whatever
>>> the corresponding target pattern says?  (In which case what
>>> should the mode be when the target doesn't have a pattern?)
>>>
>>> For now the patch picks word_mode, which should be safe on
>>> all targets but could perhaps become suboptimal if the helper
>>> routine is used more often than it is in this patch.  As it
>>> stands the patch does not change the generated code.
>>>
>>> The patch also adds a helper function that constructs rtxes
>>> for constant shift amounts, again given the mode of the value
>>> being shifted.  As well as helping with the SVE patches, this
>>> is one step towards allowing CONST_INTs to have a real mode.
>>
>> I think gen_shift_amount_mode is flawed and while encapsulating
>> constant shift amount RTX generation into a gen_int_shift_amount
>> looks good to me I'd rather have that ??? in this function (and
>> I'd use the mode of the RTX shifted, not word_mode...).

OK.  I'd gone for word_mode because that's what expand_binop uses
for CONST_INTs:

      op1_mode = (GET_MODE (op1) != VOIDmode
		  ? as_a <scalar_int_mode> (GET_MODE (op1))
		  : word_mode);

But using the inner mode should be fine too.  The patch below does that.
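
(Concretely: with that version, shifting a V8HImode value produces a
HImode const_int via int_mode_for_mode (GET_MODE_INNER (mode)), while
a plain SImode shift keeps an SImode amount, since GET_MODE_INNER of a
scalar mode is the mode itself.)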

>> In the end it's up to insn recognizing to convert the op to the
>> expected mode and for generic RTL it's us that should decide
>> on the mode -- on GENERIC the shift amount has to be an
>> integer so why not simply use a mode that is large enough to
>> make the constant fit?

...but I can do that instead if you think it's better.

>> Just throwing in some comments here, RTL isn't my primary
>> expertise.
>
> To add a little bit - shift amounts are maybe the only(?) place
> where a modeless CONST_INT makes sense!  So "fixing"
> that first sounds backwards.

But even here they have a mode conceptually, since out-of-range shift
amounts are target-defined rather than undefined.  E.g. if the target
interprets the shift amount as unsigned, then for a shift amount
(const_int -1) it matters whether the mode is QImode (and so we're
shifting by 255) or HImode (and so we're shifting by 65535.

OK, so shifts by 65535 make no sense in practice, but *conceptually*... :-)
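
To make that concrete (a made-up example rather than anything from the
patch):

    (ashiftrt:SI (reg:SI 100) (const_int -1))

is a shift by 255 on a target that conceptually treats the amount as
QImode, but a shift by 65535 on one that treats it as HImode; which of
those (if either) the shift truncates to is the target-defined part.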

Jeff Law <law@redhat.com> writes:
> On 10/26/2017 06:06 AM, Richard Biener wrote:
>> On Mon, Oct 23, 2017 at 1:25 PM, Richard Sandiford
>> <richard.sandiford@linaro.org> wrote:
>>> This patch adds a stub helper routine to provide the mode
>>> of a scalar shift amount, given the mode of the values
>>> being shifted.
>>>
>>> One long-standing problem has been to decide what this mode
>>> should be for arbitrary rtxes (as opposed to those directly
>>> tied to a target pattern).  Is it the mode of the shifted
>>> elements?  Is it word_mode?  Or maybe QImode?  Is it whatever
>>> the corresponding target pattern says?  (In which case what
>>> should the mode be when the target doesn't have a pattern?)
>>>
>>> For now the patch picks word_mode, which should be safe on
>>> all targets but could perhaps become suboptimal if the helper
>>> routine is used more often than it is in this patch.  As it
>>> stands the patch does not change the generated code.
>>>
>>> The patch also adds a helper function that constructs rtxes
>>> for constant shift amounts, again given the mode of the value
>>> being shifted.  As well as helping with the SVE patches, this
>>> is one step towards allowing CONST_INTs to have a real mode.
>> 
>> I think gen_shift_amount_mode is flawed and while encapsulating
>> constant shift amount RTX generation into a gen_int_shift_amount
>> looks good to me I'd rather have that ??? in this function (and
>> I'd use the mode of the RTX shifted, not word_mode...).
>> 
>> In the end it's up to insn recognizing to convert the op to the
>> expected mode and for generic RTL it's us that should decide
>> on the mode -- on GENERIC the shift amount has to be an
>> integer so why not simply use a mode that is large enough to
>> make the constant fit?
>> 
>> Just throwing in some comments here, RTL isn't my primary
>> expertise.
> I wonder if encapsulation + a target hook to specify the mode would be
> better?  We'd then have to argue over word_mode, vs QImode vs something
> else for the default, but at least we'd have a way for the target to
> specify the mode is generally best when working on shift counts.
>
> In the end I doubt there's a single definition that is overall better.
> Largely because I suspect there are times when the narrowest mode is
> best, or the mode of the operand being shifted.
>
> So thoughts on doing the encapsulation with a target hook to specify the
> desired mode?  Does that get us what we need for SVE and does it provide
> us a path forward on this issue if we were to try to move towards
> CONST_INTs with modes?

I think it'd be better to do that only if we have a use case, since
it's hard to predict what the best way of handling it is until then.
E.g. I'd still like to hold out the possibility of doing this automatically
from the .md file instead, if some kind of override ends up being necessary.

Like you say, we have to argue over the default either way, and I think
that's been the sticking point.
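
FWIW, the intended usage is simply (a sketch that mirrors the call
sites converted below):

    /* Shift VALUE left by COUNT bits in MODE, letting the new helper
       pick the mode of the shift amount.  */
    rtx amount = gen_int_shift_amount (mode, count);
    rtx result = expand_binop (mode, ashl_optab, value, amount,
			       NULL_RTX, 1, OPTAB_LIB_WIDEN);

so call sites stop hard-coding a mode-less GEN_INT and the decision is
made in one place.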

Thanks,
Richard


2017-11-20  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* emit-rtl.h (gen_int_shift_amount): Declare.
	* emit-rtl.c (gen_int_shift_amount): New function.
	* asan.c (asan_emit_stack_protection): Use gen_int_shift_amount
	instead of GEN_INT.
	* calls.c (shift_return_value): Likewise.
	* cse.c (fold_rtx): Likewise.
	* dse.c (find_shift_sequence): Likewise.
	* expmed.c (init_expmed_one_mode, store_bit_field_1, expand_shift_1)
	(expand_shift, expand_smod_pow2): Likewise.
	* lower-subreg.c (shift_cost): Likewise.
	* simplify-rtx.c (simplify_unary_operation_1): Likewise.
	(simplify_binary_operation_1): Likewise.
	* combine.c (try_combine, find_split_point, force_int_to_mode)
	(simplify_shift_const_1, simplify_shift_const): Likewise.
	(change_zero_ext): Likewise.  Use simplify_gen_binary.
	* optabs.c (expand_superword_shift, expand_doubleword_mult)
	(expand_unop, expand_binop): Use gen_int_shift_amount instead
	of GEN_INT.
	(shift_amt_for_vec_perm_mask): Add a machine_mode argument.
	Use gen_int_shift_amount instead of GEN_INT.
	(expand_vec_perm): Update caller accordingly.  Use
	gen_int_shift_amount instead of GEN_INT.

Index: gcc/emit-rtl.h
===================================================================
--- gcc/emit-rtl.h	2017-11-20 20:37:41.918226976 +0000
+++ gcc/emit-rtl.h	2017-11-20 20:37:51.661320782 +0000
@@ -369,6 +369,7 @@ extern void set_reg_attrs_for_parm (rtx,
 extern void set_reg_attrs_for_decl_rtl (tree t, rtx x);
 extern void adjust_reg_mode (rtx, machine_mode);
 extern int mem_expr_equal_p (const_tree, const_tree);
+extern rtx gen_int_shift_amount (machine_mode, HOST_WIDE_INT);
 
 extern bool need_atomic_barrier_p (enum memmodel, bool);
 
Index: gcc/emit-rtl.c
===================================================================
--- gcc/emit-rtl.c	2017-11-20 20:37:41.918226976 +0000
+++ gcc/emit-rtl.c	2017-11-20 20:37:51.660320782 +0000
@@ -6507,6 +6507,24 @@ need_atomic_barrier_p (enum memmodel mod
     }
 }
 
+/* Return a constant shift amount for shifting a value of mode MODE
+   by VALUE bits.  */
+
+rtx
+gen_int_shift_amount (machine_mode mode, HOST_WIDE_INT value)
+{
+  /* ??? Using the inner mode should be wide enough for all useful
+     cases (e.g. QImode usually has 8 shiftable bits, while a QImode
+     shift amount has a range of [-128, 127]).  But in principle
+     a target could require target-dependent behaviour for a
+     shift whose shift amount is wider than the shifted value.
+     Perhaps this should be automatically derived from the .md
+     files instead, or perhaps have a target hook.  */
+  scalar_int_mode shift_mode
+    = int_mode_for_mode (GET_MODE_INNER (mode)).require ();
+  return gen_int_mode (value, shift_mode);
+}
+
 /* Initialize fields of rtl_data related to stack alignment.  */
 
 void
Index: gcc/asan.c
===================================================================
--- gcc/asan.c	2017-11-20 20:37:41.918226976 +0000
+++ gcc/asan.c	2017-11-20 20:37:51.657320781 +0000
@@ -1386,7 +1386,7 @@ asan_emit_stack_protection (rtx base, rt
   TREE_ASM_WRITTEN (id) = 1;
   emit_move_insn (mem, expand_normal (build_fold_addr_expr (decl)));
   shadow_base = expand_binop (Pmode, lshr_optab, base,
-			      GEN_INT (ASAN_SHADOW_SHIFT),
+			      gen_int_shift_amount (Pmode, ASAN_SHADOW_SHIFT),
 			      NULL_RTX, 1, OPTAB_DIRECT);
   shadow_base
     = plus_constant (Pmode, shadow_base,
Index: gcc/calls.c
===================================================================
--- gcc/calls.c	2017-11-20 20:37:41.918226976 +0000
+++ gcc/calls.c	2017-11-20 20:37:51.657320781 +0000
@@ -2742,15 +2742,17 @@ shift_return_value (machine_mode mode, b
   HOST_WIDE_INT shift;
 
   gcc_assert (REG_P (value) && HARD_REGISTER_P (value));
-  shift = GET_MODE_BITSIZE (GET_MODE (value)) - GET_MODE_BITSIZE (mode);
+  machine_mode value_mode = GET_MODE (value);
+  shift = GET_MODE_BITSIZE (value_mode) - GET_MODE_BITSIZE (mode);
   if (shift == 0)
     return false;
 
   /* Use ashr rather than lshr for right shifts.  This is for the benefit
      of the MIPS port, which requires SImode values to be sign-extended
      when stored in 64-bit registers.  */
-  if (!force_expand_binop (GET_MODE (value), left_p ? ashl_optab : ashr_optab,
-			   value, GEN_INT (shift), value, 1, OPTAB_WIDEN))
+  if (!force_expand_binop (value_mode, left_p ? ashl_optab : ashr_optab,
+			   value, gen_int_shift_amount (value_mode, shift),
+			   value, 1, OPTAB_WIDEN))
     gcc_unreachable ();
   return true;
 }
Index: gcc/cse.c
===================================================================
--- gcc/cse.c	2017-11-20 20:37:41.918226976 +0000
+++ gcc/cse.c	2017-11-20 20:37:51.660320782 +0000
@@ -3611,9 +3611,9 @@ fold_rtx (rtx x, rtx_insn *insn)
 		      || INTVAL (const_arg1) < 0))
 		{
 		  if (SHIFT_COUNT_TRUNCATED)
-		    canon_const_arg1 = GEN_INT (INTVAL (const_arg1)
-						& (GET_MODE_UNIT_BITSIZE (mode)
-						   - 1));
+		    canon_const_arg1 = gen_int_shift_amount
+		      (mode, (INTVAL (const_arg1)
+			      & (GET_MODE_UNIT_BITSIZE (mode) - 1)));
 		  else
 		    break;
 		}
@@ -3660,9 +3660,9 @@ fold_rtx (rtx x, rtx_insn *insn)
 		      || INTVAL (inner_const) < 0))
 		{
 		  if (SHIFT_COUNT_TRUNCATED)
-		    inner_const = GEN_INT (INTVAL (inner_const)
-					   & (GET_MODE_UNIT_BITSIZE (mode)
-					      - 1));
+		    inner_const = gen_int_shift_amount
+		      (mode, (INTVAL (inner_const)
+			      & (GET_MODE_UNIT_BITSIZE (mode) - 1)));
 		  else
 		    break;
 		}
@@ -3692,7 +3692,8 @@ fold_rtx (rtx x, rtx_insn *insn)
 		  /* As an exception, we can turn an ASHIFTRT of this
 		     form into a shift of the number of bits - 1.  */
 		  if (code == ASHIFTRT)
-		    new_const = GEN_INT (GET_MODE_UNIT_BITSIZE (mode) - 1);
+		    new_const = gen_int_shift_amount
+		      (mode, GET_MODE_UNIT_BITSIZE (mode) - 1);
 		  else if (!side_effects_p (XEXP (y, 0)))
 		    return CONST0_RTX (mode);
 		  else
Index: gcc/dse.c
===================================================================
--- gcc/dse.c	2017-11-20 20:37:41.918226976 +0000
+++ gcc/dse.c	2017-11-20 20:37:51.660320782 +0000
@@ -1605,8 +1605,9 @@ find_shift_sequence (int access_size,
 				     store_mode, byte);
 	  if (ret && CONSTANT_P (ret))
 	    {
+	      rtx shift_rtx = gen_int_shift_amount (new_mode, shift);
 	      ret = simplify_const_binary_operation (LSHIFTRT, new_mode,
-						     ret, GEN_INT (shift));
+						     ret, shift_rtx);
 	      if (ret && CONSTANT_P (ret))
 		{
 		  byte = subreg_lowpart_offset (read_mode, new_mode);
@@ -1642,7 +1643,8 @@ find_shift_sequence (int access_size,
 	 of one dsp where the cost of these two was not the same.  But
 	 this really is a rare case anyway.  */
       target = expand_binop (new_mode, lshr_optab, new_reg,
-			     GEN_INT (shift), new_reg, 1, OPTAB_DIRECT);
+			     gen_int_shift_amount (new_mode, shift),
+			     new_reg, 1, OPTAB_DIRECT);
 
       shift_seq = get_insns ();
       end_sequence ();
Index: gcc/expmed.c
===================================================================
--- gcc/expmed.c	2017-11-20 20:37:41.918226976 +0000
+++ gcc/expmed.c	2017-11-20 20:37:51.661320782 +0000
@@ -222,7 +222,8 @@ init_expmed_one_mode (struct init_expmed
 	  PUT_MODE (all->zext, wider_mode);
 	  PUT_MODE (all->wide_mult, wider_mode);
 	  PUT_MODE (all->wide_lshr, wider_mode);
-	  XEXP (all->wide_lshr, 1) = GEN_INT (mode_bitsize);
+	  XEXP (all->wide_lshr, 1)
+	    = gen_int_shift_amount (wider_mode, mode_bitsize);
 
 	  set_mul_widen_cost (speed, wider_mode,
 			      set_src_cost (all->wide_mult, wider_mode, speed));
@@ -909,12 +910,14 @@ store_bit_field_1 (rtx str_rtx, unsigned
 	     to make sure that for big-endian machines the higher order
 	     bits are used.  */
 	  if (new_bitsize < BITS_PER_WORD && BYTES_BIG_ENDIAN && !backwards)
-	    value_word = simplify_expand_binop (word_mode, lshr_optab,
-						value_word,
-						GEN_INT (BITS_PER_WORD
-							 - new_bitsize),
-						NULL_RTX, true,
-						OPTAB_LIB_WIDEN);
+	    {
+	      int shift = BITS_PER_WORD - new_bitsize;
+	      rtx shift_rtx = gen_int_shift_amount (word_mode, shift);
+	      value_word = simplify_expand_binop (word_mode, lshr_optab,
+						  value_word, shift_rtx,
+						  NULL_RTX, true,
+						  OPTAB_LIB_WIDEN);
+	    }
 
 	  if (!store_bit_field_1 (op0, new_bitsize,
 				  bitnum + bit_offset,
@@ -2365,8 +2368,9 @@ expand_shift_1 (enum tree_code code, mac
       if (CONST_INT_P (op1)
 	  && ((unsigned HOST_WIDE_INT) INTVAL (op1) >=
 	      (unsigned HOST_WIDE_INT) GET_MODE_BITSIZE (scalar_mode)))
-	op1 = GEN_INT ((unsigned HOST_WIDE_INT) INTVAL (op1)
-		       % GET_MODE_BITSIZE (scalar_mode));
+	op1 = gen_int_shift_amount (mode,
+				    (unsigned HOST_WIDE_INT) INTVAL (op1)
+				    % GET_MODE_BITSIZE (scalar_mode));
       else if (GET_CODE (op1) == SUBREG
 	       && subreg_lowpart_p (op1)
 	       && SCALAR_INT_MODE_P (GET_MODE (SUBREG_REG (op1)))
@@ -2383,7 +2387,8 @@ expand_shift_1 (enum tree_code code, mac
       && IN_RANGE (INTVAL (op1), GET_MODE_BITSIZE (scalar_mode) / 2 + left,
 		   GET_MODE_BITSIZE (scalar_mode) - 1))
     {
-      op1 = GEN_INT (GET_MODE_BITSIZE (scalar_mode) - INTVAL (op1));
+      op1 = gen_int_shift_amount (mode, (GET_MODE_BITSIZE (scalar_mode)
+					 - INTVAL (op1)));
       left = !left;
       code = left ? LROTATE_EXPR : RROTATE_EXPR;
     }
@@ -2463,8 +2468,8 @@ expand_shift_1 (enum tree_code code, mac
 	      if (op1 == const0_rtx)
 		return shifted;
 	      else if (CONST_INT_P (op1))
-		other_amount = GEN_INT (GET_MODE_BITSIZE (scalar_mode)
-					- INTVAL (op1));
+		other_amount = gen_int_shift_amount
+		  (mode, GET_MODE_BITSIZE (scalar_mode) - INTVAL (op1));
 	      else
 		{
 		  other_amount
@@ -2537,8 +2542,9 @@ expand_shift_1 (enum tree_code code, mac
 expand_shift (enum tree_code code, machine_mode mode, rtx shifted,
 	      int amount, rtx target, int unsignedp)
 {
-  return expand_shift_1 (code, mode,
-			 shifted, GEN_INT (amount), target, unsignedp);
+  return expand_shift_1 (code, mode, shifted,
+			 gen_int_shift_amount (mode, amount),
+			 target, unsignedp);
 }
 
 /* Likewise, but return 0 if that cannot be done.  */
@@ -3856,7 +3862,7 @@ expand_smod_pow2 (scalar_int_mode mode,
 	{
 	  HOST_WIDE_INT masklow = (HOST_WIDE_INT_1 << logd) - 1;
 	  signmask = force_reg (mode, signmask);
-	  shift = GEN_INT (GET_MODE_BITSIZE (mode) - logd);
+	  shift = gen_int_shift_amount (mode, GET_MODE_BITSIZE (mode) - logd);
 
 	  /* Use the rtx_cost of a LSHIFTRT instruction to determine
 	     which instruction sequence to use.  If logical right shifts
Index: gcc/lower-subreg.c
===================================================================
--- gcc/lower-subreg.c	2017-11-20 20:37:41.918226976 +0000
+++ gcc/lower-subreg.c	2017-11-20 20:37:51.661320782 +0000
@@ -141,7 +141,7 @@ shift_cost (bool speed_p, struct cost_rt
   PUT_CODE (rtxes->shift, code);
   PUT_MODE (rtxes->shift, mode);
   PUT_MODE (rtxes->source, mode);
-  XEXP (rtxes->shift, 1) = GEN_INT (op1);
+  XEXP (rtxes->shift, 1) = gen_int_shift_amount (mode, op1);
   return set_src_cost (rtxes->shift, mode, speed_p);
 }
 
Index: gcc/simplify-rtx.c
===================================================================
--- gcc/simplify-rtx.c	2017-11-20 20:37:41.918226976 +0000
+++ gcc/simplify-rtx.c	2017-11-20 20:37:51.663320783 +0000
@@ -1165,7 +1165,8 @@ simplify_unary_operation_1 (enum rtx_cod
 	  if (STORE_FLAG_VALUE == 1)
 	    {
 	      temp = simplify_gen_binary (ASHIFTRT, inner, XEXP (op, 0),
-					  GEN_INT (isize - 1));
+					  gen_int_shift_amount (inner,
+								isize - 1));
 	      if (int_mode == inner)
 		return temp;
 	      if (GET_MODE_PRECISION (int_mode) > isize)
@@ -1175,7 +1176,8 @@ simplify_unary_operation_1 (enum rtx_cod
 	  else if (STORE_FLAG_VALUE == -1)
 	    {
 	      temp = simplify_gen_binary (LSHIFTRT, inner, XEXP (op, 0),
-					  GEN_INT (isize - 1));
+					  gen_int_shift_amount (inner,
+								isize - 1));
 	      if (int_mode == inner)
 		return temp;
 	      if (GET_MODE_PRECISION (int_mode) > isize)
@@ -2672,7 +2674,8 @@ simplify_binary_operation_1 (enum rtx_co
 	{
 	  val = wi::exact_log2 (rtx_mode_t (trueop1, mode));
 	  if (val >= 0)
-	    return simplify_gen_binary (ASHIFT, mode, op0, GEN_INT (val));
+	    return simplify_gen_binary (ASHIFT, mode, op0,
+					gen_int_shift_amount (mode, val));
 	}
 
       /* x*2 is x+x and x*(-1) is -x */
@@ -3296,7 +3299,8 @@ simplify_binary_operation_1 (enum rtx_co
       /* Convert divide by power of two into shift.  */
       if (CONST_INT_P (trueop1)
 	  && (val = exact_log2 (UINTVAL (trueop1))) > 0)
-	return simplify_gen_binary (LSHIFTRT, mode, op0, GEN_INT (val));
+	return simplify_gen_binary (LSHIFTRT, mode, op0,
+				    gen_int_shift_amount (mode, val));
       break;
 
     case DIV:
@@ -3416,10 +3420,12 @@ simplify_binary_operation_1 (enum rtx_co
 	  && IN_RANGE (INTVAL (trueop1),
 		       GET_MODE_UNIT_PRECISION (mode) / 2 + (code == ROTATE),
 		       GET_MODE_UNIT_PRECISION (mode) - 1))
-	return simplify_gen_binary (code == ROTATE ? ROTATERT : ROTATE,
-				    mode, op0,
-				    GEN_INT (GET_MODE_UNIT_PRECISION (mode)
-					     - INTVAL (trueop1)));
+	{
+	  int new_amount = GET_MODE_UNIT_PRECISION (mode) - INTVAL (trueop1);
+	  rtx new_amount_rtx = gen_int_shift_amount (mode, new_amount);
+	  return simplify_gen_binary (code == ROTATE ? ROTATERT : ROTATE,
+				      mode, op0, new_amount_rtx);
+	}
 #endif
       /* FALLTHRU */
     case ASHIFTRT:
@@ -3460,8 +3466,8 @@ simplify_binary_operation_1 (enum rtx_co
 	      == GET_MODE_BITSIZE (inner_mode) - GET_MODE_BITSIZE (int_mode))
 	  && subreg_lowpart_p (op0))
 	{
-	  rtx tmp = GEN_INT (INTVAL (XEXP (SUBREG_REG (op0), 1))
-			     + INTVAL (op1));
+	  rtx tmp = gen_int_shift_amount
+	    (inner_mode, INTVAL (XEXP (SUBREG_REG (op0), 1)) + INTVAL (op1));
 	  tmp = simplify_gen_binary (code, inner_mode,
 				     XEXP (SUBREG_REG (op0), 0),
 				     tmp);
@@ -3472,7 +3478,8 @@ simplify_binary_operation_1 (enum rtx_co
 	{
 	  val = INTVAL (op1) & (GET_MODE_UNIT_PRECISION (mode) - 1);
 	  if (val != INTVAL (op1))
-	    return simplify_gen_binary (code, mode, op0, GEN_INT (val));
+	    return simplify_gen_binary (code, mode, op0,
+					gen_int_shift_amount (mode, val));
 	}
       break;
 
Index: gcc/combine.c
===================================================================
--- gcc/combine.c	2017-11-20 20:37:41.918226976 +0000
+++ gcc/combine.c	2017-11-20 20:37:51.659320782 +0000
@@ -3792,8 +3792,9 @@ try_combine (rtx_insn *i3, rtx_insn *i2,
 	      && INTVAL (XEXP (*split, 1)) > 0
 	      && (i = exact_log2 (UINTVAL (XEXP (*split, 1)))) >= 0)
 	    {
+	      rtx i_rtx = gen_int_shift_amount (split_mode, i);
 	      SUBST (*split, gen_rtx_ASHIFT (split_mode,
-					     XEXP (*split, 0), GEN_INT (i)));
+					     XEXP (*split, 0), i_rtx));
 	      /* Update split_code because we may not have a multiply
 		 anymore.  */
 	      split_code = GET_CODE (*split);
@@ -3807,8 +3808,10 @@ try_combine (rtx_insn *i3, rtx_insn *i2,
 	      && (i = exact_log2 (UINTVAL (XEXP (XEXP (*split, 0), 1)))) >= 0)
 	    {
 	      rtx nsplit = XEXP (*split, 0);
+	      rtx i_rtx = gen_int_shift_amount (GET_MODE (nsplit), i);
 	      SUBST (XEXP (*split, 0), gen_rtx_ASHIFT (GET_MODE (nsplit),
-					     XEXP (nsplit, 0), GEN_INT (i)));
+						       XEXP (nsplit, 0),
+						       i_rtx));
 	      /* Update split_code because we may not have a multiply
 		 anymore.  */
 	      split_code = GET_CODE (*split);
@@ -5077,12 +5080,12 @@ find_split_point (rtx *loc, rtx_insn *in
 				      GET_MODE (XEXP (SET_SRC (x), 0))))))
 	    {
 	      machine_mode mode = GET_MODE (XEXP (SET_SRC (x), 0));
-
+	      rtx pos_rtx = gen_int_shift_amount (mode, pos);
 	      SUBST (SET_SRC (x),
 		     gen_rtx_NEG (mode,
 				  gen_rtx_LSHIFTRT (mode,
 						    XEXP (SET_SRC (x), 0),
-						    GEN_INT (pos))));
+						    pos_rtx)));
 
 	      split = find_split_point (&SET_SRC (x), insn, true);
 	      if (split && split != &SET_SRC (x))
@@ -5140,11 +5143,11 @@ find_split_point (rtx *loc, rtx_insn *in
 	    {
 	      unsigned HOST_WIDE_INT mask
 		= (HOST_WIDE_INT_1U << len) - 1;
+	      rtx pos_rtx = gen_int_shift_amount (mode, pos);
 	      SUBST (SET_SRC (x),
 		     gen_rtx_AND (mode,
 				  gen_rtx_LSHIFTRT
-				  (mode, gen_lowpart (mode, inner),
-				   GEN_INT (pos)),
+				  (mode, gen_lowpart (mode, inner), pos_rtx),
 				  gen_int_mode (mask, mode)));
 
 	      split = find_split_point (&SET_SRC (x), insn, true);
@@ -5153,14 +5156,15 @@ find_split_point (rtx *loc, rtx_insn *in
 	    }
 	  else
 	    {
+	      int left_bits = GET_MODE_PRECISION (mode) - len - pos;
+	      int right_bits = GET_MODE_PRECISION (mode) - len;
 	      SUBST (SET_SRC (x),
 		     gen_rtx_fmt_ee
 		     (unsignedp ? LSHIFTRT : ASHIFTRT, mode,
 		      gen_rtx_ASHIFT (mode,
 				      gen_lowpart (mode, inner),
-				      GEN_INT (GET_MODE_PRECISION (mode)
-					       - len - pos)),
-		      GEN_INT (GET_MODE_PRECISION (mode) - len)));
+				      gen_int_shift_amount (mode, left_bits)),
+		      gen_int_shift_amount (mode, right_bits)));
 
 	      split = find_split_point (&SET_SRC (x), insn, true);
 	      if (split && split != &SET_SRC (x))
@@ -8935,10 +8939,11 @@ force_int_to_mode (rtx x, scalar_int_mod
 	  /* Must be more sign bit copies than the mask needs.  */
 	  && ((int) num_sign_bit_copies (XEXP (x, 0), GET_MODE (XEXP (x, 0)))
 	      >= exact_log2 (mask + 1)))
-	x = simplify_gen_binary (LSHIFTRT, xmode, XEXP (x, 0),
-				 GEN_INT (GET_MODE_PRECISION (xmode)
-					  - exact_log2 (mask + 1)));
-
+	{
+	  int nbits = GET_MODE_PRECISION (xmode) - exact_log2 (mask + 1);
+	  x = simplify_gen_binary (LSHIFTRT, xmode, XEXP (x, 0),
+				   gen_int_shift_amount (xmode, nbits));
+	}
       goto shiftrt;
 
     case ASHIFTRT:
@@ -10431,7 +10436,7 @@ simplify_shift_const_1 (enum rtx_code co
 {
   enum rtx_code orig_code = code;
   rtx orig_varop = varop;
-  int count;
+  int count, log2;
   machine_mode mode = result_mode;
   machine_mode shift_mode;
   scalar_int_mode tmode, inner_mode, int_mode, int_varop_mode, int_result_mode;
@@ -10634,13 +10639,11 @@ simplify_shift_const_1 (enum rtx_code co
 	     is cheaper.  But it is still better on those machines to
 	     merge two shifts into one.  */
 	  if (CONST_INT_P (XEXP (varop, 1))
-	      && exact_log2 (UINTVAL (XEXP (varop, 1))) >= 0)
+	      && (log2 = exact_log2 (UINTVAL (XEXP (varop, 1)))) >= 0)
 	    {
-	      varop
-		= simplify_gen_binary (ASHIFT, GET_MODE (varop),
-				       XEXP (varop, 0),
-				       GEN_INT (exact_log2 (
-						UINTVAL (XEXP (varop, 1)))));
+	      rtx log2_rtx = gen_int_shift_amount (GET_MODE (varop), log2);
+	      varop = simplify_gen_binary (ASHIFT, GET_MODE (varop),
+					   XEXP (varop, 0), log2_rtx);
 	      continue;
 	    }
 	  break;
@@ -10648,13 +10651,11 @@ simplify_shift_const_1 (enum rtx_code co
 	case UDIV:
 	  /* Similar, for when divides are cheaper.  */
 	  if (CONST_INT_P (XEXP (varop, 1))
-	      && exact_log2 (UINTVAL (XEXP (varop, 1))) >= 0)
+	      && (log2 = exact_log2 (UINTVAL (XEXP (varop, 1)))) >= 0)
 	    {
-	      varop
-		= simplify_gen_binary (LSHIFTRT, GET_MODE (varop),
-				       XEXP (varop, 0),
-				       GEN_INT (exact_log2 (
-						UINTVAL (XEXP (varop, 1)))));
+	      rtx log2_rtx = gen_int_shift_amount (GET_MODE (varop), log2);
+	      varop = simplify_gen_binary (LSHIFTRT, GET_MODE (varop),
+					   XEXP (varop, 0), log2_rtx);
 	      continue;
 	    }
 	  break;
@@ -10789,10 +10790,10 @@ simplify_shift_const_1 (enum rtx_code co
 
 	      mask_rtx = gen_int_mode (nonzero_bits (varop, int_varop_mode),
 				       int_result_mode);
-
+	      rtx count_rtx = gen_int_shift_amount (int_result_mode, count);
 	      mask_rtx
 		= simplify_const_binary_operation (code, int_result_mode,
-						   mask_rtx, GEN_INT (count));
+						   mask_rtx, count_rtx);
 
 	      /* Give up if we can't compute an outer operation to use.  */
 	      if (mask_rtx == 0
@@ -10848,9 +10849,10 @@ simplify_shift_const_1 (enum rtx_code co
 	      if (code == ASHIFTRT && int_mode != int_result_mode)
 		break;
 
+	      rtx count_rtx = gen_int_shift_amount (int_result_mode, count);
 	      rtx new_rtx = simplify_const_binary_operation (code, int_mode,
 							     XEXP (varop, 0),
-							     GEN_INT (count));
+							     count_rtx);
 	      varop = gen_rtx_fmt_ee (code, int_mode, new_rtx, XEXP (varop, 1));
 	      count = 0;
 	      continue;
@@ -10916,7 +10918,7 @@ simplify_shift_const_1 (enum rtx_code co
 	      && (new_rtx = simplify_const_binary_operation
 		  (code, int_result_mode,
 		   gen_int_mode (INTVAL (XEXP (varop, 1)), int_result_mode),
-		   GEN_INT (count))) != 0
+		   gen_int_shift_amount (int_result_mode, count))) != 0
 	      && CONST_INT_P (new_rtx)
 	      && merge_outer_ops (&outer_op, &outer_const, GET_CODE (varop),
 				  INTVAL (new_rtx), int_result_mode,
@@ -11059,7 +11061,7 @@ simplify_shift_const_1 (enum rtx_code co
 	      && (new_rtx = simplify_const_binary_operation
 		  (ASHIFT, int_result_mode,
 		   gen_int_mode (INTVAL (XEXP (varop, 1)), int_result_mode),
-		   GEN_INT (count))) != 0
+		   gen_int_shift_amount (int_result_mode, count))) != 0
 	      && CONST_INT_P (new_rtx)
 	      && merge_outer_ops (&outer_op, &outer_const, PLUS,
 				  INTVAL (new_rtx), int_result_mode,
@@ -11080,7 +11082,7 @@ simplify_shift_const_1 (enum rtx_code co
 	      && (new_rtx = simplify_const_binary_operation
 		  (code, int_result_mode,
 		   gen_int_mode (INTVAL (XEXP (varop, 1)), int_result_mode),
-		   GEN_INT (count))) != 0
+		   gen_int_shift_amount (int_result_mode, count))) != 0
 	      && CONST_INT_P (new_rtx)
 	      && merge_outer_ops (&outer_op, &outer_const, XOR,
 				  INTVAL (new_rtx), int_result_mode,
@@ -11135,12 +11137,12 @@ simplify_shift_const_1 (enum rtx_code co
 		      - GET_MODE_UNIT_PRECISION (GET_MODE (varop)))))
 	    {
 	      rtx varop_inner = XEXP (varop, 0);
-
-	      varop_inner
-		= gen_rtx_LSHIFTRT (GET_MODE (varop_inner),
-				    XEXP (varop_inner, 0),
-				    GEN_INT
-				    (count + INTVAL (XEXP (varop_inner, 1))));
+	      int new_count = count + INTVAL (XEXP (varop_inner, 1));
+	      rtx new_count_rtx = gen_int_shift_amount (GET_MODE (varop_inner),
+							new_count);
+	      varop_inner = gen_rtx_LSHIFTRT (GET_MODE (varop_inner),
+					      XEXP (varop_inner, 0),
+					      new_count_rtx);
 	      varop = gen_rtx_TRUNCATE (GET_MODE (varop), varop_inner);
 	      count = 0;
 	      continue;
@@ -11192,7 +11194,8 @@ simplify_shift_const_1 (enum rtx_code co
     x = NULL_RTX;
 
   if (x == NULL_RTX)
-    x = simplify_gen_binary (code, shift_mode, varop, GEN_INT (count));
+    x = simplify_gen_binary (code, shift_mode, varop,
+			     gen_int_shift_amount (shift_mode, count));
 
   /* If we were doing an LSHIFTRT in a wider mode than it was originally,
      turn off all the bits that the shift would have turned off.  */
@@ -11254,7 +11257,8 @@ simplify_shift_const (rtx x, enum rtx_co
     return tem;
 
   if (!x)
-    x = simplify_gen_binary (code, GET_MODE (varop), varop, GEN_INT (count));
+    x = simplify_gen_binary (code, GET_MODE (varop), varop,
+			     gen_int_shift_amount (GET_MODE (varop), count));
   if (GET_MODE (x) != result_mode)
     x = gen_lowpart (result_mode, x);
   return x;
@@ -11445,8 +11449,9 @@ change_zero_ext (rtx pat)
 	  if (BITS_BIG_ENDIAN)
 	    start = GET_MODE_PRECISION (inner_mode) - size - start;
 
-	  if (start)
-	    x = gen_rtx_LSHIFTRT (inner_mode, XEXP (x, 0), GEN_INT (start));
+	  if (start != 0)
+	    x = gen_rtx_LSHIFTRT (inner_mode, XEXP (x, 0),
+				  gen_int_shift_amount (inner_mode, start));
 	  else
 	    x = XEXP (x, 0);
 	  if (mode != inner_mode)
Index: gcc/optabs.c
===================================================================
--- gcc/optabs.c	2017-11-20 20:37:41.918226976 +0000
+++ gcc/optabs.c	2017-11-20 20:37:51.662320782 +0000
@@ -431,8 +431,9 @@ expand_superword_shift (optab binoptab,
       if (binoptab != ashr_optab)
 	emit_move_insn (outof_target, CONST0_RTX (word_mode));
       else
-	if (!force_expand_binop (word_mode, binoptab,
-				 outof_input, GEN_INT (BITS_PER_WORD - 1),
+	if (!force_expand_binop (word_mode, binoptab, outof_input,
+				 gen_int_shift_amount (word_mode,
+						       BITS_PER_WORD - 1),
 				 outof_target, unsignedp, methods))
 	  return false;
     }
@@ -789,7 +790,8 @@ expand_doubleword_mult (machine_mode mod
 {
   int low = (WORDS_BIG_ENDIAN ? 1 : 0);
   int high = (WORDS_BIG_ENDIAN ? 0 : 1);
-  rtx wordm1 = umulp ? NULL_RTX : GEN_INT (BITS_PER_WORD - 1);
+  rtx wordm1 = (umulp ? NULL_RTX
+		: gen_int_shift_amount (word_mode, BITS_PER_WORD - 1));
   rtx product, adjust, product_high, temp;
 
   rtx op0_high = operand_subword_force (op0, high, mode);
@@ -1185,7 +1187,7 @@ expand_binop (machine_mode mode, optab b
       unsigned int bits = GET_MODE_PRECISION (int_mode);
 
       if (CONST_INT_P (op1))
-        newop1 = GEN_INT (bits - INTVAL (op1));
+	newop1 = gen_int_shift_amount (int_mode, bits - INTVAL (op1));
       else if (targetm.shift_truncation_mask (int_mode) == bits - 1)
         newop1 = negate_rtx (GET_MODE (op1), op1);
       else
@@ -1403,7 +1405,7 @@ expand_binop (machine_mode mode, optab b
 
       /* Apply the truncation to constant shifts.  */
       if (double_shift_mask > 0 && CONST_INT_P (op1))
-	op1 = GEN_INT (INTVAL (op1) & double_shift_mask);
+	op1 = gen_int_mode (INTVAL (op1) & double_shift_mask, op1_mode);
 
       if (op1 == CONST0_RTX (op1_mode))
 	return op0;
@@ -1513,7 +1515,7 @@ expand_binop (machine_mode mode, optab b
       else
 	{
 	  rtx into_temp1, into_temp2, outof_temp1, outof_temp2;
-	  rtx first_shift_count, second_shift_count;
+	  HOST_WIDE_INT first_shift_count, second_shift_count;
 	  optab reverse_unsigned_shift, unsigned_shift;
 
 	  reverse_unsigned_shift = (left_shift ^ (shift_count < BITS_PER_WORD)
@@ -1524,20 +1526,24 @@ expand_binop (machine_mode mode, optab b
 
 	  if (shift_count > BITS_PER_WORD)
 	    {
-	      first_shift_count = GEN_INT (shift_count - BITS_PER_WORD);
-	      second_shift_count = GEN_INT (2 * BITS_PER_WORD - shift_count);
+	      first_shift_count = shift_count - BITS_PER_WORD;
+	      second_shift_count = 2 * BITS_PER_WORD - shift_count;
 	    }
 	  else
 	    {
-	      first_shift_count = GEN_INT (BITS_PER_WORD - shift_count);
-	      second_shift_count = GEN_INT (shift_count);
+	      first_shift_count = BITS_PER_WORD - shift_count;
+	      second_shift_count = shift_count;
 	    }
+	  rtx first_shift_count_rtx
+	    = gen_int_shift_amount (word_mode, first_shift_count);
+	  rtx second_shift_count_rtx
+	    = gen_int_shift_amount (word_mode, second_shift_count);
 
 	  into_temp1 = expand_binop (word_mode, unsigned_shift,
-				     outof_input, first_shift_count,
+				     outof_input, first_shift_count_rtx,
 				     NULL_RTX, unsignedp, next_methods);
 	  into_temp2 = expand_binop (word_mode, reverse_unsigned_shift,
-				     into_input, second_shift_count,
+				     into_input, second_shift_count_rtx,
 				     NULL_RTX, unsignedp, next_methods);
 
 	  if (into_temp1 != 0 && into_temp2 != 0)
@@ -1550,10 +1556,10 @@ expand_binop (machine_mode mode, optab b
 	    emit_move_insn (into_target, inter);
 
 	  outof_temp1 = expand_binop (word_mode, unsigned_shift,
-				      into_input, first_shift_count,
+				      into_input, first_shift_count_rtx,
 				      NULL_RTX, unsignedp, next_methods);
 	  outof_temp2 = expand_binop (word_mode, reverse_unsigned_shift,
-				      outof_input, second_shift_count,
+				      outof_input, second_shift_count_rtx,
 				      NULL_RTX, unsignedp, next_methods);
 
 	  if (inter != 0 && outof_temp1 != 0 && outof_temp2 != 0)
@@ -2793,25 +2799,29 @@ expand_unop (machine_mode mode, optab un
 
 	  if (optab_handler (rotl_optab, mode) != CODE_FOR_nothing)
 	    {
-	      temp = expand_binop (mode, rotl_optab, op0, GEN_INT (8), target,
-				   unsignedp, OPTAB_DIRECT);
+	      temp = expand_binop (mode, rotl_optab, op0,
+				   gen_int_shift_amount (mode, 8),
+				   target, unsignedp, OPTAB_DIRECT);
 	      if (temp)
 		return temp;
 	     }
 
 	  if (optab_handler (rotr_optab, mode) != CODE_FOR_nothing)
 	    {
-	      temp = expand_binop (mode, rotr_optab, op0, GEN_INT (8), target,
-				   unsignedp, OPTAB_DIRECT);
+	      temp = expand_binop (mode, rotr_optab, op0,
+				   gen_int_shift_amount (mode, 8),
+				   target, unsignedp, OPTAB_DIRECT);
 	      if (temp)
 		return temp;
 	    }
 
 	  last = get_last_insn ();
 
-	  temp1 = expand_binop (mode, ashl_optab, op0, GEN_INT (8), NULL_RTX,
+	  temp1 = expand_binop (mode, ashl_optab, op0,
+				gen_int_shift_amount (mode, 8), NULL_RTX,
 			        unsignedp, OPTAB_WIDEN);
-	  temp2 = expand_binop (mode, lshr_optab, op0, GEN_INT (8), NULL_RTX,
+	  temp2 = expand_binop (mode, lshr_optab, op0,
+				gen_int_shift_amount (mode, 8), NULL_RTX,
 			        unsignedp, OPTAB_WIDEN);
 	  if (temp1 && temp2)
 	    {
@@ -5369,11 +5379,11 @@ vector_compare_rtx (machine_mode cmp_mod
 }
 
 /* Checks if vec_perm mask SEL is a constant equivalent to a shift of the first
-   vec_perm operand, assuming the second operand is a constant vector of zeroes.
-   Return the shift distance in bits if so, or NULL_RTX if the vec_perm is not a
-   shift.  */
+   vec_perm operand (which has mode OP0_MODE), assuming the second
+   operand is a constant vector of zeroes.  Return the shift distance in
+   bits if so, or NULL_RTX if the vec_perm is not a shift.  */
 static rtx
-shift_amt_for_vec_perm_mask (rtx sel)
+shift_amt_for_vec_perm_mask (machine_mode op0_mode, rtx sel)
 {
   unsigned int i, first, nelt = GET_MODE_NUNITS (GET_MODE (sel));
   unsigned int bitsize = GET_MODE_UNIT_BITSIZE (GET_MODE (sel));
@@ -5393,7 +5403,7 @@ shift_amt_for_vec_perm_mask (rtx sel)
 	return NULL_RTX;
     }
 
-  return GEN_INT (first * bitsize);
+  return gen_int_shift_amount (op0_mode, first * bitsize);
 }
 
 /* A subroutine of expand_vec_perm for expanding one vec_perm insn.  */
@@ -5473,7 +5483,7 @@ expand_vec_perm (machine_mode mode, rtx
 	  && (shift_code != CODE_FOR_nothing
 	      || shift_code_qi != CODE_FOR_nothing))
 	{
-	  shift_amt = shift_amt_for_vec_perm_mask (sel);
+	  shift_amt = shift_amt_for_vec_perm_mask (mode, sel);
 	  if (shift_amt)
 	    {
 	      struct expand_operand ops[3];
@@ -5563,7 +5573,8 @@ expand_vec_perm (machine_mode mode, rtx
 				   NULL, 0, OPTAB_DIRECT);
       else
 	sel = expand_simple_binop (selmode, ASHIFT, sel,
-				   GEN_INT (exact_log2 (u)),
+				   gen_int_shift_amount (selmode,
+							 exact_log2 (u)),
 				   NULL, 0, OPTAB_DIRECT);
       gcc_assert (sel != NULL);
 

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [14/nn] Add helpers for shift count modes
  2017-11-20 21:04       ` Richard Sandiford
@ 2017-11-21 15:00         ` Richard Biener
  2017-12-15  0:48           ` Richard Sandiford
  0 siblings, 1 reply; 90+ messages in thread
From: Richard Biener @ 2017-11-21 15:00 UTC (permalink / raw)
  To: Richard Biener, Jeff Law, GCC Patches, Richard Sandiford

On Mon, Nov 20, 2017 at 10:02 PM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> Richard Biener <richard.guenther@gmail.com> writes:
>> On Thu, Oct 26, 2017 at 2:06 PM, Richard Biener
>> <richard.guenther@gmail.com> wrote:
>>> On Mon, Oct 23, 2017 at 1:25 PM, Richard Sandiford
>>> <richard.sandiford@linaro.org> wrote:
>>>> This patch adds a stub helper routine to provide the mode
>>>> of a scalar shift amount, given the mode of the values
>>>> being shifted.
>>>>
>>>> One long-standing problem has been to decide what this mode
>>>> should be for arbitrary rtxes (as opposed to those directly
>>>> tied to a target pattern).  Is it the mode of the shifted
>>>> elements?  Is it word_mode?  Or maybe QImode?  Is it whatever
>>>> the corresponding target pattern says?  (In which case what
>>>> should the mode be when the target doesn't have a pattern?)
>>>>
>>>> For now the patch picks word_mode, which should be safe on
>>>> all targets but could perhaps become suboptimal if the helper
>>>> routine is used more often than it is in this patch.  As it
>>>> stands the patch does not change the generated code.
>>>>
>>>> The patch also adds a helper function that constructs rtxes
>>>> for constant shift amounts, again given the mode of the value
>>>> being shifted.  As well as helping with the SVE patches, this
>>>> is one step towards allowing CONST_INTs to have a real mode.
>>>
>>> I think gen_shift_amount_mode is flawed, and while encapsulating
>>> constant shift amount RTX generation into a gen_int_shift_amount
>>> looks good to me, I'd rather have that ??? in this function (and
>>> I'd use the mode of the RTX shifted, not word_mode...).
>
> OK.  I'd gone for word_mode because that's what expand_binop uses
> for CONST_INTs:
>
>       op1_mode = (GET_MODE (op1) != VOIDmode
>                   ? as_a <scalar_int_mode> (GET_MODE (op1))
>                   : word_mode);
>
> But using the inner mode should be fine too.  The patch below does that.
>
>>> In the end it's up to insn recognition to convert the op to the
>>> expected mode and for generic RTL it's us that should decide
>>> on the mode -- on GENERIC the shift amount has to be an
>>> integer so why not simply use a mode that is large enough to
>>> make the constant fit?
>
> ...but I can do that instead if you think it's better.
>
>>> Just throwing in some comments here, RTL isn't my primary
>>> expertise.
>>
>> To add a little bit - shift amounts are maybe the only(?) place
>> where a modeless CONST_INT makes sense!  So "fixing"
>> that first sounds backwards.
>
> But even here they have a mode conceptually, since out-of-range shift
> amounts are target-defined rather than undefined.  E.g. if the target
> interprets the shift amount as unsigned, then for a shift amount
> (const_int -1) it matters whether the mode is QImode (and so we're
> shifting by 255) or HImode (and so we're shifting by 65535).
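> So conceptually (if CONST_INT actually carried its mode) the same
> printed rtx:
>
>     (ashift:SI (reg:SI x) (const_int -1))
>
> would be a shift by 255 with a QImode amount but a shift by 65535
> with an HImode amount; nothing in the printed form tells you which.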

I think RTL is well-defined (at least I hope so ...) and machine constraints
need to be modeled explicitly (like embedding an implicit bit_and in
shift patterns).
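Something like this sketch, say (the pattern name and constraints are
made up, loosely modeled on the masked-shift patterns some ports
already have):

    (define_insn "*ashlsi3_mask"
      [(set (match_operand:SI 0 "register_operand" "=r")
            (ashift:SI
              (match_operand:SI 1 "register_operand" "0")
              (and:QI (match_operand:QI 2 "register_operand" "c")
                      (const_int 31))))]
      ""
      "...")

where the explicit bit_and tells the middle-end that the hardware
truncates the shift amount, rather than relying on the global
SHIFT_COUNT_TRUNCATED macro.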

> OK, so shifts by 65535 make no sense in practice, but *conceptually*... :-)
>
> Jeff Law <law@redhat.com> writes:
>> On 10/26/2017 06:06 AM, Richard Biener wrote:
>>> On Mon, Oct 23, 2017 at 1:25 PM, Richard Sandiford
>>> <richard.sandiford@linaro.org> wrote:
>>>> This patch adds a stub helper routine to provide the mode
>>>> of a scalar shift amount, given the mode of the values
>>>> being shifted.
>>>>
>>>> One long-standing problem has been to decide what this mode
>>>> should be for arbitrary rtxes (as opposed to those directly
>>>> tied to a target pattern).  Is it the mode of the shifted
>>>> elements?  Is it word_mode?  Or maybe QImode?  Is it whatever
>>>> the corresponding target pattern says?  (In which case what
>>>> should the mode be when the target doesn't have a pattern?)
>>>>
>>>> For now the patch picks word_mode, which should be safe on
>>>> all targets but could perhaps become suboptimal if the helper
>>>> routine is used more often than it is in this patch.  As it
>>>> stands the patch does not change the generated code.
>>>>
>>>> The patch also adds a helper function that constructs rtxes
>>>> for constant shift amounts, again given the mode of the value
>>>> being shifted.  As well as helping with the SVE patches, this
>>>> is one step towards allowing CONST_INTs to have a real mode.
>>>
>>> I think gen_shift_amount_mode is flawed, and while encapsulating
>>> constant shift amount RTX generation into a gen_int_shift_amount
>>> looks good to me, I'd rather have that ??? in this function (and
>>> I'd use the mode of the RTX shifted, not word_mode...).
>>>
>>> In the end it's up to insn recognition to convert the op to the
>>> expected mode and for generic RTL it's us that should decide
>>> on the mode -- on GENERIC the shift amount has to be an
>>> integer so why not simply use a mode that is large enough to
>>> make the constant fit?
>>>
>>> Just throwing in some comments here, RTL isn't my primary
>>> expertise.
>> I wonder if encapsulation + a target hook to specify the mode would be
>> better?  We'd then have to argue over word_mode vs QImode vs something
>> else for the default, but at least we'd have a way for the target to
>> specify the mode is generally best when working on shift counts.
>>
>> In the end I doubt there's a single definition that is overall better.
>> Largely because I suspect there are times when the narrowest mode is
>> best, or the mode of the operand being shifted.
>>
>> So thoughts on doing the encapsulation with a target hook to specify the
>> desired mode?  Does that get us what we need for SVE and does it provide
>> us a path forward on this issue if we were to try to move towards
>> CONST_INTs with modes?
>
> I think it'd be better to do that only if we have a use case, since
> it's hard to predict what the best way of handling it is until then.
> E.g. I'd still like to hold out the possibility of doing this automatically
> from the .md file instead, if some kind of override ends up being necessary.
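> (For concreteness, I imagine the hook would be something like:
>
>     /* Hypothetical, not part of this series: return the mode that
>        shift amounts should have when shifting a value of mode MODE.  */
>     machine_mode (*shift_amount_mode) (machine_mode mode);
>
> in targetm, with gen_int_shift_amount calling it instead of choosing
> the mode itself.  But as above, I'd rather wait for a concrete need.)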
>
> Like you say, we have to argue over the default either way, and I think
> that's been the sticking point.
>
> Thanks,
> Richard
>
>
> 2017-11-20  Richard Sandiford  <richard.sandiford@linaro.org>
>             Alan Hayward  <alan.hayward@arm.com>
>             David Sherwood  <david.sherwood@arm.com>
>
> gcc/
>         * emit-rtl.h (gen_int_shift_amount): Declare.
>         * emit-rtl.c (gen_int_shift_amount): New function.
>         * asan.c (asan_emit_stack_protection): Use gen_int_shift_amount
>         instead of GEN_INT.
>         * calls.c (shift_return_value): Likewise.
>         * cse.c (fold_rtx): Likewise.
>         * dse.c (find_shift_sequence): Likewise.
>         * expmed.c (init_expmed_one_mode, store_bit_field_1, expand_shift_1)
>         (expand_shift, expand_smod_pow2): Likewise.
>         * lower-subreg.c (shift_cost): Likewise.
>         * simplify-rtx.c (simplify_unary_operation_1): Likewise.
>         (simplify_binary_operation_1): Likewise.
>         * combine.c (try_combine, find_split_point, force_int_to_mode)
>         (simplify_shift_const_1, simplify_shift_const): Likewise.
>         (change_zero_ext): Likewise.  Use simplify_gen_binary.
>         * optabs.c (expand_superword_shift, expand_doubleword_mult)
>         (expand_unop, expand_binop): Use gen_int_shift_amount instead
>         of GEN_INT.
>         (shift_amt_for_vec_perm_mask): Add a machine_mode argument.
>         Use gen_int_shift_amount instead of GEN_INT.
>         (expand_vec_perm): Update caller accordingly.  Use
>         gen_int_shift_amount instead of GEN_INT.
>
> Index: gcc/emit-rtl.h
> ===================================================================
> --- gcc/emit-rtl.h      2017-11-20 20:37:41.918226976 +0000
> +++ gcc/emit-rtl.h      2017-11-20 20:37:51.661320782 +0000
> @@ -369,6 +369,7 @@ extern void set_reg_attrs_for_parm (rtx,
>  extern void set_reg_attrs_for_decl_rtl (tree t, rtx x);
>  extern void adjust_reg_mode (rtx, machine_mode);
>  extern int mem_expr_equal_p (const_tree, const_tree);
> +extern rtx gen_int_shift_amount (machine_mode, HOST_WIDE_INT);
>
>  extern bool need_atomic_barrier_p (enum memmodel, bool);
>
> Index: gcc/emit-rtl.c
> ===================================================================
> --- gcc/emit-rtl.c      2017-11-20 20:37:41.918226976 +0000
> +++ gcc/emit-rtl.c      2017-11-20 20:37:51.660320782 +0000
> @@ -6507,6 +6507,24 @@ need_atomic_barrier_p (enum memmodel mod
>      }
>  }
>
> +/* Return a constant shift amount for shifting a value of mode MODE
> +   by VALUE bits.  */
> +
> +rtx
> +gen_int_shift_amount (machine_mode mode, HOST_WIDE_INT value)
> +{
> +  /* ??? The inner mode should be wide enough for all useful
> +     cases (e.g. QImode usually has 8 shiftable bits, while a QImode
> +     shift amount has a range of [-128, 127]).  But in principle
> +     a target could require target-dependent behaviour for a
> +     shift whose shift amount is wider than the shifted value.
> +     Perhaps this should be automatically derived from the .md
> +     files instead, or perhaps have a target hook.  */
> +  scalar_int_mode shift_mode
> +    = int_mode_for_mode (GET_MODE_INNER (mode)).require ();
> +  return gen_int_mode (value, shift_mode);
> +}
> +
>  /* Initialize fields of rtl_data related to stack alignment.  */
>
>  void
> Index: gcc/asan.c
> ===================================================================
> --- gcc/asan.c  2017-11-20 20:37:41.918226976 +0000
> +++ gcc/asan.c  2017-11-20 20:37:51.657320781 +0000
> @@ -1386,7 +1386,7 @@ asan_emit_stack_protection (rtx base, rt
>    TREE_ASM_WRITTEN (id) = 1;
>    emit_move_insn (mem, expand_normal (build_fold_addr_expr (decl)));
>    shadow_base = expand_binop (Pmode, lshr_optab, base,
> -                             GEN_INT (ASAN_SHADOW_SHIFT),
> +                             gen_int_shift_amount (Pmode, ASAN_SHADOW_SHIFT),
>                               NULL_RTX, 1, OPTAB_DIRECT);
>    shadow_base
>      = plus_constant (Pmode, shadow_base,
> Index: gcc/calls.c
> ===================================================================
> --- gcc/calls.c 2017-11-20 20:37:41.918226976 +0000
> +++ gcc/calls.c 2017-11-20 20:37:51.657320781 +0000
> @@ -2742,15 +2742,17 @@ shift_return_value (machine_mode mode, b
>    HOST_WIDE_INT shift;
>
>    gcc_assert (REG_P (value) && HARD_REGISTER_P (value));
> -  shift = GET_MODE_BITSIZE (GET_MODE (value)) - GET_MODE_BITSIZE (mode);
> +  machine_mode value_mode = GET_MODE (value);
> +  shift = GET_MODE_BITSIZE (value_mode) - GET_MODE_BITSIZE (mode);
>    if (shift == 0)
>      return false;
>
>    /* Use ashr rather than lshr for right shifts.  This is for the benefit
>       of the MIPS port, which requires SImode values to be sign-extended
>       when stored in 64-bit registers.  */
> -  if (!force_expand_binop (GET_MODE (value), left_p ? ashl_optab : ashr_optab,
> -                          value, GEN_INT (shift), value, 1, OPTAB_WIDEN))
> +  if (!force_expand_binop (value_mode, left_p ? ashl_optab : ashr_optab,
> +                          value, gen_int_shift_amount (value_mode, shift),
> +                          value, 1, OPTAB_WIDEN))
>      gcc_unreachable ();
>    return true;
>  }
> Index: gcc/cse.c
> ===================================================================
> --- gcc/cse.c   2017-11-20 20:37:41.918226976 +0000
> +++ gcc/cse.c   2017-11-20 20:37:51.660320782 +0000
> @@ -3611,9 +3611,9 @@ fold_rtx (rtx x, rtx_insn *insn)
>                       || INTVAL (const_arg1) < 0))
>                 {
>                   if (SHIFT_COUNT_TRUNCATED)
> -                   canon_const_arg1 = GEN_INT (INTVAL (const_arg1)
> -                                               & (GET_MODE_UNIT_BITSIZE (mode)
> -                                                  - 1));
> +                   canon_const_arg1 = gen_int_shift_amount
> +                     (mode, (INTVAL (const_arg1)
> +                             & (GET_MODE_UNIT_BITSIZE (mode) - 1)));
>                   else
>                     break;
>                 }
> @@ -3660,9 +3660,9 @@ fold_rtx (rtx x, rtx_insn *insn)
>                       || INTVAL (inner_const) < 0))
>                 {
>                   if (SHIFT_COUNT_TRUNCATED)
> -                   inner_const = GEN_INT (INTVAL (inner_const)
> -                                          & (GET_MODE_UNIT_BITSIZE (mode)
> -                                             - 1));
> +                   inner_const = gen_int_shift_amount
> +                     (mode, (INTVAL (inner_const)
> +                             & (GET_MODE_UNIT_BITSIZE (mode) - 1)));
>                   else
>                     break;
>                 }
> @@ -3692,7 +3692,8 @@ fold_rtx (rtx x, rtx_insn *insn)
>                   /* As an exception, we can turn an ASHIFTRT of this
>                      form into a shift of the number of bits - 1.  */
>                   if (code == ASHIFTRT)
> -                   new_const = GEN_INT (GET_MODE_UNIT_BITSIZE (mode) - 1);
> +                   new_const = gen_int_shift_amount
> +                     (mode, GET_MODE_UNIT_BITSIZE (mode) - 1);
>                   else if (!side_effects_p (XEXP (y, 0)))
>                     return CONST0_RTX (mode);
>                   else
> Index: gcc/dse.c
> ===================================================================
> --- gcc/dse.c   2017-11-20 20:37:41.918226976 +0000
> +++ gcc/dse.c   2017-11-20 20:37:51.660320782 +0000
> @@ -1605,8 +1605,9 @@ find_shift_sequence (int access_size,
>                                      store_mode, byte);
>           if (ret && CONSTANT_P (ret))
>             {
> +             rtx shift_rtx = gen_int_shift_amount (new_mode, shift);
>               ret = simplify_const_binary_operation (LSHIFTRT, new_mode,
> -                                                    ret, GEN_INT (shift));
> +                                                    ret, shift_rtx);
>               if (ret && CONSTANT_P (ret))
>                 {
>                   byte = subreg_lowpart_offset (read_mode, new_mode);
> @@ -1642,7 +1643,8 @@ find_shift_sequence (int access_size,
>          of one dsp where the cost of these two was not the same.  But
>          this really is a rare case anyway.  */
>        target = expand_binop (new_mode, lshr_optab, new_reg,
> -                            GEN_INT (shift), new_reg, 1, OPTAB_DIRECT);
> +                            gen_int_shift_amount (new_mode, shift),
> +                            new_reg, 1, OPTAB_DIRECT);
>
>        shift_seq = get_insns ();
>        end_sequence ();
> Index: gcc/expmed.c
> ===================================================================
> --- gcc/expmed.c        2017-11-20 20:37:41.918226976 +0000
> +++ gcc/expmed.c        2017-11-20 20:37:51.661320782 +0000
> @@ -222,7 +222,8 @@ init_expmed_one_mode (struct init_expmed
>           PUT_MODE (all->zext, wider_mode);
>           PUT_MODE (all->wide_mult, wider_mode);
>           PUT_MODE (all->wide_lshr, wider_mode);
> -         XEXP (all->wide_lshr, 1) = GEN_INT (mode_bitsize);
> +         XEXP (all->wide_lshr, 1)
> +           = gen_int_shift_amount (wider_mode, mode_bitsize);
>
>           set_mul_widen_cost (speed, wider_mode,
>                               set_src_cost (all->wide_mult, wider_mode, speed));
> @@ -909,12 +910,14 @@ store_bit_field_1 (rtx str_rtx, unsigned
>              to make sure that for big-endian machines the higher order
>              bits are used.  */
>           if (new_bitsize < BITS_PER_WORD && BYTES_BIG_ENDIAN && !backwards)
> -           value_word = simplify_expand_binop (word_mode, lshr_optab,
> -                                               value_word,
> -                                               GEN_INT (BITS_PER_WORD
> -                                                        - new_bitsize),
> -                                               NULL_RTX, true,
> -                                               OPTAB_LIB_WIDEN);
> +           {
> +             int shift = BITS_PER_WORD - new_bitsize;
> +             rtx shift_rtx = gen_int_shift_amount (word_mode, shift);
> +             value_word = simplify_expand_binop (word_mode, lshr_optab,
> +                                                 value_word, shift_rtx,
> +                                                 NULL_RTX, true,
> +                                                 OPTAB_LIB_WIDEN);
> +           }
>
>           if (!store_bit_field_1 (op0, new_bitsize,
>                                   bitnum + bit_offset,
> @@ -2365,8 +2368,9 @@ expand_shift_1 (enum tree_code code, mac
>        if (CONST_INT_P (op1)
>           && ((unsigned HOST_WIDE_INT) INTVAL (op1) >=
>               (unsigned HOST_WIDE_INT) GET_MODE_BITSIZE (scalar_mode)))
> -       op1 = GEN_INT ((unsigned HOST_WIDE_INT) INTVAL (op1)
> -                      % GET_MODE_BITSIZE (scalar_mode));
> +       op1 = gen_int_shift_amount (mode,
> +                                   (unsigned HOST_WIDE_INT) INTVAL (op1)
> +                                   % GET_MODE_BITSIZE (scalar_mode));
>        else if (GET_CODE (op1) == SUBREG
>                && subreg_lowpart_p (op1)
>                && SCALAR_INT_MODE_P (GET_MODE (SUBREG_REG (op1)))
> @@ -2383,7 +2387,8 @@ expand_shift_1 (enum tree_code code, mac
>        && IN_RANGE (INTVAL (op1), GET_MODE_BITSIZE (scalar_mode) / 2 + left,
>                    GET_MODE_BITSIZE (scalar_mode) - 1))
>      {
> -      op1 = GEN_INT (GET_MODE_BITSIZE (scalar_mode) - INTVAL (op1));
> +      op1 = gen_int_shift_amount (mode, (GET_MODE_BITSIZE (scalar_mode)
> +                                        - INTVAL (op1)));
>        left = !left;
>        code = left ? LROTATE_EXPR : RROTATE_EXPR;
>      }
> @@ -2463,8 +2468,8 @@ expand_shift_1 (enum tree_code code, mac
>               if (op1 == const0_rtx)
>                 return shifted;
>               else if (CONST_INT_P (op1))
> -               other_amount = GEN_INT (GET_MODE_BITSIZE (scalar_mode)
> -                                       - INTVAL (op1));
> +               other_amount = gen_int_shift_amount
> +                 (mode, GET_MODE_BITSIZE (scalar_mode) - INTVAL (op1));
>               else
>                 {
>                   other_amount
> @@ -2537,8 +2542,9 @@ expand_shift_1 (enum tree_code code, mac
>  expand_shift (enum tree_code code, machine_mode mode, rtx shifted,
>               int amount, rtx target, int unsignedp)
>  {
> -  return expand_shift_1 (code, mode,
> -                        shifted, GEN_INT (amount), target, unsignedp);
> +  return expand_shift_1 (code, mode, shifted,
> +                        gen_int_shift_amount (mode, amount),
> +                        target, unsignedp);
>  }
>
>  /* Likewise, but return 0 if that cannot be done.  */
> @@ -3856,7 +3862,7 @@ expand_smod_pow2 (scalar_int_mode mode,
>         {
>           HOST_WIDE_INT masklow = (HOST_WIDE_INT_1 << logd) - 1;
>           signmask = force_reg (mode, signmask);
> -         shift = GEN_INT (GET_MODE_BITSIZE (mode) - logd);
> +         shift = gen_int_shift_amount (mode, GET_MODE_BITSIZE (mode) - logd);
>
>           /* Use the rtx_cost of a LSHIFTRT instruction to determine
>              which instruction sequence to use.  If logical right shifts
> Index: gcc/lower-subreg.c
> ===================================================================
> --- gcc/lower-subreg.c  2017-11-20 20:37:41.918226976 +0000
> +++ gcc/lower-subreg.c  2017-11-20 20:37:51.661320782 +0000
> @@ -141,7 +141,7 @@ shift_cost (bool speed_p, struct cost_rt
>    PUT_CODE (rtxes->shift, code);
>    PUT_MODE (rtxes->shift, mode);
>    PUT_MODE (rtxes->source, mode);
> -  XEXP (rtxes->shift, 1) = GEN_INT (op1);
> +  XEXP (rtxes->shift, 1) = gen_int_shift_amount (mode, op1);
>    return set_src_cost (rtxes->shift, mode, speed_p);
>  }
>
> Index: gcc/simplify-rtx.c
> ===================================================================
> --- gcc/simplify-rtx.c  2017-11-20 20:37:41.918226976 +0000
> +++ gcc/simplify-rtx.c  2017-11-20 20:37:51.663320783 +0000
> @@ -1165,7 +1165,8 @@ simplify_unary_operation_1 (enum rtx_cod
>           if (STORE_FLAG_VALUE == 1)
>             {
>               temp = simplify_gen_binary (ASHIFTRT, inner, XEXP (op, 0),
> -                                         GEN_INT (isize - 1));
> +                                         gen_int_shift_amount (inner,
> +                                                               isize - 1));
>               if (int_mode == inner)
>                 return temp;
>               if (GET_MODE_PRECISION (int_mode) > isize)
> @@ -1175,7 +1176,8 @@ simplify_unary_operation_1 (enum rtx_cod
>           else if (STORE_FLAG_VALUE == -1)
>             {
>               temp = simplify_gen_binary (LSHIFTRT, inner, XEXP (op, 0),
> -                                         GEN_INT (isize - 1));
> +                                         gen_int_shift_amount (inner,
> +                                                               isize - 1));
>               if (int_mode == inner)
>                 return temp;
>               if (GET_MODE_PRECISION (int_mode) > isize)
> @@ -2672,7 +2674,8 @@ simplify_binary_operation_1 (enum rtx_co
>         {
>           val = wi::exact_log2 (rtx_mode_t (trueop1, mode));
>           if (val >= 0)
> -           return simplify_gen_binary (ASHIFT, mode, op0, GEN_INT (val));
> +           return simplify_gen_binary (ASHIFT, mode, op0,
> +                                       gen_int_shift_amount (mode, val));
>         }
>
>        /* x*2 is x+x and x*(-1) is -x */
> @@ -3296,7 +3299,8 @@ simplify_binary_operation_1 (enum rtx_co
>        /* Convert divide by power of two into shift.  */
>        if (CONST_INT_P (trueop1)
>           && (val = exact_log2 (UINTVAL (trueop1))) > 0)
> -       return simplify_gen_binary (LSHIFTRT, mode, op0, GEN_INT (val));
> +       return simplify_gen_binary (LSHIFTRT, mode, op0,
> +                                   gen_int_shift_amount (mode, val));
>        break;
>
>      case DIV:
> @@ -3416,10 +3420,12 @@ simplify_binary_operation_1 (enum rtx_co
>           && IN_RANGE (INTVAL (trueop1),
>                        GET_MODE_UNIT_PRECISION (mode) / 2 + (code == ROTATE),
>                        GET_MODE_UNIT_PRECISION (mode) - 1))
> -       return simplify_gen_binary (code == ROTATE ? ROTATERT : ROTATE,
> -                                   mode, op0,
> -                                   GEN_INT (GET_MODE_UNIT_PRECISION (mode)
> -                                            - INTVAL (trueop1)));
> +       {
> +         int new_amount = GET_MODE_UNIT_PRECISION (mode) - INTVAL (trueop1);
> +         rtx new_amount_rtx = gen_int_shift_amount (mode, new_amount);
> +         return simplify_gen_binary (code == ROTATE ? ROTATERT : ROTATE,
> +                                     mode, op0, new_amount_rtx);
> +       }
>  #endif
>        /* FALLTHRU */
>      case ASHIFTRT:
> @@ -3460,8 +3466,8 @@ simplify_binary_operation_1 (enum rtx_co
>               == GET_MODE_BITSIZE (inner_mode) - GET_MODE_BITSIZE (int_mode))
>           && subreg_lowpart_p (op0))
>         {
> -         rtx tmp = GEN_INT (INTVAL (XEXP (SUBREG_REG (op0), 1))
> -                            + INTVAL (op1));
> +         rtx tmp = gen_int_shift_amount
> +           (inner_mode, INTVAL (XEXP (SUBREG_REG (op0), 1)) + INTVAL (op1));
>           tmp = simplify_gen_binary (code, inner_mode,
>                                      XEXP (SUBREG_REG (op0), 0),
>                                      tmp);
> @@ -3472,7 +3478,8 @@ simplify_binary_operation_1 (enum rtx_co
>         {
>           val = INTVAL (op1) & (GET_MODE_UNIT_PRECISION (mode) - 1);
>           if (val != INTVAL (op1))
> -           return simplify_gen_binary (code, mode, op0, GEN_INT (val));
> +           return simplify_gen_binary (code, mode, op0,
> +                                       gen_int_shift_amount (mode, val));
>         }
>        break;
>
> Index: gcc/combine.c
> ===================================================================
> --- gcc/combine.c       2017-11-20 20:37:41.918226976 +0000
> +++ gcc/combine.c       2017-11-20 20:37:51.659320782 +0000
> @@ -3792,8 +3792,9 @@ try_combine (rtx_insn *i3, rtx_insn *i2,
>               && INTVAL (XEXP (*split, 1)) > 0
>               && (i = exact_log2 (UINTVAL (XEXP (*split, 1)))) >= 0)
>             {
> +             rtx i_rtx = gen_int_shift_amount (split_mode, i);
>               SUBST (*split, gen_rtx_ASHIFT (split_mode,
> -                                            XEXP (*split, 0), GEN_INT (i)));
> +                                            XEXP (*split, 0), i_rtx));
>               /* Update split_code because we may not have a multiply
>                  anymore.  */
>               split_code = GET_CODE (*split);
> @@ -3807,8 +3808,10 @@ try_combine (rtx_insn *i3, rtx_insn *i2,
>               && (i = exact_log2 (UINTVAL (XEXP (XEXP (*split, 0), 1)))) >= 0)
>             {
>               rtx nsplit = XEXP (*split, 0);
> +             rtx i_rtx = gen_int_shift_amount (GET_MODE (nsplit), i);
>               SUBST (XEXP (*split, 0), gen_rtx_ASHIFT (GET_MODE (nsplit),
> -                                            XEXP (nsplit, 0), GEN_INT (i)));
> +                                                      XEXP (nsplit, 0),
> +                                                      i_rtx));
>               /* Update split_code because we may not have a multiply
>                  anymore.  */
>               split_code = GET_CODE (*split);
> @@ -5077,12 +5080,12 @@ find_split_point (rtx *loc, rtx_insn *in
>                                       GET_MODE (XEXP (SET_SRC (x), 0))))))
>             {
>               machine_mode mode = GET_MODE (XEXP (SET_SRC (x), 0));
> -
> +             rtx pos_rtx = gen_int_shift_amount (mode, pos);
>               SUBST (SET_SRC (x),
>                      gen_rtx_NEG (mode,
>                                   gen_rtx_LSHIFTRT (mode,
>                                                     XEXP (SET_SRC (x), 0),
> -                                                   GEN_INT (pos))));
> +                                                   pos_rtx)));
>
>               split = find_split_point (&SET_SRC (x), insn, true);
>               if (split && split != &SET_SRC (x))
> @@ -5140,11 +5143,11 @@ find_split_point (rtx *loc, rtx_insn *in
>             {
>               unsigned HOST_WIDE_INT mask
>                 = (HOST_WIDE_INT_1U << len) - 1;
> +             rtx pos_rtx = gen_int_shift_amount (mode, pos);
>               SUBST (SET_SRC (x),
>                      gen_rtx_AND (mode,
>                                   gen_rtx_LSHIFTRT
> -                                 (mode, gen_lowpart (mode, inner),
> -                                  GEN_INT (pos)),
> +                                 (mode, gen_lowpart (mode, inner), pos_rtx),
>                                   gen_int_mode (mask, mode)));
>
>               split = find_split_point (&SET_SRC (x), insn, true);
> @@ -5153,14 +5156,15 @@ find_split_point (rtx *loc, rtx_insn *in
>             }
>           else
>             {
> +             int left_bits = GET_MODE_PRECISION (mode) - len - pos;
> +             int right_bits = GET_MODE_PRECISION (mode) - len;
>               SUBST (SET_SRC (x),
>                      gen_rtx_fmt_ee
>                      (unsignedp ? LSHIFTRT : ASHIFTRT, mode,
>                       gen_rtx_ASHIFT (mode,
>                                       gen_lowpart (mode, inner),
> -                                     GEN_INT (GET_MODE_PRECISION (mode)
> -                                              - len - pos)),
> -                     GEN_INT (GET_MODE_PRECISION (mode) - len)));
> +                                     gen_int_shift_amount (mode, left_bits)),
> +                     gen_int_shift_amount (mode, right_bits)));
>
>               split = find_split_point (&SET_SRC (x), insn, true);
>               if (split && split != &SET_SRC (x))
> @@ -8935,10 +8939,11 @@ force_int_to_mode (rtx x, scalar_int_mod
>           /* Must be more sign bit copies than the mask needs.  */
>           && ((int) num_sign_bit_copies (XEXP (x, 0), GET_MODE (XEXP (x, 0)))
>               >= exact_log2 (mask + 1)))
> -       x = simplify_gen_binary (LSHIFTRT, xmode, XEXP (x, 0),
> -                                GEN_INT (GET_MODE_PRECISION (xmode)
> -                                         - exact_log2 (mask + 1)));
> -
> +       {
> +         int nbits = GET_MODE_PRECISION (xmode) - exact_log2 (mask + 1);
> +         x = simplify_gen_binary (LSHIFTRT, xmode, XEXP (x, 0),
> +                                  gen_int_shift_amount (xmode, nbits));
> +       }
>        goto shiftrt;
>
>      case ASHIFTRT:
> @@ -10431,7 +10436,7 @@ simplify_shift_const_1 (enum rtx_code co
>  {
>    enum rtx_code orig_code = code;
>    rtx orig_varop = varop;
> -  int count;
> +  int count, log2;
>    machine_mode mode = result_mode;
>    machine_mode shift_mode;
>    scalar_int_mode tmode, inner_mode, int_mode, int_varop_mode, int_result_mode;
> @@ -10634,13 +10639,11 @@ simplify_shift_const_1 (enum rtx_code co
>              is cheaper.  But it is still better on those machines to
>              merge two shifts into one.  */
>           if (CONST_INT_P (XEXP (varop, 1))
> -             && exact_log2 (UINTVAL (XEXP (varop, 1))) >= 0)
> +             && (log2 = exact_log2 (UINTVAL (XEXP (varop, 1)))) >= 0)
>             {
> -             varop
> -               = simplify_gen_binary (ASHIFT, GET_MODE (varop),
> -                                      XEXP (varop, 0),
> -                                      GEN_INT (exact_log2 (
> -                                               UINTVAL (XEXP (varop, 1)))));
> +             rtx log2_rtx = gen_int_shift_amount (GET_MODE (varop), log2);
> +             varop = simplify_gen_binary (ASHIFT, GET_MODE (varop),
> +                                          XEXP (varop, 0), log2_rtx);
>               continue;
>             }
>           break;
> @@ -10648,13 +10651,11 @@ simplify_shift_const_1 (enum rtx_code co
>         case UDIV:
>           /* Similar, for when divides are cheaper.  */
>           if (CONST_INT_P (XEXP (varop, 1))
> -             && exact_log2 (UINTVAL (XEXP (varop, 1))) >= 0)
> +             && (log2 = exact_log2 (UINTVAL (XEXP (varop, 1)))) >= 0)
>             {
> -             varop
> -               = simplify_gen_binary (LSHIFTRT, GET_MODE (varop),
> -                                      XEXP (varop, 0),
> -                                      GEN_INT (exact_log2 (
> -                                               UINTVAL (XEXP (varop, 1)))));
> +             rtx log2_rtx = gen_int_shift_amount (GET_MODE (varop), log2);
> +             varop = simplify_gen_binary (LSHIFTRT, GET_MODE (varop),
> +                                          XEXP (varop, 0), log2_rtx);
>               continue;
>             }
>           break;
> @@ -10789,10 +10790,10 @@ simplify_shift_const_1 (enum rtx_code co
>
>               mask_rtx = gen_int_mode (nonzero_bits (varop, int_varop_mode),
>                                        int_result_mode);
> -
> +             rtx count_rtx = gen_int_shift_amount (int_result_mode, count);
>               mask_rtx
>                 = simplify_const_binary_operation (code, int_result_mode,
> -                                                  mask_rtx, GEN_INT (count));
> +                                                  mask_rtx, count_rtx);
>
>               /* Give up if we can't compute an outer operation to use.  */
>               if (mask_rtx == 0
> @@ -10848,9 +10849,10 @@ simplify_shift_const_1 (enum rtx_code co
>               if (code == ASHIFTRT && int_mode != int_result_mode)
>                 break;
>
> +             rtx count_rtx = gen_int_shift_amount (int_result_mode, count);
>               rtx new_rtx = simplify_const_binary_operation (code, int_mode,
>                                                              XEXP (varop, 0),
> -                                                            GEN_INT (count));
> +                                                            count_rtx);
>               varop = gen_rtx_fmt_ee (code, int_mode, new_rtx, XEXP (varop, 1));
>               count = 0;
>               continue;
> @@ -10916,7 +10918,7 @@ simplify_shift_const_1 (enum rtx_code co
>               && (new_rtx = simplify_const_binary_operation
>                   (code, int_result_mode,
>                    gen_int_mode (INTVAL (XEXP (varop, 1)), int_result_mode),
> -                  GEN_INT (count))) != 0
> +                  gen_int_shift_amount (int_result_mode, count))) != 0
>               && CONST_INT_P (new_rtx)
>               && merge_outer_ops (&outer_op, &outer_const, GET_CODE (varop),
>                                   INTVAL (new_rtx), int_result_mode,
> @@ -11059,7 +11061,7 @@ simplify_shift_const_1 (enum rtx_code co
>               && (new_rtx = simplify_const_binary_operation
>                   (ASHIFT, int_result_mode,
>                    gen_int_mode (INTVAL (XEXP (varop, 1)), int_result_mode),
> -                  GEN_INT (count))) != 0
> +                  gen_int_shift_amount (int_result_mode, count))) != 0
>               && CONST_INT_P (new_rtx)
>               && merge_outer_ops (&outer_op, &outer_const, PLUS,
>                                   INTVAL (new_rtx), int_result_mode,
> @@ -11080,7 +11082,7 @@ simplify_shift_const_1 (enum rtx_code co
>               && (new_rtx = simplify_const_binary_operation
>                   (code, int_result_mode,
>                    gen_int_mode (INTVAL (XEXP (varop, 1)), int_result_mode),
> -                  GEN_INT (count))) != 0
> +                  gen_int_shift_amount (int_result_mode, count))) != 0
>               && CONST_INT_P (new_rtx)
>               && merge_outer_ops (&outer_op, &outer_const, XOR,
>                                   INTVAL (new_rtx), int_result_mode,
> @@ -11135,12 +11137,12 @@ simplify_shift_const_1 (enum rtx_code co
>                       - GET_MODE_UNIT_PRECISION (GET_MODE (varop)))))
>             {
>               rtx varop_inner = XEXP (varop, 0);
> -
> -             varop_inner
> -               = gen_rtx_LSHIFTRT (GET_MODE (varop_inner),
> -                                   XEXP (varop_inner, 0),
> -                                   GEN_INT
> -                                   (count + INTVAL (XEXP (varop_inner, 1))));
> +             int new_count = count + INTVAL (XEXP (varop_inner, 1));
> +             rtx new_count_rtx = gen_int_shift_amount (GET_MODE (varop_inner),
> +                                                       new_count);
> +             varop_inner = gen_rtx_LSHIFTRT (GET_MODE (varop_inner),
> +                                             XEXP (varop_inner, 0),
> +                                             new_count_rtx);
>               varop = gen_rtx_TRUNCATE (GET_MODE (varop), varop_inner);
>               count = 0;
>               continue;
> @@ -11192,7 +11194,8 @@ simplify_shift_const_1 (enum rtx_code co
>      x = NULL_RTX;
>
>    if (x == NULL_RTX)
> -    x = simplify_gen_binary (code, shift_mode, varop, GEN_INT (count));
> +    x = simplify_gen_binary (code, shift_mode, varop,
> +                            gen_int_shift_amount (shift_mode, count));
>
>    /* If we were doing an LSHIFTRT in a wider mode than it was originally,
>       turn off all the bits that the shift would have turned off.  */
> @@ -11254,7 +11257,8 @@ simplify_shift_const (rtx x, enum rtx_co
>      return tem;
>
>    if (!x)
> -    x = simplify_gen_binary (code, GET_MODE (varop), varop, GEN_INT (count));
> +    x = simplify_gen_binary (code, GET_MODE (varop), varop,
> +                            gen_int_shift_amount (GET_MODE (varop), count));
>    if (GET_MODE (x) != result_mode)
>      x = gen_lowpart (result_mode, x);
>    return x;
> @@ -11445,8 +11449,9 @@ change_zero_ext (rtx pat)
>           if (BITS_BIG_ENDIAN)
>             start = GET_MODE_PRECISION (inner_mode) - size - start;
>
> -         if (start)
> -           x = gen_rtx_LSHIFTRT (inner_mode, XEXP (x, 0), GEN_INT (start));
> +         if (start != 0)
> +           x = gen_rtx_LSHIFTRT (inner_mode, XEXP (x, 0),
> +                                 gen_int_shift_amount (inner_mode, start));
>           else
>             x = XEXP (x, 0);
>           if (mode != inner_mode)
> Index: gcc/optabs.c
> ===================================================================
> --- gcc/optabs.c        2017-11-20 20:37:41.918226976 +0000
> +++ gcc/optabs.c        2017-11-20 20:37:51.662320782 +0000
> @@ -431,8 +431,9 @@ expand_superword_shift (optab binoptab,
>        if (binoptab != ashr_optab)
>         emit_move_insn (outof_target, CONST0_RTX (word_mode));
>        else
> -       if (!force_expand_binop (word_mode, binoptab,
> -                                outof_input, GEN_INT (BITS_PER_WORD - 1),
> +       if (!force_expand_binop (word_mode, binoptab, outof_input,
> +                                gen_int_shift_amount (word_mode,
> +                                                      BITS_PER_WORD - 1),
>                                  outof_target, unsignedp, methods))
>           return false;
>      }
> @@ -789,7 +790,8 @@ expand_doubleword_mult (machine_mode mod
>  {
>    int low = (WORDS_BIG_ENDIAN ? 1 : 0);
>    int high = (WORDS_BIG_ENDIAN ? 0 : 1);
> -  rtx wordm1 = umulp ? NULL_RTX : GEN_INT (BITS_PER_WORD - 1);
> +  rtx wordm1 = (umulp ? NULL_RTX
> +               : gen_int_shift_amount (word_mode, BITS_PER_WORD - 1));
>    rtx product, adjust, product_high, temp;
>
>    rtx op0_high = operand_subword_force (op0, high, mode);
> @@ -1185,7 +1187,7 @@ expand_binop (machine_mode mode, optab b
>        unsigned int bits = GET_MODE_PRECISION (int_mode);
>
>        if (CONST_INT_P (op1))
> -        newop1 = GEN_INT (bits - INTVAL (op1));
> +       newop1 = gen_int_shift_amount (int_mode, bits - INTVAL (op1));
>        else if (targetm.shift_truncation_mask (int_mode) == bits - 1)
>          newop1 = negate_rtx (GET_MODE (op1), op1);
>        else
> @@ -1403,7 +1405,7 @@ expand_binop (machine_mode mode, optab b
>
>        /* Apply the truncation to constant shifts.  */
>        if (double_shift_mask > 0 && CONST_INT_P (op1))
> -       op1 = GEN_INT (INTVAL (op1) & double_shift_mask);
> +       op1 = gen_int_mode (INTVAL (op1) & double_shift_mask, op1_mode);
>
>        if (op1 == CONST0_RTX (op1_mode))
>         return op0;
> @@ -1513,7 +1515,7 @@ expand_binop (machine_mode mode, optab b
>        else
>         {
>           rtx into_temp1, into_temp2, outof_temp1, outof_temp2;
> -         rtx first_shift_count, second_shift_count;
> +         HOST_WIDE_INT first_shift_count, second_shift_count;
>           optab reverse_unsigned_shift, unsigned_shift;
>
>           reverse_unsigned_shift = (left_shift ^ (shift_count < BITS_PER_WORD)
> @@ -1524,20 +1526,24 @@ expand_binop (machine_mode mode, optab b
>
>           if (shift_count > BITS_PER_WORD)
>             {
> -             first_shift_count = GEN_INT (shift_count - BITS_PER_WORD);
> -             second_shift_count = GEN_INT (2 * BITS_PER_WORD - shift_count);
> +             first_shift_count = shift_count - BITS_PER_WORD;
> +             second_shift_count = 2 * BITS_PER_WORD - shift_count;
>             }
>           else
>             {
> -             first_shift_count = GEN_INT (BITS_PER_WORD - shift_count);
> -             second_shift_count = GEN_INT (shift_count);
> +             first_shift_count = BITS_PER_WORD - shift_count;
> +             second_shift_count = shift_count;
>             }
> +         rtx first_shift_count_rtx
> +           = gen_int_shift_amount (word_mode, first_shift_count);
> +         rtx second_shift_count_rtx
> +           = gen_int_shift_amount (word_mode, second_shift_count);
>
>           into_temp1 = expand_binop (word_mode, unsigned_shift,
> -                                    outof_input, first_shift_count,
> +                                    outof_input, first_shift_count_rtx,
>                                      NULL_RTX, unsignedp, next_methods);
>           into_temp2 = expand_binop (word_mode, reverse_unsigned_shift,
> -                                    into_input, second_shift_count,
> +                                    into_input, second_shift_count_rtx,
>                                      NULL_RTX, unsignedp, next_methods);
>
>           if (into_temp1 != 0 && into_temp2 != 0)
> @@ -1550,10 +1556,10 @@ expand_binop (machine_mode mode, optab b
>             emit_move_insn (into_target, inter);
>
>           outof_temp1 = expand_binop (word_mode, unsigned_shift,
> -                                     into_input, first_shift_count,
> +                                     into_input, first_shift_count_rtx,
>                                       NULL_RTX, unsignedp, next_methods);
>           outof_temp2 = expand_binop (word_mode, reverse_unsigned_shift,
> -                                     outof_input, second_shift_count,
> +                                     outof_input, second_shift_count_rtx,
>                                       NULL_RTX, unsignedp, next_methods);
>
>           if (inter != 0 && outof_temp1 != 0 && outof_temp2 != 0)
> @@ -2793,25 +2799,29 @@ expand_unop (machine_mode mode, optab un
>
>           if (optab_handler (rotl_optab, mode) != CODE_FOR_nothing)
>             {
> -             temp = expand_binop (mode, rotl_optab, op0, GEN_INT (8), target,
> -                                  unsignedp, OPTAB_DIRECT);
> +             temp = expand_binop (mode, rotl_optab, op0,
> +                                  gen_int_shift_amount (mode, 8),
> +                                  target, unsignedp, OPTAB_DIRECT);
>               if (temp)
>                 return temp;
>              }
>
>           if (optab_handler (rotr_optab, mode) != CODE_FOR_nothing)
>             {
> -             temp = expand_binop (mode, rotr_optab, op0, GEN_INT (8), target,
> -                                  unsignedp, OPTAB_DIRECT);
> +             temp = expand_binop (mode, rotr_optab, op0,
> +                                  gen_int_shift_amount (mode, 8),
> +                                  target, unsignedp, OPTAB_DIRECT);
>               if (temp)
>                 return temp;
>             }
>
>           last = get_last_insn ();
>
> -         temp1 = expand_binop (mode, ashl_optab, op0, GEN_INT (8), NULL_RTX,
> +         temp1 = expand_binop (mode, ashl_optab, op0,
> +                               gen_int_shift_amount (mode, 8), NULL_RTX,
>                                 unsignedp, OPTAB_WIDEN);
> -         temp2 = expand_binop (mode, lshr_optab, op0, GEN_INT (8), NULL_RTX,
> +         temp2 = expand_binop (mode, lshr_optab, op0,
> +                               gen_int_shift_amount (mode, 8), NULL_RTX,
>                                 unsignedp, OPTAB_WIDEN);
>           if (temp1 && temp2)
>             {
> @@ -5369,11 +5379,11 @@ vector_compare_rtx (machine_mode cmp_mod
>  }
>
>  /* Checks if vec_perm mask SEL is a constant equivalent to a shift of the first
> -   vec_perm operand, assuming the second operand is a constant vector of zeroes.
> -   Return the shift distance in bits if so, or NULL_RTX if the vec_perm is not a
> -   shift.  */
> +   vec_perm operand (which has mode OP0_MODE), assuming the second
> +   operand is a constant vector of zeroes.  Return the shift distance in
> +   bits if so, or NULL_RTX if the vec_perm is not a shift.  */
>  static rtx
> -shift_amt_for_vec_perm_mask (rtx sel)
> +shift_amt_for_vec_perm_mask (machine_mode op0_mode, rtx sel)
>  {
>    unsigned int i, first, nelt = GET_MODE_NUNITS (GET_MODE (sel));
>    unsigned int bitsize = GET_MODE_UNIT_BITSIZE (GET_MODE (sel));
> @@ -5393,7 +5403,7 @@ shift_amt_for_vec_perm_mask (rtx sel)
>         return NULL_RTX;
>      }
>
> -  return GEN_INT (first * bitsize);
> +  return gen_int_shift_amount (op0_mode, first * bitsize);
>  }
>
>  /* A subroutine of expand_vec_perm for expanding one vec_perm insn.  */
> @@ -5473,7 +5483,7 @@ expand_vec_perm (machine_mode mode, rtx
>           && (shift_code != CODE_FOR_nothing
>               || shift_code_qi != CODE_FOR_nothing))
>         {
> -         shift_amt = shift_amt_for_vec_perm_mask (sel);
> +         shift_amt = shift_amt_for_vec_perm_mask (mode, sel);
>           if (shift_amt)
>             {
>               struct expand_operand ops[3];
> @@ -5563,7 +5573,8 @@ expand_vec_perm (machine_mode mode, rtx
>                                    NULL, 0, OPTAB_DIRECT);
>        else
>         sel = expand_simple_binop (selmode, ASHIFT, sel,
> -                                  GEN_INT (exact_log2 (u)),
> +                                  gen_int_shift_amount (selmode,
> +                                                        exact_log2 (u)),
>                                    NULL, 0, OPTAB_DIRECT);
>        gcc_assert (sel != NULL);
>

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [05/nn] Add VEC_DUPLICATE_{CST,EXPR} and associated optab
  2017-10-23 11:21 ` [05/nn] Add VEC_DUPLICATE_{CST,EXPR} and associated optab Richard Sandiford
  2017-10-26 11:53   ` Richard Biener
@ 2017-12-15  0:29   ` Richard Sandiford
  2017-12-15  8:58     ` Richard Biener
  1 sibling, 1 reply; 90+ messages in thread
From: Richard Sandiford @ 2017-12-15  0:29 UTC (permalink / raw)
  To: gcc-patches

This patch just adds VEC_DUPLICATE_EXPR, since the VEC_DUPLICATE_CST
isn't needed with the new VECTOR_CST layout.  It's really just the
original patch with bits removed, but just in case:

Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64-linux-gnu.
OK to install?
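
(For reference, a minimal sketch of the new code in use -- the
4-element vector type and the scalar X are illustrative, not part of
the patch:

    tree vectype = build_vector_type (integer_type_node, 4);
    /* Vector in which every element equals the scalar X.  */
    tree dup = fold_build1 (VEC_DUPLICATE_EXPR, vectype, x);

For constant X this folds straight to a VECTOR_CST via const_unop and
build_vector_from_val, which is what test_vec_duplicate_folding below
checks.)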

Richard


2017-12-15  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* doc/generic.texi (VEC_DUPLICATE_EXPR): Document.
	(VEC_COND_EXPR): Add missing @tindex.
	* doc/md.texi (vec_duplicate@var{m}): Document.
	* tree.def (VEC_DUPLICATE_EXPR): New tree code.
	* tree.c (build_vector_from_val): Add stubbed-out handling of
	variable-length vectors, using VEC_DUPLICATE_EXPR.
	(uniform_vector_p): Handle VEC_DUPLICATE_EXPR.
	* cfgexpand.c (expand_debug_expr): Likewise.
	* tree-cfg.c (verify_gimple_assign_unary): Likewise.
	* tree-inline.c (estimate_operator_cost): Likewise.
	* tree-pretty-print.c (dump_generic_node): Likewise.
	* tree-vect-generic.c (ssa_uniform_vector_p): Likewise.
	* fold-const.c (const_unop): Fold VEC_DUPLICATE_EXPRs of a constant.
	(test_vec_duplicate_folding): New function.
	(fold_const_c_tests): Call it.
	* optabs.def (vec_duplicate_optab): New optab.
	* optabs-tree.c (optab_for_tree_code): Handle VEC_DUPLICATE_EXPR.
	* optabs.h (expand_vector_broadcast): Declare.
	* optabs.c (expand_vector_broadcast): Make non-static.  Try using
	vec_duplicate_optab.
	* expr.c (store_constructor): Try using vec_duplicate_optab for
	uniform vectors.
	(expand_expr_real_2): Handle VEC_DUPLICATE_EXPR.

Index: gcc/doc/generic.texi
===================================================================
--- gcc/doc/generic.texi	2017-12-15 00:24:47.213516622 +0000
+++ gcc/doc/generic.texi	2017-12-15 00:24:47.498459276 +0000
@@ -1768,6 +1768,7 @@ a value from @code{enum annot_expr_kind}
 
 @node Vectors
 @subsection Vectors
+@tindex VEC_DUPLICATE_EXPR
 @tindex VEC_LSHIFT_EXPR
 @tindex VEC_RSHIFT_EXPR
 @tindex VEC_WIDEN_MULT_HI_EXPR
@@ -1779,9 +1780,14 @@ a value from @code{enum annot_expr_kind}
 @tindex VEC_PACK_TRUNC_EXPR
 @tindex VEC_PACK_SAT_EXPR
 @tindex VEC_PACK_FIX_TRUNC_EXPR
+@tindex VEC_COND_EXPR
 @tindex SAD_EXPR
 
 @table @code
+@item VEC_DUPLICATE_EXPR
+This node has a single operand and represents a vector in which every
+element is equal to that operand.
+
 @item VEC_LSHIFT_EXPR
 @itemx VEC_RSHIFT_EXPR
 These nodes represent whole vector left and right shifts, respectively.
Index: gcc/doc/md.texi
===================================================================
--- gcc/doc/md.texi	2017-12-15 00:24:47.213516622 +0000
+++ gcc/doc/md.texi	2017-12-15 00:24:47.499459075 +0000
@@ -4888,6 +4888,17 @@ and operand 1 is parallel containing val
 the vector mode @var{m}, or a vector mode with the same element mode and
 smaller number of elements.
 
+@cindex @code{vec_duplicate@var{m}} instruction pattern
+@item @samp{vec_duplicate@var{m}}
+Initialize vector output operand 0 so that each element has the value given
+by scalar input operand 1.  The vector has mode @var{m} and the scalar has
+the mode appropriate for one element of @var{m}.
+
+This pattern only handles duplicates of non-constant inputs.  Constant
+vectors go through the @code{mov@var{m}} pattern instead.
+
+This pattern is not allowed to @code{FAIL}.
+
 @cindex @code{vec_cmp@var{m}@var{n}} instruction pattern
 @item @samp{vec_cmp@var{m}@var{n}}
 Output a vector comparison.  Operand 0 of mode @var{n} is the destination for
Index: gcc/tree.def
===================================================================
--- gcc/tree.def	2017-12-15 00:24:47.213516622 +0000
+++ gcc/tree.def	2017-12-15 00:24:47.505457868 +0000
@@ -537,6 +537,9 @@ DEFTREECODE (TARGET_EXPR, "target_expr",
    1 and 2 are NULL.  The operands are then taken from the cfg edges. */
 DEFTREECODE (COND_EXPR, "cond_expr", tcc_expression, 3)
 
+/* Represents a vector in which every element is equal to operand 0.  */
+DEFTREECODE (VEC_DUPLICATE_EXPR, "vec_duplicate_expr", tcc_unary, 1)
+
 /* Vector conditional expression. It is like COND_EXPR, but with
    vector operands.
 
Index: gcc/tree.c
===================================================================
--- gcc/tree.c	2017-12-15 00:24:47.213516622 +0000
+++ gcc/tree.c	2017-12-15 00:24:47.505457868 +0000
@@ -1785,6 +1785,8 @@ build_vector_from_val (tree vectype, tre
       v.quick_push (sc);
       return v.build ();
     }
+  else if (0)
+    return fold_build1 (VEC_DUPLICATE_EXPR, vectype, sc);
   else
     {
       vec<constructor_elt, va_gc> *v;
@@ -10468,7 +10470,10 @@ uniform_vector_p (const_tree vec)
 
   gcc_assert (VECTOR_TYPE_P (TREE_TYPE (vec)));
 
-  if (TREE_CODE (vec) == VECTOR_CST)
+  if (TREE_CODE (vec) == VEC_DUPLICATE_EXPR)
+    return TREE_OPERAND (vec, 0);
+
+  else if (TREE_CODE (vec) == VECTOR_CST)
     {
       if (VECTOR_CST_NPATTERNS (vec) == 1 && VECTOR_CST_DUPLICATE_P (vec))
 	return VECTOR_CST_ENCODED_ELT (vec, 0);
Index: gcc/cfgexpand.c
===================================================================
--- gcc/cfgexpand.c	2017-12-15 00:24:47.213516622 +0000
+++ gcc/cfgexpand.c	2017-12-15 00:24:47.498459276 +0000
@@ -5069,6 +5069,7 @@ expand_debug_expr (tree exp)
     case VEC_WIDEN_LSHIFT_HI_EXPR:
     case VEC_WIDEN_LSHIFT_LO_EXPR:
     case VEC_PERM_EXPR:
+    case VEC_DUPLICATE_EXPR:
       return NULL;
 
     /* Misc codes.  */
Index: gcc/tree-cfg.c
===================================================================
--- gcc/tree-cfg.c	2017-12-15 00:24:47.213516622 +0000
+++ gcc/tree-cfg.c	2017-12-15 00:24:47.503458270 +0000
@@ -3857,6 +3857,17 @@ verify_gimple_assign_unary (gassign *stm
     case CONJ_EXPR:
       break;
 
+    case VEC_DUPLICATE_EXPR:
+      if (TREE_CODE (lhs_type) != VECTOR_TYPE
+	  || !useless_type_conversion_p (TREE_TYPE (lhs_type), rhs1_type))
+	{
+	  error ("vec_duplicate should be from a scalar to a like vector");
+	  debug_generic_expr (lhs_type);
+	  debug_generic_expr (rhs1_type);
+	  return true;
+	}
+      return false;
+
     default:
       gcc_unreachable ();
     }
Index: gcc/tree-inline.c
===================================================================
--- gcc/tree-inline.c	2017-12-15 00:24:47.213516622 +0000
+++ gcc/tree-inline.c	2017-12-15 00:24:47.504458069 +0000
@@ -3928,6 +3928,7 @@ estimate_operator_cost (enum tree_code c
     case VEC_PACK_FIX_TRUNC_EXPR:
     case VEC_WIDEN_LSHIFT_HI_EXPR:
     case VEC_WIDEN_LSHIFT_LO_EXPR:
+    case VEC_DUPLICATE_EXPR:
 
       return 1;
 
Index: gcc/tree-pretty-print.c
===================================================================
--- gcc/tree-pretty-print.c	2017-12-15 00:24:47.213516622 +0000
+++ gcc/tree-pretty-print.c	2017-12-15 00:24:47.504458069 +0000
@@ -3178,6 +3178,15 @@ dump_generic_node (pretty_printer *pp, t
       pp_string (pp, " > ");
       break;
 
+    case VEC_DUPLICATE_EXPR:
+      pp_space (pp);
+      for (str = get_tree_code_name (code); *str; str++)
+	pp_character (pp, TOUPPER (*str));
+      pp_string (pp, " < ");
+      dump_generic_node (pp, TREE_OPERAND (node, 0), spc, flags, false);
+      pp_string (pp, " > ");
+      break;
+
     case VEC_UNPACK_HI_EXPR:
       pp_string (pp, " VEC_UNPACK_HI_EXPR < ");
       dump_generic_node (pp, TREE_OPERAND (node, 0), spc, flags, false);
Index: gcc/tree-vect-generic.c
===================================================================
--- gcc/tree-vect-generic.c	2017-12-15 00:24:47.213516622 +0000
+++ gcc/tree-vect-generic.c	2017-12-15 00:24:47.504458069 +0000
@@ -1418,6 +1418,7 @@ lower_vec_perm (gimple_stmt_iterator *gs
 ssa_uniform_vector_p (tree op)
 {
   if (TREE_CODE (op) == VECTOR_CST
+      || TREE_CODE (op) == VEC_DUPLICATE_EXPR
       || TREE_CODE (op) == CONSTRUCTOR)
     return uniform_vector_p (op);
   if (TREE_CODE (op) == SSA_NAME)
Index: gcc/fold-const.c
===================================================================
--- gcc/fold-const.c	2017-12-15 00:24:47.213516622 +0000
+++ gcc/fold-const.c	2017-12-15 00:24:47.501458673 +0000
@@ -1771,6 +1771,11 @@ const_unop (enum tree_code code, tree ty
 	return elts.build ();
       }
 
+    case VEC_DUPLICATE_EXPR:
+      if (CONSTANT_CLASS_P (arg0))
+	return build_vector_from_val (type, arg0);
+      return NULL_TREE;
+
     default:
       break;
     }
@@ -14442,6 +14447,22 @@ test_vector_folding ()
   ASSERT_FALSE (integer_nonzerop (fold_build2 (NE_EXPR, res_type, one, one)));
 }
 
+/* Verify folding of VEC_DUPLICATE_EXPRs.  */
+
+static void
+test_vec_duplicate_folding ()
+{
+  scalar_int_mode int_mode = SCALAR_INT_TYPE_MODE (ssizetype);
+  machine_mode vec_mode = targetm.vectorize.preferred_simd_mode (int_mode);
+  /* This will be 1 if VEC_MODE isn't a vector mode.  */
+  unsigned int nunits = GET_MODE_NUNITS (vec_mode);
+
+  tree type = build_vector_type (ssizetype, nunits);
+  tree dup5_expr = fold_unary (VEC_DUPLICATE_EXPR, type, ssize_int (5));
+  tree dup5_cst = build_vector_from_val (type, ssize_int (5));
+  ASSERT_TRUE (operand_equal_p (dup5_expr, dup5_cst, 0));
+}
+
 /* Run all of the selftests within this file.  */
 
 void
@@ -14449,6 +14470,7 @@ fold_const_c_tests ()
 {
   test_arithmetic_folding ();
   test_vector_folding ();
+  test_vec_duplicate_folding ();
 }
 
 } // namespace selftest
Index: gcc/optabs.def
===================================================================
--- gcc/optabs.def	2017-12-15 00:24:47.213516622 +0000
+++ gcc/optabs.def	2017-12-15 00:24:47.502458472 +0000
@@ -363,3 +363,5 @@ OPTAB_D (atomic_xor_optab, "atomic_xor$I
 
 OPTAB_D (get_thread_pointer_optab, "get_thread_pointer$I$a")
 OPTAB_D (set_thread_pointer_optab, "set_thread_pointer$I$a")
+
+OPTAB_DC (vec_duplicate_optab, "vec_duplicate$a", VEC_DUPLICATE)
Index: gcc/optabs-tree.c
===================================================================
--- gcc/optabs-tree.c	2017-12-15 00:24:47.213516622 +0000
+++ gcc/optabs-tree.c	2017-12-15 00:24:47.501458673 +0000
@@ -199,6 +199,9 @@ optab_for_tree_code (enum tree_code code
       return TYPE_UNSIGNED (type) ?
 	vec_pack_ufix_trunc_optab : vec_pack_sfix_trunc_optab;
 
+    case VEC_DUPLICATE_EXPR:
+      return vec_duplicate_optab;
+
     default:
       break;
     }
Index: gcc/optabs.h
===================================================================
--- gcc/optabs.h	2017-12-15 00:24:47.213516622 +0000
+++ gcc/optabs.h	2017-12-15 00:24:47.502458472 +0000
@@ -182,6 +182,7 @@ extern rtx simplify_expand_binop (machin
 				  enum optab_methods methods);
 extern bool force_expand_binop (machine_mode, optab, rtx, rtx, rtx, int,
 				enum optab_methods);
+extern rtx expand_vector_broadcast (machine_mode, rtx);
 
 /* Generate code for a simple binary or unary operation.  "Simple" in
    this case means "can be unambiguously described by a (mode, code)
Index: gcc/optabs.c
===================================================================
--- gcc/optabs.c	2017-12-15 00:24:47.213516622 +0000
+++ gcc/optabs.c	2017-12-15 00:24:47.502458472 +0000
@@ -367,7 +367,7 @@ force_expand_binop (machine_mode mode, o
    mode of OP must be the element mode of VMODE.  If OP is a constant,
    then the return value will be a constant.  */
 
-static rtx
+rtx
 expand_vector_broadcast (machine_mode vmode, rtx op)
 {
   enum insn_code icode;
@@ -380,6 +380,16 @@ expand_vector_broadcast (machine_mode vm
   if (valid_for_const_vec_duplicate_p (vmode, op))
     return gen_const_vec_duplicate (vmode, op);
 
+  icode = optab_handler (vec_duplicate_optab, vmode);
+  if (icode != CODE_FOR_nothing)
+    {
+      struct expand_operand ops[2];
+      create_output_operand (&ops[0], NULL_RTX, vmode);
+      create_input_operand (&ops[1], op, GET_MODE (op));
+      expand_insn (icode, 2, ops);
+      return ops[0].value;
+    }
+
   /* ??? If the target doesn't have a vec_init, then we have no easy way
      of performing this operation.  Most of this sort of generic support
      is hidden away in the vector lowering support in gimple.  */
Index: gcc/expr.c
===================================================================
--- gcc/expr.c	2017-12-15 00:24:47.213516622 +0000
+++ gcc/expr.c	2017-12-15 00:24:47.500458874 +0000
@@ -6598,7 +6598,8 @@ store_constructor (tree exp, rtx target,
 	constructor_elt *ce;
 	int i;
 	int need_to_clear;
-	int icode = CODE_FOR_nothing;
+	insn_code icode = CODE_FOR_nothing;
+	tree elt;
 	tree elttype = TREE_TYPE (type);
 	int elt_size = tree_to_uhwi (TYPE_SIZE (elttype));
 	machine_mode eltmode = TYPE_MODE (elttype);
@@ -6608,13 +6609,30 @@ store_constructor (tree exp, rtx target,
 	unsigned n_elts;
 	alias_set_type alias;
 	bool vec_vec_init_p = false;
+	machine_mode mode = GET_MODE (target);
 
 	gcc_assert (eltmode != BLKmode);
 
+	/* Try using vec_duplicate_optab for uniform vectors.  */
+	if (!TREE_SIDE_EFFECTS (exp)
+	    && VECTOR_MODE_P (mode)
+	    && eltmode == GET_MODE_INNER (mode)
+	    && ((icode = optab_handler (vec_duplicate_optab, mode))
+		!= CODE_FOR_nothing)
+	    && (elt = uniform_vector_p (exp)))
+	  {
+	    struct expand_operand ops[2];
+	    create_output_operand (&ops[0], target, mode);
+	    create_input_operand (&ops[1], expand_normal (elt), eltmode);
+	    expand_insn (icode, 2, ops);
+	    if (!rtx_equal_p (target, ops[0].value))
+	      emit_move_insn (target, ops[0].value);
+	    break;
+	  }
+
 	n_elts = TYPE_VECTOR_SUBPARTS (type);
-	if (REG_P (target) && VECTOR_MODE_P (GET_MODE (target)))
+	if (REG_P (target) && VECTOR_MODE_P (mode))
 	  {
-	    machine_mode mode = GET_MODE (target);
 	    machine_mode emode = eltmode;
 
 	    if (CONSTRUCTOR_NELTS (exp)
@@ -6626,7 +6644,7 @@ store_constructor (tree exp, rtx target,
 			    == n_elts);
 		emode = TYPE_MODE (etype);
 	      }
-	    icode = (int) convert_optab_handler (vec_init_optab, mode, emode);
+	    icode = convert_optab_handler (vec_init_optab, mode, emode);
 	    if (icode != CODE_FOR_nothing)
 	      {
 		unsigned int i, n = n_elts;
@@ -6674,7 +6692,7 @@ store_constructor (tree exp, rtx target,
 	if (need_to_clear && size > 0 && !vector)
 	  {
 	    if (REG_P (target))
-	      emit_move_insn (target, CONST0_RTX (GET_MODE (target)));
+	      emit_move_insn (target, CONST0_RTX (mode));
 	    else
 	      clear_storage (target, GEN_INT (size), BLOCK_OP_NORMAL);
 	    cleared = 1;
@@ -6682,7 +6700,7 @@ store_constructor (tree exp, rtx target,
 
 	/* Inform later passes that the old value is dead.  */
 	if (!cleared && !vector && REG_P (target))
-	  emit_move_insn (target, CONST0_RTX (GET_MODE (target)));
+	  emit_move_insn (target, CONST0_RTX (mode));
 
         if (MEM_P (target))
 	  alias = MEM_ALIAS_SET (target);
@@ -6733,8 +6751,7 @@ store_constructor (tree exp, rtx target,
 
 	if (vector)
 	  emit_insn (GEN_FCN (icode) (target,
-				      gen_rtx_PARALLEL (GET_MODE (target),
-							vector)));
+				      gen_rtx_PARALLEL (mode, vector)));
 	break;
       }
 
@@ -9563,6 +9580,12 @@ #define REDUCE_BIT_FIELD(expr)	(reduce_b
       target = expand_vec_cond_expr (type, treeop0, treeop1, treeop2, target);
       return target;
 
+    case VEC_DUPLICATE_EXPR:
+      op0 = expand_expr (treeop0, NULL_RTX, VOIDmode, modifier);
+      target = expand_vector_broadcast (mode, op0);
+      gcc_assert (target);
+      return target;
+
     case BIT_INSERT_EXPR:
       {
 	unsigned bitpos = tree_to_uhwi (treeop2);

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [06/nn] Add VEC_SERIES_{CST,EXPR} and associated optab
  2017-10-23 11:22 ` [06/nn] Add VEC_SERIES_{CST,EXPR} and associated optab Richard Sandiford
  2017-10-26 12:26   ` Richard Biener
@ 2017-12-15  0:34   ` Richard Sandiford
  2017-12-15  9:03     ` Richard Biener
  1 sibling, 1 reply; 90+ messages in thread
From: Richard Sandiford @ 2017-12-15  0:34 UTC (permalink / raw)
  To: gcc-patches

Similarly to the updated 05 patch, this patch just adds VEC_SERIES_EXPR,
since the VEC_SERIES_CST isn't needed with the new VECTOR_CST layout.
build_vec_series now uses the new VECTOR_CST layout, but otherwise
this is just the original patch with bits removed.

Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64-linux-gnu.
OK to install?
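
(For reference, a minimal sketch of build_vec_series as added by the
patch -- the type and the non-constant BASE are illustrative:

    tree vectype = build_vector_type (ssizetype, 4);
    /* Constant base and step: folds to the VECTOR_CST { 1, 3, 5, 7 },
       using the stepped VECTOR_CST encoding.  */
    tree cst = build_vec_series (vectype, ssize_int (1), ssize_int (2));
    /* Non-constant base: builds a VEC_SERIES_EXPR <base, 2> node.  */
    tree ser = build_vec_series (vectype, base, ssize_int (2));

A zero step degenerates to build_vector_from_val.)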

Richard


2017-12-15  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* doc/generic.texi (VEC_SERIES_EXPR): Document.
	* doc/md.texi (vec_series@var{m}): Document.
	* tree.def (VEC_SERIES_EXPR): New tree code.
	* tree.h (build_vec_series): Declare.
	* tree.c (build_vec_series): New function.
	* cfgexpand.c (expand_debug_expr): Handle VEC_SERIES_EXPR.
	* tree-pretty-print.c (dump_generic_node): Likewise.
	* gimple-pretty-print.c (dump_binary_rhs): Likewise.
	* tree-inline.c (estimate_operator_cost): Likewise.
	* expr.c (expand_expr_real_2): Likewise.
	* optabs-tree.c (optab_for_tree_code): Likewise.
	* tree-cfg.c (verify_gimple_assign_binary): Likewise.
	* fold-const.c (const_binop): Fold VEC_SERIES_EXPRs of constants.
	* expmed.c (make_tree): Handle VEC_SERIES.
	* optabs.def (vec_series_optab): New optab.
	* optabs.h (expand_vec_series_expr): Declare.
	* optabs.c (expand_vec_series_expr): New function.
	* tree-vect-generic.c (expand_vector_operations_1): Check that
	the operands also have vector type.

Index: gcc/doc/generic.texi
===================================================================
--- gcc/doc/generic.texi	2017-12-15 00:30:46.596993903 +0000
+++ gcc/doc/generic.texi	2017-12-15 00:30:46.911991495 +0000
@@ -1769,6 +1769,7 @@ a value from @code{enum annot_expr_kind}
 @node Vectors
 @subsection Vectors
 @tindex VEC_DUPLICATE_EXPR
+@tindex VEC_SERIES_EXPR
 @tindex VEC_LSHIFT_EXPR
 @tindex VEC_RSHIFT_EXPR
 @tindex VEC_WIDEN_MULT_HI_EXPR
@@ -1788,6 +1789,14 @@ a value from @code{enum annot_expr_kind}
 This node has a single operand and represents a vector in which every
 element is equal to that operand.
 
+@item VEC_SERIES_EXPR
+This node represents a vector formed from a scalar base and step,
+given as the first and second operands respectively.  Element @var{i}
+of the result is equal to @samp{@var{base} + @var{i}*@var{step}}.
+
+This node is restricted to integral types, in order to avoid
+specifying the rounding behavior for floating-point types.
+
 @item VEC_LSHIFT_EXPR
 @itemx VEC_RSHIFT_EXPR
 These nodes represent whole vector left and right shifts, respectively.
Index: gcc/doc/md.texi
===================================================================
--- gcc/doc/md.texi	2017-12-15 00:30:46.596993903 +0000
+++ gcc/doc/md.texi	2017-12-15 00:30:46.912991487 +0000
@@ -4899,6 +4899,19 @@ vectors go through the @code{mov@var{m}}
 
 This pattern is not allowed to @code{FAIL}.
 
+@cindex @code{vec_series@var{m}} instruction pattern
+@item @samp{vec_series@var{m}}
+Initialize vector output operand 0 so that element @var{i} is equal to
+operand 1 plus @var{i} times operand 2.  In other words, create a linear
+series whose base value is operand 1 and whose step is operand 2.
+
+The vector output has mode @var{m} and the scalar inputs have the mode
+appropriate for one element of @var{m}.  This pattern is not used for
+floating-point vectors, in order to avoid having to specify the
+rounding behavior for @var{i} > 1.
+
+This pattern is not allowed to @code{FAIL}.
+
 @cindex @code{vec_cmp@var{m}@var{n}} instruction pattern
 @item @samp{vec_cmp@var{m}@var{n}}
 Output a vector comparison.  Operand 0 of mode @var{n} is the destination for
Index: gcc/tree.def
===================================================================
--- gcc/tree.def	2017-12-15 00:30:46.596993903 +0000
+++ gcc/tree.def	2017-12-15 00:30:46.919991433 +0000
@@ -540,6 +540,16 @@ DEFTREECODE (COND_EXPR, "cond_expr", tcc
 /* Represents a vector in which every element is equal to operand 0.  */
 DEFTREECODE (VEC_DUPLICATE_EXPR, "vec_duplicate_expr", tcc_unary, 1)
 
+/* Vector series created from a start (base) value and a step.
+
+   A = VEC_SERIES_EXPR (B, C)
+
+   means
+
+   for (i = 0; i < N; i++)
+     A[i] = B + C * i;  */
+DEFTREECODE (VEC_SERIES_EXPR, "vec_series_expr", tcc_binary, 2)
+
 /* Vector conditional expression. It is like COND_EXPR, but with
    vector operands.
 
Index: gcc/tree.h
===================================================================
--- gcc/tree.h	2017-12-15 00:30:46.596993903 +0000
+++ gcc/tree.h	2017-12-15 00:30:46.919991433 +0000
@@ -4052,6 +4052,7 @@ extern tree build_int_cst_type (tree, HO
 extern tree make_vector (unsigned, unsigned CXX_MEM_STAT_INFO);
 extern tree build_vector_from_ctor (tree, vec<constructor_elt, va_gc> *);
 extern tree build_vector_from_val (tree, tree);
+extern tree build_vec_series (tree, tree, tree);
 extern void recompute_constructor_flags (tree);
 extern void verify_constructor_flags (tree);
 extern tree build_constructor (tree, vec<constructor_elt, va_gc> *);
Index: gcc/tree.c
===================================================================
--- gcc/tree.c	2017-12-15 00:30:46.596993903 +0000
+++ gcc/tree.c	2017-12-15 00:30:46.918991441 +0000
@@ -1797,6 +1797,30 @@ build_vector_from_val (tree vectype, tre
     }
 }
 
+/* Build a vector series of type TYPE in which element I has the value
+   BASE + I * STEP.  The result is a constant if BASE and STEP are constant
+   and a VEC_SERIES_EXPR otherwise.  */
+
+tree
+build_vec_series (tree type, tree base, tree step)
+{
+  if (integer_zerop (step))
+    return build_vector_from_val (type, base);
+  if (TREE_CODE (base) == INTEGER_CST && TREE_CODE (step) == INTEGER_CST)
+    {
+      tree_vector_builder builder (type, 1, 3);
+      tree elt1 = wide_int_to_tree (TREE_TYPE (base),
+				    wi::to_wide (base) + wi::to_wide (step));
+      tree elt2 = wide_int_to_tree (TREE_TYPE (base),
+				    wi::to_wide (elt1) + wi::to_wide (step));
+      builder.quick_push (base);
+      builder.quick_push (elt1);
+      builder.quick_push (elt2);
+      return builder.build ();
+    }
+  return build2 (VEC_SERIES_EXPR, type, base, step);
+}
+
 /* Something has messed with the elements of CONSTRUCTOR C after it was built;
    calculate TREE_CONSTANT and TREE_SIDE_EFFECTS.  */
 
Index: gcc/cfgexpand.c
===================================================================
--- gcc/cfgexpand.c	2017-12-15 00:30:46.596993903 +0000
+++ gcc/cfgexpand.c	2017-12-15 00:30:46.911991495 +0000
@@ -5070,6 +5070,7 @@ expand_debug_expr (tree exp)
     case VEC_WIDEN_LSHIFT_LO_EXPR:
     case VEC_PERM_EXPR:
     case VEC_DUPLICATE_EXPR:
+    case VEC_SERIES_EXPR:
       return NULL;
 
     /* Misc codes.  */
Index: gcc/tree-pretty-print.c
===================================================================
--- gcc/tree-pretty-print.c	2017-12-15 00:30:46.596993903 +0000
+++ gcc/tree-pretty-print.c	2017-12-15 00:30:46.917991449 +0000
@@ -3162,6 +3162,7 @@ dump_generic_node (pretty_printer *pp, t
       is_expr = false;
       break;
 
+    case VEC_SERIES_EXPR:
     case VEC_WIDEN_MULT_HI_EXPR:
     case VEC_WIDEN_MULT_LO_EXPR:
     case VEC_WIDEN_MULT_EVEN_EXPR:
Index: gcc/gimple-pretty-print.c
===================================================================
--- gcc/gimple-pretty-print.c	2017-12-15 00:30:46.596993903 +0000
+++ gcc/gimple-pretty-print.c	2017-12-15 00:30:46.915991464 +0000
@@ -431,6 +431,7 @@ dump_binary_rhs (pretty_printer *buffer,
     case VEC_PACK_FIX_TRUNC_EXPR:
     case VEC_WIDEN_LSHIFT_HI_EXPR:
     case VEC_WIDEN_LSHIFT_LO_EXPR:
+    case VEC_SERIES_EXPR:
       for (p = get_tree_code_name (code); *p; p++)
 	pp_character (buffer, TOUPPER (*p));
       pp_string (buffer, " <");
Index: gcc/tree-inline.c
===================================================================
--- gcc/tree-inline.c	2017-12-15 00:30:46.596993903 +0000
+++ gcc/tree-inline.c	2017-12-15 00:30:46.917991449 +0000
@@ -3929,6 +3929,7 @@ estimate_operator_cost (enum tree_code c
     case VEC_WIDEN_LSHIFT_HI_EXPR:
     case VEC_WIDEN_LSHIFT_LO_EXPR:
     case VEC_DUPLICATE_EXPR:
+    case VEC_SERIES_EXPR:
 
       return 1;
 
Index: gcc/expr.c
===================================================================
--- gcc/expr.c	2017-12-15 00:30:46.596993903 +0000
+++ gcc/expr.c	2017-12-15 00:30:46.914991472 +0000
@@ -9586,6 +9586,10 @@ #define REDUCE_BIT_FIELD(expr)	(reduce_b
       gcc_assert (target);
       return target;
 
+    case VEC_SERIES_EXPR:
+      expand_operands (treeop0, treeop1, NULL_RTX, &op0, &op1, modifier);
+      return expand_vec_series_expr (mode, op0, op1, target);
+
     case BIT_INSERT_EXPR:
       {
 	unsigned bitpos = tree_to_uhwi (treeop2);
Index: gcc/optabs-tree.c
===================================================================
--- gcc/optabs-tree.c	2017-12-15 00:30:46.596993903 +0000
+++ gcc/optabs-tree.c	2017-12-15 00:30:46.915991464 +0000
@@ -202,6 +202,9 @@ optab_for_tree_code (enum tree_code code
     case VEC_DUPLICATE_EXPR:
       return vec_duplicate_optab;
 
+    case VEC_SERIES_EXPR:
+      return vec_series_optab;
+
     default:
       break;
     }
Index: gcc/tree-cfg.c
===================================================================
--- gcc/tree-cfg.c	2017-12-15 00:30:46.596993903 +0000
+++ gcc/tree-cfg.c	2017-12-15 00:30:46.917991449 +0000
@@ -4194,6 +4194,23 @@ verify_gimple_assign_binary (gassign *st
       /* Continue with generic binary expression handling.  */
       break;
 
+    case VEC_SERIES_EXPR:
+      if (!useless_type_conversion_p (rhs1_type, rhs2_type))
+	{
+	  error ("type mismatch in series expression");
+	  debug_generic_expr (rhs1_type);
+	  debug_generic_expr (rhs2_type);
+	  return true;
+	}
+      if (TREE_CODE (lhs_type) != VECTOR_TYPE
+	  || !useless_type_conversion_p (TREE_TYPE (lhs_type), rhs1_type))
+	{
+	  error ("vector type expected in series expression");
+	  debug_generic_expr (lhs_type);
+	  return true;
+	}
+      return false;
+
     default:
       gcc_unreachable ();
     }
Index: gcc/fold-const.c
===================================================================
--- gcc/fold-const.c	2017-12-15 00:30:46.596993903 +0000
+++ gcc/fold-const.c	2017-12-15 00:30:46.915991464 +0000
@@ -1527,6 +1527,12 @@ const_binop (enum tree_code code, tree t
      result as argument put those cases that need it here.  */
   switch (code)
     {
+    case VEC_SERIES_EXPR:
+      if (CONSTANT_CLASS_P (arg1)
+	  && CONSTANT_CLASS_P (arg2))
+	return build_vec_series (type, arg1, arg2);
+      return NULL_TREE;
+
     case COMPLEX_EXPR:
       if ((TREE_CODE (arg1) == REAL_CST
 	   && TREE_CODE (arg2) == REAL_CST)
Index: gcc/expmed.c
===================================================================
--- gcc/expmed.c	2017-12-15 00:30:46.596993903 +0000
+++ gcc/expmed.c	2017-12-15 00:30:46.913991479 +0000
@@ -5255,6 +5255,13 @@ make_tree (tree type, rtx x)
 	    tree elt_tree = make_tree (TREE_TYPE (type), XEXP (op, 0));
 	    return build_vector_from_val (type, elt_tree);
 	  }
+	if (GET_CODE (op) == VEC_SERIES)
+	  {
+	    tree itype = TREE_TYPE (type);
+	    tree base_tree = make_tree (itype, XEXP (op, 0));
+	    tree step_tree = make_tree (itype, XEXP (op, 1));
+	    return build_vec_series (type, base_tree, step_tree);
+	  }
 	return make_tree (type, op);
       }
 
Index: gcc/optabs.def
===================================================================
--- gcc/optabs.def	2017-12-15 00:30:46.596993903 +0000
+++ gcc/optabs.def	2017-12-15 00:30:46.916991456 +0000
@@ -365,3 +365,4 @@ OPTAB_D (get_thread_pointer_optab, "get_
 OPTAB_D (set_thread_pointer_optab, "set_thread_pointer$I$a")
 
 OPTAB_DC (vec_duplicate_optab, "vec_duplicate$a", VEC_DUPLICATE)
+OPTAB_DC (vec_series_optab, "vec_series$a", VEC_SERIES)
Index: gcc/optabs.h
===================================================================
--- gcc/optabs.h	2017-12-15 00:30:46.596993903 +0000
+++ gcc/optabs.h	2017-12-15 00:30:46.916991456 +0000
@@ -319,6 +319,9 @@ extern rtx expand_vec_cmp_expr (tree, tr
 /* Generate code for VEC_COND_EXPR.  */
 extern rtx expand_vec_cond_expr (tree, tree, tree, tree, rtx);
 
+/* Generate code for VEC_SERIES_EXPR.  */
+extern rtx expand_vec_series_expr (machine_mode, rtx, rtx, rtx);
+
 /* Generate code for MULT_HIGHPART_EXPR.  */
 extern rtx expand_mult_highpart (machine_mode, rtx, rtx, rtx, bool);
 
Index: gcc/optabs.c
===================================================================
--- gcc/optabs.c	2017-12-15 00:30:46.596993903 +0000
+++ gcc/optabs.c	2017-12-15 00:30:46.916991456 +0000
@@ -5768,6 +5768,27 @@ expand_vec_cond_expr (tree vec_cond_type
   return ops[0].value;
 }
 
+/* Generate VEC_SERIES_EXPR <OP0, OP1>, returning a value of mode VMODE.
+   Use TARGET for the result if nonnull and convenient.  */
+
+rtx
+expand_vec_series_expr (machine_mode vmode, rtx op0, rtx op1, rtx target)
+{
+  struct expand_operand ops[3];
+  enum insn_code icode;
+  machine_mode emode = GET_MODE_INNER (vmode);
+
+  icode = direct_optab_handler (vec_series_optab, vmode);
+  gcc_assert (icode != CODE_FOR_nothing);
+
+  create_output_operand (&ops[0], target, vmode);
+  create_input_operand (&ops[1], op0, emode);
+  create_input_operand (&ops[2], op1, emode);
+
+  expand_insn (icode, 3, ops);
+  return ops[0].value;
+}
+
 /* Generate insns for a vector comparison into a mask.  */
 
 rtx
Index: gcc/tree-vect-generic.c
===================================================================
--- gcc/tree-vect-generic.c	2017-12-15 00:30:46.596993903 +0000
+++ gcc/tree-vect-generic.c	2017-12-15 00:30:46.918991441 +0000
@@ -1594,7 +1594,8 @@ expand_vector_operations_1 (gimple_stmt_
   if (rhs_class == GIMPLE_BINARY_RHS)
     rhs2 = gimple_assign_rhs2 (stmt);
 
-  if (TREE_CODE (type) != VECTOR_TYPE)
+  if (!VECTOR_TYPE_P (type)
+      || !VECTOR_TYPE_P (TREE_TYPE (rhs1)))
     return;
 
   /* If the vector operation is operating on all same vector elements

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [14/nn] Add helpers for shift count modes
  2017-11-21 15:00         ` Richard Biener
@ 2017-12-15  0:48           ` Richard Sandiford
  2017-12-15  9:06             ` Richard Biener
  0 siblings, 1 reply; 90+ messages in thread
From: Richard Sandiford @ 2017-12-15  0:48 UTC (permalink / raw)
  To: Richard Biener; +Cc: Jeff Law, GCC Patches

Richard Biener <richard.guenther@gmail.com> writes:
> On Mon, Nov 20, 2017 at 10:02 PM, Richard Sandiford
> <richard.sandiford@linaro.org> wrote:
>> Richard Biener <richard.guenther@gmail.com> writes:
>>> On Thu, Oct 26, 2017 at 2:06 PM, Richard Biener
>>> <richard.guenther@gmail.com> wrote:
>>>> On Mon, Oct 23, 2017 at 1:25 PM, Richard Sandiford
>>>> <richard.sandiford@linaro.org> wrote:
>>>>> This patch adds a stub helper routine to provide the mode
>>>>> of a scalar shift amount, given the mode of the values
>>>>> being shifted.
>>>>>
>>>>> One long-standing problem has been to decide what this mode
>>>>> should be for arbitrary rtxes (as opposed to those directly
>>>>> tied to a target pattern).  Is it the mode of the shifted
>>>>> elements?  Is it word_mode?  Or maybe QImode?  Is it whatever
>>>>> the corresponding target pattern says?  (In which case what
>>>>> should the mode be when the target doesn't have a pattern?)
>>>>>
>>>>> For now the patch picks word_mode, which should be safe on
>>>>> all targets but could perhaps become suboptimal if the helper
>>>>> routine is used more often than it is in this patch.  As it
>>>>> stands the patch does not change the generated code.
>>>>>
>>>>> The patch also adds a helper function that constructs rtxes
>>>>> for constant shift amounts, again given the mode of the value
>>>>> being shifted.  As well as helping with the SVE patches, this
>>>>> is one step towards allowing CONST_INTs to have a real mode.
>>>>
>>>> I think gen_shift_amount_mode is flawed and while encapsulating
>>>> constant shift amount RTX generation into a gen_int_shift_amount
>>>> looks good to me I'd rather have that ??? in this function (and
>>>> I'd use the mode of the RTX shifted, not word_mode...).
>>
>> OK.  I'd gone for word_mode because that's what expand_binop uses
>> for CONST_INTs:
>>
>>       op1_mode = (GET_MODE (op1) != VOIDmode
>>                   ? as_a <scalar_int_mode> (GET_MODE (op1))
>>                   : word_mode);
>>
>> But using the inner mode should be fine too.  The patch below does that.
>>
>>>> In the end it's up to insn recognizing to convert the op to the
>>>> expected mode and for generic RTL it's us that should decide
>>>> on the mode -- on GENERIC the shift amount has to be an
>>>> integer so why not simply use a mode that is large enough to
>>>> make the constant fit?
>>
>> ...but I can do that instead if you think it's better.
>>
>>>> Just throwing in some comments here, RTL isn't my primary
>>>> expertise.
>>>
>>> To add a little bit - shift amounts is maybe the only(?) place
>>> where a modeless CONST_INT makes sense!  So "fixing"
>>> that first sounds backwards.
>>
>> But even here they have a mode conceptually, since out-of-range shift
>> amounts are target-defined rather than undefined.  E.g. if the target
>> interprets the shift amount as unsigned, then for a shift amount
>> (const_int -1) it matters whether the mode is QImode (and so we're
>> shifting by 255) or HImode (and so we're shifting by 65535).
>
> I think RTL is well-defined (at least I hope so ...) and machine constraints
> need to be modeled explicitly (like embedding an implicit bit_and in
> shift patterns).

Well, RTL is well-defined in the sense that if you have

  (ashift X (foo:HI ...))

then the shift amount must be interpreted as HImode rather than some
other mode.  The problem here is to define a default choice of mode for
const_ints, in cases where the shift is being created out of the blue.

Whether the shift amount is effectively signed or unsigned isn't defined
by RTL without SHIFT_COUNT_TRUNCATED, since the choice only matters for
out-of-range values, and the behaviour for out-of-range RTL shifts is
specifically treated as target-defined without SHIFT_COUNT_TRUNCATED.

I think the revised patch does implement your suggestion of using the
integer equivalent of the inner mode as the default, but we need to
decide whether to go with it, go with the original word_mode approach
(taken from existing expand_binop code) or something else.  Something
else could include the widest supported integer mode, so that we never
change the value.
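
(For concreteness, the revised patch generates the amount in the
integer equivalent of the element mode -- a minimal sketch, with an
illustrative vector mode:

    /* Shift a V8HImode value by 3: the amount is created in HImode,
       i.e. this is gen_int_mode (3, HImode).  */
    rtx amt = gen_int_shift_amount (V8HImode, 3);

so in the (const_int -1) example earlier, the amount would be read as
an HImode value.)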

Thanks,
Richard

>> OK, so shifts by 65535 make no sense in practice, but *conceptually*... :-)
>>
>> Jeff Law <law@redhat.com> writes:
>>> On 10/26/2017 06:06 AM, Richard Biener wrote:
>>>> On Mon, Oct 23, 2017 at 1:25 PM, Richard Sandiford
>>>> <richard.sandiford@linaro.org> wrote:
>>>>> This patch adds a stub helper routine to provide the mode
>>>>> of a scalar shift amount, given the mode of the values
>>>>> being shifted.
>>>>>
>>>>> One long-standing problem has been to decide what this mode
>>>>> should be for arbitrary rtxes (as opposed to those directly
>>>>> tied to a target pattern).  Is it the mode of the shifted
>>>>> elements?  Is it word_mode?  Or maybe QImode?  Is it whatever
>>>>> the corresponding target pattern says?  (In which case what
>>>>> should the mode be when the target doesn't have a pattern?)
>>>>>
>>>>> For now the patch picks word_mode, which should be safe on
>>>>> all targets but could perhaps become suboptimal if the helper
>>>>> routine is used more often than it is in this patch.  As it
>>>>> stands the patch does not change the generated code.
>>>>>
>>>>> The patch also adds a helper function that constructs rtxes
>>>>> for constant shift amounts, again given the mode of the value
>>>>> being shifted.  As well as helping with the SVE patches, this
>>>>> is one step towards allowing CONST_INTs to have a real mode.
>>>>
>>>> I think gen_shift_amount_mode is flawed and while encapsulating
>>>> constant shift amount RTX generation into a gen_int_shift_amount
>>>> looks good to me I'd rather have that ??? in this function (and
>>>> I'd use the mode of the RTX shifted, not word_mode...).
>>>>
>>>> In the end it's up to insn recognizing to convert the op to the
>>>> expected mode and for generic RTL it's us that should decide
>>>> on the mode -- on GENERIC the shift amount has to be an
>>>> integer so why not simply use a mode that is large enough to
>>>> make the constant fit?
>>>>
>>>> Just throwing in some comments here, RTL isn't my primary
>>>> expertise.
>>> I wonder if encapsulation + a target hook to specify the mode would be
>>> better?  We'd then have to argue over word_mode, vs QImode vs something
>>> else for the default, but at least we'd have a way for the target to
>>> specify the mode is generally best when working on shift counts.
>>>
>>> In the end I doubt there's a single definition that is overall better.
>>> Largely because I suspect there are times when the narrowest mode is
>>> best, or the mode of the operand being shifted.
>>>
>>> So thoughts on doing the encapsulation with a target hook to specify the
>>> desired mode?  Does that get us what we need for SVE and does it provide
>>> us a path forward on this issue if we were to try to move towards
>>> CONST_INTs with modes?
>>
>> I think it'd be better to do that only if we have a use case, since
>> it's hard to predict what the best way of handling it is until then.
>> E.g. I'd still like to hold out the possibility of doing this automatically
>> from the .md file instead, if some kind of override ends up being necessary.
>>
>> Like you say, we have to argue over the default either way, and I think
>> that's been the sticking point.
>>
>> Thanks,
>> Richard
>>
>>
>> 2017-11-20  Richard Sandiford  <richard.sandiford@linaro.org>
>>             Alan Hayward  <alan.hayward@arm.com>
>>             David Sherwood  <david.sherwood@arm.com>
>>
>> gcc/
>>         * emit-rtl.h (gen_int_shift_amount): Declare.
>>         * emit-rtl.c (gen_int_shift_amount): New function.
>>         * asan.c (asan_emit_stack_protection): Use gen_int_shift_amount
>>         instead of GEN_INT.
>>         * calls.c (shift_return_value): Likewise.
>>         * cse.c (fold_rtx): Likewise.
>>         * dse.c (find_shift_sequence): Likewise.
>>         * expmed.c (init_expmed_one_mode, store_bit_field_1, expand_shift_1)
>>         (expand_shift, expand_smod_pow2): Likewise.
>>         * lower-subreg.c (shift_cost): Likewise.
>>         * simplify-rtx.c (simplify_unary_operation_1): Likewise.
>>         (simplify_binary_operation_1): Likewise.
>>         * combine.c (try_combine, find_split_point, force_int_to_mode)
>>         (simplify_shift_const_1, simplify_shift_const): Likewise.
>>         (change_zero_ext): Likewise.  Use simplify_gen_binary.
>>         * optabs.c (expand_superword_shift, expand_doubleword_mult)
>>         (expand_unop, expand_binop): Use gen_int_shift_amount instead
>>         of GEN_INT.
>>         (shift_amt_for_vec_perm_mask): Add a machine_mode argument.
>>         Use gen_int_shift_amount instead of GEN_INT.
>>         (expand_vec_perm): Update caller accordingly.  Use
>>         gen_int_shift_amount instead of GEN_INT.
>>
>> Index: gcc/emit-rtl.h
>> ===================================================================
>> --- gcc/emit-rtl.h      2017-11-20 20:37:41.918226976 +0000
>> +++ gcc/emit-rtl.h      2017-11-20 20:37:51.661320782 +0000
>> @@ -369,6 +369,7 @@ extern void set_reg_attrs_for_parm (rtx,
>>  extern void set_reg_attrs_for_decl_rtl (tree t, rtx x);
>>  extern void adjust_reg_mode (rtx, machine_mode);
>>  extern int mem_expr_equal_p (const_tree, const_tree);
>> +extern rtx gen_int_shift_amount (machine_mode, HOST_WIDE_INT);
>>
>>  extern bool need_atomic_barrier_p (enum memmodel, bool);
>>
>> Index: gcc/emit-rtl.c
>> ===================================================================
>> --- gcc/emit-rtl.c      2017-11-20 20:37:41.918226976 +0000
>> +++ gcc/emit-rtl.c      2017-11-20 20:37:51.660320782 +0000
>> @@ -6507,6 +6507,24 @@ need_atomic_barrier_p (enum memmodel mod
>>      }
>>  }
>>
>> +/* Return a constant shift amount for shifting a value of mode MODE
>> +   by VALUE bits.  */
>> +
>> +rtx
>> +gen_int_shift_amount (machine_mode mode, HOST_WIDE_INT value)
>> +{
>> +  /* ??? Using the inner mode should be wide enough for all useful
>> +     cases (e.g. QImode usually has 8 shiftable bits, while a QImode
>> +     shift amount has a range of [-128, 127]).  But in principle
>> +     a target could require target-dependent behaviour for a
>> +     shift whose shift amount is wider than the shifted value.
>> +     Perhaps this should be automatically derived from the .md
>> +     files instead, or perhaps have a target hook.  */
>> +  scalar_int_mode shift_mode
>> +    = int_mode_for_mode (GET_MODE_INNER (mode)).require ();
>> +  return gen_int_mode (value, shift_mode);
>> +}
>> +
>>  /* Initialize fields of rtl_data related to stack alignment.  */
>>
>>  void
>> Index: gcc/asan.c
>> ===================================================================
>> --- gcc/asan.c  2017-11-20 20:37:41.918226976 +0000
>> +++ gcc/asan.c  2017-11-20 20:37:51.657320781 +0000
>> @@ -1386,7 +1386,7 @@ asan_emit_stack_protection (rtx base, rt
>>    TREE_ASM_WRITTEN (id) = 1;
>>    emit_move_insn (mem, expand_normal (build_fold_addr_expr (decl)));
>>    shadow_base = expand_binop (Pmode, lshr_optab, base,
>> -                             GEN_INT (ASAN_SHADOW_SHIFT),
>> +                             gen_int_shift_amount (Pmode, ASAN_SHADOW_SHIFT),
>>                               NULL_RTX, 1, OPTAB_DIRECT);
>>    shadow_base
>>      = plus_constant (Pmode, shadow_base,
>> Index: gcc/calls.c
>> ===================================================================
>> --- gcc/calls.c 2017-11-20 20:37:41.918226976 +0000
>> +++ gcc/calls.c 2017-11-20 20:37:51.657320781 +0000
>> @@ -2742,15 +2742,17 @@ shift_return_value (machine_mode mode, b
>>    HOST_WIDE_INT shift;
>>
>>    gcc_assert (REG_P (value) && HARD_REGISTER_P (value));
>> -  shift = GET_MODE_BITSIZE (GET_MODE (value)) - GET_MODE_BITSIZE (mode);
>> +  machine_mode value_mode = GET_MODE (value);
>> +  shift = GET_MODE_BITSIZE (value_mode) - GET_MODE_BITSIZE (mode);
>>    if (shift == 0)
>>      return false;
>>
>>    /* Use ashr rather than lshr for right shifts.  This is for the benefit
>>       of the MIPS port, which requires SImode values to be sign-extended
>>       when stored in 64-bit registers.  */
>> -  if (!force_expand_binop (GET_MODE (value), left_p ? ashl_optab : ashr_optab,
>> -                          value, GEN_INT (shift), value, 1, OPTAB_WIDEN))
>> +  if (!force_expand_binop (value_mode, left_p ? ashl_optab : ashr_optab,
>> +                          value, gen_int_shift_amount (value_mode, shift),
>> +                          value, 1, OPTAB_WIDEN))
>>      gcc_unreachable ();
>>    return true;
>>  }
>> Index: gcc/cse.c
>> ===================================================================
>> --- gcc/cse.c   2017-11-20 20:37:41.918226976 +0000
>> +++ gcc/cse.c   2017-11-20 20:37:51.660320782 +0000
>> @@ -3611,9 +3611,9 @@ fold_rtx (rtx x, rtx_insn *insn)
>>                       || INTVAL (const_arg1) < 0))
>>                 {
>>                   if (SHIFT_COUNT_TRUNCATED)
>> -                   canon_const_arg1 = GEN_INT (INTVAL (const_arg1)
>> -                                               & (GET_MODE_UNIT_BITSIZE (mode)
>> -                                                  - 1));
>> +                   canon_const_arg1 = gen_int_shift_amount
>> +                     (mode, (INTVAL (const_arg1)
>> +                             & (GET_MODE_UNIT_BITSIZE (mode) - 1)));
>>                   else
>>                     break;
>>                 }
>> @@ -3660,9 +3660,9 @@ fold_rtx (rtx x, rtx_insn *insn)
>>                       || INTVAL (inner_const) < 0))
>>                 {
>>                   if (SHIFT_COUNT_TRUNCATED)
>> -                   inner_const = GEN_INT (INTVAL (inner_const)
>> -                                          & (GET_MODE_UNIT_BITSIZE (mode)
>> -                                             - 1));
>> +                   inner_const = gen_int_shift_amount
>> +                     (mode, (INTVAL (inner_const)
>> +                             & (GET_MODE_UNIT_BITSIZE (mode) - 1)));
>>                   else
>>                     break;
>>                 }
>> @@ -3692,7 +3692,8 @@ fold_rtx (rtx x, rtx_insn *insn)
>>                   /* As an exception, we can turn an ASHIFTRT of this
>>                      form into a shift of the number of bits - 1.  */
>>                   if (code == ASHIFTRT)
>> -                   new_const = GEN_INT (GET_MODE_UNIT_BITSIZE (mode) - 1);
>> +                   new_const = gen_int_shift_amount
>> +                     (mode, GET_MODE_UNIT_BITSIZE (mode) - 1);
>>                   else if (!side_effects_p (XEXP (y, 0)))
>>                     return CONST0_RTX (mode);
>>                   else
>> Index: gcc/dse.c
>> ===================================================================
>> --- gcc/dse.c   2017-11-20 20:37:41.918226976 +0000
>> +++ gcc/dse.c   2017-11-20 20:37:51.660320782 +0000
>> @@ -1605,8 +1605,9 @@ find_shift_sequence (int access_size,
>>                                      store_mode, byte);
>>           if (ret && CONSTANT_P (ret))
>>             {
>> +             rtx shift_rtx = gen_int_shift_amount (new_mode, shift);
>>               ret = simplify_const_binary_operation (LSHIFTRT, new_mode,
>> -                                                    ret, GEN_INT (shift));
>> +                                                    ret, shift_rtx);
>>               if (ret && CONSTANT_P (ret))
>>                 {
>>                   byte = subreg_lowpart_offset (read_mode, new_mode);
>> @@ -1642,7 +1643,8 @@ find_shift_sequence (int access_size,
>>          of one dsp where the cost of these two was not the same.  But
>>          this really is a rare case anyway.  */
>>        target = expand_binop (new_mode, lshr_optab, new_reg,
>> -                            GEN_INT (shift), new_reg, 1, OPTAB_DIRECT);
>> +                            gen_int_shift_amount (new_mode, shift),
>> +                            new_reg, 1, OPTAB_DIRECT);
>>
>>        shift_seq = get_insns ();
>>        end_sequence ();
>> Index: gcc/expmed.c
>> ===================================================================
>> --- gcc/expmed.c        2017-11-20 20:37:41.918226976 +0000
>> +++ gcc/expmed.c        2017-11-20 20:37:51.661320782 +0000
>> @@ -222,7 +222,8 @@ init_expmed_one_mode (struct init_expmed
>>           PUT_MODE (all->zext, wider_mode);
>>           PUT_MODE (all->wide_mult, wider_mode);
>>           PUT_MODE (all->wide_lshr, wider_mode);
>> -         XEXP (all->wide_lshr, 1) = GEN_INT (mode_bitsize);
>> +         XEXP (all->wide_lshr, 1)
>> +           = gen_int_shift_amount (wider_mode, mode_bitsize);
>>
>>           set_mul_widen_cost (speed, wider_mode,
>>                               set_src_cost (all->wide_mult, wider_mode, speed));
>> @@ -909,12 +910,14 @@ store_bit_field_1 (rtx str_rtx, unsigned
>>              to make sure that for big-endian machines the higher order
>>              bits are used.  */
>>           if (new_bitsize < BITS_PER_WORD && BYTES_BIG_ENDIAN && !backwards)
>> -           value_word = simplify_expand_binop (word_mode, lshr_optab,
>> -                                               value_word,
>> -                                               GEN_INT (BITS_PER_WORD
>> -                                                        - new_bitsize),
>> -                                               NULL_RTX, true,
>> -                                               OPTAB_LIB_WIDEN);
>> +           {
>> +             int shift = BITS_PER_WORD - new_bitsize;
>> +             rtx shift_rtx = gen_int_shift_amount (word_mode, shift);
>> +             value_word = simplify_expand_binop (word_mode, lshr_optab,
>> +                                                 value_word, shift_rtx,
>> +                                                 NULL_RTX, true,
>> +                                                 OPTAB_LIB_WIDEN);
>> +           }
>>
>>           if (!store_bit_field_1 (op0, new_bitsize,
>>                                   bitnum + bit_offset,
>> @@ -2365,8 +2368,9 @@ expand_shift_1 (enum tree_code code, mac
>>        if (CONST_INT_P (op1)
>>           && ((unsigned HOST_WIDE_INT) INTVAL (op1) >=
>>               (unsigned HOST_WIDE_INT) GET_MODE_BITSIZE (scalar_mode)))
>> -       op1 = GEN_INT ((unsigned HOST_WIDE_INT) INTVAL (op1)
>> -                      % GET_MODE_BITSIZE (scalar_mode));
>> +       op1 = gen_int_shift_amount (mode,
>> +                                   (unsigned HOST_WIDE_INT) INTVAL (op1)
>> +                                   % GET_MODE_BITSIZE (scalar_mode));
>>        else if (GET_CODE (op1) == SUBREG
>>                && subreg_lowpart_p (op1)
>>                && SCALAR_INT_MODE_P (GET_MODE (SUBREG_REG (op1)))
>> @@ -2383,7 +2387,8 @@ expand_shift_1 (enum tree_code code, mac
>>        && IN_RANGE (INTVAL (op1), GET_MODE_BITSIZE (scalar_mode) / 2 + left,
>>                    GET_MODE_BITSIZE (scalar_mode) - 1))
>>      {
>> -      op1 = GEN_INT (GET_MODE_BITSIZE (scalar_mode) - INTVAL (op1));
>> +      op1 = gen_int_shift_amount (mode, (GET_MODE_BITSIZE (scalar_mode)
>> +                                        - INTVAL (op1)));
>>        left = !left;
>>        code = left ? LROTATE_EXPR : RROTATE_EXPR;
>>      }
>> @@ -2463,8 +2468,8 @@ expand_shift_1 (enum tree_code code, mac
>>               if (op1 == const0_rtx)
>>                 return shifted;
>>               else if (CONST_INT_P (op1))
>> -               other_amount = GEN_INT (GET_MODE_BITSIZE (scalar_mode)
>> -                                       - INTVAL (op1));
>> +               other_amount = gen_int_shift_amount
>> +                 (mode, GET_MODE_BITSIZE (scalar_mode) - INTVAL (op1));
>>               else
>>                 {
>>                   other_amount
>> @@ -2537,8 +2542,9 @@ expand_shift_1 (enum tree_code code, mac
>>  expand_shift (enum tree_code code, machine_mode mode, rtx shifted,
>>               int amount, rtx target, int unsignedp)
>>  {
>> -  return expand_shift_1 (code, mode,
>> -                        shifted, GEN_INT (amount), target, unsignedp);
>> +  return expand_shift_1 (code, mode, shifted,
>> +                        gen_int_shift_amount (mode, amount),
>> +                        target, unsignedp);
>>  }
>>
>>  /* Likewise, but return 0 if that cannot be done.  */
>> @@ -3856,7 +3862,7 @@ expand_smod_pow2 (scalar_int_mode mode,
>>         {
>>           HOST_WIDE_INT masklow = (HOST_WIDE_INT_1 << logd) - 1;
>>           signmask = force_reg (mode, signmask);
>> -         shift = GEN_INT (GET_MODE_BITSIZE (mode) - logd);
>> +         shift = gen_int_shift_amount (mode, GET_MODE_BITSIZE (mode) - logd);
>>
>>           /* Use the rtx_cost of a LSHIFTRT instruction to determine
>>              which instruction sequence to use.  If logical right shifts
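
As a worked example of the rotate canonicalisation in the expand_shift_1
hunk above (a sketch, with SImode standing in for scalar_mode):

    /* A rotate amount in the upper half of the precision is replaced
       by the opposite rotate by (precision - amount):

         x lrotate 30  ==>  x rrotate 2

       and the replacement amount 32 - 30 == 2 is now created with
       gen_int_shift_amount (mode, 2) rather than GEN_INT (2).  */
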
>> Index: gcc/lower-subreg.c
>> ===================================================================
>> --- gcc/lower-subreg.c  2017-11-20 20:37:41.918226976 +0000
>> +++ gcc/lower-subreg.c  2017-11-20 20:37:51.661320782 +0000
>> @@ -141,7 +141,7 @@ shift_cost (bool speed_p, struct cost_rt
>>    PUT_CODE (rtxes->shift, code);
>>    PUT_MODE (rtxes->shift, mode);
>>    PUT_MODE (rtxes->source, mode);
>> -  XEXP (rtxes->shift, 1) = GEN_INT (op1);
>> +  XEXP (rtxes->shift, 1) = gen_int_shift_amount (mode, op1);
>>    return set_src_cost (rtxes->shift, mode, speed_p);
>>  }
>>
>> Index: gcc/simplify-rtx.c
>> ===================================================================
>> --- gcc/simplify-rtx.c  2017-11-20 20:37:41.918226976 +0000
>> +++ gcc/simplify-rtx.c  2017-11-20 20:37:51.663320783 +0000
>> @@ -1165,7 +1165,8 @@ simplify_unary_operation_1 (enum rtx_cod
>>           if (STORE_FLAG_VALUE == 1)
>>             {
>>               temp = simplify_gen_binary (ASHIFTRT, inner, XEXP (op, 0),
>> -                                         GEN_INT (isize - 1));
>> +                                         gen_int_shift_amount (inner,
>> +                                                               isize - 1));
>>               if (int_mode == inner)
>>                 return temp;
>>               if (GET_MODE_PRECISION (int_mode) > isize)
>> @@ -1175,7 +1176,8 @@ simplify_unary_operation_1 (enum rtx_cod
>>           else if (STORE_FLAG_VALUE == -1)
>>             {
>>               temp = simplify_gen_binary (LSHIFTRT, inner, XEXP (op, 0),
>> -                                         GEN_INT (isize - 1));
>> +                                         gen_int_shift_amount (inner,
>> +                                                               isize - 1));
>>               if (int_mode == inner)
>>                 return temp;
>>               if (GET_MODE_PRECISION (int_mode) > isize)
>> @@ -2672,7 +2674,8 @@ simplify_binary_operation_1 (enum rtx_co
>>         {
>>           val = wi::exact_log2 (rtx_mode_t (trueop1, mode));
>>           if (val >= 0)
>> -           return simplify_gen_binary (ASHIFT, mode, op0, GEN_INT (val));
>> +           return simplify_gen_binary (ASHIFT, mode, op0,
>> +                                       gen_int_shift_amount (mode, val));
>>         }
>>
>>        /* x*2 is x+x and x*(-1) is -x */
>> @@ -3296,7 +3299,8 @@ simplify_binary_operation_1 (enum rtx_co
>>        /* Convert divide by power of two into shift.  */
>>        if (CONST_INT_P (trueop1)
>>           && (val = exact_log2 (UINTVAL (trueop1))) > 0)
>> -       return simplify_gen_binary (LSHIFTRT, mode, op0, GEN_INT (val));
>> +       return simplify_gen_binary (LSHIFTRT, mode, op0,
>> +                                   gen_int_shift_amount (mode, val));
>>        break;
>>
>>      case DIV:
>> @@ -3416,10 +3420,12 @@ simplify_binary_operation_1 (enum rtx_co
>>           && IN_RANGE (INTVAL (trueop1),
>>                        GET_MODE_UNIT_PRECISION (mode) / 2 + (code == ROTATE),
>>                        GET_MODE_UNIT_PRECISION (mode) - 1))
>> -       return simplify_gen_binary (code == ROTATE ? ROTATERT : ROTATE,
>> -                                   mode, op0,
>> -                                   GEN_INT (GET_MODE_UNIT_PRECISION (mode)
>> -                                            - INTVAL (trueop1)));
>> +       {
>> +         int new_amount = GET_MODE_UNIT_PRECISION (mode) - INTVAL (trueop1);
>> +         rtx new_amount_rtx = gen_int_shift_amount (mode, new_amount);
>> +         return simplify_gen_binary (code == ROTATE ? ROTATERT : ROTATE,
>> +                                     mode, op0, new_amount_rtx);
>> +       }
>>  #endif
>>        /* FALLTHRU */
>>      case ASHIFTRT:
>> @@ -3460,8 +3466,8 @@ simplify_binary_operation_1 (enum rtx_co
>>               == GET_MODE_BITSIZE (inner_mode) - GET_MODE_BITSIZE (int_mode))
>>           && subreg_lowpart_p (op0))
>>         {
>> -         rtx tmp = GEN_INT (INTVAL (XEXP (SUBREG_REG (op0), 1))
>> -                            + INTVAL (op1));
>> +         rtx tmp = gen_int_shift_amount
>> +           (inner_mode, INTVAL (XEXP (SUBREG_REG (op0), 1)) + INTVAL (op1));
>>           tmp = simplify_gen_binary (code, inner_mode,
>>                                      XEXP (SUBREG_REG (op0), 0),
>>                                      tmp);
>> @@ -3472,7 +3478,8 @@ simplify_binary_operation_1 (enum rtx_co
>>         {
>>           val = INTVAL (op1) & (GET_MODE_UNIT_PRECISION (mode) - 1);
>>           if (val != INTVAL (op1))
>> -           return simplify_gen_binary (code, mode, op0, GEN_INT (val));
>> +           return simplify_gen_binary (code, mode, op0,
>> +                                       gen_int_shift_amount (mode, val));
>>         }
>>        break;
>>
>> Index: gcc/combine.c
>> ===================================================================
>> --- gcc/combine.c       2017-11-20 20:37:41.918226976 +0000
>> +++ gcc/combine.c       2017-11-20 20:37:51.659320782 +0000
>> @@ -3792,8 +3792,9 @@ try_combine (rtx_insn *i3, rtx_insn *i2,
>>               && INTVAL (XEXP (*split, 1)) > 0
>>               && (i = exact_log2 (UINTVAL (XEXP (*split, 1)))) >= 0)
>>             {
>> +             rtx i_rtx = gen_int_shift_amount (split_mode, i);
>>               SUBST (*split, gen_rtx_ASHIFT (split_mode,
>> -                                            XEXP (*split, 0), GEN_INT (i)));
>> +                                            XEXP (*split, 0), i_rtx));
>>               /* Update split_code because we may not have a multiply
>>                  anymore.  */
>>               split_code = GET_CODE (*split);
>> @@ -3807,8 +3808,10 @@ try_combine (rtx_insn *i3, rtx_insn *i2,
>>               && (i = exact_log2 (UINTVAL (XEXP (XEXP (*split, 0), 1)))) >= 0)
>>             {
>>               rtx nsplit = XEXP (*split, 0);
>> +             rtx i_rtx = gen_int_shift_amount (GET_MODE (nsplit), i);
>>               SUBST (XEXP (*split, 0), gen_rtx_ASHIFT (GET_MODE (nsplit),
>> -                                            XEXP (nsplit, 0), GEN_INT (i)));
>> +                                                      XEXP (nsplit, 0),
>> +                                                      i_rtx));
>>               /* Update split_code because we may not have a multiply
>>                  anymore.  */
>>               split_code = GET_CODE (*split);
>> @@ -5077,12 +5080,12 @@ find_split_point (rtx *loc, rtx_insn *in
>>                                       GET_MODE (XEXP (SET_SRC (x), 0))))))
>>             {
>>               machine_mode mode = GET_MODE (XEXP (SET_SRC (x), 0));
>> -
>> +             rtx pos_rtx = gen_int_shift_amount (mode, pos);
>>               SUBST (SET_SRC (x),
>>                      gen_rtx_NEG (mode,
>>                                   gen_rtx_LSHIFTRT (mode,
>>                                                     XEXP (SET_SRC (x), 0),
>> -                                                   GEN_INT (pos))));
>> +                                                   pos_rtx)));
>>
>>               split = find_split_point (&SET_SRC (x), insn, true);
>>               if (split && split != &SET_SRC (x))
>> @@ -5140,11 +5143,11 @@ find_split_point (rtx *loc, rtx_insn *in
>>             {
>>               unsigned HOST_WIDE_INT mask
>>                 = (HOST_WIDE_INT_1U << len) - 1;
>> +             rtx pos_rtx = gen_int_shift_amount (mode, pos);
>>               SUBST (SET_SRC (x),
>>                      gen_rtx_AND (mode,
>>                                   gen_rtx_LSHIFTRT
>> -                                 (mode, gen_lowpart (mode, inner),
>> -                                  GEN_INT (pos)),
>> +                                 (mode, gen_lowpart (mode, inner), pos_rtx),
>>                                   gen_int_mode (mask, mode)));
>>
>>               split = find_split_point (&SET_SRC (x), insn, true);
>> @@ -5153,14 +5156,15 @@ find_split_point (rtx *loc, rtx_insn *in
>>             }
>>           else
>>             {
>> +             int left_bits = GET_MODE_PRECISION (mode) - len - pos;
>> +             int right_bits = GET_MODE_PRECISION (mode) - len;
>>               SUBST (SET_SRC (x),
>>                      gen_rtx_fmt_ee
>>                      (unsignedp ? LSHIFTRT : ASHIFTRT, mode,
>>                       gen_rtx_ASHIFT (mode,
>>                                       gen_lowpart (mode, inner),
>> -                                     GEN_INT (GET_MODE_PRECISION (mode)
>> -                                              - len - pos)),
>> -                     GEN_INT (GET_MODE_PRECISION (mode) - len)));
>> +                                     gen_int_shift_amount (mode, left_bits)),
>> +                     gen_int_shift_amount (mode, right_bits)));
>>
>>               split = find_split_point (&SET_SRC (x), insn, true);
>>               if (split && split != &SET_SRC (x))
>> @@ -8935,10 +8939,11 @@ force_int_to_mode (rtx x, scalar_int_mod
>>           /* Must be more sign bit copies than the mask needs.  */
>>           && ((int) num_sign_bit_copies (XEXP (x, 0), GET_MODE (XEXP (x, 0)))
>>               >= exact_log2 (mask + 1)))
>> -       x = simplify_gen_binary (LSHIFTRT, xmode, XEXP (x, 0),
>> -                                GEN_INT (GET_MODE_PRECISION (xmode)
>> -                                         - exact_log2 (mask + 1)));
>> -
>> +       {
>> +         int nbits = GET_MODE_PRECISION (xmode) - exact_log2 (mask + 1);
>> +         x = simplify_gen_binary (LSHIFTRT, xmode, XEXP (x, 0),
>> +                                  gen_int_shift_amount (xmode, nbits));
>> +       }
>>        goto shiftrt;
>>
>>      case ASHIFTRT:
>> @@ -10431,7 +10436,7 @@ simplify_shift_const_1 (enum rtx_code co
>>  {
>>    enum rtx_code orig_code = code;
>>    rtx orig_varop = varop;
>> -  int count;
>> +  int count, log2;
>>    machine_mode mode = result_mode;
>>    machine_mode shift_mode;
>>    scalar_int_mode tmode, inner_mode, int_mode, int_varop_mode, int_result_mode;
>> @@ -10634,13 +10639,11 @@ simplify_shift_const_1 (enum rtx_code co
>>              is cheaper.  But it is still better on those machines to
>>              merge two shifts into one.  */
>>           if (CONST_INT_P (XEXP (varop, 1))
>> -             && exact_log2 (UINTVAL (XEXP (varop, 1))) >= 0)
>> +             && (log2 = exact_log2 (UINTVAL (XEXP (varop, 1)))) >= 0)
>>             {
>> -             varop
>> -               = simplify_gen_binary (ASHIFT, GET_MODE (varop),
>> -                                      XEXP (varop, 0),
>> -                                      GEN_INT (exact_log2 (
>> -                                               UINTVAL (XEXP (varop, 1)))));
>> +             rtx log2_rtx = gen_int_shift_amount (GET_MODE (varop), log2);
>> +             varop = simplify_gen_binary (ASHIFT, GET_MODE (varop),
>> +                                          XEXP (varop, 0), log2_rtx);
>>               continue;
>>             }
>>           break;
>> @@ -10648,13 +10651,11 @@ simplify_shift_const_1 (enum rtx_code co
>>         case UDIV:
>>           /* Similar, for when divides are cheaper.  */
>>           if (CONST_INT_P (XEXP (varop, 1))
>> -             && exact_log2 (UINTVAL (XEXP (varop, 1))) >= 0)
>> +             && (log2 = exact_log2 (UINTVAL (XEXP (varop, 1)))) >= 0)
>>             {
>> -             varop
>> -               = simplify_gen_binary (LSHIFTRT, GET_MODE (varop),
>> -                                      XEXP (varop, 0),
>> -                                      GEN_INT (exact_log2 (
>> -                                               UINTVAL (XEXP (varop, 1)))));
>> +             rtx log2_rtx = gen_int_shift_amount (GET_MODE (varop), log2);
>> +             varop = simplify_gen_binary (LSHIFTRT, GET_MODE (varop),
>> +                                          XEXP (varop, 0), log2_rtx);
>>               continue;
>>             }
>>           break;
>> @@ -10789,10 +10790,10 @@ simplify_shift_const_1 (enum rtx_code co
>>
>>               mask_rtx = gen_int_mode (nonzero_bits (varop, int_varop_mode),
>>                                        int_result_mode);
>> -
>> +             rtx count_rtx = gen_int_shift_amount (int_result_mode, count);
>>               mask_rtx
>>                 = simplify_const_binary_operation (code, int_result_mode,
>> -                                                  mask_rtx, GEN_INT (count));
>> +                                                  mask_rtx, count_rtx);
>>
>>               /* Give up if we can't compute an outer operation to use.  */
>>               if (mask_rtx == 0
>> @@ -10848,9 +10849,10 @@ simplify_shift_const_1 (enum rtx_code co
>>               if (code == ASHIFTRT && int_mode != int_result_mode)
>>                 break;
>>
>> +             rtx count_rtx = gen_int_shift_amount (int_result_mode, count);
>>               rtx new_rtx = simplify_const_binary_operation (code, int_mode,
>>                                                              XEXP (varop, 0),
>> -                                                            GEN_INT (count));
>> +                                                            count_rtx);
>>               varop = gen_rtx_fmt_ee (code, int_mode, new_rtx, XEXP (varop, 1));
>>               count = 0;
>>               continue;
>> @@ -10916,7 +10918,7 @@ simplify_shift_const_1 (enum rtx_code co
>>               && (new_rtx = simplify_const_binary_operation
>>                   (code, int_result_mode,
>>                    gen_int_mode (INTVAL (XEXP (varop, 1)), int_result_mode),
>> -                  GEN_INT (count))) != 0
>> +                  gen_int_shift_amount (int_result_mode, count))) != 0
>>               && CONST_INT_P (new_rtx)
>>               && merge_outer_ops (&outer_op, &outer_const, GET_CODE (varop),
>>                                   INTVAL (new_rtx), int_result_mode,
>> @@ -11059,7 +11061,7 @@ simplify_shift_const_1 (enum rtx_code co
>>               && (new_rtx = simplify_const_binary_operation
>>                   (ASHIFT, int_result_mode,
>>                    gen_int_mode (INTVAL (XEXP (varop, 1)), int_result_mode),
>> -                  GEN_INT (count))) != 0
>> +                  gen_int_shift_amount (int_result_mode, count))) != 0
>>               && CONST_INT_P (new_rtx)
>>               && merge_outer_ops (&outer_op, &outer_const, PLUS,
>>                                   INTVAL (new_rtx), int_result_mode,
>> @@ -11080,7 +11082,7 @@ simplify_shift_const_1 (enum rtx_code co
>>               && (new_rtx = simplify_const_binary_operation
>>                   (code, int_result_mode,
>>                    gen_int_mode (INTVAL (XEXP (varop, 1)), int_result_mode),
>> -                  GEN_INT (count))) != 0
>> +                  gen_int_shift_amount (int_result_mode, count))) != 0
>>               && CONST_INT_P (new_rtx)
>>               && merge_outer_ops (&outer_op, &outer_const, XOR,
>>                                   INTVAL (new_rtx), int_result_mode,
>> @@ -11135,12 +11137,12 @@ simplify_shift_const_1 (enum rtx_code co
>>                       - GET_MODE_UNIT_PRECISION (GET_MODE (varop)))))
>>             {
>>               rtx varop_inner = XEXP (varop, 0);
>> -
>> -             varop_inner
>> -               = gen_rtx_LSHIFTRT (GET_MODE (varop_inner),
>> -                                   XEXP (varop_inner, 0),
>> -                                   GEN_INT
>> -                                   (count + INTVAL (XEXP (varop_inner, 1))));
>> +             int new_count = count + INTVAL (XEXP (varop_inner, 1));
>> +             rtx new_count_rtx = gen_int_shift_amount (GET_MODE (varop_inner),
>> +                                                       new_count);
>> +             varop_inner = gen_rtx_LSHIFTRT (GET_MODE (varop_inner),
>> +                                             XEXP (varop_inner, 0),
>> +                                             new_count_rtx);
>>               varop = gen_rtx_TRUNCATE (GET_MODE (varop), varop_inner);
>>               count = 0;
>>               continue;
>> @@ -11192,7 +11194,8 @@ simplify_shift_const_1 (enum rtx_code co
>>      x = NULL_RTX;
>>
>>    if (x == NULL_RTX)
>> -    x = simplify_gen_binary (code, shift_mode, varop, GEN_INT (count));
>> +    x = simplify_gen_binary (code, shift_mode, varop,
>> +                            gen_int_shift_amount (shift_mode, count));
>>
>>    /* If we were doing an LSHIFTRT in a wider mode than it was originally,
>>       turn off all the bits that the shift would have turned off.  */
>> @@ -11254,7 +11257,8 @@ simplify_shift_const (rtx x, enum rtx_co
>>      return tem;
>>
>>    if (!x)
>> -    x = simplify_gen_binary (code, GET_MODE (varop), varop, GEN_INT (count));
>> +    x = simplify_gen_binary (code, GET_MODE (varop), varop,
>> +                            gen_int_shift_amount (GET_MODE (varop), count));
>>    if (GET_MODE (x) != result_mode)
>>      x = gen_lowpart (result_mode, x);
>>    return x;
>> @@ -11445,8 +11449,9 @@ change_zero_ext (rtx pat)
>>           if (BITS_BIG_ENDIAN)
>>             start = GET_MODE_PRECISION (inner_mode) - size - start;
>>
>> -         if (start)
>> -           x = gen_rtx_LSHIFTRT (inner_mode, XEXP (x, 0), GEN_INT (start));
>> +         if (start != 0)
>> +           x = gen_rtx_LSHIFTRT (inner_mode, XEXP (x, 0),
>> +                                 gen_int_shift_amount (inner_mode, start));
>>           else
>>             x = XEXP (x, 0);
>>           if (mode != inner_mode)
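
The combine.c hunks above are the usual strength reduction at split
time; a sketch of the arithmetic involved:

    int i = exact_log2 (UINTVAL (XEXP (*split, 1)));   /* e.g. 8 -> 3 */
    rtx i_rtx = gen_int_shift_amount (split_mode, i);
    /* (mult:M x (const_int 8)) is then rewritten as
       (ashift:M x i_rtx), i.e. x << 3.  */
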
>> Index: gcc/optabs.c
>> ===================================================================
>> --- gcc/optabs.c        2017-11-20 20:37:41.918226976 +0000
>> +++ gcc/optabs.c        2017-11-20 20:37:51.662320782 +0000
>> @@ -431,8 +431,9 @@ expand_superword_shift (optab binoptab,
>>        if (binoptab != ashr_optab)
>>         emit_move_insn (outof_target, CONST0_RTX (word_mode));
>>        else
>> -       if (!force_expand_binop (word_mode, binoptab,
>> -                                outof_input, GEN_INT (BITS_PER_WORD - 1),
>> +       if (!force_expand_binop (word_mode, binoptab, outof_input,
>> +                                gen_int_shift_amount (word_mode,
>> +                                                      BITS_PER_WORD - 1),
>>                                  outof_target, unsignedp, methods))
>>           return false;
>>      }
>> @@ -789,7 +790,8 @@ expand_doubleword_mult (machine_mode mod
>>  {
>>    int low = (WORDS_BIG_ENDIAN ? 1 : 0);
>>    int high = (WORDS_BIG_ENDIAN ? 0 : 1);
>> -  rtx wordm1 = umulp ? NULL_RTX : GEN_INT (BITS_PER_WORD - 1);
>> +  rtx wordm1 = (umulp ? NULL_RTX
>> +               : gen_int_shift_amount (word_mode, BITS_PER_WORD - 1));
>>    rtx product, adjust, product_high, temp;
>>
>>    rtx op0_high = operand_subword_force (op0, high, mode);
>> @@ -1185,7 +1187,7 @@ expand_binop (machine_mode mode, optab b
>>        unsigned int bits = GET_MODE_PRECISION (int_mode);
>>
>>        if (CONST_INT_P (op1))
>> -        newop1 = GEN_INT (bits - INTVAL (op1));
>> +       newop1 = gen_int_shift_amount (int_mode, bits - INTVAL (op1));
>>        else if (targetm.shift_truncation_mask (int_mode) == bits - 1)
>>          newop1 = negate_rtx (GET_MODE (op1), op1);
>>        else
>> @@ -1403,7 +1405,7 @@ expand_binop (machine_mode mode, optab b
>>
>>        /* Apply the truncation to constant shifts.  */
>>        if (double_shift_mask > 0 && CONST_INT_P (op1))
>> -       op1 = GEN_INT (INTVAL (op1) & double_shift_mask);
>> +       op1 = gen_int_mode (INTVAL (op1) & double_shift_mask, op1_mode);
>>
>>        if (op1 == CONST0_RTX (op1_mode))
>>         return op0;
>> @@ -1513,7 +1515,7 @@ expand_binop (machine_mode mode, optab b
>>        else
>>         {
>>           rtx into_temp1, into_temp2, outof_temp1, outof_temp2;
>> -         rtx first_shift_count, second_shift_count;
>> +         HOST_WIDE_INT first_shift_count, second_shift_count;
>>           optab reverse_unsigned_shift, unsigned_shift;
>>
>>           reverse_unsigned_shift = (left_shift ^ (shift_count < BITS_PER_WORD)
>> @@ -1524,20 +1526,24 @@ expand_binop (machine_mode mode, optab b
>>
>>           if (shift_count > BITS_PER_WORD)
>>             {
>> -             first_shift_count = GEN_INT (shift_count - BITS_PER_WORD);
>> -             second_shift_count = GEN_INT (2 * BITS_PER_WORD - shift_count);
>> +             first_shift_count = shift_count - BITS_PER_WORD;
>> +             second_shift_count = 2 * BITS_PER_WORD - shift_count;
>>             }
>>           else
>>             {
>> -             first_shift_count = GEN_INT (BITS_PER_WORD - shift_count);
>> -             second_shift_count = GEN_INT (shift_count);
>> +             first_shift_count = BITS_PER_WORD - shift_count;
>> +             second_shift_count = shift_count;
>>             }
>> +         rtx first_shift_count_rtx
>> +           = gen_int_shift_amount (word_mode, first_shift_count);
>> +         rtx second_shift_count_rtx
>> +           = gen_int_shift_amount (word_mode, second_shift_count);
>>
>>           into_temp1 = expand_binop (word_mode, unsigned_shift,
>> -                                    outof_input, first_shift_count,
>> +                                    outof_input, first_shift_count_rtx,
>>                                      NULL_RTX, unsignedp, next_methods);
>>           into_temp2 = expand_binop (word_mode, reverse_unsigned_shift,
>> -                                    into_input, second_shift_count,
>> +                                    into_input, second_shift_count_rtx,
>>                                      NULL_RTX, unsignedp, next_methods);
>>
>>           if (into_temp1 != 0 && into_temp2 != 0)
>> @@ -1550,10 +1556,10 @@ expand_binop (machine_mode mode, optab b
>>             emit_move_insn (into_target, inter);
>>
>>           outof_temp1 = expand_binop (word_mode, unsigned_shift,
>> -                                     into_input, first_shift_count,
>> +                                     into_input, first_shift_count_rtx,
>>                                       NULL_RTX, unsignedp, next_methods);
>>           outof_temp2 = expand_binop (word_mode, reverse_unsigned_shift,
>> -                                     outof_input, second_shift_count,
>> +                                     outof_input, second_shift_count_rtx,
>>                                       NULL_RTX, unsignedp, next_methods);
>>
>>           if (inter != 0 && outof_temp1 != 0 && outof_temp2 != 0)
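
A worked example of the shift-count arithmetic above, assuming
BITS_PER_WORD == 32 and a double-word (64-bit) shift:

    /* Shift by 40 (> BITS_PER_WORD):
         first_shift_count  = 40 - 32      = 8
         second_shift_count = 2 * 32 - 40  = 24

       Shift by 12 (<= BITS_PER_WORD):
         first_shift_count  = 32 - 12      = 20
         second_shift_count = 12

       Both counts are now wrapped in gen_int_shift_amount (word_mode, ...)
       before being passed to expand_binop.  */
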
>> @@ -2793,25 +2799,29 @@ expand_unop (machine_mode mode, optab un
>>
>>           if (optab_handler (rotl_optab, mode) != CODE_FOR_nothing)
>>             {
>> -             temp = expand_binop (mode, rotl_optab, op0, GEN_INT (8), target,
>> -                                  unsignedp, OPTAB_DIRECT);
>> +             temp = expand_binop (mode, rotl_optab, op0,
>> +                                  gen_int_shift_amount (mode, 8),
>> +                                  target, unsignedp, OPTAB_DIRECT);
>>               if (temp)
>>                 return temp;
>>              }
>>
>>           if (optab_handler (rotr_optab, mode) != CODE_FOR_nothing)
>>             {
>> -             temp = expand_binop (mode, rotr_optab, op0, GEN_INT (8), target,
>> -                                  unsignedp, OPTAB_DIRECT);
>> +             temp = expand_binop (mode, rotr_optab, op0,
>> +                                  gen_int_shift_amount (mode, 8),
>> +                                  target, unsignedp, OPTAB_DIRECT);
>>               if (temp)
>>                 return temp;
>>             }
>>
>>           last = get_last_insn ();
>>
>> -         temp1 = expand_binop (mode, ashl_optab, op0, GEN_INT (8), NULL_RTX,
>> +         temp1 = expand_binop (mode, ashl_optab, op0,
>> +                               gen_int_shift_amount (mode, 8), NULL_RTX,
>>                                 unsignedp, OPTAB_WIDEN);
>> -         temp2 = expand_binop (mode, lshr_optab, op0, GEN_INT (8), NULL_RTX,
>> +         temp2 = expand_binop (mode, lshr_optab, op0,
>> +                               gen_int_shift_amount (mode, 8), NULL_RTX,
>>                                 unsignedp, OPTAB_WIDEN);
>>           if (temp1 && temp2)
>>             {
>> @@ -5369,11 +5379,11 @@ vector_compare_rtx (machine_mode cmp_mod
>>  }
>>
>>  /* Checks if vec_perm mask SEL is a constant equivalent to a shift of the first
>> -   vec_perm operand, assuming the second operand is a constant vector of zeroes.
>> -   Return the shift distance in bits if so, or NULL_RTX if the vec_perm is not a
>> -   shift.  */
>> +   vec_perm operand (which has mode OP0_MODE), assuming the second
>> +   operand is a constant vector of zeroes.  Return the shift distance in
>> +   bits if so, or NULL_RTX if the vec_perm is not a shift.  */
>>  static rtx
>> -shift_amt_for_vec_perm_mask (rtx sel)
>> +shift_amt_for_vec_perm_mask (machine_mode op0_mode, rtx sel)
>>  {
>>    unsigned int i, first, nelt = GET_MODE_NUNITS (GET_MODE (sel));
>>    unsigned int bitsize = GET_MODE_UNIT_BITSIZE (GET_MODE (sel));
>> @@ -5393,7 +5403,7 @@ shift_amt_for_vec_perm_mask (rtx sel)
>>         return NULL_RTX;
>>      }
>>
>> -  return GEN_INT (first * bitsize);
>> +  return gen_int_shift_amount (op0_mode, first * bitsize);
>>  }
>>
>>  /* A subroutine of expand_vec_perm for expanding one vec_perm insn.  */
>> @@ -5473,7 +5483,7 @@ expand_vec_perm (machine_mode mode, rtx
>>           && (shift_code != CODE_FOR_nothing
>>               || shift_code_qi != CODE_FOR_nothing))
>>         {
>> -         shift_amt = shift_amt_for_vec_perm_mask (sel);
>> +         shift_amt = shift_amt_for_vec_perm_mask (mode, sel);
>>           if (shift_amt)
>>             {
>>               struct expand_operand ops[3];
>> @@ -5563,7 +5573,8 @@ expand_vec_perm (machine_mode mode, rtx
>>                                    NULL, 0, OPTAB_DIRECT);
>>        else
>>         sel = expand_simple_binop (selmode, ASHIFT, sel,
>> -                                  GEN_INT (exact_log2 (u)),
>> +                                  gen_int_shift_amount (selmode,
>> +                                                        exact_log2 (u)),
>>                                    NULL, 0, OPTAB_DIRECT);
>>        gcc_assert (sel != NULL);
>>

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [05/nn] Add VEC_DUPLICATE_{CST,EXPR} and associated optab
  2017-12-15  0:29   ` Richard Sandiford
@ 2017-12-15  8:58     ` Richard Biener
  2017-12-15 12:52       ` Richard Sandiford
  0 siblings, 1 reply; 90+ messages in thread
From: Richard Biener @ 2017-12-15  8:58 UTC (permalink / raw)
  To: GCC Patches, Richard Sandiford

On Fri, Dec 15, 2017 at 1:29 AM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> This patch just adds VEC_DUPLICATE_EXPR, since the VEC_DUPLICATE_CST
> isn't needed with the new VECTOR_CST layout.  It's really just the
> original patch with bits removed, but just in case:
>
> Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64-linux-gnu.
> OK to install?

To keep things simple at this point, OK.  Note that I'd eventually
like to see this as VEC_PERM_EXPR <scalar_type_1, scalar_type_1, { 0, ... }>.
For reductions, when we need { x, 0, ... } we now have to use a
VEC_DUPLICATE_EXPR to make x a vector and then a VEC_PERM_EXPR to
merge it with { 0, ... }, right?  Rather than a single
VEC_PERM_EXPR <x_1, 0, { 0, 1, 1, 1, ... }>.
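
I.e., as a sketch (the SSA names and permute selectors are only
illustrative):

    /* With this patch: splat x, then merge with a zero vector.  */
    x_2 = VEC_DUPLICATE_EXPR <x_1>;
    v_3 = VEC_PERM_EXPR <x_2, { 0, ... }, { 0, n, n, ... }>;

    /* Versus permuting the scalar directly against zero:  */
    v_3 = VEC_PERM_EXPR <x_1, 0, { 0, 1, 1, 1, ... }>;
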

Thanks,
Richard.

> Richard
>
>
> 2017-12-15  Richard Sandiford  <richard.sandiford@linaro.org>
>             Alan Hayward  <alan.hayward@arm.com>
>             David Sherwood  <david.sherwood@arm.com>
>
> gcc/
>         * doc/generic.texi (VEC_DUPLICATE_EXPR): Document.
>         (VEC_COND_EXPR): Add missing @tindex.
>         * doc/md.texi (vec_duplicate@var{m}): Document.
>         * tree.def (VEC_DUPLICATE_EXPR): New tree codes.
>         * tree.c (build_vector_from_val): Add stubbed-out handling of
>         variable-length vectors, using VEC_DUPLICATE_EXPR.
>         (uniform_vector_p): Handle VEC_DUPLICATE_EXPR.
>         * cfgexpand.c (expand_debug_expr): Likewise.
>         * tree-cfg.c (verify_gimple_assign_unary): Likewise.
>         * tree-inline.c (estimate_operator_cost): Likewise.
>         * tree-pretty-print.c (dump_generic_node): Likewise.
>         * tree-vect-generic.c (ssa_uniform_vector_p): Likewise.
>         * fold-const.c (const_unop): Fold VEC_DUPLICATE_EXPRs of a constant.
>         (test_vec_duplicate_folding): New function.
>         (fold_const_c_tests): Call it.
>         * optabs.def (vec_duplicate_optab): New optab.
>         * optabs-tree.c (optab_for_tree_code): Handle VEC_DUPLICATE_EXPR.
>         * optabs.h (expand_vector_broadcast): Declare.
>         * optabs.c (expand_vector_broadcast): Make non-static.  Try using
>         vec_duplicate_optab.
>         * expr.c (store_constructor): Try using vec_duplicate_optab for
>         uniform vectors.
>         (expand_expr_real_2): Handle VEC_DUPLICATE_EXPR.
>
> Index: gcc/doc/generic.texi
> ===================================================================
> --- gcc/doc/generic.texi        2017-12-15 00:24:47.213516622 +0000
> +++ gcc/doc/generic.texi        2017-12-15 00:24:47.498459276 +0000
> @@ -1768,6 +1768,7 @@ a value from @code{enum annot_expr_kind}
>
>  @node Vectors
>  @subsection Vectors
> +@tindex VEC_DUPLICATE_EXPR
>  @tindex VEC_LSHIFT_EXPR
>  @tindex VEC_RSHIFT_EXPR
>  @tindex VEC_WIDEN_MULT_HI_EXPR
> @@ -1779,9 +1780,14 @@ a value from @code{enum annot_expr_kind}
>  @tindex VEC_PACK_TRUNC_EXPR
>  @tindex VEC_PACK_SAT_EXPR
>  @tindex VEC_PACK_FIX_TRUNC_EXPR
> +@tindex VEC_COND_EXPR
>  @tindex SAD_EXPR
>
>  @table @code
> +@item VEC_DUPLICATE_EXPR
> +This node has a single operand and represents a vector in which every
> +element is equal to that operand.
> +
>  @item VEC_LSHIFT_EXPR
>  @itemx VEC_RSHIFT_EXPR
>  These nodes represent whole vector left and right shifts, respectively.
> Index: gcc/doc/md.texi
> ===================================================================
> --- gcc/doc/md.texi     2017-12-15 00:24:47.213516622 +0000
> +++ gcc/doc/md.texi     2017-12-15 00:24:47.499459075 +0000
> @@ -4888,6 +4888,17 @@ and operand 1 is parallel containing val
>  the vector mode @var{m}, or a vector mode with the same element mode and
>  smaller number of elements.
>
> +@cindex @code{vec_duplicate@var{m}} instruction pattern
> +@item @samp{vec_duplicate@var{m}}
> +Initialize vector output operand 0 so that each element has the value given
> +by scalar input operand 1.  The vector has mode @var{m} and the scalar has
> +the mode appropriate for one element of @var{m}.
> +
> +This pattern only handles duplicates of non-constant inputs.  Constant
> +vectors go through the @code{mov@var{m}} pattern instead.
> +
> +This pattern is not allowed to @code{FAIL}.
> +
>  @cindex @code{vec_cmp@var{m}@var{n}} instruction pattern
>  @item @samp{vec_cmp@var{m}@var{n}}
>  Output a vector comparison.  Operand 0 of mode @var{n} is the destination for
> Index: gcc/tree.def
> ===================================================================
> --- gcc/tree.def        2017-12-15 00:24:47.213516622 +0000
> +++ gcc/tree.def        2017-12-15 00:24:47.505457868 +0000
> @@ -537,6 +537,9 @@ DEFTREECODE (TARGET_EXPR, "target_expr",
>     1 and 2 are NULL.  The operands are then taken from the cfg edges. */
>  DEFTREECODE (COND_EXPR, "cond_expr", tcc_expression, 3)
>
> +/* Represents a vector in which every element is equal to operand 0.  */
> +DEFTREECODE (VEC_DUPLICATE_EXPR, "vec_duplicate_expr", tcc_unary, 1)
> +
>  /* Vector conditional expression. It is like COND_EXPR, but with
>     vector operands.
>
> Index: gcc/tree.c
> ===================================================================
> --- gcc/tree.c  2017-12-15 00:24:47.213516622 +0000
> +++ gcc/tree.c  2017-12-15 00:24:47.505457868 +0000
> @@ -1785,6 +1785,8 @@ build_vector_from_val (tree vectype, tre
>        v.quick_push (sc);
>        return v.build ();
>      }
> +  else if (0)
> +    return fold_build1 (VEC_DUPLICATE_EXPR, vectype, sc);
>    else
>      {
>        vec<constructor_elt, va_gc> *v;
> @@ -10468,7 +10470,10 @@ uniform_vector_p (const_tree vec)
>
>    gcc_assert (VECTOR_TYPE_P (TREE_TYPE (vec)));
>
> -  if (TREE_CODE (vec) == VECTOR_CST)
> +  if (TREE_CODE (vec) == VEC_DUPLICATE_EXPR)
> +    return TREE_OPERAND (vec, 0);
> +
> +  else if (TREE_CODE (vec) == VECTOR_CST)
>      {
>        if (VECTOR_CST_NPATTERNS (vec) == 1 && VECTOR_CST_DUPLICATE_P (vec))
>         return VECTOR_CST_ENCODED_ELT (vec, 0);
> Index: gcc/cfgexpand.c
> ===================================================================
> --- gcc/cfgexpand.c     2017-12-15 00:24:47.213516622 +0000
> +++ gcc/cfgexpand.c     2017-12-15 00:24:47.498459276 +0000
> @@ -5069,6 +5069,7 @@ expand_debug_expr (tree exp)
>      case VEC_WIDEN_LSHIFT_HI_EXPR:
>      case VEC_WIDEN_LSHIFT_LO_EXPR:
>      case VEC_PERM_EXPR:
> +    case VEC_DUPLICATE_EXPR:
>        return NULL;
>
>      /* Misc codes.  */
> Index: gcc/tree-cfg.c
> ===================================================================
> --- gcc/tree-cfg.c      2017-12-15 00:24:47.213516622 +0000
> +++ gcc/tree-cfg.c      2017-12-15 00:24:47.503458270 +0000
> @@ -3857,6 +3857,17 @@ verify_gimple_assign_unary (gassign *stm
>      case CONJ_EXPR:
>        break;
>
> +    case VEC_DUPLICATE_EXPR:
> +      if (TREE_CODE (lhs_type) != VECTOR_TYPE
> +         || !useless_type_conversion_p (TREE_TYPE (lhs_type), rhs1_type))
> +       {
> +         error ("vec_duplicate should be from a scalar to a like vector");
> +         debug_generic_expr (lhs_type);
> +         debug_generic_expr (rhs1_type);
> +         return true;
> +       }
> +      return false;
> +
>      default:
>        gcc_unreachable ();
>      }
> Index: gcc/tree-inline.c
> ===================================================================
> --- gcc/tree-inline.c   2017-12-15 00:24:47.213516622 +0000
> +++ gcc/tree-inline.c   2017-12-15 00:24:47.504458069 +0000
> @@ -3928,6 +3928,7 @@ estimate_operator_cost (enum tree_code c
>      case VEC_PACK_FIX_TRUNC_EXPR:
>      case VEC_WIDEN_LSHIFT_HI_EXPR:
>      case VEC_WIDEN_LSHIFT_LO_EXPR:
> +    case VEC_DUPLICATE_EXPR:
>
>        return 1;
>
> Index: gcc/tree-pretty-print.c
> ===================================================================
> --- gcc/tree-pretty-print.c     2017-12-15 00:24:47.213516622 +0000
> +++ gcc/tree-pretty-print.c     2017-12-15 00:24:47.504458069 +0000
> @@ -3178,6 +3178,15 @@ dump_generic_node (pretty_printer *pp, t
>        pp_string (pp, " > ");
>        break;
>
> +    case VEC_DUPLICATE_EXPR:
> +      pp_space (pp);
> +      for (str = get_tree_code_name (code); *str; str++)
> +       pp_character (pp, TOUPPER (*str));
> +      pp_string (pp, " < ");
> +      dump_generic_node (pp, TREE_OPERAND (node, 0), spc, flags, false);
> +      pp_string (pp, " > ");
> +      break;
> +
>      case VEC_UNPACK_HI_EXPR:
>        pp_string (pp, " VEC_UNPACK_HI_EXPR < ");
>        dump_generic_node (pp, TREE_OPERAND (node, 0), spc, flags, false);
> Index: gcc/tree-vect-generic.c
> ===================================================================
> --- gcc/tree-vect-generic.c     2017-12-15 00:24:47.213516622 +0000
> +++ gcc/tree-vect-generic.c     2017-12-15 00:24:47.504458069 +0000
> @@ -1418,6 +1418,7 @@ lower_vec_perm (gimple_stmt_iterator *gs
>  ssa_uniform_vector_p (tree op)
>  {
>    if (TREE_CODE (op) == VECTOR_CST
> +      || TREE_CODE (op) == VEC_DUPLICATE_EXPR
>        || TREE_CODE (op) == CONSTRUCTOR)
>      return uniform_vector_p (op);
>    if (TREE_CODE (op) == SSA_NAME)
> Index: gcc/fold-const.c
> ===================================================================
> --- gcc/fold-const.c    2017-12-15 00:24:47.213516622 +0000
> +++ gcc/fold-const.c    2017-12-15 00:24:47.501458673 +0000
> @@ -1771,6 +1771,11 @@ const_unop (enum tree_code code, tree ty
>         return elts.build ();
>        }
>
> +    case VEC_DUPLICATE_EXPR:
> +      if (CONSTANT_CLASS_P (arg0))
> +       return build_vector_from_val (type, arg0);
> +      return NULL_TREE;
> +
>      default:
>        break;
>      }
> @@ -14442,6 +14447,22 @@ test_vector_folding ()
>    ASSERT_FALSE (integer_nonzerop (fold_build2 (NE_EXPR, res_type, one, one)));
>  }
>
> +/* Verify folding of VEC_DUPLICATE_EXPRs.  */
> +
> +static void
> +test_vec_duplicate_folding ()
> +{
> +  scalar_int_mode int_mode = SCALAR_INT_TYPE_MODE (ssizetype);
> +  machine_mode vec_mode = targetm.vectorize.preferred_simd_mode (int_mode);
> +  /* This will be 1 if VEC_MODE isn't a vector mode.  */
> +  unsigned int nunits = GET_MODE_NUNITS (vec_mode);
> +
> +  tree type = build_vector_type (ssizetype, nunits);
> +  tree dup5_expr = fold_unary (VEC_DUPLICATE_EXPR, type, ssize_int (5));
> +  tree dup5_cst = build_vector_from_val (type, ssize_int (5));
> +  ASSERT_TRUE (operand_equal_p (dup5_expr, dup5_cst, 0));
> +}
> +
>  /* Run all of the selftests within this file.  */
>
>  void
> @@ -14449,6 +14470,7 @@ fold_const_c_tests ()
>  {
>    test_arithmetic_folding ();
>    test_vector_folding ();
> +  test_vec_duplicate_folding ();
>  }
>
>  } // namespace selftest
> Index: gcc/optabs.def
> ===================================================================
> --- gcc/optabs.def      2017-12-15 00:24:47.213516622 +0000
> +++ gcc/optabs.def      2017-12-15 00:24:47.502458472 +0000
> @@ -363,3 +363,5 @@ OPTAB_D (atomic_xor_optab, "atomic_xor$I
>
>  OPTAB_D (get_thread_pointer_optab, "get_thread_pointer$I$a")
>  OPTAB_D (set_thread_pointer_optab, "set_thread_pointer$I$a")
> +
> +OPTAB_DC (vec_duplicate_optab, "vec_duplicate$a", VEC_DUPLICATE)
> Index: gcc/optabs-tree.c
> ===================================================================
> --- gcc/optabs-tree.c   2017-12-15 00:24:47.213516622 +0000
> +++ gcc/optabs-tree.c   2017-12-15 00:24:47.501458673 +0000
> @@ -199,6 +199,9 @@ optab_for_tree_code (enum tree_code code
>        return TYPE_UNSIGNED (type) ?
>         vec_pack_ufix_trunc_optab : vec_pack_sfix_trunc_optab;
>
> +    case VEC_DUPLICATE_EXPR:
> +      return vec_duplicate_optab;
> +
>      default:
>        break;
>      }
> Index: gcc/optabs.h
> ===================================================================
> --- gcc/optabs.h        2017-12-15 00:24:47.213516622 +0000
> +++ gcc/optabs.h        2017-12-15 00:24:47.502458472 +0000
> @@ -182,6 +182,7 @@ extern rtx simplify_expand_binop (machin
>                                   enum optab_methods methods);
>  extern bool force_expand_binop (machine_mode, optab, rtx, rtx, rtx, int,
>                                 enum optab_methods);
> +extern rtx expand_vector_broadcast (machine_mode, rtx);
>
>  /* Generate code for a simple binary or unary operation.  "Simple" in
>     this case means "can be unambiguously described by a (mode, code)
> Index: gcc/optabs.c
> ===================================================================
> --- gcc/optabs.c        2017-12-15 00:24:47.213516622 +0000
> +++ gcc/optabs.c        2017-12-15 00:24:47.502458472 +0000
> @@ -367,7 +367,7 @@ force_expand_binop (machine_mode mode, o
>     mode of OP must be the element mode of VMODE.  If OP is a constant,
>     then the return value will be a constant.  */
>
> -static rtx
> +rtx
>  expand_vector_broadcast (machine_mode vmode, rtx op)
>  {
>    enum insn_code icode;
> @@ -380,6 +380,16 @@ expand_vector_broadcast (machine_mode vm
>    if (valid_for_const_vec_duplicate_p (vmode, op))
>      return gen_const_vec_duplicate (vmode, op);
>
> +  icode = optab_handler (vec_duplicate_optab, vmode);
> +  if (icode != CODE_FOR_nothing)
> +    {
> +      struct expand_operand ops[2];
> +      create_output_operand (&ops[0], NULL_RTX, vmode);
> +      create_input_operand (&ops[1], op, GET_MODE (op));
> +      expand_insn (icode, 2, ops);
> +      return ops[0].value;
> +    }
> +
>    /* ??? If the target doesn't have a vec_init, then we have no easy way
>       of performing this operation.  Most of this sort of generic support
>       is hidden away in the vector lowering support in gimple.  */
> Index: gcc/expr.c
> ===================================================================
> --- gcc/expr.c  2017-12-15 00:24:47.213516622 +0000
> +++ gcc/expr.c  2017-12-15 00:24:47.500458874 +0000
> @@ -6598,7 +6598,8 @@ store_constructor (tree exp, rtx target,
>         constructor_elt *ce;
>         int i;
>         int need_to_clear;
> -       int icode = CODE_FOR_nothing;
> +       insn_code icode = CODE_FOR_nothing;
> +       tree elt;
>         tree elttype = TREE_TYPE (type);
>         int elt_size = tree_to_uhwi (TYPE_SIZE (elttype));
>         machine_mode eltmode = TYPE_MODE (elttype);
> @@ -6608,13 +6609,30 @@ store_constructor (tree exp, rtx target,
>         unsigned n_elts;
>         alias_set_type alias;
>         bool vec_vec_init_p = false;
> +       machine_mode mode = GET_MODE (target);
>
>         gcc_assert (eltmode != BLKmode);
>
> +       /* Try using vec_duplicate_optab for uniform vectors.  */
> +       if (!TREE_SIDE_EFFECTS (exp)
> +           && VECTOR_MODE_P (mode)
> +           && eltmode == GET_MODE_INNER (mode)
> +           && ((icode = optab_handler (vec_duplicate_optab, mode))
> +               != CODE_FOR_nothing)
> +           && (elt = uniform_vector_p (exp)))
> +         {
> +           struct expand_operand ops[2];
> +           create_output_operand (&ops[0], target, mode);
> +           create_input_operand (&ops[1], expand_normal (elt), eltmode);
> +           expand_insn (icode, 2, ops);
> +           if (!rtx_equal_p (target, ops[0].value))
> +             emit_move_insn (target, ops[0].value);
> +           break;
> +         }
> +
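
As a sketch of the case this new path catches -- a uniform constructor
of a non-constant value, in GNU C vector notation:

    typedef int v4si __attribute__ ((vector_size (16)));

    v4si
    splat (int x)
    {
      /* uniform_vector_p returns x for { x, x, x, x }; if the target
         provides vec_duplicate<mode>, the store now expands to a
         single broadcast instead of a piecewise vec_init.  */
      return (v4si) { x, x, x, x };
    }
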
>         n_elts = TYPE_VECTOR_SUBPARTS (type);
> -       if (REG_P (target) && VECTOR_MODE_P (GET_MODE (target)))
> +       if (REG_P (target) && VECTOR_MODE_P (mode))
>           {
> -           machine_mode mode = GET_MODE (target);
>             machine_mode emode = eltmode;
>
>             if (CONSTRUCTOR_NELTS (exp)
> @@ -6626,7 +6644,7 @@ store_constructor (tree exp, rtx target,
>                             == n_elts);
>                 emode = TYPE_MODE (etype);
>               }
> -           icode = (int) convert_optab_handler (vec_init_optab, mode, emode);
> +           icode = convert_optab_handler (vec_init_optab, mode, emode);
>             if (icode != CODE_FOR_nothing)
>               {
>                 unsigned int i, n = n_elts;
> @@ -6674,7 +6692,7 @@ store_constructor (tree exp, rtx target,
>         if (need_to_clear && size > 0 && !vector)
>           {
>             if (REG_P (target))
> -             emit_move_insn (target, CONST0_RTX (GET_MODE (target)));
> +             emit_move_insn (target, CONST0_RTX (mode));
>             else
>               clear_storage (target, GEN_INT (size), BLOCK_OP_NORMAL);
>             cleared = 1;
> @@ -6682,7 +6700,7 @@ store_constructor (tree exp, rtx target,
>
>         /* Inform later passes that the old value is dead.  */
>         if (!cleared && !vector && REG_P (target))
> -         emit_move_insn (target, CONST0_RTX (GET_MODE (target)));
> +         emit_move_insn (target, CONST0_RTX (mode));
>
>          if (MEM_P (target))
>           alias = MEM_ALIAS_SET (target);
> @@ -6733,8 +6751,7 @@ store_constructor (tree exp, rtx target,
>
>         if (vector)
>           emit_insn (GEN_FCN (icode) (target,
> -                                     gen_rtx_PARALLEL (GET_MODE (target),
> -                                                       vector)));
> +                                     gen_rtx_PARALLEL (mode, vector)));
>         break;
>        }
>
> @@ -9563,6 +9580,12 @@ #define REDUCE_BIT_FIELD(expr)   (reduce_b
>        target = expand_vec_cond_expr (type, treeop0, treeop1, treeop2, target);
>        return target;
>
> +    case VEC_DUPLICATE_EXPR:
> +      op0 = expand_expr (treeop0, NULL_RTX, VOIDmode, modifier);
> +      target = expand_vector_broadcast (mode, op0);
> +      gcc_assert (target);
> +      return target;
> +
>      case BIT_INSERT_EXPR:
>        {
>         unsigned bitpos = tree_to_uhwi (treeop2);

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [06/nn] Add VEC_SERIES_{CST,EXPR} and associated optab
  2017-12-15  0:34   ` Richard Sandiford
@ 2017-12-15  9:03     ` Richard Biener
  0 siblings, 0 replies; 90+ messages in thread
From: Richard Biener @ 2017-12-15  9:03 UTC (permalink / raw)
  To: GCC Patches, Richard Sandiford

On Fri, Dec 15, 2017 at 1:34 AM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> As with the updated 05 patch, this patch just adds VEC_SERIES_EXPR,
> since the VEC_SERIES_CST isn't needed with the new VECTOR_CST layout.
> build_vec_series now uses the new VECTOR_CST layout, but otherwise
> this is just the original patch with bits removed.
>
> Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64-linux-gnu.
> OK to install?

Given that we need to use VEC_DUPLICATE + VEC_PERM for { x, 0, ... } (?),
how about doing VEC_DUPLICATE and PLUS for this one?  Or is 'step'
allowed to be non-constant?  It seems to be.

Ah well.

OK.

Thanks,
Richard.

> Richard
>
>
> 2017-12-15  Richard Sandiford  <richard.sandiford@linaro.org>
>             Alan Hayward  <alan.hayward@arm.com>
>             David Sherwood  <david.sherwood@arm.com>
>
> gcc/
>         * doc/generic.texi (VEC_SERIES_EXPR): Document.
>         * doc/md.texi (vec_series@var{m}): Document.
>         * tree.def (VEC_SERIES_EXPR): New tree code.
>         * tree.h (build_vec_series): Declare.
>         * tree.c (build_vec_series): New function.
>         * cfgexpand.c (expand_debug_expr): Handle VEC_SERIES_EXPR.
>         * tree-pretty-print.c (dump_generic_node): Likewise.
>         * gimple-pretty-print.c (dump_binary_rhs): Likewise.
>         * tree-inline.c (estimate_operator_cost): Likewise.
>         * expr.c (expand_expr_real_2): Likewise.
>         * optabs-tree.c (optab_for_tree_code): Likewise.
>         * tree-cfg.c (verify_gimple_assign_binary): Likewise.
>         * fold-const.c (const_binop): Fold VEC_SERIES_EXPRs of constants.
>         * expmed.c (make_tree): Handle VEC_SERIES.
>         * optabs.def (vec_series_optab): New optab.
>         * optabs.h (expand_vec_series_expr): Declare.
>         * optabs.c (expand_vec_series_expr): New function.
>         * tree-vect-generic.c (expand_vector_operations_1): Check that
>         the operands also have vector type.
>
> Index: gcc/doc/generic.texi
> ===================================================================
> --- gcc/doc/generic.texi        2017-12-15 00:30:46.596993903 +0000
> +++ gcc/doc/generic.texi        2017-12-15 00:30:46.911991495 +0000
> @@ -1769,6 +1769,7 @@ a value from @code{enum annot_expr_kind}
>  @node Vectors
>  @subsection Vectors
>  @tindex VEC_DUPLICATE_EXPR
> +@tindex VEC_SERIES_EXPR
>  @tindex VEC_LSHIFT_EXPR
>  @tindex VEC_RSHIFT_EXPR
>  @tindex VEC_WIDEN_MULT_HI_EXPR
> @@ -1788,6 +1789,14 @@ a value from @code{enum annot_expr_kind}
>  This node has a single operand and represents a vector in which every
>  element is equal to that operand.
>
> +@item VEC_SERIES_EXPR
> +This node represents a vector formed from a scalar base and step,
> +given as the first and second operands respectively.  Element @var{i}
> +of the result is equal to @samp{@var{base} + @var{i}*@var{step}}.
> +
> +This node is restricted to integral types, in order to avoid
> +specifying the rounding behavior for floating-point types.
> +
>  @item VEC_LSHIFT_EXPR
>  @itemx VEC_RSHIFT_EXPR
>  These nodes represent whole vector left and right shifts, respectively.
> Index: gcc/doc/md.texi
> ===================================================================
> --- gcc/doc/md.texi     2017-12-15 00:30:46.596993903 +0000
> +++ gcc/doc/md.texi     2017-12-15 00:30:46.912991487 +0000
> @@ -4899,6 +4899,19 @@ vectors go through the @code{mov@var{m}}
>
>  This pattern is not allowed to @code{FAIL}.
>
> +@cindex @code{vec_series@var{m}} instruction pattern
> +@item @samp{vec_series@var{m}}
> +Initialize vector output operand 0 so that element @var{i} is equal to
> +operand 1 plus @var{i} times operand 2.  In other words, create a linear
> +series whose base value is operand 1 and whose step is operand 2.
> +
> +The vector output has mode @var{m} and the scalar inputs have the mode
> +appropriate for one element of @var{m}.  This pattern is not used for
> +floating-point vectors, in order to avoid having to specify the
> +rounding behavior for @var{i} > 1.
> +
> +This pattern is not allowed to @code{FAIL}.
> +
>  @cindex @code{vec_cmp@var{m}@var{n}} instruction pattern
>  @item @samp{vec_cmp@var{m}@var{n}}
>  Output a vector comparison.  Operand 0 of mode @var{n} is the destination for
> Index: gcc/tree.def
> ===================================================================
> --- gcc/tree.def        2017-12-15 00:30:46.596993903 +0000
> +++ gcc/tree.def        2017-12-15 00:30:46.919991433 +0000
> @@ -540,6 +540,16 @@ DEFTREECODE (COND_EXPR, "cond_expr", tcc
>  /* Represents a vector in which every element is equal to operand 0.  */
>  DEFTREECODE (VEC_DUPLICATE_EXPR, "vec_duplicate_expr", tcc_unary, 1)
>
> +/* Vector series created from a start (base) value and a step.
> +
> +   A = VEC_SERIES_EXPR (B, C)
> +
> +   means
> +
> +   for (i = 0; i < N; i++)
> +     A[i] = B + C * i;  */
> +DEFTREECODE (VEC_SERIES_EXPR, "vec_series_expr", tcc_binary, 2)
> +
>  /* Vector conditional expression. It is like COND_EXPR, but with
>     vector operands.
>
> Index: gcc/tree.h
> ===================================================================
> --- gcc/tree.h  2017-12-15 00:30:46.596993903 +0000
> +++ gcc/tree.h  2017-12-15 00:30:46.919991433 +0000
> @@ -4052,6 +4052,7 @@ extern tree build_int_cst_type (tree, HO
>  extern tree make_vector (unsigned, unsigned CXX_MEM_STAT_INFO);
>  extern tree build_vector_from_ctor (tree, vec<constructor_elt, va_gc> *);
>  extern tree build_vector_from_val (tree, tree);
> +extern tree build_vec_series (tree, tree, tree);
>  extern void recompute_constructor_flags (tree);
>  extern void verify_constructor_flags (tree);
>  extern tree build_constructor (tree, vec<constructor_elt, va_gc> *);
> Index: gcc/tree.c
> ===================================================================
> --- gcc/tree.c  2017-12-15 00:30:46.596993903 +0000
> +++ gcc/tree.c  2017-12-15 00:30:46.918991441 +0000
> @@ -1797,6 +1797,30 @@ build_vector_from_val (tree vectype, tre
>      }
>  }
>
> +/* Build a vector series of type TYPE in which element I has the value
> +   BASE + I * STEP.  The result is a constant if BASE and STEP are constant
> +   and a VEC_SERIES_EXPR otherwise.  */
> +
> +tree
> +build_vec_series (tree type, tree base, tree step)
> +{
> +  if (integer_zerop (step))
> +    return build_vector_from_val (type, base);
> +  if (TREE_CODE (base) == INTEGER_CST && TREE_CODE (step) == INTEGER_CST)
> +    {
> +      tree_vector_builder builder (type, 1, 3);
> +      tree elt1 = wide_int_to_tree (TREE_TYPE (base),
> +                                   wi::to_wide (base) + wi::to_wide (step));
> +      tree elt2 = wide_int_to_tree (TREE_TYPE (base),
> +                                   wi::to_wide (elt1) + wi::to_wide (step));
> +      builder.quick_push (base);
> +      builder.quick_push (elt1);
> +      builder.quick_push (elt2);
> +      return builder.build ();
> +    }
> +  return build2 (VEC_SERIES_EXPR, type, base, step);
> +}
> +
>  /* Something has messed with the elements of CONSTRUCTOR C after it was built;
>     calculate TREE_CONSTANT and TREE_SIDE_EFFECTS.  */
>
> Index: gcc/cfgexpand.c
> ===================================================================
> --- gcc/cfgexpand.c     2017-12-15 00:30:46.596993903 +0000
> +++ gcc/cfgexpand.c     2017-12-15 00:30:46.911991495 +0000
> @@ -5070,6 +5070,7 @@ expand_debug_expr (tree exp)
>      case VEC_WIDEN_LSHIFT_LO_EXPR:
>      case VEC_PERM_EXPR:
>      case VEC_DUPLICATE_EXPR:
> +    case VEC_SERIES_EXPR:
>        return NULL;
>
>      /* Misc codes.  */
> Index: gcc/tree-pretty-print.c
> ===================================================================
> --- gcc/tree-pretty-print.c     2017-12-15 00:30:46.596993903 +0000
> +++ gcc/tree-pretty-print.c     2017-12-15 00:30:46.917991449 +0000
> @@ -3162,6 +3162,7 @@ dump_generic_node (pretty_printer *pp, t
>        is_expr = false;
>        break;
>
> +    case VEC_SERIES_EXPR:
>      case VEC_WIDEN_MULT_HI_EXPR:
>      case VEC_WIDEN_MULT_LO_EXPR:
>      case VEC_WIDEN_MULT_EVEN_EXPR:
> Index: gcc/gimple-pretty-print.c
> ===================================================================
> --- gcc/gimple-pretty-print.c   2017-12-15 00:30:46.596993903 +0000
> +++ gcc/gimple-pretty-print.c   2017-12-15 00:30:46.915991464 +0000
> @@ -431,6 +431,7 @@ dump_binary_rhs (pretty_printer *buffer,
>      case VEC_PACK_FIX_TRUNC_EXPR:
>      case VEC_WIDEN_LSHIFT_HI_EXPR:
>      case VEC_WIDEN_LSHIFT_LO_EXPR:
> +    case VEC_SERIES_EXPR:
>        for (p = get_tree_code_name (code); *p; p++)
>         pp_character (buffer, TOUPPER (*p));
>        pp_string (buffer, " <");
> Index: gcc/tree-inline.c
> ===================================================================
> --- gcc/tree-inline.c   2017-12-15 00:30:46.596993903 +0000
> +++ gcc/tree-inline.c   2017-12-15 00:30:46.917991449 +0000
> @@ -3929,6 +3929,7 @@ estimate_operator_cost (enum tree_code c
>      case VEC_WIDEN_LSHIFT_HI_EXPR:
>      case VEC_WIDEN_LSHIFT_LO_EXPR:
>      case VEC_DUPLICATE_EXPR:
> +    case VEC_SERIES_EXPR:
>
>        return 1;
>
> Index: gcc/expr.c
> ===================================================================
> --- gcc/expr.c  2017-12-15 00:30:46.596993903 +0000
> +++ gcc/expr.c  2017-12-15 00:30:46.914991472 +0000
> @@ -9586,6 +9586,10 @@ #define REDUCE_BIT_FIELD(expr)   (reduce_b
>        gcc_assert (target);
>        return target;
>
> +    case VEC_SERIES_EXPR:
> +      expand_operands (treeop0, treeop1, NULL_RTX, &op0, &op1, modifier);
> +      return expand_vec_series_expr (mode, op0, op1, target);
> +
>      case BIT_INSERT_EXPR:
>        {
>         unsigned bitpos = tree_to_uhwi (treeop2);
> Index: gcc/optabs-tree.c
> ===================================================================
> --- gcc/optabs-tree.c   2017-12-15 00:30:46.596993903 +0000
> +++ gcc/optabs-tree.c   2017-12-15 00:30:46.915991464 +0000
> @@ -202,6 +202,9 @@ optab_for_tree_code (enum tree_code code
>      case VEC_DUPLICATE_EXPR:
>        return vec_duplicate_optab;
>
> +    case VEC_SERIES_EXPR:
> +      return vec_series_optab;
> +
>      default:
>        break;
>      }
> Index: gcc/tree-cfg.c
> ===================================================================
> --- gcc/tree-cfg.c      2017-12-15 00:30:46.596993903 +0000
> +++ gcc/tree-cfg.c      2017-12-15 00:30:46.917991449 +0000
> @@ -4194,6 +4194,23 @@ verify_gimple_assign_binary (gassign *st
>        /* Continue with generic binary expression handling.  */
>        break;
>
> +    case VEC_SERIES_EXPR:
> +      if (!useless_type_conversion_p (rhs1_type, rhs2_type))
> +       {
> +         error ("type mismatch in series expression");
> +         debug_generic_expr (rhs1_type);
> +         debug_generic_expr (rhs2_type);
> +         return true;
> +       }
> +      if (TREE_CODE (lhs_type) != VECTOR_TYPE
> +         || !useless_type_conversion_p (TREE_TYPE (lhs_type), rhs1_type))
> +       {
> +         error ("vector type expected in series expression");
> +         debug_generic_expr (lhs_type);
> +         return true;
> +       }
> +      return false;
> +
>      default:
>        gcc_unreachable ();
>      }
> Index: gcc/fold-const.c
> ===================================================================
> --- gcc/fold-const.c    2017-12-15 00:30:46.596993903 +0000
> +++ gcc/fold-const.c    2017-12-15 00:30:46.915991464 +0000
> @@ -1527,6 +1527,12 @@ const_binop (enum tree_code code, tree t
>       result as argument put those cases that need it here.  */
>    switch (code)
>      {
> +    case VEC_SERIES_EXPR:
> +      if (CONSTANT_CLASS_P (arg1)
> +         && CONSTANT_CLASS_P (arg2))
> +       return build_vec_series (type, arg1, arg2);
> +      return NULL_TREE;
> +
>      case COMPLEX_EXPR:
>        if ((TREE_CODE (arg1) == REAL_CST
>            && TREE_CODE (arg2) == REAL_CST)
> Index: gcc/expmed.c
> ===================================================================
> --- gcc/expmed.c        2017-12-15 00:30:46.596993903 +0000
> +++ gcc/expmed.c        2017-12-15 00:30:46.913991479 +0000
> @@ -5255,6 +5255,13 @@ make_tree (tree type, rtx x)
>             tree elt_tree = make_tree (TREE_TYPE (type), XEXP (op, 0));
>             return build_vector_from_val (type, elt_tree);
>           }
> +       if (GET_CODE (op) == VEC_SERIES)
> +         {
> +           tree itype = TREE_TYPE (type);
> +           tree base_tree = make_tree (itype, XEXP (op, 0));
> +           tree step_tree = make_tree (itype, XEXP (op, 1));
> +           return build_vec_series (type, base_tree, step_tree);
> +         }
>         return make_tree (type, op);
>        }
>
> Index: gcc/optabs.def
> ===================================================================
> --- gcc/optabs.def      2017-12-15 00:30:46.596993903 +0000
> +++ gcc/optabs.def      2017-12-15 00:30:46.916991456 +0000
> @@ -365,3 +365,4 @@ OPTAB_D (get_thread_pointer_optab, "get_
>  OPTAB_D (set_thread_pointer_optab, "set_thread_pointer$I$a")
>
>  OPTAB_DC (vec_duplicate_optab, "vec_duplicate$a", VEC_DUPLICATE)
> +OPTAB_DC (vec_series_optab, "vec_series$a", VEC_SERIES)
> Index: gcc/optabs.h
> ===================================================================
> --- gcc/optabs.h        2017-12-15 00:30:46.596993903 +0000
> +++ gcc/optabs.h        2017-12-15 00:30:46.916991456 +0000
> @@ -319,6 +319,9 @@ extern rtx expand_vec_cmp_expr (tree, tr
>  /* Generate code for VEC_COND_EXPR.  */
>  extern rtx expand_vec_cond_expr (tree, tree, tree, tree, rtx);
>
> +/* Generate code for VEC_SERIES_EXPR.  */
> +extern rtx expand_vec_series_expr (machine_mode, rtx, rtx, rtx);
> +
>  /* Generate code for MULT_HIGHPART_EXPR.  */
>  extern rtx expand_mult_highpart (machine_mode, rtx, rtx, rtx, bool);
>
> Index: gcc/optabs.c
> ===================================================================
> --- gcc/optabs.c        2017-12-15 00:30:46.596993903 +0000
> +++ gcc/optabs.c        2017-12-15 00:30:46.916991456 +0000
> @@ -5768,6 +5768,27 @@ expand_vec_cond_expr (tree vec_cond_type
>    return ops[0].value;
>  }
>
> +/* Generate VEC_SERIES_EXPR <OP0, OP1>, returning a value of mode VMODE.
> +   Use TARGET for the result if nonnull and convenient.  */
> +
> +rtx
> +expand_vec_series_expr (machine_mode vmode, rtx op0, rtx op1, rtx target)
> +{
> +  struct expand_operand ops[3];
> +  enum insn_code icode;
> +  machine_mode emode = GET_MODE_INNER (vmode);
> +
> +  icode = direct_optab_handler (vec_series_optab, vmode);
> +  gcc_assert (icode != CODE_FOR_nothing);
> +
> +  create_output_operand (&ops[0], target, vmode);
> +  create_input_operand (&ops[1], op0, emode);
> +  create_input_operand (&ops[2], op1, emode);
> +
> +  expand_insn (icode, 3, ops);
> +  return ops[0].value;
> +}
> +
>  /* Generate insns for a vector comparison into a mask.  */
>
>  rtx
> Index: gcc/tree-vect-generic.c
> ===================================================================
> --- gcc/tree-vect-generic.c     2017-12-15 00:30:46.596993903 +0000
> +++ gcc/tree-vect-generic.c     2017-12-15 00:30:46.918991441 +0000
> @@ -1594,7 +1594,8 @@ expand_vector_operations_1 (gimple_stmt_
>    if (rhs_class == GIMPLE_BINARY_RHS)
>      rhs2 = gimple_assign_rhs2 (stmt);
>
> -  if (TREE_CODE (type) != VECTOR_TYPE)
> +  if (!VECTOR_TYPE_P (type)
> +      || !VECTOR_TYPE_P (TREE_TYPE (rhs1)))
>      return;
>
>    /* If the vector operation is operating on all same vector elements

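To make the new code concrete: with constant operands build_vec_series folds
straight to a VECTOR_CST, and only a variable base or step survives as a
VEC_SERIES_EXPR for expand_vec_series_expr to map onto vec_series_optab.
A minimal sketch in GCC-internal C++ (not part of the patch; the SSA name
below is hypothetical):

    /* A 4-element integer vector type to hold the series.  */
    tree type = build_vector_type (integer_type_node, 4);
    tree base = build_int_cst (integer_type_node, 3);
    tree step = build_int_cst (integer_type_node, 2);

    /* Constant base and step: build_vec_series returns the VECTOR_CST
       { 3, 5, 7, 9 } directly, encoded as the stepped series 3, 5, 7.  */
    tree cst_series = build_vec_series (type, base, step);

    /* Non-constant base: the result is a VEC_SERIES_EXPR, which is
       expanded through the target's vec_series<mode> pattern.  */
    tree var_series = build_vec_series (type, some_ssa_name, step);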
^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [14/nn] Add helpers for shift count modes
  2017-12-15  0:48           ` Richard Sandiford
@ 2017-12-15  9:06             ` Richard Biener
  2017-12-15 15:17               ` Richard Sandiford
  0 siblings, 1 reply; 90+ messages in thread
From: Richard Biener @ 2017-12-15  9:06 UTC (permalink / raw)
  To: Richard Biener, Jeff Law, GCC Patches, Richard Sandiford

On Fri, Dec 15, 2017 at 1:48 AM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> Richard Biener <richard.guenther@gmail.com> writes:
>> On Mon, Nov 20, 2017 at 10:02 PM, Richard Sandiford
>> <richard.sandiford@linaro.org> wrote:
>>> Richard Biener <richard.guenther@gmail.com> writes:
>>>> On Thu, Oct 26, 2017 at 2:06 PM, Richard Biener
>>>> <richard.guenther@gmail.com> wrote:
>>>>> On Mon, Oct 23, 2017 at 1:25 PM, Richard Sandiford
>>>>> <richard.sandiford@linaro.org> wrote:
>>>>>> This patch adds a stub helper routine to provide the mode
>>>>>> of a scalar shift amount, given the mode of the values
>>>>>> being shifted.
>>>>>>
>>>>>> One long-standing problem has been to decide what this mode
>>>>>> should be for arbitrary rtxes (as opposed to those directly
>>>>>> tied to a target pattern).  Is it the mode of the shifted
>>>>>> elements?  Is it word_mode?  Or maybe QImode?  Is it whatever
>>>>>> the corresponding target pattern says?  (In which case what
>>>>>> should the mode be when the target doesn't have a pattern?)
>>>>>>
>>>>>> For now the patch picks word_mode, which should be safe on
>>>>>> all targets but could perhaps become suboptimal if the helper
>>>>>> routine is used more often than it is in this patch.  As it
>>>>>> stands the patch does not change the generated code.
>>>>>>
>>>>>> The patch also adds a helper function that constructs rtxes
>>>>>> for constant shift amounts, again given the mode of the value
>>>>>> being shifted.  As well as helping with the SVE patches, this
>>>>>> is one step towards allowing CONST_INTs to have a real mode.
>>>>>
>>>>> I think gen_shift_amount_mode is flawed and while encapsulating
>>>>> constant shift amount RTX generation into a gen_int_shift_amount
>>>>> looks good to me I'd rather have that ??? in this function (and
>>>>> I'd use the mode of the RTX shifted, not word_mode...).
>>>
>>> OK.  I'd gone for word_mode because that's what expand_binop uses
>>> for CONST_INTs:
>>>
>>>       op1_mode = (GET_MODE (op1) != VOIDmode
>>>                   ? as_a <scalar_int_mode> (GET_MODE (op1))
>>>                   : word_mode);
>>>
>>> But using the inner mode should be fine too.  The patch below does that.
>>>
>>>>> In the end it's up to insn recognizing to convert the op to the
>>>>> expected mode and for generic RTL it's us that should decide
>>>>> on the mode -- on GENERIC the shift amount has to be an
>>>>> integer so why not simply use a mode that is large enough to
>>>>> make the constant fit?
>>>
>>> ...but I can do that instead if you think it's better.
>>>
>>>>> Just throwing in some comments here, RTL isn't my primary
>>>>> expertise.
>>>>
>>>> To add a little bit - shift amounts is maybe the only(?) place
>>>> where a modeless CONST_INT makes sense!  So "fixing"
>>>> that first sounds backwards.
>>>
>>> But even here they have a mode conceptually, since out-of-range shift
>>> amounts are target-defined rather than undefined.  E.g. if the target
>>> interprets the shift amount as unsigned, then for a shift amount
>>> (const_int -1) it matters whether the mode is QImode (and so we're
>>> shifting by 255) or HImode (and so we're shifting by 65535).
>>
>> I think RTL is well-defined (at least I hope so ...) and machine constraints
>> need to be modeled explicitely (like embedding an implicit bit_and in
>> shift patterns).
>
> Well, RTL is well-defined in the sense that if you have
>
>   (ashift X (foo:HI ...))
>
> then the shift amount must be interpreted as HImode rather than some
> other mode.  The problem here is to define a default choice of mode for
> const_ints, in cases where the shift is being created out of the blue.
>
> Whether the shift amount is effectively signed or unsigned isn't defined
> by RTL without SHIFT_COUNT_TRUNCATED, since the choice only matters for
> out-of-range values, and the behaviour for out-of-range RTL shifts is
> specifically treated as target-defined without SHIFT_COUNT_TRUNCATED.
>
> I think the revised patch does implement your suggestion of using the
> integer equivalent of the inner mode as the default, but we need to
> decide whether to go with it, go with the original word_mode approach
> (taken from existing expand_binop code) or something else.  Something
> else could include the widest supported integer mode, so that we never
> change the value.

I guess it's pretty arbitrary what we choose (but we might need to adjust
targets?).  For something like this an appealing choice would be something
that is host- and target-independent, like [u]int32_t or, given CONST_INT
is always 64 bits now and signed, int64_t aka HOST_WIDE_INT (a bad
name now).  That means it's the "infinite precision" thing that fits
into CONST_INT ;)
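For comparison, with the inner-mode choice in the revised patch the same
count can canonicalize differently depending on what is being shifted; a
sketch of the consequence, not something in the patch itself:

    gen_int_shift_amount (V16QImode, 255);  /* QImode amount: (const_int -1) */
    gen_int_shift_amount (V8HImode, 255);   /* HImode amount: (const_int 255) */

A fixed 64-bit choice like the one suggested above would give
(const_int 255) in both cases.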

Richard.

> Thanks,
> Richard
>
>>> OK, so shifts by 65535 make no sense in practice, but *conceptually*... :-)
>>>
>>> Jeff Law <law@redhat.com> writes:
>>>> On 10/26/2017 06:06 AM, Richard Biener wrote:
>>>>> On Mon, Oct 23, 2017 at 1:25 PM, Richard Sandiford
>>>>> <richard.sandiford@linaro.org> wrote:
>>>>>> This patch adds a stub helper routine to provide the mode
>>>>>> of a scalar shift amount, given the mode of the values
>>>>>> being shifted.
>>>>>>
>>>>>> One long-standing problem has been to decide what this mode
>>>>>> should be for arbitrary rtxes (as opposed to those directly
>>>>>> tied to a target pattern).  Is it the mode of the shifted
>>>>>> elements?  Is it word_mode?  Or maybe QImode?  Is it whatever
>>>>>> the corresponding target pattern says?  (In which case what
>>>>>> should the mode be when the target doesn't have a pattern?)
>>>>>>
>>>>>> For now the patch picks word_mode, which should be safe on
>>>>>> all targets but could perhaps become suboptimal if the helper
>>>>>> routine is used more often than it is in this patch.  As it
>>>>>> stands the patch does not change the generated code.
>>>>>>
>>>>>> The patch also adds a helper function that constructs rtxes
>>>>>> for constant shift amounts, again given the mode of the value
>>>>>> being shifted.  As well as helping with the SVE patches, this
>>>>>> is one step towards allowing CONST_INTs to have a real mode.
>>>>>
>>>>> I think gen_shift_amount_mode is flawed and while encapsulating
>>>>> constant shift amount RTX generation into a gen_int_shift_amount
>>>>> looks good to me I'd rather have that ??? in this function (and
>>>>> I'd use the mode of the RTX shifted, not word_mode...).
>>>>>
>>>>> In the end it's up to insn recognizing to convert the op to the
>>>>> expected mode and for generic RTL it's us that should decide
>>>>> on the mode -- on GENERIC the shift amount has to be an
>>>>> integer so why not simply use a mode that is large enough to
>>>>> make the constant fit?
>>>>>
>>>>> Just throwing in some comments here, RTL isn't my primary
>>>>> expertise.
>>>> I wonder if encapsulation + a target hook to specify the mode would be
>>>> better?  We'd then have to argue over word_mode, vs QImode vs something
>>>> else for the default, but at least we'd have a way for the target to
>>>> specify the mode is generally best when working on shift counts.
>>>>
>>>> In the end I doubt there's a single definition that is overall better.
>>>> Largely because I suspect there are times when the narrowest mode is
>>>> best, or the mode of the operand being shifted.
>>>>
>>>> So thoughts on doing the encapsulation with a target hook to specify the
>>>> desired mode?  Does that get us what we need for SVE and does it provide
>>>> us a path forward on this issue if we were to try to move towards
>>>> CONST_INTs with modes?
>>>
>>> I think it'd better to do that only if we have a use case, since
>>> it's hard to predict what the best way of handling it is until then.
>>> E.g. I'd still like to hold out the possibility of doing this automatically
>>> from the .md file instead, if some kind of override ends up being necessary.
>>>
>>> Like you say, we have to argue over the default either way, and I think
>>> that's been the sticking point.
>>>
>>> Thanks,
>>> Richard
>>>
>>>
>>> 2017-11-20  Richard Sandiford  <richard.sandiford@linaro.org>
>>>             Alan Hayward  <alan.hayward@arm.com>
>>>             David Sherwood  <david.sherwood@arm.com>
>>>
>>> gcc/
>>>         * emit-rtl.h (gen_int_shift_amount): Declare.
>>>         * emit-rtl.c (gen_int_shift_amount): New function.
>>>         * asan.c (asan_emit_stack_protection): Use gen_int_shift_amount
>>>         instead of GEN_INT.
>>>         * calls.c (shift_return_value): Likewise.
>>>         * cse.c (fold_rtx): Likewise.
>>>         * dse.c (find_shift_sequence): Likewise.
>>>         * expmed.c (init_expmed_one_mode, store_bit_field_1, expand_shift_1)
>>>         (expand_shift, expand_smod_pow2): Likewise.
>>>         * lower-subreg.c (shift_cost): Likewise.
>>>         * simplify-rtx.c (simplify_unary_operation_1): Likewise.
>>>         (simplify_binary_operation_1): Likewise.
>>>         * combine.c (try_combine, find_split_point, force_int_to_mode)
>>>         (simplify_shift_const_1, simplify_shift_const): Likewise.
>>>         (change_zero_ext): Likewise.  Use simplify_gen_binary.
>>>         * optabs.c (expand_superword_shift, expand_doubleword_mult)
>>>         (expand_unop, expand_binop): Use gen_int_shift_amount instead
>>>         of GEN_INT.
>>>         (shift_amt_for_vec_perm_mask): Add a machine_mode argument.
>>>         Use gen_int_shift_amount instead of GEN_INT.
>>>         (expand_vec_perm): Update caller accordingly.  Use
>>>         gen_int_shift_amount instead of GEN_INT.
>>>
>>> Index: gcc/emit-rtl.h
>>> ===================================================================
>>> --- gcc/emit-rtl.h      2017-11-20 20:37:41.918226976 +0000
>>> +++ gcc/emit-rtl.h      2017-11-20 20:37:51.661320782 +0000
>>> @@ -369,6 +369,7 @@ extern void set_reg_attrs_for_parm (rtx,
>>>  extern void set_reg_attrs_for_decl_rtl (tree t, rtx x);
>>>  extern void adjust_reg_mode (rtx, machine_mode);
>>>  extern int mem_expr_equal_p (const_tree, const_tree);
>>> +extern rtx gen_int_shift_amount (machine_mode, HOST_WIDE_INT);
>>>
>>>  extern bool need_atomic_barrier_p (enum memmodel, bool);
>>>
>>> Index: gcc/emit-rtl.c
>>> ===================================================================
>>> --- gcc/emit-rtl.c      2017-11-20 20:37:41.918226976 +0000
>>> +++ gcc/emit-rtl.c      2017-11-20 20:37:51.660320782 +0000
>>> @@ -6507,6 +6507,24 @@ need_atomic_barrier_p (enum memmodel mod
>>>      }
>>>  }
>>>
>>> +/* Return a constant shift amount for shifting a value of mode MODE
>>> +   by VALUE bits.  */
>>> +
>>> +rtx
>>> +gen_int_shift_amount (machine_mode mode, HOST_WIDE_INT value)
>>> +{
>>> +  /* ??? Using the inner mode should be wide enough for all useful
>>> +     cases (e.g. QImode usually has 8 shiftable bits, while a QImode
>>> +     shift amount has a range of [-128, 127]).  But in principle
>>> +     a target could require target-dependent behaviour for a
>>> +     shift whose shift amount is wider than the shifted value.
>>> +     Perhaps this should be automatically derived from the .md
>>> +     files instead, or perhaps have a target hook.  */
>>> +  scalar_int_mode shift_mode
>>> +    = int_mode_for_mode (GET_MODE_INNER (mode)).require ();
>>> +  return gen_int_mode (value, shift_mode);
>>> +}
>>> +
>>>  /* Initialize fields of rtl_data related to stack alignment.  */
>>>
>>>  void
>>> Index: gcc/asan.c
>>> ===================================================================
>>> --- gcc/asan.c  2017-11-20 20:37:41.918226976 +0000
>>> +++ gcc/asan.c  2017-11-20 20:37:51.657320781 +0000
>>> @@ -1386,7 +1386,7 @@ asan_emit_stack_protection (rtx base, rt
>>>    TREE_ASM_WRITTEN (id) = 1;
>>>    emit_move_insn (mem, expand_normal (build_fold_addr_expr (decl)));
>>>    shadow_base = expand_binop (Pmode, lshr_optab, base,
>>> -                             GEN_INT (ASAN_SHADOW_SHIFT),
>>> +                             gen_int_shift_amount (Pmode, ASAN_SHADOW_SHIFT),
>>>                               NULL_RTX, 1, OPTAB_DIRECT);
>>>    shadow_base
>>>      = plus_constant (Pmode, shadow_base,
>>> Index: gcc/calls.c
>>> ===================================================================
>>> --- gcc/calls.c 2017-11-20 20:37:41.918226976 +0000
>>> +++ gcc/calls.c 2017-11-20 20:37:51.657320781 +0000
>>> @@ -2742,15 +2742,17 @@ shift_return_value (machine_mode mode, b
>>>    HOST_WIDE_INT shift;
>>>
>>>    gcc_assert (REG_P (value) && HARD_REGISTER_P (value));
>>> -  shift = GET_MODE_BITSIZE (GET_MODE (value)) - GET_MODE_BITSIZE (mode);
>>> +  machine_mode value_mode = GET_MODE (value);
>>> +  shift = GET_MODE_BITSIZE (value_mode) - GET_MODE_BITSIZE (mode);
>>>    if (shift == 0)
>>>      return false;
>>>
>>>    /* Use ashr rather than lshr for right shifts.  This is for the benefit
>>>       of the MIPS port, which requires SImode values to be sign-extended
>>>       when stored in 64-bit registers.  */
>>> -  if (!force_expand_binop (GET_MODE (value), left_p ? ashl_optab : ashr_optab,
>>> -                          value, GEN_INT (shift), value, 1, OPTAB_WIDEN))
>>> +  if (!force_expand_binop (value_mode, left_p ? ashl_optab : ashr_optab,
>>> +                          value, gen_int_shift_amount (value_mode, shift),
>>> +                          value, 1, OPTAB_WIDEN))
>>>      gcc_unreachable ();
>>>    return true;
>>>  }
>>> Index: gcc/cse.c
>>> ===================================================================
>>> --- gcc/cse.c   2017-11-20 20:37:41.918226976 +0000
>>> +++ gcc/cse.c   2017-11-20 20:37:51.660320782 +0000
>>> @@ -3611,9 +3611,9 @@ fold_rtx (rtx x, rtx_insn *insn)
>>>                       || INTVAL (const_arg1) < 0))
>>>                 {
>>>                   if (SHIFT_COUNT_TRUNCATED)
>>> -                   canon_const_arg1 = GEN_INT (INTVAL (const_arg1)
>>> -                                               & (GET_MODE_UNIT_BITSIZE (mode)
>>> -                                                  - 1));
>>> +                   canon_const_arg1 = gen_int_shift_amount
>>> +                     (mode, (INTVAL (const_arg1)
>>> +                             & (GET_MODE_UNIT_BITSIZE (mode) - 1)));
>>>                   else
>>>                     break;
>>>                 }
>>> @@ -3660,9 +3660,9 @@ fold_rtx (rtx x, rtx_insn *insn)
>>>                       || INTVAL (inner_const) < 0))
>>>                 {
>>>                   if (SHIFT_COUNT_TRUNCATED)
>>> -                   inner_const = GEN_INT (INTVAL (inner_const)
>>> -                                          & (GET_MODE_UNIT_BITSIZE (mode)
>>> -                                             - 1));
>>> +                   inner_const = gen_int_shift_amount
>>> +                     (mode, (INTVAL (inner_const)
>>> +                             & (GET_MODE_UNIT_BITSIZE (mode) - 1)));
>>>                   else
>>>                     break;
>>>                 }
>>> @@ -3692,7 +3692,8 @@ fold_rtx (rtx x, rtx_insn *insn)
>>>                   /* As an exception, we can turn an ASHIFTRT of this
>>>                      form into a shift of the number of bits - 1.  */
>>>                   if (code == ASHIFTRT)
>>> -                   new_const = GEN_INT (GET_MODE_UNIT_BITSIZE (mode) - 1);
>>> +                   new_const = gen_int_shift_amount
>>> +                     (mode, GET_MODE_UNIT_BITSIZE (mode) - 1);
>>>                   else if (!side_effects_p (XEXP (y, 0)))
>>>                     return CONST0_RTX (mode);
>>>                   else
>>> Index: gcc/dse.c
>>> ===================================================================
>>> --- gcc/dse.c   2017-11-20 20:37:41.918226976 +0000
>>> +++ gcc/dse.c   2017-11-20 20:37:51.660320782 +0000
>>> @@ -1605,8 +1605,9 @@ find_shift_sequence (int access_size,
>>>                                      store_mode, byte);
>>>           if (ret && CONSTANT_P (ret))
>>>             {
>>> +             rtx shift_rtx = gen_int_shift_amount (new_mode, shift);
>>>               ret = simplify_const_binary_operation (LSHIFTRT, new_mode,
>>> -                                                    ret, GEN_INT (shift));
>>> +                                                    ret, shift_rtx);
>>>               if (ret && CONSTANT_P (ret))
>>>                 {
>>>                   byte = subreg_lowpart_offset (read_mode, new_mode);
>>> @@ -1642,7 +1643,8 @@ find_shift_sequence (int access_size,
>>>          of one dsp where the cost of these two was not the same.  But
>>>          this really is a rare case anyway.  */
>>>        target = expand_binop (new_mode, lshr_optab, new_reg,
>>> -                            GEN_INT (shift), new_reg, 1, OPTAB_DIRECT);
>>> +                            gen_int_shift_amount (new_mode, shift),
>>> +                            new_reg, 1, OPTAB_DIRECT);
>>>
>>>        shift_seq = get_insns ();
>>>        end_sequence ();
>>> Index: gcc/expmed.c
>>> ===================================================================
>>> --- gcc/expmed.c        2017-11-20 20:37:41.918226976 +0000
>>> +++ gcc/expmed.c        2017-11-20 20:37:51.661320782 +0000
>>> @@ -222,7 +222,8 @@ init_expmed_one_mode (struct init_expmed
>>>           PUT_MODE (all->zext, wider_mode);
>>>           PUT_MODE (all->wide_mult, wider_mode);
>>>           PUT_MODE (all->wide_lshr, wider_mode);
>>> -         XEXP (all->wide_lshr, 1) = GEN_INT (mode_bitsize);
>>> +         XEXP (all->wide_lshr, 1)
>>> +           = gen_int_shift_amount (wider_mode, mode_bitsize);
>>>
>>>           set_mul_widen_cost (speed, wider_mode,
>>>                               set_src_cost (all->wide_mult, wider_mode, speed));
>>> @@ -909,12 +910,14 @@ store_bit_field_1 (rtx str_rtx, unsigned
>>>              to make sure that for big-endian machines the higher order
>>>              bits are used.  */
>>>           if (new_bitsize < BITS_PER_WORD && BYTES_BIG_ENDIAN && !backwards)
>>> -           value_word = simplify_expand_binop (word_mode, lshr_optab,
>>> -                                               value_word,
>>> -                                               GEN_INT (BITS_PER_WORD
>>> -                                                        - new_bitsize),
>>> -                                               NULL_RTX, true,
>>> -                                               OPTAB_LIB_WIDEN);
>>> +           {
>>> +             int shift = BITS_PER_WORD - new_bitsize;
>>> +             rtx shift_rtx = gen_int_shift_amount (word_mode, shift);
>>> +             value_word = simplify_expand_binop (word_mode, lshr_optab,
>>> +                                                 value_word, shift_rtx,
>>> +                                                 NULL_RTX, true,
>>> +                                                 OPTAB_LIB_WIDEN);
>>> +           }
>>>
>>>           if (!store_bit_field_1 (op0, new_bitsize,
>>>                                   bitnum + bit_offset,
>>> @@ -2365,8 +2368,9 @@ expand_shift_1 (enum tree_code code, mac
>>>        if (CONST_INT_P (op1)
>>>           && ((unsigned HOST_WIDE_INT) INTVAL (op1) >=
>>>               (unsigned HOST_WIDE_INT) GET_MODE_BITSIZE (scalar_mode)))
>>> -       op1 = GEN_INT ((unsigned HOST_WIDE_INT) INTVAL (op1)
>>> -                      % GET_MODE_BITSIZE (scalar_mode));
>>> +       op1 = gen_int_shift_amount (mode,
>>> +                                   (unsigned HOST_WIDE_INT) INTVAL (op1)
>>> +                                   % GET_MODE_BITSIZE (scalar_mode));
>>>        else if (GET_CODE (op1) == SUBREG
>>>                && subreg_lowpart_p (op1)
>>>                && SCALAR_INT_MODE_P (GET_MODE (SUBREG_REG (op1)))
>>> @@ -2383,7 +2387,8 @@ expand_shift_1 (enum tree_code code, mac
>>>        && IN_RANGE (INTVAL (op1), GET_MODE_BITSIZE (scalar_mode) / 2 + left,
>>>                    GET_MODE_BITSIZE (scalar_mode) - 1))
>>>      {
>>> -      op1 = GEN_INT (GET_MODE_BITSIZE (scalar_mode) - INTVAL (op1));
>>> +      op1 = gen_int_shift_amount (mode, (GET_MODE_BITSIZE (scalar_mode)
>>> +                                        - INTVAL (op1)));
>>>        left = !left;
>>>        code = left ? LROTATE_EXPR : RROTATE_EXPR;
>>>      }
>>> @@ -2463,8 +2468,8 @@ expand_shift_1 (enum tree_code code, mac
>>>               if (op1 == const0_rtx)
>>>                 return shifted;
>>>               else if (CONST_INT_P (op1))
>>> -               other_amount = GEN_INT (GET_MODE_BITSIZE (scalar_mode)
>>> -                                       - INTVAL (op1));
>>> +               other_amount = gen_int_shift_amount
>>> +                 (mode, GET_MODE_BITSIZE (scalar_mode) - INTVAL (op1));
>>>               else
>>>                 {
>>>                   other_amount
>>> @@ -2537,8 +2542,9 @@ expand_shift_1 (enum tree_code code, mac
>>>  expand_shift (enum tree_code code, machine_mode mode, rtx shifted,
>>>               int amount, rtx target, int unsignedp)
>>>  {
>>> -  return expand_shift_1 (code, mode,
>>> -                        shifted, GEN_INT (amount), target, unsignedp);
>>> +  return expand_shift_1 (code, mode, shifted,
>>> +                        gen_int_shift_amount (mode, amount),
>>> +                        target, unsignedp);
>>>  }
>>>
>>>  /* Likewise, but return 0 if that cannot be done.  */
>>> @@ -3856,7 +3862,7 @@ expand_smod_pow2 (scalar_int_mode mode,
>>>         {
>>>           HOST_WIDE_INT masklow = (HOST_WIDE_INT_1 << logd) - 1;
>>>           signmask = force_reg (mode, signmask);
>>> -         shift = GEN_INT (GET_MODE_BITSIZE (mode) - logd);
>>> +         shift = gen_int_shift_amount (mode, GET_MODE_BITSIZE (mode) - logd);
>>>
>>>           /* Use the rtx_cost of a LSHIFTRT instruction to determine
>>>              which instruction sequence to use.  If logical right shifts
>>> Index: gcc/lower-subreg.c
>>> ===================================================================
>>> --- gcc/lower-subreg.c  2017-11-20 20:37:41.918226976 +0000
>>> +++ gcc/lower-subreg.c  2017-11-20 20:37:51.661320782 +0000
>>> @@ -141,7 +141,7 @@ shift_cost (bool speed_p, struct cost_rt
>>>    PUT_CODE (rtxes->shift, code);
>>>    PUT_MODE (rtxes->shift, mode);
>>>    PUT_MODE (rtxes->source, mode);
>>> -  XEXP (rtxes->shift, 1) = GEN_INT (op1);
>>> +  XEXP (rtxes->shift, 1) = gen_int_shift_amount (mode, op1);
>>>    return set_src_cost (rtxes->shift, mode, speed_p);
>>>  }
>>>
>>> Index: gcc/simplify-rtx.c
>>> ===================================================================
>>> --- gcc/simplify-rtx.c  2017-11-20 20:37:41.918226976 +0000
>>> +++ gcc/simplify-rtx.c  2017-11-20 20:37:51.663320783 +0000
>>> @@ -1165,7 +1165,8 @@ simplify_unary_operation_1 (enum rtx_cod
>>>           if (STORE_FLAG_VALUE == 1)
>>>             {
>>>               temp = simplify_gen_binary (ASHIFTRT, inner, XEXP (op, 0),
>>> -                                         GEN_INT (isize - 1));
>>> +                                         gen_int_shift_amount (inner,
>>> +                                                               isize - 1));
>>>               if (int_mode == inner)
>>>                 return temp;
>>>               if (GET_MODE_PRECISION (int_mode) > isize)
>>> @@ -1175,7 +1176,8 @@ simplify_unary_operation_1 (enum rtx_cod
>>>           else if (STORE_FLAG_VALUE == -1)
>>>             {
>>>               temp = simplify_gen_binary (LSHIFTRT, inner, XEXP (op, 0),
>>> -                                         GEN_INT (isize - 1));
>>> +                                         gen_int_shift_amount (inner,
>>> +                                                               isize - 1));
>>>               if (int_mode == inner)
>>>                 return temp;
>>>               if (GET_MODE_PRECISION (int_mode) > isize)
>>> @@ -2672,7 +2674,8 @@ simplify_binary_operation_1 (enum rtx_co
>>>         {
>>>           val = wi::exact_log2 (rtx_mode_t (trueop1, mode));
>>>           if (val >= 0)
>>> -           return simplify_gen_binary (ASHIFT, mode, op0, GEN_INT (val));
>>> +           return simplify_gen_binary (ASHIFT, mode, op0,
>>> +                                       gen_int_shift_amount (mode, val));
>>>         }
>>>
>>>        /* x*2 is x+x and x*(-1) is -x */
>>> @@ -3296,7 +3299,8 @@ simplify_binary_operation_1 (enum rtx_co
>>>        /* Convert divide by power of two into shift.  */
>>>        if (CONST_INT_P (trueop1)
>>>           && (val = exact_log2 (UINTVAL (trueop1))) > 0)
>>> -       return simplify_gen_binary (LSHIFTRT, mode, op0, GEN_INT (val));
>>> +       return simplify_gen_binary (LSHIFTRT, mode, op0,
>>> +                                   gen_int_shift_amount (mode, val));
>>>        break;
>>>
>>>      case DIV:
>>> @@ -3416,10 +3420,12 @@ simplify_binary_operation_1 (enum rtx_co
>>>           && IN_RANGE (INTVAL (trueop1),
>>>                        GET_MODE_UNIT_PRECISION (mode) / 2 + (code == ROTATE),
>>>                        GET_MODE_UNIT_PRECISION (mode) - 1))
>>> -       return simplify_gen_binary (code == ROTATE ? ROTATERT : ROTATE,
>>> -                                   mode, op0,
>>> -                                   GEN_INT (GET_MODE_UNIT_PRECISION (mode)
>>> -                                            - INTVAL (trueop1)));
>>> +       {
>>> +         int new_amount = GET_MODE_UNIT_PRECISION (mode) - INTVAL (trueop1);
>>> +         rtx new_amount_rtx = gen_int_shift_amount (mode, new_amount);
>>> +         return simplify_gen_binary (code == ROTATE ? ROTATERT : ROTATE,
>>> +                                     mode, op0, new_amount_rtx);
>>> +       }
>>>  #endif
>>>        /* FALLTHRU */
>>>      case ASHIFTRT:
>>> @@ -3460,8 +3466,8 @@ simplify_binary_operation_1 (enum rtx_co
>>>               == GET_MODE_BITSIZE (inner_mode) - GET_MODE_BITSIZE (int_mode))
>>>           && subreg_lowpart_p (op0))
>>>         {
>>> -         rtx tmp = GEN_INT (INTVAL (XEXP (SUBREG_REG (op0), 1))
>>> -                            + INTVAL (op1));
>>> +         rtx tmp = gen_int_shift_amount
>>> +           (inner_mode, INTVAL (XEXP (SUBREG_REG (op0), 1)) + INTVAL (op1));
>>>           tmp = simplify_gen_binary (code, inner_mode,
>>>                                      XEXP (SUBREG_REG (op0), 0),
>>>                                      tmp);
>>> @@ -3472,7 +3478,8 @@ simplify_binary_operation_1 (enum rtx_co
>>>         {
>>>           val = INTVAL (op1) & (GET_MODE_UNIT_PRECISION (mode) - 1);
>>>           if (val != INTVAL (op1))
>>> -           return simplify_gen_binary (code, mode, op0, GEN_INT (val));
>>> +           return simplify_gen_binary (code, mode, op0,
>>> +                                       gen_int_shift_amount (mode, val));
>>>         }
>>>        break;
>>>
>>> Index: gcc/combine.c
>>> ===================================================================
>>> --- gcc/combine.c       2017-11-20 20:37:41.918226976 +0000
>>> +++ gcc/combine.c       2017-11-20 20:37:51.659320782 +0000
>>> @@ -3792,8 +3792,9 @@ try_combine (rtx_insn *i3, rtx_insn *i2,
>>>               && INTVAL (XEXP (*split, 1)) > 0
>>>               && (i = exact_log2 (UINTVAL (XEXP (*split, 1)))) >= 0)
>>>             {
>>> +             rtx i_rtx = gen_int_shift_amount (split_mode, i);
>>>               SUBST (*split, gen_rtx_ASHIFT (split_mode,
>>> -                                            XEXP (*split, 0), GEN_INT (i)));
>>> +                                            XEXP (*split, 0), i_rtx));
>>>               /* Update split_code because we may not have a multiply
>>>                  anymore.  */
>>>               split_code = GET_CODE (*split);
>>> @@ -3807,8 +3808,10 @@ try_combine (rtx_insn *i3, rtx_insn *i2,
>>>               && (i = exact_log2 (UINTVAL (XEXP (XEXP (*split, 0), 1)))) >= 0)
>>>             {
>>>               rtx nsplit = XEXP (*split, 0);
>>> +             rtx i_rtx = gen_int_shift_amount (GET_MODE (nsplit), i);
>>>               SUBST (XEXP (*split, 0), gen_rtx_ASHIFT (GET_MODE (nsplit),
>>> -                                            XEXP (nsplit, 0), GEN_INT (i)));
>>> +                                                      XEXP (nsplit, 0),
>>> +                                                      i_rtx));
>>>               /* Update split_code because we may not have a multiply
>>>                  anymore.  */
>>>               split_code = GET_CODE (*split);
>>> @@ -5077,12 +5080,12 @@ find_split_point (rtx *loc, rtx_insn *in
>>>                                       GET_MODE (XEXP (SET_SRC (x), 0))))))
>>>             {
>>>               machine_mode mode = GET_MODE (XEXP (SET_SRC (x), 0));
>>> -
>>> +             rtx pos_rtx = gen_int_shift_amount (mode, pos);
>>>               SUBST (SET_SRC (x),
>>>                      gen_rtx_NEG (mode,
>>>                                   gen_rtx_LSHIFTRT (mode,
>>>                                                     XEXP (SET_SRC (x), 0),
>>> -                                                   GEN_INT (pos))));
>>> +                                                   pos_rtx)));
>>>
>>>               split = find_split_point (&SET_SRC (x), insn, true);
>>>               if (split && split != &SET_SRC (x))
>>> @@ -5140,11 +5143,11 @@ find_split_point (rtx *loc, rtx_insn *in
>>>             {
>>>               unsigned HOST_WIDE_INT mask
>>>                 = (HOST_WIDE_INT_1U << len) - 1;
>>> +             rtx pos_rtx = gen_int_shift_amount (mode, pos);
>>>               SUBST (SET_SRC (x),
>>>                      gen_rtx_AND (mode,
>>>                                   gen_rtx_LSHIFTRT
>>> -                                 (mode, gen_lowpart (mode, inner),
>>> -                                  GEN_INT (pos)),
>>> +                                 (mode, gen_lowpart (mode, inner), pos_rtx),
>>>                                   gen_int_mode (mask, mode)));
>>>
>>>               split = find_split_point (&SET_SRC (x), insn, true);
>>> @@ -5153,14 +5156,15 @@ find_split_point (rtx *loc, rtx_insn *in
>>>             }
>>>           else
>>>             {
>>> +             int left_bits = GET_MODE_PRECISION (mode) - len - pos;
>>> +             int right_bits = GET_MODE_PRECISION (mode) - len;
>>>               SUBST (SET_SRC (x),
>>>                      gen_rtx_fmt_ee
>>>                      (unsignedp ? LSHIFTRT : ASHIFTRT, mode,
>>>                       gen_rtx_ASHIFT (mode,
>>>                                       gen_lowpart (mode, inner),
>>> -                                     GEN_INT (GET_MODE_PRECISION (mode)
>>> -                                              - len - pos)),
>>> -                     GEN_INT (GET_MODE_PRECISION (mode) - len)));
>>> +                                     gen_int_shift_amount (mode, left_bits)),
>>> +                     gen_int_shift_amount (mode, right_bits)));
>>>
>>>               split = find_split_point (&SET_SRC (x), insn, true);
>>>               if (split && split != &SET_SRC (x))
>>> @@ -8935,10 +8939,11 @@ force_int_to_mode (rtx x, scalar_int_mod
>>>           /* Must be more sign bit copies than the mask needs.  */
>>>           && ((int) num_sign_bit_copies (XEXP (x, 0), GET_MODE (XEXP (x, 0)))
>>>               >= exact_log2 (mask + 1)))
>>> -       x = simplify_gen_binary (LSHIFTRT, xmode, XEXP (x, 0),
>>> -                                GEN_INT (GET_MODE_PRECISION (xmode)
>>> -                                         - exact_log2 (mask + 1)));
>>> -
>>> +       {
>>> +         int nbits = GET_MODE_PRECISION (xmode) - exact_log2 (mask + 1);
>>> +         x = simplify_gen_binary (LSHIFTRT, xmode, XEXP (x, 0),
>>> +                                  gen_int_shift_amount (xmode, nbits));
>>> +       }
>>>        goto shiftrt;
>>>
>>>      case ASHIFTRT:
>>> @@ -10431,7 +10436,7 @@ simplify_shift_const_1 (enum rtx_code co
>>>  {
>>>    enum rtx_code orig_code = code;
>>>    rtx orig_varop = varop;
>>> -  int count;
>>> +  int count, log2;
>>>    machine_mode mode = result_mode;
>>>    machine_mode shift_mode;
>>>    scalar_int_mode tmode, inner_mode, int_mode, int_varop_mode, int_result_mode;
>>> @@ -10634,13 +10639,11 @@ simplify_shift_const_1 (enum rtx_code co
>>>              is cheaper.  But it is still better on those machines to
>>>              merge two shifts into one.  */
>>>           if (CONST_INT_P (XEXP (varop, 1))
>>> -             && exact_log2 (UINTVAL (XEXP (varop, 1))) >= 0)
>>> +             && (log2 = exact_log2 (UINTVAL (XEXP (varop, 1)))) >= 0)
>>>             {
>>> -             varop
>>> -               = simplify_gen_binary (ASHIFT, GET_MODE (varop),
>>> -                                      XEXP (varop, 0),
>>> -                                      GEN_INT (exact_log2 (
>>> -                                               UINTVAL (XEXP (varop, 1)))));
>>> +             rtx log2_rtx = gen_int_shift_amount (GET_MODE (varop), log2);
>>> +             varop = simplify_gen_binary (ASHIFT, GET_MODE (varop),
>>> +                                          XEXP (varop, 0), log2_rtx);
>>>               continue;
>>>             }
>>>           break;
>>> @@ -10648,13 +10651,11 @@ simplify_shift_const_1 (enum rtx_code co
>>>         case UDIV:
>>>           /* Similar, for when divides are cheaper.  */
>>>           if (CONST_INT_P (XEXP (varop, 1))
>>> -             && exact_log2 (UINTVAL (XEXP (varop, 1))) >= 0)
>>> +             && (log2 = exact_log2 (UINTVAL (XEXP (varop, 1)))) >= 0)
>>>             {
>>> -             varop
>>> -               = simplify_gen_binary (LSHIFTRT, GET_MODE (varop),
>>> -                                      XEXP (varop, 0),
>>> -                                      GEN_INT (exact_log2 (
>>> -                                               UINTVAL (XEXP (varop, 1)))));
>>> +             rtx log2_rtx = gen_int_shift_amount (GET_MODE (varop), log2);
>>> +             varop = simplify_gen_binary (LSHIFTRT, GET_MODE (varop),
>>> +                                          XEXP (varop, 0), log2_rtx);
>>>               continue;
>>>             }
>>>           break;
>>> @@ -10789,10 +10790,10 @@ simplify_shift_const_1 (enum rtx_code co
>>>
>>>               mask_rtx = gen_int_mode (nonzero_bits (varop, int_varop_mode),
>>>                                        int_result_mode);
>>> -
>>> +             rtx count_rtx = gen_int_shift_amount (int_result_mode, count);
>>>               mask_rtx
>>>                 = simplify_const_binary_operation (code, int_result_mode,
>>> -                                                  mask_rtx, GEN_INT (count));
>>> +                                                  mask_rtx, count_rtx);
>>>
>>>               /* Give up if we can't compute an outer operation to use.  */
>>>               if (mask_rtx == 0
>>> @@ -10848,9 +10849,10 @@ simplify_shift_const_1 (enum rtx_code co
>>>               if (code == ASHIFTRT && int_mode != int_result_mode)
>>>                 break;
>>>
>>> +             rtx count_rtx = gen_int_shift_amount (int_result_mode, count);
>>>               rtx new_rtx = simplify_const_binary_operation (code, int_mode,
>>>                                                              XEXP (varop, 0),
>>> -                                                            GEN_INT (count));
>>> +                                                            count_rtx);
>>>               varop = gen_rtx_fmt_ee (code, int_mode, new_rtx, XEXP (varop, 1));
>>>               count = 0;
>>>               continue;
>>> @@ -10916,7 +10918,7 @@ simplify_shift_const_1 (enum rtx_code co
>>>               && (new_rtx = simplify_const_binary_operation
>>>                   (code, int_result_mode,
>>>                    gen_int_mode (INTVAL (XEXP (varop, 1)), int_result_mode),
>>> -                  GEN_INT (count))) != 0
>>> +                  gen_int_shift_amount (int_result_mode, count))) != 0
>>>               && CONST_INT_P (new_rtx)
>>>               && merge_outer_ops (&outer_op, &outer_const, GET_CODE (varop),
>>>                                   INTVAL (new_rtx), int_result_mode,
>>> @@ -11059,7 +11061,7 @@ simplify_shift_const_1 (enum rtx_code co
>>>               && (new_rtx = simplify_const_binary_operation
>>>                   (ASHIFT, int_result_mode,
>>>                    gen_int_mode (INTVAL (XEXP (varop, 1)), int_result_mode),
>>> -                  GEN_INT (count))) != 0
>>> +                  gen_int_shift_amount (int_result_mode, count))) != 0
>>>               && CONST_INT_P (new_rtx)
>>>               && merge_outer_ops (&outer_op, &outer_const, PLUS,
>>>                                   INTVAL (new_rtx), int_result_mode,
>>> @@ -11080,7 +11082,7 @@ simplify_shift_const_1 (enum rtx_code co
>>>               && (new_rtx = simplify_const_binary_operation
>>>                   (code, int_result_mode,
>>>                    gen_int_mode (INTVAL (XEXP (varop, 1)), int_result_mode),
>>> -                  GEN_INT (count))) != 0
>>> +                  gen_int_shift_amount (int_result_mode, count))) != 0
>>>               && CONST_INT_P (new_rtx)
>>>               && merge_outer_ops (&outer_op, &outer_const, XOR,
>>>                                   INTVAL (new_rtx), int_result_mode,
>>> @@ -11135,12 +11137,12 @@ simplify_shift_const_1 (enum rtx_code co
>>>                       - GET_MODE_UNIT_PRECISION (GET_MODE (varop)))))
>>>             {
>>>               rtx varop_inner = XEXP (varop, 0);
>>> -
>>> -             varop_inner
>>> -               = gen_rtx_LSHIFTRT (GET_MODE (varop_inner),
>>> -                                   XEXP (varop_inner, 0),
>>> -                                   GEN_INT
>>> -                                   (count + INTVAL (XEXP (varop_inner, 1))));
>>> +             int new_count = count + INTVAL (XEXP (varop_inner, 1));
>>> +             rtx new_count_rtx = gen_int_shift_amount (GET_MODE (varop_inner),
>>> +                                                       new_count);
>>> +             varop_inner = gen_rtx_LSHIFTRT (GET_MODE (varop_inner),
>>> +                                             XEXP (varop_inner, 0),
>>> +                                             new_count_rtx);
>>>               varop = gen_rtx_TRUNCATE (GET_MODE (varop), varop_inner);
>>>               count = 0;
>>>               continue;
>>> @@ -11192,7 +11194,8 @@ simplify_shift_const_1 (enum rtx_code co
>>>      x = NULL_RTX;
>>>
>>>    if (x == NULL_RTX)
>>> -    x = simplify_gen_binary (code, shift_mode, varop, GEN_INT (count));
>>> +    x = simplify_gen_binary (code, shift_mode, varop,
>>> +                            gen_int_shift_amount (shift_mode, count));
>>>
>>>    /* If we were doing an LSHIFTRT in a wider mode than it was originally,
>>>       turn off all the bits that the shift would have turned off.  */
>>> @@ -11254,7 +11257,8 @@ simplify_shift_const (rtx x, enum rtx_co
>>>      return tem;
>>>
>>>    if (!x)
>>> -    x = simplify_gen_binary (code, GET_MODE (varop), varop, GEN_INT (count));
>>> +    x = simplify_gen_binary (code, GET_MODE (varop), varop,
>>> +                            gen_int_shift_amount (GET_MODE (varop), count));
>>>    if (GET_MODE (x) != result_mode)
>>>      x = gen_lowpart (result_mode, x);
>>>    return x;
>>> @@ -11445,8 +11449,9 @@ change_zero_ext (rtx pat)
>>>           if (BITS_BIG_ENDIAN)
>>>             start = GET_MODE_PRECISION (inner_mode) - size - start;
>>>
>>> -         if (start)
>>> -           x = gen_rtx_LSHIFTRT (inner_mode, XEXP (x, 0), GEN_INT (start));
>>> +         if (start != 0)
>>> +           x = gen_rtx_LSHIFTRT (inner_mode, XEXP (x, 0),
>>> +                                 gen_int_shift_amount (inner_mode, start));
>>>           else
>>>             x = XEXP (x, 0);
>>>           if (mode != inner_mode)
>>> Index: gcc/optabs.c
>>> ===================================================================
>>> --- gcc/optabs.c        2017-11-20 20:37:41.918226976 +0000
>>> +++ gcc/optabs.c        2017-11-20 20:37:51.662320782 +0000
>>> @@ -431,8 +431,9 @@ expand_superword_shift (optab binoptab,
>>>        if (binoptab != ashr_optab)
>>>         emit_move_insn (outof_target, CONST0_RTX (word_mode));
>>>        else
>>> -       if (!force_expand_binop (word_mode, binoptab,
>>> -                                outof_input, GEN_INT (BITS_PER_WORD - 1),
>>> +       if (!force_expand_binop (word_mode, binoptab, outof_input,
>>> +                                gen_int_shift_amount (word_mode,
>>> +                                                      BITS_PER_WORD - 1),
>>>                                  outof_target, unsignedp, methods))
>>>           return false;
>>>      }
>>> @@ -789,7 +790,8 @@ expand_doubleword_mult (machine_mode mod
>>>  {
>>>    int low = (WORDS_BIG_ENDIAN ? 1 : 0);
>>>    int high = (WORDS_BIG_ENDIAN ? 0 : 1);
>>> -  rtx wordm1 = umulp ? NULL_RTX : GEN_INT (BITS_PER_WORD - 1);
>>> +  rtx wordm1 = (umulp ? NULL_RTX
>>> +               : gen_int_shift_amount (word_mode, BITS_PER_WORD - 1));
>>>    rtx product, adjust, product_high, temp;
>>>
>>>    rtx op0_high = operand_subword_force (op0, high, mode);
>>> @@ -1185,7 +1187,7 @@ expand_binop (machine_mode mode, optab b
>>>        unsigned int bits = GET_MODE_PRECISION (int_mode);
>>>
>>>        if (CONST_INT_P (op1))
>>> -        newop1 = GEN_INT (bits - INTVAL (op1));
>>> +       newop1 = gen_int_shift_amount (int_mode, bits - INTVAL (op1));
>>>        else if (targetm.shift_truncation_mask (int_mode) == bits - 1)
>>>          newop1 = negate_rtx (GET_MODE (op1), op1);
>>>        else
>>> @@ -1403,7 +1405,7 @@ expand_binop (machine_mode mode, optab b
>>>
>>>        /* Apply the truncation to constant shifts.  */
>>>        if (double_shift_mask > 0 && CONST_INT_P (op1))
>>> -       op1 = GEN_INT (INTVAL (op1) & double_shift_mask);
>>> +       op1 = gen_int_mode (INTVAL (op1) & double_shift_mask, op1_mode);
>>>
>>>        if (op1 == CONST0_RTX (op1_mode))
>>>         return op0;
>>> @@ -1513,7 +1515,7 @@ expand_binop (machine_mode mode, optab b
>>>        else
>>>         {
>>>           rtx into_temp1, into_temp2, outof_temp1, outof_temp2;
>>> -         rtx first_shift_count, second_shift_count;
>>> +         HOST_WIDE_INT first_shift_count, second_shift_count;
>>>           optab reverse_unsigned_shift, unsigned_shift;
>>>
>>>           reverse_unsigned_shift = (left_shift ^ (shift_count < BITS_PER_WORD)
>>> @@ -1524,20 +1526,24 @@ expand_binop (machine_mode mode, optab b
>>>
>>>           if (shift_count > BITS_PER_WORD)
>>>             {
>>> -             first_shift_count = GEN_INT (shift_count - BITS_PER_WORD);
>>> -             second_shift_count = GEN_INT (2 * BITS_PER_WORD - shift_count);
>>> +             first_shift_count = shift_count - BITS_PER_WORD;
>>> +             second_shift_count = 2 * BITS_PER_WORD - shift_count;
>>>             }
>>>           else
>>>             {
>>> -             first_shift_count = GEN_INT (BITS_PER_WORD - shift_count);
>>> -             second_shift_count = GEN_INT (shift_count);
>>> +             first_shift_count = BITS_PER_WORD - shift_count;
>>> +             second_shift_count = shift_count;
>>>             }
>>> +         rtx first_shift_count_rtx
>>> +           = gen_int_shift_amount (word_mode, first_shift_count);
>>> +         rtx second_shift_count_rtx
>>> +           = gen_int_shift_amount (word_mode, second_shift_count);
>>>
>>>           into_temp1 = expand_binop (word_mode, unsigned_shift,
>>> -                                    outof_input, first_shift_count,
>>> +                                    outof_input, first_shift_count_rtx,
>>>                                      NULL_RTX, unsignedp, next_methods);
>>>           into_temp2 = expand_binop (word_mode, reverse_unsigned_shift,
>>> -                                    into_input, second_shift_count,
>>> +                                    into_input, second_shift_count_rtx,
>>>                                      NULL_RTX, unsignedp, next_methods);
>>>
>>>           if (into_temp1 != 0 && into_temp2 != 0)
>>> @@ -1550,10 +1556,10 @@ expand_binop (machine_mode mode, optab b
>>>             emit_move_insn (into_target, inter);
>>>
>>>           outof_temp1 = expand_binop (word_mode, unsigned_shift,
>>> -                                     into_input, first_shift_count,
>>> +                                     into_input, first_shift_count_rtx,
>>>                                       NULL_RTX, unsignedp, next_methods);
>>>           outof_temp2 = expand_binop (word_mode, reverse_unsigned_shift,
>>> -                                     outof_input, second_shift_count,
>>> +                                     outof_input, second_shift_count_rtx,
>>>                                       NULL_RTX, unsignedp, next_methods);
>>>
>>>           if (inter != 0 && outof_temp1 != 0 && outof_temp2 != 0)
>>> @@ -2793,25 +2799,29 @@ expand_unop (machine_mode mode, optab un
>>>
>>>           if (optab_handler (rotl_optab, mode) != CODE_FOR_nothing)
>>>             {
>>> -             temp = expand_binop (mode, rotl_optab, op0, GEN_INT (8), target,
>>> -                                  unsignedp, OPTAB_DIRECT);
>>> +             temp = expand_binop (mode, rotl_optab, op0,
>>> +                                  gen_int_shift_amount (mode, 8),
>>> +                                  target, unsignedp, OPTAB_DIRECT);
>>>               if (temp)
>>>                 return temp;
>>>              }
>>>
>>>           if (optab_handler (rotr_optab, mode) != CODE_FOR_nothing)
>>>             {
>>> -             temp = expand_binop (mode, rotr_optab, op0, GEN_INT (8), target,
>>> -                                  unsignedp, OPTAB_DIRECT);
>>> +             temp = expand_binop (mode, rotr_optab, op0,
>>> +                                  gen_int_shift_amount (mode, 8),
>>> +                                  target, unsignedp, OPTAB_DIRECT);
>>>               if (temp)
>>>                 return temp;
>>>             }
>>>
>>>           last = get_last_insn ();
>>>
>>> -         temp1 = expand_binop (mode, ashl_optab, op0, GEN_INT (8), NULL_RTX,
>>> +         temp1 = expand_binop (mode, ashl_optab, op0,
>>> +                               gen_int_shift_amount (mode, 8), NULL_RTX,
>>>                                 unsignedp, OPTAB_WIDEN);
>>> -         temp2 = expand_binop (mode, lshr_optab, op0, GEN_INT (8), NULL_RTX,
>>> +         temp2 = expand_binop (mode, lshr_optab, op0,
>>> +                               gen_int_shift_amount (mode, 8), NULL_RTX,
>>>                                 unsignedp, OPTAB_WIDEN);
>>>           if (temp1 && temp2)
>>>             {
>>> @@ -5369,11 +5379,11 @@ vector_compare_rtx (machine_mode cmp_mod
>>>  }
>>>
>>>  /* Checks if vec_perm mask SEL is a constant equivalent to a shift of the first
>>> -   vec_perm operand, assuming the second operand is a constant vector of zeroes.
>>> -   Return the shift distance in bits if so, or NULL_RTX if the vec_perm is not a
>>> -   shift.  */
>>> +   vec_perm operand (which has mode OP0_MODE), assuming the second
>>> +   operand is a constant vector of zeroes.  Return the shift distance in
>>> +   bits if so, or NULL_RTX if the vec_perm is not a shift.  */
>>>  static rtx
>>> -shift_amt_for_vec_perm_mask (rtx sel)
>>> +shift_amt_for_vec_perm_mask (machine_mode op0_mode, rtx sel)
>>>  {
>>>    unsigned int i, first, nelt = GET_MODE_NUNITS (GET_MODE (sel));
>>>    unsigned int bitsize = GET_MODE_UNIT_BITSIZE (GET_MODE (sel));
>>> @@ -5393,7 +5403,7 @@ shift_amt_for_vec_perm_mask (rtx sel)
>>>         return NULL_RTX;
>>>      }
>>>
>>> -  return GEN_INT (first * bitsize);
>>> +  return gen_int_shift_amount (op0_mode, first * bitsize);
>>>  }
>>>
>>>  /* A subroutine of expand_vec_perm for expanding one vec_perm insn.  */
>>> @@ -5473,7 +5483,7 @@ expand_vec_perm (machine_mode mode, rtx
>>>           && (shift_code != CODE_FOR_nothing
>>>               || shift_code_qi != CODE_FOR_nothing))
>>>         {
>>> -         shift_amt = shift_amt_for_vec_perm_mask (sel);
>>> +         shift_amt = shift_amt_for_vec_perm_mask (mode, sel);
>>>           if (shift_amt)
>>>             {
>>>               struct expand_operand ops[3];
>>> @@ -5563,7 +5573,8 @@ expand_vec_perm (machine_mode mode, rtx
>>>                                    NULL, 0, OPTAB_DIRECT);
>>>        else
>>>         sel = expand_simple_binop (selmode, ASHIFT, sel,
>>> -                                  GEN_INT (exact_log2 (u)),
>>> +                                  gen_int_shift_amount (selmode,
>>> +                                                        exact_log2 (u)),
>>>                                    NULL, 0, OPTAB_DIRECT);
>>>        gcc_assert (sel != NULL);
>>>

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [05/nn] Add VEC_DUPLICATE_{CST,EXPR} and associated optab
  2017-12-15  8:58     ` Richard Biener
@ 2017-12-15 12:52       ` Richard Sandiford
  2017-12-15 13:20         ` Richard Biener
  0 siblings, 1 reply; 90+ messages in thread
From: Richard Sandiford @ 2017-12-15 12:52 UTC (permalink / raw)
  To: Richard Biener; +Cc: GCC Patches

Richard Biener <richard.guenther@gmail.com> writes:
> On Fri, Dec 15, 2017 at 1:29 AM, Richard Sandiford
> <richard.sandiford@linaro.org> wrote:
>> This patch just adds VEC_DUPLICATE_EXPR, since the VEC_DUPLICATE_CST
>> isn't needed with the new VECTOR_CST layout.  It's really just the
>> original patch with bits removed, but just in case:
>>
>> Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64-linux-gnu.
>> OK to install?
>
> To keep things simple at this point OK.  Note that I'd eventually
> like to see this as VEC_PERM_EXPR <scalar_type_1, scalar_type_1, { 0, ... }>.
> For reductions when we need { x, 0, ... } we now have to use a
> VEC_DUPLICATE_EXPR to make x a vector and then a VEC_PERM_EXPR
> to merge it with {0, ... }, right?  Rather than VEC_PERM_EXPR <x_1, 0,
> { 0, 1, 1, 1.... }>

That's where the shift-left-and-insert-scalar thing (IFN_SHL_INSERT)
comes in.  But yeah, allowing scalars as operands to VEC_PERM_EXPRs
would mean it could represent both VEC_DUPLICATE_EXPR and IFN_SHL_INSERT.
I guess the question is whether that's better than extending CONSTRUCTOR
(or a replacement) to use the VECTOR_CST encoding.  I realise you don't
like CONSTRUCTOR in gimple though...

I promise to look at either of those for GCC 9 if you think they're
better, but they'll be more invasive for other targets.
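
For reference, the two-statement form you describe would look something
like this with the usual gimple builders (an illustrative sketch only,
not code from the patch; "vectype", "x" and "sel" are assumed to come
from the surrounding vectoriser code, with sel selecting element 0 of
vec_x and zeros elsewhere):

    tree vec_x = make_ssa_name (vectype);
    gimple *dup = gimple_build_assign (vec_x, VEC_DUPLICATE_EXPR, x);
    tree zeroes = build_zero_cst (vectype);
    tree res = make_ssa_name (vectype);
    gimple *perm = gimple_build_assign (res, VEC_PERM_EXPR,
                                        vec_x, zeroes, sel);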

Thanks,
Richard

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [05/nn] Add VEC_DUPLICATE_{CST,EXPR} and associated optab
  2017-12-15 12:52       ` Richard Sandiford
@ 2017-12-15 13:20         ` Richard Biener
  0 siblings, 0 replies; 90+ messages in thread
From: Richard Biener @ 2017-12-15 13:20 UTC (permalink / raw)
  To: Richard Biener, GCC Patches, Richard Sandiford

On Fri, Dec 15, 2017 at 1:52 PM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> Richard Biener <richard.guenther@gmail.com> writes:
>> On Fri, Dec 15, 2017 at 1:29 AM, Richard Sandiford
>> <richard.sandiford@linaro.org> wrote:
>>> This patch just adds VEC_DUPLICATE_EXPR, since the VEC_DUPLICATE_CST
>>> isn't needed with the new VECTOR_CST layout.  It's really just the
>>> original patch with bits removed, but just in case:
>>>
>>> Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64-linux-gnu.
>>> OK to install?
>>
>> To keep things simple at this point OK.  Note that I'd eventually
>> like to see this as VEC_PERM_EXPR <scalar_type_1, scalar_type_1, { 0, ... }>.
>> For reductions when we need { x, 0, ... } we now have to use a
>> VEC_DUPLICATE_EXPR to make x a vector and then a VEC_PERM_EXPR
>> to merge it with {0, ... }, right?  Rather than VEC_PERM_EXPR <x_1, 0,
>> { 0, 1, 1, 1.... }>
>
> That's where the shift-left-and-insert-scalar thing (IFN_SHL_INSERT)
> comes in.  But yeah, allowing scalars as operands to VEC_PERM_EXPRs
> would mean it could represent both VEC_DUPLICATE_EXPR and IFN_SHL_INSERT.
> I guess the question is whether that's better than extending CONSTRUCTOR
> (or a replacement) to use the VECTOR_CST encoding.  I realise you don't
> like CONSTRUCTOR in gimple though...
>
> I promise to look at either of those for GCC 9 if you think they're
> better, but they'll be more invasive for other targets.

Thanks.
Richard.

> Thanks,
> Richard

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [14/nn] Add helpers for shift count modes
  2017-12-15  9:06             ` Richard Biener
@ 2017-12-15 15:17               ` Richard Sandiford
  2017-12-19 19:13                 ` Richard Sandiford
  0 siblings, 1 reply; 90+ messages in thread
From: Richard Sandiford @ 2017-12-15 15:17 UTC (permalink / raw)
  To: Richard Biener; +Cc: Jeff Law, GCC Patches

Richard Biener <richard.guenther@gmail.com> writes:
> On Fri, Dec 15, 2017 at 1:48 AM, Richard Sandiford
> <richard.sandiford@linaro.org> wrote:
>> Richard Biener <richard.guenther@gmail.com> writes:
>>> On Mon, Nov 20, 2017 at 10:02 PM, Richard Sandiford
>>> <richard.sandiford@linaro.org> wrote:
>>>> Richard Biener <richard.guenther@gmail.com> writes:
>>>>> On Thu, Oct 26, 2017 at 2:06 PM, Richard Biener
>>>>> <richard.guenther@gmail.com> wrote:
>>>>>> On Mon, Oct 23, 2017 at 1:25 PM, Richard Sandiford
>>>>>> <richard.sandiford@linaro.org> wrote:
>>>>>>> This patch adds a stub helper routine to provide the mode
>>>>>>> of a scalar shift amount, given the mode of the values
>>>>>>> being shifted.
>>>>>>>
>>>>>>> One long-standing problem has been to decide what this mode
>>>>>>> should be for arbitrary rtxes (as opposed to those directly
>>>>>>> tied to a target pattern).  Is it the mode of the shifted
>>>>>>> elements?  Is it word_mode?  Or maybe QImode?  Is it whatever
>>>>>>> the corresponding target pattern says?  (In which case what
>>>>>>> should the mode be when the target doesn't have a pattern?)
>>>>>>>
>>>>>>> For now the patch picks word_mode, which should be safe on
>>>>>>> all targets but could perhaps become suboptimal if the helper
>>>>>>> routine is used more often than it is in this patch.  As it
>>>>>>> stands the patch does not change the generated code.
>>>>>>>
>>>>>>> The patch also adds a helper function that constructs rtxes
>>>>>>> for constant shift amounts, again given the mode of the value
>>>>>>> being shifted.  As well as helping with the SVE patches, this
>>>>>>> is one step towards allowing CONST_INTs to have a real mode.
>>>>>>
>>>>>> I think gen_shift_amount_mode is flawed and while encapsulating
>>>>>> constant shift amount RTX generation into a gen_int_shift_amount
>>>>>> looks good to me I'd rather have that ??? in this function (and
>>>>>> I'd use the mode of the RTX shifted, not word_mode...).
>>>>
>>>> OK.  I'd gone for word_mode because that's what expand_binop uses
>>>> for CONST_INTs:
>>>>
>>>>       op1_mode = (GET_MODE (op1) != VOIDmode
>>>>                   ? as_a <scalar_int_mode> (GET_MODE (op1))
>>>>                   : word_mode);
>>>>
>>>> But using the inner mode should be fine too.  The patch below does that.
>>>>
>>>>>> In the end it's up to insn recognition to convert the op to the
>>>>>> expected mode and for generic RTL it's us that should decide
>>>>>> on the mode -- on GENERIC the shift amount has to be an
>>>>>> integer so why not simply use a mode that is large enough to
>>>>>> make the constant fit?
>>>>
>>>> ...but I can do that instead if you think it's better.
>>>>
>>>>>> Just throwing in some comments here, RTL isn't my primary
>>>>>> expertise.
>>>>>
>>>>> To add a little bit - shift amounts are maybe the only(?) place
>>>>> where a modeless CONST_INT makes sense!  So "fixing"
>>>>> that first sounds backwards.
>>>>
>>>> But even here they have a mode conceptually, since out-of-range shift
>>>> amounts are target-defined rather than undefined.  E.g. if the target
>>>> interprets the shift amount as unsigned, then for a shift amount
>>>> (const_int -1) it matters whether the mode is QImode (and so we're
>>>> shifting by 255) or HImode (and so we're shifting by 65535).
>>>
>>> I think RTL is well-defined (at least I hope so ...) and machine constraints
>>> need to be modeled explicitely (like embedding an implicit bit_and in
>>> shift patterns).
>>
>> Well, RTL is well-defined in the sense that if you have
>>
>>   (ashift X (foo:HI ...))
>>
>> then the shift amount must be interpreted as HImode rather than some
>> other mode.  The problem here is to define a default choice of mode for
>> const_ints, in cases where the shift is being created out of the blue.
>>
>> Whether the shift amount is effectively signed or unsigned isn't defined
>> by RTL without SHIFT_COUNT_TRUNCATED, since the choice only matters for
>> out-of-range values, and the behaviour for out-of-range RTL shifts is
>> specifically treated as target-defined without SHIFT_COUNT_TRUNCATED.
>>
>> I think the revised patch does implement your suggestion of using the
>> integer equivalent of the inner mode as the default, but we need to
>> decide whether to go with it, go with the original word_mode approach
>> (taken from existing expand_binop code) or something else.  Something
>> else could include the widest supported integer mode, so that we never
>> change the value.
>
> I guess it's pretty arbitrary what we choose (but we might need to adjust
> targets?).  For something like this an appealing choice would be something
> that is host and target independent, like [u]int32_t or, given CONST_INT
> is always 64 bits now, signed int64_t aka HOST_WIDE_INT (bad
> name now).  That means it's the "infinite precision" thing that fits
> into CONST_INT ;)

Sounds OK to me.  How about the attached?
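
To illustrate the intended call-site pattern (this just mirrors the
rotate hunks in expand_unop below; "mode", "op0" and "target" stand
for whatever the caller has to hand):

    /* Before: the shift amount's mode is left implicit.  */
    temp = expand_binop (mode, rotl_optab, op0, GEN_INT (8),
                         target, unsignedp, OPTAB_DIRECT);

    /* After: the helper canonicalizes the constant for a 64-bit
       mode, so the value is never truncated.  */
    temp = expand_binop (mode, rotl_optab, op0,
                         gen_int_shift_amount (mode, 8),
                         target, unsignedp, OPTAB_DIRECT);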

Thanks,
Richard


2017-12-15  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* emit-rtl.h (gen_int_shift_amount): Declare.
	* emit-rtl.c (gen_int_shift_amount): New function.
	* asan.c (asan_emit_stack_protection): Use gen_int_shift_amount
	instead of GEN_INT.
	* calls.c (shift_return_value): Likewise.
	* cse.c (fold_rtx): Likewise.
	* dse.c (find_shift_sequence): Likewise.
	* expmed.c (init_expmed_one_mode, store_bit_field_1, expand_shift_1)
	(expand_shift, expand_smod_pow2): Likewise.
	* lower-subreg.c (shift_cost): Likewise.
	* optabs.c (expand_superword_shift, expand_doubleword_mult)
	(expand_unop, expand_binop, shift_amt_for_vec_perm_mask)
	(expand_vec_perm_var): Likewise.
	* simplify-rtx.c (simplify_unary_operation_1): Likewise.
	(simplify_binary_operation_1): Likewise.
	* combine.c (try_combine, find_split_point, force_int_to_mode)
	(simplify_shift_const_1, simplify_shift_const): Likewise.
	(change_zero_ext): Likewise.  Use simplify_gen_binary.

Index: gcc/emit-rtl.h
===================================================================
--- gcc/emit-rtl.h	2017-12-15 15:14:43.101350556 +0000
+++ gcc/emit-rtl.h	2017-12-15 15:14:43.345343745 +0000
@@ -369,6 +369,7 @@ extern void set_reg_attrs_for_parm (rtx,
 extern void set_reg_attrs_for_decl_rtl (tree t, rtx x);
 extern void adjust_reg_mode (rtx, machine_mode);
 extern int mem_expr_equal_p (const_tree, const_tree);
+extern rtx gen_int_shift_amount (machine_mode, HOST_WIDE_INT);
 
 extern bool need_atomic_barrier_p (enum memmodel, bool);
 
Index: gcc/emit-rtl.c
===================================================================
--- gcc/emit-rtl.c	2017-12-15 15:14:43.101350556 +0000
+++ gcc/emit-rtl.c	2017-12-15 15:14:43.345343745 +0000
@@ -6418,6 +6418,21 @@ need_atomic_barrier_p (enum memmodel mod
     }
 }
 
+/* Return a constant shift amount for shifting a value of mode MODE
+   by VALUE bits.  */
+
+rtx
+gen_int_shift_amount (machine_mode, HOST_WIDE_INT value)
+{
+  /* Try to use a 64-bit mode, to avoid any truncation, but honor
+     MAX_FIXED_MODE_SIZE if that's smaller.
+
+     ??? Perhaps this should be automatically derived from the .md files
+     instead, or perhaps have a target hook.  */
+  scalar_int_mode shift_mode = int_mode_for_size (64, 1).require ();
+  return gen_int_mode (value, shift_mode);
+}
+
 /* Initialize fields of rtl_data related to stack alignment.  */
 
 void
Index: gcc/asan.c
===================================================================
--- gcc/asan.c	2017-12-15 15:14:43.101350556 +0000
+++ gcc/asan.c	2017-12-15 15:14:43.342343829 +0000
@@ -1386,7 +1386,7 @@ asan_emit_stack_protection (rtx base, rt
   TREE_ASM_WRITTEN (id) = 1;
   emit_move_insn (mem, expand_normal (build_fold_addr_expr (decl)));
   shadow_base = expand_binop (Pmode, lshr_optab, base,
-			      GEN_INT (ASAN_SHADOW_SHIFT),
+			      gen_int_shift_amount (Pmode, ASAN_SHADOW_SHIFT),
 			      NULL_RTX, 1, OPTAB_DIRECT);
   shadow_base
     = plus_constant (Pmode, shadow_base,
Index: gcc/calls.c
===================================================================
--- gcc/calls.c	2017-12-15 15:14:43.101350556 +0000
+++ gcc/calls.c	2017-12-15 15:14:43.342343829 +0000
@@ -2900,15 +2900,17 @@ shift_return_value (machine_mode mode, b
   HOST_WIDE_INT shift;
 
   gcc_assert (REG_P (value) && HARD_REGISTER_P (value));
-  shift = GET_MODE_BITSIZE (GET_MODE (value)) - GET_MODE_BITSIZE (mode);
+  machine_mode value_mode = GET_MODE (value);
+  shift = GET_MODE_BITSIZE (value_mode) - GET_MODE_BITSIZE (mode);
   if (shift == 0)
     return false;
 
   /* Use ashr rather than lshr for right shifts.  This is for the benefit
      of the MIPS port, which requires SImode values to be sign-extended
      when stored in 64-bit registers.  */
-  if (!force_expand_binop (GET_MODE (value), left_p ? ashl_optab : ashr_optab,
-			   value, GEN_INT (shift), value, 1, OPTAB_WIDEN))
+  if (!force_expand_binop (value_mode, left_p ? ashl_optab : ashr_optab,
+			   value, gen_int_shift_amount (value_mode, shift),
+			   value, 1, OPTAB_WIDEN))
     gcc_unreachable ();
   return true;
 }
Index: gcc/cse.c
===================================================================
--- gcc/cse.c	2017-12-15 15:14:43.101350556 +0000
+++ gcc/cse.c	2017-12-15 15:14:43.344343773 +0000
@@ -3611,9 +3611,9 @@ fold_rtx (rtx x, rtx_insn *insn)
 		      || INTVAL (const_arg1) < 0))
 		{
 		  if (SHIFT_COUNT_TRUNCATED)
-		    canon_const_arg1 = GEN_INT (INTVAL (const_arg1)
-						& (GET_MODE_UNIT_BITSIZE (mode)
-						   - 1));
+		    canon_const_arg1 = gen_int_shift_amount
+		      (mode, (INTVAL (const_arg1)
+			      & (GET_MODE_UNIT_BITSIZE (mode) - 1)));
 		  else
 		    break;
 		}
@@ -3660,9 +3660,9 @@ fold_rtx (rtx x, rtx_insn *insn)
 		      || INTVAL (inner_const) < 0))
 		{
 		  if (SHIFT_COUNT_TRUNCATED)
-		    inner_const = GEN_INT (INTVAL (inner_const)
-					   & (GET_MODE_UNIT_BITSIZE (mode)
-					      - 1));
+		    inner_const = gen_int_shift_amount
+		      (mode, (INTVAL (inner_const)
+			      & (GET_MODE_UNIT_BITSIZE (mode) - 1)));
 		  else
 		    break;
 		}
@@ -3692,7 +3692,8 @@ fold_rtx (rtx x, rtx_insn *insn)
 		  /* As an exception, we can turn an ASHIFTRT of this
 		     form into a shift of the number of bits - 1.  */
 		  if (code == ASHIFTRT)
-		    new_const = GEN_INT (GET_MODE_UNIT_BITSIZE (mode) - 1);
+		    new_const = gen_int_shift_amount
+		      (mode, GET_MODE_UNIT_BITSIZE (mode) - 1);
 		  else if (!side_effects_p (XEXP (y, 0)))
 		    return CONST0_RTX (mode);
 		  else
Index: gcc/dse.c
===================================================================
--- gcc/dse.c	2017-12-15 15:14:43.101350556 +0000
+++ gcc/dse.c	2017-12-15 15:14:43.345343745 +0000
@@ -1642,8 +1642,9 @@ find_shift_sequence (int access_size,
 				     store_mode, byte);
 	  if (ret && CONSTANT_P (ret))
 	    {
+	      rtx shift_rtx = gen_int_shift_amount (new_mode, shift);
 	      ret = simplify_const_binary_operation (LSHIFTRT, new_mode,
-						     ret, GEN_INT (shift));
+						     ret, shift_rtx);
 	      if (ret && CONSTANT_P (ret))
 		{
 		  byte = subreg_lowpart_offset (read_mode, new_mode);
@@ -1679,7 +1680,8 @@ find_shift_sequence (int access_size,
 	 of one dsp where the cost of these two was not the same.  But
 	 this really is a rare case anyway.  */
       target = expand_binop (new_mode, lshr_optab, new_reg,
-			     GEN_INT (shift), new_reg, 1, OPTAB_DIRECT);
+			     gen_int_shift_amount (new_mode, shift),
+			     new_reg, 1, OPTAB_DIRECT);
 
       shift_seq = get_insns ();
       end_sequence ();
Index: gcc/expmed.c
===================================================================
--- gcc/expmed.c	2017-12-15 15:14:43.101350556 +0000
+++ gcc/expmed.c	2017-12-15 15:14:43.346343717 +0000
@@ -223,7 +223,8 @@ init_expmed_one_mode (struct init_expmed
 	  PUT_MODE (all->zext, wider_mode);
 	  PUT_MODE (all->wide_mult, wider_mode);
 	  PUT_MODE (all->wide_lshr, wider_mode);
-	  XEXP (all->wide_lshr, 1) = GEN_INT (mode_bitsize);
+	  XEXP (all->wide_lshr, 1)
+	    = gen_int_shift_amount (wider_mode, mode_bitsize);
 
 	  set_mul_widen_cost (speed, wider_mode,
 			      set_src_cost (all->wide_mult, wider_mode, speed));
@@ -910,12 +911,14 @@ store_bit_field_1 (rtx str_rtx, unsigned
 	     to make sure that for big-endian machines the higher order
 	     bits are used.  */
 	  if (new_bitsize < BITS_PER_WORD && BYTES_BIG_ENDIAN && !backwards)
-	    value_word = simplify_expand_binop (word_mode, lshr_optab,
-						value_word,
-						GEN_INT (BITS_PER_WORD
-							 - new_bitsize),
-						NULL_RTX, true,
-						OPTAB_LIB_WIDEN);
+	    {
+	      int shift = BITS_PER_WORD - new_bitsize;
+	      rtx shift_rtx = gen_int_shift_amount (word_mode, shift);
+	      value_word = simplify_expand_binop (word_mode, lshr_optab,
+						  value_word, shift_rtx,
+						  NULL_RTX, true,
+						  OPTAB_LIB_WIDEN);
+	    }
 
 	  if (!store_bit_field_1 (op0, new_bitsize,
 				  bitnum + bit_offset,
@@ -2366,8 +2369,9 @@ expand_shift_1 (enum tree_code code, mac
       if (CONST_INT_P (op1)
 	  && ((unsigned HOST_WIDE_INT) INTVAL (op1) >=
 	      (unsigned HOST_WIDE_INT) GET_MODE_BITSIZE (scalar_mode)))
-	op1 = GEN_INT ((unsigned HOST_WIDE_INT) INTVAL (op1)
-		       % GET_MODE_BITSIZE (scalar_mode));
+	op1 = gen_int_shift_amount (mode,
+				    (unsigned HOST_WIDE_INT) INTVAL (op1)
+				    % GET_MODE_BITSIZE (scalar_mode));
       else if (GET_CODE (op1) == SUBREG
 	       && subreg_lowpart_p (op1)
 	       && SCALAR_INT_MODE_P (GET_MODE (SUBREG_REG (op1)))
@@ -2384,7 +2388,8 @@ expand_shift_1 (enum tree_code code, mac
       && IN_RANGE (INTVAL (op1), GET_MODE_BITSIZE (scalar_mode) / 2 + left,
 		   GET_MODE_BITSIZE (scalar_mode) - 1))
     {
-      op1 = GEN_INT (GET_MODE_BITSIZE (scalar_mode) - INTVAL (op1));
+      op1 = gen_int_shift_amount (mode, (GET_MODE_BITSIZE (scalar_mode)
+					 - INTVAL (op1)));
       left = !left;
       code = left ? LROTATE_EXPR : RROTATE_EXPR;
     }
@@ -2464,8 +2469,8 @@ expand_shift_1 (enum tree_code code, mac
 	      if (op1 == const0_rtx)
 		return shifted;
 	      else if (CONST_INT_P (op1))
-		other_amount = GEN_INT (GET_MODE_BITSIZE (scalar_mode)
-					- INTVAL (op1));
+		other_amount = gen_int_shift_amount
+		  (mode, GET_MODE_BITSIZE (scalar_mode) - INTVAL (op1));
 	      else
 		{
 		  other_amount
@@ -2538,8 +2543,9 @@ expand_shift_1 (enum tree_code code, mac
 expand_shift (enum tree_code code, machine_mode mode, rtx shifted,
 	      int amount, rtx target, int unsignedp)
 {
-  return expand_shift_1 (code, mode,
-			 shifted, GEN_INT (amount), target, unsignedp);
+  return expand_shift_1 (code, mode, shifted,
+			 gen_int_shift_amount (mode, amount),
+			 target, unsignedp);
 }
 
 /* Likewise, but return 0 if that cannot be done.  */
@@ -3857,7 +3863,7 @@ expand_smod_pow2 (scalar_int_mode mode,
 	{
 	  HOST_WIDE_INT masklow = (HOST_WIDE_INT_1 << logd) - 1;
 	  signmask = force_reg (mode, signmask);
-	  shift = GEN_INT (GET_MODE_BITSIZE (mode) - logd);
+	  shift = gen_int_shift_amount (mode, GET_MODE_BITSIZE (mode) - logd);
 
 	  /* Use the rtx_cost of a LSHIFTRT instruction to determine
 	     which instruction sequence to use.  If logical right shifts
Index: gcc/lower-subreg.c
===================================================================
--- gcc/lower-subreg.c	2017-12-15 15:14:43.101350556 +0000
+++ gcc/lower-subreg.c	2017-12-15 15:14:43.346343717 +0000
@@ -141,7 +141,7 @@ shift_cost (bool speed_p, struct cost_rt
   PUT_CODE (rtxes->shift, code);
   PUT_MODE (rtxes->shift, mode);
   PUT_MODE (rtxes->source, mode);
-  XEXP (rtxes->shift, 1) = GEN_INT (op1);
+  XEXP (rtxes->shift, 1) = gen_int_shift_amount (mode, op1);
   return set_src_cost (rtxes->shift, mode, speed_p);
 }
 
Index: gcc/optabs.c
===================================================================
--- gcc/optabs.c	2017-12-15 15:14:43.101350556 +0000
+++ gcc/optabs.c	2017-12-15 15:14:43.347343689 +0000
@@ -421,8 +421,9 @@ expand_superword_shift (optab binoptab,
       if (binoptab != ashr_optab)
 	emit_move_insn (outof_target, CONST0_RTX (word_mode));
       else
-	if (!force_expand_binop (word_mode, binoptab,
-				 outof_input, GEN_INT (BITS_PER_WORD - 1),
+	if (!force_expand_binop (word_mode, binoptab, outof_input,
+				 gen_int_shift_amount (word_mode,
+						       BITS_PER_WORD - 1),
 				 outof_target, unsignedp, methods))
 	  return false;
     }
@@ -779,7 +780,8 @@ expand_doubleword_mult (machine_mode mod
 {
   int low = (WORDS_BIG_ENDIAN ? 1 : 0);
   int high = (WORDS_BIG_ENDIAN ? 0 : 1);
-  rtx wordm1 = umulp ? NULL_RTX : GEN_INT (BITS_PER_WORD - 1);
+  rtx wordm1 = (umulp ? NULL_RTX
+		: gen_int_shift_amount (word_mode, BITS_PER_WORD - 1));
   rtx product, adjust, product_high, temp;
 
   rtx op0_high = operand_subword_force (op0, high, mode);
@@ -1180,7 +1182,7 @@ expand_binop (machine_mode mode, optab b
       unsigned int bits = GET_MODE_PRECISION (int_mode);
 
       if (CONST_INT_P (op1))
-        newop1 = GEN_INT (bits - INTVAL (op1));
+	newop1 = gen_int_shift_amount (int_mode, bits - INTVAL (op1));
       else if (targetm.shift_truncation_mask (int_mode) == bits - 1)
         newop1 = negate_rtx (GET_MODE (op1), op1);
       else
@@ -1402,7 +1404,7 @@ expand_binop (machine_mode mode, optab b
 
       /* Apply the truncation to constant shifts.  */
       if (double_shift_mask > 0 && CONST_INT_P (op1))
-	op1 = GEN_INT (INTVAL (op1) & double_shift_mask);
+	op1 = gen_int_mode (INTVAL (op1) & double_shift_mask, op1_mode);
 
       if (op1 == CONST0_RTX (op1_mode))
 	return op0;
@@ -1512,7 +1514,7 @@ expand_binop (machine_mode mode, optab b
       else
 	{
 	  rtx into_temp1, into_temp2, outof_temp1, outof_temp2;
-	  rtx first_shift_count, second_shift_count;
+	  HOST_WIDE_INT first_shift_count, second_shift_count;
 	  optab reverse_unsigned_shift, unsigned_shift;
 
 	  reverse_unsigned_shift = (left_shift ^ (shift_count < BITS_PER_WORD)
@@ -1523,20 +1525,24 @@ expand_binop (machine_mode mode, optab b
 
 	  if (shift_count > BITS_PER_WORD)
 	    {
-	      first_shift_count = GEN_INT (shift_count - BITS_PER_WORD);
-	      second_shift_count = GEN_INT (2 * BITS_PER_WORD - shift_count);
+	      first_shift_count = shift_count - BITS_PER_WORD;
+	      second_shift_count = 2 * BITS_PER_WORD - shift_count;
 	    }
 	  else
 	    {
-	      first_shift_count = GEN_INT (BITS_PER_WORD - shift_count);
-	      second_shift_count = GEN_INT (shift_count);
+	      first_shift_count = BITS_PER_WORD - shift_count;
+	      second_shift_count = shift_count;
 	    }
+	  rtx first_shift_count_rtx
+	    = gen_int_shift_amount (word_mode, first_shift_count);
+	  rtx second_shift_count_rtx
+	    = gen_int_shift_amount (word_mode, second_shift_count);
 
 	  into_temp1 = expand_binop (word_mode, unsigned_shift,
-				     outof_input, first_shift_count,
+				     outof_input, first_shift_count_rtx,
 				     NULL_RTX, unsignedp, next_methods);
 	  into_temp2 = expand_binop (word_mode, reverse_unsigned_shift,
-				     into_input, second_shift_count,
+				     into_input, second_shift_count_rtx,
 				     NULL_RTX, unsignedp, next_methods);
 
 	  if (into_temp1 != 0 && into_temp2 != 0)
@@ -1549,10 +1555,10 @@ expand_binop (machine_mode mode, optab b
 	    emit_move_insn (into_target, inter);
 
 	  outof_temp1 = expand_binop (word_mode, unsigned_shift,
-				      into_input, first_shift_count,
+				      into_input, first_shift_count_rtx,
 				      NULL_RTX, unsignedp, next_methods);
 	  outof_temp2 = expand_binop (word_mode, reverse_unsigned_shift,
-				      outof_input, second_shift_count,
+				      outof_input, second_shift_count_rtx,
 				      NULL_RTX, unsignedp, next_methods);
 
 	  if (inter != 0 && outof_temp1 != 0 && outof_temp2 != 0)
@@ -2792,25 +2798,29 @@ expand_unop (machine_mode mode, optab un
 
 	  if (optab_handler (rotl_optab, mode) != CODE_FOR_nothing)
 	    {
-	      temp = expand_binop (mode, rotl_optab, op0, GEN_INT (8), target,
-				   unsignedp, OPTAB_DIRECT);
+	      temp = expand_binop (mode, rotl_optab, op0,
+				   gen_int_shift_amount (mode, 8),
+				   target, unsignedp, OPTAB_DIRECT);
 	      if (temp)
 		return temp;
 	     }
 
 	  if (optab_handler (rotr_optab, mode) != CODE_FOR_nothing)
 	    {
-	      temp = expand_binop (mode, rotr_optab, op0, GEN_INT (8), target,
-				   unsignedp, OPTAB_DIRECT);
+	      temp = expand_binop (mode, rotr_optab, op0,
+				   gen_int_shift_amount (mode, 8),
+				   target, unsignedp, OPTAB_DIRECT);
 	      if (temp)
 		return temp;
 	    }
 
 	  last = get_last_insn ();
 
-	  temp1 = expand_binop (mode, ashl_optab, op0, GEN_INT (8), NULL_RTX,
+	  temp1 = expand_binop (mode, ashl_optab, op0,
+				gen_int_shift_amount (mode, 8), NULL_RTX,
 			        unsignedp, OPTAB_WIDEN);
-	  temp2 = expand_binop (mode, lshr_optab, op0, GEN_INT (8), NULL_RTX,
+	  temp2 = expand_binop (mode, lshr_optab, op0,
+				gen_int_shift_amount (mode, 8), NULL_RTX,
 			        unsignedp, OPTAB_WIDEN);
 	  if (temp1 && temp2)
 	    {
@@ -5392,7 +5402,7 @@ shift_amt_for_vec_perm_mask (rtx sel)
 	return NULL_RTX;
     }
 
-  return GEN_INT (first * bitsize);
+  return gen_int_shift_amount (GET_MODE (sel), first * bitsize);
 }
 
 /* A subroutine of expand_vec_perm for expanding one vec_perm insn.  */
@@ -5562,7 +5572,8 @@ expand_vec_perm (machine_mode mode, rtx
 				   NULL, 0, OPTAB_DIRECT);
       else
 	sel = expand_simple_binop (selmode, ASHIFT, sel,
-				   GEN_INT (exact_log2 (u)),
+				   gen_int_shift_amount (selmode,
+							 exact_log2 (u)),
 				   NULL, 0, OPTAB_DIRECT);
       gcc_assert (sel != NULL);
 
Index: gcc/simplify-rtx.c
===================================================================
--- gcc/simplify-rtx.c	2017-12-15 15:14:43.101350556 +0000
+++ gcc/simplify-rtx.c	2017-12-15 15:14:43.347343689 +0000
@@ -1165,7 +1165,8 @@ simplify_unary_operation_1 (enum rtx_cod
 	  if (STORE_FLAG_VALUE == 1)
 	    {
 	      temp = simplify_gen_binary (ASHIFTRT, inner, XEXP (op, 0),
-					  GEN_INT (isize - 1));
+					  gen_int_shift_amount (inner,
+								isize - 1));
 	      if (int_mode == inner)
 		return temp;
 	      if (GET_MODE_PRECISION (int_mode) > isize)
@@ -1175,7 +1176,8 @@ simplify_unary_operation_1 (enum rtx_cod
 	  else if (STORE_FLAG_VALUE == -1)
 	    {
 	      temp = simplify_gen_binary (LSHIFTRT, inner, XEXP (op, 0),
-					  GEN_INT (isize - 1));
+					  gen_int_shift_amount (inner,
+								isize - 1));
 	      if (int_mode == inner)
 		return temp;
 	      if (GET_MODE_PRECISION (int_mode) > isize)
@@ -2672,7 +2674,8 @@ simplify_binary_operation_1 (enum rtx_co
 	{
 	  val = wi::exact_log2 (rtx_mode_t (trueop1, mode));
 	  if (val >= 0)
-	    return simplify_gen_binary (ASHIFT, mode, op0, GEN_INT (val));
+	    return simplify_gen_binary (ASHIFT, mode, op0,
+					gen_int_shift_amount (mode, val));
 	}
 
       /* x*2 is x+x and x*(-1) is -x */
@@ -3296,7 +3299,8 @@ simplify_binary_operation_1 (enum rtx_co
       /* Convert divide by power of two into shift.  */
       if (CONST_INT_P (trueop1)
 	  && (val = exact_log2 (UINTVAL (trueop1))) > 0)
-	return simplify_gen_binary (LSHIFTRT, mode, op0, GEN_INT (val));
+	return simplify_gen_binary (LSHIFTRT, mode, op0,
+				    gen_int_shift_amount (mode, val));
       break;
 
     case DIV:
@@ -3416,10 +3420,12 @@ simplify_binary_operation_1 (enum rtx_co
 	  && IN_RANGE (INTVAL (trueop1),
 		       GET_MODE_UNIT_PRECISION (mode) / 2 + (code == ROTATE),
 		       GET_MODE_UNIT_PRECISION (mode) - 1))
-	return simplify_gen_binary (code == ROTATE ? ROTATERT : ROTATE,
-				    mode, op0,
-				    GEN_INT (GET_MODE_UNIT_PRECISION (mode)
-					     - INTVAL (trueop1)));
+	{
+	  int new_amount = GET_MODE_UNIT_PRECISION (mode) - INTVAL (trueop1);
+	  rtx new_amount_rtx = gen_int_shift_amount (mode, new_amount);
+	  return simplify_gen_binary (code == ROTATE ? ROTATERT : ROTATE,
+				      mode, op0, new_amount_rtx);
+	}
 #endif
       /* FALLTHRU */
     case ASHIFTRT:
@@ -3460,8 +3466,8 @@ simplify_binary_operation_1 (enum rtx_co
 	      == GET_MODE_BITSIZE (inner_mode) - GET_MODE_BITSIZE (int_mode))
 	  && subreg_lowpart_p (op0))
 	{
-	  rtx tmp = GEN_INT (INTVAL (XEXP (SUBREG_REG (op0), 1))
-			     + INTVAL (op1));
+	  rtx tmp = gen_int_shift_amount
+	    (inner_mode, INTVAL (XEXP (SUBREG_REG (op0), 1)) + INTVAL (op1));
 	  tmp = simplify_gen_binary (code, inner_mode,
 				     XEXP (SUBREG_REG (op0), 0),
 				     tmp);
@@ -3472,7 +3478,8 @@ simplify_binary_operation_1 (enum rtx_co
 	{
 	  val = INTVAL (op1) & (GET_MODE_UNIT_PRECISION (mode) - 1);
 	  if (val != INTVAL (op1))
-	    return simplify_gen_binary (code, mode, op0, GEN_INT (val));
+	    return simplify_gen_binary (code, mode, op0,
+					gen_int_shift_amount (mode, val));
 	}
       break;
 
Index: gcc/combine.c
===================================================================
--- gcc/combine.c	2017-12-15 15:14:43.101350556 +0000
+++ gcc/combine.c	2017-12-15 15:14:43.344343773 +0000
@@ -3804,8 +3804,9 @@ try_combine (rtx_insn *i3, rtx_insn *i2,
 	      && INTVAL (XEXP (*split, 1)) > 0
 	      && (i = exact_log2 (UINTVAL (XEXP (*split, 1)))) >= 0)
 	    {
+	      rtx i_rtx = gen_int_shift_amount (split_mode, i);
 	      SUBST (*split, gen_rtx_ASHIFT (split_mode,
-					     XEXP (*split, 0), GEN_INT (i)));
+					     XEXP (*split, 0), i_rtx));
 	      /* Update split_code because we may not have a multiply
 		 anymore.  */
 	      split_code = GET_CODE (*split);
@@ -3819,8 +3820,10 @@ try_combine (rtx_insn *i3, rtx_insn *i2,
 	      && (i = exact_log2 (UINTVAL (XEXP (XEXP (*split, 0), 1)))) >= 0)
 	    {
 	      rtx nsplit = XEXP (*split, 0);
+	      rtx i_rtx = gen_int_shift_amount (GET_MODE (nsplit), i);
 	      SUBST (XEXP (*split, 0), gen_rtx_ASHIFT (GET_MODE (nsplit),
-					     XEXP (nsplit, 0), GEN_INT (i)));
+						       XEXP (nsplit, 0),
+						       i_rtx));
 	      /* Update split_code because we may not have a multiply
 		 anymore.  */
 	      split_code = GET_CODE (*split);
@@ -5088,12 +5091,12 @@ find_split_point (rtx *loc, rtx_insn *in
 				      GET_MODE (XEXP (SET_SRC (x), 0))))))
 	    {
 	      machine_mode mode = GET_MODE (XEXP (SET_SRC (x), 0));
-
+	      rtx pos_rtx = gen_int_shift_amount (mode, pos);
 	      SUBST (SET_SRC (x),
 		     gen_rtx_NEG (mode,
 				  gen_rtx_LSHIFTRT (mode,
 						    XEXP (SET_SRC (x), 0),
-						    GEN_INT (pos))));
+						    pos_rtx)));
 
 	      split = find_split_point (&SET_SRC (x), insn, true);
 	      if (split && split != &SET_SRC (x))
@@ -5151,11 +5154,11 @@ find_split_point (rtx *loc, rtx_insn *in
 	    {
 	      unsigned HOST_WIDE_INT mask
 		= (HOST_WIDE_INT_1U << len) - 1;
+	      rtx pos_rtx = gen_int_shift_amount (mode, pos);
 	      SUBST (SET_SRC (x),
 		     gen_rtx_AND (mode,
 				  gen_rtx_LSHIFTRT
-				  (mode, gen_lowpart (mode, inner),
-				   GEN_INT (pos)),
+				  (mode, gen_lowpart (mode, inner), pos_rtx),
 				  gen_int_mode (mask, mode)));
 
 	      split = find_split_point (&SET_SRC (x), insn, true);
@@ -5164,14 +5167,15 @@ find_split_point (rtx *loc, rtx_insn *in
 	    }
 	  else
 	    {
+	      int left_bits = GET_MODE_PRECISION (mode) - len - pos;
+	      int right_bits = GET_MODE_PRECISION (mode) - len;
 	      SUBST (SET_SRC (x),
 		     gen_rtx_fmt_ee
 		     (unsignedp ? LSHIFTRT : ASHIFTRT, mode,
 		      gen_rtx_ASHIFT (mode,
 				      gen_lowpart (mode, inner),
-				      GEN_INT (GET_MODE_PRECISION (mode)
-					       - len - pos)),
-		      GEN_INT (GET_MODE_PRECISION (mode) - len)));
+				      gen_int_shift_amount (mode, left_bits)),
+		      gen_int_shift_amount (mode, right_bits)));
 
 	      split = find_split_point (&SET_SRC (x), insn, true);
 	      if (split && split != &SET_SRC (x))
@@ -8952,10 +8956,11 @@ force_int_to_mode (rtx x, scalar_int_mod
 	  /* Must be more sign bit copies than the mask needs.  */
 	  && ((int) num_sign_bit_copies (XEXP (x, 0), GET_MODE (XEXP (x, 0)))
 	      >= exact_log2 (mask + 1)))
-	x = simplify_gen_binary (LSHIFTRT, xmode, XEXP (x, 0),
-				 GEN_INT (GET_MODE_PRECISION (xmode)
-					  - exact_log2 (mask + 1)));
-
+	{
+	  int nbits = GET_MODE_PRECISION (xmode) - exact_log2 (mask + 1);
+	  x = simplify_gen_binary (LSHIFTRT, xmode, XEXP (x, 0),
+				   gen_int_shift_amount (xmode, nbits));
+	}
       goto shiftrt;
 
     case ASHIFTRT:
@@ -10448,7 +10453,7 @@ simplify_shift_const_1 (enum rtx_code co
 {
   enum rtx_code orig_code = code;
   rtx orig_varop = varop;
-  int count;
+  int count, log2;
   machine_mode mode = result_mode;
   machine_mode shift_mode;
   scalar_int_mode tmode, inner_mode, int_mode, int_varop_mode, int_result_mode;
@@ -10651,13 +10656,11 @@ simplify_shift_const_1 (enum rtx_code co
 	     is cheaper.  But it is still better on those machines to
 	     merge two shifts into one.  */
 	  if (CONST_INT_P (XEXP (varop, 1))
-	      && exact_log2 (UINTVAL (XEXP (varop, 1))) >= 0)
+	      && (log2 = exact_log2 (UINTVAL (XEXP (varop, 1)))) >= 0)
 	    {
-	      varop
-		= simplify_gen_binary (ASHIFT, GET_MODE (varop),
-				       XEXP (varop, 0),
-				       GEN_INT (exact_log2 (
-						UINTVAL (XEXP (varop, 1)))));
+	      rtx log2_rtx = gen_int_shift_amount (GET_MODE (varop), log2);
+	      varop = simplify_gen_binary (ASHIFT, GET_MODE (varop),
+					   XEXP (varop, 0), log2_rtx);
 	      continue;
 	    }
 	  break;
@@ -10665,13 +10668,11 @@ simplify_shift_const_1 (enum rtx_code co
 	case UDIV:
 	  /* Similar, for when divides are cheaper.  */
 	  if (CONST_INT_P (XEXP (varop, 1))
-	      && exact_log2 (UINTVAL (XEXP (varop, 1))) >= 0)
+	      && (log2 = exact_log2 (UINTVAL (XEXP (varop, 1)))) >= 0)
 	    {
-	      varop
-		= simplify_gen_binary (LSHIFTRT, GET_MODE (varop),
-				       XEXP (varop, 0),
-				       GEN_INT (exact_log2 (
-						UINTVAL (XEXP (varop, 1)))));
+	      rtx log2_rtx = gen_int_shift_amount (GET_MODE (varop), log2);
+	      varop = simplify_gen_binary (LSHIFTRT, GET_MODE (varop),
+					   XEXP (varop, 0), log2_rtx);
 	      continue;
 	    }
 	  break;
@@ -10806,10 +10807,10 @@ simplify_shift_const_1 (enum rtx_code co
 
 	      mask_rtx = gen_int_mode (nonzero_bits (varop, int_varop_mode),
 				       int_result_mode);
-
+	      rtx count_rtx = gen_int_shift_amount (int_result_mode, count);
 	      mask_rtx
 		= simplify_const_binary_operation (code, int_result_mode,
-						   mask_rtx, GEN_INT (count));
+						   mask_rtx, count_rtx);
 
 	      /* Give up if we can't compute an outer operation to use.  */
 	      if (mask_rtx == 0
@@ -10865,9 +10866,10 @@ simplify_shift_const_1 (enum rtx_code co
 	      if (code == ASHIFTRT && int_mode != int_result_mode)
 		break;
 
+	      rtx count_rtx = gen_int_shift_amount (int_result_mode, count);
 	      rtx new_rtx = simplify_const_binary_operation (code, int_mode,
 							     XEXP (varop, 0),
-							     GEN_INT (count));
+							     count_rtx);
 	      varop = gen_rtx_fmt_ee (code, int_mode, new_rtx, XEXP (varop, 1));
 	      count = 0;
 	      continue;
@@ -10933,7 +10935,7 @@ simplify_shift_const_1 (enum rtx_code co
 	      && (new_rtx = simplify_const_binary_operation
 		  (code, int_result_mode,
 		   gen_int_mode (INTVAL (XEXP (varop, 1)), int_result_mode),
-		   GEN_INT (count))) != 0
+		   gen_int_shift_amount (int_result_mode, count))) != 0
 	      && CONST_INT_P (new_rtx)
 	      && merge_outer_ops (&outer_op, &outer_const, GET_CODE (varop),
 				  INTVAL (new_rtx), int_result_mode,
@@ -11076,7 +11078,7 @@ simplify_shift_const_1 (enum rtx_code co
 	      && (new_rtx = simplify_const_binary_operation
 		  (ASHIFT, int_result_mode,
 		   gen_int_mode (INTVAL (XEXP (varop, 1)), int_result_mode),
-		   GEN_INT (count))) != 0
+		   gen_int_shift_amount (int_result_mode, count))) != 0
 	      && CONST_INT_P (new_rtx)
 	      && merge_outer_ops (&outer_op, &outer_const, PLUS,
 				  INTVAL (new_rtx), int_result_mode,
@@ -11097,7 +11099,7 @@ simplify_shift_const_1 (enum rtx_code co
 	      && (new_rtx = simplify_const_binary_operation
 		  (code, int_result_mode,
 		   gen_int_mode (INTVAL (XEXP (varop, 1)), int_result_mode),
-		   GEN_INT (count))) != 0
+		   gen_int_shift_amount (int_result_mode, count))) != 0
 	      && CONST_INT_P (new_rtx)
 	      && merge_outer_ops (&outer_op, &outer_const, XOR,
 				  INTVAL (new_rtx), int_result_mode,
@@ -11152,12 +11154,12 @@ simplify_shift_const_1 (enum rtx_code co
 		      - GET_MODE_UNIT_PRECISION (GET_MODE (varop)))))
 	    {
 	      rtx varop_inner = XEXP (varop, 0);
-
-	      varop_inner
-		= gen_rtx_LSHIFTRT (GET_MODE (varop_inner),
-				    XEXP (varop_inner, 0),
-				    GEN_INT
-				    (count + INTVAL (XEXP (varop_inner, 1))));
+	      int new_count = count + INTVAL (XEXP (varop_inner, 1));
+	      rtx new_count_rtx = gen_int_shift_amount (GET_MODE (varop_inner),
+							new_count);
+	      varop_inner = gen_rtx_LSHIFTRT (GET_MODE (varop_inner),
+					      XEXP (varop_inner, 0),
+					      new_count_rtx);
 	      varop = gen_rtx_TRUNCATE (GET_MODE (varop), varop_inner);
 	      count = 0;
 	      continue;
@@ -11209,7 +11211,8 @@ simplify_shift_const_1 (enum rtx_code co
     x = NULL_RTX;
 
   if (x == NULL_RTX)
-    x = simplify_gen_binary (code, shift_mode, varop, GEN_INT (count));
+    x = simplify_gen_binary (code, shift_mode, varop,
+			     gen_int_shift_amount (shift_mode, count));
 
   /* If we were doing an LSHIFTRT in a wider mode than it was originally,
      turn off all the bits that the shift would have turned off.  */
@@ -11271,7 +11274,8 @@ simplify_shift_const (rtx x, enum rtx_co
     return tem;
 
   if (!x)
-    x = simplify_gen_binary (code, GET_MODE (varop), varop, GEN_INT (count));
+    x = simplify_gen_binary (code, GET_MODE (varop), varop,
+			     gen_int_shift_amount (GET_MODE (varop), count));
   if (GET_MODE (x) != result_mode)
     x = gen_lowpart (result_mode, x);
   return x;
@@ -11462,8 +11466,9 @@ change_zero_ext (rtx pat)
 	  if (BITS_BIG_ENDIAN)
 	    start = GET_MODE_PRECISION (inner_mode) - size - start;
 
-	  if (start)
-	    x = gen_rtx_LSHIFTRT (inner_mode, XEXP (x, 0), GEN_INT (start));
+	  if (start != 0)
+	    x = gen_rtx_LSHIFTRT (inner_mode, XEXP (x, 0),
+				  gen_int_shift_amount (inner_mode, start));
 	  else
 	    x = XEXP (x, 0);
 	  if (mode != inner_mode)

^ permalink raw reply	[flat|nested] 90+ messages in thread

* Re: [14/nn] Add helpers for shift count modes
  2017-12-15 15:17               ` Richard Sandiford
@ 2017-12-19 19:13                 ` Richard Sandiford
  2017-12-20  0:27                   ` Jeff Law
  0 siblings, 1 reply; 90+ messages in thread
From: Richard Sandiford @ 2017-12-19 19:13 UTC (permalink / raw)
  To: Richard Biener; +Cc: Jeff Law, GCC Patches

Richard Sandiford <richard.sandiford@linaro.org> writes:
> Richard Biener <richard.guenther@gmail.com> writes:
>> On Fri, Dec 15, 2017 at 1:48 AM, Richard Sandiford
>> <richard.sandiford@linaro.org> wrote:
>>> Richard Biener <richard.guenther@gmail.com> writes:
>>>> On Mon, Nov 20, 2017 at 10:02 PM, Richard Sandiford
>>>> <richard.sandiford@linaro.org> wrote:
>>>>> Richard Biener <richard.guenther@gmail.com> writes:
>>>>>> On Thu, Oct 26, 2017 at 2:06 PM, Richard Biener
>>>>>> <richard.guenther@gmail.com> wrote:
>>>>>>> On Mon, Oct 23, 2017 at 1:25 PM, Richard Sandiford
>>>>>>> <richard.sandiford@linaro.org> wrote:
>>>>>>>> This patch adds a stub helper routine to provide the mode
>>>>>>>> of a scalar shift amount, given the mode of the values
>>>>>>>> being shifted.
>>>>>>>>
>>>>>>>> One long-standing problem has been to decide what this mode
>>>>>>>> should be for arbitrary rtxes (as opposed to those directly
>>>>>>>> tied to a target pattern).  Is it the mode of the shifted
>>>>>>>> elements?  Is it word_mode?  Or maybe QImode?  Is it whatever
>>>>>>>> the corresponding target pattern says?  (In which case what
>>>>>>>> should the mode be when the target doesn't have a pattern?)
>>>>>>>>
>>>>>>>> For now the patch picks word_mode, which should be safe on
>>>>>>>> all targets but could perhaps become suboptimal if the helper
>>>>>>>> routine is used more often than it is in this patch.  As it
>>>>>>>> stands the patch does not change the generated code.
>>>>>>>>
>>>>>>>> The patch also adds a helper function that constructs rtxes
>>>>>>>> for constant shift amounts, again given the mode of the value
>>>>>>>> being shifted.  As well as helping with the SVE patches, this
>>>>>>>> is one step towards allowing CONST_INTs to have a real mode.
>>>>>>>
>>>>>>> I think gen_shift_amount_mode is flawed and while encapsulating
>>>>>>> constant shift amount RTX generation into a gen_int_shift_amount
>>>>>>> looks good to me I'd rather have that ??? in this function (and
>>>>>>> I'd use the mode of the RTX shifted, not word_mode...).
>>>>>
>>>>> OK.  I'd gone for word_mode because that's what expand_binop uses
>>>>> for CONST_INTs:
>>>>>
>>>>>       op1_mode = (GET_MODE (op1) != VOIDmode
>>>>>                   ? as_a <scalar_int_mode> (GET_MODE (op1))
>>>>>                   : word_mode);
>>>>>
>>>>> But using the inner mode should be fine too.  The patch below does that.
>>>>>
>>>>>>> In the end it's up to insn recognition to convert the op to the
>>>>>>> expected mode and for generic RTL it's us that should decide
>>>>>>> on the mode -- on GENERIC the shift amount has to be an
>>>>>>> integer so why not simply use a mode that is large enough to
>>>>>>> make the constant fit?
>>>>>
>>>>> ...but I can do that instead if you think it's better.
>>>>>
>>>>>>> Just throwing in some comments here, RTL isn't my primary
>>>>>>> expertise.
>>>>>>
>>>>>> To add a little bit - shift amounts are maybe the only(?) place
>>>>>> where a modeless CONST_INT makes sense!  So "fixing"
>>>>>> that first sounds backwards.
>>>>>
>>>>> But even here they have a mode conceptually, since out-of-range shift
>>>>> amounts are target-defined rather than undefined.  E.g. if the target
>>>>> interprets the shift amount as unsigned, then for a shift amount
>>>>> (const_int -1) it matters whether the mode is QImode (and so we're
>>>>> shifting by 255) or HImode (and so we're shifting by 65535).
>>>>
>>>> I think RTL is well-defined (at least I hope so ...) and machine constraints
>>>> need to be modeled explicitely (like embedding an implicit bit_and in
>>>> shift patterns).
>>>
>>> Well, RTL is well-defined in the sense that if you have
>>>
>>>   (ashift X (foo:HI ...))
>>>
>>> then the shift amount must be interpreted as HImode rather than some
>>> other mode.  The problem here is to define a default choice of mode for
>>> const_ints, in cases where the shift is being created out of the blue.
>>>
>>> Whether the shift amount is effectively signed or unsigned isn't defined
>>> by RTL without SHIFT_COUNT_TRUNCATED, since the choice only matters for
>>> out-of-range values, and the behaviour for out-of-range RTL shifts is
>>> specifically treated as target-defined without SHIFT_COUNT_TRUNCATED.
>>>
>>> I think the revised patch does implement your suggestion of using the
>>> integer equivalent of the inner mode as the default, but we need to
>>> decide whether to go with it, go with the original word_mode approach
>>> (taken from existing expand_binop code) or something else.  Something
>>> else could include the widest supported integer mode, so that we never
>>> change the value.
>>
>> I guess it's pretty arbitrary what we choose (but we might need to adjust
>> targets?).  For something like this an appealing choice would be something
>> that is host and target independent, like [u]int32_t or, given CONST_INT
>> is always 64 bits now, signed int64_t aka HOST_WIDE_INT (bad
>> name now).  That means it's the "infinite precision" thing that fits
>> into CONST_INT ;)
>
> Sounds OK to me.  How about the attached?

Taking MAX_FIXED_MODE_SIZE into account was bogus, since on targets
where it is smaller than 64 bits we'd then just fail to find a mode.
This version has survived the full cross-target
testing.  Also bootstrapped & regression-tested on aarch64-linux-gnu,
x86_64-linux-gnu and powerpc64-linux-gnu.  OK to install?

At this stage this is the patch that is holding up the rest of the
approved ones.
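
For what it's worth, the limit argument explains why the earlier
version could fail: with a nonzero limit, int_mode_for_size refuses
modes wider than MAX_FIXED_MODE_SIZE.  A sketch of the difference
(illustrative only, not part of the patch):

    /* limit == 1: empty if MAX_FIXED_MODE_SIZE < 64, so the earlier
       .require () would have ICEd on such targets.  */
    opt_scalar_int_mode m1 = int_mode_for_size (64, 1);

    /* limit == 0: finds a 64-bit integer mode regardless of
       MAX_FIXED_MODE_SIZE.  */
    opt_scalar_int_mode m2 = int_mode_for_size (64, 0);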

Thanks,
Richard


2017-12-19  Richard Sandiford  <richard.sandiford@linaro.org>
	    Alan Hayward  <alan.hayward@arm.com>
	    David Sherwood  <david.sherwood@arm.com>

gcc/
	* emit-rtl.h (gen_int_shift_amount): Declare.
	* emit-rtl.c (gen_int_shift_amount): New function.
	* asan.c (asan_emit_stack_protection): Use gen_int_shift_amount
	instead of GEN_INT.
	* calls.c (shift_return_value): Likewise.
	* cse.c (fold_rtx): Likewise.
	* dse.c (find_shift_sequence): Likewise.
	* expmed.c (init_expmed_one_mode, store_bit_field_1, expand_shift_1)
	(expand_shift, expand_smod_pow2): Likewise.
	* lower-subreg.c (shift_cost): Likewise.
	* optabs.c (expand_superword_shift, expand_doubleword_mult)
	(expand_unop, expand_binop, shift_amt_for_vec_perm_mask)
	(expand_vec_perm_var): Likewise.
	* simplify-rtx.c (simplify_unary_operation_1): Likewise.
	(simplify_binary_operation_1): Likewise.
	* combine.c (try_combine, find_split_point, force_int_to_mode)
	(simplify_shift_const_1, simplify_shift_const): Likewise.
	(change_zero_ext): Likewise.  Use simplify_gen_binary.

Index: gcc/emit-rtl.h
===================================================================
--- gcc/emit-rtl.h	2017-12-16 14:23:26.068200011 +0000
+++ gcc/emit-rtl.h	2017-12-19 19:09:23.877365740 +0000
@@ -369,6 +369,7 @@ extern void set_reg_attrs_for_parm (rtx,
 extern void set_reg_attrs_for_decl_rtl (tree t, rtx x);
 extern void adjust_reg_mode (rtx, machine_mode);
 extern int mem_expr_equal_p (const_tree, const_tree);
+extern rtx gen_int_shift_amount (machine_mode, HOST_WIDE_INT);
 
 extern bool need_atomic_barrier_p (enum memmodel, bool);
 
Index: gcc/emit-rtl.c
===================================================================
--- gcc/emit-rtl.c	2017-12-16 14:23:26.068200011 +0000
+++ gcc/emit-rtl.c	2017-12-19 19:09:23.877365740 +0000
@@ -6418,6 +6418,22 @@ need_atomic_barrier_p (enum memmodel mod
     }
 }
 
+/* Return a constant shift amount for shifting a value of mode MODE
+   by VALUE bits.  */
+
+rtx
+gen_int_shift_amount (machine_mode, HOST_WIDE_INT value)
+{
+  /* Use a 64-bit mode, to avoid any truncation.
+
+     ??? Perhaps this should be automatically derived from the .md files
+     instead, or perhaps have a target hook.  */
+  scalar_int_mode shift_mode = (BITS_PER_UNIT == 8
+				? DImode
+				: int_mode_for_size (64, 0).require ());
+  return gen_int_mode (value, shift_mode);
+}
+
 /* Initialize fields of rtl_data related to stack alignment.  */
 
 void
Index: gcc/asan.c
===================================================================
--- gcc/asan.c	2017-12-16 14:23:26.065200853 +0000
+++ gcc/asan.c	2017-12-19 19:09:23.867366133 +0000
@@ -1386,7 +1386,7 @@ asan_emit_stack_protection (rtx base, rt
   TREE_ASM_WRITTEN (id) = 1;
   emit_move_insn (mem, expand_normal (build_fold_addr_expr (decl)));
   shadow_base = expand_binop (Pmode, lshr_optab, base,
-			      GEN_INT (ASAN_SHADOW_SHIFT),
+			      gen_int_shift_amount (Pmode, ASAN_SHADOW_SHIFT),
 			      NULL_RTX, 1, OPTAB_DIRECT);
   shadow_base
     = plus_constant (Pmode, shadow_base,
Index: gcc/calls.c
===================================================================
--- gcc/calls.c	2017-12-16 14:23:26.066200572 +0000
+++ gcc/calls.c	2017-12-19 19:09:23.868366094 +0000
@@ -2900,15 +2900,17 @@ shift_return_value (machine_mode mode, b
   HOST_WIDE_INT shift;
 
   gcc_assert (REG_P (value) && HARD_REGISTER_P (value));
-  shift = GET_MODE_BITSIZE (GET_MODE (value)) - GET_MODE_BITSIZE (mode);
+  machine_mode value_mode = GET_MODE (value);
+  shift = GET_MODE_BITSIZE (value_mode) - GET_MODE_BITSIZE (mode);
   if (shift == 0)
     return false;
 
   /* Use ashr rather than lshr for right shifts.  This is for the benefit
      of the MIPS port, which requires SImode values to be sign-extended
      when stored in 64-bit registers.  */
-  if (!force_expand_binop (GET_MODE (value), left_p ? ashl_optab : ashr_optab,
-			   value, GEN_INT (shift), value, 1, OPTAB_WIDEN))
+  if (!force_expand_binop (value_mode, left_p ? ashl_optab : ashr_optab,
+			   value, gen_int_shift_amount (value_mode, shift),
+			   value, 1, OPTAB_WIDEN))
     gcc_unreachable ();
   return true;
 }
Index: gcc/cse.c
===================================================================
--- gcc/cse.c	2017-12-16 14:23:26.067200292 +0000
+++ gcc/cse.c	2017-12-19 19:09:23.874365858 +0000
@@ -3611,9 +3611,9 @@ fold_rtx (rtx x, rtx_insn *insn)
 		      || INTVAL (const_arg1) < 0))
 		{
 		  if (SHIFT_COUNT_TRUNCATED)
-		    canon_const_arg1 = GEN_INT (INTVAL (const_arg1)
-						& (GET_MODE_UNIT_BITSIZE (mode)
-						   - 1));
+		    canon_const_arg1 = gen_int_shift_amount
+		      (mode, (INTVAL (const_arg1)
+			      & (GET_MODE_UNIT_BITSIZE (mode) - 1)));
 		  else
 		    break;
 		}
@@ -3660,9 +3660,9 @@ fold_rtx (rtx x, rtx_insn *insn)
 		      || INTVAL (inner_const) < 0))
 		{
 		  if (SHIFT_COUNT_TRUNCATED)
-		    inner_const = GEN_INT (INTVAL (inner_const)
-					   & (GET_MODE_UNIT_BITSIZE (mode)
-					      - 1));
+		    inner_const = gen_int_shift_amount
+		      (mode, (INTVAL (inner_const)
+			      & (GET_MODE_UNIT_BITSIZE (mode) - 1)));
 		  else
 		    break;
 		}
@@ -3692,7 +3692,8 @@ fold_rtx (rtx x, rtx_insn *insn)
 		  /* As an exception, we can turn an ASHIFTRT of this
 		     form into a shift of the number of bits - 1.  */
 		  if (code == ASHIFTRT)
-		    new_const = GEN_INT (GET_MODE_UNIT_BITSIZE (mode) - 1);
+		    new_const = gen_int_shift_amount
+		      (mode, GET_MODE_UNIT_BITSIZE (mode) - 1);
 		  else if (!side_effects_p (XEXP (y, 0)))
 		    return CONST0_RTX (mode);
 		  else
Index: gcc/dse.c
===================================================================
--- gcc/dse.c	2017-12-16 14:23:26.068200011 +0000
+++ gcc/dse.c	2017-12-19 19:09:23.875365819 +0000
@@ -1642,8 +1642,9 @@ find_shift_sequence (int access_size,
 				     store_mode, byte);
 	  if (ret && CONSTANT_P (ret))
 	    {
+	      rtx shift_rtx = gen_int_shift_amount (new_mode, shift);
 	      ret = simplify_const_binary_operation (LSHIFTRT, new_mode,
-						     ret, GEN_INT (shift));
+						     ret, shift_rtx);
 	      if (ret && CONSTANT_P (ret))
 		{
 		  byte = subreg_lowpart_offset (read_mode, new_mode);
@@ -1679,7 +1680,8 @@ find_shift_sequence (int access_size,
 	 of one dsp where the cost of these two was not the same.  But
 	 this really is a rare case anyway.  */
       target = expand_binop (new_mode, lshr_optab, new_reg,
-			     GEN_INT (shift), new_reg, 1, OPTAB_DIRECT);
+			     gen_int_shift_amount (new_mode, shift),
+			     new_reg, 1, OPTAB_DIRECT);
 
       shift_seq = get_insns ();
       end_sequence ();
Index: gcc/expmed.c
===================================================================
--- gcc/expmed.c	2017-12-16 14:23:26.069199731 +0000
+++ gcc/expmed.c	2017-12-19 19:09:23.879365662 +0000
@@ -223,7 +223,8 @@ init_expmed_one_mode (struct init_expmed
 	  PUT_MODE (all->zext, wider_mode);
 	  PUT_MODE (all->wide_mult, wider_mode);
 	  PUT_MODE (all->wide_lshr, wider_mode);
-	  XEXP (all->wide_lshr, 1) = GEN_INT (mode_bitsize);
+	  XEXP (all->wide_lshr, 1)
+	    = gen_int_shift_amount (wider_mode, mode_bitsize);
 
 	  set_mul_widen_cost (speed, wider_mode,
 			      set_src_cost (all->wide_mult, wider_mode, speed));
@@ -910,12 +911,14 @@ store_bit_field_1 (rtx str_rtx, unsigned
 	     to make sure that for big-endian machines the higher order
 	     bits are used.  */
 	  if (new_bitsize < BITS_PER_WORD && BYTES_BIG_ENDIAN && !backwards)
-	    value_word = simplify_expand_binop (word_mode, lshr_optab,
-						value_word,
-						GEN_INT (BITS_PER_WORD
-							 - new_bitsize),
-						NULL_RTX, true,
-						OPTAB_LIB_WIDEN);
+	    {
+	      int shift = BITS_PER_WORD - new_bitsize;
+	      rtx shift_rtx = gen_int_shift_amount (word_mode, shift);
+	      value_word = simplify_expand_binop (word_mode, lshr_optab,
+						  value_word, shift_rtx,
+						  NULL_RTX, true,
+						  OPTAB_LIB_WIDEN);
+	    }
 
 	  if (!store_bit_field_1 (op0, new_bitsize,
 				  bitnum + bit_offset,
@@ -2366,8 +2369,9 @@ expand_shift_1 (enum tree_code code, mac
       if (CONST_INT_P (op1)
 	  && ((unsigned HOST_WIDE_INT) INTVAL (op1) >=
 	      (unsigned HOST_WIDE_INT) GET_MODE_BITSIZE (scalar_mode)))
-	op1 = GEN_INT ((unsigned HOST_WIDE_INT) INTVAL (op1)
-		       % GET_MODE_BITSIZE (scalar_mode));
+	op1 = gen_int_shift_amount (mode,
+				    (unsigned HOST_WIDE_INT) INTVAL (op1)
+				    % GET_MODE_BITSIZE (scalar_mode));
       else if (GET_CODE (op1) == SUBREG
 	       && subreg_lowpart_p (op1)
 	       && SCALAR_INT_MODE_P (GET_MODE (SUBREG_REG (op1)))
@@ -2384,7 +2388,8 @@ expand_shift_1 (enum tree_code code, mac
       && IN_RANGE (INTVAL (op1), GET_MODE_BITSIZE (scalar_mode) / 2 + left,
 		   GET_MODE_BITSIZE (scalar_mode) - 1))
     {
-      op1 = GEN_INT (GET_MODE_BITSIZE (scalar_mode) - INTVAL (op1));
+      op1 = gen_int_shift_amount (mode, (GET_MODE_BITSIZE (scalar_mode)
+					 - INTVAL (op1)));
       left = !left;
       code = left ? LROTATE_EXPR : RROTATE_EXPR;
     }
@@ -2464,8 +2469,8 @@ expand_shift_1 (enum tree_code code, mac
 	      if (op1 == const0_rtx)
 		return shifted;
 	      else if (CONST_INT_P (op1))
-		other_amount = GEN_INT (GET_MODE_BITSIZE (scalar_mode)
-					- INTVAL (op1));
+		other_amount = gen_int_shift_amount
+		  (mode, GET_MODE_BITSIZE (scalar_mode) - INTVAL (op1));
 	      else
 		{
 		  other_amount
@@ -2538,8 +2543,9 @@ expand_shift_1 (enum tree_code code, mac
 expand_shift (enum tree_code code, machine_mode mode, rtx shifted,
 	      int amount, rtx target, int unsignedp)
 {
-  return expand_shift_1 (code, mode,
-			 shifted, GEN_INT (amount), target, unsignedp);
+  return expand_shift_1 (code, mode, shifted,
+			 gen_int_shift_amount (mode, amount),
+			 target, unsignedp);
 }
 
 /* Likewise, but return 0 if that cannot be done.  */
@@ -3857,7 +3863,7 @@ expand_smod_pow2 (scalar_int_mode mode,
 	{
 	  HOST_WIDE_INT masklow = (HOST_WIDE_INT_1 << logd) - 1;
 	  signmask = force_reg (mode, signmask);
-	  shift = GEN_INT (GET_MODE_BITSIZE (mode) - logd);
+	  shift = gen_int_shift_amount (mode, GET_MODE_BITSIZE (mode) - logd);
 
 	  /* Use the rtx_cost of a LSHIFTRT instruction to determine
 	     which instruction sequence to use.  If logical right shifts
Index: gcc/lower-subreg.c
===================================================================
--- gcc/lower-subreg.c	2017-12-16 14:23:26.069199731 +0000
+++ gcc/lower-subreg.c	2017-12-19 19:09:23.879365662 +0000
@@ -141,7 +141,7 @@ shift_cost (bool speed_p, struct cost_rt
   PUT_CODE (rtxes->shift, code);
   PUT_MODE (rtxes->shift, mode);
   PUT_MODE (rtxes->source, mode);
-  XEXP (rtxes->shift, 1) = GEN_INT (op1);
+  XEXP (rtxes->shift, 1) = gen_int_shift_amount (mode, op1);
   return set_src_cost (rtxes->shift, mode, speed_p);
 }
 
Index: gcc/optabs.c
===================================================================
--- gcc/optabs.c	2017-12-16 14:23:26.070199450 +0000
+++ gcc/optabs.c	2017-12-19 19:09:23.882365544 +0000
@@ -431,8 +431,9 @@ expand_superword_shift (optab binoptab,
       if (binoptab != ashr_optab)
 	emit_move_insn (outof_target, CONST0_RTX (word_mode));
       else
-	if (!force_expand_binop (word_mode, binoptab,
-				 outof_input, GEN_INT (BITS_PER_WORD - 1),
+	if (!force_expand_binop (word_mode, binoptab, outof_input,
+				 gen_int_shift_amount (word_mode,
+						       BITS_PER_WORD - 1),
 				 outof_target, unsignedp, methods))
 	  return false;
     }
@@ -789,7 +790,8 @@ expand_doubleword_mult (machine_mode mod
 {
   int low = (WORDS_BIG_ENDIAN ? 1 : 0);
   int high = (WORDS_BIG_ENDIAN ? 0 : 1);
-  rtx wordm1 = umulp ? NULL_RTX : GEN_INT (BITS_PER_WORD - 1);
+  rtx wordm1 = (umulp ? NULL_RTX
+		: gen_int_shift_amount (word_mode, BITS_PER_WORD - 1));
   rtx product, adjust, product_high, temp;
 
   rtx op0_high = operand_subword_force (op0, high, mode);
@@ -1190,7 +1192,7 @@ expand_binop (machine_mode mode, optab b
       unsigned int bits = GET_MODE_PRECISION (int_mode);
 
       if (CONST_INT_P (op1))
-        newop1 = GEN_INT (bits - INTVAL (op1));
+	newop1 = gen_int_shift_amount (int_mode, bits - INTVAL (op1));
       else if (targetm.shift_truncation_mask (int_mode) == bits - 1)
         newop1 = negate_rtx (GET_MODE (op1), op1);
       else
@@ -1412,7 +1414,7 @@ expand_binop (machine_mode mode, optab b
 
       /* Apply the truncation to constant shifts.  */
       if (double_shift_mask > 0 && CONST_INT_P (op1))
-	op1 = GEN_INT (INTVAL (op1) & double_shift_mask);
+	op1 = gen_int_mode (INTVAL (op1) & double_shift_mask, op1_mode);
 
       if (op1 == CONST0_RTX (op1_mode))
 	return op0;
@@ -1522,7 +1524,7 @@ expand_binop (machine_mode mode, optab b
       else
 	{
 	  rtx into_temp1, into_temp2, outof_temp1, outof_temp2;
-	  rtx first_shift_count, second_shift_count;
+	  HOST_WIDE_INT first_shift_count, second_shift_count;
 	  optab reverse_unsigned_shift, unsigned_shift;
 
 	  reverse_unsigned_shift = (left_shift ^ (shift_count < BITS_PER_WORD)
@@ -1533,20 +1535,24 @@ expand_binop (machine_mode mode, optab b
 
 	  if (shift_count > BITS_PER_WORD)
 	    {
-	      first_shift_count = GEN_INT (shift_count - BITS_PER_WORD);
-	      second_shift_count = GEN_INT (2 * BITS_PER_WORD - shift_count);
+	      first_shift_count = shift_count - BITS_PER_WORD;
+	      second_shift_count = 2 * BITS_PER_WORD - shift_count;
 	    }
 	  else
 	    {
-	      first_shift_count = GEN_INT (BITS_PER_WORD - shift_count);
-	      second_shift_count = GEN_INT (shift_count);
+	      first_shift_count = BITS_PER_WORD - shift_count;
+	      second_shift_count = shift_count;
 	    }
+	  rtx first_shift_count_rtx
+	    = gen_int_shift_amount (word_mode, first_shift_count);
+	  rtx second_shift_count_rtx
+	    = gen_int_shift_amount (word_mode, second_shift_count);
 
 	  into_temp1 = expand_binop (word_mode, unsigned_shift,
-				     outof_input, first_shift_count,
+				     outof_input, first_shift_count_rtx,
 				     NULL_RTX, unsignedp, next_methods);
 	  into_temp2 = expand_binop (word_mode, reverse_unsigned_shift,
-				     into_input, second_shift_count,
+				     into_input, second_shift_count_rtx,
 				     NULL_RTX, unsignedp, next_methods);
 
 	  if (into_temp1 != 0 && into_temp2 != 0)
@@ -1559,10 +1565,10 @@ expand_binop (machine_mode mode, optab b
 	    emit_move_insn (into_target, inter);
 
 	  outof_temp1 = expand_binop (word_mode, unsigned_shift,
-				      into_input, first_shift_count,
+				      into_input, first_shift_count_rtx,
 				      NULL_RTX, unsignedp, next_methods);
 	  outof_temp2 = expand_binop (word_mode, reverse_unsigned_shift,
-				      outof_input, second_shift_count,
+				      outof_input, second_shift_count_rtx,
 				      NULL_RTX, unsignedp, next_methods);
 
 	  if (inter != 0 && outof_temp1 != 0 && outof_temp2 != 0)
@@ -2802,25 +2808,29 @@ expand_unop (machine_mode mode, optab un
 
 	  if (optab_handler (rotl_optab, mode) != CODE_FOR_nothing)
 	    {
-	      temp = expand_binop (mode, rotl_optab, op0, GEN_INT (8), target,
-				   unsignedp, OPTAB_DIRECT);
+	      temp = expand_binop (mode, rotl_optab, op0,
+				   gen_int_shift_amount (mode, 8),
+				   target, unsignedp, OPTAB_DIRECT);
 	      if (temp)
 		return temp;
 	     }
 
 	  if (optab_handler (rotr_optab, mode) != CODE_FOR_nothing)
 	    {
-	      temp = expand_binop (mode, rotr_optab, op0, GEN_INT (8), target,
-				   unsignedp, OPTAB_DIRECT);
+	      temp = expand_binop (mode, rotr_optab, op0,
+				   gen_int_shift_amount (mode, 8),
+				   target, unsignedp, OPTAB_DIRECT);
 	      if (temp)
 		return temp;
 	    }
 
 	  last = get_last_insn ();
 
-	  temp1 = expand_binop (mode, ashl_optab, op0, GEN_INT (8), NULL_RTX,
+	  temp1 = expand_binop (mode, ashl_optab, op0,
+				gen_int_shift_amount (mode, 8), NULL_RTX,
 			        unsignedp, OPTAB_WIDEN);
-	  temp2 = expand_binop (mode, lshr_optab, op0, GEN_INT (8), NULL_RTX,
+	  temp2 = expand_binop (mode, lshr_optab, op0,
+				gen_int_shift_amount (mode, 8), NULL_RTX,
 			        unsignedp, OPTAB_WIDEN);
 	  if (temp1 && temp2)
 	    {
@@ -5402,7 +5412,7 @@ shift_amt_for_vec_perm_mask (rtx sel)
 	return NULL_RTX;
     }
 
-  return GEN_INT (first * bitsize);
+  return gen_int_shift_amount (GET_MODE (sel), first * bitsize);
 }
 
 /* A subroutine of expand_vec_perm for expanding one vec_perm insn.  */
@@ -5572,7 +5582,8 @@ expand_vec_perm (machine_mode mode, rtx
 				   NULL, 0, OPTAB_DIRECT);
       else
 	sel = expand_simple_binop (selmode, ASHIFT, sel,
-				   GEN_INT (exact_log2 (u)),
+				   gen_int_shift_amount (selmode,
+							 exact_log2 (u)),
 				   NULL, 0, OPTAB_DIRECT);
       gcc_assert (sel != NULL);
 
Index: gcc/simplify-rtx.c
===================================================================
--- gcc/simplify-rtx.c	2017-12-16 14:23:26.070199450 +0000
+++ gcc/simplify-rtx.c	2017-12-19 19:09:23.884365465 +0000
@@ -1165,7 +1165,8 @@ simplify_unary_operation_1 (enum rtx_cod
 	  if (STORE_FLAG_VALUE == 1)
 	    {
 	      temp = simplify_gen_binary (ASHIFTRT, inner, XEXP (op, 0),
-					  GEN_INT (isize - 1));
+					  gen_int_shift_amount (inner,
+								isize - 1));
 	      if (int_mode == inner)
 		return temp;
 	      if (GET_MODE_PRECISION (int_mode) > isize)
@@ -1175,7 +1176,8 @@ simplify_unary_operation_1 (enum rtx_cod
 	  else if (STORE_FLAG_VALUE == -1)
 	    {
 	      temp = simplify_gen_binary (LSHIFTRT, inner, XEXP (op, 0),
-					  GEN_INT (isize - 1));
+					  gen_int_shift_amount (inner,
+								isize - 1));
 	      if (int_mode == inner)
 		return temp;
 	      if (GET_MODE_PRECISION (int_mode) > isize)
@@ -2672,7 +2674,8 @@ simplify_binary_operation_1 (enum rtx_co
 	{
 	  val = wi::exact_log2 (rtx_mode_t (trueop1, mode));
 	  if (val >= 0)
-	    return simplify_gen_binary (ASHIFT, mode, op0, GEN_INT (val));
+	    return simplify_gen_binary (ASHIFT, mode, op0,
+					gen_int_shift_amount (mode, val));
 	}
 
       /* x*2 is x+x and x*(-1) is -x */
@@ -3296,7 +3299,8 @@ simplify_binary_operation_1 (enum rtx_co
       /* Convert divide by power of two into shift.  */
       if (CONST_INT_P (trueop1)
 	  && (val = exact_log2 (UINTVAL (trueop1))) > 0)
-	return simplify_gen_binary (LSHIFTRT, mode, op0, GEN_INT (val));
+	return simplify_gen_binary (LSHIFTRT, mode, op0,
+				    gen_int_shift_amount (mode, val));
       break;
 
     case DIV:
@@ -3416,10 +3420,12 @@ simplify_binary_operation_1 (enum rtx_co
 	  && IN_RANGE (INTVAL (trueop1),
 		       GET_MODE_UNIT_PRECISION (mode) / 2 + (code == ROTATE),
 		       GET_MODE_UNIT_PRECISION (mode) - 1))
-	return simplify_gen_binary (code == ROTATE ? ROTATERT : ROTATE,
-				    mode, op0,
-				    GEN_INT (GET_MODE_UNIT_PRECISION (mode)
-					     - INTVAL (trueop1)));
+	{
+	  int new_amount = GET_MODE_UNIT_PRECISION (mode) - INTVAL (trueop1);
+	  rtx new_amount_rtx = gen_int_shift_amount (mode, new_amount);
+	  return simplify_gen_binary (code == ROTATE ? ROTATERT : ROTATE,
+				      mode, op0, new_amount_rtx);
+	}
 #endif
       /* FALLTHRU */
     case ASHIFTRT:
@@ -3460,8 +3466,8 @@ simplify_binary_operation_1 (enum rtx_co
 	      == GET_MODE_BITSIZE (inner_mode) - GET_MODE_BITSIZE (int_mode))
 	  && subreg_lowpart_p (op0))
 	{
-	  rtx tmp = GEN_INT (INTVAL (XEXP (SUBREG_REG (op0), 1))
-			     + INTVAL (op1));
+	  rtx tmp = gen_int_shift_amount
+	    (inner_mode, INTVAL (XEXP (SUBREG_REG (op0), 1)) + INTVAL (op1));
 	  tmp = simplify_gen_binary (code, inner_mode,
 				     XEXP (SUBREG_REG (op0), 0),
 				     tmp);
@@ -3472,7 +3478,8 @@ simplify_binary_operation_1 (enum rtx_co
 	{
 	  val = INTVAL (op1) & (GET_MODE_UNIT_PRECISION (mode) - 1);
 	  if (val != INTVAL (op1))
-	    return simplify_gen_binary (code, mode, op0, GEN_INT (val));
+	    return simplify_gen_binary (code, mode, op0,
+					gen_int_shift_amount (mode, val));
 	}
       break;
 
Index: gcc/combine.c
===================================================================
--- gcc/combine.c	2017-12-16 14:23:26.067200292 +0000
+++ gcc/combine.c	2017-12-19 19:09:23.873365897 +0000
@@ -3804,8 +3804,9 @@ try_combine (rtx_insn *i3, rtx_insn *i2,
 	      && INTVAL (XEXP (*split, 1)) > 0
 	      && (i = exact_log2 (UINTVAL (XEXP (*split, 1)))) >= 0)
 	    {
+	      rtx i_rtx = gen_int_shift_amount (split_mode, i);
 	      SUBST (*split, gen_rtx_ASHIFT (split_mode,
-					     XEXP (*split, 0), GEN_INT (i)));
+					     XEXP (*split, 0), i_rtx));
 	      /* Update split_code because we may not have a multiply
 		 anymore.  */
 	      split_code = GET_CODE (*split);
@@ -3819,8 +3820,10 @@ try_combine (rtx_insn *i3, rtx_insn *i2,
 	      && (i = exact_log2 (UINTVAL (XEXP (XEXP (*split, 0), 1)))) >= 0)
 	    {
 	      rtx nsplit = XEXP (*split, 0);
+	      rtx i_rtx = gen_int_shift_amount (GET_MODE (nsplit), i);
 	      SUBST (XEXP (*split, 0), gen_rtx_ASHIFT (GET_MODE (nsplit),
-					     XEXP (nsplit, 0), GEN_INT (i)));
+						       XEXP (nsplit, 0),
+						       i_rtx));
 	      /* Update split_code because we may not have a multiply
 		 anymore.  */
 	      split_code = GET_CODE (*split);
@@ -5088,12 +5091,12 @@ find_split_point (rtx *loc, rtx_insn *in
 				      GET_MODE (XEXP (SET_SRC (x), 0))))))
 	    {
 	      machine_mode mode = GET_MODE (XEXP (SET_SRC (x), 0));
-
+	      rtx pos_rtx = gen_int_shift_amount (mode, pos);
 	      SUBST (SET_SRC (x),
 		     gen_rtx_NEG (mode,
 				  gen_rtx_LSHIFTRT (mode,
 						    XEXP (SET_SRC (x), 0),
-						    GEN_INT (pos))));
+						    pos_rtx)));
 
 	      split = find_split_point (&SET_SRC (x), insn, true);
 	      if (split && split != &SET_SRC (x))
@@ -5151,11 +5154,11 @@ find_split_point (rtx *loc, rtx_insn *in
 	    {
 	      unsigned HOST_WIDE_INT mask
 		= (HOST_WIDE_INT_1U << len) - 1;
+	      rtx pos_rtx = gen_int_shift_amount (mode, pos);
 	      SUBST (SET_SRC (x),
 		     gen_rtx_AND (mode,
 				  gen_rtx_LSHIFTRT
-				  (mode, gen_lowpart (mode, inner),
-				   GEN_INT (pos)),
+				  (mode, gen_lowpart (mode, inner), pos_rtx),
 				  gen_int_mode (mask, mode)));
 
 	      split = find_split_point (&SET_SRC (x), insn, true);
@@ -5164,14 +5167,15 @@ find_split_point (rtx *loc, rtx_insn *in
 	    }
 	  else
 	    {
+	      int left_bits = GET_MODE_PRECISION (mode) - len - pos;
+	      int right_bits = GET_MODE_PRECISION (mode) - len;
 	      SUBST (SET_SRC (x),
 		     gen_rtx_fmt_ee
 		     (unsignedp ? LSHIFTRT : ASHIFTRT, mode,
 		      gen_rtx_ASHIFT (mode,
 				      gen_lowpart (mode, inner),
-				      GEN_INT (GET_MODE_PRECISION (mode)
-					       - len - pos)),
-		      GEN_INT (GET_MODE_PRECISION (mode) - len)));
+				      gen_int_shift_amount (mode, left_bits)),
+		      gen_int_shift_amount (mode, right_bits)));
 
 	      split = find_split_point (&SET_SRC (x), insn, true);
 	      if (split && split != &SET_SRC (x))
@@ -8952,10 +8956,11 @@ force_int_to_mode (rtx x, scalar_int_mod
 	  /* Must be more sign bit copies than the mask needs.  */
 	  && ((int) num_sign_bit_copies (XEXP (x, 0), GET_MODE (XEXP (x, 0)))
 	      >= exact_log2 (mask + 1)))
-	x = simplify_gen_binary (LSHIFTRT, xmode, XEXP (x, 0),
-				 GEN_INT (GET_MODE_PRECISION (xmode)
-					  - exact_log2 (mask + 1)));
-
+	{
+	  int nbits = GET_MODE_PRECISION (xmode) - exact_log2 (mask + 1);
+	  x = simplify_gen_binary (LSHIFTRT, xmode, XEXP (x, 0),
+				   gen_int_shift_amount (xmode, nbits));
+	}
       goto shiftrt;
 
     case ASHIFTRT:
@@ -10448,7 +10453,7 @@ simplify_shift_const_1 (enum rtx_code co
 {
   enum rtx_code orig_code = code;
   rtx orig_varop = varop;
-  int count;
+  int count, log2;
   machine_mode mode = result_mode;
   machine_mode shift_mode;
   scalar_int_mode tmode, inner_mode, int_mode, int_varop_mode, int_result_mode;
@@ -10651,13 +10656,11 @@ simplify_shift_const_1 (enum rtx_code co
 	     is cheaper.  But it is still better on those machines to
 	     merge two shifts into one.  */
 	  if (CONST_INT_P (XEXP (varop, 1))
-	      && exact_log2 (UINTVAL (XEXP (varop, 1))) >= 0)
+	      && (log2 = exact_log2 (UINTVAL (XEXP (varop, 1)))) >= 0)
 	    {
-	      varop
-		= simplify_gen_binary (ASHIFT, GET_MODE (varop),
-				       XEXP (varop, 0),
-				       GEN_INT (exact_log2 (
-						UINTVAL (XEXP (varop, 1)))));
+	      rtx log2_rtx = gen_int_shift_amount (GET_MODE (varop), log2);
+	      varop = simplify_gen_binary (ASHIFT, GET_MODE (varop),
+					   XEXP (varop, 0), log2_rtx);
 	      continue;
 	    }
 	  break;
@@ -10665,13 +10668,11 @@ simplify_shift_const_1 (enum rtx_code co
 	case UDIV:
 	  /* Similar, for when divides are cheaper.  */
 	  if (CONST_INT_P (XEXP (varop, 1))
-	      && exact_log2 (UINTVAL (XEXP (varop, 1))) >= 0)
+	      && (log2 = exact_log2 (UINTVAL (XEXP (varop, 1)))) >= 0)
 	    {
-	      varop
-		= simplify_gen_binary (LSHIFTRT, GET_MODE (varop),
-				       XEXP (varop, 0),
-				       GEN_INT (exact_log2 (
-						UINTVAL (XEXP (varop, 1)))));
+	      rtx log2_rtx = gen_int_shift_amount (GET_MODE (varop), log2);
+	      varop = simplify_gen_binary (LSHIFTRT, GET_MODE (varop),
+					   XEXP (varop, 0), log2_rtx);
 	      continue;
 	    }
 	  break;
@@ -10806,10 +10807,10 @@ simplify_shift_const_1 (enum rtx_code co
 
 	      mask_rtx = gen_int_mode (nonzero_bits (varop, int_varop_mode),
 				       int_result_mode);
-
+	      rtx count_rtx = gen_int_shift_amount (int_result_mode, count);
 	      mask_rtx
 		= simplify_const_binary_operation (code, int_result_mode,
-						   mask_rtx, GEN_INT (count));
+						   mask_rtx, count_rtx);
 
 	      /* Give up if we can't compute an outer operation to use.  */
 	      if (mask_rtx == 0
@@ -10865,9 +10866,10 @@ simplify_shift_const_1 (enum rtx_code co
 	      if (code == ASHIFTRT && int_mode != int_result_mode)
 		break;
 
+	      rtx count_rtx = gen_int_shift_amount (int_result_mode, count);
 	      rtx new_rtx = simplify_const_binary_operation (code, int_mode,
 							     XEXP (varop, 0),
-							     GEN_INT (count));
+							     count_rtx);
 	      varop = gen_rtx_fmt_ee (code, int_mode, new_rtx, XEXP (varop, 1));
 	      count = 0;
 	      continue;
@@ -10933,7 +10935,7 @@ simplify_shift_const_1 (enum rtx_code co
 	      && (new_rtx = simplify_const_binary_operation
 		  (code, int_result_mode,
 		   gen_int_mode (INTVAL (XEXP (varop, 1)), int_result_mode),
-		   GEN_INT (count))) != 0
+		   gen_int_shift_amount (int_result_mode, count))) != 0
 	      && CONST_INT_P (new_rtx)
 	      && merge_outer_ops (&outer_op, &outer_const, GET_CODE (varop),
 				  INTVAL (new_rtx), int_result_mode,
@@ -11076,7 +11078,7 @@ simplify_shift_const_1 (enum rtx_code co
 	      && (new_rtx = simplify_const_binary_operation
 		  (ASHIFT, int_result_mode,
 		   gen_int_mode (INTVAL (XEXP (varop, 1)), int_result_mode),
-		   GEN_INT (count))) != 0
+		   gen_int_shift_amount (int_result_mode, count))) != 0
 	      && CONST_INT_P (new_rtx)
 	      && merge_outer_ops (&outer_op, &outer_const, PLUS,
 				  INTVAL (new_rtx), int_result_mode,
@@ -11097,7 +11099,7 @@ simplify_shift_const_1 (enum rtx_code co
 	      && (new_rtx = simplify_const_binary_operation
 		  (code, int_result_mode,
 		   gen_int_mode (INTVAL (XEXP (varop, 1)), int_result_mode),
-		   GEN_INT (count))) != 0
+		   gen_int_shift_amount (int_result_mode, count))) != 0
 	      && CONST_INT_P (new_rtx)
 	      && merge_outer_ops (&outer_op, &outer_const, XOR,
 				  INTVAL (new_rtx), int_result_mode,
@@ -11152,12 +11154,12 @@ simplify_shift_const_1 (enum rtx_code co
 		      - GET_MODE_UNIT_PRECISION (GET_MODE (varop)))))
 	    {
 	      rtx varop_inner = XEXP (varop, 0);
-
-	      varop_inner
-		= gen_rtx_LSHIFTRT (GET_MODE (varop_inner),
-				    XEXP (varop_inner, 0),
-				    GEN_INT
-				    (count + INTVAL (XEXP (varop_inner, 1))));
+	      int new_count = count + INTVAL (XEXP (varop_inner, 1));
+	      rtx new_count_rtx = gen_int_shift_amount (GET_MODE (varop_inner),
+							new_count);
+	      varop_inner = gen_rtx_LSHIFTRT (GET_MODE (varop_inner),
+					      XEXP (varop_inner, 0),
+					      new_count_rtx);
 	      varop = gen_rtx_TRUNCATE (GET_MODE (varop), varop_inner);
 	      count = 0;
 	      continue;
@@ -11209,7 +11211,8 @@ simplify_shift_const_1 (enum rtx_code co
     x = NULL_RTX;
 
   if (x == NULL_RTX)
-    x = simplify_gen_binary (code, shift_mode, varop, GEN_INT (count));
+    x = simplify_gen_binary (code, shift_mode, varop,
+			     gen_int_shift_amount (shift_mode, count));
 
   /* If we were doing an LSHIFTRT in a wider mode than it was originally,
      turn off all the bits that the shift would have turned off.  */
@@ -11271,7 +11274,8 @@ simplify_shift_const (rtx x, enum rtx_co
     return tem;
 
   if (!x)
-    x = simplify_gen_binary (code, GET_MODE (varop), varop, GEN_INT (count));
+    x = simplify_gen_binary (code, GET_MODE (varop), varop,
+			     gen_int_shift_amount (GET_MODE (varop), count));
   if (GET_MODE (x) != result_mode)
     x = gen_lowpart (result_mode, x);
   return x;
@@ -11462,8 +11466,9 @@ change_zero_ext (rtx pat)
 	  if (BITS_BIG_ENDIAN)
 	    start = GET_MODE_PRECISION (inner_mode) - size - start;
 
-	  if (start)
-	    x = gen_rtx_LSHIFTRT (inner_mode, XEXP (x, 0), GEN_INT (start));
+	  if (start != 0)
+	    x = gen_rtx_LSHIFTRT (inner_mode, XEXP (x, 0),
+				  gen_int_shift_amount (inner_mode, start));
 	  else
 	    x = XEXP (x, 0);
 	  if (mode != inner_mode)

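Taken together, the hunks above make one mechanical change: every shift
count that was previously wrapped directly in GEN_INT is now built with
gen_int_shift_amount, keyed off the mode of the value being shifted.  A
minimal before/after sketch, where "mode" and "count" are hypothetical
stand-ins for the mode of the shifted value and the constant shift count:

  /* Before: the shift amount is a bare CONST_INT with no mode of its
     own, so its interpretation is left implicit.  */
  rtx old_amount = GEN_INT (count);

  /* After: the helper chooses an appropriate mode for the shift
     amount, given the mode of the value being shifted.  */
  rtx new_amount = gen_int_shift_amount (mode, count);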

* Re: [14/nn] Add helpers for shift count modes
  2017-12-19 19:13                 ` Richard Sandiford
@ 2017-12-20  0:27                   ` Jeff Law
  0 siblings, 0 replies; 90+ messages in thread
From: Jeff Law @ 2017-12-20  0:27 UTC (permalink / raw)
  To: Richard Biener, GCC Patches, richard.sandiford

On 12/19/2017 12:13 PM, Richard Sandiford wrote:
> Richard Sandiford <richard.sandiford@linaro.org> writes:
>> Richard Biener <richard.guenther@gmail.com> writes:
>>> On Fri, Dec 15, 2017 at 1:48 AM, Richard Sandiford
>>> <richard.sandiford@linaro.org> wrote:
>>>> Richard Biener <richard.guenther@gmail.com> writes:
>>>>> On Mon, Nov 20, 2017 at 10:02 PM, Richard Sandiford
>>>>> <richard.sandiford@linaro.org> wrote:
>>>>>> Richard Biener <richard.guenther@gmail.com> writes:
>>>>>>> On Thu, Oct 26, 2017 at 2:06 PM, Richard Biener
>>>>>>> <richard.guenther@gmail.com> wrote:
>>>>>>>> On Mon, Oct 23, 2017 at 1:25 PM, Richard Sandiford
>>>>>>>> <richard.sandiford@linaro.org> wrote:
>>>>>>>>> This patch adds a stub helper routine to provide the mode
>>>>>>>>> of a scalar shift amount, given the mode of the values
>>>>>>>>> being shifted.
>>>>>>>>>
>>>>>>>>> One long-standing problem has been to decide what this mode
>>>>>>>>> should be for arbitrary rtxes (as opposed to those directly
>>>>>>>>> tied to a target pattern).  Is it the mode of the shifted
>>>>>>>>> elements?  Is it word_mode?  Or maybe QImode?  Is it whatever
>>>>>>>>> the corresponding target pattern says?  (In which case what
>>>>>>>>> should the mode be when the target doesn't have a pattern?)
>>>>>>>>>
>>>>>>>>> For now the patch picks word_mode, which should be safe on
>>>>>>>>> all targets but could perhaps become suboptimal if the helper
>>>>>>>>> routine is used more often than it is in this patch.  As it
>>>>>>>>> stands the patch does not change the generated code.
>>>>>>>>>
>>>>>>>>> The patch also adds a helper function that constructs rtxes
>>>>>>>>> for constant shift amounts, again given the mode of the value
>>>>>>>>> being shifted.  As well as helping with the SVE patches, this
>>>>>>>>> is one step towards allowing CONST_INTs to have a real mode.
>>>>>>>>
>>>>>>>> I think gen_shift_amount_mode is flawed, and while encapsulating
>>>>>>>> constant shift amount RTX generation into a gen_int_shift_amount
>>>>>>>> looks good to me, I'd rather have that ??? in this function (and
>>>>>>>> I'd use the mode of the RTX shifted, not word_mode...).
>>>>>>
>>>>>> OK.  I'd gone for word_mode because that's what expand_binop uses
>>>>>> for CONST_INTs:
>>>>>>
>>>>>>       op1_mode = (GET_MODE (op1) != VOIDmode
>>>>>>                   ? as_a <scalar_int_mode> (GET_MODE (op1))
>>>>>>                   : word_mode);
>>>>>>
>>>>>> But using the inner mode should be fine too.  The patch below does that.
>>>>>>
>>>>>>>> In the end it's up to insn recognition to convert the op to the
>>>>>>>> expected mode, and for generic RTL it's us that should decide
>>>>>>>> on the mode -- on GENERIC the shift amount has to be an
>>>>>>>> integer so why not simply use a mode that is large enough to
>>>>>>>> make the constant fit?
>>>>>>
>>>>>> ...but I can do that instead if you think it's better.
>>>>>>
>>>>>>>> Just throwing in some comments here, RTL isn't my primary
>>>>>>>> expertise.
>>>>>>>
>>>>>>> To add a little bit - shift amounts are maybe the only(?) place
>>>>>>> where a modeless CONST_INT makes sense!  So "fixing"
>>>>>>> that first sounds backwards.
>>>>>>
>>>>>> But even here they have a mode conceptually, since out-of-range shift
>>>>>> amounts are target-defined rather than undefined.  E.g. if the target
>>>>>> interprets the shift amount as unsigned, then for a shift amount
>>>>>> (const_int -1) it matters whether the mode is QImode (and so we're
>>>>>> shifting by 255) or HImode (and so we're shifting by 65535).
>>>>>
>>>>> I think RTL is well-defined (at least I hope so ...) and machine constraints
>>>>> need to be modeled explicitly (like embedding an implicit bit_and in
>>>>> shift patterns).
>>>>
>>>> Well, RTL is well-defined in the sense that if you have
>>>>
>>>>   (ashift X (foo:HI ...))
>>>>
>>>> then the shift amount must be interpreted as HImode rather than some
>>>> other mode.  The problem here is to define a default choice of mode for
>>>> const_ints, in cases where the shift is being created out of the blue.
>>>>
>>>> Whether the shift amount is effectively signed or unsigned isn't defined
>>>> by RTL without SHIFT_COUNT_TRUNCATED, since the choice only matters for
>>>> out-of-range values, and the behaviour for out-of-range RTL shifts is
>>>> specifically treated as target-defined without SHIFT_COUNT_TRUNCATED.
>>>>
>>>> I think the revised patch does implement your suggestion of using the
>>>> integer equivalent of the inner mode as the default, but we need to
>>>> decide whether to go with it, go with the original word_mode approach
>>>> (taken from existing expand_binop code) or something else.  Something
>>>> else could include the widest supported integer mode, so that we never
>>>> change the value.
>>>
>>> I guess it's pretty arbitrary what we choose (but we might need to adjust
>>> targets?).  For something like this an appealing choice would be one
>>> that is host and target independent, like [u]int32_t, or, since CONST_INT
>>> is always 64 bits now, signed int64_t aka HOST_WIDE_INT (bad
>>> name now).  That means it's the "infinite precision" thing that fits
>>> into CONST_INT ;)
>>
>> Sounds OK to me.  How about the attached?
> 
> Taking MAX_FIXED_MODE_SIZE into account was bogus, since we'd then just
> fail to find a mode.  This version has survived the full cross-target
> testing.  Also bootstrapped & regression-tested on aarch64-linux-gnu,
> x86_64-linux-gnu and powerpc64-linux-gnu.  OK to install?
> 
> At this stage this is the patch that is holding up the rest of the
> approved ones.
> 
> Thanks,
> Richard
> 
> 
> 2017-12-19  Richard Sandiford  <richard.sandiford@linaro.org>
> 	    Alan Hayward  <alan.hayward@arm.com>
> 	    David Sherwood  <david.sherwood@arm.com>
> 
> gcc/
> 	* emit-rtl.h (gen_int_shift_amount): Declare.
> 	* emit-rtl.c (gen_int_shift_amount): New function.
> 	* asan.c (asan_emit_stack_protection): Use gen_int_shift_amount
> 	instead of GEN_INT.
> 	* calls.c (shift_return_value): Likewise.
> 	* cse.c (fold_rtx): Likewise.
> 	* dse.c (find_shift_sequence): Likewise.
> 	* expmed.c (init_expmed_one_mode, store_bit_field_1, expand_shift_1)
> 	(expand_shift, expand_smod_pow2): Likewise.
> 	* lower-subreg.c (shift_cost): Likewise.
> 	* optabs.c (expand_superword_shift, expand_doubleword_mult)
> 	(expand_unop, expand_binop, shift_amt_for_vec_perm_mask)
> 	(expand_vec_perm_var): Likewise.
> 	* simplify-rtx.c (simplify_unary_operation_1): Likewise.
> 	(simplify_binary_operation_1): Likewise.
> 	* combine.c (try_combine, find_split_point, force_int_to_mode)
> 	(simplify_shift_const_1, simplify_shift_const): Likewise.
> 	(change_zero_ext): Likewise.  Use simplify_gen_binary.
> 
OK.
jeff
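
For reference, here is a minimal sketch of the shape such a helper can
take, following the host- and target-independent 64-bit choice agreed
above; the definition in the committed patch may differ in detail:

  /* Return VALUE as a CONST_INT that can be used as a shift amount.
     A 64-bit integer mode is wide enough for any sensible shift
     count, so the result does not depend on the mode of the value
     being shifted.  (Illustrative only; see the posted patch for the
     real definition.)  */
  rtx
  gen_int_shift_amount (machine_mode, HOST_WIDE_INT value)
  {
    scalar_int_mode shift_mode = int_mode_for_size (64, 0).require ();
    return gen_int_mode (value, shift_mode);
  }

Note that the mode argument is accepted but unused in this sketch: it
gives targets and future revisions room to pick a different
shift-amount mode without changing callers, which pass the mode of the
value being shifted, as in the hunks upthread.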
