* [00/nn] Patches preparing for runtime offsets and sizes
From: Richard Sandiford @ 2017-10-23 11:16 UTC
To: gcc-patches
This series of patches adds or does things that are needed for SVE
runtime offsets and sizes but aren't directly related to the offsets
and sizes themselves.  It is a prerequisite for the main series,
which I'll post later today.
Tested by compiling the testsuite before and after the series on:
aarch64-linux-gnu aarch64_be-linux-gnu alpha-linux-gnu arc-elf
arm-linux-gnueabi arm-linux-gnueabihf avr-elf bfin-elf c6x-elf
cr16-elf cris-elf epiphany-elf fr30-elf frv-linux-gnu ft32-elf
h8300-elf hppa64-hp-hpux11.23 ia64-linux-gnu i686-pc-linux-gnu
i686-apple-darwin iq2000-elf lm32-elf m32c-elf m32r-elf
m68k-linux-gnu mcore-elf microblaze-elf mipsel-linux-gnu
mipsisa64-linux-gnu mmix mn10300-elf moxie-rtems msp430-elf
nds32le-elf nios2-linux-gnu nvptx-none pdp11 powerpc-linux-gnuspe
powerpc-eabispe powerpc64-linux-gnu powerpc64le-linux-gnu
powerpc-ibm-aix7.0 riscv32-elf riscv64-elf rl78-elf rx-elf
s390-linux-gnu s390x-linux-gnu sh-linux-gnu sparc-linux-gnu
sparc64-linux-gnu sparc-wrs-vxworks spu-elf tilegx-elf tilepro-elf
xstormy16-elf v850-elf vax-netbsdelf visium-elf x86_64-darwin
x86_64-linux-gnu xtensa-elf
There were no differences besides the ones described in the
covering notes (except on powerpc-ibm-aix7.0, where symbol names
aren't stable).
Also tested normally on aarch64-linux-gnu, x86_64-linux-gnu and
powerpc64le-linux-gnu.
Thanks,
Richard
* [01/nn] Add gen_(const_)vec_duplicate helpers
From: Richard Sandiford @ 2017-10-23 11:17 UTC
To: gcc-patches
This patch adds helper functions for generating constant and
non-constant vector duplicates.  These routines help with SVE because
they make it easier to use:
(const:M (vec_duplicate:M X))
for a broadcast of X, even if the number of elements in M isn't known
at compile time. It also makes it easier for general rtx code to treat
constant and non-constant duplicates in the same way.
In the target code, the patch uses gen_vec_duplicate instead of
gen_rtx_VEC_DUPLICATE if handling constants correctly is potentially
useful. It might be that some or all of the call sites only handle
non-constants in practice, in which case the change is a harmless
no-op (and a saving of a few characters).
Otherwise, the target changes use gen_const_vec_duplicate instead
of gen_rtx_CONST_VECTOR if the constant is obviously a duplicate.
They also include some changes to use CONSTxx_RTX for easy global
constants.
2017-10-23 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* emit-rtl.h (gen_const_vec_duplicate): Declare.
(gen_vec_duplicate): Likewise.
* emit-rtl.c (gen_const_vec_duplicate_1): New function, split
out from...
(gen_const_vector): ...here.
(gen_const_vec_duplicate, gen_vec_duplicate): New functions.
(gen_rtx_CONST_VECTOR): Use gen_const_vec_duplicate for constants
whose elements are all equal.
* optabs.c (expand_vector_broadcast): Use gen_const_vec_duplicate.
* simplify-rtx.c (simplify_const_unary_operation): Likewise.
(simplify_relational_operation): Likewise.
* config/aarch64/aarch64.c (aarch64_simd_gen_const_vector_dup):
Likewise.
(aarch64_simd_dup_constant): Use gen_vec_duplicate.
(aarch64_expand_vector_init): Likewise.
* config/arm/arm.c (neon_vdup_constant): Likewise.
(neon_expand_vector_init): Likewise.
(arm_expand_vec_perm): Use gen_const_vec_duplicate.
(arm_block_set_unaligned_vect): Likewise.
(arm_block_set_aligned_vect): Likewise.
* config/arm/neon.md (neon_copysignf<mode>): Likewise.
* config/i386/i386.c (ix86_expand_vec_perm): Likewise.
(expand_vec_perm_even_odd_pack): Likewise.
(ix86_vector_duplicate_value): Use gen_vec_duplicate.
* config/i386/sse.md (one_cmpl<mode>2): Use CONSTM1_RTX.
* config/ia64/ia64.c (ia64_expand_vecint_compare): Use
gen_const_vec_duplicate.
* config/ia64/vect.md (addv2sf3, subv2sf3): Use CONST1_RTX.
* config/mips/mips.c (mips_gen_const_int_vector): Use
gen_const_vec_duplicate.
(mips_expand_vector_init): Use CONST0_RTX.
* config/powerpcspe/altivec.md (abs<mode>2, nabs<mode>2): Likewise.
(define_split): Use gen_const_vec_duplicate.
* config/rs6000/altivec.md (abs<mode>2, nabs<mode>2): Use CONST0_RTX.
(define_split): Use gen_const_vec_duplicate.
* config/s390/vx-builtins.md (vec_genmask<mode>): Likewise.
(vec_ctd_s64, vec_ctd_u64, vec_ctsl, vec_ctul): Likewise.
* config/spu/spu.c (spu_const): Likewise.
Index: gcc/emit-rtl.h
===================================================================
--- gcc/emit-rtl.h 2017-10-23 11:40:11.561479591 +0100
+++ gcc/emit-rtl.h 2017-10-23 11:41:32.369050264 +0100
@@ -438,6 +438,9 @@ get_max_uid (void)
return crtl->emit.x_cur_insn_uid;
}
+extern rtx gen_const_vec_duplicate (machine_mode, rtx);
+extern rtx gen_vec_duplicate (machine_mode, rtx);
+
extern void set_decl_incoming_rtl (tree, rtx, bool);
/* Return a memory reference like MEMREF, but with its mode changed
Index: gcc/emit-rtl.c
===================================================================
--- gcc/emit-rtl.c 2017-10-23 11:41:25.541909864 +0100
+++ gcc/emit-rtl.c 2017-10-23 11:41:32.369050264 +0100
@@ -5756,32 +5756,60 @@ init_emit (void)
#endif
}
-/* Generate a vector constant for mode MODE and constant value CONSTANT. */
+/* Like gen_const_vec_duplicate, but ignore const_tiny_rtx. */
static rtx
-gen_const_vector (machine_mode mode, int constant)
+gen_const_vec_duplicate_1 (machine_mode mode, rtx el)
{
- rtx tem;
- rtvec v;
- int units, i;
- machine_mode inner;
+ int nunits = GET_MODE_NUNITS (mode);
+ rtvec v = rtvec_alloc (nunits);
+ for (int i = 0; i < nunits; ++i)
+ RTVEC_ELT (v, i) = el;
+ return gen_rtx_raw_CONST_VECTOR (mode, v);
+}
- units = GET_MODE_NUNITS (mode);
- inner = GET_MODE_INNER (mode);
+/* Generate a vector constant of mode MODE in which every element has
+ value ELT. */
- gcc_assert (!DECIMAL_FLOAT_MODE_P (inner));
+rtx
+gen_const_vec_duplicate (machine_mode mode, rtx elt)
+{
+ scalar_mode inner_mode = GET_MODE_INNER (mode);
+ if (elt == CONST0_RTX (inner_mode))
+ return CONST0_RTX (mode);
+ else if (elt == CONST1_RTX (inner_mode))
+ return CONST1_RTX (mode);
+ else if (elt == CONSTM1_RTX (inner_mode))
+ return CONSTM1_RTX (mode);
+
+ return gen_const_vec_duplicate_1 (mode, elt);
+}
- v = rtvec_alloc (units);
+/* Return a vector rtx of mode MODE in which every element has value X.
+ The result will be a constant if X is constant. */
- /* We need to call this function after we set the scalar const_tiny_rtx
- entries. */
- gcc_assert (const_tiny_rtx[constant][(int) inner]);
+rtx
+gen_vec_duplicate (machine_mode mode, rtx x)
+{
+ if (CONSTANT_P (x))
+ return gen_const_vec_duplicate (mode, x);
+ return gen_rtx_VEC_DUPLICATE (mode, x);
+}
- for (i = 0; i < units; ++i)
- RTVEC_ELT (v, i) = const_tiny_rtx[constant][(int) inner];
+/* Generate a new vector constant for mode MODE and constant value
+ CONSTANT. */
- tem = gen_rtx_raw_CONST_VECTOR (mode, v);
- return tem;
+static rtx
+gen_const_vector (machine_mode mode, int constant)
+{
+ machine_mode inner = GET_MODE_INNER (mode);
+
+ gcc_assert (!DECIMAL_FLOAT_MODE_P (inner));
+
+ rtx el = const_tiny_rtx[constant][(int) inner];
+ gcc_assert (el);
+
+ return gen_const_vec_duplicate_1 (mode, el);
}
/* Generate a vector like gen_rtx_raw_CONST_VEC, but use the zero vector when
@@ -5789,28 +5817,12 @@ gen_const_vector (machine_mode mode, int
rtx
gen_rtx_CONST_VECTOR (machine_mode mode, rtvec v)
{
- machine_mode inner = GET_MODE_INNER (mode);
- int nunits = GET_MODE_NUNITS (mode);
- rtx x;
- int i;
-
- /* Check to see if all of the elements have the same value. */
- x = RTVEC_ELT (v, nunits - 1);
- for (i = nunits - 2; i >= 0; i--)
- if (RTVEC_ELT (v, i) != x)
- break;
+ gcc_assert (GET_MODE_NUNITS (mode) == GET_NUM_ELEM (v));
/* If the values are all the same, check to see if we can use one of the
standard constant vectors. */
- if (i == -1)
- {
- if (x == CONST0_RTX (inner))
- return CONST0_RTX (mode);
- else if (x == CONST1_RTX (inner))
- return CONST1_RTX (mode);
- else if (x == CONSTM1_RTX (inner))
- return CONSTM1_RTX (mode);
- }
+ if (rtvec_all_equal_p (v))
+ return gen_const_vec_duplicate (mode, RTVEC_ELT (v, 0));
return gen_rtx_raw_CONST_VECTOR (mode, v);
}
Index: gcc/optabs.c
===================================================================
--- gcc/optabs.c 2017-10-23 11:41:23.502006982 +0100
+++ gcc/optabs.c 2017-10-23 11:41:32.369050264 +0100
@@ -377,13 +377,8 @@ expand_vector_broadcast (machine_mode vm
gcc_checking_assert (VECTOR_MODE_P (vmode));
- n = GET_MODE_NUNITS (vmode);
- vec = rtvec_alloc (n);
- for (i = 0; i < n; ++i)
- RTVEC_ELT (vec, i) = op;
-
if (CONSTANT_P (op))
- return gen_rtx_CONST_VECTOR (vmode, vec);
+ return gen_const_vec_duplicate (vmode, op);
/* ??? If the target doesn't have a vec_init, then we have no easy way
of performing this operation. Most of this sort of generic support
@@ -393,6 +388,10 @@ expand_vector_broadcast (machine_mode vm
if (icode == CODE_FOR_nothing)
return NULL;
+ n = GET_MODE_NUNITS (vmode);
+ vec = rtvec_alloc (n);
+ for (i = 0; i < n; ++i)
+ RTVEC_ELT (vec, i) = op;
ret = gen_reg_rtx (vmode);
emit_insn (GEN_FCN (icode) (ret, gen_rtx_PARALLEL (vmode, vec)));
Index: gcc/simplify-rtx.c
===================================================================
--- gcc/simplify-rtx.c 2017-10-23 11:41:25.549647760 +0100
+++ gcc/simplify-rtx.c 2017-10-23 11:41:32.370050264 +0100
@@ -1704,28 +1704,23 @@ simplify_const_unary_operation (enum rtx
gcc_assert (GET_MODE_INNER (mode) == GET_MODE_INNER
(GET_MODE (op)));
}
- if (CONST_SCALAR_INT_P (op) || CONST_DOUBLE_AS_FLOAT_P (op)
- || GET_CODE (op) == CONST_VECTOR)
+ if (CONST_SCALAR_INT_P (op) || CONST_DOUBLE_AS_FLOAT_P (op))
+ return gen_const_vec_duplicate (mode, op);
+ if (GET_CODE (op) == CONST_VECTOR)
{
int elt_size = GET_MODE_UNIT_SIZE (mode);
unsigned n_elts = (GET_MODE_SIZE (mode) / elt_size);
rtvec v = rtvec_alloc (n_elts);
unsigned int i;
- if (GET_CODE (op) != CONST_VECTOR)
- for (i = 0; i < n_elts; i++)
- RTVEC_ELT (v, i) = op;
- else
- {
- machine_mode inmode = GET_MODE (op);
- int in_elt_size = GET_MODE_UNIT_SIZE (inmode);
- unsigned in_n_elts = (GET_MODE_SIZE (inmode) / in_elt_size);
-
- gcc_assert (in_n_elts < n_elts);
- gcc_assert ((n_elts % in_n_elts) == 0);
- for (i = 0; i < n_elts; i++)
- RTVEC_ELT (v, i) = CONST_VECTOR_ELT (op, i % in_n_elts);
- }
+ machine_mode inmode = GET_MODE (op);
+ int in_elt_size = GET_MODE_UNIT_SIZE (inmode);
+ unsigned in_n_elts = (GET_MODE_SIZE (inmode) / in_elt_size);
+
+ gcc_assert (in_n_elts < n_elts);
+ gcc_assert ((n_elts % in_n_elts) == 0);
+ for (i = 0; i < n_elts; i++)
+ RTVEC_ELT (v, i) = CONST_VECTOR_ELT (op, i % in_n_elts);
return gen_rtx_CONST_VECTOR (mode, v);
}
}
@@ -4632,20 +4627,13 @@ simplify_relational_operation (enum rtx_
return CONST0_RTX (mode);
#ifdef VECTOR_STORE_FLAG_VALUE
{
- int i, units;
- rtvec v;
-
rtx val = VECTOR_STORE_FLAG_VALUE (mode);
if (val == NULL_RTX)
return NULL_RTX;
if (val == const1_rtx)
return CONST1_RTX (mode);
- units = GET_MODE_NUNITS (mode);
- v = rtvec_alloc (units);
- for (i = 0; i < units; i++)
- RTVEC_ELT (v, i) = val;
- return gen_rtx_raw_CONST_VECTOR (mode, v);
+ return gen_const_vec_duplicate (mode, val);
}
#else
return NULL_RTX;
Index: gcc/config/aarch64/aarch64.c
===================================================================
--- gcc/config/aarch64/aarch64.c 2017-10-23 11:41:23.125751780 +0100
+++ gcc/config/aarch64/aarch64.c 2017-10-23 11:41:32.352050263 +0100
@@ -11726,16 +11726,8 @@ aarch64_mov_operand_p (rtx x, machine_mo
rtx
aarch64_simd_gen_const_vector_dup (machine_mode mode, HOST_WIDE_INT val)
{
- int nunits = GET_MODE_NUNITS (mode);
- rtvec v = rtvec_alloc (nunits);
- int i;
-
- rtx cache = GEN_INT (val);
-
- for (i=0; i < nunits; i++)
- RTVEC_ELT (v, i) = cache;
-
- return gen_rtx_CONST_VECTOR (mode, v);
+ rtx c = gen_int_mode (val, GET_MODE_INNER (mode));
+ return gen_const_vec_duplicate (mode, c);
}
/* Check OP is a legal scalar immediate for the MOVI instruction. */
@@ -11947,7 +11939,7 @@ aarch64_simd_dup_constant (rtx vals)
single ARM register. This will be cheaper than a vector
load. */
x = copy_to_mode_reg (inner_mode, x);
- return gen_rtx_VEC_DUPLICATE (mode, x);
+ return gen_vec_duplicate (mode, x);
}
@@ -12046,7 +12038,7 @@ aarch64_expand_vector_init (rtx target,
if (all_same)
{
rtx x = copy_to_mode_reg (inner_mode, v0);
- aarch64_emit_move (target, gen_rtx_VEC_DUPLICATE (mode, x));
+ aarch64_emit_move (target, gen_vec_duplicate (mode, x));
return;
}
@@ -12087,7 +12079,7 @@ aarch64_expand_vector_init (rtx target,
/* Create a duplicate of the most common element. */
rtx x = copy_to_mode_reg (inner_mode, XVECEXP (vals, 0, maxelement));
- aarch64_emit_move (target, gen_rtx_VEC_DUPLICATE (mode, x));
+ aarch64_emit_move (target, gen_vec_duplicate (mode, x));
/* Insert the rest. */
for (int i = 0; i < n_elts; i++)
Index: gcc/config/arm/arm.c
===================================================================
--- gcc/config/arm/arm.c 2017-10-23 11:41:22.965190434 +0100
+++ gcc/config/arm/arm.c 2017-10-23 11:41:32.355050263 +0100
@@ -12151,7 +12151,7 @@ neon_vdup_constant (rtx vals)
load. */
x = copy_to_mode_reg (inner_mode, x);
- return gen_rtx_VEC_DUPLICATE (mode, x);
+ return gen_vec_duplicate (mode, x);
}
/* Generate code to load VALS, which is a PARALLEL containing only
@@ -12246,7 +12246,7 @@ neon_expand_vector_init (rtx target, rtx
if (all_same && GET_MODE_SIZE (inner_mode) <= 4)
{
x = copy_to_mode_reg (inner_mode, XVECEXP (vals, 0, 0));
- emit_insn (gen_rtx_SET (target, gen_rtx_VEC_DUPLICATE (mode, x)));
+ emit_insn (gen_rtx_SET (target, gen_vec_duplicate (mode, x)));
return;
}
@@ -28731,9 +28731,9 @@ arm_expand_vec_perm_1 (rtx target, rtx o
arm_expand_vec_perm (rtx target, rtx op0, rtx op1, rtx sel)
{
machine_mode vmode = GET_MODE (target);
- unsigned int i, nelt = GET_MODE_NUNITS (vmode);
+ unsigned int nelt = GET_MODE_NUNITS (vmode);
bool one_vector_p = rtx_equal_p (op0, op1);
- rtx rmask[MAX_VECT_LEN], mask;
+ rtx mask;
/* TODO: ARM's VTBL indexing is little-endian. In order to handle GCC's
numbering of elements for big-endian, we must reverse the order. */
@@ -28742,9 +28742,7 @@ arm_expand_vec_perm (rtx target, rtx op0
/* The VTBL instruction does not use a modulo index, so we must take care
of that ourselves. */
mask = GEN_INT (one_vector_p ? nelt - 1 : 2 * nelt - 1);
- for (i = 0; i < nelt; ++i)
- rmask[i] = mask;
- mask = gen_rtx_CONST_VECTOR (vmode, gen_rtvec_v (nelt, rmask));
+ mask = gen_const_vec_duplicate (vmode, mask);
sel = expand_simple_binop (vmode, AND, sel, mask, NULL, 0, OPTAB_LIB_WIDEN);
arm_expand_vec_perm_1 (target, op0, op1, sel);
@@ -29798,10 +29796,9 @@ arm_block_set_unaligned_vect (rtx dstbas
unsigned HOST_WIDE_INT value,
unsigned HOST_WIDE_INT align)
{
- unsigned int i, j, nelt_v16, nelt_v8, nelt_mode;
+ unsigned int i, nelt_v16, nelt_v8, nelt_mode;
rtx dst, mem;
- rtx val_elt, val_vec, reg;
- rtx rval[MAX_VECT_LEN];
+ rtx val_vec, reg;
rtx (*gen_func) (rtx, rtx);
machine_mode mode;
unsigned HOST_WIDE_INT v = value;
@@ -29829,12 +29826,9 @@ arm_block_set_unaligned_vect (rtx dstbas
mem = adjust_automodify_address (dstbase, mode, dst, offset);
v = sext_hwi (v, BITS_PER_WORD);
- val_elt = GEN_INT (v);
- for (j = 0; j < nelt_mode; j++)
- rval[j] = val_elt;
reg = gen_reg_rtx (mode);
- val_vec = gen_rtx_CONST_VECTOR (mode, gen_rtvec_v (nelt_mode, rval));
+ val_vec = gen_const_vec_duplicate (mode, GEN_INT (v));
/* Emit instruction loading the constant value. */
emit_move_insn (reg, val_vec);
@@ -29898,10 +29892,9 @@ arm_block_set_aligned_vect (rtx dstbase,
unsigned HOST_WIDE_INT value,
unsigned HOST_WIDE_INT align)
{
- unsigned int i, j, nelt_v8, nelt_v16, nelt_mode;
+ unsigned int i, nelt_v8, nelt_v16, nelt_mode;
rtx dst, addr, mem;
- rtx val_elt, val_vec, reg;
- rtx rval[MAX_VECT_LEN];
+ rtx val_vec, reg;
machine_mode mode;
unsigned HOST_WIDE_INT v = value;
unsigned int offset = 0;
@@ -29923,12 +29916,9 @@ arm_block_set_aligned_vect (rtx dstbase,
dst = copy_addr_to_reg (XEXP (dstbase, 0));
v = sext_hwi (v, BITS_PER_WORD);
- val_elt = GEN_INT (v);
- for (j = 0; j < nelt_mode; j++)
- rval[j] = val_elt;
reg = gen_reg_rtx (mode);
- val_vec = gen_rtx_CONST_VECTOR (mode, gen_rtvec_v (nelt_mode, rval));
+ val_vec = gen_const_vec_duplicate (mode, GEN_INT (v));
/* Emit instruction loading the constant value. */
emit_move_insn (reg, val_vec);
Index: gcc/config/arm/neon.md
===================================================================
--- gcc/config/arm/neon.md 2017-10-23 11:41:22.968092145 +0100
+++ gcc/config/arm/neon.md 2017-10-23 11:41:32.356050263 +0100
@@ -3052,15 +3052,10 @@ (define_expand "neon_copysignf<mode>"
"{
rtx v_bitmask_cast;
rtx v_bitmask = gen_reg_rtx (<VCVTF:V_cmp_result>mode);
- int i, n_elt = GET_MODE_NUNITS (<MODE>mode);
- rtvec v = rtvec_alloc (n_elt);
-
- /* Create bitmask for vector select. */
- for (i = 0; i < n_elt; ++i)
- RTVEC_ELT (v, i) = GEN_INT (0x80000000);
+ rtx c = GEN_INT (0x80000000);
emit_move_insn (v_bitmask,
- gen_rtx_CONST_VECTOR (<VCVTF:V_cmp_result>mode, v));
+ gen_const_vec_duplicate (<VCVTF:V_cmp_result>mode, c));
emit_move_insn (operands[0], operands[2]);
v_bitmask_cast = simplify_gen_subreg (<MODE>mode, v_bitmask,
<VCVTF:V_cmp_result>mode, 0);
Index: gcc/config/i386/i386.c
===================================================================
--- gcc/config/i386/i386.c 2017-10-23 11:41:22.913926872 +0100
+++ gcc/config/i386/i386.c 2017-10-23 11:41:32.360050263 +0100
@@ -24117,9 +24117,7 @@ ix86_expand_vec_perm (rtx operands[])
t2 = gen_reg_rtx (V32QImode);
t3 = gen_reg_rtx (V32QImode);
vt2 = GEN_INT (-128);
- for (i = 0; i < 32; i++)
- vec[i] = vt2;
- vt = gen_rtx_CONST_VECTOR (V32QImode, gen_rtvec_v (32, vec));
+ vt = gen_const_vec_duplicate (V32QImode, vt2);
vt = force_reg (V32QImode, vt);
for (i = 0; i < 32; i++)
vec[i] = i < 16 ? vt2 : const0_rtx;
@@ -24227,9 +24225,7 @@ ix86_expand_vec_perm (rtx operands[])
vt = GEN_INT (w - 1);
}
- for (i = 0; i < w; i++)
- vec[i] = vt;
- vt = gen_rtx_CONST_VECTOR (maskmode, gen_rtvec_v (w, vec));
+ vt = gen_const_vec_duplicate (maskmode, vt);
mask = expand_simple_binop (maskmode, AND, mask, vt,
NULL_RTX, 0, OPTAB_DIRECT);
@@ -24319,9 +24315,7 @@ ix86_expand_vec_perm (rtx operands[])
e = w = 4;
}
- for (i = 0; i < w; i++)
- vec[i] = vt;
- vt = gen_rtx_CONST_VECTOR (maskmode, gen_rtvec_v (w, vec));
+ vt = gen_const_vec_duplicate (maskmode, vt);
vt = force_reg (maskmode, vt);
mask = expand_simple_binop (maskmode, AND, mask, vt,
NULL_RTX, 0, OPTAB_DIRECT);
@@ -40814,7 +40808,7 @@ ix86_vector_duplicate_value (machine_mod
rtx dup;
/* First attempt to recognize VAL as-is. */
- dup = gen_rtx_VEC_DUPLICATE (mode, val);
+ dup = gen_vec_duplicate (mode, val);
insn = emit_insn (gen_rtx_SET (target, dup));
if (recog_memoized (insn) < 0)
{
@@ -46120,7 +46114,7 @@ expand_vec_perm_vpshufb2_vpermq_even_odd
static bool
expand_vec_perm_even_odd_pack (struct expand_vec_perm_d *d)
{
- rtx op, dop0, dop1, t, rperm[16];
+ rtx op, dop0, dop1, t;
unsigned i, odd, c, s, nelt = d->nelt;
bool end_perm = false;
machine_mode half_mode;
@@ -46197,9 +46191,7 @@ expand_vec_perm_even_odd_pack (struct ex
dop1 = gen_reg_rtx (half_mode);
if (odd == 0)
{
- for (i = 0; i < nelt / 2; i++)
- rperm[i] = GEN_INT (c);
- t = gen_rtx_CONST_VECTOR (half_mode, gen_rtvec_v (nelt / 2, rperm));
+ t = gen_const_vec_duplicate (half_mode, GEN_INT (c));
t = force_reg (half_mode, t);
emit_insn (gen_and (dop0, t, gen_lowpart (half_mode, d->op0)));
emit_insn (gen_and (dop1, t, gen_lowpart (half_mode, d->op1)));
Index: gcc/config/i386/sse.md
===================================================================
--- gcc/config/i386/sse.md 2017-10-23 11:41:22.905221739 +0100
+++ gcc/config/i386/sse.md 2017-10-23 11:41:32.362050263 +0100
@@ -11529,13 +11529,7 @@ (define_expand "one_cmpl<mode>2"
(match_dup 2)))]
"TARGET_SSE"
{
- int i, n = GET_MODE_NUNITS (<MODE>mode);
- rtvec v = rtvec_alloc (n);
-
- for (i = 0; i < n; ++i)
- RTVEC_ELT (v, i) = constm1_rtx;
-
- operands[2] = force_reg (<MODE>mode, gen_rtx_CONST_VECTOR (<MODE>mode, v));
+ operands[2] = force_reg (<MODE>mode, CONSTM1_RTX (<MODE>mode));
})
(define_expand "<sse2_avx2>_andnot<mode>3"
Index: gcc/config/ia64/ia64.c
===================================================================
--- gcc/config/ia64/ia64.c 2017-10-23 11:40:11.561479591 +0100
+++ gcc/config/ia64/ia64.c 2017-10-23 11:41:32.363050263 +0100
@@ -1938,7 +1938,7 @@ ia64_expand_vecint_compare (enum rtx_cod
/* Subtract (-(INT MAX) - 1) from both operands to make
them signed. */
mask = gen_int_mode (0x80000000, SImode);
- mask = gen_rtx_CONST_VECTOR (V2SImode, gen_rtvec (2, mask, mask));
+ mask = gen_const_vec_duplicate (V2SImode, mask);
mask = force_reg (mode, mask);
t1 = gen_reg_rtx (mode);
emit_insn (gen_subv2si3 (t1, op0, mask));
Index: gcc/config/ia64/vect.md
===================================================================
--- gcc/config/ia64/vect.md 2017-10-23 11:40:11.561479591 +0100
+++ gcc/config/ia64/vect.md 2017-10-23 11:41:32.363050263 +0100
@@ -1138,8 +1138,7 @@ (define_expand "addv2sf3"
(match_operand:V2SF 2 "fr_register_operand" "")))]
""
{
- rtvec v = gen_rtvec (2, CONST1_RTX (SFmode), CONST1_RTX (SFmode));
- operands[3] = force_reg (V2SFmode, gen_rtx_CONST_VECTOR (V2SFmode, v));
+ operands[3] = force_reg (V2SFmode, CONST1_RTX (V2SFmode));
})
(define_expand "subv2sf3"
@@ -1150,8 +1149,7 @@ (define_expand "subv2sf3"
(neg:V2SF (match_operand:V2SF 2 "fr_register_operand" ""))))]
""
{
- rtvec v = gen_rtvec (2, CONST1_RTX (SFmode), CONST1_RTX (SFmode));
- operands[3] = force_reg (V2SFmode, gen_rtx_CONST_VECTOR (V2SFmode, v));
+ operands[3] = force_reg (V2SFmode, CONST1_RTX (V2SFmode));
})
(define_insn "mulv2sf3"
Index: gcc/config/mips/mips.c
===================================================================
--- gcc/config/mips/mips.c 2017-10-23 11:41:22.797858429 +0100
+++ gcc/config/mips/mips.c 2017-10-23 11:41:32.365050264 +0100
@@ -21681,14 +21681,8 @@ mips_expand_vi_broadcast (machine_mode v
rtx
mips_gen_const_int_vector (machine_mode mode, HOST_WIDE_INT val)
{
- int nunits = GET_MODE_NUNITS (mode);
- rtvec v = rtvec_alloc (nunits);
- int i;
-
- for (i = 0; i < nunits; i++)
- RTVEC_ELT (v, i) = gen_int_mode (val, GET_MODE_INNER (mode));
-
- return gen_rtx_CONST_VECTOR (mode, v);
+ rtx c = gen_int_mode (val, GET_MODE_INNER (mode));
+ return gen_const_vec_duplicate (mode, c);
}
/* Return a vector of repeated 4-element sets generated from
@@ -21843,12 +21837,7 @@ mips_expand_vector_init (rtx target, rtx
}
else
{
- rtvec vec = shallow_copy_rtvec (XVEC (vals, 0));
-
- for (i = 0; i < nelt; ++i)
- RTVEC_ELT (vec, i) = CONST0_RTX (imode);
-
- emit_move_insn (target, gen_rtx_CONST_VECTOR (vmode, vec));
+ emit_move_insn (target, CONST0_RTX (vmode));
for (i = 0; i < nelt; ++i)
{
Index: gcc/config/powerpcspe/altivec.md
===================================================================
--- gcc/config/powerpcspe/altivec.md 2017-10-23 11:40:11.561479591 +0100
+++ gcc/config/powerpcspe/altivec.md 2017-10-23 11:41:32.366050264 +0100
@@ -352,12 +352,10 @@ (define_split
HOST_WIDE_INT val = const_vector_elt_as_int (op1, elt);
rtx rtx_val = GEN_INT (val);
int shift = vspltis_shifted (op1);
- int nunits = GET_MODE_NUNITS (<MODE>mode);
- int i;
gcc_assert (shift != 0);
operands[2] = gen_reg_rtx (<MODE>mode);
- operands[3] = gen_rtx_CONST_VECTOR (<MODE>mode, rtvec_alloc (nunits));
+ operands[3] = gen_const_vec_duplicate (<MODE>mode, rtx_val);
operands[4] = gen_reg_rtx (<MODE>mode);
if (shift < 0)
@@ -370,10 +368,6 @@ (define_split
operands[5] = CONST0_RTX (<MODE>mode);
operands[6] = GEN_INT (shift);
}
-
- /* Populate the constant vectors. */
- for (i = 0; i < nunits; i++)
- XVECEXP (operands[3], 0, i) = rtx_val;
})
(define_insn "get_vrsave_internal"
@@ -2752,15 +2746,8 @@ (define_expand "abs<mode>2"
(smax:VI2 (match_dup 1) (match_dup 4)))]
"<VI_unit>"
{
- int i, n_elt = GET_MODE_NUNITS (<MODE>mode);
- rtvec v = rtvec_alloc (n_elt);
-
- /* Create an all 0 constant. */
- for (i = 0; i < n_elt; ++i)
- RTVEC_ELT (v, i) = const0_rtx;
-
operands[2] = gen_reg_rtx (<MODE>mode);
- operands[3] = gen_rtx_CONST_VECTOR (<MODE>mode, v);
+ operands[3] = CONST0_RTX (<MODE>mode);
operands[4] = gen_reg_rtx (<MODE>mode);
})
@@ -2777,17 +2764,8 @@ (define_expand "nabs<mode>2"
(smin:VI2 (match_dup 1) (match_dup 4)))]
"<VI_unit>"
{
- int i;
- int n_elt = GET_MODE_NUNITS (<MODE>mode);
-
- rtvec v = rtvec_alloc (n_elt);
-
- /* Create an all 0 constant. */
- for (i = 0; i < n_elt; ++i)
- RTVEC_ELT (v, i) = const0_rtx;
-
operands[2] = gen_reg_rtx (<MODE>mode);
- operands[3] = gen_rtx_CONST_VECTOR (<MODE>mode, v);
+ operands[3] = CONST0_RTX (<MODE>mode);
operands[4] = gen_reg_rtx (<MODE>mode);
})
Index: gcc/config/rs6000/altivec.md
===================================================================
--- gcc/config/rs6000/altivec.md 2017-10-23 11:40:11.561479591 +0100
+++ gcc/config/rs6000/altivec.md 2017-10-23 11:41:32.366050264 +0100
@@ -363,12 +363,10 @@ (define_split
HOST_WIDE_INT val = const_vector_elt_as_int (op1, elt);
rtx rtx_val = GEN_INT (val);
int shift = vspltis_shifted (op1);
- int nunits = GET_MODE_NUNITS (<MODE>mode);
- int i;
gcc_assert (shift != 0);
operands[2] = gen_reg_rtx (<MODE>mode);
- operands[3] = gen_rtx_CONST_VECTOR (<MODE>mode, rtvec_alloc (nunits));
+ operands[3] = gen_const_vec_duplicate (<MODE>mode, rtx_val);
operands[4] = gen_reg_rtx (<MODE>mode);
if (shift < 0)
@@ -381,10 +379,6 @@ (define_split
operands[5] = CONST0_RTX (<MODE>mode);
operands[6] = GEN_INT (shift);
}
-
- /* Populate the constant vectors. */
- for (i = 0; i < nunits; i++)
- XVECEXP (operands[3], 0, i) = rtx_val;
})
(define_insn "get_vrsave_internal"
@@ -3237,15 +3231,8 @@ (define_expand "abs<mode>2"
(smax:VI2 (match_dup 1) (match_dup 4)))]
"<VI_unit>"
{
- int i, n_elt = GET_MODE_NUNITS (<MODE>mode);
- rtvec v = rtvec_alloc (n_elt);
-
- /* Create an all 0 constant. */
- for (i = 0; i < n_elt; ++i)
- RTVEC_ELT (v, i) = const0_rtx;
-
operands[2] = gen_reg_rtx (<MODE>mode);
- operands[3] = gen_rtx_CONST_VECTOR (<MODE>mode, v);
+ operands[3] = CONST0_RTX (<MODE>mode);
operands[4] = gen_reg_rtx (<MODE>mode);
})
@@ -3262,17 +3249,8 @@ (define_expand "nabs<mode>2"
(smin:VI2 (match_dup 1) (match_dup 4)))]
"<VI_unit>"
{
- int i;
- int n_elt = GET_MODE_NUNITS (<MODE>mode);
-
- rtvec v = rtvec_alloc (n_elt);
-
- /* Create an all 0 constant. */
- for (i = 0; i < n_elt; ++i)
- RTVEC_ELT (v, i) = const0_rtx;
-
operands[2] = gen_reg_rtx (<MODE>mode);
- operands[3] = gen_rtx_CONST_VECTOR (<MODE>mode, v);
+ operands[3] = CONST0_RTX (<MODE>mode);
operands[4] = gen_reg_rtx (<MODE>mode);
})
Index: gcc/config/s390/vx-builtins.md
===================================================================
--- gcc/config/s390/vx-builtins.md 2017-10-23 11:40:11.561479591 +0100
+++ gcc/config/s390/vx-builtins.md 2017-10-23 11:41:32.367050264 +0100
@@ -91,12 +91,10 @@ (define_expand "vec_genmask<mode>"
(match_operand:QI 2 "const_int_operand" "C")]
"TARGET_VX"
{
- int nunits = GET_MODE_NUNITS (<VI_HW:MODE>mode);
int bitlen = GET_MODE_UNIT_BITSIZE (<VI_HW:MODE>mode);
/* To bit little endian style. */
int end = bitlen - 1 - INTVAL (operands[1]);
int start = bitlen - 1 - INTVAL (operands[2]);
- rtx const_vec[16];
int i;
unsigned HOST_WIDE_INT mask;
bool swapped_p = false;
@@ -116,13 +114,11 @@ (define_expand "vec_genmask<mode>"
if (swapped_p)
mask = ~mask;
- for (i = 0; i < nunits; i++)
- const_vec[i] = GEN_INT (trunc_int_for_mode (mask,
- GET_MODE_INNER (<VI_HW:MODE>mode)));
+ rtx mask_rtx = gen_int_mode (mask, GET_MODE_INNER (<VI_HW:MODE>mode));
emit_insn (gen_rtx_SET (operands[0],
- gen_rtx_CONST_VECTOR (<VI_HW:MODE>mode,
- gen_rtvec_v (nunits, const_vec))));
+ gen_const_vec_duplicate (<VI_HW:MODE>mode,
+ mask_rtx)));
DONE;
})
@@ -1623,7 +1619,7 @@ (define_expand "vec_ctd_s64"
real_2expN (&f, -INTVAL (operands[2]), DFmode);
c = const_double_from_real_value (f, DFmode);
- operands[3] = gen_rtx_CONST_VECTOR (V2DFmode, gen_rtvec (2, c, c));
+ operands[3] = gen_const_vec_duplicate (V2DFmode, c);
operands[3] = force_reg (V2DFmode, operands[3]);
})
@@ -1654,7 +1650,7 @@ (define_expand "vec_ctd_u64"
real_2expN (&f, -INTVAL (operands[2]), DFmode);
c = const_double_from_real_value (f, DFmode);
- operands[3] = gen_rtx_CONST_VECTOR (V2DFmode, gen_rtvec (2, c, c));
+ operands[3] = gen_const_vec_duplicate (V2DFmode, c);
operands[3] = force_reg (V2DFmode, operands[3]);
})
@@ -1686,7 +1682,7 @@ (define_expand "vec_ctsl"
real_2expN (&f, INTVAL (operands[2]), DFmode);
c = const_double_from_real_value (f, DFmode);
- operands[3] = gen_rtx_CONST_VECTOR (V2DFmode, gen_rtvec (2, c, c));
+ operands[3] = gen_const_vec_duplicate (V2DFmode, c);
operands[3] = force_reg (V2DFmode, operands[3]);
operands[4] = gen_reg_rtx (V2DFmode);
})
@@ -1719,7 +1715,7 @@ (define_expand "vec_ctul"
real_2expN (&f, INTVAL (operands[2]), DFmode);
c = const_double_from_real_value (f, DFmode);
- operands[3] = gen_rtx_CONST_VECTOR (V2DFmode, gen_rtvec (2, c, c));
+ operands[3] = gen_const_vec_duplicate (V2DFmode, c);
operands[3] = force_reg (V2DFmode, operands[3]);
operands[4] = gen_reg_rtx (V2DFmode);
})
Index: gcc/config/spu/spu.c
===================================================================
--- gcc/config/spu/spu.c 2017-10-23 11:41:23.057077951 +0100
+++ gcc/config/spu/spu.c 2017-10-23 11:41:32.368050264 +0100
@@ -1903,8 +1903,6 @@ spu_return_addr (int count, rtx frame AT
spu_const (machine_mode mode, HOST_WIDE_INT val)
{
rtx inner;
- rtvec v;
- int units, i;
gcc_assert (GET_MODE_CLASS (mode) == MODE_INT
|| GET_MODE_CLASS (mode) == MODE_FLOAT
@@ -1923,14 +1921,7 @@ spu_const (machine_mode mode, HOST_WIDE_
else
inner = hwint_to_const_double (GET_MODE_INNER (mode), val);
- units = GET_MODE_NUNITS (mode);
-
- v = rtvec_alloc (units);
-
- for (i = 0; i < units; ++i)
- RTVEC_ELT (v, i) = inner;
-
- return gen_rtx_CONST_VECTOR (mode, v);
+ return gen_const_vec_duplicate (mode, inner);
}
/* Create a MODE vector constant from 4 ints. */
* [03/nn] Allow vector CONSTs
From: Richard Sandiford @ 2017-10-23 11:19 UTC
To: gcc-patches
This patch allows (const ...) wrappers to be used for rtx vector
constants, as an alternative to const_vector. This is useful
for SVE, where the number of elements isn't known until runtime.
It could also be useful in future for fixed-length vectors, to
reduce the amount of memory needed to represent simple constants
with high element counts. However, one nice thing about keeping
it restricted to variable-length vectors is that there is never
any need to handle combinations of (const ...) and CONST_VECTOR.
2017-10-23 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* doc/rtl.texi (const): Update description of address constants.
Say that vector constants are allowed too.
* common.md (E, F): Use CONSTANT_P instead of checking for
CONST_VECTOR.
* emit-rtl.c (gen_lowpart_common): Use const_vec_p instead of
checking for CONST_VECTOR.
* expmed.c (make_tree): Use build_vector_from_val for a CONST
VEC_DUPLICATE.
* expr.c (expand_expr_real_2): Check for vector modes instead
of checking for CONST_VECTOR.
* rtl.h (const_vec_p): New function.
(const_vec_duplicate_p): Check for a CONST VEC_DUPLICATE.
(unwrap_const_vec_duplicate): Handle them here too.
Index: gcc/doc/rtl.texi
===================================================================
--- gcc/doc/rtl.texi 2017-10-23 11:41:22.176892260 +0100
+++ gcc/doc/rtl.texi 2017-10-23 11:41:39.185050437 +0100
@@ -1667,14 +1667,17 @@ Usually that is the only mode for which
@findex const
@item (const:@var{m} @var{exp})
-Represents a constant that is the result of an assembly-time
-arithmetic computation. The operand, @var{exp}, is an expression that
-contains only constants (@code{const_int}, @code{symbol_ref} and
-@code{label_ref} expressions) combined with @code{plus} and
-@code{minus}. However, not all combinations are valid, since the
-assembler cannot do arbitrary arithmetic on relocatable symbols.
+Wraps an rtx computation @var{exp} whose inputs and result do not
+change during the execution of a thread. There are two valid uses.
+The first is to represent a global or thread-local address calculation.
+In this case @var{exp} should contain @code{const_int},
+@code{symbol_ref}, @code{label_ref} or @code{unspec} expressions,
+combined with @code{plus} and @code{minus}. Any such @code{unspec}s
+are target-specific and typically represent some form of relocation
+operator. @var{m} should be a valid address mode.
-@var{m} should be @code{Pmode}.
+The second use of @code{const} is to wrap a vector operation.
+In this case @var{exp} must be a @code{vec_duplicate} expression.
@findex high
@item (high:@var{m} @var{exp})
Index: gcc/common.md
===================================================================
--- gcc/common.md 2017-10-23 11:40:11.431285821 +0100
+++ gcc/common.md 2017-10-23 11:41:39.184050436 +0100
@@ -80,14 +80,14 @@ (define_constraint "n"
(define_constraint "E"
"Matches a floating-point constant."
(ior (match_test "CONST_DOUBLE_AS_FLOAT_P (op)")
- (match_test "GET_CODE (op) == CONST_VECTOR
+ (match_test "CONSTANT_P (op)
&& GET_MODE_CLASS (GET_MODE (op)) == MODE_VECTOR_FLOAT")))
;; There is no longer a distinction between "E" and "F".
(define_constraint "F"
"Matches a floating-point constant."
(ior (match_test "CONST_DOUBLE_AS_FLOAT_P (op)")
- (match_test "GET_CODE (op) == CONST_VECTOR
+ (match_test "CONSTANT_P (op)
&& GET_MODE_CLASS (GET_MODE (op)) == MODE_VECTOR_FLOAT")))
(define_constraint "X"
Index: gcc/emit-rtl.c
===================================================================
--- gcc/emit-rtl.c 2017-10-23 11:41:32.369050264 +0100
+++ gcc/emit-rtl.c 2017-10-23 11:41:39.186050437 +0100
@@ -1470,7 +1470,7 @@ gen_lowpart_common (machine_mode mode, r
return gen_rtx_fmt_e (GET_CODE (x), int_mode, XEXP (x, 0));
}
else if (GET_CODE (x) == SUBREG || REG_P (x)
- || GET_CODE (x) == CONCAT || GET_CODE (x) == CONST_VECTOR
+ || GET_CODE (x) == CONCAT || const_vec_p (x)
|| CONST_DOUBLE_AS_FLOAT_P (x) || CONST_SCALAR_INT_P (x))
return lowpart_subreg (mode, x, innermode);
Index: gcc/expmed.c
===================================================================
--- gcc/expmed.c 2017-10-23 11:41:25.541909864 +0100
+++ gcc/expmed.c 2017-10-23 11:41:39.186050437 +0100
@@ -5246,7 +5246,15 @@ make_tree (tree type, rtx x)
return fold_convert (type, make_tree (t, XEXP (x, 0)));
case CONST:
- return make_tree (type, XEXP (x, 0));
+ {
+ rtx op = XEXP (x, 0);
+ if (GET_CODE (op) == VEC_DUPLICATE)
+ {
+ tree elt_tree = make_tree (TREE_TYPE (type), XEXP (op, 0));
+ return build_vector_from_val (type, elt_tree);
+ }
+ return make_tree (type, op);
+ }
case SYMBOL_REF:
t = SYMBOL_REF_DECL (x);
Index: gcc/expr.c
===================================================================
--- gcc/expr.c 2017-10-23 11:41:24.408308073 +0100
+++ gcc/expr.c 2017-10-23 11:41:39.187050437 +0100
@@ -9429,7 +9429,7 @@ #define REDUCE_BIT_FIELD(expr) (reduce_b
/* Careful here: if the target doesn't support integral vector modes,
a constant selection vector could wind up smooshed into a normal
integral constant. */
- if (CONSTANT_P (op2) && GET_CODE (op2) != CONST_VECTOR)
+ if (CONSTANT_P (op2) && !VECTOR_MODE_P (GET_MODE (op2)))
{
tree sel_type = TREE_TYPE (treeop2);
machine_mode vmode
Index: gcc/rtl.h
===================================================================
--- gcc/rtl.h 2017-10-23 11:41:36.307050364 +0100
+++ gcc/rtl.h 2017-10-23 11:41:39.188050437 +0100
@@ -2749,12 +2749,22 @@ extern rtx shallow_copy_rtx (const_rtx C
extern int rtx_equal_p (const_rtx, const_rtx);
extern bool rtvec_all_equal_p (const_rtvec);
+/* Return true if X is some form of vector constant. */
+
+inline bool
+const_vec_p (const_rtx x)
+{
+ return VECTOR_MODE_P (GET_MODE (x)) && CONSTANT_P (x);
+}
+
/* Return true if X is a vector constant with a duplicated element value. */
inline bool
const_vec_duplicate_p (const_rtx x)
{
- return GET_CODE (x) == CONST_VECTOR && rtvec_all_equal_p (XVEC (x, 0));
+ return ((GET_CODE (x) == CONST_VECTOR && rtvec_all_equal_p (XVEC (x, 0)))
+ || (GET_CODE (x) == CONST
+ && GET_CODE (XEXP (x, 0)) == VEC_DUPLICATE));
}
/* Return true if X is a vector constant with a duplicated element value.
@@ -2764,11 +2774,16 @@ const_vec_duplicate_p (const_rtx x)
inline bool
const_vec_duplicate_p (T x, T *elt)
{
- if (const_vec_duplicate_p (x))
+ if (GET_CODE (x) == CONST_VECTOR && rtvec_all_equal_p (XVEC (x, 0)))
{
*elt = CONST_VECTOR_ELT (x, 0);
return true;
}
+ if (GET_CODE (x) == CONST && GET_CODE (XEXP (x, 0)) == VEC_DUPLICATE)
+ {
+ *elt = XEXP (XEXP (x, 0), 0);
+ return true;
+ }
return false;
}
@@ -2794,8 +2809,10 @@ vec_duplicate_p (T x, T *elt)
inline T
unwrap_const_vec_duplicate (T x)
{
- if (const_vec_duplicate_p (x))
- x = CONST_VECTOR_ELT (x, 0);
+ if (GET_CODE (x) == CONST_VECTOR && rtvec_all_equal_p (XVEC (x, 0)))
+ return CONST_VECTOR_ELT (x, 0);
+ if (GET_CODE (x) == CONST && GET_CODE (XEXP (x, 0)) == VEC_DUPLICATE)
+ return XEXP (XEXP (x, 0), 0);
return x;
}
* [02/nn] Add more vec_duplicate simplifications
2017-10-23 11:16 [00/nn] Patches preparing for runtime offsets and sizes Richard Sandiford
2017-10-23 11:17 ` [01/nn] Add gen_(const_)vec_duplicate helpers Richard Sandiford
@ 2017-10-23 11:19 ` Richard Sandiford
2017-10-25 16:35 ` Jeff Law
2017-10-23 11:19 ` [03/nn] Allow vector CONSTs Richard Sandiford
` (19 subsequent siblings)
21 siblings, 1 reply; 90+ messages in thread
From: Richard Sandiford @ 2017-10-23 11:19 UTC (permalink / raw)
To: gcc-patches
This patch adds a vec_duplicate_p helper that tests for constant
or non-constant vector duplicates. Together with the existing
const_vec_duplicate_p, this complements the gen_vec_duplicate
and gen_const_vec_duplicate added by a previous patch.
The patch uses the new routines to add more rtx simplifications
involving vector duplicates. These mirror simplifications that
we already do for CONST_VECTOR broadcasts and are needed for
variable-length SVE, which uses:
(const:M (vec_duplicate:M X))
to represent constant broadcasts instead. The simplifications do
trigger on the testsuite for variable duplicates too, and in each
case I saw, the change was an improvement. E.g.:
- Several targets had this simplification in gcc.dg/pr49948.c
when compiled at -O3:
-Failed to match this instruction:
+Successfully matched this instruction:
(set (reg:DI 88)
- (subreg:DI (vec_duplicate:V2DI (reg/f:DI 75 [ _4 ])) 0))
+ (reg/f:DI 75 [ _4 ]))
On aarch64 this gives:
ret
.p2align 2
.L8:
+ adrp x1, b
sub sp, sp, #80
- adrp x2, b
- add x1, sp, 12
+ add x2, sp, 12
str wzr, [x0, #:lo12:a]
+ str x2, [x1, #:lo12:b]
mov w0, 0
- dup v0.2d, x1
- str d0, [x2, #:lo12:b]
add sp, sp, 80
ret
.size foo, .-foo
On x86_64:
jg .L2
leaq -76(%rsp), %rax
movl $0, a(%rip)
- movq %rax, -96(%rsp)
- movq -96(%rsp), %xmm0
- punpcklqdq %xmm0, %xmm0
- movq %xmm0, b(%rip)
+ movq %rax, b(%rip)
.L2:
xorl %eax, %eax
ret
etc.
- gcc.dg/torture/pr58018.c compiled at -O3 on aarch64 has an instance of:
Trying 50, 52, 46 -> 53:
Failed to match this instruction:
(set (reg:V4SI 167)
- (and:V4SI (and:V4SI (vec_duplicate:V4SI (reg:SI 132 [ _165 ]))
- (reg:V4SI 209))
- (const_vector:V4SI [
- (const_int 1 [0x1])
- (const_int 1 [0x1])
- (const_int 1 [0x1])
- (const_int 1 [0x1])
- ])))
+ (and:V4SI (vec_duplicate:V4SI (reg:SI 132 [ _165 ]))
+ (reg:V4SI 209)))
Successfully matched this instruction:
(set (reg:V4SI 163 [ vect_patt_16.14 ])
(vec_duplicate:V4SI (reg:SI 132 [ _165 ])))
+Successfully matched this instruction:
+(set (reg:V4SI 167)
+ (and:V4SI (reg:V4SI 163 [ vect_patt_16.14 ])
+ (reg:V4SI 209)))
where (reg:SI 132) is the result of a scalar comparison and so
is known to be 0 or 1. This saves a MOVI and vector AND:
cmp w7, 4
bls .L15
dup v1.4s, w2
- lsr w2, w1, 2
+ dup v2.4s, w6
movi v3.4s, 0
- mov w0, 0
- movi v2.4s, 0x1
+ lsr w2, w1, 2
mvni v0.4s, 0
+ mov w0, 0
cmge v1.4s, v1.4s, v3.4s
and v1.16b, v2.16b, v1.16b
- dup v2.4s, w6
- and v1.16b, v1.16b, v2.16b
.p2align 3
.L7:
and v0.16b, v0.16b, v1.16b
- powerpc64le has many instances of things like:
-Failed to match this instruction:
+Successfully matched this instruction:
(set (reg:V4SI 161 [ vect_cst__24 ])
- (vec_select:V4SI (vec_duplicate:V4SI (vec_select:SI (reg:V4SI 143)
- (parallel [
- (const_int 0 [0])
- ])))
- (parallel [
- (const_int 2 [0x2])
- (const_int 3 [0x3])
- (const_int 0 [0])
- (const_int 1 [0x1])
- ])))
+ (vec_duplicate:V4SI (vec_select:SI (reg:V4SI 143)
+ (parallel [
+ (const_int 0 [0])
+ ]))))
This removes redundant XXPERMDIs from many tests.
The best way of testing the new simplifications seemed to be
via selftests. The patch cribs part of David's patch here:
https://gcc.gnu.org/ml/gcc-patches/2016-07/msg00270.html.
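The rules being added can be sketched as a Python model of their semantics (illustrative only, not the GCC implementation; the rtx encoding below is invented, and the real code folds a unary or binary case only when the scalar operation itself simplifies, whereas this sketch applies the operator unconditionally):

```python
# Each function models one simplification added to simplify-rtx.c,
# operating on a toy ("vec_duplicate", elt) encoding.

def select_from_duplicate(dup, index):
    """(vec_select (vec_duplicate X) [I]) -> X: every lane holds X,
    so the selected index is irrelevant."""
    assert dup[0] == "vec_duplicate"
    return dup[1]

def unary_on_duplicate(scalar_op, dup):
    """(op (vec_duplicate X)) -> (vec_duplicate (op X))."""
    return ("vec_duplicate", scalar_op(dup[1]))

def binary_on_duplicates(scalar_op, dup0, dup1):
    """(op (vec_duplicate X) (vec_duplicate Y))
       -> (vec_duplicate (op X Y))."""
    return ("vec_duplicate", scalar_op(dup0[1], dup1[1]))
```

The pr58018.c example above is an instance of the binary rule: the AND of a duplicated 0/1 value with a duplicated constant folds lane-wise, removing the vector AND.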
2017-10-23 Richard Sandiford <richard.sandiford@linaro.org>
David Malcolm <dmalcolm@redhat.com>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* rtl.h (vec_duplicate_p): New function.
* selftest-rtl.c (assert_rtx_eq_at): New function.
* selftest-rtl.h (ASSERT_RTX_EQ): New macro.
(assert_rtx_eq_at): Declare.
* selftest.h (selftest::simplify_rtx_c_tests): Declare.
* selftest-run-tests.c (selftest::run_tests): Call it.
* simplify-rtx.c: Include selftest.h and selftest-rtl.h.
(simplify_unary_operation_1): Recursively handle vector duplicates.
(simplify_binary_operation_1): Likewise. Handle VEC_SELECTs of
vector duplicates.
(simplify_subreg): Handle subregs of vector duplicates.
(make_test_reg, test_vector_ops_duplicate, test_vector_ops)
(selftest::simplify_rtx_c_tests): New functions.
Index: gcc/rtl.h
===================================================================
--- gcc/rtl.h 2017-10-23 11:40:11.485292126 +0100
+++ gcc/rtl.h 2017-10-23 11:41:36.307050364 +0100
@@ -2772,6 +2772,21 @@ const_vec_duplicate_p (T x, T *elt)
return false;
}
+/* Return true if X is a vector with a duplicated element value, either
+ constant or nonconstant. Store the duplicated element in *ELT if so. */
+
+template <typename T>
+inline bool
+vec_duplicate_p (T x, T *elt)
+{
+ if (GET_CODE (x) == VEC_DUPLICATE)
+ {
+ *elt = XEXP (x, 0);
+ return true;
+ }
+ return const_vec_duplicate_p (x, elt);
+}
+
/* If X is a vector constant with a duplicated element value, return that
element value, otherwise return X. */
Index: gcc/selftest-rtl.c
===================================================================
--- gcc/selftest-rtl.c 2017-10-23 11:40:11.485292126 +0100
+++ gcc/selftest-rtl.c 2017-10-23 11:41:36.307050364 +0100
@@ -35,6 +35,29 @@ Software Foundation; either version 3, o
namespace selftest {
+/* Compare rtx EXPECTED and ACTUAL using rtx_equal_p, calling
+ ::selftest::pass if they are equal, aborting if they are non-equal.
+ LOC is the effective location of the assertion, MSG describes it. */
+
+void
+assert_rtx_eq_at (const location &loc, const char *msg,
+ rtx expected, rtx actual)
+{
+ if (rtx_equal_p (expected, actual))
+ ::selftest::pass (loc, msg);
+ else
+ {
+ fprintf (stderr, "%s:%i: %s: FAIL: %s\n", loc.m_file, loc.m_line,
+ loc.m_function, msg);
+ fprintf (stderr, " expected: ");
+ print_rtl (stderr, expected);
+ fprintf (stderr, "\n actual: ");
+ print_rtl (stderr, actual);
+ fprintf (stderr, "\n");
+ abort ();
+ }
+}
+
/* Compare rtx EXPECTED and ACTUAL by pointer equality, calling
::selftest::pass if they are equal, aborting if they are non-equal.
LOC is the effective location of the assertion, MSG describes it. */
Index: gcc/selftest-rtl.h
===================================================================
--- gcc/selftest-rtl.h 2017-10-23 11:40:11.485292126 +0100
+++ gcc/selftest-rtl.h 2017-10-23 11:41:36.307050364 +0100
@@ -47,6 +47,15 @@ #define ASSERT_RTL_DUMP_EQ_WITH_REUSE(EX
assert_rtl_dump_eq (SELFTEST_LOCATION, (EXPECTED_DUMP), (RTX), \
(REUSE_MANAGER))
+#define ASSERT_RTX_EQ(EXPECTED, ACTUAL) \
+ SELFTEST_BEGIN_STMT \
+ const char *desc = "ASSERT_RTX_EQ (" #EXPECTED ", " #ACTUAL ")"; \
+ ::selftest::assert_rtx_eq_at (SELFTEST_LOCATION, desc, (EXPECTED), \
+ (ACTUAL)); \
+ SELFTEST_END_STMT
+
+extern void assert_rtx_eq_at (const location &, const char *, rtx, rtx);
+
/* Evaluate rtx EXPECTED and ACTUAL and compare them with ==
(i.e. pointer equality), calling ::selftest::pass if they are
equal, aborting if they are non-equal. */
Index: gcc/selftest.h
===================================================================
--- gcc/selftest.h 2017-10-23 11:41:25.513859990 +0100
+++ gcc/selftest.h 2017-10-23 11:41:36.308050364 +0100
@@ -198,6 +198,7 @@ extern void tree_cfg_c_tests ();
extern void vec_c_tests ();
extern void wide_int_cc_tests ();
extern void predict_c_tests ();
+extern void simplify_rtx_c_tests ();
extern int num_passes;
Index: gcc/selftest-run-tests.c
===================================================================
--- gcc/selftest-run-tests.c 2017-10-23 11:41:25.872704926 +0100
+++ gcc/selftest-run-tests.c 2017-10-23 11:41:36.308050364 +0100
@@ -94,6 +94,7 @@ selftest::run_tests ()
store_merging_c_tests ();
predict_c_tests ();
+ simplify_rtx_c_tests ();
/* Run any lang-specific selftests. */
lang_hooks.run_lang_selftests ();
Index: gcc/simplify-rtx.c
===================================================================
--- gcc/simplify-rtx.c 2017-10-23 11:41:32.370050264 +0100
+++ gcc/simplify-rtx.c 2017-10-23 11:41:36.309050364 +0100
@@ -33,6 +33,8 @@ Software Foundation; either version 3, o
#include "diagnostic-core.h"
#include "varasm.h"
#include "flags.h"
+#include "selftest.h"
+#include "selftest-rtl.h"
/* Simplification and canonicalization of RTL. */
@@ -925,7 +927,7 @@ exact_int_to_float_conversion_p (const_r
simplify_unary_operation_1 (enum rtx_code code, machine_mode mode, rtx op)
{
enum rtx_code reversed;
- rtx temp;
+ rtx temp, elt;
scalar_int_mode inner, int_mode, op_mode, op0_mode;
switch (code)
@@ -1681,6 +1683,28 @@ simplify_unary_operation_1 (enum rtx_cod
break;
}
+ if (VECTOR_MODE_P (mode) && vec_duplicate_p (op, &elt))
+ {
+ /* Try applying the operator to ELT and see if that simplifies.
+ We can duplicate the result if so.
+
+ The reason we don't use simplify_gen_unary is that it isn't
+ necessarily a win to convert things like:
+
+ (neg:V (vec_duplicate:V (reg:S R)))
+
+ to:
+
+ (vec_duplicate:V (neg:S (reg:S R)))
+
+ The first might be done entirely in vector registers while the
+ second might need a move between register files. */
+ temp = simplify_unary_operation (code, GET_MODE_INNER (mode),
+ elt, GET_MODE_INNER (GET_MODE (op)));
+ if (temp)
+ return gen_vec_duplicate (mode, temp);
+ }
+
return 0;
}
@@ -2138,7 +2162,7 @@ simplify_binary_operation (enum rtx_code
simplify_binary_operation_1 (enum rtx_code code, machine_mode mode,
rtx op0, rtx op1, rtx trueop0, rtx trueop1)
{
- rtx tem, reversed, opleft, opright;
+ rtx tem, reversed, opleft, opright, elt0, elt1;
HOST_WIDE_INT val;
unsigned int width = GET_MODE_PRECISION (mode);
scalar_int_mode int_mode, inner_mode;
@@ -3480,6 +3504,9 @@ simplify_binary_operation_1 (enum rtx_co
gcc_assert (XVECLEN (trueop1, 0) == 1);
gcc_assert (CONST_INT_P (XVECEXP (trueop1, 0, 0)));
+ if (vec_duplicate_p (trueop0, &elt0))
+ return elt0;
+
if (GET_CODE (trueop0) == CONST_VECTOR)
return CONST_VECTOR_ELT (trueop0, INTVAL (XVECEXP
(trueop1, 0, 0)));
@@ -3562,9 +3589,6 @@ simplify_binary_operation_1 (enum rtx_co
tmp_op, gen_rtx_PARALLEL (VOIDmode, vec));
return tmp;
}
- if (GET_CODE (trueop0) == VEC_DUPLICATE
- && GET_MODE (XEXP (trueop0, 0)) == mode)
- return XEXP (trueop0, 0);
}
else
{
@@ -3573,6 +3597,11 @@ simplify_binary_operation_1 (enum rtx_co
== GET_MODE_INNER (GET_MODE (trueop0)));
gcc_assert (GET_CODE (trueop1) == PARALLEL);
+ if (vec_duplicate_p (trueop0, &elt0))
+ /* It doesn't matter which elements are selected by trueop1,
+ because they are all the same. */
+ return gen_vec_duplicate (mode, elt0);
+
if (GET_CODE (trueop0) == CONST_VECTOR)
{
int elt_size = GET_MODE_UNIT_SIZE (mode);
@@ -3873,6 +3902,32 @@ simplify_binary_operation_1 (enum rtx_co
gcc_unreachable ();
}
+ if (mode == GET_MODE (op0)
+ && mode == GET_MODE (op1)
+ && vec_duplicate_p (op0, &elt0)
+ && vec_duplicate_p (op1, &elt1))
+ {
+ /* Try applying the operator to ELT and see if that simplifies.
+ We can duplicate the result if so.
+
+ The reason we don't use simplify_gen_binary is that it isn't
+ necessarily a win to convert things like:
+
+ (plus:V (vec_duplicate:V (reg:S R1))
+ (vec_duplicate:V (reg:S R2)))
+
+ to:
+
+ (vec_duplicate:V (plus:S (reg:S R1) (reg:S R2)))
+
+ The first might be done entirely in vector registers while the
+ second might need a move between register files. */
+ tem = simplify_binary_operation (code, GET_MODE_INNER (mode),
+ elt0, elt1);
+ if (tem)
+ return gen_vec_duplicate (mode, tem);
+ }
+
return 0;
}
@@ -6021,6 +6076,20 @@ simplify_subreg (machine_mode outermode,
if (outermode == innermode && !byte)
return op;
+ if (byte % GET_MODE_UNIT_SIZE (innermode) == 0)
+ {
+ rtx elt;
+
+ if (VECTOR_MODE_P (outermode)
+ && GET_MODE_INNER (outermode) == GET_MODE_INNER (innermode)
+ && vec_duplicate_p (op, &elt))
+ return gen_vec_duplicate (outermode, elt);
+
+ if (outermode == GET_MODE_INNER (innermode)
+ && vec_duplicate_p (op, &elt))
+ return elt;
+ }
+
if (CONST_SCALAR_INT_P (op)
|| CONST_DOUBLE_AS_FLOAT_P (op)
|| GET_CODE (op) == CONST_FIXED
@@ -6326,3 +6395,125 @@ simplify_rtx (const_rtx x)
}
return NULL;
}
+
+#if CHECKING_P
+
+namespace selftest {
+
+/* Make a unique pseudo REG of mode MODE for use by selftests. */
+
+static rtx
+make_test_reg (machine_mode mode)
+{
+ static int test_reg_num = LAST_VIRTUAL_REGISTER + 1;
+
+ return gen_rtx_REG (mode, test_reg_num++);
+}
+
+/* Test vector simplifications involving VEC_DUPLICATE in which the
+ operands and result have vector mode MODE. SCALAR_REG is a pseudo
+ register that holds one element of MODE. */
+
+static void
+test_vector_ops_duplicate (machine_mode mode, rtx scalar_reg)
+{
+ scalar_mode inner_mode = GET_MODE_INNER (mode);
+ rtx duplicate = gen_rtx_VEC_DUPLICATE (mode, scalar_reg);
+ unsigned int nunits = GET_MODE_NUNITS (mode);
+ if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT)
+ {
+ /* Test some simple unary cases with VEC_DUPLICATE arguments. */
+ rtx not_scalar_reg = gen_rtx_NOT (inner_mode, scalar_reg);
+ rtx duplicate_not = gen_rtx_VEC_DUPLICATE (mode, not_scalar_reg);
+ ASSERT_RTX_EQ (duplicate,
+ simplify_unary_operation (NOT, mode,
+ duplicate_not, mode));
+
+ rtx neg_scalar_reg = gen_rtx_NEG (inner_mode, scalar_reg);
+ rtx duplicate_neg = gen_rtx_VEC_DUPLICATE (mode, neg_scalar_reg);
+ ASSERT_RTX_EQ (duplicate,
+ simplify_unary_operation (NEG, mode,
+ duplicate_neg, mode));
+
+ /* Test some simple binary cases with VEC_DUPLICATE arguments. */
+ ASSERT_RTX_EQ (duplicate,
+ simplify_binary_operation (PLUS, mode, duplicate,
+ CONST0_RTX (mode)));
+
+ ASSERT_RTX_EQ (duplicate,
+ simplify_binary_operation (MINUS, mode, duplicate,
+ CONST0_RTX (mode)));
+
+ ASSERT_RTX_PTR_EQ (CONST0_RTX (mode),
+ simplify_binary_operation (MINUS, mode, duplicate,
+ duplicate));
+ }
+
+ /* Test a scalar VEC_SELECT of a VEC_DUPLICATE. */
+ rtx zero_par = gen_rtx_PARALLEL (VOIDmode, gen_rtvec (1, const0_rtx));
+ ASSERT_RTX_PTR_EQ (scalar_reg,
+ simplify_binary_operation (VEC_SELECT, inner_mode,
+ duplicate, zero_par));
+
+ /* And again with the final element. */
+ rtx last_index = gen_int_mode (GET_MODE_NUNITS (mode) - 1, word_mode);
+ rtx last_par = gen_rtx_PARALLEL (VOIDmode, gen_rtvec (1, last_index));
+ ASSERT_RTX_PTR_EQ (scalar_reg,
+ simplify_binary_operation (VEC_SELECT, inner_mode,
+ duplicate, last_par));
+
+ /* Test a scalar subreg of a VEC_DUPLICATE. */
+ unsigned int offset = subreg_lowpart_offset (inner_mode, mode);
+ ASSERT_RTX_EQ (scalar_reg,
+ simplify_gen_subreg (inner_mode, duplicate,
+ mode, offset));
+
+ machine_mode narrower_mode;
+ if (nunits > 2
+ && mode_for_vector (inner_mode, 2).exists (&narrower_mode)
+ && VECTOR_MODE_P (narrower_mode))
+ {
+ /* Test VEC_SELECT of a vector. */
+ rtx vec_par
+ = gen_rtx_PARALLEL (VOIDmode, gen_rtvec (2, const1_rtx, const0_rtx));
+ rtx narrower_duplicate
+ = gen_rtx_VEC_DUPLICATE (narrower_mode, scalar_reg);
+ ASSERT_RTX_EQ (narrower_duplicate,
+ simplify_binary_operation (VEC_SELECT, narrower_mode,
+ duplicate, vec_par));
+
+ /* Test a vector subreg of a VEC_DUPLICATE. */
+ unsigned int offset = subreg_lowpart_offset (narrower_mode, mode);
+ ASSERT_RTX_EQ (narrower_duplicate,
+ simplify_gen_subreg (narrower_mode, duplicate,
+ mode, offset));
+ }
+}
+
+/* Verify some simplifications involving vectors. */
+
+static void
+test_vector_ops ()
+{
+ for (unsigned int i = 0; i < NUM_MACHINE_MODES; ++i)
+ {
+ machine_mode mode = (machine_mode) i;
+ if (VECTOR_MODE_P (mode))
+ {
+ rtx scalar_reg = make_test_reg (GET_MODE_INNER (mode));
+ test_vector_ops_duplicate (mode, scalar_reg);
+ }
+ }
+}
+
+/* Run all of the selftests within this file. */
+
+void
+simplify_rtx_c_tests ()
+{
+ test_vector_ops ();
+}
+
+} // namespace selftest
+
+#endif /* CHECKING_P */
* [04/nn] Add a VEC_SERIES rtl code
2017-10-23 11:16 [00/nn] Patches preparing for runtime offsets and sizes Richard Sandiford
` (2 preceding siblings ...)
2017-10-23 11:19 ` [03/nn] Allow vector CONSTs Richard Sandiford
@ 2017-10-23 11:20 ` Richard Sandiford
2017-10-26 11:49 ` Richard Biener
2017-10-23 11:21 ` [05/nn] Add VEC_DUPLICATE_{CST,EXPR} and associated optab Richard Sandiford
` (17 subsequent siblings)
21 siblings, 1 reply; 90+ messages in thread
From: Richard Sandiford @ 2017-10-23 11:20 UTC (permalink / raw)
To: gcc-patches
This patch adds an rtl representation of a vector linear series
of the form:
a[I] = BASE + I * STEP
Like vec_duplicate:
- the new rtx can be used for both constant and non-constant vectors
- when used for constant vectors it is wrapped in a (const ...)
- the constant form is only used for variable-length vectors;
fixed-length vectors still use CONST_VECTOR
At the moment the code is restricted to integer elements, to avoid
concerns over floating-point rounding.
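A small Python model (illustrative, not GCC code) makes the semantics and the series/series folding concrete: element I of a series is BASE + I * STEP, and an elementwise PLUS of two series is again a series, which is what simplify_binary_operation_series exploits.

```python
def series_elements(base, step, nunits):
    """Element I of (vec_series BASE STEP) is BASE + I * STEP."""
    return [base + i * step for i in range(nunits)]

def add_series(s0, s1):
    """Elementwise PLUS of two series (a vec_duplicate being a series
    with step 0) is a series with the bases and steps added."""
    (base0, step0), (base1, step1) = s0, s1
    return (base0 + base1, step0 + step1)
```

For example, adding the series { 2, 5, 8, 11 } to { 10, 11, 12, 13 } gives the series with base 12 and step 4, i.e. { 12, 16, 20, 24 }, without materialising either operand.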
2017-10-23 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* doc/rtl.texi (vec_series): Document.
(const): Say that the operand can be a vec_series.
* rtl.def (VEC_SERIES): New rtx code.
* rtl.h (const_vec_series_p_1): Declare.
(const_vec_series_p): New function.
* emit-rtl.h (gen_const_vec_series): Declare.
(gen_vec_series): Likewise.
* emit-rtl.c (const_vec_series_p_1, gen_const_vec_series)
(gen_vec_series): Likewise.
* optabs.c (expand_mult_highpart): Use gen_const_vec_series.
* simplify-rtx.c (simplify_unary_operation): Handle negations
of vector series.
(simplify_binary_operation_series): New function.
(simplify_binary_operation_1): Use it. Handle VEC_SERIES.
(test_vector_ops_series): New function.
(test_vector_ops): Call it.
* config/powerpcspe/altivec.md (altivec_lvsl): Use
gen_const_vec_series.
(altivec_lvsr): Likewise.
* config/rs6000/altivec.md (altivec_lvsl, altivec_lvsr): Likewise.
Index: gcc/doc/rtl.texi
===================================================================
--- gcc/doc/rtl.texi 2017-10-23 11:41:39.185050437 +0100
+++ gcc/doc/rtl.texi 2017-10-23 11:41:41.547050496 +0100
@@ -1677,7 +1677,8 @@ are target-specific and typically repres
operator. @var{m} should be a valid address mode.
The second use of @code{const} is to wrap a vector operation.
-In this case @var{exp} must be a @code{vec_duplicate} expression.
+In this case @var{exp} must be a @code{vec_duplicate} or
+@code{vec_series} expression.
@findex high
@item (high:@var{m} @var{exp})
@@ -2722,6 +2723,10 @@ the same submodes as the input vector mo
number of output parts must be an integer multiple of the number of input
parts.
+@findex vec_series
+@item (vec_series:@var{m} @var{base} @var{step})
+This operation creates a vector in which element @var{i} is equal to
+@samp{@var{base} + @var{i}*@var{step}}. @var{m} must be a vector integer mode.
@end table
@node Conversions
Index: gcc/rtl.def
===================================================================
--- gcc/rtl.def 2017-10-23 11:40:11.378243915 +0100
+++ gcc/rtl.def 2017-10-23 11:41:41.549050496 +0100
@@ -710,6 +710,11 @@ DEF_RTL_EXPR(VEC_CONCAT, "vec_concat", "
an integer multiple of the number of input parts. */
DEF_RTL_EXPR(VEC_DUPLICATE, "vec_duplicate", "e", RTX_UNARY)
+/* Creation of a vector in which element I has the value BASE + I * STEP,
+ where BASE is the first operand and STEP is the second. The result
+ must have a vector integer mode. */
+DEF_RTL_EXPR(VEC_SERIES, "vec_series", "ee", RTX_BIN_ARITH)
+
/* Addition with signed saturation */
DEF_RTL_EXPR(SS_PLUS, "ss_plus", "ee", RTX_COMM_ARITH)
Index: gcc/rtl.h
===================================================================
--- gcc/rtl.h 2017-10-23 11:41:39.188050437 +0100
+++ gcc/rtl.h 2017-10-23 11:41:41.549050496 +0100
@@ -2816,6 +2816,51 @@ unwrap_const_vec_duplicate (T x)
return x;
}
+/* In emit-rtl.c. */
+extern bool const_vec_series_p_1 (const_rtx, rtx *, rtx *);
+
+/* Return true if X is a constant vector that contains a linear series
+ of the form:
+
+ { B, B + S, B + 2 * S, B + 3 * S, ... }
+
+ for a nonzero S. Store B and S in *BASE_OUT and *STEP_OUT on success. */
+
+inline bool
+const_vec_series_p (const_rtx x, rtx *base_out, rtx *step_out)
+{
+ if (GET_CODE (x) == CONST_VECTOR
+ && GET_MODE_CLASS (GET_MODE (x)) == MODE_VECTOR_INT)
+ return const_vec_series_p_1 (x, base_out, step_out);
+ if (GET_CODE (x) == CONST && GET_CODE (XEXP (x, 0)) == VEC_SERIES)
+ {
+ *base_out = XEXP (XEXP (x, 0), 0);
+ *step_out = XEXP (XEXP (x, 0), 1);
+ return true;
+ }
+ return false;
+}
+
+/* Return true if X is a vector that contains a linear series of the
+ form:
+
+ { B, B + S, B + 2 * S, B + 3 * S, ... }
+
+ where B and S are constant or nonconstant. Store B and S in
+ *BASE_OUT and *STEP_OUT on success. */
+
+inline bool
+vec_series_p (const_rtx x, rtx *base_out, rtx *step_out)
+{
+ if (GET_CODE (x) == VEC_SERIES)
+ {
+ *base_out = XEXP (x, 0);
+ *step_out = XEXP (x, 1);
+ return true;
+ }
+ return const_vec_series_p (x, base_out, step_out);
+}
+
/* Return the unpromoted (outer) mode of SUBREG_PROMOTED_VAR_P subreg X. */
inline scalar_int_mode
Index: gcc/emit-rtl.h
===================================================================
--- gcc/emit-rtl.h 2017-10-23 11:41:32.369050264 +0100
+++ gcc/emit-rtl.h 2017-10-23 11:41:41.548050496 +0100
@@ -441,6 +441,9 @@ get_max_uid (void)
extern rtx gen_const_vec_duplicate (machine_mode, rtx);
extern rtx gen_vec_duplicate (machine_mode, rtx);
+extern rtx gen_const_vec_series (machine_mode, rtx, rtx);
+extern rtx gen_vec_series (machine_mode, rtx, rtx);
+
extern void set_decl_incoming_rtl (tree, rtx, bool);
/* Return a memory reference like MEMREF, but with its mode changed
Index: gcc/emit-rtl.c
===================================================================
--- gcc/emit-rtl.c 2017-10-23 11:41:39.186050437 +0100
+++ gcc/emit-rtl.c 2017-10-23 11:41:41.548050496 +0100
@@ -5796,6 +5796,69 @@ gen_vec_duplicate (machine_mode mode, rt
return gen_rtx_VEC_DUPLICATE (mode, x);
}
+/* A subroutine of const_vec_series_p that handles the case in which
+ X is known to be an integer CONST_VECTOR. */
+
+bool
+const_vec_series_p_1 (const_rtx x, rtx *base_out, rtx *step_out)
+{
+ unsigned int nelts = CONST_VECTOR_NUNITS (x);
+ if (nelts < 2)
+ return false;
+
+ scalar_mode inner = GET_MODE_INNER (GET_MODE (x));
+ rtx base = CONST_VECTOR_ELT (x, 0);
+ rtx step = simplify_binary_operation (MINUS, inner,
+ CONST_VECTOR_ELT (x, 1), base);
+ if (rtx_equal_p (step, CONST0_RTX (inner)))
+ return false;
+
+ for (unsigned int i = 2; i < nelts; ++i)
+ {
+ rtx diff = simplify_binary_operation (MINUS, inner,
+ CONST_VECTOR_ELT (x, i),
+ CONST_VECTOR_ELT (x, i - 1));
+ if (!rtx_equal_p (step, diff))
+ return false;
+ }
+
+ *base_out = base;
+ *step_out = step;
+ return true;
+}
+
+/* Generate a vector constant of mode MODE in which element I has
+ the value BASE + I * STEP. */
+
+rtx
+gen_const_vec_series (machine_mode mode, rtx base, rtx step)
+{
+ gcc_assert (CONSTANT_P (base) && CONSTANT_P (step));
+
+ int nunits = GET_MODE_NUNITS (mode);
+ rtvec v = rtvec_alloc (nunits);
+ scalar_mode inner_mode = GET_MODE_INNER (mode);
+ RTVEC_ELT (v, 0) = base;
+ for (int i = 1; i < nunits; ++i)
+ RTVEC_ELT (v, i) = simplify_gen_binary (PLUS, inner_mode,
+ RTVEC_ELT (v, i - 1), step);
+ return gen_rtx_raw_CONST_VECTOR (mode, v);
+}
+
+/* Generate a vector of mode MODE in which element I has the value
+ BASE + I * STEP. The result will be a constant if BASE and STEP
+ are both constants. */
+
+rtx
+gen_vec_series (machine_mode mode, rtx base, rtx step)
+{
+ if (step == const0_rtx)
+ return gen_vec_duplicate (mode, base);
+ if (CONSTANT_P (base) && CONSTANT_P (step))
+ return gen_const_vec_series (mode, base, step);
+ return gen_rtx_VEC_SERIES (mode, base, step);
+}
+
/* Generate a new vector constant for mode MODE and constant value
CONSTANT. */
Index: gcc/optabs.c
===================================================================
--- gcc/optabs.c 2017-10-23 11:41:32.369050264 +0100
+++ gcc/optabs.c 2017-10-23 11:41:41.549050496 +0100
@@ -5784,13 +5784,13 @@ expand_mult_highpart (machine_mode mode,
for (i = 0; i < nunits; ++i)
RTVEC_ELT (v, i) = GEN_INT (!BYTES_BIG_ENDIAN + (i & ~1)
+ ((i & 1) ? nunits : 0));
+ perm = gen_rtx_CONST_VECTOR (mode, v);
}
else
{
- for (i = 0; i < nunits; ++i)
- RTVEC_ELT (v, i) = GEN_INT (2 * i + (BYTES_BIG_ENDIAN ? 0 : 1));
+ int base = BYTES_BIG_ENDIAN ? 0 : 1;
+ perm = gen_const_vec_series (mode, GEN_INT (base), GEN_INT (2));
}
- perm = gen_rtx_CONST_VECTOR (mode, v);
return expand_vec_perm (mode, m1, m2, perm, target);
}
Index: gcc/simplify-rtx.c
===================================================================
--- gcc/simplify-rtx.c 2017-10-23 11:41:36.309050364 +0100
+++ gcc/simplify-rtx.c 2017-10-23 11:41:41.550050496 +0100
@@ -927,7 +927,7 @@ exact_int_to_float_conversion_p (const_r
simplify_unary_operation_1 (enum rtx_code code, machine_mode mode, rtx op)
{
enum rtx_code reversed;
- rtx temp, elt;
+ rtx temp, elt, base, step;
scalar_int_mode inner, int_mode, op_mode, op0_mode;
switch (code)
@@ -1185,6 +1185,22 @@ simplify_unary_operation_1 (enum rtx_cod
return simplify_gen_unary (TRUNCATE, int_mode, temp, inner);
}
}
+
+ if (vec_series_p (op, &base, &step))
+ {
+ /* Only create a new series if we can simplify both parts. In other
+ cases this isn't really a simplification, and it's not necessarily
+ a win to replace a vector operation with a scalar operation. */
+ scalar_mode inner_mode = GET_MODE_INNER (mode);
+ base = simplify_unary_operation (NEG, inner_mode, base, inner_mode);
+ if (base)
+ {
+ step = simplify_unary_operation (NEG, inner_mode,
+ step, inner_mode);
+ if (step)
+ return gen_vec_series (mode, base, step);
+ }
+ }
break;
case TRUNCATE:
@@ -2153,6 +2169,46 @@ simplify_binary_operation (enum rtx_code
return NULL_RTX;
}
+/* Subroutine of simplify_binary_operation_1 that looks for cases in
+ which OP0 and OP1 are both vector series or vector duplicates
+ (which are really just series with a step of 0). If so, try to
+ form a new series by applying CODE to the bases and to the steps.
+ Return null if no simplification is possible.
+
+ MODE is the mode of the operation and is known to be a vector
+ integer mode. */
+
+static rtx
+simplify_binary_operation_series (rtx_code code, machine_mode mode,
+ rtx op0, rtx op1)
+{
+ rtx base0, step0;
+ if (vec_duplicate_p (op0, &base0))
+ step0 = const0_rtx;
+ else if (!vec_series_p (op0, &base0, &step0))
+ return NULL_RTX;
+
+ rtx base1, step1;
+ if (vec_duplicate_p (op1, &base1))
+ step1 = const0_rtx;
+ else if (!vec_series_p (op1, &base1, &step1))
+ return NULL_RTX;
+
+ /* Only create a new series if we can simplify both parts. In other
+ cases this isn't really a simplification, and it's not necessarily
+ a win to replace a vector operation with a scalar operation. */
+ scalar_mode inner_mode = GET_MODE_INNER (mode);
+ rtx new_base = simplify_binary_operation (code, inner_mode, base0, base1);
+ if (!new_base)
+ return NULL_RTX;
+
+ rtx new_step = simplify_binary_operation (code, inner_mode, step0, step1);
+ if (!new_step)
+ return NULL_RTX;
+
+ return gen_vec_series (mode, new_base, new_step);
+}
+
/* Subroutine of simplify_binary_operation. Simplify a binary operation
CODE with result mode MODE, operating on OP0 and OP1. If OP0 and/or
OP1 are constant pool references, TRUEOP0 and TRUEOP1 represent the
@@ -2333,6 +2389,14 @@ simplify_binary_operation_1 (enum rtx_co
if (tem)
return tem;
}
+
+ /* Handle vector series. */
+ if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT)
+ {
+ tem = simplify_binary_operation_series (code, mode, op0, op1);
+ if (tem)
+ return tem;
+ }
break;
case COMPARE:
@@ -2544,6 +2608,14 @@ simplify_binary_operation_1 (enum rtx_co
|| plus_minus_operand_p (op1))
&& (tem = simplify_plus_minus (code, mode, op0, op1)) != 0)
return tem;
+
+ /* Handle vector series. */
+ if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT)
+ {
+ tem = simplify_binary_operation_series (code, mode, op0, op1);
+ if (tem)
+ return tem;
+ }
break;
case MULT:
@@ -3495,6 +3567,11 @@ simplify_binary_operation_1 (enum rtx_co
/* ??? There are simplifications that can be done. */
return 0;
+ case VEC_SERIES:
+ if (op1 == CONST0_RTX (GET_MODE_INNER (mode)))
+ return gen_vec_duplicate (mode, op0);
+ return 0;
+
case VEC_SELECT:
if (!VECTOR_MODE_P (mode))
{
@@ -6490,6 +6567,60 @@ test_vector_ops_duplicate (machine_mode
}
}
+/* Test vector simplifications involving VEC_SERIES in which the
+ operands and result have vector mode MODE. SCALAR_REG is a pseudo
+ register that holds one element of MODE. */
+
+static void
+test_vector_ops_series (machine_mode mode, rtx scalar_reg)
+{
+ /* Test unary cases with VEC_SERIES arguments. */
+ scalar_mode inner_mode = GET_MODE_INNER (mode);
+ rtx duplicate = gen_rtx_VEC_DUPLICATE (mode, scalar_reg);
+ rtx neg_scalar_reg = gen_rtx_NEG (inner_mode, scalar_reg);
+ rtx series_0_r = gen_rtx_VEC_SERIES (mode, const0_rtx, scalar_reg);
+ rtx series_0_nr = gen_rtx_VEC_SERIES (mode, const0_rtx, neg_scalar_reg);
+ rtx series_nr_1 = gen_rtx_VEC_SERIES (mode, neg_scalar_reg, const1_rtx);
+ rtx series_r_m1 = gen_rtx_VEC_SERIES (mode, scalar_reg, constm1_rtx);
+ rtx series_r_r = gen_rtx_VEC_SERIES (mode, scalar_reg, scalar_reg);
+ rtx series_nr_nr = gen_rtx_VEC_SERIES (mode, neg_scalar_reg,
+ neg_scalar_reg);
+ ASSERT_RTX_EQ (series_0_r,
+ simplify_unary_operation (NEG, mode, series_0_nr, mode));
+ ASSERT_RTX_EQ (series_r_m1,
+ simplify_unary_operation (NEG, mode, series_nr_1, mode));
+ ASSERT_RTX_EQ (series_r_r,
+ simplify_unary_operation (NEG, mode, series_nr_nr, mode));
+
+ /* Test that a VEC_SERIES with a zero step is simplified away. */
+ ASSERT_RTX_EQ (duplicate,
+ simplify_binary_operation (VEC_SERIES, mode,
+ scalar_reg, const0_rtx));
+
+ /* Test PLUS and MINUS with VEC_SERIES. */
+ rtx series_0_1 = gen_const_vec_series (mode, const0_rtx, const1_rtx);
+ rtx series_0_m1 = gen_const_vec_series (mode, const0_rtx, constm1_rtx);
+ rtx series_r_1 = gen_rtx_VEC_SERIES (mode, scalar_reg, const1_rtx);
+ ASSERT_RTX_EQ (series_r_r,
+ simplify_binary_operation (PLUS, mode, series_0_r,
+ duplicate));
+ ASSERT_RTX_EQ (series_r_1,
+ simplify_binary_operation (PLUS, mode, duplicate,
+ series_0_1));
+ ASSERT_RTX_EQ (series_r_m1,
+ simplify_binary_operation (PLUS, mode, duplicate,
+ series_0_m1));
+ ASSERT_RTX_EQ (series_0_r,
+ simplify_binary_operation (MINUS, mode, series_r_r,
+ duplicate));
+ ASSERT_RTX_EQ (series_r_m1,
+ simplify_binary_operation (MINUS, mode, duplicate,
+ series_0_1));
+ ASSERT_RTX_EQ (series_r_1,
+ simplify_binary_operation (MINUS, mode, duplicate,
+ series_0_m1));
+}
+
/* Verify some simplifications involving vectors. */
static void
@@ -6502,6 +6633,9 @@ test_vector_ops ()
{
rtx scalar_reg = make_test_reg (GET_MODE_INNER (mode));
test_vector_ops_duplicate (mode, scalar_reg);
+ if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT
+ && GET_MODE_NUNITS (mode) > 2)
+ test_vector_ops_series (mode, scalar_reg);
}
}
}
Index: gcc/config/powerpcspe/altivec.md
===================================================================
--- gcc/config/powerpcspe/altivec.md 2017-10-23 11:41:32.366050264 +0100
+++ gcc/config/powerpcspe/altivec.md 2017-10-23 11:41:41.546050496 +0100
@@ -2456,13 +2456,10 @@ (define_expand "altivec_lvsl"
emit_insn (gen_altivec_lvsl_direct (operands[0], operands[1]));
else
{
- int i;
- rtx mask, perm[16], constv, vperm;
+ rtx mask, constv, vperm;
mask = gen_reg_rtx (V16QImode);
emit_insn (gen_altivec_lvsl_direct (mask, operands[1]));
- for (i = 0; i < 16; ++i)
- perm[i] = GEN_INT (i);
- constv = gen_rtx_CONST_VECTOR (V16QImode, gen_rtvec_v (16, perm));
+ constv = gen_const_vec_series (V16QImode, const0_rtx, const1_rtx);
constv = force_reg (V16QImode, constv);
vperm = gen_rtx_UNSPEC (V16QImode, gen_rtvec (3, mask, mask, constv),
UNSPEC_VPERM);
@@ -2488,13 +2485,10 @@ (define_expand "altivec_lvsr"
emit_insn (gen_altivec_lvsr_direct (operands[0], operands[1]));
else
{
- int i;
- rtx mask, perm[16], constv, vperm;
+ rtx mask, constv, vperm;
mask = gen_reg_rtx (V16QImode);
emit_insn (gen_altivec_lvsr_direct (mask, operands[1]));
- for (i = 0; i < 16; ++i)
- perm[i] = GEN_INT (i);
- constv = gen_rtx_CONST_VECTOR (V16QImode, gen_rtvec_v (16, perm));
+ constv = gen_const_vec_series (V16QImode, const0_rtx, const1_rtx);
constv = force_reg (V16QImode, constv);
vperm = gen_rtx_UNSPEC (V16QImode, gen_rtvec (3, mask, mask, constv),
UNSPEC_VPERM);
Index: gcc/config/rs6000/altivec.md
===================================================================
--- gcc/config/rs6000/altivec.md 2017-10-23 11:41:32.366050264 +0100
+++ gcc/config/rs6000/altivec.md 2017-10-23 11:41:41.547050496 +0100
@@ -2573,13 +2573,10 @@ (define_expand "altivec_lvsl"
emit_insn (gen_altivec_lvsl_direct (operands[0], operands[1]));
else
{
- int i;
- rtx mask, perm[16], constv, vperm;
+ rtx mask, constv, vperm;
mask = gen_reg_rtx (V16QImode);
emit_insn (gen_altivec_lvsl_direct (mask, operands[1]));
- for (i = 0; i < 16; ++i)
- perm[i] = GEN_INT (i);
- constv = gen_rtx_CONST_VECTOR (V16QImode, gen_rtvec_v (16, perm));
+ constv = gen_const_vec_series (V16QImode, const0_rtx, const1_rtx);
constv = force_reg (V16QImode, constv);
vperm = gen_rtx_UNSPEC (V16QImode, gen_rtvec (3, mask, mask, constv),
UNSPEC_VPERM);
@@ -2614,13 +2611,10 @@ (define_expand "altivec_lvsr"
emit_insn (gen_altivec_lvsr_direct (operands[0], operands[1]));
else
{
- int i;
- rtx mask, perm[16], constv, vperm;
+ rtx mask, constv, vperm;
mask = gen_reg_rtx (V16QImode);
emit_insn (gen_altivec_lvsr_direct (mask, operands[1]));
- for (i = 0; i < 16; ++i)
- perm[i] = GEN_INT (i);
- constv = gen_rtx_CONST_VECTOR (V16QImode, gen_rtvec_v (16, perm));
+ constv = gen_const_vec_series (V16QImode, const0_rtx, const1_rtx);
constv = force_reg (V16QImode, constv);
vperm = gen_rtx_UNSPEC (V16QImode, gen_rtvec (3, mask, mask, constv),
UNSPEC_VPERM);
* [05/nn] Add VEC_DUPLICATE_{CST,EXPR} and associated optab
2017-10-23 11:16 [00/nn] Patches preparing for runtime offsets and sizes Richard Sandiford
` (3 preceding siblings ...)
2017-10-23 11:20 ` [04/nn] Add a VEC_SERIES rtl code Richard Sandiford
@ 2017-10-23 11:21 ` Richard Sandiford
2017-10-26 11:53 ` Richard Biener
2017-12-15 0:29 ` Richard Sandiford
2017-10-23 11:22 ` [06/nn] Add VEC_SERIES_{CST,EXPR} " Richard Sandiford
` (16 subsequent siblings)
21 siblings, 2 replies; 90+ messages in thread
From: Richard Sandiford @ 2017-10-23 11:21 UTC (permalink / raw)
To: gcc-patches
SVE needs a way of broadcasting a scalar to a variable-length vector.
This patch adds VEC_DUPLICATE_CST, used where fixed-length vectors
would use VECTOR_CST, and VEC_DUPLICATE_EXPR, used where fixed-length
vectors would use a CONSTRUCTOR.  VEC_DUPLICATE_EXPR is the tree
equivalent of the existing rtl code VEC_DUPLICATE.
Originally we had a single VEC_DUPLICATE_EXPR and used TREE_CONSTANT
to mark constant nodes, but in response to last year's RFC, Richard B.
suggested it would be better to have separate codes for the constant
and non-constant cases. This allows VEC_DUPLICATE_EXPR to be treated
as a normal unary operation and avoids the previous need for treating
it as a GIMPLE_SINGLE_RHS.
It might make sense to use VEC_DUPLICATE_CST for all duplicated
vector constants, since it's a bit more compact than VECTOR_CST
in that case, and is potentially more efficient to process.
However, the nice thing about keeping it restricted to variable-length
vectors is that there is then no need to handle combinations of
VECTOR_CST and VEC_DUPLICATE_CST; a given vector type either always
uses VECTOR_CST or never uses it.
The patch also adds a vec_duplicate_optab to go with VEC_DUPLICATE_EXPR.
2017-10-23 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* doc/generic.texi (VEC_DUPLICATE_CST, VEC_DUPLICATE_EXPR): Document.
(VEC_COND_EXPR): Add missing @tindex.
* doc/md.texi (vec_duplicate@var{m}): Document.
* tree.def (VEC_DUPLICATE_CST, VEC_DUPLICATE_EXPR): New tree codes.
* tree-core.h (tree_base): Document that u.nelts and TREE_OVERFLOW
are used for VEC_DUPLICATE_CST as well.
(tree_vector): Access base.u.nelts directly.
* tree.h (TREE_OVERFLOW): Add VEC_DUPLICATE_CST to the list of
valid codes.
(VEC_DUPLICATE_CST_ELT): New macro.
(build_vec_duplicate_cst): Declare.
* tree.c (tree_node_structure_for_code, tree_code_size, tree_size)
(integer_zerop, integer_onep, integer_all_onesp, integer_truep)
(real_zerop, real_onep, real_minus_onep, add_expr, initializer_zerop)
(walk_tree_1, drop_tree_overflow): Handle VEC_DUPLICATE_CST.
(build_vec_duplicate_cst): New function.
(uniform_vector_p): Handle the new codes.
(test_vec_duplicate_predicates_int): New function.
(test_vec_duplicate_predicates_float): Likewise.
(test_vec_duplicate_predicates): Likewise.
(tree_c_tests): Call test_vec_duplicate_predicates.
* cfgexpand.c (expand_debug_expr): Handle the new codes.
* tree-pretty-print.c (dump_generic_node): Likewise.
* dwarf2out.c (rtl_for_decl_init): Handle VEC_DUPLICATE_CST.
* gimple-expr.h (is_gimple_constant): Likewise.
* gimplify.c (gimplify_expr): Likewise.
* graphite-isl-ast-to-gimple.c
(translate_isl_ast_to_gimple::is_constant): Likewise.
* graphite-scop-detection.c (scan_tree_for_params): Likewise.
* ipa-icf-gimple.c (func_checker::compare_cst_or_decl): Likewise.
(func_checker::compare_operand): Likewise.
* ipa-icf.c (sem_item::add_expr, sem_variable::equals): Likewise.
* match.pd (negate_expr_p): Likewise.
* print-tree.c (print_node): Likewise.
* tree-chkp.c (chkp_find_bounds_1): Likewise.
* tree-loop-distribution.c (const_with_all_bytes_same): Likewise.
* tree-ssa-loop.c (for_each_index): Likewise.
* tree-ssa-pre.c (create_component_ref_by_pieces_1): Likewise.
* tree-ssa-sccvn.c (copy_reference_ops_from_ref): Likewise.
(ao_ref_init_from_vn_reference): Likewise.
* tree-vect-generic.c (ssa_uniform_vector_p): Likewise.
* varasm.c (const_hash_1, compare_constant): Likewise.
* fold-const.c (negate_expr_p, fold_negate_expr_1, const_binop)
(fold_convert_const, operand_equal_p, fold_view_convert_expr)
(exact_inverse, fold_checksum_tree): Likewise.
(const_unop): Likewise. Fold VEC_DUPLICATE_EXPRs of a constant.
(test_vec_duplicate_folding): New function.
(fold_const_c_tests): Call it.
* optabs.def (vec_duplicate_optab): New optab.
* optabs-tree.c (optab_for_tree_code): Handle VEC_DUPLICATE_EXPR.
* optabs.h (expand_vector_broadcast): Declare.
* optabs.c (expand_vector_broadcast): Make non-static. Try using
vec_duplicate_optab.
* expr.c (store_constructor): Try using vec_duplicate_optab for
uniform vectors.
(const_vector_element): New function, split out from...
(const_vector_from_tree): ...here.
(expand_expr_real_2): Handle VEC_DUPLICATE_EXPR.
(expand_expr_real_1): Handle VEC_DUPLICATE_CST.
* internal-fn.c (expand_vector_ubsan_overflow): Use CONSTANT_P
instead of checking for VECTOR_CST.
* tree-cfg.c (verify_gimple_assign_unary): Handle VEC_DUPLICATE_EXPR.
(verify_gimple_assign_single): Handle VEC_DUPLICATE_CST.
* tree-inline.c (estimate_operator_cost): Handle VEC_DUPLICATE_EXPR.
Index: gcc/doc/generic.texi
===================================================================
--- gcc/doc/generic.texi 2017-10-23 11:38:53.934094740 +0100
+++ gcc/doc/generic.texi 2017-10-23 11:41:51.760448406 +0100
@@ -1036,6 +1036,7 @@ As this example indicates, the operands
@tindex FIXED_CST
@tindex COMPLEX_CST
@tindex VECTOR_CST
+@tindex VEC_DUPLICATE_CST
@tindex STRING_CST
@findex TREE_STRING_LENGTH
@findex TREE_STRING_POINTER
@@ -1089,6 +1090,14 @@ constant nodes. Each individual constan
double constant node. The first operand is a @code{TREE_LIST} of the
constant nodes and is accessed through @code{TREE_VECTOR_CST_ELTS}.
+@item VEC_DUPLICATE_CST
+These nodes represent a vector constant in which every element has the
+same scalar value. At present only variable-length vectors use
+@code{VEC_DUPLICATE_CST}; constant-length vectors use @code{VECTOR_CST}
+instead. The scalar element value is given by
+@code{VEC_DUPLICATE_CST_ELT} and has the same restrictions as the
+element of a @code{VECTOR_CST}.
+
@item STRING_CST
These nodes represent string-constants. The @code{TREE_STRING_LENGTH}
returns the length of the string, as an @code{int}. The
@@ -1692,6 +1701,7 @@ a value from @code{enum annot_expr_kind}
@node Vectors
@subsection Vectors
+@tindex VEC_DUPLICATE_EXPR
@tindex VEC_LSHIFT_EXPR
@tindex VEC_RSHIFT_EXPR
@tindex VEC_WIDEN_MULT_HI_EXPR
@@ -1703,9 +1713,14 @@ a value from @code{enum annot_expr_kind}
@tindex VEC_PACK_TRUNC_EXPR
@tindex VEC_PACK_SAT_EXPR
@tindex VEC_PACK_FIX_TRUNC_EXPR
+@tindex VEC_COND_EXPR
@tindex SAD_EXPR
@table @code
+@item VEC_DUPLICATE_EXPR
+This node has a single operand and represents a vector in which every
+element is equal to that operand.
+
@item VEC_LSHIFT_EXPR
@itemx VEC_RSHIFT_EXPR
These nodes represent whole vector left and right shifts, respectively.
Index: gcc/doc/md.texi
===================================================================
--- gcc/doc/md.texi 2017-10-23 11:41:22.189466342 +0100
+++ gcc/doc/md.texi 2017-10-23 11:41:51.761413027 +0100
@@ -4888,6 +4888,17 @@ and operand 1 is parallel containing val
the vector mode @var{m}, or a vector mode with the same element mode and
smaller number of elements.
+@cindex @code{vec_duplicate@var{m}} instruction pattern
+@item @samp{vec_duplicate@var{m}}
+Initialize vector output operand 0 so that each element has the value given
+by scalar input operand 1. The vector has mode @var{m} and the scalar has
+the mode appropriate for one element of @var{m}.
+
+This pattern only handles duplicates of non-constant inputs. Constant
+vectors go through the @code{mov@var{m}} pattern instead.
+
+This pattern is not allowed to @code{FAIL}.
+
@cindex @code{vec_cmp@var{m}@var{n}} instruction pattern
@item @samp{vec_cmp@var{m}@var{n}}
Output a vector comparison. Operand 0 of mode @var{n} is the destination for
Index: gcc/tree.def
===================================================================
--- gcc/tree.def 2017-10-23 11:38:53.934094740 +0100
+++ gcc/tree.def 2017-10-23 11:41:51.774917721 +0100
@@ -304,6 +304,10 @@ DEFTREECODE (COMPLEX_CST, "complex_cst",
/* Contents are in VECTOR_CST_ELTS field. */
DEFTREECODE (VECTOR_CST, "vector_cst", tcc_constant, 0)
+/* Represents a vector constant in which every element is equal to
+ VEC_DUPLICATE_CST_ELT. */
+DEFTREECODE (VEC_DUPLICATE_CST, "vec_duplicate_cst", tcc_constant, 0)
+
/* Contents are TREE_STRING_LENGTH and the actual contents of the string. */
DEFTREECODE (STRING_CST, "string_cst", tcc_constant, 0)
@@ -534,6 +538,9 @@ DEFTREECODE (TARGET_EXPR, "target_expr",
1 and 2 are NULL. The operands are then taken from the cfg edges. */
DEFTREECODE (COND_EXPR, "cond_expr", tcc_expression, 3)
+/* Represents a vector in which every element is equal to operand 0. */
+DEFTREECODE (VEC_DUPLICATE_EXPR, "vec_duplicate_expr", tcc_unary, 1)
+
/* Vector conditional expression. It is like COND_EXPR, but with
vector operands.
Index: gcc/tree-core.h
===================================================================
--- gcc/tree-core.h 2017-10-23 11:41:25.862065318 +0100
+++ gcc/tree-core.h 2017-10-23 11:41:51.771059237 +0100
@@ -975,7 +975,8 @@ struct GTY(()) tree_base {
/* VEC length. This field is only used with TREE_VEC. */
int length;
- /* Number of elements. This field is only used with VECTOR_CST. */
+ /* Number of elements. This field is only used with VECTOR_CST
+ and VEC_DUPLICATE_CST. It is always 1 for VEC_DUPLICATE_CST. */
unsigned int nelts;
/* SSA version number. This field is only used with SSA_NAME. */
@@ -1065,7 +1066,7 @@ struct GTY(()) tree_base {
public_flag:
TREE_OVERFLOW in
- INTEGER_CST, REAL_CST, COMPLEX_CST, VECTOR_CST
+ INTEGER_CST, REAL_CST, COMPLEX_CST, VECTOR_CST, VEC_DUPLICATE_CST
TREE_PUBLIC in
VAR_DECL, FUNCTION_DECL
@@ -1332,7 +1333,7 @@ struct GTY(()) tree_complex {
struct GTY(()) tree_vector {
struct tree_typed typed;
- tree GTY ((length ("VECTOR_CST_NELTS ((tree) &%h)"))) elts[1];
+ tree GTY ((length ("((tree) &%h)->base.u.nelts"))) elts[1];
};
struct GTY(()) tree_identifier {
Index: gcc/tree.h
===================================================================
--- gcc/tree.h 2017-10-23 11:41:23.517482774 +0100
+++ gcc/tree.h 2017-10-23 11:41:51.775882341 +0100
@@ -730,8 +730,8 @@ #define TREE_SYMBOL_REFERENCED(NODE) \
#define TYPE_REF_CAN_ALIAS_ALL(NODE) \
(PTR_OR_REF_CHECK (NODE)->base.static_flag)
-/* In an INTEGER_CST, REAL_CST, COMPLEX_CST, or VECTOR_CST, this means
- there was an overflow in folding. */
+/* In an INTEGER_CST, REAL_CST, COMPLEX_CST, VECTOR_CST or VEC_DUPLICATE_CST,
+ this means there was an overflow in folding. */
#define TREE_OVERFLOW(NODE) (CST_CHECK (NODE)->base.public_flag)
@@ -1030,6 +1030,10 @@ #define VECTOR_CST_NELTS(NODE) (VECTOR_C
#define VECTOR_CST_ELTS(NODE) (VECTOR_CST_CHECK (NODE)->vector.elts)
#define VECTOR_CST_ELT(NODE,IDX) (VECTOR_CST_CHECK (NODE)->vector.elts[IDX])
+/* In a VEC_DUPLICATE_CST node. */
+#define VEC_DUPLICATE_CST_ELT(NODE) \
+ (VEC_DUPLICATE_CST_CHECK (NODE)->vector.elts[0])
+
/* Define fields and accessors for some special-purpose tree nodes. */
#define IDENTIFIER_LENGTH(NODE) \
@@ -4025,6 +4029,7 @@ extern tree build_int_cst (tree, HOST_WI
extern tree build_int_cstu (tree type, unsigned HOST_WIDE_INT cst);
extern tree build_int_cst_type (tree, HOST_WIDE_INT);
extern tree make_vector (unsigned CXX_MEM_STAT_INFO);
+extern tree build_vec_duplicate_cst (tree, tree CXX_MEM_STAT_INFO);
extern tree build_vector (tree, vec<tree> CXX_MEM_STAT_INFO);
extern tree build_vector_from_ctor (tree, vec<constructor_elt, va_gc> *);
extern tree build_vector_from_val (tree, tree);
Index: gcc/tree.c
===================================================================
--- gcc/tree.c 2017-10-23 11:41:23.515548300 +0100
+++ gcc/tree.c 2017-10-23 11:41:51.774917721 +0100
@@ -464,6 +464,7 @@ tree_node_structure_for_code (enum tree_
case FIXED_CST: return TS_FIXED_CST;
case COMPLEX_CST: return TS_COMPLEX;
case VECTOR_CST: return TS_VECTOR;
+ case VEC_DUPLICATE_CST: return TS_VECTOR;
case STRING_CST: return TS_STRING;
/* tcc_exceptional cases. */
case ERROR_MARK: return TS_COMMON;
@@ -816,6 +817,7 @@ tree_code_size (enum tree_code code)
case FIXED_CST: return sizeof (struct tree_fixed_cst);
case COMPLEX_CST: return sizeof (struct tree_complex);
case VECTOR_CST: return sizeof (struct tree_vector);
+ case VEC_DUPLICATE_CST: return sizeof (struct tree_vector);
case STRING_CST: gcc_unreachable ();
default:
return lang_hooks.tree_size (code);
@@ -875,6 +877,9 @@ tree_size (const_tree node)
return (sizeof (struct tree_vector)
+ (VECTOR_CST_NELTS (node) - 1) * sizeof (tree));
+ case VEC_DUPLICATE_CST:
+ return sizeof (struct tree_vector);
+
case STRING_CST:
return TREE_STRING_LENGTH (node) + offsetof (struct tree_string, str) + 1;
@@ -1682,6 +1687,30 @@ cst_and_fits_in_hwi (const_tree x)
&& (tree_fits_shwi_p (x) || tree_fits_uhwi_p (x)));
}
+/* Build a new VEC_DUPLICATE_CST with type TYPE and operand EXP.
+
+ Note that this function is only suitable for callers that specifically
+ need a VEC_DUPLICATE_CST node. Use build_vector_from_val to duplicate
+ a general scalar into a general vector type. */
+
+tree
+build_vec_duplicate_cst (tree type, tree exp MEM_STAT_DECL)
+{
+ int length = sizeof (struct tree_vector);
+
+ record_node_allocation_statistics (VEC_DUPLICATE_CST, length);
+
+ tree t = ggc_alloc_cleared_tree_node_stat (length PASS_MEM_STAT);
+
+ TREE_SET_CODE (t, VEC_DUPLICATE_CST);
+ TREE_TYPE (t) = type;
+ t->base.u.nelts = 1;
+ VEC_DUPLICATE_CST_ELT (t) = exp;
+ TREE_CONSTANT (t) = 1;
+
+ return t;
+}
+
/* Build a newly constructed VECTOR_CST node of length LEN. */
tree
@@ -2343,6 +2372,8 @@ integer_zerop (const_tree expr)
return false;
return true;
}
+ case VEC_DUPLICATE_CST:
+ return integer_zerop (VEC_DUPLICATE_CST_ELT (expr));
default:
return false;
}
@@ -2369,6 +2400,8 @@ integer_onep (const_tree expr)
return false;
return true;
}
+ case VEC_DUPLICATE_CST:
+ return integer_onep (VEC_DUPLICATE_CST_ELT (expr));
default:
return false;
}
@@ -2407,6 +2440,9 @@ integer_all_onesp (const_tree expr)
return 1;
}
+ else if (TREE_CODE (expr) == VEC_DUPLICATE_CST)
+ return integer_all_onesp (VEC_DUPLICATE_CST_ELT (expr));
+
else if (TREE_CODE (expr) != INTEGER_CST)
return 0;
@@ -2463,7 +2499,7 @@ integer_nonzerop (const_tree expr)
int
integer_truep (const_tree expr)
{
- if (TREE_CODE (expr) == VECTOR_CST)
+ if (TREE_CODE (expr) == VECTOR_CST || TREE_CODE (expr) == VEC_DUPLICATE_CST)
return integer_all_onesp (expr);
return integer_onep (expr);
}
@@ -2634,6 +2670,8 @@ real_zerop (const_tree expr)
return false;
return true;
}
+ case VEC_DUPLICATE_CST:
+ return real_zerop (VEC_DUPLICATE_CST_ELT (expr));
default:
return false;
}
@@ -2662,6 +2700,8 @@ real_onep (const_tree expr)
return false;
return true;
}
+ case VEC_DUPLICATE_CST:
+ return real_onep (VEC_DUPLICATE_CST_ELT (expr));
default:
return false;
}
@@ -2689,6 +2729,8 @@ real_minus_onep (const_tree expr)
return false;
return true;
}
+ case VEC_DUPLICATE_CST:
+ return real_minus_onep (VEC_DUPLICATE_CST_ELT (expr));
default:
return false;
}
@@ -7091,6 +7133,9 @@ add_expr (const_tree t, inchash::hash &h
inchash::add_expr (VECTOR_CST_ELT (t, i), hstate, flags);
return;
}
+ case VEC_DUPLICATE_CST:
+ inchash::add_expr (VEC_DUPLICATE_CST_ELT (t), hstate);
+ return;
case SSA_NAME:
/* We can just compare by pointer. */
hstate.add_wide_int (SSA_NAME_VERSION (t));
@@ -10345,6 +10390,9 @@ initializer_zerop (const_tree init)
return true;
}
+ case VEC_DUPLICATE_CST:
+ return initializer_zerop (VEC_DUPLICATE_CST_ELT (init));
+
case CONSTRUCTOR:
{
unsigned HOST_WIDE_INT idx;
@@ -10390,7 +10438,13 @@ uniform_vector_p (const_tree vec)
gcc_assert (VECTOR_TYPE_P (TREE_TYPE (vec)));
- if (TREE_CODE (vec) == VECTOR_CST)
+ if (TREE_CODE (vec) == VEC_DUPLICATE_CST)
+ return VEC_DUPLICATE_CST_ELT (vec);
+
+ else if (TREE_CODE (vec) == VEC_DUPLICATE_EXPR)
+ return TREE_OPERAND (vec, 0);
+
+ else if (TREE_CODE (vec) == VECTOR_CST)
{
first = VECTOR_CST_ELT (vec, 0);
for (i = 1; i < VECTOR_CST_NELTS (vec); ++i)
@@ -11095,6 +11149,7 @@ #define WALK_SUBTREE_TAIL(NODE) \
case REAL_CST:
case FIXED_CST:
case VECTOR_CST:
+ case VEC_DUPLICATE_CST:
case STRING_CST:
case BLOCK:
case PLACEHOLDER_EXPR:
@@ -12381,6 +12436,12 @@ drop_tree_overflow (tree t)
elt = drop_tree_overflow (elt);
}
}
+ if (TREE_CODE (t) == VEC_DUPLICATE_CST)
+ {
+ tree *elt = &VEC_DUPLICATE_CST_ELT (t);
+ if (TREE_OVERFLOW (*elt))
+ *elt = drop_tree_overflow (*elt);
+ }
return t;
}
@@ -13798,6 +13859,92 @@ test_integer_constants ()
ASSERT_EQ (type, TREE_TYPE (zero));
}
+/* Verify predicate handling of VEC_DUPLICATE_CSTs and VEC_DUPLICATE_EXPRs
+ for integral type TYPE. */
+
+static void
+test_vec_duplicate_predicates_int (tree type)
+{
+ tree vec_type = build_vector_type (type, 4);
+
+ tree zero = build_zero_cst (type);
+ tree vec_zero = build_vec_duplicate_cst (vec_type, zero);
+ ASSERT_TRUE (integer_zerop (vec_zero));
+ ASSERT_FALSE (integer_onep (vec_zero));
+ ASSERT_FALSE (integer_minus_onep (vec_zero));
+ ASSERT_FALSE (integer_all_onesp (vec_zero));
+ ASSERT_FALSE (integer_truep (vec_zero));
+ ASSERT_TRUE (initializer_zerop (vec_zero));
+
+ tree one = build_one_cst (type);
+ tree vec_one = build_vec_duplicate_cst (vec_type, one);
+ ASSERT_FALSE (integer_zerop (vec_one));
+ ASSERT_TRUE (integer_onep (vec_one));
+ ASSERT_FALSE (integer_minus_onep (vec_one));
+ ASSERT_FALSE (integer_all_onesp (vec_one));
+ ASSERT_FALSE (integer_truep (vec_one));
+ ASSERT_FALSE (initializer_zerop (vec_one));
+
+ tree minus_one = build_minus_one_cst (type);
+ tree vec_minus_one = build_vec_duplicate_cst (vec_type, minus_one);
+ ASSERT_FALSE (integer_zerop (vec_minus_one));
+ ASSERT_FALSE (integer_onep (vec_minus_one));
+ ASSERT_TRUE (integer_minus_onep (vec_minus_one));
+ ASSERT_TRUE (integer_all_onesp (vec_minus_one));
+ ASSERT_TRUE (integer_truep (vec_minus_one));
+ ASSERT_FALSE (initializer_zerop (vec_minus_one));
+
+ tree x = create_tmp_var_raw (type, "x");
+ tree vec_x = build1 (VEC_DUPLICATE_EXPR, vec_type, x);
+ ASSERT_EQ (uniform_vector_p (vec_zero), zero);
+ ASSERT_EQ (uniform_vector_p (vec_one), one);
+ ASSERT_EQ (uniform_vector_p (vec_minus_one), minus_one);
+ ASSERT_EQ (uniform_vector_p (vec_x), x);
+}
+
+/* Verify predicate handling of VEC_DUPLICATE_CSTs for floating-point
+ type TYPE. */
+
+static void
+test_vec_duplicate_predicates_float (tree type)
+{
+ tree vec_type = build_vector_type (type, 4);
+
+ tree zero = build_zero_cst (type);
+ tree vec_zero = build_vec_duplicate_cst (vec_type, zero);
+ ASSERT_TRUE (real_zerop (vec_zero));
+ ASSERT_FALSE (real_onep (vec_zero));
+ ASSERT_FALSE (real_minus_onep (vec_zero));
+ ASSERT_TRUE (initializer_zerop (vec_zero));
+
+ tree one = build_one_cst (type);
+ tree vec_one = build_vec_duplicate_cst (vec_type, one);
+ ASSERT_FALSE (real_zerop (vec_one));
+ ASSERT_TRUE (real_onep (vec_one));
+ ASSERT_FALSE (real_minus_onep (vec_one));
+ ASSERT_FALSE (initializer_zerop (vec_one));
+
+ tree minus_one = build_minus_one_cst (type);
+ tree vec_minus_one = build_vec_duplicate_cst (vec_type, minus_one);
+ ASSERT_FALSE (real_zerop (vec_minus_one));
+ ASSERT_FALSE (real_onep (vec_minus_one));
+ ASSERT_TRUE (real_minus_onep (vec_minus_one));
+ ASSERT_FALSE (initializer_zerop (vec_minus_one));
+
+ ASSERT_EQ (uniform_vector_p (vec_zero), zero);
+ ASSERT_EQ (uniform_vector_p (vec_one), one);
+ ASSERT_EQ (uniform_vector_p (vec_minus_one), minus_one);
+}
+
+/* Verify predicate handling of VEC_DUPLICATE_CSTs and VEC_DUPLICATE_EXPRs. */
+
+static void
+test_vec_duplicate_predicates ()
+{
+ test_vec_duplicate_predicates_int (integer_type_node);
+ test_vec_duplicate_predicates_float (float_type_node);
+}
+
/* Verify identifiers. */
static void
@@ -13826,6 +13973,7 @@ test_labels ()
tree_c_tests ()
{
test_integer_constants ();
+ test_vec_duplicate_predicates ();
test_identifiers ();
test_labels ();
}
Index: gcc/cfgexpand.c
===================================================================
--- gcc/cfgexpand.c 2017-10-23 11:41:23.137358624 +0100
+++ gcc/cfgexpand.c 2017-10-23 11:41:51.760448406 +0100
@@ -5049,6 +5049,8 @@ expand_debug_expr (tree exp)
case VEC_WIDEN_LSHIFT_HI_EXPR:
case VEC_WIDEN_LSHIFT_LO_EXPR:
case VEC_PERM_EXPR:
+ case VEC_DUPLICATE_CST:
+ case VEC_DUPLICATE_EXPR:
return NULL;
/* Misc codes. */
Index: gcc/tree-pretty-print.c
===================================================================
--- gcc/tree-pretty-print.c 2017-10-23 11:38:53.934094740 +0100
+++ gcc/tree-pretty-print.c 2017-10-23 11:41:51.772023858 +0100
@@ -1802,6 +1802,12 @@ dump_generic_node (pretty_printer *pp, t
}
break;
+ case VEC_DUPLICATE_CST:
+ pp_string (pp, "{ ");
+ dump_generic_node (pp, VEC_DUPLICATE_CST_ELT (node), spc, flags, false);
+ pp_string (pp, ", ... }");
+ break;
+
case FUNCTION_TYPE:
case METHOD_TYPE:
dump_generic_node (pp, TREE_TYPE (node), spc, flags, false);
@@ -3231,6 +3237,15 @@ dump_generic_node (pretty_printer *pp, t
pp_string (pp, " > ");
break;
+ case VEC_DUPLICATE_EXPR:
+ pp_space (pp);
+ for (str = get_tree_code_name (code); *str; str++)
+ pp_character (pp, TOUPPER (*str));
+ pp_string (pp, " < ");
+ dump_generic_node (pp, TREE_OPERAND (node, 0), spc, flags, false);
+ pp_string (pp, " > ");
+ break;
+
case VEC_UNPACK_HI_EXPR:
pp_string (pp, " VEC_UNPACK_HI_EXPR < ");
dump_generic_node (pp, TREE_OPERAND (node, 0), spc, flags, false);
Index: gcc/dwarf2out.c
===================================================================
--- gcc/dwarf2out.c 2017-10-23 11:41:24.407340836 +0100
+++ gcc/dwarf2out.c 2017-10-23 11:41:51.763342269 +0100
@@ -18862,6 +18862,7 @@ rtl_for_decl_init (tree init, tree type)
switch (TREE_CODE (init))
{
case VECTOR_CST:
+ case VEC_DUPLICATE_CST:
break;
case CONSTRUCTOR:
if (TREE_CONSTANT (init))
Index: gcc/gimple-expr.h
===================================================================
--- gcc/gimple-expr.h 2017-10-23 11:38:53.934094740 +0100
+++ gcc/gimple-expr.h 2017-10-23 11:41:51.765271511 +0100
@@ -134,6 +134,7 @@ is_gimple_constant (const_tree t)
case FIXED_CST:
case COMPLEX_CST:
case VECTOR_CST:
+ case VEC_DUPLICATE_CST:
case STRING_CST:
return true;
Index: gcc/gimplify.c
===================================================================
--- gcc/gimplify.c 2017-10-23 11:41:25.531270256 +0100
+++ gcc/gimplify.c 2017-10-23 11:41:51.766236132 +0100
@@ -11506,6 +11506,7 @@ gimplify_expr (tree *expr_p, gimple_seq
case STRING_CST:
case COMPLEX_CST:
case VECTOR_CST:
+ case VEC_DUPLICATE_CST:
/* Drop the overflow flag on constants, we do not want
that in the GIMPLE IL. */
if (TREE_OVERFLOW_P (*expr_p))
Index: gcc/graphite-isl-ast-to-gimple.c
===================================================================
--- gcc/graphite-isl-ast-to-gimple.c 2017-10-23 11:41:23.205065216 +0100
+++ gcc/graphite-isl-ast-to-gimple.c 2017-10-23 11:41:51.767200753 +0100
@@ -222,7 +222,8 @@ enum phi_node_kind
return TREE_CODE (op) == INTEGER_CST
|| TREE_CODE (op) == REAL_CST
|| TREE_CODE (op) == COMPLEX_CST
- || TREE_CODE (op) == VECTOR_CST;
+ || TREE_CODE (op) == VECTOR_CST
+ || TREE_CODE (op) == VEC_DUPLICATE_CST;
}
private:
Index: gcc/graphite-scop-detection.c
===================================================================
--- gcc/graphite-scop-detection.c 2017-10-23 11:41:25.533204730 +0100
+++ gcc/graphite-scop-detection.c 2017-10-23 11:41:51.767200753 +0100
@@ -1243,6 +1243,7 @@ scan_tree_for_params (sese_info_p s, tre
case REAL_CST:
case COMPLEX_CST:
case VECTOR_CST:
+ case VEC_DUPLICATE_CST:
break;
default:
Index: gcc/ipa-icf-gimple.c
===================================================================
--- gcc/ipa-icf-gimple.c 2017-10-23 11:38:53.934094740 +0100
+++ gcc/ipa-icf-gimple.c 2017-10-23 11:41:51.767200753 +0100
@@ -333,6 +333,7 @@ func_checker::compare_cst_or_decl (tree
case INTEGER_CST:
case COMPLEX_CST:
case VECTOR_CST:
+ case VEC_DUPLICATE_CST:
case STRING_CST:
case REAL_CST:
{
@@ -528,6 +529,7 @@ func_checker::compare_operand (tree t1,
case INTEGER_CST:
case COMPLEX_CST:
case VECTOR_CST:
+ case VEC_DUPLICATE_CST:
case STRING_CST:
case REAL_CST:
case FUNCTION_DECL:
Index: gcc/ipa-icf.c
===================================================================
--- gcc/ipa-icf.c 2017-10-23 11:41:25.874639400 +0100
+++ gcc/ipa-icf.c 2017-10-23 11:41:51.768165374 +0100
@@ -1478,6 +1478,7 @@ sem_item::add_expr (const_tree exp, inch
case STRING_CST:
case COMPLEX_CST:
case VECTOR_CST:
+ case VEC_DUPLICATE_CST:
inchash::add_expr (exp, hstate);
break;
case CONSTRUCTOR:
@@ -2030,6 +2031,9 @@ sem_variable::equals (tree t1, tree t2)
return 1;
}
+ case VEC_DUPLICATE_CST:
+ return sem_variable::equals (VEC_DUPLICATE_CST_ELT (t1),
+ VEC_DUPLICATE_CST_ELT (t2));
case ARRAY_REF:
case ARRAY_RANGE_REF:
{
Index: gcc/match.pd
===================================================================
--- gcc/match.pd 2017-10-23 11:38:53.934094740 +0100
+++ gcc/match.pd 2017-10-23 11:41:51.768165374 +0100
@@ -958,6 +958,9 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
(match negate_expr_p
VECTOR_CST
(if (FLOAT_TYPE_P (TREE_TYPE (type)) || TYPE_OVERFLOW_WRAPS (type))))
+(match negate_expr_p
+ VEC_DUPLICATE_CST
+ (if (FLOAT_TYPE_P (TREE_TYPE (type)) || TYPE_OVERFLOW_WRAPS (type))))
/* (-A) * (-B) -> A * B */
(simplify
Index: gcc/print-tree.c
===================================================================
--- gcc/print-tree.c 2017-10-23 11:38:53.934094740 +0100
+++ gcc/print-tree.c 2017-10-23 11:41:51.769129995 +0100
@@ -783,6 +783,10 @@ print_node (FILE *file, const char *pref
}
break;
+ case VEC_DUPLICATE_CST:
+ print_node (file, "elt", VEC_DUPLICATE_CST_ELT (node), indent + 4);
+ break;
+
case COMPLEX_CST:
print_node (file, "real", TREE_REALPART (node), indent + 4);
print_node (file, "imag", TREE_IMAGPART (node), indent + 4);
Index: gcc/tree-chkp.c
===================================================================
--- gcc/tree-chkp.c 2017-10-23 11:41:23.201196268 +0100
+++ gcc/tree-chkp.c 2017-10-23 11:41:51.770094616 +0100
@@ -3800,6 +3800,7 @@ chkp_find_bounds_1 (tree ptr, tree ptr_s
case INTEGER_CST:
case COMPLEX_CST:
case VECTOR_CST:
+ case VEC_DUPLICATE_CST:
if (integer_zerop (ptr_src))
bounds = chkp_get_none_bounds ();
else
Index: gcc/tree-loop-distribution.c
===================================================================
--- gcc/tree-loop-distribution.c 2017-10-23 11:41:23.228278904 +0100
+++ gcc/tree-loop-distribution.c 2017-10-23 11:41:51.771059237 +0100
@@ -921,6 +921,9 @@ const_with_all_bytes_same (tree val)
&& CONSTRUCTOR_NELTS (val) == 0))
return 0;
+ if (TREE_CODE (val) == VEC_DUPLICATE_CST)
+ return const_with_all_bytes_same (VEC_DUPLICATE_CST_ELT (val));
+
if (real_zerop (val))
{
/* Only return 0 for +0.0, not for -0.0, which doesn't have
Index: gcc/tree-ssa-loop.c
===================================================================
--- gcc/tree-ssa-loop.c 2017-10-23 11:38:53.934094740 +0100
+++ gcc/tree-ssa-loop.c 2017-10-23 11:41:51.772023858 +0100
@@ -616,6 +616,7 @@ for_each_index (tree *addr_p, bool (*cbc
case STRING_CST:
case RESULT_DECL:
case VECTOR_CST:
+ case VEC_DUPLICATE_CST:
case COMPLEX_CST:
case INTEGER_CST:
case REAL_CST:
Index: gcc/tree-ssa-pre.c
===================================================================
--- gcc/tree-ssa-pre.c 2017-10-23 11:41:25.549647760 +0100
+++ gcc/tree-ssa-pre.c 2017-10-23 11:41:51.772023858 +0100
@@ -2675,6 +2675,7 @@ create_component_ref_by_pieces_1 (basic_
case INTEGER_CST:
case COMPLEX_CST:
case VECTOR_CST:
+ case VEC_DUPLICATE_CST:
case REAL_CST:
case CONSTRUCTOR:
case VAR_DECL:
Index: gcc/tree-ssa-sccvn.c
===================================================================
--- gcc/tree-ssa-sccvn.c 2017-10-23 11:38:53.934094740 +0100
+++ gcc/tree-ssa-sccvn.c 2017-10-23 11:41:51.773953100 +0100
@@ -858,6 +858,7 @@ copy_reference_ops_from_ref (tree ref, v
case INTEGER_CST:
case COMPLEX_CST:
case VECTOR_CST:
+ case VEC_DUPLICATE_CST:
case REAL_CST:
case FIXED_CST:
case CONSTRUCTOR:
@@ -1050,6 +1051,7 @@ ao_ref_init_from_vn_reference (ao_ref *r
case INTEGER_CST:
case COMPLEX_CST:
case VECTOR_CST:
+ case VEC_DUPLICATE_CST:
case REAL_CST:
case CONSTRUCTOR:
case CONST_DECL:
Index: gcc/tree-vect-generic.c
===================================================================
--- gcc/tree-vect-generic.c 2017-10-23 11:38:53.934094740 +0100
+++ gcc/tree-vect-generic.c 2017-10-23 11:41:51.773953100 +0100
@@ -1419,6 +1419,7 @@ lower_vec_perm (gimple_stmt_iterator *gs
ssa_uniform_vector_p (tree op)
{
if (TREE_CODE (op) == VECTOR_CST
+ || TREE_CODE (op) == VEC_DUPLICATE_CST
|| TREE_CODE (op) == CONSTRUCTOR)
return uniform_vector_p (op);
if (TREE_CODE (op) == SSA_NAME)
Index: gcc/varasm.c
===================================================================
--- gcc/varasm.c 2017-10-23 11:41:25.822408600 +0100
+++ gcc/varasm.c 2017-10-23 11:41:51.775882341 +0100
@@ -3068,6 +3068,9 @@ const_hash_1 (const tree exp)
CASE_CONVERT:
return const_hash_1 (TREE_OPERAND (exp, 0)) * 7 + 2;
+ case VEC_DUPLICATE_CST:
+ return const_hash_1 (VEC_DUPLICATE_CST_ELT (exp)) * 7 + 3;
+
default:
/* A language specific constant. Just hash the code. */
return code;
@@ -3158,6 +3161,10 @@ compare_constant (const tree t1, const t
return 1;
}
+ case VEC_DUPLICATE_CST:
+ return compare_constant (VEC_DUPLICATE_CST_ELT (t1),
+ VEC_DUPLICATE_CST_ELT (t2));
+
case CONSTRUCTOR:
{
vec<constructor_elt, va_gc> *v1, *v2;
Index: gcc/fold-const.c
===================================================================
--- gcc/fold-const.c 2017-10-23 11:41:23.535860278 +0100
+++ gcc/fold-const.c 2017-10-23 11:41:51.765271511 +0100
@@ -418,6 +418,9 @@ negate_expr_p (tree t)
return true;
}
+ case VEC_DUPLICATE_CST:
+ return negate_expr_p (VEC_DUPLICATE_CST_ELT (t));
+
case COMPLEX_EXPR:
return negate_expr_p (TREE_OPERAND (t, 0))
&& negate_expr_p (TREE_OPERAND (t, 1));
@@ -579,6 +582,14 @@ fold_negate_expr_1 (location_t loc, tree
return build_vector (type, elts);
}
+ case VEC_DUPLICATE_CST:
+ {
+ tree sub = fold_negate_expr (loc, VEC_DUPLICATE_CST_ELT (t));
+ if (!sub)
+ return NULL_TREE;
+ return build_vector_from_val (type, sub);
+ }
+
case COMPLEX_EXPR:
if (negate_expr_p (t))
return fold_build2_loc (loc, COMPLEX_EXPR, type,
@@ -1436,6 +1447,16 @@ const_binop (enum tree_code code, tree a
return build_vector (type, elts);
}
+ if (TREE_CODE (arg1) == VEC_DUPLICATE_CST
+ && TREE_CODE (arg2) == VEC_DUPLICATE_CST)
+ {
+ tree sub = const_binop (code, VEC_DUPLICATE_CST_ELT (arg1),
+ VEC_DUPLICATE_CST_ELT (arg2));
+ if (!sub)
+ return NULL_TREE;
+ return build_vector_from_val (TREE_TYPE (arg1), sub);
+ }
+
/* Shifts allow a scalar offset for a vector. */
if (TREE_CODE (arg1) == VECTOR_CST
&& TREE_CODE (arg2) == INTEGER_CST)
@@ -1459,6 +1480,15 @@ const_binop (enum tree_code code, tree a
return build_vector (type, elts);
}
+
+ if (TREE_CODE (arg1) == VEC_DUPLICATE_CST
+ && TREE_CODE (arg2) == INTEGER_CST)
+ {
+ tree sub = const_binop (code, VEC_DUPLICATE_CST_ELT (arg1), arg2);
+ if (!sub)
+ return NULL_TREE;
+ return build_vector_from_val (TREE_TYPE (arg1), sub);
+ }
return NULL_TREE;
}
@@ -1652,6 +1682,13 @@ const_unop (enum tree_code code, tree ty
if (i == count)
return build_vector (type, elements);
}
+ else if (TREE_CODE (arg0) == VEC_DUPLICATE_CST)
+ {
+ tree sub = const_unop (BIT_NOT_EXPR, TREE_TYPE (type),
+ VEC_DUPLICATE_CST_ELT (arg0));
+ if (sub)
+ return build_vector_from_val (type, sub);
+ }
break;
case TRUTH_NOT_EXPR:
@@ -1737,6 +1774,11 @@ const_unop (enum tree_code code, tree ty
return res;
}
+ case VEC_DUPLICATE_EXPR:
+ if (CONSTANT_CLASS_P (arg0))
+ return build_vector_from_val (type, arg0);
+ return NULL_TREE;
+
default:
break;
}
@@ -2167,6 +2209,15 @@ fold_convert_const (enum tree_code code,
}
return build_vector (type, v);
}
+ if (TREE_CODE (arg1) == VEC_DUPLICATE_CST
+ && (TYPE_VECTOR_SUBPARTS (type)
+ == TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg1))))
+ {
+ tree sub = fold_convert_const (code, TREE_TYPE (type),
+ VEC_DUPLICATE_CST_ELT (arg1));
+ if (sub)
+ return build_vector_from_val (type, sub);
+ }
}
return NULL_TREE;
}
@@ -2953,6 +3004,10 @@ operand_equal_p (const_tree arg0, const_
return 1;
}
+ case VEC_DUPLICATE_CST:
+ return operand_equal_p (VEC_DUPLICATE_CST_ELT (arg0),
+ VEC_DUPLICATE_CST_ELT (arg1), flags);
+
case COMPLEX_CST:
return (operand_equal_p (TREE_REALPART (arg0), TREE_REALPART (arg1),
flags)
@@ -7492,6 +7547,20 @@ can_native_interpret_type_p (tree type)
static tree
fold_view_convert_expr (tree type, tree expr)
{
+ /* Recurse on duplicated vectors if the target type is also a vector
+ and if the elements line up. */
+ tree expr_type = TREE_TYPE (expr);
+ if (TREE_CODE (expr) == VEC_DUPLICATE_CST
+ && VECTOR_TYPE_P (type)
+ && TYPE_VECTOR_SUBPARTS (type) == TYPE_VECTOR_SUBPARTS (expr_type)
+ && TYPE_SIZE (TREE_TYPE (type)) == TYPE_SIZE (TREE_TYPE (expr_type)))
+ {
+ tree sub = fold_view_convert_expr (TREE_TYPE (type),
+ VEC_DUPLICATE_CST_ELT (expr));
+ if (sub)
+ return build_vector_from_val (type, sub);
+ }
+
/* We support up to 512-bit values (for V8DFmode). */
unsigned char buffer[64];
int len;
@@ -8891,6 +8960,15 @@ exact_inverse (tree type, tree cst)
return build_vector (type, elts);
}
+ case VEC_DUPLICATE_CST:
+ {
+ tree sub = exact_inverse (TREE_TYPE (type),
+ VEC_DUPLICATE_CST_ELT (cst));
+ if (!sub)
+ return NULL_TREE;
+ return build_vector_from_val (type, sub);
+ }
+
default:
return NULL_TREE;
}
@@ -11969,6 +12047,9 @@ fold_checksum_tree (const_tree expr, str
for (i = 0; i < (int) VECTOR_CST_NELTS (expr); ++i)
fold_checksum_tree (VECTOR_CST_ELT (expr, i), ctx, ht);
break;
+ case VEC_DUPLICATE_CST:
+ fold_checksum_tree (VEC_DUPLICATE_CST_ELT (expr), ctx, ht);
+ break;
default:
break;
}
@@ -14436,6 +14517,36 @@ test_vector_folding ()
ASSERT_FALSE (integer_nonzerop (fold_build2 (NE_EXPR, res_type, one, one)));
}
+/* Verify folding of VEC_DUPLICATE_CSTs and VEC_DUPLICATE_EXPRs. */
+
+static void
+test_vec_duplicate_folding ()
+{
+ tree type = build_vector_type (ssizetype, 4);
+ tree dup5 = build_vec_duplicate_cst (type, ssize_int (5));
+ tree dup3 = build_vec_duplicate_cst (type, ssize_int (3));
+
+ tree neg_dup5 = fold_unary (NEGATE_EXPR, type, dup5);
+ ASSERT_EQ (uniform_vector_p (neg_dup5), ssize_int (-5));
+
+ tree not_dup5 = fold_unary (BIT_NOT_EXPR, type, dup5);
+ ASSERT_EQ (uniform_vector_p (not_dup5), ssize_int (-6));
+
+ tree dup5_plus_dup3 = fold_binary (PLUS_EXPR, type, dup5, dup3);
+ ASSERT_EQ (uniform_vector_p (dup5_plus_dup3), ssize_int (8));
+
+ tree dup5_lsl_2 = fold_binary (LSHIFT_EXPR, type, dup5, ssize_int (2));
+ ASSERT_EQ (uniform_vector_p (dup5_lsl_2), ssize_int (20));
+
+ tree size_vector = build_vector_type (sizetype, 4);
+ tree size_dup5 = fold_convert (size_vector, dup5);
+ ASSERT_EQ (uniform_vector_p (size_dup5), size_int (5));
+
+ tree dup5_expr = fold_unary (VEC_DUPLICATE_EXPR, type, ssize_int (5));
+ tree dup5_cst = build_vector_from_val (type, ssize_int (5));
+ ASSERT_TRUE (operand_equal_p (dup5_expr, dup5_cst, 0));
+}
+
/* Run all of the selftests within this file. */
void
@@ -14443,6 +14554,7 @@ fold_const_c_tests ()
{
test_arithmetic_folding ();
test_vector_folding ();
+ test_vec_duplicate_folding ();
}
} // namespace selftest
Index: gcc/optabs.def
===================================================================
--- gcc/optabs.def 2017-10-23 11:38:53.934094740 +0100
+++ gcc/optabs.def 2017-10-23 11:41:51.769129995 +0100
@@ -364,3 +364,5 @@ OPTAB_D (atomic_xor_optab, "atomic_xor$I
OPTAB_D (get_thread_pointer_optab, "get_thread_pointer$I$a")
OPTAB_D (set_thread_pointer_optab, "set_thread_pointer$I$a")
+
+OPTAB_DC (vec_duplicate_optab, "vec_duplicate$a", VEC_DUPLICATE)
Index: gcc/optabs-tree.c
===================================================================
--- gcc/optabs-tree.c 2017-10-23 11:38:53.934094740 +0100
+++ gcc/optabs-tree.c 2017-10-23 11:41:51.768165374 +0100
@@ -210,6 +210,9 @@ optab_for_tree_code (enum tree_code code
return TYPE_UNSIGNED (type) ?
vec_pack_ufix_trunc_optab : vec_pack_sfix_trunc_optab;
+ case VEC_DUPLICATE_EXPR:
+ return vec_duplicate_optab;
+
default:
break;
}
Index: gcc/optabs.h
===================================================================
--- gcc/optabs.h 2017-10-23 11:38:53.934094740 +0100
+++ gcc/optabs.h 2017-10-23 11:41:51.769129995 +0100
@@ -181,6 +181,7 @@ extern rtx simplify_expand_binop (machin
enum optab_methods methods);
extern bool force_expand_binop (machine_mode, optab, rtx, rtx, rtx, int,
enum optab_methods);
+extern rtx expand_vector_broadcast (machine_mode, rtx);
/* Generate code for a simple binary or unary operation. "Simple" in
this case means "can be unambiguously described by a (mode, code)
Index: gcc/optabs.c
===================================================================
--- gcc/optabs.c 2017-10-23 11:41:41.549050496 +0100
+++ gcc/optabs.c 2017-10-23 11:41:51.769129995 +0100
@@ -367,7 +367,7 @@ force_expand_binop (machine_mode mode, o
mode of OP must be the element mode of VMODE. If OP is a constant,
then the return value will be a constant. */
-static rtx
+rtx
expand_vector_broadcast (machine_mode vmode, rtx op)
{
enum insn_code icode;
@@ -380,6 +380,16 @@ expand_vector_broadcast (machine_mode vm
if (CONSTANT_P (op))
return gen_const_vec_duplicate (vmode, op);
+ icode = optab_handler (vec_duplicate_optab, vmode);
+ if (icode != CODE_FOR_nothing)
+ {
+ struct expand_operand ops[2];
+ create_output_operand (&ops[0], NULL_RTX, vmode);
+ create_input_operand (&ops[1], op, GET_MODE (op));
+ expand_insn (icode, 2, ops);
+ return ops[0].value;
+ }
+
/* ??? If the target doesn't have a vec_init, then we have no easy way
of performing this operation. Most of this sort of generic support
is hidden away in the vector lowering support in gimple. */
Index: gcc/expr.c
===================================================================
--- gcc/expr.c 2017-10-23 11:41:39.187050437 +0100
+++ gcc/expr.c 2017-10-23 11:41:51.764306890 +0100
@@ -6572,7 +6572,8 @@ store_constructor (tree exp, rtx target,
constructor_elt *ce;
int i;
int need_to_clear;
- int icode = CODE_FOR_nothing;
+ insn_code icode = CODE_FOR_nothing;
+ tree elt;
tree elttype = TREE_TYPE (type);
int elt_size = tree_to_uhwi (TYPE_SIZE (elttype));
machine_mode eltmode = TYPE_MODE (elttype);
@@ -6582,13 +6583,30 @@ store_constructor (tree exp, rtx target,
unsigned n_elts;
alias_set_type alias;
bool vec_vec_init_p = false;
+ machine_mode mode = GET_MODE (target);
gcc_assert (eltmode != BLKmode);
+ /* Try using vec_duplicate_optab for uniform vectors. */
+ if (!TREE_SIDE_EFFECTS (exp)
+ && VECTOR_MODE_P (mode)
+ && eltmode == GET_MODE_INNER (mode)
+ && ((icode = optab_handler (vec_duplicate_optab, mode))
+ != CODE_FOR_nothing)
+ && (elt = uniform_vector_p (exp)))
+ {
+ struct expand_operand ops[2];
+ create_output_operand (&ops[0], target, mode);
+ create_input_operand (&ops[1], expand_normal (elt), eltmode);
+ expand_insn (icode, 2, ops);
+ if (!rtx_equal_p (target, ops[0].value))
+ emit_move_insn (target, ops[0].value);
+ break;
+ }
+
n_elts = TYPE_VECTOR_SUBPARTS (type);
- if (REG_P (target) && VECTOR_MODE_P (GET_MODE (target)))
+ if (REG_P (target) && VECTOR_MODE_P (mode))
{
- machine_mode mode = GET_MODE (target);
machine_mode emode = eltmode;
if (CONSTRUCTOR_NELTS (exp)
@@ -6600,7 +6618,7 @@ store_constructor (tree exp, rtx target,
== n_elts);
emode = TYPE_MODE (etype);
}
- icode = (int) convert_optab_handler (vec_init_optab, mode, emode);
+ icode = convert_optab_handler (vec_init_optab, mode, emode);
if (icode != CODE_FOR_nothing)
{
unsigned int i, n = n_elts;
@@ -6648,7 +6666,7 @@ store_constructor (tree exp, rtx target,
if (need_to_clear && size > 0 && !vector)
{
if (REG_P (target))
- emit_move_insn (target, CONST0_RTX (GET_MODE (target)));
+ emit_move_insn (target, CONST0_RTX (mode));
else
clear_storage (target, GEN_INT (size), BLOCK_OP_NORMAL);
cleared = 1;
@@ -6656,7 +6674,7 @@ store_constructor (tree exp, rtx target,
/* Inform later passes that the old value is dead. */
if (!cleared && !vector && REG_P (target))
- emit_move_insn (target, CONST0_RTX (GET_MODE (target)));
+ emit_move_insn (target, CONST0_RTX (mode));
if (MEM_P (target))
alias = MEM_ALIAS_SET (target);
@@ -6707,8 +6725,7 @@ store_constructor (tree exp, rtx target,
if (vector)
emit_insn (GEN_FCN (icode) (target,
- gen_rtx_PARALLEL (GET_MODE (target),
- vector)));
+ gen_rtx_PARALLEL (mode, vector)));
break;
}
@@ -7686,6 +7703,19 @@ expand_operands (tree exp0, tree exp1, r
}
\f
+/* Expand constant vector element ELT, which has mode MODE. This is used
+ for members of VECTOR_CST and VEC_DUPLICATE_CST. */
+
+static rtx
+const_vector_element (scalar_mode mode, const_tree elt)
+{
+ if (TREE_CODE (elt) == REAL_CST)
+ return const_double_from_real_value (TREE_REAL_CST (elt), mode);
+ if (TREE_CODE (elt) == FIXED_CST)
+ return CONST_FIXED_FROM_FIXED_VALUE (TREE_FIXED_CST (elt), mode);
+ return immed_wide_int_const (wi::to_wide (elt), mode);
+}
+
/* Return a MEM that contains constant EXP. DEFER is as for
output_constant_def and MODIFIER is as for expand_expr. */
@@ -9551,6 +9581,12 @@ #define REDUCE_BIT_FIELD(expr) (reduce_b
target = expand_vec_cond_expr (type, treeop0, treeop1, treeop2, target);
return target;
+ case VEC_DUPLICATE_EXPR:
+ op0 = expand_expr (treeop0, NULL_RTX, VOIDmode, modifier);
+ target = expand_vector_broadcast (mode, op0);
+ gcc_assert (target);
+ return target;
+
case BIT_INSERT_EXPR:
{
unsigned bitpos = tree_to_uhwi (treeop2);
@@ -10003,6 +10039,11 @@ expand_expr_real_1 (tree exp, rtx target
tmode, modifier);
}
+ case VEC_DUPLICATE_CST:
+ op0 = const_vector_element (GET_MODE_INNER (mode),
+ VEC_DUPLICATE_CST_ELT (exp));
+ return gen_const_vec_duplicate (mode, op0);
+
case CONST_DECL:
if (modifier == EXPAND_WRITE)
{
@@ -11764,8 +11805,7 @@ const_vector_from_tree (tree exp)
{
rtvec v;
unsigned i, units;
- tree elt;
- machine_mode inner, mode;
+ machine_mode mode;
mode = TYPE_MODE (TREE_TYPE (exp));
@@ -11776,23 +11816,12 @@ const_vector_from_tree (tree exp)
return const_vector_mask_from_tree (exp);
units = VECTOR_CST_NELTS (exp);
- inner = GET_MODE_INNER (mode);
v = rtvec_alloc (units);
for (i = 0; i < units; ++i)
- {
- elt = VECTOR_CST_ELT (exp, i);
-
- if (TREE_CODE (elt) == REAL_CST)
- RTVEC_ELT (v, i) = const_double_from_real_value (TREE_REAL_CST (elt),
- inner);
- else if (TREE_CODE (elt) == FIXED_CST)
- RTVEC_ELT (v, i) = CONST_FIXED_FROM_FIXED_VALUE (TREE_FIXED_CST (elt),
- inner);
- else
- RTVEC_ELT (v, i) = immed_wide_int_const (wi::to_wide (elt), inner);
- }
+ RTVEC_ELT (v, i) = const_vector_element (GET_MODE_INNER (mode),
+ VECTOR_CST_ELT (exp, i));
return gen_rtx_CONST_VECTOR (mode, v);
}
Index: gcc/internal-fn.c
===================================================================
--- gcc/internal-fn.c 2017-10-23 11:41:23.529089619 +0100
+++ gcc/internal-fn.c 2017-10-23 11:41:51.767200753 +0100
@@ -1911,12 +1911,12 @@ expand_vector_ubsan_overflow (location_t
emit_move_insn (cntvar, const0_rtx);
emit_label (loop_lab);
}
- if (TREE_CODE (arg0) != VECTOR_CST)
+ if (!CONSTANT_CLASS_P (arg0))
{
rtx arg0r = expand_normal (arg0);
arg0 = make_tree (TREE_TYPE (arg0), arg0r);
}
- if (TREE_CODE (arg1) != VECTOR_CST)
+ if (!CONSTANT_CLASS_P (arg1))
{
rtx arg1r = expand_normal (arg1);
arg1 = make_tree (TREE_TYPE (arg1), arg1r);
Index: gcc/tree-cfg.c
===================================================================
--- gcc/tree-cfg.c 2017-10-23 11:41:25.864967029 +0100
+++ gcc/tree-cfg.c 2017-10-23 11:41:51.770094616 +0100
@@ -3803,6 +3803,17 @@ verify_gimple_assign_unary (gassign *stm
case CONJ_EXPR:
break;
+ case VEC_DUPLICATE_EXPR:
+ if (TREE_CODE (lhs_type) != VECTOR_TYPE
+ || !useless_type_conversion_p (TREE_TYPE (lhs_type), rhs1_type))
+ {
+ error ("vec_duplicate should be from a scalar to a like vector");
+ debug_generic_expr (lhs_type);
+ debug_generic_expr (rhs1_type);
+ return true;
+ }
+ return false;
+
default:
gcc_unreachable ();
}
@@ -4473,6 +4484,7 @@ verify_gimple_assign_single (gassign *st
case FIXED_CST:
case COMPLEX_CST:
case VECTOR_CST:
+ case VEC_DUPLICATE_CST:
case STRING_CST:
return res;
Index: gcc/tree-inline.c
===================================================================
--- gcc/tree-inline.c 2017-10-23 11:41:25.833048208 +0100
+++ gcc/tree-inline.c 2017-10-23 11:41:51.771059237 +0100
@@ -4002,6 +4002,7 @@ estimate_operator_cost (enum tree_code c
case VEC_PACK_FIX_TRUNC_EXPR:
case VEC_WIDEN_LSHIFT_HI_EXPR:
case VEC_WIDEN_LSHIFT_LO_EXPR:
+ case VEC_DUPLICATE_EXPR:
return 1;
^ permalink raw reply [flat|nested] 90+ messages in thread
* [06/nn] Add VEC_SERIES_{CST,EXPR} and associated optab
2017-10-23 11:16 [00/nn] Patches preparing for runtime offsets and sizes Richard Sandiford
` (4 preceding siblings ...)
2017-10-23 11:21 ` [05/nn] Add VEC_DUPLICATE_{CST,EXPR} and associated optab Richard Sandiford
@ 2017-10-23 11:22 ` Richard Sandiford
2017-10-26 12:26 ` Richard Biener
2017-12-15 0:34 ` Richard Sandiford
2017-10-23 11:22 ` [08/nn] Add a fixed_size_mode class Richard Sandiford
` (15 subsequent siblings)
21 siblings, 2 replies; 90+ messages in thread
From: Richard Sandiford @ 2017-10-23 11:22 UTC (permalink / raw)
To: gcc-patches
Similarly to VEC_DUPLICATE_{CST,EXPR}, this patch adds two tree code
equivalents of the VEC_SERIES rtx code: VEC_SERIES_EXPR, a normal
tcc_binary for non-constant inputs, and VEC_SERIES_CST, a
tcc_constant.
Like VEC_DUPLICATE_CST, VEC_SERIES_CST is only used for variable-length
vectors. This avoids the need to handle combinations of VECTOR_CST
and VEC_SERIES_CST.
2017-10-23 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* doc/generic.texi (VEC_SERIES_CST, VEC_SERIES_EXPR): Document.
* doc/md.texi (vec_series@var{m}): Document.
* tree.def (VEC_SERIES_CST, VEC_SERIES_EXPR): New tree codes.
* tree.h (TREE_OVERFLOW): Add VEC_SERIES_CST to the list of valid
codes.
(VEC_SERIES_CST_BASE, VEC_SERIES_CST_STEP): New macros.
(build_vec_series_cst, build_vec_series): Declare.
* tree.c (tree_node_structure_for_code, tree_code_size, tree_size)
(add_expr, walk_tree_1, drop_tree_overflow): Handle VEC_SERIES_CST.
(build_vec_series_cst, build_vec_series): New functions.
* cfgexpand.c (expand_debug_expr): Handle the new codes.
* tree-pretty-print.c (dump_generic_node): Likewise.
* dwarf2out.c (rtl_for_decl_init): Handle VEC_SERIES_CST.
* gimple-expr.h (is_gimple_constant): Likewise.
* gimplify.c (gimplify_expr): Likewise.
* graphite-scop-detection.c (scan_tree_for_params): Likewise.
* ipa-icf-gimple.c (func_checker::compare_cst_or_decl): Likewise.
(func_checker::compare_operand): Likewise.
* ipa-icf.c (sem_item::add_expr, sem_variable::equals): Likewise.
* print-tree.c (print_node): Likewise.
* tree-ssa-loop.c (for_each_index): Likewise.
* tree-ssa-pre.c (create_component_ref_by_pieces_1): Likewise.
* tree-ssa-sccvn.c (copy_reference_ops_from_ref): Likewise.
(ao_ref_init_from_vn_reference): Likewise.
* varasm.c (const_hash_1, compare_constant): Likewise.
* fold-const.c (negate_expr_p, fold_negate_expr_1, operand_equal_p)
(fold_checksum_tree): Likewise.
(vec_series_equivalent_p): New function.
(const_binop): Use it. Fold VEC_SERIES_EXPRs of constants.
* expmed.c (make_tree): Handle VEC_SERIES.
* gimple-pretty-print.c (dump_binary_rhs): Likewise.
* tree-inline.c (estimate_operator_cost): Likewise.
* expr.c (const_vector_element): Include VEC_SERIES_CST in comment.
(expand_expr_real_2): Handle VEC_SERIES_EXPR.
(expand_expr_real_1): Handle VEC_SERIES_CST.
* optabs.def (vec_series_optab): New optab.
* optabs.h (expand_vec_series_expr): Declare.
* optabs.c (expand_vec_series_expr): New function.
* optabs-tree.c (optab_for_tree_code): Handle VEC_SERIES_EXPR.
* tree-cfg.c (verify_gimple_assign_binary): Handle VEC_SERIES_EXPR.
(verify_gimple_assign_single): Handle VEC_SERIES_CST.
* tree-vect-generic.c (expand_vector_operations_1): Check that
the operands also have vector type.
Index: gcc/doc/generic.texi
===================================================================
--- gcc/doc/generic.texi 2017-10-23 11:41:51.760448406 +0100
+++ gcc/doc/generic.texi 2017-10-23 11:42:34.910720660 +0100
@@ -1037,6 +1037,7 @@ As this example indicates, the operands
@tindex COMPLEX_CST
@tindex VECTOR_CST
@tindex VEC_DUPLICATE_CST
+@tindex VEC_SERIES_CST
@tindex STRING_CST
@findex TREE_STRING_LENGTH
@findex TREE_STRING_POINTER
@@ -1098,6 +1099,16 @@ instead. The scalar element value is gi
@code{VEC_DUPLICATE_CST_ELT} and has the same restrictions as the
element of a @code{VECTOR_CST}.
+@item VEC_SERIES_CST
+These nodes represent a vector constant in which element @var{i}
+has the value @samp{@var{base} + @var{i} * @var{step}}, for some
+constant @var{base} and @var{step}. The value of @var{base} is
+given by @code{VEC_SERIES_CST_BASE} and the value of @var{step} is
+given by @code{VEC_SERIES_CST_STEP}.
+
+These nodes are restricted to integral types, in order to avoid
+specifying the rounding behavior for floating-point types.
+
@item STRING_CST
These nodes represent string-constants. The @code{TREE_STRING_LENGTH}
returns the length of the string, as an @code{int}. The
@@ -1702,6 +1713,7 @@ a value from @code{enum annot_expr_kind}
@node Vectors
@subsection Vectors
@tindex VEC_DUPLICATE_EXPR
+@tindex VEC_SERIES_EXPR
@tindex VEC_LSHIFT_EXPR
@tindex VEC_RSHIFT_EXPR
@tindex VEC_WIDEN_MULT_HI_EXPR
@@ -1721,6 +1733,14 @@ a value from @code{enum annot_expr_kind}
This node has a single operand and represents a vector in which every
element is equal to that operand.
+@item VEC_SERIES_EXPR
+This node represents a vector formed from a scalar base and step,
+given as the first and second operands respectively. Element @var{i}
+of the result is equal to @samp{@var{base} + @var{i} * @var{step}}.
+
+This node is restricted to integral types, in order to avoid
+specifying the rounding behavior for floating-point types.
+
@item VEC_LSHIFT_EXPR
@itemx VEC_RSHIFT_EXPR
These nodes represent whole vector left and right shifts, respectively.
Index: gcc/doc/md.texi
===================================================================
--- gcc/doc/md.texi 2017-10-23 11:41:51.761413027 +0100
+++ gcc/doc/md.texi 2017-10-23 11:42:34.911720660 +0100
@@ -4899,6 +4899,19 @@ vectors go through the @code{mov@var{m}}
This pattern is not allowed to @code{FAIL}.
+@cindex @code{vec_series@var{m}} instruction pattern
+@item @samp{vec_series@var{m}}
+Initialize vector output operand 0 so that element @var{i} is equal to
+operand 1 plus @var{i} times operand 2. In other words, create a linear
+series whose base value is operand 1 and whose step is operand 2.
+
+The vector output has mode @var{m} and the scalar inputs have the mode
+appropriate for one element of @var{m}. This pattern is not used for
+floating-point vectors, in order to avoid having to specify the
+rounding behavior for @var{i} > 1.
+
+This pattern is not allowed to @code{FAIL}.
+
@cindex @code{vec_cmp@var{m}@var{n}} instruction pattern
@item @samp{vec_cmp@var{m}@var{n}}
Output a vector comparison. Operand 0 of mode @var{n} is the destination for
Index: gcc/tree.def
===================================================================
--- gcc/tree.def 2017-10-23 11:41:51.774917721 +0100
+++ gcc/tree.def 2017-10-23 11:42:34.924720660 +0100
@@ -308,6 +308,10 @@ DEFTREECODE (VECTOR_CST, "vector_cst", t
VEC_DUPLICATE_CST_ELT. */
DEFTREECODE (VEC_DUPLICATE_CST, "vec_duplicate_cst", tcc_constant, 0)
+/* Represents a vector constant in which element i is equal to
+ VEC_SERIES_CST_BASE + i * VEC_SERIES_CST_STEP. */
+DEFTREECODE (VEC_SERIES_CST, "vec_series_cst", tcc_constant, 0)
+
/* Contents are TREE_STRING_LENGTH and the actual contents of the string. */
DEFTREECODE (STRING_CST, "string_cst", tcc_constant, 0)
@@ -541,6 +545,16 @@ DEFTREECODE (COND_EXPR, "cond_expr", tcc
/* Represents a vector in which every element is equal to operand 0. */
DEFTREECODE (VEC_DUPLICATE_EXPR, "vec_duplicate_expr", tcc_unary, 1)
+/* Vector series created from a start (base) value and a step.
+
+ A = VEC_SERIES_EXPR (B, C)
+
+ means
+
+ for (i = 0; i < N; i++)
+ A[i] = B + C * i; */
+DEFTREECODE (VEC_SERIES_EXPR, "vec_series_expr", tcc_binary, 2)
+
/* Vector conditional expression. It is like COND_EXPR, but with
vector operands.
Index: gcc/tree.h
===================================================================
--- gcc/tree.h 2017-10-23 11:41:51.775882341 +0100
+++ gcc/tree.h 2017-10-23 11:42:34.925720660 +0100
@@ -730,8 +730,8 @@ #define TREE_SYMBOL_REFERENCED(NODE) \
#define TYPE_REF_CAN_ALIAS_ALL(NODE) \
(PTR_OR_REF_CHECK (NODE)->base.static_flag)
-/* In an INTEGER_CST, REAL_CST, COMPLEX_CST, VECTOR_CST or VEC_DUPLICATE_CST,
- this means there was an overflow in folding. */
+/* In an INTEGER_CST, REAL_CST, COMPLEX_CST, VECTOR_CST, VEC_DUPLICATE_CST
+ or VEC_SERIES_CST, this means there was an overflow in folding. */
#define TREE_OVERFLOW(NODE) (CST_CHECK (NODE)->base.public_flag)
@@ -1034,6 +1034,12 @@ #define VECTOR_CST_ELT(NODE,IDX) (VECTOR
#define VEC_DUPLICATE_CST_ELT(NODE) \
(VEC_DUPLICATE_CST_CHECK (NODE)->vector.elts[0])
+/* In a VEC_SERIES_CST node. */
+#define VEC_SERIES_CST_BASE(NODE) \
+ (VEC_SERIES_CST_CHECK (NODE)->vector.elts[0])
+#define VEC_SERIES_CST_STEP(NODE) \
+ (VEC_SERIES_CST_CHECK (NODE)->vector.elts[1])
+
/* Define fields and accessors for some special-purpose tree nodes. */
#define IDENTIFIER_LENGTH(NODE) \
@@ -4030,9 +4036,11 @@ extern tree build_int_cstu (tree type, u
extern tree build_int_cst_type (tree, HOST_WIDE_INT);
extern tree make_vector (unsigned CXX_MEM_STAT_INFO);
extern tree build_vec_duplicate_cst (tree, tree CXX_MEM_STAT_INFO);
+extern tree build_vec_series_cst (tree, tree, tree CXX_MEM_STAT_INFO);
extern tree build_vector (tree, vec<tree> CXX_MEM_STAT_INFO);
extern tree build_vector_from_ctor (tree, vec<constructor_elt, va_gc> *);
extern tree build_vector_from_val (tree, tree);
+extern tree build_vec_series (tree, tree, tree);
extern void recompute_constructor_flags (tree);
extern void verify_constructor_flags (tree);
extern tree build_constructor (tree, vec<constructor_elt, va_gc> *);
Index: gcc/tree.c
===================================================================
--- gcc/tree.c 2017-10-23 11:41:51.774917721 +0100
+++ gcc/tree.c 2017-10-23 11:42:34.924720660 +0100
@@ -465,6 +465,7 @@ tree_node_structure_for_code (enum tree_
case COMPLEX_CST: return TS_COMPLEX;
case VECTOR_CST: return TS_VECTOR;
case VEC_DUPLICATE_CST: return TS_VECTOR;
+ case VEC_SERIES_CST: return TS_VECTOR;
case STRING_CST: return TS_STRING;
/* tcc_exceptional cases. */
case ERROR_MARK: return TS_COMMON;
@@ -818,6 +819,8 @@ tree_code_size (enum tree_code code)
case COMPLEX_CST: return sizeof (struct tree_complex);
case VECTOR_CST: return sizeof (struct tree_vector);
case VEC_DUPLICATE_CST: return sizeof (struct tree_vector);
+ case VEC_SERIES_CST:
+ return sizeof (struct tree_vector) + sizeof (tree);
case STRING_CST: gcc_unreachable ();
default:
return lang_hooks.tree_size (code);
@@ -880,6 +883,9 @@ tree_size (const_tree node)
case VEC_DUPLICATE_CST:
return sizeof (struct tree_vector);
+ case VEC_SERIES_CST:
+ return sizeof (struct tree_vector) + sizeof (tree);
+
case STRING_CST:
return TREE_STRING_LENGTH (node) + offsetof (struct tree_string, str) + 1;
@@ -1711,6 +1717,31 @@ build_vec_duplicate_cst (tree type, tree
return t;
}
+/* Build a new VEC_SERIES_CST with type TYPE, base BASE and step STEP.
+
+ Note that this function is only suitable for callers that specifically
+ need a VEC_SERIES_CST node. Use build_vec_series to build a general
+ series vector from a general base and step. */
+
+tree
+build_vec_series_cst (tree type, tree base, tree step MEM_STAT_DECL)
+{
+ int length = sizeof (struct tree_vector) + sizeof (tree);
+
+ record_node_allocation_statistics (VEC_SERIES_CST, length);
+
+ tree t = ggc_alloc_cleared_tree_node_stat (length PASS_MEM_STAT);
+
+ TREE_SET_CODE (t, VEC_SERIES_CST);
+ TREE_TYPE (t) = type;
+ t->base.u.nelts = 2;
+ VEC_SERIES_CST_BASE (t) = base;
+ VEC_SERIES_CST_STEP (t) = step;
+ TREE_CONSTANT (t) = 1;
+
+ return t;
+}
+
/* Build a newly constructed VECTOR_CST node of length LEN. */
tree
@@ -1821,6 +1852,19 @@ build_vector_from_val (tree vectype, tre
}
}
+/* Build a vector series of type TYPE in which element I has the value
+ BASE + I * STEP. */
+
+tree
+build_vec_series (tree type, tree base, tree step)
+{
+ if (integer_zerop (step))
+ return build_vector_from_val (type, base);
+ if (CONSTANT_CLASS_P (base) && CONSTANT_CLASS_P (step))
+ return build_vec_series_cst (type, base, step);
+ return build2 (VEC_SERIES_EXPR, type, base, step);
+}
+
/* Something has messed with the elements of CONSTRUCTOR C after it was built;
calculate TREE_CONSTANT and TREE_SIDE_EFFECTS. */
@@ -7136,6 +7180,10 @@ add_expr (const_tree t, inchash::hash &h
case VEC_DUPLICATE_CST:
inchash::add_expr (VEC_DUPLICATE_CST_ELT (t), hstate);
return;
+ case VEC_SERIES_CST:
+ inchash::add_expr (VEC_SERIES_CST_BASE (t), hstate);
+ inchash::add_expr (VEC_SERIES_CST_STEP (t), hstate);
+ return;
case SSA_NAME:
/* We can just compare by pointer. */
hstate.add_wide_int (SSA_NAME_VERSION (t));
@@ -11150,6 +11198,7 @@ #define WALK_SUBTREE_TAIL(NODE) \
case FIXED_CST:
case VECTOR_CST:
case VEC_DUPLICATE_CST:
+ case VEC_SERIES_CST:
case STRING_CST:
case BLOCK:
case PLACEHOLDER_EXPR:
@@ -12442,6 +12491,15 @@ drop_tree_overflow (tree t)
if (TREE_OVERFLOW (*elt))
*elt = drop_tree_overflow (*elt);
}
+ if (TREE_CODE (t) == VEC_SERIES_CST)
+ {
+ tree *elt = &VEC_SERIES_CST_BASE (t);
+ if (TREE_OVERFLOW (*elt))
+ *elt = drop_tree_overflow (*elt);
+ elt = &VEC_SERIES_CST_STEP (t);
+ if (TREE_OVERFLOW (*elt))
+ *elt = drop_tree_overflow (*elt);
+ }
return t;
}
Index: gcc/cfgexpand.c
===================================================================
--- gcc/cfgexpand.c 2017-10-23 11:41:51.760448406 +0100
+++ gcc/cfgexpand.c 2017-10-23 11:42:34.909720660 +0100
@@ -5051,6 +5051,8 @@ expand_debug_expr (tree exp)
case VEC_PERM_EXPR:
case VEC_DUPLICATE_CST:
case VEC_DUPLICATE_EXPR:
+ case VEC_SERIES_CST:
+ case VEC_SERIES_EXPR:
return NULL;
/* Misc codes. */
Index: gcc/tree-pretty-print.c
===================================================================
--- gcc/tree-pretty-print.c 2017-10-23 11:41:51.772023858 +0100
+++ gcc/tree-pretty-print.c 2017-10-23 11:42:34.921720660 +0100
@@ -1808,6 +1808,14 @@ dump_generic_node (pretty_printer *pp, t
pp_string (pp, ", ... }");
break;
+ case VEC_SERIES_CST:
+ pp_string (pp, "{ ");
+ dump_generic_node (pp, VEC_SERIES_CST_BASE (node), spc, flags, false);
+ pp_string (pp, ", +, ");
+ dump_generic_node (pp, VEC_SERIES_CST_STEP (node), spc, flags, false);
+ pp_string (pp, "}");
+ break;
+
case FUNCTION_TYPE:
case METHOD_TYPE:
dump_generic_node (pp, TREE_TYPE (node), spc, flags, false);
@@ -3221,6 +3229,7 @@ dump_generic_node (pretty_printer *pp, t
pp_string (pp, " > ");
break;
+ case VEC_SERIES_EXPR:
case VEC_WIDEN_MULT_HI_EXPR:
case VEC_WIDEN_MULT_LO_EXPR:
case VEC_WIDEN_MULT_EVEN_EXPR:
Index: gcc/dwarf2out.c
===================================================================
--- gcc/dwarf2out.c 2017-10-23 11:41:51.763342269 +0100
+++ gcc/dwarf2out.c 2017-10-23 11:42:34.913720660 +0100
@@ -18863,6 +18863,7 @@ rtl_for_decl_init (tree init, tree type)
{
case VECTOR_CST:
case VEC_DUPLICATE_CST:
+ case VEC_SERIES_CST:
break;
case CONSTRUCTOR:
if (TREE_CONSTANT (init))
Index: gcc/gimple-expr.h
===================================================================
--- gcc/gimple-expr.h 2017-10-23 11:41:51.765271511 +0100
+++ gcc/gimple-expr.h 2017-10-23 11:42:34.916720660 +0100
@@ -135,6 +135,7 @@ is_gimple_constant (const_tree t)
case COMPLEX_CST:
case VECTOR_CST:
case VEC_DUPLICATE_CST:
+ case VEC_SERIES_CST:
case STRING_CST:
return true;
Index: gcc/gimplify.c
===================================================================
--- gcc/gimplify.c 2017-10-23 11:41:51.766236132 +0100
+++ gcc/gimplify.c 2017-10-23 11:42:34.917720660 +0100
@@ -11507,6 +11507,7 @@ gimplify_expr (tree *expr_p, gimple_seq
case COMPLEX_CST:
case VECTOR_CST:
case VEC_DUPLICATE_CST:
+ case VEC_SERIES_CST:
/* Drop the overflow flag on constants, we do not want
that in the GIMPLE IL. */
if (TREE_OVERFLOW_P (*expr_p))
Index: gcc/graphite-scop-detection.c
===================================================================
--- gcc/graphite-scop-detection.c 2017-10-23 11:41:51.767200753 +0100
+++ gcc/graphite-scop-detection.c 2017-10-23 11:42:34.917720660 +0100
@@ -1244,6 +1244,7 @@ scan_tree_for_params (sese_info_p s, tre
case COMPLEX_CST:
case VECTOR_CST:
case VEC_DUPLICATE_CST:
+ case VEC_SERIES_CST:
break;
default:
Index: gcc/ipa-icf-gimple.c
===================================================================
--- gcc/ipa-icf-gimple.c 2017-10-23 11:41:51.767200753 +0100
+++ gcc/ipa-icf-gimple.c 2017-10-23 11:42:34.917720660 +0100
@@ -334,6 +334,7 @@ func_checker::compare_cst_or_decl (tree
case COMPLEX_CST:
case VECTOR_CST:
case VEC_DUPLICATE_CST:
+ case VEC_SERIES_CST:
case STRING_CST:
case REAL_CST:
{
@@ -530,6 +531,7 @@ func_checker::compare_operand (tree t1,
case COMPLEX_CST:
case VECTOR_CST:
case VEC_DUPLICATE_CST:
+ case VEC_SERIES_CST:
case STRING_CST:
case REAL_CST:
case FUNCTION_DECL:
Index: gcc/ipa-icf.c
===================================================================
--- gcc/ipa-icf.c 2017-10-23 11:41:51.768165374 +0100
+++ gcc/ipa-icf.c 2017-10-23 11:42:34.918720660 +0100
@@ -1479,6 +1479,7 @@ sem_item::add_expr (const_tree exp, inch
case COMPLEX_CST:
case VECTOR_CST:
case VEC_DUPLICATE_CST:
+ case VEC_SERIES_CST:
inchash::add_expr (exp, hstate);
break;
case CONSTRUCTOR:
@@ -2034,6 +2035,11 @@ sem_variable::equals (tree t1, tree t2)
case VEC_DUPLICATE_CST:
return sem_variable::equals (VEC_DUPLICATE_CST_ELT (t1),
VEC_DUPLICATE_CST_ELT (t2));
+ case VEC_SERIES_CST:
+ return (sem_variable::equals (VEC_SERIES_CST_BASE (t1),
+ VEC_SERIES_CST_BASE (t2))
+ && sem_variable::equals (VEC_SERIES_CST_STEP (t1),
+ VEC_SERIES_CST_STEP (t2)));
case ARRAY_REF:
case ARRAY_RANGE_REF:
{
Index: gcc/print-tree.c
===================================================================
--- gcc/print-tree.c 2017-10-23 11:41:51.769129995 +0100
+++ gcc/print-tree.c 2017-10-23 11:42:34.919720660 +0100
@@ -787,6 +787,11 @@ print_node (FILE *file, const char *pref
print_node (file, "elt", VEC_DUPLICATE_CST_ELT (node), indent + 4);
break;
+ case VEC_SERIES_CST:
+ print_node (file, "base", VEC_SERIES_CST_BASE (node), indent + 4);
+ print_node (file, "step", VEC_SERIES_CST_STEP (node), indent + 4);
+ break;
+
case COMPLEX_CST:
print_node (file, "real", TREE_REALPART (node), indent + 4);
print_node (file, "imag", TREE_IMAGPART (node), indent + 4);
Index: gcc/tree-ssa-loop.c
===================================================================
--- gcc/tree-ssa-loop.c 2017-10-23 11:41:51.772023858 +0100
+++ gcc/tree-ssa-loop.c 2017-10-23 11:42:34.921720660 +0100
@@ -617,6 +617,7 @@ for_each_index (tree *addr_p, bool (*cbc
case RESULT_DECL:
case VECTOR_CST:
case VEC_DUPLICATE_CST:
+ case VEC_SERIES_CST:
case COMPLEX_CST:
case INTEGER_CST:
case REAL_CST:
Index: gcc/tree-ssa-pre.c
===================================================================
--- gcc/tree-ssa-pre.c 2017-10-23 11:41:51.772023858 +0100
+++ gcc/tree-ssa-pre.c 2017-10-23 11:42:34.922720660 +0100
@@ -2676,6 +2676,7 @@ create_component_ref_by_pieces_1 (basic_
case COMPLEX_CST:
case VECTOR_CST:
case VEC_DUPLICATE_CST:
+ case VEC_SERIES_CST:
case REAL_CST:
case CONSTRUCTOR:
case VAR_DECL:
Index: gcc/tree-ssa-sccvn.c
===================================================================
--- gcc/tree-ssa-sccvn.c 2017-10-23 11:41:51.773953100 +0100
+++ gcc/tree-ssa-sccvn.c 2017-10-23 11:42:34.922720660 +0100
@@ -859,6 +859,7 @@ copy_reference_ops_from_ref (tree ref, v
case COMPLEX_CST:
case VECTOR_CST:
case VEC_DUPLICATE_CST:
+ case VEC_SERIES_CST:
case REAL_CST:
case FIXED_CST:
case CONSTRUCTOR:
@@ -1052,6 +1053,7 @@ ao_ref_init_from_vn_reference (ao_ref *r
case COMPLEX_CST:
case VECTOR_CST:
case VEC_DUPLICATE_CST:
+ case VEC_SERIES_CST:
case REAL_CST:
case CONSTRUCTOR:
case CONST_DECL:
Index: gcc/varasm.c
===================================================================
--- gcc/varasm.c 2017-10-23 11:41:51.775882341 +0100
+++ gcc/varasm.c 2017-10-23 11:42:34.927720660 +0100
@@ -3065,6 +3065,10 @@ const_hash_1 (const tree exp)
return (const_hash_1 (TREE_OPERAND (exp, 0)) * 9
+ const_hash_1 (TREE_OPERAND (exp, 1)));
+ case VEC_SERIES_CST:
+ return (const_hash_1 (VEC_SERIES_CST_BASE (exp)) * 11
+ + const_hash_1 (VEC_SERIES_CST_STEP (exp)));
+
CASE_CONVERT:
return const_hash_1 (TREE_OPERAND (exp, 0)) * 7 + 2;
@@ -3165,6 +3169,12 @@ compare_constant (const tree t1, const t
return compare_constant (VEC_DUPLICATE_CST_ELT (t1),
VEC_DUPLICATE_CST_ELT (t2));
+ case VEC_SERIES_CST:
+ return (compare_constant (VEC_SERIES_CST_BASE (t1),
+ VEC_SERIES_CST_BASE (t2))
+ && compare_constant (VEC_SERIES_CST_STEP (t1),
+ VEC_SERIES_CST_STEP (t2)));
+
case CONSTRUCTOR:
{
vec<constructor_elt, va_gc> *v1, *v2;
Index: gcc/fold-const.c
===================================================================
--- gcc/fold-const.c 2017-10-23 11:41:51.765271511 +0100
+++ gcc/fold-const.c 2017-10-23 11:42:34.916720660 +0100
@@ -421,6 +421,10 @@ negate_expr_p (tree t)
case VEC_DUPLICATE_CST:
return negate_expr_p (VEC_DUPLICATE_CST_ELT (t));
+ case VEC_SERIES_CST:
+ return (negate_expr_p (VEC_SERIES_CST_BASE (t))
+ && negate_expr_p (VEC_SERIES_CST_STEP (t)));
+
case COMPLEX_EXPR:
return negate_expr_p (TREE_OPERAND (t, 0))
&& negate_expr_p (TREE_OPERAND (t, 1));
@@ -590,6 +594,17 @@ fold_negate_expr_1 (location_t loc, tree
return build_vector_from_val (type, sub);
}
+ case VEC_SERIES_CST:
+ {
+ tree neg_base = fold_negate_expr (loc, VEC_SERIES_CST_BASE (t));
+ if (!neg_base)
+ return NULL_TREE;
+ tree neg_step = fold_negate_expr (loc, VEC_SERIES_CST_STEP (t));
+ if (!neg_step)
+ return NULL_TREE;
+ return build_vec_series (type, neg_base, neg_step);
+ }
+
case COMPLEX_EXPR:
if (negate_expr_p (t))
return fold_build2_loc (loc, COMPLEX_EXPR, type,
@@ -1131,6 +1146,28 @@ int_const_binop (enum tree_code code, co
return int_const_binop_1 (code, arg1, arg2, 1);
}
+/* Return true if EXP is a VEC_DUPLICATE_CST or a VEC_SERIES_CST,
+ and if so express it as a linear series in *BASE_OUT and *STEP_OUT.
+ The step will be zero for VEC_DUPLICATE_CST. */
+
+static bool
+vec_series_equivalent_p (const_tree exp, tree *base_out, tree *step_out)
+{
+ if (TREE_CODE (exp) == VEC_SERIES_CST)
+ {
+ *base_out = VEC_SERIES_CST_BASE (exp);
+ *step_out = VEC_SERIES_CST_STEP (exp);
+ return true;
+ }
+ if (TREE_CODE (exp) == VEC_DUPLICATE_CST)
+ {
+ *base_out = VEC_DUPLICATE_CST_ELT (exp);
+ *step_out = build_zero_cst (TREE_TYPE (*base_out));
+ return true;
+ }
+ return false;
+}
+
/* Combine two constants ARG1 and ARG2 under operation CODE to produce a new
constant. We assume ARG1 and ARG2 have the same data type, or at least
are the same kind of constant and the same machine mode. Return zero if
@@ -1457,6 +1494,20 @@ const_binop (enum tree_code code, tree a
return build_vector_from_val (TREE_TYPE (arg1), sub);
}
+ tree base1, step1, base2, step2;
+ if ((code == PLUS_EXPR || code == MINUS_EXPR)
+ && vec_series_equivalent_p (arg1, &base1, &step1)
+ && vec_series_equivalent_p (arg2, &base2, &step2))
+ {
+ tree new_base = const_binop (code, base1, base2);
+ if (!new_base)
+ return NULL_TREE;
+ tree new_step = const_binop (code, step1, step2);
+ if (!new_step)
+ return NULL_TREE;
+ return build_vec_series (TREE_TYPE (arg1), new_base, new_step);
+ }
+
/* Shifts allow a scalar offset for a vector. */
if (TREE_CODE (arg1) == VECTOR_CST
&& TREE_CODE (arg2) == INTEGER_CST)
@@ -1505,6 +1556,12 @@ const_binop (enum tree_code code, tree t
result as argument put those cases that need it here. */
switch (code)
{
+ case VEC_SERIES_EXPR:
+ if (CONSTANT_CLASS_P (arg1)
+ && CONSTANT_CLASS_P (arg2))
+ return build_vec_series (type, arg1, arg2);
+ return NULL_TREE;
+
case COMPLEX_EXPR:
if ((TREE_CODE (arg1) == REAL_CST
&& TREE_CODE (arg2) == REAL_CST)
@@ -3008,6 +3065,12 @@ operand_equal_p (const_tree arg0, const_
return operand_equal_p (VEC_DUPLICATE_CST_ELT (arg0),
VEC_DUPLICATE_CST_ELT (arg1), flags);
+ case VEC_SERIES_CST:
+ return (operand_equal_p (VEC_SERIES_CST_BASE (arg0),
+ VEC_SERIES_CST_BASE (arg1), flags)
+ && operand_equal_p (VEC_SERIES_CST_STEP (arg0),
+ VEC_SERIES_CST_STEP (arg1), flags));
+
case COMPLEX_CST:
return (operand_equal_p (TREE_REALPART (arg0), TREE_REALPART (arg1),
flags)
@@ -12050,6 +12113,10 @@ fold_checksum_tree (const_tree expr, str
case VEC_DUPLICATE_CST:
fold_checksum_tree (VEC_DUPLICATE_CST_ELT (expr), ctx, ht);
break;
+ case VEC_SERIES_CST:
+ fold_checksum_tree (VEC_SERIES_CST_BASE (expr), ctx, ht);
+ fold_checksum_tree (VEC_SERIES_CST_STEP (expr), ctx, ht);
+ break;
default:
break;
}
Index: gcc/expmed.c
===================================================================
--- gcc/expmed.c 2017-10-23 11:41:39.186050437 +0100
+++ gcc/expmed.c 2017-10-23 11:42:34.914720660 +0100
@@ -5253,6 +5253,13 @@ make_tree (tree type, rtx x)
tree elt_tree = make_tree (TREE_TYPE (type), XEXP (op, 0));
return build_vector_from_val (type, elt_tree);
}
+ if (GET_CODE (op) == VEC_SERIES)
+ {
+ tree itype = TREE_TYPE (type);
+ tree base_tree = make_tree (itype, XEXP (op, 0));
+ tree step_tree = make_tree (itype, XEXP (op, 1));
+ return build_vec_series (type, base_tree, step_tree);
+ }
return make_tree (type, op);
}
Index: gcc/gimple-pretty-print.c
===================================================================
--- gcc/gimple-pretty-print.c 2017-10-23 11:41:25.500318672 +0100
+++ gcc/gimple-pretty-print.c 2017-10-23 11:42:34.916720660 +0100
@@ -438,6 +438,7 @@ dump_binary_rhs (pretty_printer *buffer,
case VEC_PACK_FIX_TRUNC_EXPR:
case VEC_WIDEN_LSHIFT_HI_EXPR:
case VEC_WIDEN_LSHIFT_LO_EXPR:
+ case VEC_SERIES_EXPR:
for (p = get_tree_code_name (code); *p; p++)
pp_character (buffer, TOUPPER (*p));
pp_string (buffer, " <");
Index: gcc/tree-inline.c
===================================================================
--- gcc/tree-inline.c 2017-10-23 11:41:51.771059237 +0100
+++ gcc/tree-inline.c 2017-10-23 11:42:34.921720660 +0100
@@ -4003,6 +4003,7 @@ estimate_operator_cost (enum tree_code c
case VEC_WIDEN_LSHIFT_HI_EXPR:
case VEC_WIDEN_LSHIFT_LO_EXPR:
case VEC_DUPLICATE_EXPR:
+ case VEC_SERIES_EXPR:
return 1;
Index: gcc/expr.c
===================================================================
--- gcc/expr.c 2017-10-23 11:41:51.764306890 +0100
+++ gcc/expr.c 2017-10-23 11:42:34.915720660 +0100
@@ -7704,7 +7704,7 @@ expand_operands (tree exp0, tree exp1, r
\f
/* Expand constant vector element ELT, which has mode MODE. This is used
- for members of VECTOR_CST and VEC_DUPLICATE_CST. */
+ for members of VECTOR_CST, VEC_DUPLICATE_CST and VEC_SERIES_CST. */
static rtx
const_vector_element (scalar_mode mode, const_tree elt)
@@ -9587,6 +9587,10 @@ #define REDUCE_BIT_FIELD(expr) (reduce_b
gcc_assert (target);
return target;
+ case VEC_SERIES_EXPR:
+ expand_operands (treeop0, treeop1, NULL_RTX, &op0, &op1, modifier);
+ return expand_vec_series_expr (mode, op0, op1, target);
+
case BIT_INSERT_EXPR:
{
unsigned bitpos = tree_to_uhwi (treeop2);
@@ -10044,6 +10048,13 @@ expand_expr_real_1 (tree exp, rtx target
VEC_DUPLICATE_CST_ELT (exp));
return gen_const_vec_duplicate (mode, op0);
+ case VEC_SERIES_CST:
+ op0 = const_vector_element (GET_MODE_INNER (mode),
+ VEC_SERIES_CST_BASE (exp));
+ op1 = const_vector_element (GET_MODE_INNER (mode),
+ VEC_SERIES_CST_STEP (exp));
+ return gen_const_vec_series (mode, op0, op1);
+
case CONST_DECL:
if (modifier == EXPAND_WRITE)
{
Index: gcc/optabs.def
===================================================================
--- gcc/optabs.def 2017-10-23 11:41:51.769129995 +0100
+++ gcc/optabs.def 2017-10-23 11:42:34.919720660 +0100
@@ -366,3 +366,4 @@ OPTAB_D (get_thread_pointer_optab, "get_
OPTAB_D (set_thread_pointer_optab, "set_thread_pointer$I$a")
OPTAB_DC (vec_duplicate_optab, "vec_duplicate$a", VEC_DUPLICATE)
+OPTAB_DC (vec_series_optab, "vec_series$a", VEC_SERIES)
Index: gcc/optabs.h
===================================================================
--- gcc/optabs.h 2017-10-23 11:41:51.769129995 +0100
+++ gcc/optabs.h 2017-10-23 11:42:34.919720660 +0100
@@ -316,6 +316,9 @@ extern rtx expand_vec_cmp_expr (tree, tr
/* Generate code for VEC_COND_EXPR. */
extern rtx expand_vec_cond_expr (tree, tree, tree, tree, rtx);
+/* Generate code for VEC_SERIES_EXPR. */
+extern rtx expand_vec_series_expr (machine_mode, rtx, rtx, rtx);
+
/* Generate code for MULT_HIGHPART_EXPR. */
extern rtx expand_mult_highpart (machine_mode, rtx, rtx, rtx, bool);
Index: gcc/optabs.c
===================================================================
--- gcc/optabs.c 2017-10-23 11:41:51.769129995 +0100
+++ gcc/optabs.c 2017-10-23 11:42:34.919720660 +0100
@@ -5693,6 +5693,27 @@ expand_vec_cond_expr (tree vec_cond_type
return ops[0].value;
}
+/* Generate VEC_SERIES_EXPR <OP0, OP1>, returning a value of mode VMODE.
+ Use TARGET for the result if nonnull and convenient. */
+
+rtx
+expand_vec_series_expr (machine_mode vmode, rtx op0, rtx op1, rtx target)
+{
+ struct expand_operand ops[3];
+ enum insn_code icode;
+ machine_mode emode = GET_MODE_INNER (vmode);
+
+ icode = direct_optab_handler (vec_series_optab, vmode);
+ gcc_assert (icode != CODE_FOR_nothing);
+
+ create_output_operand (&ops[0], target, vmode);
+ create_input_operand (&ops[1], op0, emode);
+ create_input_operand (&ops[2], op1, emode);
+
+ expand_insn (icode, 3, ops);
+ return ops[0].value;
+}
+
/* Generate insns for a vector comparison into a mask. */
rtx
Index: gcc/optabs-tree.c
===================================================================
--- gcc/optabs-tree.c 2017-10-23 11:41:51.768165374 +0100
+++ gcc/optabs-tree.c 2017-10-23 11:42:34.918720660 +0100
@@ -213,6 +213,9 @@ optab_for_tree_code (enum tree_code code
case VEC_DUPLICATE_EXPR:
return vec_duplicate_optab;
+ case VEC_SERIES_EXPR:
+ return vec_series_optab;
+
default:
break;
}
Index: gcc/tree-cfg.c
===================================================================
--- gcc/tree-cfg.c 2017-10-23 11:41:51.770094616 +0100
+++ gcc/tree-cfg.c 2017-10-23 11:42:34.920720660 +0100
@@ -4119,6 +4119,23 @@ verify_gimple_assign_binary (gassign *st
/* Continue with generic binary expression handling. */
break;
+ case VEC_SERIES_EXPR:
+ if (!useless_type_conversion_p (rhs1_type, rhs2_type))
+ {
+ error ("type mismatch in series expression");
+ debug_generic_expr (rhs1_type);
+ debug_generic_expr (rhs2_type);
+ return true;
+ }
+ if (TREE_CODE (lhs_type) != VECTOR_TYPE
+ || !useless_type_conversion_p (TREE_TYPE (lhs_type), rhs1_type))
+ {
+ error ("vector type expected in series expression");
+ debug_generic_expr (lhs_type);
+ return true;
+ }
+ return false;
+
default:
gcc_unreachable ();
}
@@ -4485,6 +4502,7 @@ verify_gimple_assign_single (gassign *st
case COMPLEX_CST:
case VECTOR_CST:
case VEC_DUPLICATE_CST:
+ case VEC_SERIES_CST:
case STRING_CST:
return res;
Index: gcc/tree-vect-generic.c
===================================================================
--- gcc/tree-vect-generic.c 2017-10-23 11:41:51.773953100 +0100
+++ gcc/tree-vect-generic.c 2017-10-23 11:42:34.922720660 +0100
@@ -1595,7 +1595,8 @@ expand_vector_operations_1 (gimple_stmt_
if (rhs_class == GIMPLE_BINARY_RHS)
rhs2 = gimple_assign_rhs2 (stmt);
- if (TREE_CODE (type) != VECTOR_TYPE)
+ if (!VECTOR_TYPE_P (type)
+ || !VECTOR_TYPE_P (TREE_TYPE (rhs1)))
return;
/* If the vector operation is operating on all same vector elements
^ permalink raw reply [flat|nested] 90+ messages in thread
* [07/nn] Add unique CONSTs
2017-10-23 11:16 [00/nn] Patches preparing for runtime offsets and sizes Richard Sandiford
` (6 preceding siblings ...)
2017-10-23 11:22 ` [08/nn] Add a fixed_size_mode class Richard Sandiford
@ 2017-10-23 11:22 ` Richard Sandiford
2017-10-27 15:51 ` Jeff Law
2017-10-23 11:23 ` [09/nn] Add a fixed_size_mode_pod class Richard Sandiford
` (13 subsequent siblings)
21 siblings, 1 reply; 90+ messages in thread
From: Richard Sandiford @ 2017-10-23 11:22 UTC (permalink / raw)
To: gcc-patches
This patch adds a way of treating certain kinds of CONST as unique,
so that pointer equality is equivalent to value equality. For now it
is restricted to VEC_DUPLICATE and VEC_SERIES, although the code to
generate them remains in the else arm of an "if (1)" until a later
patch.
This is needed so that (const (vec_duplicate xx)) can be used as the
CONSTxx_RTX of a variable-length vector.
2017-10-23 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* rtl.h (unique_const_p): New function.
(gen_rtx_CONST): Declare.
* emit-rtl.c (const_hasher): New struct.
(const_htab): New variable.
(init_emit_once): Initialize it.
(const_hasher::hash, const_hasher::equal): New functions.
(gen_rtx_CONST): New function.
(spare_vec_duplicate, spare_vec_series): New variables.
(gen_const_vec_duplicate_1): Add code to use (const (vec_duplicate)),
but disable it for now.
(gen_const_vec_series): Likewise (const (vec_series)).
* gengenrtl.c (special_rtx): Return true for CONST.
* rtl.c (shared_const_p): Return true if unique_const_p.
Index: gcc/rtl.h
===================================================================
--- gcc/rtl.h 2017-10-23 11:41:41.549050496 +0100
+++ gcc/rtl.h 2017-10-23 11:42:47.297720974 +0100
@@ -2861,6 +2861,23 @@ vec_series_p (const_rtx x, rtx *base_out
return const_vec_series_p (x, base_out, step_out);
}
+/* Return true if there should only ever be one instance of (const X),
+ so that constants of this type can be compared using pointer equality. */
+
+inline bool
+unique_const_p (const_rtx x)
+{
+ switch (GET_CODE (x))
+ {
+ case VEC_DUPLICATE:
+ case VEC_SERIES:
+ return true;
+
+ default:
+ return false;
+ }
+}
+
/* Return the unpromoted (outer) mode of SUBREG_PROMOTED_VAR_P subreg X. */
inline scalar_int_mode
@@ -3542,6 +3559,7 @@ extern rtx_insn_list *gen_rtx_INSN_LIST
gen_rtx_INSN (machine_mode mode, rtx_insn *prev_insn, rtx_insn *next_insn,
basic_block bb, rtx pattern, int location, int code,
rtx reg_notes);
+extern rtx gen_rtx_CONST (machine_mode, rtx);
extern rtx gen_rtx_CONST_INT (machine_mode, HOST_WIDE_INT);
extern rtx gen_rtx_CONST_VECTOR (machine_mode, rtvec);
extern void set_mode_and_regno (rtx, machine_mode, unsigned int);
Index: gcc/emit-rtl.c
===================================================================
--- gcc/emit-rtl.c 2017-10-23 11:41:41.548050496 +0100
+++ gcc/emit-rtl.c 2017-10-23 11:42:47.296720974 +0100
@@ -175,6 +175,15 @@ struct const_fixed_hasher : ggc_cache_pt
static GTY ((cache)) hash_table<const_fixed_hasher> *const_fixed_htab;
+/* A hash table storing unique CONSTs. */
+struct const_hasher : ggc_cache_ptr_hash<rtx_def>
+{
+ static hashval_t hash (rtx x);
+ static bool equal (rtx x, rtx y);
+};
+
+static GTY ((cache)) hash_table<const_hasher> *const_htab;
+
#define cur_insn_uid (crtl->emit.x_cur_insn_uid)
#define cur_debug_insn_uid (crtl->emit.x_cur_debug_insn_uid)
#define first_label_num (crtl->emit.x_first_label_num)
@@ -310,6 +319,28 @@ const_fixed_hasher::equal (rtx x, rtx y)
return fixed_identical (CONST_FIXED_VALUE (a), CONST_FIXED_VALUE (b));
}
+/* Returns a hash code for X (which is either an existing unique CONST
+ or an operand to gen_rtx_CONST). */
+
+hashval_t
+const_hasher::hash (rtx x)
+{
+ if (GET_CODE (x) == CONST)
+ x = XEXP (x, 0);
+
+ int do_not_record_p = 0;
+ return hash_rtx (x, GET_MODE (x), &do_not_record_p, NULL, false);
+}
+
+/* Returns true if the operand of unique CONST X is equal to Y. */
+
+bool
+const_hasher::equal (rtx x, rtx y)
+{
+ gcc_checking_assert (GET_CODE (x) == CONST);
+ return rtx_equal_p (XEXP (x, 0), y);
+}
+
/* Return true if the given memory attributes are equal. */
bool
@@ -5756,16 +5787,55 @@ init_emit (void)
#endif
}
+rtx
+gen_rtx_CONST (machine_mode mode, rtx val)
+{
+ if (unique_const_p (val))
+ {
+ /* Look up the CONST in the hash table. */
+ rtx *slot = const_htab->find_slot (val, INSERT);
+ if (*slot == 0)
+ *slot = gen_rtx_raw_CONST (mode, val);
+ return *slot;
+ }
+
+ return gen_rtx_raw_CONST (mode, val);
+}
+
+/* Temporary rtx used by gen_const_vec_duplicate_1. */
+static GTY((deletable)) rtx spare_vec_duplicate;
+
/* Like gen_const_vec_duplicate, but ignore const_tiny_rtx. */
static rtx
gen_const_vec_duplicate_1 (machine_mode mode, rtx el)
{
int nunits = GET_MODE_NUNITS (mode);
- rtvec v = rtvec_alloc (nunits);
- for (int i = 0; i < nunits; ++i)
- RTVEC_ELT (v, i) = el;
- return gen_rtx_raw_CONST_VECTOR (mode, v);
+ if (1)
+ {
+ rtvec v = rtvec_alloc (nunits);
+
+ for (int i = 0; i < nunits; ++i)
+ RTVEC_ELT (v, i) = el;
+
+ return gen_rtx_raw_CONST_VECTOR (mode, v);
+ }
+ else
+ {
+ if (spare_vec_duplicate)
+ {
+ PUT_MODE (spare_vec_duplicate, mode);
+ XEXP (spare_vec_duplicate, 0) = el;
+ }
+ else
+ spare_vec_duplicate = gen_rtx_VEC_DUPLICATE (mode, el);
+
+ rtx res = gen_rtx_CONST (mode, spare_vec_duplicate);
+ if (XEXP (res, 0) == spare_vec_duplicate)
+ spare_vec_duplicate = NULL_RTX;
+
+ return res;
+ }
}
/* Generate a vector constant of mode MODE in which every element has
@@ -5827,6 +5897,9 @@ const_vec_series_p_1 (const_rtx x, rtx *
return true;
}
+/* Temporary rtx used by gen_const_vec_series. */
+static GTY((deletable)) rtx spare_vec_series;
+
/* Generate a vector constant of mode MODE in which element I has
the value BASE + I * STEP. */
@@ -5836,13 +5909,33 @@ gen_const_vec_series (machine_mode mode,
gcc_assert (CONSTANT_P (base) && CONSTANT_P (step));
int nunits = GET_MODE_NUNITS (mode);
- rtvec v = rtvec_alloc (nunits);
- scalar_mode inner_mode = GET_MODE_INNER (mode);
- RTVEC_ELT (v, 0) = base;
- for (int i = 1; i < nunits; ++i)
- RTVEC_ELT (v, i) = simplify_gen_binary (PLUS, inner_mode,
- RTVEC_ELT (v, i - 1), step);
- return gen_rtx_raw_CONST_VECTOR (mode, v);
+ if (1)
+ {
+ rtvec v = rtvec_alloc (nunits);
+ scalar_mode inner_mode = GET_MODE_INNER (mode);
+ RTVEC_ELT (v, 0) = base;
+ for (int i = 1; i < nunits; ++i)
+ RTVEC_ELT (v, i) = simplify_gen_binary (PLUS, inner_mode,
+ RTVEC_ELT (v, i - 1), step);
+ return gen_rtx_raw_CONST_VECTOR (mode, v);
+ }
+ else
+ {
+ if (spare_vec_series)
+ {
+ PUT_MODE (spare_vec_series, mode);
+ XEXP (spare_vec_series, 0) = base;
+ XEXP (spare_vec_series, 1) = step;
+ }
+ else
+ spare_vec_series = gen_rtx_VEC_SERIES (mode, base, step);
+
+ rtx res = gen_rtx_CONST (mode, spare_vec_series);
+ if (XEXP (res, 0) == spare_vec_series)
+ spare_vec_series = NULL_RTX;
+
+ return res;
+ }
}
/* Generate a vector of mode MODE in which element I has the value
@@ -6000,6 +6093,8 @@ init_emit_once (void)
reg_attrs_htab = hash_table<reg_attr_hasher>::create_ggc (37);
+ const_htab = hash_table<const_hasher>::create_ggc (37);
+
#ifdef INIT_EXPANDERS
/* This is to initialize {init|mark|free}_machine_status before the first
call to push_function_context_to. This is needed by the Chill front
Index: gcc/gengenrtl.c
===================================================================
--- gcc/gengenrtl.c 2017-08-03 10:40:53.029491180 +0100
+++ gcc/gengenrtl.c 2017-10-23 11:42:47.297720974 +0100
@@ -143,7 +143,8 @@ special_rtx (int idx)
|| strcmp (defs[idx].enumname, "CC0") == 0
|| strcmp (defs[idx].enumname, "RETURN") == 0
|| strcmp (defs[idx].enumname, "SIMPLE_RETURN") == 0
- || strcmp (defs[idx].enumname, "CONST_VECTOR") == 0);
+ || strcmp (defs[idx].enumname, "CONST_VECTOR") == 0
+ || strcmp (defs[idx].enumname, "CONST") == 0);
}
/* Return nonzero if the RTL code given by index IDX is one that we should
Index: gcc/rtl.c
===================================================================
--- gcc/rtl.c 2017-08-03 10:40:55.646123304 +0100
+++ gcc/rtl.c 2017-10-23 11:42:47.297720974 +0100
@@ -252,6 +252,9 @@ shared_const_p (const_rtx orig)
{
gcc_assert (GET_CODE (orig) == CONST);
+ if (unique_const_p (XEXP (orig, 0)))
+ return true;
+
/* CONST can be shared if it contains a SYMBOL_REF. If it contains
a LABEL_REF, it isn't sharable. */
return (GET_CODE (XEXP (orig, 0)) == PLUS
* [08/nn] Add a fixed_size_mode class
2017-10-23 11:16 [00/nn] Patches preparing for runtime offsets and sizes Richard Sandiford
` (5 preceding siblings ...)
2017-10-23 11:22 ` [06/nn] Add VEC_SERIES_{CST,EXPR} " Richard Sandiford
@ 2017-10-23 11:22 ` Richard Sandiford
2017-10-26 11:57 ` Richard Biener
2017-10-23 11:22 ` [07/nn] Add unique CONSTs Richard Sandiford
` (14 subsequent siblings)
21 siblings, 1 reply; 90+ messages in thread
From: Richard Sandiford @ 2017-10-23 11:22 UTC (permalink / raw)
To: gcc-patches
This patch adds a fixed_size_mode machine_mode wrapper
for modes that are known to have a fixed size. That applies
to all current modes, but future patches will add support for
variable-sized modes.
The use of this class should be pretty restricted. One important
use case is to hold the mode of static data, which can never be
variable-sized with current file formats. Another is to hold
the modes of registers involved in __builtin_apply and
__builtin_result, since those interfaces don't cope well with
variable-sized data.
The class can also be useful when reinterpreting the contents of
a fixed-length bit string as a different kind of value.
2017-10-23 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* machmode.h (fixed_size_mode): New class.
* rtl.h (get_pool_mode): Return fixed_size_mode.
* gengtype.c (main): Add fixed_size_mode.
* target.def (get_raw_result_mode): Return a fixed_size_mode.
(get_raw_arg_mode): Likewise.
* doc/tm.texi: Regenerate.
* targhooks.h (default_get_reg_raw_mode): Return a fixed_size_mode.
* targhooks.c (default_get_reg_raw_mode): Likewise.
* config/ia64/ia64.c (ia64_get_reg_raw_mode): Likewise.
* config/mips/mips.c (mips_get_reg_raw_mode): Likewise.
* config/msp430/msp430.c (msp430_get_raw_arg_mode): Likewise.
(msp430_get_raw_result_mode): Likewise.
* config/avr/avr-protos.h (regmask): Use as_a <fixed_size_mode>.
* dbxout.c (dbxout_parms): Require fixed-size modes.
* expr.c (copy_blkmode_from_reg, copy_blkmode_to_reg): Likewise.
* gimple-ssa-store-merging.c (encode_tree_to_bitpos): Likewise.
* omp-low.c (lower_oacc_reductions): Likewise.
* simplify-rtx.c (simplify_immed_subreg): Take fixed_size_modes.
(simplify_subreg): Update accordingly.
* varasm.c (constant_descriptor_rtx::mode): Change to fixed_size_mode.
(force_const_mem): Update accordingly. Return NULL_RTX for modes
that aren't fixed-size.
(get_pool_mode): Return a fixed_size_mode.
(output_constant_pool_2): Take a fixed_size_mode.
Index: gcc/machmode.h
===================================================================
--- gcc/machmode.h 2017-09-15 14:47:33.184331588 +0100
+++ gcc/machmode.h 2017-10-23 11:42:52.014721093 +0100
@@ -652,6 +652,39 @@ GET_MODE_2XWIDER_MODE (const T &m)
extern const unsigned char mode_complex[NUM_MACHINE_MODES];
#define GET_MODE_COMPLEX_MODE(MODE) ((machine_mode) mode_complex[MODE])
+/* Represents a machine mode that must have a fixed size. The main
+ use of this class is to represent the modes of objects that always
+ have static storage duration, such as constant pool entries.
+ (No current target supports the concept of variable-size static data.) */
+class fixed_size_mode
+{
+public:
+ typedef mode_traits<fixed_size_mode>::from_int from_int;
+
+ ALWAYS_INLINE fixed_size_mode () {}
+ ALWAYS_INLINE fixed_size_mode (from_int m) : m_mode (machine_mode (m)) {}
+ ALWAYS_INLINE fixed_size_mode (const scalar_mode &m) : m_mode (m) {}
+ ALWAYS_INLINE fixed_size_mode (const scalar_int_mode &m) : m_mode (m) {}
+ ALWAYS_INLINE fixed_size_mode (const scalar_float_mode &m) : m_mode (m) {}
+ ALWAYS_INLINE fixed_size_mode (const scalar_mode_pod &m) : m_mode (m) {}
+ ALWAYS_INLINE fixed_size_mode (const scalar_int_mode_pod &m) : m_mode (m) {}
+ ALWAYS_INLINE fixed_size_mode (const complex_mode &m) : m_mode (m) {}
+ ALWAYS_INLINE operator machine_mode () const { return m_mode; }
+
+ static bool includes_p (machine_mode);
+
+protected:
+ machine_mode m_mode;
+};
+
+/* Return true if MODE has a fixed size. */
+
+inline bool
+fixed_size_mode::includes_p (machine_mode)
+{
+ return true;
+}
+
extern opt_machine_mode mode_for_size (unsigned int, enum mode_class, int);
/* Return the machine mode to use for a MODE_INT of SIZE bits, if one
Index: gcc/rtl.h
===================================================================
--- gcc/rtl.h 2017-10-23 11:42:47.297720974 +0100
+++ gcc/rtl.h 2017-10-23 11:42:52.015721094 +0100
@@ -3020,7 +3020,7 @@ extern rtx force_const_mem (machine_mode
struct function;
extern rtx get_pool_constant (const_rtx);
extern rtx get_pool_constant_mark (rtx, bool *);
-extern machine_mode get_pool_mode (const_rtx);
+extern fixed_size_mode get_pool_mode (const_rtx);
extern rtx simplify_subtraction (rtx);
extern void decide_function_section (tree);
Index: gcc/gengtype.c
===================================================================
--- gcc/gengtype.c 2017-05-23 19:29:56.919436344 +0100
+++ gcc/gengtype.c 2017-10-23 11:42:52.014721093 +0100
@@ -5197,6 +5197,7 @@ #define POS_HERE(Call) do { pos.file = t
POS_HERE (do_scalar_typedef ("JCF_u2", &pos));
POS_HERE (do_scalar_typedef ("void", &pos));
POS_HERE (do_scalar_typedef ("machine_mode", &pos));
+ POS_HERE (do_scalar_typedef ("fixed_size_mode", &pos));
POS_HERE (do_typedef ("PTR",
create_pointer (resolve_typedef ("void", &pos)),
&pos));
Index: gcc/target.def
===================================================================
--- gcc/target.def 2017-10-23 11:41:23.134456913 +0100
+++ gcc/target.def 2017-10-23 11:42:52.017721094 +0100
@@ -5021,7 +5021,7 @@ DEFHOOK
"This target hook returns the mode to be used when accessing raw return\
registers in @code{__builtin_return}. Define this macro if the value\
in @var{reg_raw_mode} is not correct.",
- machine_mode, (int regno),
+ fixed_size_mode, (int regno),
default_get_reg_raw_mode)
/* Return a mode wide enough to copy any argument value that might be
@@ -5031,7 +5031,7 @@ DEFHOOK
"This target hook returns the mode to be used when accessing raw argument\
registers in @code{__builtin_apply_args}. Define this macro if the value\
in @var{reg_raw_mode} is not correct.",
- machine_mode, (int regno),
+ fixed_size_mode, (int regno),
default_get_reg_raw_mode)
HOOK_VECTOR_END (calls)
Index: gcc/doc/tm.texi
===================================================================
--- gcc/doc/tm.texi 2017-10-23 11:41:22.175925023 +0100
+++ gcc/doc/tm.texi 2017-10-23 11:42:52.012721093 +0100
@@ -4536,11 +4536,11 @@ This macro has effect in @option{-fpcc-s
nothing when you use @option{-freg-struct-return} mode.
@end defmac
-@deftypefn {Target Hook} machine_mode TARGET_GET_RAW_RESULT_MODE (int @var{regno})
+@deftypefn {Target Hook} fixed_size_mode TARGET_GET_RAW_RESULT_MODE (int @var{regno})
This target hook returns the mode to be used when accessing raw return registers in @code{__builtin_return}. Define this macro if the value in @var{reg_raw_mode} is not correct.
@end deftypefn
-@deftypefn {Target Hook} machine_mode TARGET_GET_RAW_ARG_MODE (int @var{regno})
+@deftypefn {Target Hook} fixed_size_mode TARGET_GET_RAW_ARG_MODE (int @var{regno})
This target hook returns the mode to be used when accessing raw argument registers in @code{__builtin_apply_args}. Define this macro if the value in @var{reg_raw_mode} is not correct.
@end deftypefn
Index: gcc/targhooks.h
===================================================================
--- gcc/targhooks.h 2017-10-02 09:08:43.318933786 +0100
+++ gcc/targhooks.h 2017-10-23 11:42:52.017721094 +0100
@@ -233,7 +233,7 @@ extern int default_jump_align_max_skip (
extern section * default_function_section(tree decl, enum node_frequency freq,
bool startup, bool exit);
extern machine_mode default_dwarf_frame_reg_mode (int);
-extern machine_mode default_get_reg_raw_mode (int);
+extern fixed_size_mode default_get_reg_raw_mode (int);
extern bool default_keep_leaf_when_profiled ();
extern void *default_get_pch_validity (size_t *);
Index: gcc/targhooks.c
===================================================================
--- gcc/targhooks.c 2017-10-23 11:41:23.195392846 +0100
+++ gcc/targhooks.c 2017-10-23 11:42:52.017721094 +0100
@@ -1834,10 +1834,12 @@ default_dwarf_frame_reg_mode (int regno)
/* To be used by targets where reg_raw_mode doesn't return the right
mode for registers used in apply_builtin_return and apply_builtin_arg. */
-machine_mode
+fixed_size_mode
default_get_reg_raw_mode (int regno)
{
- return reg_raw_mode[regno];
+ /* Targets must override this hook if the underlying register is
+ variable-sized. */
+ return as_a <fixed_size_mode> (reg_raw_mode[regno]);
}
/* Return true if a leaf function should stay leaf even with profiling
Index: gcc/config/ia64/ia64.c
===================================================================
--- gcc/config/ia64/ia64.c 2017-10-23 11:41:32.363050263 +0100
+++ gcc/config/ia64/ia64.c 2017-10-23 11:42:52.009721093 +0100
@@ -329,7 +329,7 @@ static tree ia64_fold_builtin (tree, int
static tree ia64_builtin_decl (unsigned, bool);
static reg_class_t ia64_preferred_reload_class (rtx, reg_class_t);
-static machine_mode ia64_get_reg_raw_mode (int regno);
+static fixed_size_mode ia64_get_reg_raw_mode (int regno);
static section * ia64_hpux_function_section (tree, enum node_frequency,
bool, bool);
@@ -11328,7 +11328,7 @@ ia64_dconst_0_375 (void)
return ia64_dconst_0_375_rtx;
}
-static machine_mode
+static fixed_size_mode
ia64_get_reg_raw_mode (int regno)
{
if (FR_REGNO_P (regno))
Index: gcc/config/mips/mips.c
===================================================================
--- gcc/config/mips/mips.c 2017-10-23 11:41:32.365050264 +0100
+++ gcc/config/mips/mips.c 2017-10-23 11:42:52.010721093 +0100
@@ -1132,7 +1132,6 @@ static rtx mips_find_pic_call_symbol (rt
static int mips_register_move_cost (machine_mode, reg_class_t,
reg_class_t);
static unsigned int mips_function_arg_boundary (machine_mode, const_tree);
-static machine_mode mips_get_reg_raw_mode (int regno);
static rtx mips_gen_const_int_vector_shuffle (machine_mode, int);
\f
/* This hash table keeps track of implicit "mips16" and "nomips16" attributes
@@ -6111,7 +6110,7 @@ mips_function_arg_boundary (machine_mode
/* Implement TARGET_GET_RAW_RESULT_MODE and TARGET_GET_RAW_ARG_MODE. */
-static machine_mode
+static fixed_size_mode
mips_get_reg_raw_mode (int regno)
{
if (TARGET_FLOATXX && FP_REG_P (regno))
Index: gcc/config/msp430/msp430.c
===================================================================
--- gcc/config/msp430/msp430.c 2017-10-23 11:41:23.047405581 +0100
+++ gcc/config/msp430/msp430.c 2017-10-23 11:42:52.011721093 +0100
@@ -1398,16 +1398,17 @@ msp430_return_in_memory (const_tree ret_
#undef TARGET_GET_RAW_ARG_MODE
#define TARGET_GET_RAW_ARG_MODE msp430_get_raw_arg_mode
-static machine_mode
+static fixed_size_mode
msp430_get_raw_arg_mode (int regno)
{
- return (regno == ARG_POINTER_REGNUM) ? VOIDmode : Pmode;
+ return as_a <fixed_size_mode> (regno == ARG_POINTER_REGNUM
+ ? VOIDmode : Pmode);
}
#undef TARGET_GET_RAW_RESULT_MODE
#define TARGET_GET_RAW_RESULT_MODE msp430_get_raw_result_mode
-static machine_mode
+static fixed_size_mode
msp430_get_raw_result_mode (int regno ATTRIBUTE_UNUSED)
{
return Pmode;
Index: gcc/config/avr/avr-protos.h
===================================================================
--- gcc/config/avr/avr-protos.h 2017-10-23 11:41:22.812366984 +0100
+++ gcc/config/avr/avr-protos.h 2017-10-23 11:42:52.007721093 +0100
@@ -132,7 +132,7 @@ extern bool avr_casei_sequence_check_ope
static inline unsigned
regmask (machine_mode mode, unsigned regno)
{
- return ((1u << GET_MODE_SIZE (mode)) - 1) << regno;
+ return ((1u << GET_MODE_SIZE (as_a <fixed_size_mode> (mode))) - 1) << regno;
}
extern void avr_fix_inputs (rtx*, unsigned, unsigned);
Index: gcc/dbxout.c
===================================================================
--- gcc/dbxout.c 2017-10-10 17:55:22.088175460 +0100
+++ gcc/dbxout.c 2017-10-23 11:42:52.011721093 +0100
@@ -3393,12 +3393,16 @@ dbxout_parms (tree parms)
{
++debug_nesting;
emit_pending_bincls_if_required ();
+ fixed_size_mode rtl_mode, type_mode;
for (; parms; parms = DECL_CHAIN (parms))
if (DECL_NAME (parms)
&& TREE_TYPE (parms) != error_mark_node
&& DECL_RTL_SET_P (parms)
- && DECL_INCOMING_RTL (parms))
+ && DECL_INCOMING_RTL (parms)
+ /* We can't represent variable-sized types in this format. */
+ && is_a <fixed_size_mode> (TYPE_MODE (TREE_TYPE (parms)), &type_mode)
+ && is_a <fixed_size_mode> (GET_MODE (DECL_RTL (parms)), &rtl_mode))
{
tree eff_type;
char letter;
@@ -3555,10 +3559,9 @@ dbxout_parms (tree parms)
/* Make a big endian correction if the mode of the type of the
parameter is not the same as the mode of the rtl. */
if (BYTES_BIG_ENDIAN
- && TYPE_MODE (TREE_TYPE (parms)) != GET_MODE (DECL_RTL (parms))
- && GET_MODE_SIZE (TYPE_MODE (TREE_TYPE (parms))) < UNITS_PER_WORD)
- number += (GET_MODE_SIZE (GET_MODE (DECL_RTL (parms)))
- - GET_MODE_SIZE (TYPE_MODE (TREE_TYPE (parms))));
+ && type_mode != rtl_mode
+ && GET_MODE_SIZE (type_mode) < UNITS_PER_WORD)
+ number += GET_MODE_SIZE (rtl_mode) - GET_MODE_SIZE (type_mode);
}
else
/* ??? We don't know how to represent this argument. */
Index: gcc/expr.c
===================================================================
--- gcc/expr.c 2017-10-23 11:42:34.915720660 +0100
+++ gcc/expr.c 2017-10-23 11:42:52.013721093 +0100
@@ -2628,9 +2628,10 @@ copy_blkmode_from_reg (rtx target, rtx s
rtx src = NULL, dst = NULL;
unsigned HOST_WIDE_INT bitsize = MIN (TYPE_ALIGN (type), BITS_PER_WORD);
unsigned HOST_WIDE_INT bitpos, xbitpos, padding_correction = 0;
- machine_mode mode = GET_MODE (srcreg);
- machine_mode tmode = GET_MODE (target);
- machine_mode copy_mode;
+ /* No current ABI uses variable-sized modes to pass a BLKmode type. */
+ fixed_size_mode mode = as_a <fixed_size_mode> (GET_MODE (srcreg));
+ fixed_size_mode tmode = as_a <fixed_size_mode> (GET_MODE (target));
+ fixed_size_mode copy_mode;
/* BLKmode registers created in the back-end shouldn't have survived. */
gcc_assert (mode != BLKmode);
@@ -2728,19 +2729,21 @@ copy_blkmode_from_reg (rtx target, rtx s
}
}
-/* Copy BLKmode value SRC into a register of mode MODE. Return the
+/* Copy BLKmode value SRC into a register of mode MODE_IN. Return the
register if it contains any data, otherwise return null.
This is used on targets that return BLKmode values in registers. */
rtx
-copy_blkmode_to_reg (machine_mode mode, tree src)
+copy_blkmode_to_reg (machine_mode mode_in, tree src)
{
int i, n_regs;
unsigned HOST_WIDE_INT bitpos, xbitpos, padding_correction = 0, bytes;
unsigned int bitsize;
rtx *dst_words, dst, x, src_word = NULL_RTX, dst_word = NULL_RTX;
- machine_mode dst_mode;
+ /* No current ABI uses variable-sized modes to pass a BLKmode type. */
+ fixed_size_mode mode = as_a <fixed_size_mode> (mode_in);
+ fixed_size_mode dst_mode;
gcc_assert (TYPE_MODE (TREE_TYPE (src)) == BLKmode);
Index: gcc/gimple-ssa-store-merging.c
===================================================================
--- gcc/gimple-ssa-store-merging.c 2017-10-09 11:50:52.446411111 +0100
+++ gcc/gimple-ssa-store-merging.c 2017-10-23 11:42:52.014721093 +0100
@@ -401,8 +401,11 @@ encode_tree_to_bitpos (tree expr, unsign
The awkwardness comes from the fact that bitpos is counted from the
most significant bit of a byte. */
+ /* We must be dealing with fixed-size data at this point, since the
+ total size is also fixed. */
+ fixed_size_mode mode = as_a <fixed_size_mode> (TYPE_MODE (TREE_TYPE (expr)));
/* Allocate an extra byte so that we have space to shift into. */
- unsigned int byte_size = GET_MODE_SIZE (TYPE_MODE (TREE_TYPE (expr))) + 1;
+ unsigned int byte_size = GET_MODE_SIZE (mode) + 1;
unsigned char *tmpbuf = XALLOCAVEC (unsigned char, byte_size);
memset (tmpbuf, '\0', byte_size);
/* The store detection code should only have allowed constants that are
Index: gcc/omp-low.c
===================================================================
--- gcc/omp-low.c 2017-10-10 17:55:22.100175459 +0100
+++ gcc/omp-low.c 2017-10-23 11:42:52.015721094 +0100
@@ -5067,8 +5067,10 @@ lower_oacc_reductions (location_t loc, t
v1 = v2 = v3 = var;
/* Determine position in reduction buffer, which may be used
- by target. */
- machine_mode mode = TYPE_MODE (TREE_TYPE (var));
+ by target. The parser has ensured that this is not a
+ variable-sized type. */
+ fixed_size_mode mode
+ = as_a <fixed_size_mode> (TYPE_MODE (TREE_TYPE (var)));
unsigned align = GET_MODE_ALIGNMENT (mode) / BITS_PER_UNIT;
offset = (offset + align - 1) & ~(align - 1);
tree off = build_int_cst (sizetype, offset);
Index: gcc/simplify-rtx.c
===================================================================
--- gcc/simplify-rtx.c 2017-10-23 11:41:41.550050496 +0100
+++ gcc/simplify-rtx.c 2017-10-23 11:42:52.016721094 +0100
@@ -48,8 +48,6 @@ #define HWI_SIGN_EXTEND(low) \
static rtx neg_const_int (machine_mode, const_rtx);
static bool plus_minus_operand_p (const_rtx);
static rtx simplify_plus_minus (enum rtx_code, machine_mode, rtx, rtx);
-static rtx simplify_immed_subreg (machine_mode, rtx, machine_mode,
- unsigned int);
static rtx simplify_associative_operation (enum rtx_code, machine_mode,
rtx, rtx);
static rtx simplify_relational_operation_1 (enum rtx_code, machine_mode,
@@ -5802,8 +5800,8 @@ simplify_ternary_operation (enum rtx_cod
and then repacking them again for OUTERMODE. */
static rtx
-simplify_immed_subreg (machine_mode outermode, rtx op,
- machine_mode innermode, unsigned int byte)
+simplify_immed_subreg (fixed_size_mode outermode, rtx op,
+ fixed_size_mode innermode, unsigned int byte)
{
enum {
value_bit = 8,
@@ -6171,7 +6169,18 @@ simplify_subreg (machine_mode outermode,
|| CONST_DOUBLE_AS_FLOAT_P (op)
|| GET_CODE (op) == CONST_FIXED
|| GET_CODE (op) == CONST_VECTOR)
- return simplify_immed_subreg (outermode, op, innermode, byte);
+ {
+ /* simplify_immed_subreg deconstructs OP into bytes and constructs
+ the result from bytes, so it only works if the sizes of the modes
+ are known at compile time. Cases that apply to general modes
+ should be handled here before calling simplify_immed_subreg. */
+ fixed_size_mode fs_outermode, fs_innermode;
+ if (is_a <fixed_size_mode> (outermode, &fs_outermode)
+ && is_a <fixed_size_mode> (innermode, &fs_innermode))
+ return simplify_immed_subreg (fs_outermode, op, fs_innermode, byte);
+
+ return NULL_RTX;
+ }
/* Changing mode twice with SUBREG => just change it once,
or not at all if changing back op starting mode. */
Index: gcc/varasm.c
===================================================================
--- gcc/varasm.c 2017-10-23 11:42:34.927720660 +0100
+++ gcc/varasm.c 2017-10-23 11:42:52.018721094 +0100
@@ -3584,7 +3584,7 @@ struct GTY((chain_next ("%h.next"), for_
rtx constant;
HOST_WIDE_INT offset;
hashval_t hash;
- machine_mode mode;
+ fixed_size_mode mode;
unsigned int align;
int labelno;
int mark;
@@ -3760,10 +3760,11 @@ simplify_subtraction (rtx x)
}
\f
/* Given a constant rtx X, make (or find) a memory constant for its value
- and return a MEM rtx to refer to it in memory. */
+ and return a MEM rtx to refer to it in memory. IN_MODE is the mode
+ of X. */
rtx
-force_const_mem (machine_mode mode, rtx x)
+force_const_mem (machine_mode in_mode, rtx x)
{
struct constant_descriptor_rtx *desc, tmp;
struct rtx_constant_pool *pool;
@@ -3772,6 +3773,11 @@ force_const_mem (machine_mode mode, rtx
hashval_t hash;
unsigned int align;
constant_descriptor_rtx **slot;
+ fixed_size_mode mode;
+
+ /* We can't force variable-sized objects to memory. */
+ if (!is_a <fixed_size_mode> (in_mode, &mode))
+ return NULL_RTX;
/* If we're not allowed to drop X into the constant pool, don't. */
if (targetm.cannot_force_const_mem (mode, x))
@@ -3881,7 +3887,7 @@ get_pool_constant_mark (rtx addr, bool *
/* Similar, return the mode. */
-machine_mode
+fixed_size_mode
get_pool_mode (const_rtx addr)
{
return SYMBOL_REF_CONSTANT (addr)->mode;
@@ -3901,7 +3907,7 @@ constant_pool_empty_p (void)
in MODE with known alignment ALIGN. */
static void
-output_constant_pool_2 (machine_mode mode, rtx x, unsigned int align)
+output_constant_pool_2 (fixed_size_mode mode, rtx x, unsigned int align)
{
switch (GET_MODE_CLASS (mode))
{
^ permalink raw reply [flat|nested] 90+ messages in thread
* [09/nn] Add a fixed_size_mode_pod class
2017-10-23 11:16 [00/nn] Patches preparing for runtime offsets and sizes Richard Sandiford
` (7 preceding siblings ...)
2017-10-23 11:22 ` [07/nn] Add unique CONSTs Richard Sandiford
@ 2017-10-23 11:23 ` Richard Sandiford
2017-10-26 11:59 ` Richard Biener
2017-10-23 11:24 ` [11/nn] Add narrower_subreg_mode helper function Richard Sandiford
` (12 subsequent siblings)
21 siblings, 1 reply; 90+ messages in thread
From: Richard Sandiford @ 2017-10-23 11:23 UTC (permalink / raw)
To: gcc-patches
This patch adds a POD version of fixed_size_mode. The only current use
is for storing the __builtin_apply and __builtin_result register modes,
which were made fixed_size_modes by the previous patch.
2017-10-23 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* coretypes.h (fixed_size_mode): Declare.
(fixed_size_mode_pod): New typedef.
* builtins.h (target_builtins::x_apply_args_mode)
(target_builtins::x_apply_result_mode): Change type to
fixed_size_mode_pod.
* builtins.c (apply_args_size, apply_result_size, result_vector)
(expand_builtin_apply_args_1, expand_builtin_apply)
(expand_builtin_return): Update accordingly.
Index: gcc/coretypes.h
===================================================================
--- gcc/coretypes.h 2017-09-11 17:10:58.656085547 +0100
+++ gcc/coretypes.h 2017-10-23 11:42:57.592545063 +0100
@@ -59,6 +59,7 @@ typedef const struct rtx_def *const_rtx;
class scalar_int_mode;
class scalar_float_mode;
class complex_mode;
+class fixed_size_mode;
template<typename> class opt_mode;
typedef opt_mode<scalar_mode> opt_scalar_mode;
typedef opt_mode<scalar_int_mode> opt_scalar_int_mode;
@@ -66,6 +67,7 @@ typedef opt_mode<scalar_float_mode> opt_
template<typename> class pod_mode;
typedef pod_mode<scalar_mode> scalar_mode_pod;
typedef pod_mode<scalar_int_mode> scalar_int_mode_pod;
+typedef pod_mode<fixed_size_mode> fixed_size_mode_pod;
/* Subclasses of rtx_def, using indentation to show the class
hierarchy, along with the relevant invariant.
Index: gcc/builtins.h
===================================================================
--- gcc/builtins.h 2017-08-30 12:18:46.602740973 +0100
+++ gcc/builtins.h 2017-10-23 11:42:57.592545063 +0100
@@ -29,14 +29,14 @@ struct target_builtins {
the register is not used for calling a function. If the machine
has register windows, this gives only the outbound registers.
INCOMING_REGNO gives the corresponding inbound register. */
- machine_mode x_apply_args_mode[FIRST_PSEUDO_REGISTER];
+ fixed_size_mode_pod x_apply_args_mode[FIRST_PSEUDO_REGISTER];
/* For each register that may be used for returning values, this gives
a mode used to copy the register's value. VOIDmode indicates the
register is not used for returning values. If the machine has
register windows, this gives only the outbound registers.
INCOMING_REGNO gives the corresponding inbound register. */
- machine_mode x_apply_result_mode[FIRST_PSEUDO_REGISTER];
+ fixed_size_mode_pod x_apply_result_mode[FIRST_PSEUDO_REGISTER];
};
extern struct target_builtins default_target_builtins;
Index: gcc/builtins.c
===================================================================
--- gcc/builtins.c 2017-10-23 11:41:23.140260335 +0100
+++ gcc/builtins.c 2017-10-23 11:42:57.592545063 +0100
@@ -1358,7 +1358,6 @@ apply_args_size (void)
static int size = -1;
int align;
unsigned int regno;
- machine_mode mode;
/* The values computed by this function never change. */
if (size < 0)
@@ -1374,7 +1373,7 @@ apply_args_size (void)
for (regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
if (FUNCTION_ARG_REGNO_P (regno))
{
- mode = targetm.calls.get_raw_arg_mode (regno);
+ fixed_size_mode mode = targetm.calls.get_raw_arg_mode (regno);
gcc_assert (mode != VOIDmode);
@@ -1386,7 +1385,7 @@ apply_args_size (void)
}
else
{
- apply_args_mode[regno] = VOIDmode;
+ apply_args_mode[regno] = as_a <fixed_size_mode> (VOIDmode);
}
}
return size;
@@ -1400,7 +1399,6 @@ apply_result_size (void)
{
static int size = -1;
int align, regno;
- machine_mode mode;
/* The values computed by this function never change. */
if (size < 0)
@@ -1410,7 +1408,7 @@ apply_result_size (void)
for (regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
if (targetm.calls.function_value_regno_p (regno))
{
- mode = targetm.calls.get_raw_result_mode (regno);
+ fixed_size_mode mode = targetm.calls.get_raw_result_mode (regno);
gcc_assert (mode != VOIDmode);
@@ -1421,7 +1419,7 @@ apply_result_size (void)
apply_result_mode[regno] = mode;
}
else
- apply_result_mode[regno] = VOIDmode;
+ apply_result_mode[regno] = as_a <fixed_size_mode> (VOIDmode);
/* Allow targets that use untyped_call and untyped_return to override
the size so that machine-specific information can be stored here. */
@@ -1440,7 +1438,7 @@ apply_result_size (void)
result_vector (int savep, rtx result)
{
int regno, size, align, nelts;
- machine_mode mode;
+ fixed_size_mode mode;
rtx reg, mem;
rtx *savevec = XALLOCAVEC (rtx, FIRST_PSEUDO_REGISTER);
@@ -1469,7 +1467,7 @@ expand_builtin_apply_args_1 (void)
{
rtx registers, tem;
int size, align, regno;
- machine_mode mode;
+ fixed_size_mode mode;
rtx struct_incoming_value = targetm.calls.struct_value_rtx (cfun ? TREE_TYPE (cfun->decl) : 0, 1);
/* Create a block where the arg-pointer, structure value address,
@@ -1573,7 +1571,7 @@ expand_builtin_apply_args (void)
expand_builtin_apply (rtx function, rtx arguments, rtx argsize)
{
int size, align, regno;
- machine_mode mode;
+ fixed_size_mode mode;
rtx incoming_args, result, reg, dest, src;
rtx_call_insn *call_insn;
rtx old_stack_level = 0;
@@ -1734,7 +1732,7 @@ expand_builtin_apply (rtx function, rtx
expand_builtin_return (rtx result)
{
int size, align, regno;
- machine_mode mode;
+ fixed_size_mode mode;
rtx reg;
rtx_insn *call_fusage = 0;
^ permalink raw reply [flat|nested] 90+ messages in thread
* [10/nn] Widening optab cleanup
2017-10-23 11:16 [00/nn] Patches preparing for runtime offsets and sizes Richard Sandiford
` (9 preceding siblings ...)
2017-10-23 11:24 ` [11/nn] Add narrower_subreg_mode helper function Richard Sandiford
@ 2017-10-23 11:24 ` Richard Sandiford
2017-10-30 18:32 ` Jeff Law
2017-10-23 11:25 ` [12/nn] Add an is_narrower_int_mode helper function Richard Sandiford
` (10 subsequent siblings)
21 siblings, 1 reply; 90+ messages in thread
From: Richard Sandiford @ 2017-10-23 11:24 UTC (permalink / raw)
To: gcc-patches
widening_optab_handler had the comment:
/* ??? Why does find_widening_optab_handler_and_mode attempt to
widen things that can't be widened? E.g. add_optab... */
if (op > LAST_CONV_OPTAB)
return CODE_FOR_nothing;
I think it comes from expand_binop using
find_widening_optab_handler_and_mode for two things: to test whether
a "normal" optab like add_optab is supported for a standard binary
operation and to test whether a "convert" optab is supported for a
widening operation like umul_widen_optab. In the former case, from_mode
and to_mode must be the same; in the latter, from_mode must be narrower
than to_mode.
For the former case, find_widening_optab_handler_and_mode is only really
testing the modes that are passed in. permit_non_widening must be true
here.
For the latter case, find_widening_optab_handler_and_mode should only
really consider new from_modes that are wider than the original
from_mode and narrower than the original to_mode. Logically
permit_non_widening should be false, since widening optabs aren't
supposed to take operands that are the same width as the destination.
We get away with permit_non_widening being true because no target
would/should define a widening .md pattern with matching modes.
But really, it seems better for expand_binop to handle these two
cases itself rather than pushing them down. With that change,
find_widening_optab_handler_and_mode is only ever called with
permit_non_widening set to false and only with a "proper" convert optab.
We then no longer need widening_optab_handler; we can just use
convert_optab_handler directly.
The patch also passes the instruction code down to expand_binop_directly.
This should be more efficient and removes an extra call to
find_widening_optab_handler_and_mode.
2017-10-23 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* optabs-query.h (convert_optab_p): New function, split out from...
(convert_optab_handler): ...here.
(widening_optab_handler): Delete.
(find_widening_optab_handler): Remove permit_non_widening parameter.
(find_widening_optab_handler_and_mode): Likewise. Provide an
override that operates on mode class wrappers.
* optabs-query.c (widening_optab_handler): Delete.
(find_widening_optab_handler_and_mode): Remove permit_non_widening
parameter. Assert that the two modes are the same class and that
the "from" mode is narrower than the "to" mode. Use
convert_optab_handler instead of widening_optab_handler.
* expmed.c (expmed_mult_highpart_optab): Use convert_optab_handler
instead of widening_optab_handler.
* expr.c (expand_expr_real_2): Update calls to
find_widening_optab_handler.
* optabs.c (expand_widen_pattern_expr): Likewise.
(expand_binop_directly): Take the insn_code as a parameter.
(expand_binop): Only call find_widening_optab_handler for
conversion optabs; use optab_handler otherwise. Update calls
to find_widening_optab_handler and expand_binop_directly.
Use convert_optab_handler instead of widening_optab_handler.
* tree-ssa-math-opts.c (convert_mult_to_widen): Update calls to
find_widening_optab_handler and use scalar_mode rather than
machine_mode.
(convert_plusminus_to_widen): Likewise.
Index: gcc/optabs-query.h
===================================================================
--- gcc/optabs-query.h 2017-09-14 17:04:19.080694343 +0100
+++ gcc/optabs-query.h 2017-10-23 11:43:01.517673716 +0100
@@ -23,6 +23,14 @@ #define GCC_OPTABS_QUERY_H
#include "insn-opinit.h"
#include "target.h"
+/* Return true if OP is a conversion optab. */
+
+inline bool
+convert_optab_p (optab op)
+{
+ return op > unknown_optab && op <= LAST_CONV_OPTAB;
+}
+
/* Return the insn used to implement mode MODE of OP, or CODE_FOR_nothing
if the target does not have such an insn. */
@@ -43,7 +51,7 @@ convert_optab_handler (convert_optab op,
machine_mode from_mode)
{
unsigned scode = (op << 16) | (from_mode << 8) | to_mode;
- gcc_assert (op > unknown_optab && op <= LAST_CONV_OPTAB);
+ gcc_assert (convert_optab_p (op));
return raw_optab_handler (scode);
}
@@ -167,12 +175,11 @@ enum insn_code can_float_p (machine_mode
enum insn_code can_fix_p (machine_mode, machine_mode, int, bool *);
bool can_conditionally_move_p (machine_mode mode);
bool can_vec_perm_p (machine_mode, bool, vec_perm_indices *);
-enum insn_code widening_optab_handler (optab, machine_mode, machine_mode);
/* Find a widening optab even if it doesn't widen as much as we want. */
-#define find_widening_optab_handler(A,B,C,D) \
- find_widening_optab_handler_and_mode (A, B, C, D, NULL)
+#define find_widening_optab_handler(A, B, C) \
+ find_widening_optab_handler_and_mode (A, B, C, NULL)
enum insn_code find_widening_optab_handler_and_mode (optab, machine_mode,
- machine_mode, int,
+ machine_mode,
machine_mode *);
int can_mult_highpart_p (machine_mode, bool);
bool can_vec_mask_load_store_p (machine_mode, machine_mode, bool);
@@ -181,4 +188,20 @@ bool can_atomic_exchange_p (machine_mode
bool can_atomic_load_p (machine_mode);
bool lshift_cheap_p (bool);
+/* Version of find_widening_optab_handler_and_mode that operates on
+ specific mode types. */
+
+template<typename T>
+inline enum insn_code
+find_widening_optab_handler_and_mode (optab op, const T &to_mode,
+ const T &from_mode, T *found_mode)
+{
+ machine_mode tmp;
+ enum insn_code icode = find_widening_optab_handler_and_mode
+ (op, machine_mode (to_mode), machine_mode (from_mode), &tmp);
+ if (icode != CODE_FOR_nothing && found_mode)
+ *found_mode = as_a <T> (tmp);
+ return icode;
+}
+
#endif
Index: gcc/optabs-query.c
===================================================================
--- gcc/optabs-query.c 2017-09-25 13:57:21.028734061 +0100
+++ gcc/optabs-query.c 2017-10-23 11:43:01.517673716 +0100
@@ -401,44 +401,20 @@ can_vec_perm_p (machine_mode mode, bool
return true;
}
-/* Like optab_handler, but for widening_operations that have a
- TO_MODE and a FROM_MODE. */
-
-enum insn_code
-widening_optab_handler (optab op, machine_mode to_mode,
- machine_mode from_mode)
-{
- unsigned scode = (op << 16) | to_mode;
- if (to_mode != from_mode && from_mode != VOIDmode)
- {
- /* ??? Why does find_widening_optab_handler_and_mode attempt to
- widen things that can't be widened? E.g. add_optab... */
- if (op > LAST_CONV_OPTAB)
- return CODE_FOR_nothing;
- scode |= from_mode << 8;
- }
- return raw_optab_handler (scode);
-}
-
/* Find a widening optab even if it doesn't widen as much as we want.
E.g. if from_mode is HImode, and to_mode is DImode, and there is no
- direct HI->SI insn, then return SI->DI, if that exists.
- If PERMIT_NON_WIDENING is non-zero then this can be used with
- non-widening optabs also. */
+ direct HI->SI insn, then return SI->DI, if that exists. */
enum insn_code
find_widening_optab_handler_and_mode (optab op, machine_mode to_mode,
machine_mode from_mode,
- int permit_non_widening,
machine_mode *found_mode)
{
- for (; (permit_non_widening || from_mode != to_mode)
- && GET_MODE_SIZE (from_mode) <= GET_MODE_SIZE (to_mode)
- && from_mode != VOIDmode;
- from_mode = GET_MODE_WIDER_MODE (from_mode).else_void ())
+ gcc_checking_assert (GET_MODE_CLASS (from_mode) == GET_MODE_CLASS (to_mode));
+ gcc_checking_assert (from_mode < to_mode);
+ FOR_EACH_MODE (from_mode, from_mode, to_mode)
{
- enum insn_code handler = widening_optab_handler (op, to_mode,
- from_mode);
+ enum insn_code handler = convert_optab_handler (op, to_mode, from_mode);
if (handler != CODE_FOR_nothing)
{
Index: gcc/expmed.c
===================================================================
--- gcc/expmed.c 2017-10-23 11:42:34.914720660 +0100
+++ gcc/expmed.c 2017-10-23 11:43:01.515743957 +0100
@@ -3701,7 +3701,7 @@ expmed_mult_highpart_optab (scalar_int_m
/* Try widening multiplication. */
moptab = unsignedp ? umul_widen_optab : smul_widen_optab;
- if (widening_optab_handler (moptab, wider_mode, mode) != CODE_FOR_nothing
+ if (convert_optab_handler (moptab, wider_mode, mode) != CODE_FOR_nothing
&& mul_widen_cost (speed, wider_mode) < max_cost)
{
tem = expand_binop (wider_mode, moptab, op0, narrow_op1, 0,
@@ -3740,7 +3740,7 @@ expmed_mult_highpart_optab (scalar_int_m
/* Try widening multiplication of opposite signedness, and adjust. */
moptab = unsignedp ? smul_widen_optab : umul_widen_optab;
- if (widening_optab_handler (moptab, wider_mode, mode) != CODE_FOR_nothing
+ if (convert_optab_handler (moptab, wider_mode, mode) != CODE_FOR_nothing
&& size - 1 < BITS_PER_WORD
&& (mul_widen_cost (speed, wider_mode)
+ 2 * shift_cost (speed, mode, size-1)
Index: gcc/expr.c
===================================================================
--- gcc/expr.c 2017-10-23 11:42:52.013721093 +0100
+++ gcc/expr.c 2017-10-23 11:43:01.517673716 +0100
@@ -8640,7 +8640,7 @@ #define REDUCE_BIT_FIELD(expr) (reduce_b
{
machine_mode innermode = TYPE_MODE (TREE_TYPE (treeop0));
this_optab = usmul_widen_optab;
- if (find_widening_optab_handler (this_optab, mode, innermode, 0)
+ if (find_widening_optab_handler (this_optab, mode, innermode)
!= CODE_FOR_nothing)
{
if (TYPE_UNSIGNED (TREE_TYPE (treeop0)))
@@ -8675,7 +8675,7 @@ #define REDUCE_BIT_FIELD(expr) (reduce_b
if (TREE_CODE (treeop0) != INTEGER_CST)
{
- if (find_widening_optab_handler (this_optab, mode, innermode, 0)
+ if (find_widening_optab_handler (this_optab, mode, innermode)
!= CODE_FOR_nothing)
{
expand_operands (treeop0, treeop1, NULL_RTX, &op0, &op1,
@@ -8697,7 +8697,7 @@ #define REDUCE_BIT_FIELD(expr) (reduce_b
unsignedp, this_optab);
return REDUCE_BIT_FIELD (temp);
}
- if (find_widening_optab_handler (other_optab, mode, innermode, 0)
+ if (find_widening_optab_handler (other_optab, mode, innermode)
!= CODE_FOR_nothing
&& innermode == word_mode)
{
Index: gcc/optabs.c
===================================================================
--- gcc/optabs.c 2017-10-23 11:42:34.919720660 +0100
+++ gcc/optabs.c 2017-10-23 11:43:01.518638595 +0100
@@ -264,7 +264,7 @@ expand_widen_pattern_expr (sepops ops, r
|| ops->code == WIDEN_MULT_MINUS_EXPR)
icode = find_widening_optab_handler (widen_pattern_optab,
TYPE_MODE (TREE_TYPE (ops->op2)),
- tmode0, 0);
+ tmode0);
else
icode = optab_handler (widen_pattern_optab, tmode0);
gcc_assert (icode != CODE_FOR_nothing);
@@ -989,17 +989,14 @@ avoid_expensive_constant (machine_mode m
}
/* Helper function for expand_binop: handle the case where there
- is an insn that directly implements the indicated operation.
+ is an insn ICODE that directly implements the indicated operation.
Returns null if this is not possible. */
static rtx
-expand_binop_directly (machine_mode mode, optab binoptab,
+expand_binop_directly (enum insn_code icode, machine_mode mode, optab binoptab,
rtx op0, rtx op1,
rtx target, int unsignedp, enum optab_methods methods,
rtx_insn *last)
{
- machine_mode from_mode = widened_mode (mode, op0, op1);
- enum insn_code icode = find_widening_optab_handler (binoptab, mode,
- from_mode, 1);
machine_mode xmode0 = insn_data[(int) icode].operand[1].mode;
machine_mode xmode1 = insn_data[(int) icode].operand[2].mode;
machine_mode mode0, mode1, tmp_mode;
@@ -1123,6 +1120,7 @@ expand_binop (machine_mode mode, optab b
= (methods == OPTAB_LIB || methods == OPTAB_LIB_WIDEN
? OPTAB_WIDEN : methods);
enum mode_class mclass;
+ enum insn_code icode;
machine_mode wider_mode;
scalar_int_mode int_mode;
rtx libfunc;
@@ -1156,23 +1154,30 @@ expand_binop (machine_mode mode, optab b
/* If we can do it with a three-operand insn, do so. */
- if (methods != OPTAB_MUST_WIDEN
- && find_widening_optab_handler (binoptab, mode,
- widened_mode (mode, op0, op1), 1)
- != CODE_FOR_nothing)
+ if (methods != OPTAB_MUST_WIDEN)
{
- temp = expand_binop_directly (mode, binoptab, op0, op1, target,
- unsignedp, methods, last);
- if (temp)
- return temp;
+ if (convert_optab_p (binoptab))
+ {
+ machine_mode from_mode = widened_mode (mode, op0, op1);
+ icode = find_widening_optab_handler (binoptab, mode, from_mode);
+ }
+ else
+ icode = optab_handler (binoptab, mode);
+ if (icode != CODE_FOR_nothing)
+ {
+ temp = expand_binop_directly (icode, mode, binoptab, op0, op1,
+ target, unsignedp, methods, last);
+ if (temp)
+ return temp;
+ }
}
/* If we were trying to rotate, and that didn't work, try rotating
the other direction before falling back to shifts and bitwise-or. */
if (((binoptab == rotl_optab
- && optab_handler (rotr_optab, mode) != CODE_FOR_nothing)
+ && (icode = optab_handler (rotr_optab, mode)) != CODE_FOR_nothing)
|| (binoptab == rotr_optab
- && optab_handler (rotl_optab, mode) != CODE_FOR_nothing))
+ && (icode = optab_handler (rotl_optab, mode)) != CODE_FOR_nothing))
&& is_int_mode (mode, &int_mode))
{
optab otheroptab = (binoptab == rotl_optab ? rotr_optab : rotl_optab);
@@ -1188,7 +1193,7 @@ expand_binop (machine_mode mode, optab b
gen_int_mode (bits, GET_MODE (op1)), op1,
NULL_RTX, unsignedp, OPTAB_DIRECT);
- temp = expand_binop_directly (int_mode, otheroptab, op0, newop1,
+ temp = expand_binop_directly (icode, int_mode, otheroptab, op0, newop1,
target, unsignedp, methods, last);
if (temp)
return temp;
@@ -1235,7 +1240,8 @@ expand_binop (machine_mode mode, optab b
else if (binoptab == rotr_optab)
otheroptab = vrotr_optab;
- if (otheroptab && optab_handler (otheroptab, mode) != CODE_FOR_nothing)
+ if (otheroptab
+ && (icode = optab_handler (otheroptab, mode)) != CODE_FOR_nothing)
{
/* The scalar may have been extended to be too wide. Truncate
it back to the proper size to fit in the broadcast vector. */
@@ -1249,7 +1255,7 @@ expand_binop (machine_mode mode, optab b
rtx vop1 = expand_vector_broadcast (mode, op1);
if (vop1)
{
- temp = expand_binop_directly (mode, otheroptab, op0, vop1,
+ temp = expand_binop_directly (icode, mode, otheroptab, op0, vop1,
target, unsignedp, methods, last);
if (temp)
return temp;
@@ -1272,7 +1278,7 @@ expand_binop (machine_mode mode, optab b
&& (find_widening_optab_handler ((unsignedp
? umul_widen_optab
: smul_widen_optab),
- next_mode, mode, 0)
+ next_mode, mode)
!= CODE_FOR_nothing)))
{
rtx xop0 = op0, xop1 = op1;
@@ -1703,7 +1709,7 @@ expand_binop (machine_mode mode, optab b
&& optab_handler (add_optab, word_mode) != CODE_FOR_nothing)
{
rtx product = NULL_RTX;
- if (widening_optab_handler (umul_widen_optab, int_mode, word_mode)
+ if (convert_optab_handler (umul_widen_optab, int_mode, word_mode)
!= CODE_FOR_nothing)
{
product = expand_doubleword_mult (int_mode, op0, op1, target,
@@ -1713,7 +1719,7 @@ expand_binop (machine_mode mode, optab b
}
if (product == NULL_RTX
- && (widening_optab_handler (smul_widen_optab, int_mode, word_mode)
+ && (convert_optab_handler (smul_widen_optab, int_mode, word_mode)
!= CODE_FOR_nothing))
{
product = expand_doubleword_mult (int_mode, op0, op1, target,
@@ -1806,10 +1812,13 @@ expand_binop (machine_mode mode, optab b
if (CLASS_HAS_WIDER_MODES_P (mclass))
{
+ /* This code doesn't make sense for conversion optabs, since we
+ wouldn't then want to extend the operands to be the same size
+ as the result. */
+ gcc_assert (!convert_optab_p (binoptab));
FOR_EACH_WIDER_MODE (wider_mode, mode)
{
- if (find_widening_optab_handler (binoptab, wider_mode, mode, 1)
- != CODE_FOR_nothing
+ if (optab_handler (binoptab, wider_mode) != CODE_FOR_nothing
|| (methods == OPTAB_LIB
&& optab_libfunc (binoptab, wider_mode)))
{
Index: gcc/tree-ssa-math-opts.c
===================================================================
--- gcc/tree-ssa-math-opts.c 2017-10-09 11:51:27.664982724 +0100
+++ gcc/tree-ssa-math-opts.c 2017-10-23 11:43:01.519603474 +0100
@@ -3242,7 +3242,7 @@ convert_mult_to_widen (gimple *stmt, gim
{
tree lhs, rhs1, rhs2, type, type1, type2;
enum insn_code handler;
- machine_mode to_mode, from_mode, actual_mode;
+ scalar_int_mode to_mode, from_mode, actual_mode;
optab op;
int actual_precision;
location_t loc = gimple_location (stmt);
@@ -3269,7 +3269,7 @@ convert_mult_to_widen (gimple *stmt, gim
op = usmul_widen_optab;
handler = find_widening_optab_handler_and_mode (op, to_mode, from_mode,
- 0, &actual_mode);
+ &actual_mode);
if (handler == CODE_FOR_nothing)
{
@@ -3290,7 +3290,7 @@ convert_mult_to_widen (gimple *stmt, gim
op = smul_widen_optab;
handler = find_widening_optab_handler_and_mode (op, to_mode,
- from_mode, 0,
+ from_mode,
&actual_mode);
if (handler == CODE_FOR_nothing)
@@ -3350,8 +3350,7 @@ convert_plusminus_to_widen (gimple_stmt_
optab this_optab;
enum tree_code wmult_code;
enum insn_code handler;
- scalar_mode to_mode, from_mode;
- machine_mode actual_mode;
+ scalar_mode to_mode, from_mode, actual_mode;
location_t loc = gimple_location (stmt);
int actual_precision;
bool from_unsigned1, from_unsigned2;
@@ -3509,7 +3508,7 @@ convert_plusminus_to_widen (gimple_stmt_
this transformation is likely to pessimize code. */
this_optab = optab_for_tree_code (wmult_code, optype, optab_default);
handler = find_widening_optab_handler_and_mode (this_optab, to_mode,
- from_mode, 0, &actual_mode);
+ from_mode, &actual_mode);
if (handler == CODE_FOR_nothing)
return false;
^ permalink raw reply [flat|nested] 90+ messages in thread
* [11/nn] Add narrower_subreg_mode helper function
2017-10-23 11:16 [00/nn] Patches preparing for runtime offsets and sizes Richard Sandiford
` (8 preceding siblings ...)
2017-10-23 11:23 ` [09/nn] Add a fixed_size_mode_pod class Richard Sandiford
@ 2017-10-23 11:24 ` Richard Sandiford
2017-10-30 15:06 ` Jeff Law
2017-10-23 11:24 ` [10/nn] Widening optab cleanup Richard Sandiford
` (11 subsequent siblings)
21 siblings, 1 reply; 90+ messages in thread
From: Richard Sandiford @ 2017-10-23 11:24 UTC (permalink / raw)
To: gcc-patches
This patch adds a narrowing equivalent of wider_subreg_mode. At present
there is only one user.
2017-10-23 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* rtl.h (narrower_subreg_mode): New function.
* ira-color.c (update_costs_from_allocno): Use it.
Index: gcc/rtl.h
===================================================================
--- gcc/rtl.h 2017-10-23 11:44:06.562686090 +0100
+++ gcc/rtl.h 2017-10-23 11:44:15.916785881 +0100
@@ -2972,6 +2972,16 @@ subreg_lowpart_offset (machine_mode oute
}
/* Given that a subreg has outer mode OUTERMODE and inner mode INNERMODE,
+ return the smaller of the two modes if they are different sizes,
+ otherwise return the outer mode. */
+
+inline machine_mode
+narrower_subreg_mode (machine_mode outermode, machine_mode innermode)
+{
+ return paradoxical_subreg_p (outermode, innermode) ? innermode : outermode;
+}
+
+/* Given that a subreg has outer mode OUTERMODE and inner mode INNERMODE,
return the mode that is big enough to hold both the outer and inner
values. Prefer the outer mode in the event of a tie. */
Index: gcc/ira-color.c
===================================================================
--- gcc/ira-color.c 2017-10-23 11:44:11.500538024 +0100
+++ gcc/ira-color.c 2017-10-23 11:44:15.915819948 +0100
@@ -1367,15 +1367,14 @@ update_costs_from_allocno (ira_allocno_t
|| ALLOCNO_ASSIGNED_P (another_allocno))
continue;
- if (GET_MODE_SIZE (ALLOCNO_MODE (cp->second)) < GET_MODE_SIZE (mode))
- /* If we have different modes use the smallest one. It is
- a sub-register move. It is hard to predict what LRA
- will reload (the pseudo or its sub-register) but LRA
- will try to minimize the data movement. Also for some
- register classes bigger modes might be invalid,
- e.g. DImode for AREG on x86. For such cases the
- register move cost will be maximal. */
- mode = ALLOCNO_MODE (cp->second);
+ /* If we have different modes use the smallest one. It is
+ a sub-register move. It is hard to predict what LRA
+ will reload (the pseudo or its sub-register) but LRA
+ will try to minimize the data movement. Also for some
+ register classes bigger modes might be invalid,
+ e.g. DImode for AREG on x86. For such cases the
+ register move cost will be maximal. */
+ mode = narrower_subreg_mode (mode, ALLOCNO_MODE (cp->second));
cost = (cp->second == allocno
? ira_register_move_cost[mode][rclass][aclass]
* [13/nn] More is_a <scalar_int_mode>
2017-10-23 11:16 [00/nn] Patches preparing for runtime offsets and sizes Richard Sandiford
` (11 preceding siblings ...)
2017-10-23 11:25 ` [12/nn] Add an is_narrower_int_mode helper function Richard Sandiford
@ 2017-10-23 11:25 ` Richard Sandiford
2017-10-26 12:03 ` Richard Biener
2017-10-23 11:26 ` [14/nn] Add helpers for shift count modes Richard Sandiford
` (8 subsequent siblings)
21 siblings, 1 reply; 90+ messages in thread
From: Richard Sandiford @ 2017-10-23 11:25 UTC (permalink / raw)
To: gcc-patches
alias.c:find_base_term and find_base_value checked:
if (GET_MODE_SIZE (GET_MODE (src)) < GET_MODE_SIZE (Pmode))
but (a) comparing the precision seems more correct, since it's possible
for modes to have the same memory size as Pmode but fewer bits, and
(b) the functions are called on arbitrary rtl, so there's no guarantee
that we're handling an integer truncation.
Since there's no point processing truncations of anything other than an
integer, this patch checks that first.
2017-10-23 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* alias.c (find_base_value, find_base_term): Only process integer
truncations. Check the precision rather than the size.
Index: gcc/alias.c
===================================================================
--- gcc/alias.c 2017-10-23 11:41:25.511925516 +0100
+++ gcc/alias.c 2017-10-23 11:44:27.544693078 +0100
@@ -1349,6 +1349,7 @@ known_base_value_p (rtx x)
find_base_value (rtx src)
{
unsigned int regno;
+ scalar_int_mode int_mode;
#if defined (FIND_BASE_TERM)
/* Try machine-dependent ways to find the base term. */
@@ -1475,7 +1476,8 @@ find_base_value (rtx src)
address modes depending on the address space. */
if (!target_default_pointer_address_modes_p ())
break;
- if (GET_MODE_SIZE (GET_MODE (src)) < GET_MODE_SIZE (Pmode))
+ if (!is_a <scalar_int_mode> (GET_MODE (src), &int_mode)
+ || GET_MODE_PRECISION (int_mode) < GET_MODE_PRECISION (Pmode))
break;
/* Fall through. */
case HIGH:
@@ -1876,6 +1878,7 @@ find_base_term (rtx x)
cselib_val *val;
struct elt_loc_list *l, *f;
rtx ret;
+ scalar_int_mode int_mode;
#if defined (FIND_BASE_TERM)
/* Try machine-dependent ways to find the base term. */
@@ -1893,7 +1896,8 @@ find_base_term (rtx x)
address modes depending on the address space. */
if (!target_default_pointer_address_modes_p ())
return 0;
- if (GET_MODE_SIZE (GET_MODE (x)) < GET_MODE_SIZE (Pmode))
+ if (!is_a <scalar_int_mode> (GET_MODE (x), &int_mode)
+ || GET_MODE_PRECISION (int_mode) < GET_MODE_PRECISION (Pmode))
return 0;
/* Fall through. */
case HIGH:
* [12/nn] Add an is_narrower_int_mode helper function
2017-10-23 11:16 [00/nn] Patches preparing for runtime offsets and sizes Richard Sandiford
` (10 preceding siblings ...)
2017-10-23 11:24 ` [10/nn] Widening optab cleanup Richard Sandiford
@ 2017-10-23 11:25 ` Richard Sandiford
2017-10-26 11:59 ` Richard Biener
2017-10-23 11:25 ` [13/nn] More is_a <scalar_int_mode> Richard Sandiford
` (9 subsequent siblings)
21 siblings, 1 reply; 90+ messages in thread
From: Richard Sandiford @ 2017-10-23 11:25 UTC (permalink / raw)
To: gcc-patches
This patch adds a function for testing whether an arbitrary mode X
is an integer mode that is narrower than integer mode Y. This is
useful for code like expand_float and expand_fix that could in
principle handle vectors as well as scalars.
2017-10-23 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* machmode.h (is_narrower_int_mode): New function.
* optabs.c (expand_float, expand_fix): Use it.
* dwarf2out.c (rotate_loc_descriptor): Likewise.
Index: gcc/machmode.h
===================================================================
--- gcc/machmode.h 2017-10-23 11:44:06.561720156 +0100
+++ gcc/machmode.h 2017-10-23 11:44:23.979432614 +0100
@@ -893,6 +893,17 @@ is_complex_float_mode (machine_mode mode
return false;
}
+/* Return true if MODE is a scalar integer mode with a precision
+ smaller than LIMIT's precision. */
+
+inline bool
+is_narrower_int_mode (machine_mode mode, scalar_int_mode limit)
+{
+ scalar_int_mode int_mode;
+ return (is_a <scalar_int_mode> (mode, &int_mode)
+ && GET_MODE_PRECISION (int_mode) < GET_MODE_PRECISION (limit));
+}
+
namespace mode_iterator
{
/* Start mode iterator *ITER at the first mode in class MCLASS, if any. */
Index: gcc/optabs.c
===================================================================
--- gcc/optabs.c 2017-10-23 11:44:07.732431531 +0100
+++ gcc/optabs.c 2017-10-23 11:44:23.980398548 +0100
@@ -4820,7 +4820,7 @@ expand_float (rtx to, rtx from, int unsi
rtx value;
convert_optab tab = unsignedp ? ufloat_optab : sfloat_optab;
- if (GET_MODE_PRECISION (GET_MODE (from)) < GET_MODE_PRECISION (SImode))
+ if (is_narrower_int_mode (GET_MODE (from), SImode))
from = convert_to_mode (SImode, from, unsignedp);
libfunc = convert_optab_libfunc (tab, GET_MODE (to), GET_MODE (from));
@@ -5002,7 +5002,7 @@ expand_fix (rtx to, rtx from, int unsign
that the mode of TO is at least as wide as SImode, since those are the
only library calls we know about. */
- if (GET_MODE_PRECISION (GET_MODE (to)) < GET_MODE_PRECISION (SImode))
+ if (is_narrower_int_mode (GET_MODE (to), SImode))
{
target = gen_reg_rtx (SImode);
Index: gcc/dwarf2out.c
===================================================================
--- gcc/dwarf2out.c 2017-10-23 11:44:05.684652559 +0100
+++ gcc/dwarf2out.c 2017-10-23 11:44:23.979432614 +0100
@@ -14530,8 +14530,7 @@ rotate_loc_descriptor (rtx rtl, scalar_i
dw_loc_descr_ref op0, op1, ret, mask[2] = { NULL, NULL };
int i;
- if (GET_MODE (rtlop1) != VOIDmode
- && GET_MODE_BITSIZE (GET_MODE (rtlop1)) < GET_MODE_BITSIZE (mode))
+ if (is_narrower_int_mode (GET_MODE (rtlop1), mode))
rtlop1 = gen_rtx_ZERO_EXTEND (mode, rtlop1);
op0 = mem_loc_descriptor (XEXP (rtl, 0), mode, mem_mode,
VAR_INIT_STATUS_INITIALIZED);
* [14/nn] Add helpers for shift count modes
2017-10-23 11:16 [00/nn] Patches preparing for runtime offsets and sizes Richard Sandiford
` (12 preceding siblings ...)
2017-10-23 11:25 ` [13/nn] More is_a <scalar_int_mode> Richard Sandiford
@ 2017-10-23 11:26 ` Richard Sandiford
2017-10-26 12:07 ` Richard Biener
2017-10-23 11:27 ` [16/nn] Factor out the mode handling in lower-subreg.c Richard Sandiford
` (7 subsequent siblings)
21 siblings, 1 reply; 90+ messages in thread
From: Richard Sandiford @ 2017-10-23 11:26 UTC (permalink / raw)
To: gcc-patches
This patch adds a stub helper routine to provide the mode
of a scalar shift amount, given the mode of the values
being shifted.
One long-standing problem has been to decide what this mode
should be for arbitrary rtxes (as opposed to those directly
tied to a target pattern). Is it the mode of the shifted
elements? Is it word_mode? Or maybe QImode? Is it whatever
the corresponding target pattern says? (In which case what
should the mode be when the target doesn't have a pattern?)
For now the patch picks word_mode, which should be safe on
all targets but could perhaps become suboptimal if the helper
routine is used more often than it is in this patch. As it
stands the patch does not change the generated code.
The patch also adds a helper function that constructs rtxes
for constant shift amounts, again given the mode of the value
being shifted. As well as helping with the SVE patches, this
is one step towards allowing CONST_INTs to have a real mode.
2017-10-23 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* target.h (get_shift_amount_mode): New function.
* emit-rtl.h (gen_int_shift_amount): Declare.
* emit-rtl.c (gen_int_shift_amount): New function.
* asan.c (asan_emit_stack_protection): Use gen_int_shift_amount
instead of GEN_INT.
* calls.c (shift_return_value): Likewise.
* cse.c (fold_rtx): Likewise.
* dse.c (find_shift_sequence): Likewise.
* expmed.c (init_expmed_one_mode, store_bit_field_1, expand_shift_1)
(expand_shift, expand_smod_pow2): Likewise.
* lower-subreg.c (shift_cost): Likewise.
* simplify-rtx.c (simplify_unary_operation_1): Likewise.
(simplify_binary_operation_1): Likewise.
* combine.c (try_combine, find_split_point, force_int_to_mode)
(simplify_shift_const_1, simplify_shift_const): Likewise.
(change_zero_ext): Likewise. Use simplify_gen_binary.
* optabs.c (expand_superword_shift, expand_doubleword_mult)
(expand_unop): Use gen_int_shift_amount instead of GEN_INT.
(expand_binop): Likewise. Use get_shift_amount_mode instead
of word_mode as the mode of a CONST_INT shift amount.
(shift_amt_for_vec_perm_mask): Add a machine_mode argument.
Use gen_int_shift_amount instead of GEN_INT.
(expand_vec_perm): Update caller accordingly. Use
gen_int_shift_amount instead of GEN_INT.
Index: gcc/target.h
===================================================================
--- gcc/target.h 2017-10-23 11:47:06.643477568 +0100
+++ gcc/target.h 2017-10-23 11:47:11.277288162 +0100
@@ -209,6 +209,17 @@ #define HOOKSTRUCT(FRAGMENT) FRAGMENT
extern struct gcc_target targetm;
+/* Return the mode that should be used to hold a scalar shift amount
+ when shifting values of the given mode. */
+/* ??? This could in principle be generated automatically from the .md
+ shift patterns, but for now word_mode should be universally OK. */
+
+inline scalar_int_mode
+get_shift_amount_mode (machine_mode)
+{
+ return word_mode;
+}
+
#ifdef GCC_TM_H
#ifndef CUMULATIVE_ARGS_MAGIC
Index: gcc/emit-rtl.h
===================================================================
--- gcc/emit-rtl.h 2017-10-23 11:47:06.643477568 +0100
+++ gcc/emit-rtl.h 2017-10-23 11:47:11.274393237 +0100
@@ -369,6 +369,7 @@ extern void set_reg_attrs_for_parm (rtx,
extern void set_reg_attrs_for_decl_rtl (tree t, rtx x);
extern void adjust_reg_mode (rtx, machine_mode);
extern int mem_expr_equal_p (const_tree, const_tree);
+extern rtx gen_int_shift_amount (machine_mode, HOST_WIDE_INT);
extern bool need_atomic_barrier_p (enum memmodel, bool);
Index: gcc/emit-rtl.c
===================================================================
--- gcc/emit-rtl.c 2017-10-23 11:47:06.643477568 +0100
+++ gcc/emit-rtl.c 2017-10-23 11:47:11.273428262 +0100
@@ -6478,6 +6478,15 @@ need_atomic_barrier_p (enum memmodel mod
}
}
+/* Return a constant shift amount for shifting a value of mode MODE
+ by VALUE bits. */
+
+rtx
+gen_int_shift_amount (machine_mode mode, HOST_WIDE_INT value)
+{
+ return gen_int_mode (value, get_shift_amount_mode (mode));
+}
+
/* Initialize fields of rtl_data related to stack alignment. */
void
Index: gcc/asan.c
===================================================================
--- gcc/asan.c 2017-10-23 11:47:06.643477568 +0100
+++ gcc/asan.c 2017-10-23 11:47:11.270533336 +0100
@@ -1388,7 +1388,7 @@ asan_emit_stack_protection (rtx base, rt
TREE_ASM_WRITTEN (id) = 1;
emit_move_insn (mem, expand_normal (build_fold_addr_expr (decl)));
shadow_base = expand_binop (Pmode, lshr_optab, base,
- GEN_INT (ASAN_SHADOW_SHIFT),
+ gen_int_shift_amount (Pmode, ASAN_SHADOW_SHIFT),
NULL_RTX, 1, OPTAB_DIRECT);
shadow_base
= plus_constant (Pmode, shadow_base,
Index: gcc/calls.c
===================================================================
--- gcc/calls.c 2017-10-23 11:47:06.643477568 +0100
+++ gcc/calls.c 2017-10-23 11:47:11.270533336 +0100
@@ -2749,15 +2749,17 @@ shift_return_value (machine_mode mode, b
HOST_WIDE_INT shift;
gcc_assert (REG_P (value) && HARD_REGISTER_P (value));
- shift = GET_MODE_BITSIZE (GET_MODE (value)) - GET_MODE_BITSIZE (mode);
+ machine_mode value_mode = GET_MODE (value);
+ shift = GET_MODE_BITSIZE (value_mode) - GET_MODE_BITSIZE (mode);
if (shift == 0)
return false;
/* Use ashr rather than lshr for right shifts. This is for the benefit
of the MIPS port, which requires SImode values to be sign-extended
when stored in 64-bit registers. */
- if (!force_expand_binop (GET_MODE (value), left_p ? ashl_optab : ashr_optab,
- value, GEN_INT (shift), value, 1, OPTAB_WIDEN))
+ if (!force_expand_binop (value_mode, left_p ? ashl_optab : ashr_optab,
+ value, gen_int_shift_amount (value_mode, shift),
+ value, 1, OPTAB_WIDEN))
gcc_unreachable ();
return true;
}
Index: gcc/cse.c
===================================================================
--- gcc/cse.c 2017-10-23 11:47:03.707058235 +0100
+++ gcc/cse.c 2017-10-23 11:47:11.273428262 +0100
@@ -3611,9 +3611,9 @@ fold_rtx (rtx x, rtx_insn *insn)
|| INTVAL (const_arg1) < 0))
{
if (SHIFT_COUNT_TRUNCATED)
- canon_const_arg1 = GEN_INT (INTVAL (const_arg1)
- & (GET_MODE_UNIT_BITSIZE (mode)
- - 1));
+ canon_const_arg1 = gen_int_shift_amount
+ (mode, (INTVAL (const_arg1)
+ & (GET_MODE_UNIT_BITSIZE (mode) - 1)));
else
break;
}
@@ -3660,9 +3660,9 @@ fold_rtx (rtx x, rtx_insn *insn)
|| INTVAL (inner_const) < 0))
{
if (SHIFT_COUNT_TRUNCATED)
- inner_const = GEN_INT (INTVAL (inner_const)
- & (GET_MODE_UNIT_BITSIZE (mode)
- - 1));
+ inner_const = gen_int_shift_amount
+ (mode, (INTVAL (inner_const)
+ & (GET_MODE_UNIT_BITSIZE (mode) - 1)));
else
break;
}
@@ -3692,7 +3692,8 @@ fold_rtx (rtx x, rtx_insn *insn)
/* As an exception, we can turn an ASHIFTRT of this
form into a shift of the number of bits - 1. */
if (code == ASHIFTRT)
- new_const = GEN_INT (GET_MODE_UNIT_BITSIZE (mode) - 1);
+ new_const = gen_int_shift_amount
+ (mode, GET_MODE_UNIT_BITSIZE (mode) - 1);
else if (!side_effects_p (XEXP (y, 0)))
return CONST0_RTX (mode);
else
Index: gcc/dse.c
===================================================================
--- gcc/dse.c 2017-10-23 11:47:06.643477568 +0100
+++ gcc/dse.c 2017-10-23 11:47:11.273428262 +0100
@@ -1605,8 +1605,9 @@ find_shift_sequence (int access_size,
store_mode, byte);
if (ret && CONSTANT_P (ret))
{
+ rtx shift_rtx = gen_int_shift_amount (new_mode, shift);
ret = simplify_const_binary_operation (LSHIFTRT, new_mode,
- ret, GEN_INT (shift));
+ ret, shift_rtx);
if (ret && CONSTANT_P (ret))
{
byte = subreg_lowpart_offset (read_mode, new_mode);
@@ -1642,7 +1643,8 @@ find_shift_sequence (int access_size,
of one dsp where the cost of these two was not the same. But
this really is a rare case anyway. */
target = expand_binop (new_mode, lshr_optab, new_reg,
- GEN_INT (shift), new_reg, 1, OPTAB_DIRECT);
+ gen_int_shift_amount (new_mode, shift),
+ new_reg, 1, OPTAB_DIRECT);
shift_seq = get_insns ();
end_sequence ();
Index: gcc/expmed.c
===================================================================
--- gcc/expmed.c 2017-10-23 11:47:06.643477568 +0100
+++ gcc/expmed.c 2017-10-23 11:47:11.274393237 +0100
@@ -222,7 +222,8 @@ init_expmed_one_mode (struct init_expmed
PUT_MODE (all->zext, wider_mode);
PUT_MODE (all->wide_mult, wider_mode);
PUT_MODE (all->wide_lshr, wider_mode);
- XEXP (all->wide_lshr, 1) = GEN_INT (mode_bitsize);
+ XEXP (all->wide_lshr, 1)
+ = gen_int_shift_amount (wider_mode, mode_bitsize);
set_mul_widen_cost (speed, wider_mode,
set_src_cost (all->wide_mult, wider_mode, speed));
@@ -908,12 +909,14 @@ store_bit_field_1 (rtx str_rtx, unsigned
to make sure that for big-endian machines the higher order
bits are used. */
if (new_bitsize < BITS_PER_WORD && BYTES_BIG_ENDIAN && !backwards)
- value_word = simplify_expand_binop (word_mode, lshr_optab,
- value_word,
- GEN_INT (BITS_PER_WORD
- - new_bitsize),
- NULL_RTX, true,
- OPTAB_LIB_WIDEN);
+ {
+ int shift = BITS_PER_WORD - new_bitsize;
+ rtx shift_rtx = gen_int_shift_amount (word_mode, shift);
+ value_word = simplify_expand_binop (word_mode, lshr_optab,
+ value_word, shift_rtx,
+ NULL_RTX, true,
+ OPTAB_LIB_WIDEN);
+ }
if (!store_bit_field_1 (op0, new_bitsize,
bitnum + bit_offset,
@@ -2366,8 +2369,9 @@ expand_shift_1 (enum tree_code code, mac
if (CONST_INT_P (op1)
&& ((unsigned HOST_WIDE_INT) INTVAL (op1) >=
(unsigned HOST_WIDE_INT) GET_MODE_BITSIZE (scalar_mode)))
- op1 = GEN_INT ((unsigned HOST_WIDE_INT) INTVAL (op1)
- % GET_MODE_BITSIZE (scalar_mode));
+ op1 = gen_int_shift_amount (mode,
+ (unsigned HOST_WIDE_INT) INTVAL (op1)
+ % GET_MODE_BITSIZE (scalar_mode));
else if (GET_CODE (op1) == SUBREG
&& subreg_lowpart_p (op1)
&& SCALAR_INT_MODE_P (GET_MODE (SUBREG_REG (op1)))
@@ -2384,7 +2388,8 @@ expand_shift_1 (enum tree_code code, mac
&& IN_RANGE (INTVAL (op1), GET_MODE_BITSIZE (scalar_mode) / 2 + left,
GET_MODE_BITSIZE (scalar_mode) - 1))
{
- op1 = GEN_INT (GET_MODE_BITSIZE (scalar_mode) - INTVAL (op1));
+ op1 = gen_int_shift_amount (mode, (GET_MODE_BITSIZE (scalar_mode)
+ - INTVAL (op1)));
left = !left;
code = left ? LROTATE_EXPR : RROTATE_EXPR;
}
@@ -2464,8 +2469,8 @@ expand_shift_1 (enum tree_code code, mac
if (op1 == const0_rtx)
return shifted;
else if (CONST_INT_P (op1))
- other_amount = GEN_INT (GET_MODE_BITSIZE (scalar_mode)
- - INTVAL (op1));
+ other_amount = gen_int_shift_amount
+ (mode, GET_MODE_BITSIZE (scalar_mode) - INTVAL (op1));
else
{
other_amount
@@ -2538,8 +2543,9 @@ expand_shift_1 (enum tree_code code, mac
expand_shift (enum tree_code code, machine_mode mode, rtx shifted,
int amount, rtx target, int unsignedp)
{
- return expand_shift_1 (code, mode,
- shifted, GEN_INT (amount), target, unsignedp);
+ return expand_shift_1 (code, mode, shifted,
+ gen_int_shift_amount (mode, amount),
+ target, unsignedp);
}
/* Likewise, but return 0 if that cannot be done. */
@@ -3855,7 +3861,7 @@ expand_smod_pow2 (scalar_int_mode mode,
{
HOST_WIDE_INT masklow = (HOST_WIDE_INT_1 << logd) - 1;
signmask = force_reg (mode, signmask);
- shift = GEN_INT (GET_MODE_BITSIZE (mode) - logd);
+ shift = gen_int_shift_amount (mode, GET_MODE_BITSIZE (mode) - logd);
/* Use the rtx_cost of a LSHIFTRT instruction to determine
which instruction sequence to use. If logical right shifts
Index: gcc/lower-subreg.c
===================================================================
--- gcc/lower-subreg.c 2017-10-23 11:47:06.643477568 +0100
+++ gcc/lower-subreg.c 2017-10-23 11:47:11.274393237 +0100
@@ -129,7 +129,7 @@ shift_cost (bool speed_p, struct cost_rt
PUT_CODE (rtxes->shift, code);
PUT_MODE (rtxes->shift, mode);
PUT_MODE (rtxes->source, mode);
- XEXP (rtxes->shift, 1) = GEN_INT (op1);
+ XEXP (rtxes->shift, 1) = gen_int_shift_amount (mode, op1);
return set_src_cost (rtxes->shift, mode, speed_p);
}
Index: gcc/simplify-rtx.c
===================================================================
--- gcc/simplify-rtx.c 2017-10-23 11:47:06.643477568 +0100
+++ gcc/simplify-rtx.c 2017-10-23 11:47:11.277288162 +0100
@@ -1165,7 +1165,8 @@ simplify_unary_operation_1 (enum rtx_cod
if (STORE_FLAG_VALUE == 1)
{
temp = simplify_gen_binary (ASHIFTRT, inner, XEXP (op, 0),
- GEN_INT (isize - 1));
+ gen_int_shift_amount (inner,
+ isize - 1));
if (int_mode == inner)
return temp;
if (GET_MODE_PRECISION (int_mode) > isize)
@@ -1175,7 +1176,8 @@ simplify_unary_operation_1 (enum rtx_cod
else if (STORE_FLAG_VALUE == -1)
{
temp = simplify_gen_binary (LSHIFTRT, inner, XEXP (op, 0),
- GEN_INT (isize - 1));
+ gen_int_shift_amount (inner,
+ isize - 1));
if (int_mode == inner)
return temp;
if (GET_MODE_PRECISION (int_mode) > isize)
@@ -2679,7 +2681,8 @@ simplify_binary_operation_1 (enum rtx_co
{
val = wi::exact_log2 (rtx_mode_t (trueop1, mode));
if (val >= 0)
- return simplify_gen_binary (ASHIFT, mode, op0, GEN_INT (val));
+ return simplify_gen_binary (ASHIFT, mode, op0,
+ gen_int_shift_amount (mode, val));
}
/* x*2 is x+x and x*(-1) is -x */
@@ -3303,7 +3306,8 @@ simplify_binary_operation_1 (enum rtx_co
/* Convert divide by power of two into shift. */
if (CONST_INT_P (trueop1)
&& (val = exact_log2 (UINTVAL (trueop1))) > 0)
- return simplify_gen_binary (LSHIFTRT, mode, op0, GEN_INT (val));
+ return simplify_gen_binary (LSHIFTRT, mode, op0,
+ gen_int_shift_amount (mode, val));
break;
case DIV:
@@ -3423,10 +3427,12 @@ simplify_binary_operation_1 (enum rtx_co
&& IN_RANGE (INTVAL (trueop1),
GET_MODE_UNIT_PRECISION (mode) / 2 + (code == ROTATE),
GET_MODE_UNIT_PRECISION (mode) - 1))
- return simplify_gen_binary (code == ROTATE ? ROTATERT : ROTATE,
- mode, op0,
- GEN_INT (GET_MODE_UNIT_PRECISION (mode)
- - INTVAL (trueop1)));
+ {
+ int new_amount = GET_MODE_UNIT_PRECISION (mode) - INTVAL (trueop1);
+ rtx new_amount_rtx = gen_int_shift_amount (mode, new_amount);
+ return simplify_gen_binary (code == ROTATE ? ROTATERT : ROTATE,
+ mode, op0, new_amount_rtx);
+ }
#endif
/* FALLTHRU */
case ASHIFTRT:
@@ -3466,8 +3472,8 @@ simplify_binary_operation_1 (enum rtx_co
== GET_MODE_BITSIZE (inner_mode) - GET_MODE_BITSIZE (int_mode))
&& subreg_lowpart_p (op0))
{
- rtx tmp = GEN_INT (INTVAL (XEXP (SUBREG_REG (op0), 1))
- + INTVAL (op1));
+ rtx tmp = gen_int_shift_amount
+ (inner_mode, INTVAL (XEXP (SUBREG_REG (op0), 1)) + INTVAL (op1));
tmp = simplify_gen_binary (code, inner_mode,
XEXP (SUBREG_REG (op0), 0),
tmp);
@@ -3478,7 +3484,8 @@ simplify_binary_operation_1 (enum rtx_co
{
val = INTVAL (op1) & (GET_MODE_UNIT_PRECISION (mode) - 1);
if (val != INTVAL (op1))
- return simplify_gen_binary (code, mode, op0, GEN_INT (val));
+ return simplify_gen_binary (code, mode, op0,
+ gen_int_shift_amount (mode, val));
}
break;
Index: gcc/combine.c
===================================================================
--- gcc/combine.c 2017-10-23 11:47:06.643477568 +0100
+++ gcc/combine.c 2017-10-23 11:47:11.272463287 +0100
@@ -3773,8 +3773,9 @@ try_combine (rtx_insn *i3, rtx_insn *i2,
&& INTVAL (XEXP (*split, 1)) > 0
&& (i = exact_log2 (UINTVAL (XEXP (*split, 1)))) >= 0)
{
+ rtx i_rtx = gen_int_shift_amount (split_mode, i);
SUBST (*split, gen_rtx_ASHIFT (split_mode,
- XEXP (*split, 0), GEN_INT (i)));
+ XEXP (*split, 0), i_rtx));
/* Update split_code because we may not have a multiply
anymore. */
split_code = GET_CODE (*split);
@@ -3788,8 +3789,10 @@ try_combine (rtx_insn *i3, rtx_insn *i2,
&& (i = exact_log2 (UINTVAL (XEXP (XEXP (*split, 0), 1)))) >= 0)
{
rtx nsplit = XEXP (*split, 0);
+ rtx i_rtx = gen_int_shift_amount (GET_MODE (nsplit), i);
SUBST (XEXP (*split, 0), gen_rtx_ASHIFT (GET_MODE (nsplit),
- XEXP (nsplit, 0), GEN_INT (i)));
+ XEXP (nsplit, 0),
+ i_rtx));
/* Update split_code because we may not have a multiply
anymore. */
split_code = GET_CODE (*split);
@@ -5057,12 +5060,12 @@ find_split_point (rtx *loc, rtx_insn *in
GET_MODE (XEXP (SET_SRC (x), 0))))))
{
machine_mode mode = GET_MODE (XEXP (SET_SRC (x), 0));
-
+ rtx pos_rtx = gen_int_shift_amount (mode, pos);
SUBST (SET_SRC (x),
gen_rtx_NEG (mode,
gen_rtx_LSHIFTRT (mode,
XEXP (SET_SRC (x), 0),
- GEN_INT (pos))));
+ pos_rtx)));
split = find_split_point (&SET_SRC (x), insn, true);
if (split && split != &SET_SRC (x))
@@ -5120,11 +5123,11 @@ find_split_point (rtx *loc, rtx_insn *in
{
unsigned HOST_WIDE_INT mask
= (HOST_WIDE_INT_1U << len) - 1;
+ rtx pos_rtx = gen_int_shift_amount (mode, pos);
SUBST (SET_SRC (x),
gen_rtx_AND (mode,
gen_rtx_LSHIFTRT
- (mode, gen_lowpart (mode, inner),
- GEN_INT (pos)),
+ (mode, gen_lowpart (mode, inner), pos_rtx),
gen_int_mode (mask, mode)));
split = find_split_point (&SET_SRC (x), insn, true);
@@ -5133,14 +5136,15 @@ find_split_point (rtx *loc, rtx_insn *in
}
else
{
+ int left_bits = GET_MODE_PRECISION (mode) - len - pos;
+ int right_bits = GET_MODE_PRECISION (mode) - len;
SUBST (SET_SRC (x),
gen_rtx_fmt_ee
(unsignedp ? LSHIFTRT : ASHIFTRT, mode,
gen_rtx_ASHIFT (mode,
gen_lowpart (mode, inner),
- GEN_INT (GET_MODE_PRECISION (mode)
- - len - pos)),
- GEN_INT (GET_MODE_PRECISION (mode) - len)));
+ gen_int_shift_amount (mode, left_bits)),
+ gen_int_shift_amount (mode, right_bits)));
split = find_split_point (&SET_SRC (x), insn, true);
if (split && split != &SET_SRC (x))
@@ -8915,10 +8919,11 @@ force_int_to_mode (rtx x, scalar_int_mod
/* Must be more sign bit copies than the mask needs. */
&& ((int) num_sign_bit_copies (XEXP (x, 0), GET_MODE (XEXP (x, 0)))
>= exact_log2 (mask + 1)))
- x = simplify_gen_binary (LSHIFTRT, xmode, XEXP (x, 0),
- GEN_INT (GET_MODE_PRECISION (xmode)
- - exact_log2 (mask + 1)));
-
+ {
+ int nbits = GET_MODE_PRECISION (xmode) - exact_log2 (mask + 1);
+ x = simplify_gen_binary (LSHIFTRT, xmode, XEXP (x, 0),
+ gen_int_shift_amount (xmode, nbits));
+ }
goto shiftrt;
case ASHIFTRT:
@@ -10415,7 +10420,7 @@ simplify_shift_const_1 (enum rtx_code co
{
enum rtx_code orig_code = code;
rtx orig_varop = varop;
- int count;
+ int count, log2;
machine_mode mode = result_mode;
machine_mode shift_mode;
scalar_int_mode tmode, inner_mode, int_mode, int_varop_mode, int_result_mode;
@@ -10618,13 +10623,11 @@ simplify_shift_const_1 (enum rtx_code co
is cheaper. But it is still better on those machines to
merge two shifts into one. */
if (CONST_INT_P (XEXP (varop, 1))
- && exact_log2 (UINTVAL (XEXP (varop, 1))) >= 0)
+ && (log2 = exact_log2 (UINTVAL (XEXP (varop, 1)))) >= 0)
{
- varop
- = simplify_gen_binary (ASHIFT, GET_MODE (varop),
- XEXP (varop, 0),
- GEN_INT (exact_log2 (
- UINTVAL (XEXP (varop, 1)))));
+ rtx log2_rtx = gen_int_shift_amount (GET_MODE (varop), log2);
+ varop = simplify_gen_binary (ASHIFT, GET_MODE (varop),
+ XEXP (varop, 0), log2_rtx);
continue;
}
break;
@@ -10632,13 +10635,11 @@ simplify_shift_const_1 (enum rtx_code co
case UDIV:
/* Similar, for when divides are cheaper. */
if (CONST_INT_P (XEXP (varop, 1))
- && exact_log2 (UINTVAL (XEXP (varop, 1))) >= 0)
+ && (log2 = exact_log2 (UINTVAL (XEXP (varop, 1)))) >= 0)
{
- varop
- = simplify_gen_binary (LSHIFTRT, GET_MODE (varop),
- XEXP (varop, 0),
- GEN_INT (exact_log2 (
- UINTVAL (XEXP (varop, 1)))));
+ rtx log2_rtx = gen_int_shift_amount (GET_MODE (varop), log2);
+ varop = simplify_gen_binary (LSHIFTRT, GET_MODE (varop),
+ XEXP (varop, 0), log2_rtx);
continue;
}
break;
@@ -10773,10 +10774,10 @@ simplify_shift_const_1 (enum rtx_code co
mask_rtx = gen_int_mode (nonzero_bits (varop, int_varop_mode),
int_result_mode);
-
+ rtx count_rtx = gen_int_shift_amount (int_result_mode, count);
mask_rtx
= simplify_const_binary_operation (code, int_result_mode,
- mask_rtx, GEN_INT (count));
+ mask_rtx, count_rtx);
/* Give up if we can't compute an outer operation to use. */
if (mask_rtx == 0
@@ -10832,9 +10833,10 @@ simplify_shift_const_1 (enum rtx_code co
if (code == ASHIFTRT && int_mode != int_result_mode)
break;
+ rtx count_rtx = gen_int_shift_amount (int_result_mode, count);
rtx new_rtx = simplify_const_binary_operation (code, int_mode,
XEXP (varop, 0),
- GEN_INT (count));
+ count_rtx);
varop = gen_rtx_fmt_ee (code, int_mode, new_rtx, XEXP (varop, 1));
count = 0;
continue;
@@ -10900,7 +10902,7 @@ simplify_shift_const_1 (enum rtx_code co
&& (new_rtx = simplify_const_binary_operation
(code, int_result_mode,
gen_int_mode (INTVAL (XEXP (varop, 1)), int_result_mode),
- GEN_INT (count))) != 0
+ gen_int_shift_amount (int_result_mode, count))) != 0
&& CONST_INT_P (new_rtx)
&& merge_outer_ops (&outer_op, &outer_const, GET_CODE (varop),
INTVAL (new_rtx), int_result_mode,
@@ -11043,7 +11045,7 @@ simplify_shift_const_1 (enum rtx_code co
&& (new_rtx = simplify_const_binary_operation
(ASHIFT, int_result_mode,
gen_int_mode (INTVAL (XEXP (varop, 1)), int_result_mode),
- GEN_INT (count))) != 0
+ gen_int_shift_amount (int_result_mode, count))) != 0
&& CONST_INT_P (new_rtx)
&& merge_outer_ops (&outer_op, &outer_const, PLUS,
INTVAL (new_rtx), int_result_mode,
@@ -11064,7 +11066,7 @@ simplify_shift_const_1 (enum rtx_code co
&& (new_rtx = simplify_const_binary_operation
(code, int_result_mode,
gen_int_mode (INTVAL (XEXP (varop, 1)), int_result_mode),
- GEN_INT (count))) != 0
+ gen_int_shift_amount (int_result_mode, count))) != 0
&& CONST_INT_P (new_rtx)
&& merge_outer_ops (&outer_op, &outer_const, XOR,
INTVAL (new_rtx), int_result_mode,
@@ -11119,12 +11121,12 @@ simplify_shift_const_1 (enum rtx_code co
- GET_MODE_UNIT_PRECISION (GET_MODE (varop)))))
{
rtx varop_inner = XEXP (varop, 0);
-
- varop_inner
- = gen_rtx_LSHIFTRT (GET_MODE (varop_inner),
- XEXP (varop_inner, 0),
- GEN_INT
- (count + INTVAL (XEXP (varop_inner, 1))));
+ int new_count = count + INTVAL (XEXP (varop_inner, 1));
+ rtx new_count_rtx = gen_int_shift_amount (GET_MODE (varop_inner),
+ new_count);
+ varop_inner = gen_rtx_LSHIFTRT (GET_MODE (varop_inner),
+ XEXP (varop_inner, 0),
+ new_count_rtx);
varop = gen_rtx_TRUNCATE (GET_MODE (varop), varop_inner);
count = 0;
continue;
@@ -11176,7 +11178,8 @@ simplify_shift_const_1 (enum rtx_code co
x = NULL_RTX;
if (x == NULL_RTX)
- x = simplify_gen_binary (code, shift_mode, varop, GEN_INT (count));
+ x = simplify_gen_binary (code, shift_mode, varop,
+ gen_int_shift_amount (shift_mode, count));
/* If we were doing an LSHIFTRT in a wider mode than it was originally,
turn off all the bits that the shift would have turned off. */
@@ -11238,7 +11241,8 @@ simplify_shift_const (rtx x, enum rtx_co
return tem;
if (!x)
- x = simplify_gen_binary (code, GET_MODE (varop), varop, GEN_INT (count));
+ x = simplify_gen_binary (code, GET_MODE (varop), varop,
+ gen_int_shift_amount (GET_MODE (varop), count));
if (GET_MODE (x) != result_mode)
x = gen_lowpart (result_mode, x);
return x;
@@ -11429,8 +11433,9 @@ change_zero_ext (rtx pat)
if (BITS_BIG_ENDIAN)
start = GET_MODE_PRECISION (inner_mode) - size - start;
- if (start)
- x = gen_rtx_LSHIFTRT (inner_mode, XEXP (x, 0), GEN_INT (start));
+ if (start != 0)
+ x = gen_rtx_LSHIFTRT (inner_mode, XEXP (x, 0),
+ gen_int_shift_amount (inner_mode, start));
else
x = XEXP (x, 0);
if (mode != inner_mode)
Index: gcc/optabs.c
===================================================================
--- gcc/optabs.c 2017-10-23 11:47:06.643477568 +0100
+++ gcc/optabs.c 2017-10-23 11:47:11.276323187 +0100
@@ -431,8 +431,9 @@ expand_superword_shift (optab binoptab,
if (binoptab != ashr_optab)
emit_move_insn (outof_target, CONST0_RTX (word_mode));
else
- if (!force_expand_binop (word_mode, binoptab,
- outof_input, GEN_INT (BITS_PER_WORD - 1),
+ if (!force_expand_binop (word_mode, binoptab, outof_input,
+ gen_int_shift_amount (word_mode,
+ BITS_PER_WORD - 1),
outof_target, unsignedp, methods))
return false;
}
@@ -789,7 +790,8 @@ expand_doubleword_mult (machine_mode mod
{
int low = (WORDS_BIG_ENDIAN ? 1 : 0);
int high = (WORDS_BIG_ENDIAN ? 0 : 1);
- rtx wordm1 = umulp ? NULL_RTX : GEN_INT (BITS_PER_WORD - 1);
+ rtx wordm1 = (umulp ? NULL_RTX
+ : gen_int_shift_amount (word_mode, BITS_PER_WORD - 1));
rtx product, adjust, product_high, temp;
rtx op0_high = operand_subword_force (op0, high, mode);
@@ -1185,7 +1187,7 @@ expand_binop (machine_mode mode, optab b
unsigned int bits = GET_MODE_PRECISION (int_mode);
if (CONST_INT_P (op1))
- newop1 = GEN_INT (bits - INTVAL (op1));
+ newop1 = gen_int_shift_amount (int_mode, bits - INTVAL (op1));
else if (targetm.shift_truncation_mask (int_mode) == bits - 1)
newop1 = negate_rtx (GET_MODE (op1), op1);
else
@@ -1399,11 +1401,11 @@ expand_binop (machine_mode mode, optab b
shift_mask = targetm.shift_truncation_mask (word_mode);
op1_mode = (GET_MODE (op1) != VOIDmode
? as_a <scalar_int_mode> (GET_MODE (op1))
- : word_mode);
+ : get_shift_amount_mode (word_mode));
/* Apply the truncation to constant shifts. */
if (double_shift_mask > 0 && CONST_INT_P (op1))
- op1 = GEN_INT (INTVAL (op1) & double_shift_mask);
+ op1 = gen_int_mode (INTVAL (op1) & double_shift_mask, op1_mode);
if (op1 == CONST0_RTX (op1_mode))
return op0;
@@ -1513,7 +1515,7 @@ expand_binop (machine_mode mode, optab b
else
{
rtx into_temp1, into_temp2, outof_temp1, outof_temp2;
- rtx first_shift_count, second_shift_count;
+ HOST_WIDE_INT first_shift_count, second_shift_count;
optab reverse_unsigned_shift, unsigned_shift;
reverse_unsigned_shift = (left_shift ^ (shift_count < BITS_PER_WORD)
@@ -1524,20 +1526,24 @@ expand_binop (machine_mode mode, optab b
if (shift_count > BITS_PER_WORD)
{
- first_shift_count = GEN_INT (shift_count - BITS_PER_WORD);
- second_shift_count = GEN_INT (2 * BITS_PER_WORD - shift_count);
+ first_shift_count = shift_count - BITS_PER_WORD;
+ second_shift_count = 2 * BITS_PER_WORD - shift_count;
}
else
{
- first_shift_count = GEN_INT (BITS_PER_WORD - shift_count);
- second_shift_count = GEN_INT (shift_count);
+ first_shift_count = BITS_PER_WORD - shift_count;
+ second_shift_count = shift_count;
}
+ rtx first_shift_count_rtx
+ = gen_int_shift_amount (word_mode, first_shift_count);
+ rtx second_shift_count_rtx
+ = gen_int_shift_amount (word_mode, second_shift_count);
into_temp1 = expand_binop (word_mode, unsigned_shift,
- outof_input, first_shift_count,
+ outof_input, first_shift_count_rtx,
NULL_RTX, unsignedp, next_methods);
into_temp2 = expand_binop (word_mode, reverse_unsigned_shift,
- into_input, second_shift_count,
+ into_input, second_shift_count_rtx,
NULL_RTX, unsignedp, next_methods);
if (into_temp1 != 0 && into_temp2 != 0)
@@ -1550,10 +1556,10 @@ expand_binop (machine_mode mode, optab b
emit_move_insn (into_target, inter);
outof_temp1 = expand_binop (word_mode, unsigned_shift,
- into_input, first_shift_count,
+ into_input, first_shift_count_rtx,
NULL_RTX, unsignedp, next_methods);
outof_temp2 = expand_binop (word_mode, reverse_unsigned_shift,
- outof_input, second_shift_count,
+ outof_input, second_shift_count_rtx,
NULL_RTX, unsignedp, next_methods);
if (inter != 0 && outof_temp1 != 0 && outof_temp2 != 0)
@@ -2793,25 +2799,29 @@ expand_unop (machine_mode mode, optab un
if (optab_handler (rotl_optab, mode) != CODE_FOR_nothing)
{
- temp = expand_binop (mode, rotl_optab, op0, GEN_INT (8), target,
- unsignedp, OPTAB_DIRECT);
+ temp = expand_binop (mode, rotl_optab, op0,
+ gen_int_shift_amount (mode, 8),
+ target, unsignedp, OPTAB_DIRECT);
if (temp)
return temp;
}
if (optab_handler (rotr_optab, mode) != CODE_FOR_nothing)
{
- temp = expand_binop (mode, rotr_optab, op0, GEN_INT (8), target,
- unsignedp, OPTAB_DIRECT);
+ temp = expand_binop (mode, rotr_optab, op0,
+ gen_int_shift_amount (mode, 8),
+ target, unsignedp, OPTAB_DIRECT);
if (temp)
return temp;
}
last = get_last_insn ();
- temp1 = expand_binop (mode, ashl_optab, op0, GEN_INT (8), NULL_RTX,
+ temp1 = expand_binop (mode, ashl_optab, op0,
+ gen_int_shift_amount (mode, 8), NULL_RTX,
unsignedp, OPTAB_WIDEN);
- temp2 = expand_binop (mode, lshr_optab, op0, GEN_INT (8), NULL_RTX,
+ temp2 = expand_binop (mode, lshr_optab, op0,
+ gen_int_shift_amount (mode, 8), NULL_RTX,
unsignedp, OPTAB_WIDEN);
if (temp1 && temp2)
{
@@ -5369,11 +5379,11 @@ vector_compare_rtx (machine_mode cmp_mod
}
/* Checks if vec_perm mask SEL is a constant equivalent to a shift of the first
- vec_perm operand, assuming the second operand is a constant vector of zeroes.
- Return the shift distance in bits if so, or NULL_RTX if the vec_perm is not a
- shift. */
+ vec_perm operand (which has mode OP0_MODE), assuming the second
+ operand is a constant vector of zeroes. Return the shift distance in
+ bits if so, or NULL_RTX if the vec_perm is not a shift. */
static rtx
-shift_amt_for_vec_perm_mask (rtx sel)
+shift_amt_for_vec_perm_mask (machine_mode op0_mode, rtx sel)
{
unsigned int i, first, nelt = GET_MODE_NUNITS (GET_MODE (sel));
unsigned int bitsize = GET_MODE_UNIT_BITSIZE (GET_MODE (sel));
@@ -5393,7 +5403,7 @@ shift_amt_for_vec_perm_mask (rtx sel)
return NULL_RTX;
}
- return GEN_INT (first * bitsize);
+ return gen_int_shift_amount (op0_mode, first * bitsize);
}
/* A subroutine of expand_vec_perm for expanding one vec_perm insn. */
@@ -5473,7 +5483,7 @@ expand_vec_perm (machine_mode mode, rtx
&& (shift_code != CODE_FOR_nothing
|| shift_code_qi != CODE_FOR_nothing))
{
- shift_amt = shift_amt_for_vec_perm_mask (sel);
+ shift_amt = shift_amt_for_vec_perm_mask (mode, sel);
if (shift_amt)
{
struct expand_operand ops[3];
@@ -5563,7 +5573,8 @@ expand_vec_perm (machine_mode mode, rtx
NULL, 0, OPTAB_DIRECT);
else
sel = expand_simple_binop (selmode, ASHIFT, sel,
- GEN_INT (exact_log2 (u)),
+ gen_int_shift_amount (selmode,
+ exact_log2 (u)),
NULL, 0, OPTAB_DIRECT);
gcc_assert (sel != NULL);
^ permalink raw reply [flat|nested] 90+ messages in thread
* [16/nn] Factor out the mode handling in lower-subreg.c
2017-10-23 11:16 [00/nn] Patches preparing for runtime offsets and sizes Richard Sandiford
` (13 preceding siblings ...)
2017-10-23 11:26 ` [14/nn] Add helpers for shift count modes Richard Sandiford
@ 2017-10-23 11:27 ` Richard Sandiford
2017-10-26 12:09 ` Richard Biener
2017-10-23 11:27 ` [15/nn] Use more specific hash functions in rtlhash.c Richard Sandiford
` (6 subsequent siblings)
21 siblings, 1 reply; 90+ messages in thread
From: Richard Sandiford @ 2017-10-23 11:27 UTC (permalink / raw)
To: gcc-patches
This patch adds a helper routine (interesting_mode_p) to lower-subreg.c
that decides whether a mode can be split and, if so, calculates the
number of bytes and words in the mode. At present this function always
returns true; a later patch will add cases in which it can return false.
2017-10-23 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* lower-subreg.c (interesting_mode_p): New function.
(compute_costs, find_decomposable_subregs, decompose_register)
(simplify_subreg_concatn, can_decompose_p, resolve_simple_move)
(resolve_clobber, dump_choices): Use it.
Index: gcc/lower-subreg.c
===================================================================
--- gcc/lower-subreg.c 2017-10-23 11:47:11.274393237 +0100
+++ gcc/lower-subreg.c 2017-10-23 11:47:23.555013148 +0100
@@ -103,6 +103,18 @@ #define twice_word_mode \
#define choices \
this_target_lower_subreg->x_choices
+/* Return true if MODE is a mode we know how to lower. When returning true,
+ store its byte size in *BYTES and its word size in *WORDS. */
+
+static inline bool
+interesting_mode_p (machine_mode mode, unsigned int *bytes,
+ unsigned int *words)
+{
+ *bytes = GET_MODE_SIZE (mode);
+ *words = CEIL (*bytes, UNITS_PER_WORD);
+ return true;
+}
+
/* RTXes used while computing costs. */
struct cost_rtxes {
/* Source and target registers. */
@@ -199,10 +211,10 @@ compute_costs (bool speed_p, struct cost
for (i = 0; i < MAX_MACHINE_MODE; i++)
{
machine_mode mode = (machine_mode) i;
- int factor = GET_MODE_SIZE (mode) / UNITS_PER_WORD;
- if (factor > 1)
+ unsigned int size, factor;
+ if (interesting_mode_p (mode, &size, &factor) && factor > 1)
{
- int mode_move_cost;
+ unsigned int mode_move_cost;
PUT_MODE (rtxes->target, mode);
PUT_MODE (rtxes->source, mode);
@@ -469,10 +481,10 @@ find_decomposable_subregs (rtx *loc, enu
continue;
}
- outer_size = GET_MODE_SIZE (GET_MODE (x));
- inner_size = GET_MODE_SIZE (GET_MODE (inner));
- outer_words = (outer_size + UNITS_PER_WORD - 1) / UNITS_PER_WORD;
- inner_words = (inner_size + UNITS_PER_WORD - 1) / UNITS_PER_WORD;
+ if (!interesting_mode_p (GET_MODE (x), &outer_size, &outer_words)
+ || !interesting_mode_p (GET_MODE (inner), &inner_size,
+ &inner_words))
+ continue;
/* We only try to decompose single word subregs of multi-word
registers. When we find one, we return -1 to avoid iterating
@@ -507,7 +519,7 @@ find_decomposable_subregs (rtx *loc, enu
}
else if (REG_P (x))
{
- unsigned int regno;
+ unsigned int regno, size, words;
/* We will see an outer SUBREG before we see the inner REG, so
when we see a plain REG here it means a direct reference to
@@ -527,7 +539,8 @@ find_decomposable_subregs (rtx *loc, enu
regno = REGNO (x);
if (!HARD_REGISTER_NUM_P (regno)
- && GET_MODE_SIZE (GET_MODE (x)) > UNITS_PER_WORD)
+ && interesting_mode_p (GET_MODE (x), &size, &words)
+ && words > 1)
{
switch (*pcmi)
{
@@ -567,15 +580,15 @@ find_decomposable_subregs (rtx *loc, enu
decompose_register (unsigned int regno)
{
rtx reg;
- unsigned int words, i;
+ unsigned int size, words, i;
rtvec v;
reg = regno_reg_rtx[regno];
regno_reg_rtx[regno] = NULL_RTX;
- words = GET_MODE_SIZE (GET_MODE (reg));
- words = (words + UNITS_PER_WORD - 1) / UNITS_PER_WORD;
+ if (!interesting_mode_p (GET_MODE (reg), &size, &words))
+ gcc_unreachable ();
v = rtvec_alloc (words);
for (i = 0; i < words; ++i)
@@ -599,25 +612,29 @@ decompose_register (unsigned int regno)
simplify_subreg_concatn (machine_mode outermode, rtx op,
unsigned int byte)
{
- unsigned int inner_size;
+ unsigned int outer_size, outer_words, inner_size, inner_words;
machine_mode innermode, partmode;
rtx part;
unsigned int final_offset;
+ innermode = GET_MODE (op);
+ if (!interesting_mode_p (outermode, &outer_size, &outer_words)
+ || !interesting_mode_p (innermode, &inner_size, &inner_words))
+ gcc_unreachable ();
+
gcc_assert (GET_CODE (op) == CONCATN);
- gcc_assert (byte % GET_MODE_SIZE (outermode) == 0);
+ gcc_assert (byte % outer_size == 0);
- innermode = GET_MODE (op);
- gcc_assert (byte < GET_MODE_SIZE (innermode));
- if (GET_MODE_SIZE (outermode) > GET_MODE_SIZE (innermode))
+ gcc_assert (byte < inner_size);
+ if (outer_size > inner_size)
return NULL_RTX;
- inner_size = GET_MODE_SIZE (innermode) / XVECLEN (op, 0);
+ inner_size /= XVECLEN (op, 0);
part = XVECEXP (op, 0, byte / inner_size);
partmode = GET_MODE (part);
final_offset = byte % inner_size;
- if (final_offset + GET_MODE_SIZE (outermode) > inner_size)
+ if (final_offset + outer_size > inner_size)
return NULL_RTX;
/* VECTOR_CSTs in debug expressions are expanded into CONCATN instead of
@@ -801,9 +818,10 @@ can_decompose_p (rtx x)
if (HARD_REGISTER_NUM_P (regno))
{
- unsigned int byte, num_bytes;
+ unsigned int byte, num_bytes, num_words;
- num_bytes = GET_MODE_SIZE (GET_MODE (x));
+ if (!interesting_mode_p (GET_MODE (x), &num_bytes, &num_words))
+ return false;
for (byte = 0; byte < num_bytes; byte += UNITS_PER_WORD)
if (simplify_subreg_regno (regno, GET_MODE (x), byte, word_mode) < 0)
return false;
@@ -826,14 +844,15 @@ resolve_simple_move (rtx set, rtx_insn *
rtx src, dest, real_dest;
rtx_insn *insns;
machine_mode orig_mode, dest_mode;
- unsigned int words;
+ unsigned int orig_size, words;
bool pushing;
src = SET_SRC (set);
dest = SET_DEST (set);
orig_mode = GET_MODE (dest);
- words = (GET_MODE_SIZE (orig_mode) + UNITS_PER_WORD - 1) / UNITS_PER_WORD;
+ if (!interesting_mode_p (orig_mode, &orig_size, &words))
+ gcc_unreachable ();
gcc_assert (words > 1);
start_sequence ();
@@ -964,7 +983,7 @@ resolve_simple_move (rtx set, rtx_insn *
{
unsigned int i, j, jinc;
- gcc_assert (GET_MODE_SIZE (orig_mode) % UNITS_PER_WORD == 0);
+ gcc_assert (orig_size % UNITS_PER_WORD == 0);
gcc_assert (GET_CODE (XEXP (dest, 0)) != PRE_MODIFY);
gcc_assert (GET_CODE (XEXP (dest, 0)) != POST_MODIFY);
@@ -1059,7 +1078,7 @@ resolve_clobber (rtx pat, rtx_insn *insn
{
rtx reg;
machine_mode orig_mode;
- unsigned int words, i;
+ unsigned int orig_size, words, i;
int ret;
reg = XEXP (pat, 0);
@@ -1067,8 +1086,8 @@ resolve_clobber (rtx pat, rtx_insn *insn
return false;
orig_mode = GET_MODE (reg);
- words = GET_MODE_SIZE (orig_mode);
- words = (words + UNITS_PER_WORD - 1) / UNITS_PER_WORD;
+ if (!interesting_mode_p (orig_mode, &orig_size, &words))
+ gcc_unreachable ();
ret = validate_change (NULL_RTX, &XEXP (pat, 0),
simplify_gen_subreg_concatn (word_mode, reg,
@@ -1332,12 +1351,13 @@ dump_shift_choices (enum rtx_code code,
static void
dump_choices (bool speed_p, const char *description)
{
- unsigned int i;
+ unsigned int size, factor, i;
fprintf (dump_file, "Choices when optimizing for %s:\n", description);
for (i = 0; i < MAX_MACHINE_MODE; i++)
- if (GET_MODE_SIZE ((machine_mode) i) > UNITS_PER_WORD)
+ if (interesting_mode_p ((machine_mode) i, &size, &factor)
+ && factor > 1)
fprintf (dump_file, " %s mode %s for copy lowering.\n",
choices[speed_p].move_modes_to_split[i]
? "Splitting"
* [15/nn] Use more specific hash functions in rtlhash.c
2017-10-23 11:16 [00/nn] Patches preparing for runtime offsets and sizes Richard Sandiford
` (14 preceding siblings ...)
2017-10-23 11:27 ` [16/nn] Factor out the mode handling in lower-subreg.c Richard Sandiford
@ 2017-10-23 11:27 ` Richard Sandiford
2017-10-26 12:08 ` Richard Biener
2017-10-23 11:28 ` [17/nn] Turn var-tracking.c:INT_MEM_OFFSET into a function Richard Sandiford
` (5 subsequent siblings)
21 siblings, 1 reply; 90+ messages in thread
From: Richard Sandiford @ 2017-10-23 11:27 UTC (permalink / raw)
To: gcc-patches
Avoid using add_object when we have more specific routines available.
2017-10-23 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* rtlhash.c (add_rtx): Use add_hwi for 'w' and add_int for 'i'.
Index: gcc/rtlhash.c
===================================================================
--- gcc/rtlhash.c 2017-02-23 19:54:03.000000000 +0000
+++ gcc/rtlhash.c 2017-10-23 11:47:20.120201389 +0100
@@ -77,11 +77,11 @@ add_rtx (const_rtx x, hash &hstate)
switch (fmt[i])
{
case 'w':
- hstate.add_object (XWINT (x, i));
+ hstate.add_hwi (XWINT (x, i));
break;
case 'n':
case 'i':
- hstate.add_object (XINT (x, i));
+ hstate.add_int (XINT (x, i));
break;
case 'V':
case 'E':
* [17/nn] Turn var-tracking.c:INT_MEM_OFFSET into a function
2017-10-23 11:16 [00/nn] Patches preparing for runtime offsets and sizes Richard Sandiford
` (15 preceding siblings ...)
2017-10-23 11:27 ` [15/nn] Use more specific hash functions in rtlhash.c Richard Sandiford
@ 2017-10-23 11:28 ` Richard Sandiford
2017-10-26 12:10 ` Richard Biener
2017-10-23 11:29 ` [19/nn] Don't treat zero-sized ranges as overlapping Richard Sandiford
` (4 subsequent siblings)
21 siblings, 1 reply; 90+ messages in thread
From: Richard Sandiford @ 2017-10-23 11:28 UTC (permalink / raw)
To: gcc-patches
This avoids the double evaluation mentioned in the comments and
simplifies the change to make MEM_OFFSET variable.
2017-10-23 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* var-tracking.c (INT_MEM_OFFSET): Replace with...
(int_mem_offset): ...this new function.
(var_mem_set, var_mem_delete_and_set, var_mem_delete)
(find_mem_expr_in_1pdv, dataflow_set_preserve_mem_locs)
(same_variable_part_p, use_type, add_stores, vt_get_decl_and_offset):
Update accordingly.
Index: gcc/var-tracking.c
===================================================================
--- gcc/var-tracking.c 2017-09-12 14:28:56.401824826 +0100
+++ gcc/var-tracking.c 2017-10-23 11:47:27.197231712 +0100
@@ -390,8 +390,15 @@ struct variable
/* Pointer to the BB's information specific to variable tracking pass. */
#define VTI(BB) ((variable_tracking_info *) (BB)->aux)
-/* Macro to access MEM_OFFSET as an HOST_WIDE_INT. Evaluates MEM twice. */
-#define INT_MEM_OFFSET(mem) (MEM_OFFSET_KNOWN_P (mem) ? MEM_OFFSET (mem) : 0)
+/* Return MEM_OFFSET (MEM) as a HOST_WIDE_INT, or 0 if we can't. */
+
+static inline HOST_WIDE_INT
+int_mem_offset (const_rtx mem)
+{
+ if (MEM_OFFSET_KNOWN_P (mem))
+ return MEM_OFFSET (mem);
+ return 0;
+}
#if CHECKING_P && (GCC_VERSION >= 2007)
@@ -2336,7 +2343,7 @@ var_mem_set (dataflow_set *set, rtx loc,
rtx set_src)
{
tree decl = MEM_EXPR (loc);
- HOST_WIDE_INT offset = INT_MEM_OFFSET (loc);
+ HOST_WIDE_INT offset = int_mem_offset (loc);
var_mem_decl_set (set, loc, initialized,
dv_from_decl (decl), offset, set_src, INSERT);
@@ -2354,7 +2361,7 @@ var_mem_delete_and_set (dataflow_set *se
enum var_init_status initialized, rtx set_src)
{
tree decl = MEM_EXPR (loc);
- HOST_WIDE_INT offset = INT_MEM_OFFSET (loc);
+ HOST_WIDE_INT offset = int_mem_offset (loc);
clobber_overlapping_mems (set, loc);
decl = var_debug_decl (decl);
@@ -2375,7 +2382,7 @@ var_mem_delete_and_set (dataflow_set *se
var_mem_delete (dataflow_set *set, rtx loc, bool clobber)
{
tree decl = MEM_EXPR (loc);
- HOST_WIDE_INT offset = INT_MEM_OFFSET (loc);
+ HOST_WIDE_INT offset = int_mem_offset (loc);
clobber_overlapping_mems (set, loc);
decl = var_debug_decl (decl);
@@ -4618,7 +4625,7 @@ find_mem_expr_in_1pdv (tree expr, rtx va
for (node = var->var_part[0].loc_chain; node; node = node->next)
if (MEM_P (node->loc)
&& MEM_EXPR (node->loc) == expr
- && INT_MEM_OFFSET (node->loc) == 0)
+ && int_mem_offset (node->loc) == 0)
{
where = node;
break;
@@ -4683,7 +4690,7 @@ dataflow_set_preserve_mem_locs (variable
/* We want to remove dying MEMs that don't refer to DECL. */
if (GET_CODE (loc->loc) == MEM
&& (MEM_EXPR (loc->loc) != decl
- || INT_MEM_OFFSET (loc->loc) != 0)
+ || int_mem_offset (loc->loc) != 0)
&& mem_dies_at_call (loc->loc))
break;
/* We want to move here MEMs that do refer to DECL. */
@@ -4727,7 +4734,7 @@ dataflow_set_preserve_mem_locs (variable
if (GET_CODE (loc->loc) != MEM
|| (MEM_EXPR (loc->loc) == decl
- && INT_MEM_OFFSET (loc->loc) == 0)
+ && int_mem_offset (loc->loc) == 0)
|| !mem_dies_at_call (loc->loc))
{
if (old_loc != loc->loc && emit_notes)
@@ -5254,7 +5261,7 @@ same_variable_part_p (rtx loc, tree expr
else if (MEM_P (loc))
{
expr2 = MEM_EXPR (loc);
- offset2 = INT_MEM_OFFSET (loc);
+ offset2 = int_mem_offset (loc);
}
else
return false;
@@ -5522,7 +5529,7 @@ use_type (rtx loc, struct count_use_info
return MO_CLOBBER;
else if (target_for_debug_bind (var_debug_decl (expr)))
return MO_CLOBBER;
- else if (track_loc_p (loc, expr, INT_MEM_OFFSET (loc),
+ else if (track_loc_p (loc, expr, int_mem_offset (loc),
false, modep, NULL)
/* Multi-part variables shouldn't refer to one-part
variable names such as VALUEs (never happens) or
@@ -6017,7 +6024,7 @@ add_stores (rtx loc, const_rtx expr, voi
rtx xexpr = gen_rtx_SET (loc, src);
if (same_variable_part_p (SET_SRC (xexpr),
MEM_EXPR (loc),
- INT_MEM_OFFSET (loc)))
+ int_mem_offset (loc)))
mo.type = MO_COPY;
else
mo.type = MO_SET;
@@ -9579,7 +9586,7 @@ vt_get_decl_and_offset (rtx rtl, tree *d
if (MEM_ATTRS (rtl))
{
*declp = MEM_EXPR (rtl);
- *offsetp = INT_MEM_OFFSET (rtl);
+ *offsetp = int_mem_offset (rtl);
return true;
}
}
* [19/nn] Don't treat zero-sized ranges as overlapping
2017-10-23 11:16 [00/nn] Patches preparing for runtime offsets and sizes Richard Sandiford
` (16 preceding siblings ...)
2017-10-23 11:28 ` [17/nn] Turn var-tracking.c:INT_MEM_OFFSET into a function Richard Sandiford
@ 2017-10-23 11:29 ` Richard Sandiford
2017-10-26 12:14 ` Richard Biener
2017-10-23 11:29 ` [18/nn] Use (CONST_VECTOR|GET_MODE)_NUNITS in simplify-rtx.c Richard Sandiford
` (3 subsequent siblings)
21 siblings, 1 reply; 90+ messages in thread
From: Richard Sandiford @ 2017-10-23 11:29 UTC (permalink / raw)
To: gcc-patches
Most GCC ranges seem to be represented as an offset and a size (rather
than a start and inclusive end or start and exclusive end). The usual
test for whether X is in a range is of course:
x >= start && x < start + size
or:
x >= start && x - start < size
which means that an empty range of size 0 contains nothing. But other
range tests aren't as obvious.
The usual test for whether one range is contained within another
range is:
start1 >= start2 && start1 + size1 <= start2 + size2
while the test for whether two ranges overlap (from ranges_overlap_p) is:
(start1 >= start2 && start1 < start2 + size2)
|| (start2 >= start1 && start2 < start1 + size1)
i.e. the ranges overlap if one range contains the start of the other
range. This leads to strange results like:
(start X, size 0) is a subrange of (start X, size 0) but
(start X, size 0) does not overlap (start X, size 0)
Similarly:
(start 4, size 0) is a subrange of (start 2, size 2) but
(start 4, size 0) does not overlap (start 2, size 2)
It seems like "X is a subrange of Y" should imply "X overlaps Y".
This becomes harder to ignore with the runtime sizes and offsets
added for SVE. The most obvious fix seemed to be to say that
an empty range does not overlap anything, and is therefore not
a subrange of anything.
Using the new definition of subranges didn't seem to cause any
codegen differences in the testsuite. But there was one change
with the new definition of overlapping ranges. strncpy-chk.c has:
memset (dst, 0, sizeof (dst));
if (strncpy (dst, src, 0) != dst || strcmp (dst, ""))
abort();
The strncpy is detected as a zero-size write, and so with the new
definition of overlapping ranges, we treat the strncpy as having
no effect on the strcmp (which is true). The reaching definition
is the memset instead.
This patch makes ranges_overlap_p return false for zero-sized
ranges, even if the other range has an unknown size.
2017-10-23 Richard Sandiford <richard.sandiford@linaro.org>
gcc/
* tree-ssa-alias.h (ranges_overlap_p): Return false if either
range is known to be empty.
Index: gcc/tree-ssa-alias.h
===================================================================
--- gcc/tree-ssa-alias.h 2017-03-28 16:19:22.000000000 +0100
+++ gcc/tree-ssa-alias.h 2017-10-23 11:47:38.181155696 +0100
@@ -171,6 +171,8 @@ ranges_overlap_p (HOST_WIDE_INT pos1,
HOST_WIDE_INT pos2,
unsigned HOST_WIDE_INT size2)
{
+ if (size1 == 0 || size2 == 0)
+ return false;
if (pos1 >= pos2
&& (size2 == (unsigned HOST_WIDE_INT)-1
|| pos1 < (pos2 + (HOST_WIDE_INT) size2)))
* [18/nn] Use (CONST_VECTOR|GET_MODE)_NUNITS in simplify-rtx.c
2017-10-23 11:16 [00/nn] Patches preparing for runtime offsets and sizes Richard Sandiford
` (17 preceding siblings ...)
2017-10-23 11:29 ` [19/nn] Don't treat zero-sized ranges as overlapping Richard Sandiford
@ 2017-10-23 11:29 ` Richard Sandiford
2017-10-26 12:13 ` Richard Biener
2017-10-23 11:30 ` [20/nn] Make tree-ssa-dse.c:normalize_ref return a bool Richard Sandiford
` (2 subsequent siblings)
21 siblings, 1 reply; 90+ messages in thread
From: Richard Sandiford @ 2017-10-23 11:29 UTC (permalink / raw)
To: gcc-patches
This patch avoids some calculations of the form:
GET_MODE_SIZE (vector_mode) / GET_MODE_SIZE (element_mode)
in simplify-rtx.c. If we're dealing with CONST_VECTORs, it's better
to use CONST_VECTOR_NUNITS, since that remains constant even after the
SVE patches. In other cases we can get the number from GET_MODE_NUNITS.
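The motivation can be illustrated with a toy model (an assumption for illustration, not GCC code): an SVE-style mode whose byte size is A + B*x for a runtime factor x. Because x is unknown at compile time, deriving the element count as GET_MODE_SIZE / GET_MODE_UNIT_SIZE is no longer a plain integer division, whereas a directly recorded count (GET_MODE_NUNITS, CONST_VECTOR_NUNITS) stays usable:

```c
#include <assert.h>

/* value = coeffs[0] + coeffs[1] * x, for a runtime x.  */
struct poly2 { long coeffs[2]; };

struct toy_mode {
  struct poly2 size;     /* bytes: GET_MODE_SIZE analogue */
  long unit_size;        /* bytes per element: GET_MODE_UNIT_SIZE */
  struct poly2 nunits;   /* element count, stored rather than derived */
};

/* A VNx4SI-like mode: 16 + 16*x bytes, 4-byte elements, 4 + 4*x units.
   The stored nunits is consistent with size / unit_size coefficient by
   coefficient, but no division of two poly values is needed.  */
static const struct toy_mode vnx4si = { { {16, 16} }, 4, { {4, 4} } };
```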
2017-10-23 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* simplify-rtx.c (simplify_const_unary_operation): Use GET_MODE_NUNITS
and CONST_VECTOR_NUNITS instead of computing the number of units from
the byte sizes of the vector and element.
(simplify_binary_operation_1): Likewise.
(simplify_const_binary_operation): Likewise.
(simplify_ternary_operation): Likewise.
Index: gcc/simplify-rtx.c
===================================================================
--- gcc/simplify-rtx.c 2017-10-23 11:47:11.277288162 +0100
+++ gcc/simplify-rtx.c 2017-10-23 11:47:32.868935554 +0100
@@ -1752,18 +1752,12 @@ simplify_const_unary_operation (enum rtx
return gen_const_vec_duplicate (mode, op);
if (GET_CODE (op) == CONST_VECTOR)
{
- int elt_size = GET_MODE_UNIT_SIZE (mode);
- unsigned n_elts = (GET_MODE_SIZE (mode) / elt_size);
- rtvec v = rtvec_alloc (n_elts);
- unsigned int i;
-
- machine_mode inmode = GET_MODE (op);
- int in_elt_size = GET_MODE_UNIT_SIZE (inmode);
- unsigned in_n_elts = (GET_MODE_SIZE (inmode) / in_elt_size);
-
+ unsigned int n_elts = GET_MODE_NUNITS (mode);
+ unsigned int in_n_elts = CONST_VECTOR_NUNITS (op);
gcc_assert (in_n_elts < n_elts);
gcc_assert ((n_elts % in_n_elts) == 0);
- for (i = 0; i < n_elts; i++)
+ rtvec v = rtvec_alloc (n_elts);
+ for (unsigned i = 0; i < n_elts; i++)
RTVEC_ELT (v, i) = CONST_VECTOR_ELT (op, i % in_n_elts);
return gen_rtx_CONST_VECTOR (mode, v);
}
@@ -3608,9 +3602,7 @@ simplify_binary_operation_1 (enum rtx_co
rtx op0 = XEXP (trueop0, 0);
rtx op1 = XEXP (trueop0, 1);
- machine_mode opmode = GET_MODE (op0);
- int elt_size = GET_MODE_UNIT_SIZE (opmode);
- int n_elts = GET_MODE_SIZE (opmode) / elt_size;
+ int n_elts = GET_MODE_NUNITS (GET_MODE (op0));
int i = INTVAL (XVECEXP (trueop1, 0, 0));
int elem;
@@ -3637,21 +3629,8 @@ simplify_binary_operation_1 (enum rtx_co
mode01 = GET_MODE (op01);
/* Find out number of elements of each operand. */
- if (VECTOR_MODE_P (mode00))
- {
- elt_size = GET_MODE_UNIT_SIZE (mode00);
- n_elts00 = GET_MODE_SIZE (mode00) / elt_size;
- }
- else
- n_elts00 = 1;
-
- if (VECTOR_MODE_P (mode01))
- {
- elt_size = GET_MODE_UNIT_SIZE (mode01);
- n_elts01 = GET_MODE_SIZE (mode01) / elt_size;
- }
- else
- n_elts01 = 1;
+ n_elts00 = GET_MODE_NUNITS (mode00);
+ n_elts01 = GET_MODE_NUNITS (mode01);
gcc_assert (n_elts == n_elts00 + n_elts01);
@@ -3771,9 +3750,8 @@ simplify_binary_operation_1 (enum rtx_co
rtx subop1 = XEXP (trueop0, 1);
machine_mode mode0 = GET_MODE (subop0);
machine_mode mode1 = GET_MODE (subop1);
- int li = GET_MODE_UNIT_SIZE (mode0);
- int l0 = GET_MODE_SIZE (mode0) / li;
- int l1 = GET_MODE_SIZE (mode1) / li;
+ int l0 = GET_MODE_NUNITS (mode0);
+ int l1 = GET_MODE_NUNITS (mode1);
int i0 = INTVAL (XVECEXP (trueop1, 0, 0));
if (i0 == 0 && !side_effects_p (op1) && mode == mode0)
{
@@ -3931,14 +3909,10 @@ simplify_binary_operation_1 (enum rtx_co
|| CONST_SCALAR_INT_P (trueop1)
|| CONST_DOUBLE_AS_FLOAT_P (trueop1)))
{
- int elt_size = GET_MODE_UNIT_SIZE (mode);
- unsigned n_elts = (GET_MODE_SIZE (mode) / elt_size);
+ unsigned n_elts = GET_MODE_NUNITS (mode);
+ unsigned in_n_elts = GET_MODE_NUNITS (op0_mode);
rtvec v = rtvec_alloc (n_elts);
unsigned int i;
- unsigned in_n_elts = 1;
-
- if (VECTOR_MODE_P (op0_mode))
- in_n_elts = (GET_MODE_SIZE (op0_mode) / elt_size);
for (i = 0; i < n_elts; i++)
{
if (i < in_n_elts)
@@ -4026,16 +4000,12 @@ simplify_const_binary_operation (enum rt
&& GET_CODE (op0) == CONST_VECTOR
&& GET_CODE (op1) == CONST_VECTOR)
{
- unsigned n_elts = GET_MODE_NUNITS (mode);
- machine_mode op0mode = GET_MODE (op0);
- unsigned op0_n_elts = GET_MODE_NUNITS (op0mode);
- machine_mode op1mode = GET_MODE (op1);
- unsigned op1_n_elts = GET_MODE_NUNITS (op1mode);
+ unsigned int n_elts = CONST_VECTOR_NUNITS (op0);
+ gcc_assert (n_elts == (unsigned int) CONST_VECTOR_NUNITS (op1));
+ gcc_assert (n_elts == GET_MODE_NUNITS (mode));
rtvec v = rtvec_alloc (n_elts);
unsigned int i;
- gcc_assert (op0_n_elts == n_elts);
- gcc_assert (op1_n_elts == n_elts);
for (i = 0; i < n_elts; i++)
{
rtx x = simplify_binary_operation (code, GET_MODE_INNER (mode),
@@ -5712,8 +5682,7 @@ simplify_ternary_operation (enum rtx_cod
trueop2 = avoid_constant_pool_reference (op2);
if (CONST_INT_P (trueop2))
{
- int elt_size = GET_MODE_UNIT_SIZE (mode);
- unsigned n_elts = (GET_MODE_SIZE (mode) / elt_size);
+ unsigned n_elts = GET_MODE_NUNITS (mode);
unsigned HOST_WIDE_INT sel = UINTVAL (trueop2);
unsigned HOST_WIDE_INT mask;
if (n_elts == HOST_BITS_PER_WIDE_INT)
* [20/nn] Make tree-ssa-dse.c:normalize_ref return a bool
2017-10-23 11:16 [00/nn] Patches preparing for runtime offsets and sizes Richard Sandiford
` (18 preceding siblings ...)
2017-10-23 11:29 ` [18/nn] Use (CONST_VECTOR|GET_MODE)_NUNITS in simplify-rtx.c Richard Sandiford
@ 2017-10-23 11:30 ` Richard Sandiford
2017-10-30 17:49 ` Jeff Law
2017-10-23 11:31 ` [21/nn] Minor vn_reference_lookup_3 tweak Richard Sandiford
2017-10-23 11:45 ` [22/nn] Make dse.c use offset/width instead of start/end Richard Sandiford
21 siblings, 1 reply; 90+ messages in thread
From: Richard Sandiford @ 2017-10-23 11:30 UTC (permalink / raw)
To: gcc-patches
This patch moves the check for an overlapping byte from the callers of
normalize_ref into normalize_ref itself, so that it's easier to convert
to poly_ints later. It's not really worth doing on its own.
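The new contract can be sketched as a standalone function (a simplified model operating on a plain offset/size pair rather than a full ao_ref):

```c
#include <assert.h>
#include <stdbool.h>

struct range { long offset, size; };

/* Sketch of the patched normalize_ref: clip COPY to REF and return
   whether COPY is known to intersect at least one unit of REF.  */
static bool
normalize_ref (struct range *copy, const struct range *ref)
{
  /* If COPY starts before REF, move its start up and shrink it.  */
  if (copy->offset < ref->offset)
    {
      long diff = ref->offset - copy->offset;
      if (copy->size <= diff)
        return false;            /* COPY ends before REF begins.  */
      copy->size -= diff;
      copy->offset = ref->offset;
    }

  long diff = copy->offset - ref->offset;
  if (ref->size <= diff)
    return false;                /* COPY starts after REF ends.  */

  /* If COPY extends beyond REF, chop off its size appropriately.  */
  long limit = ref->size - diff;
  if (copy->size > limit)
    copy->size = limit;
  return true;
}
```

Returning false in the two disjoint cases is what lets the callers drop their separate overlap tests.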
2017-10-23 Richard Sandiford <richard.sandiford@linaro.org>
gcc/
* tree-ssa-dse.c (normalize_ref): Check whether the ranges overlap
and return false if not.
(clear_bytes_written_by, live_bytes_read): Update accordingly.
Index: gcc/tree-ssa-dse.c
===================================================================
--- gcc/tree-ssa-dse.c 2017-10-23 11:41:23.587123840 +0100
+++ gcc/tree-ssa-dse.c 2017-10-23 11:47:41.546155781 +0100
@@ -137,13 +137,11 @@ valid_ao_ref_for_dse (ao_ref *ref)
&& (ref->size != -1));
}
-/* Normalize COPY (an ao_ref) relative to REF. Essentially when we are
- done COPY will only refer bytes found within REF.
+/* Try to normalize COPY (an ao_ref) relative to REF. Essentially when we are
+ done COPY will only refer bytes found within REF. Return true if COPY
+ is known to intersect at least one byte of REF. */
- We have already verified that COPY intersects at least one
- byte with REF. */
-
-static void
+static bool
normalize_ref (ao_ref *copy, ao_ref *ref)
{
/* If COPY starts before REF, then reset the beginning of
@@ -151,13 +149,22 @@ normalize_ref (ao_ref *copy, ao_ref *ref
number of bytes removed from COPY. */
if (copy->offset < ref->offset)
{
- copy->size -= (ref->offset - copy->offset);
+ HOST_WIDE_INT diff = ref->offset - copy->offset;
+ if (copy->size <= diff)
+ return false;
+ copy->size -= diff;
copy->offset = ref->offset;
}
+ HOST_WIDE_INT diff = copy->offset - ref->offset;
+ if (ref->size <= diff)
+ return false;
+
/* If COPY extends beyond REF, chop off its size appropriately. */
- if (copy->offset + copy->size > ref->offset + ref->size)
- copy->size -= (copy->offset + copy->size - (ref->offset + ref->size));
+ HOST_WIDE_INT limit = ref->size - diff;
+ if (copy->size > limit)
+ copy->size = limit;
+ return true;
}
/* Clear any bytes written by STMT from the bitmap LIVE_BYTES. The base
@@ -179,14 +186,10 @@ clear_bytes_written_by (sbitmap live_byt
if (valid_ao_ref_for_dse (&write)
&& operand_equal_p (write.base, ref->base, OEP_ADDRESS_OF)
&& write.size == write.max_size
- && ((write.offset < ref->offset
- && write.offset + write.size > ref->offset)
- || (write.offset >= ref->offset
- && write.offset < ref->offset + ref->size)))
- {
- normalize_ref (&write, ref);
- bitmap_clear_range (live_bytes,
- (write.offset - ref->offset) / BITS_PER_UNIT,
+ && normalize_ref (&write, ref))
+ {
+ HOST_WIDE_INT start = write.offset - ref->offset;
+ bitmap_clear_range (live_bytes, start / BITS_PER_UNIT,
write.size / BITS_PER_UNIT);
}
}
@@ -480,21 +483,20 @@ live_bytes_read (ao_ref use_ref, ao_ref
{
/* We have already verified that USE_REF and REF hit the same object.
Now verify that there's actually an overlap between USE_REF and REF. */
- if (ranges_overlap_p (use_ref.offset, use_ref.size, ref->offset, ref->size))
+ if (normalize_ref (&use_ref, ref))
{
- normalize_ref (&use_ref, ref);
+ HOST_WIDE_INT start = use_ref.offset - ref->offset;
+ HOST_WIDE_INT size = use_ref.size;
/* If USE_REF covers all of REF, then it will hit one or more
live bytes. This avoids useless iteration over the bitmap
below. */
- if (use_ref.offset <= ref->offset
- && use_ref.offset + use_ref.size >= ref->offset + ref->size)
+ if (start == 0 && size == ref->size)
return true;
/* Now check if any of the remaining bits in use_ref are set in LIVE. */
- unsigned int start = (use_ref.offset - ref->offset) / BITS_PER_UNIT;
- unsigned int end = ((use_ref.offset + use_ref.size) / BITS_PER_UNIT) - 1;
- return bitmap_bit_in_range_p (live, start, end);
+ return bitmap_bit_in_range_p (live, start / BITS_PER_UNIT,
+ (start + size - 1) / BITS_PER_UNIT);
}
return true;
}
* [21/nn] Minor vn_reference_lookup_3 tweak
2017-10-23 11:16 [00/nn] Patches preparing for runtime offsets and sizes Richard Sandiford
` (19 preceding siblings ...)
2017-10-23 11:30 ` [20/nn] Make tree-ssa-dse.c:normalize_ref return a bool Richard Sandiford
@ 2017-10-23 11:31 ` Richard Sandiford
2017-10-26 12:18 ` Richard Biener
2017-10-23 11:45 ` [22/nn] Make dse.c use offset/width instead of start/end Richard Sandiford
21 siblings, 1 reply; 90+ messages in thread
From: Richard Sandiford @ 2017-10-23 11:31 UTC (permalink / raw)
To: gcc-patches
The repeated checks for MEM_REF made this code hard to convert to
poly_ints as-is. Hopefully the new structure also makes it clearer
at a glance what the two cases are.
2017-10-23 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* tree-ssa-sccvn.c (vn_reference_lookup_3): Avoid repeated
checks for MEM_REF.
Index: gcc/tree-ssa-sccvn.c
===================================================================
--- gcc/tree-ssa-sccvn.c 2017-10-23 11:47:03.852769480 +0100
+++ gcc/tree-ssa-sccvn.c 2017-10-23 11:47:44.596155858 +0100
@@ -2234,6 +2234,7 @@ vn_reference_lookup_3 (ao_ref *ref, tree
|| offset % BITS_PER_UNIT != 0
|| ref->size % BITS_PER_UNIT != 0)
return (void *)-1;
+ at = offset / BITS_PER_UNIT;
/* Extract a pointer base and an offset for the destination. */
lhs = gimple_call_arg (def_stmt, 0);
@@ -2301,19 +2302,18 @@ vn_reference_lookup_3 (ao_ref *ref, tree
copy_size = tree_to_uhwi (gimple_call_arg (def_stmt, 2));
/* The bases of the destination and the references have to agree. */
- if ((TREE_CODE (base) != MEM_REF
- && !DECL_P (base))
- || (TREE_CODE (base) == MEM_REF
- && (TREE_OPERAND (base, 0) != lhs
- || !tree_fits_uhwi_p (TREE_OPERAND (base, 1))))
- || (DECL_P (base)
- && (TREE_CODE (lhs) != ADDR_EXPR
- || TREE_OPERAND (lhs, 0) != base)))
+ if (TREE_CODE (base) == MEM_REF)
+ {
+ if (TREE_OPERAND (base, 0) != lhs
+ || !tree_fits_uhwi_p (TREE_OPERAND (base, 1)))
+ return (void *) -1;
+ at += tree_to_uhwi (TREE_OPERAND (base, 1));
+ }
+ else if (!DECL_P (base)
+ || TREE_CODE (lhs) != ADDR_EXPR
+ || TREE_OPERAND (lhs, 0) != base)
return (void *)-1;
- at = offset / BITS_PER_UNIT;
- if (TREE_CODE (base) == MEM_REF)
- at += tree_to_uhwi (TREE_OPERAND (base, 1));
/* If the access is completely outside of the memcpy destination
area there is no aliasing. */
if (lhs_offset >= at + maxsize / BITS_PER_UNIT
* [22/nn] Make dse.c use offset/width instead of start/end
2017-10-23 11:16 [00/nn] Patches preparing for runtime offsets and sizes Richard Sandiford
` (20 preceding siblings ...)
2017-10-23 11:31 ` [21/nn] Minor vn_reference_lookup_3 tweak Richard Sandiford
@ 2017-10-23 11:45 ` Richard Sandiford
2017-10-26 12:18 ` Richard Biener
21 siblings, 1 reply; 90+ messages in thread
From: Richard Sandiford @ 2017-10-23 11:45 UTC (permalink / raw)
To: gcc-patches
store_info and read_info_type in dse.c represented the ranges as
start/end, but a lot of the internal code used offset/width instead.
Using offset/width throughout fits better with the poly_int.h
range-checking functions.
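The two offset/width predicates the patch leans on can be sketched as follows. This is a simplified model: the real functions operate on poly_int64 and, in dse.c, on sizes that may be unknown (-1), which the sketch omits:

```c
#include <assert.h>
#include <stdbool.h>

/* Is [off1, off1 + size1) known to lie within [off2, off2 + size2)?  */
static bool
known_subrange_p (long off1, long size1, long off2, long size2)
{
  return off1 >= off2 && off1 + size1 <= off2 + size2;
}

/* Might [off1, off1 + size1) and [off2, off2 + size2) overlap?  */
static bool
ranges_may_overlap_p (long off1, long size1, long off2, long size2)
{
  return off1 < off2 + size2 && off2 < off1 + size1;
}
```

With offset/width pairs these checks are single comparisons against the stored fields, whereas the old begin/end representation forced each caller to reconstruct one bound or the other.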
2017-10-23 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* dse.c (store_info, read_info_type): Replace begin and end with
offset and width.
(print_range): New function.
(set_all_positions_unneeded, any_positions_needed_p)
(check_mem_read_rtx, scan_stores, scan_reads, dse_step5): Update
accordingly.
(record_store): Likewise. Optimize the case in which all positions
are unneeded.
(get_stored_val): Replace read_begin and read_end with read_offset
and read_width.
(replace_read): Update call accordingly.
Index: gcc/dse.c
===================================================================
--- gcc/dse.c 2017-10-23 11:47:11.273428262 +0100
+++ gcc/dse.c 2017-10-23 11:47:48.294155952 +0100
@@ -243,9 +243,12 @@ struct store_info
/* Canonized MEM address for use by canon_true_dependence. */
rtx mem_addr;
- /* The offset of the first and byte before the last byte associated
- with the operation. */
- HOST_WIDE_INT begin, end;
+ /* The offset of the first byte associated with the operation. */
+ HOST_WIDE_INT offset;
+
+ /* The number of bytes covered by the operation. This is always exact
+ and known (rather than -1). */
+ HOST_WIDE_INT width;
union
{
@@ -261,7 +264,7 @@ struct store_info
bitmap bmap;
/* Number of set bits (i.e. unneeded bytes) in BITMAP. If it is
- equal to END - BEGIN, the whole store is unused. */
+ equal to WIDTH, the whole store is unused. */
int count;
} large;
} positions_needed;
@@ -304,10 +307,11 @@ struct read_info_type
/* The id of the mem group of the base address. */
int group_id;
- /* The offset of the first and byte after the last byte associated
- with the operation. If begin == end == 0, the read did not have
- a constant offset. */
- int begin, end;
+ /* The offset of the first byte associated with the operation. */
+ HOST_WIDE_INT offset;
+
+ /* The number of bytes covered by the operation, or -1 if not known. */
+ HOST_WIDE_INT width;
/* The mem being read. */
rtx mem;
@@ -586,6 +590,18 @@ static deferred_change *deferred_change_
/* The number of bits used in the global bitmaps. */
static unsigned int current_position;
+
+/* Print offset range [OFFSET, OFFSET + WIDTH) to FILE. */
+
+static void
+print_range (FILE *file, poly_int64 offset, poly_int64 width)
+{
+ fprintf (file, "[");
+ print_dec (offset, file, SIGNED);
+ fprintf (file, "..");
+ print_dec (offset + width, file, SIGNED);
+ fprintf (file, ")");
+}
\f
/*----------------------------------------------------------------------------
Zeroth step.
@@ -1212,10 +1228,9 @@ set_all_positions_unneeded (store_info *
{
if (__builtin_expect (s_info->is_large, false))
{
- int pos, end = s_info->end - s_info->begin;
- for (pos = 0; pos < end; pos++)
- bitmap_set_bit (s_info->positions_needed.large.bmap, pos);
- s_info->positions_needed.large.count = end;
+ bitmap_set_range (s_info->positions_needed.large.bmap,
+ 0, s_info->width);
+ s_info->positions_needed.large.count = s_info->width;
}
else
s_info->positions_needed.small_bitmask = HOST_WIDE_INT_0U;
@@ -1227,8 +1242,7 @@ set_all_positions_unneeded (store_info *
any_positions_needed_p (store_info *s_info)
{
if (__builtin_expect (s_info->is_large, false))
- return (s_info->positions_needed.large.count
- < s_info->end - s_info->begin);
+ return s_info->positions_needed.large.count < s_info->width;
else
return (s_info->positions_needed.small_bitmask != HOST_WIDE_INT_0U);
}
@@ -1355,8 +1369,12 @@ record_store (rtx body, bb_info_t bb_inf
set_usage_bits (group, offset, width, expr);
if (dump_file && (dump_flags & TDF_DETAILS))
- fprintf (dump_file, " processing const base store gid=%d[%d..%d)\n",
- group_id, (int)offset, (int)(offset+width));
+ {
+ fprintf (dump_file, " processing const base store gid=%d",
+ group_id);
+ print_range (dump_file, offset, width);
+ fprintf (dump_file, "\n");
+ }
}
else
{
@@ -1368,8 +1386,11 @@ record_store (rtx body, bb_info_t bb_inf
group_id = -1;
if (dump_file && (dump_flags & TDF_DETAILS))
- fprintf (dump_file, " processing cselib store [%d..%d)\n",
- (int)offset, (int)(offset+width));
+ {
+ fprintf (dump_file, " processing cselib store ");
+ print_range (dump_file, offset, width);
+ fprintf (dump_file, "\n");
+ }
}
const_rhs = rhs = NULL_RTX;
@@ -1435,18 +1456,21 @@ record_store (rtx body, bb_info_t bb_inf
{
HOST_WIDE_INT i;
if (dump_file && (dump_flags & TDF_DETAILS))
- fprintf (dump_file, " trying store in insn=%d gid=%d[%d..%d)\n",
- INSN_UID (ptr->insn), s_info->group_id,
- (int)s_info->begin, (int)s_info->end);
+ {
+ fprintf (dump_file, " trying store in insn=%d gid=%d",
+ INSN_UID (ptr->insn), s_info->group_id);
+ print_range (dump_file, s_info->offset, s_info->width);
+ fprintf (dump_file, "\n");
+ }
/* Even if PTR won't be eliminated as unneeded, if both
PTR and this insn store the same constant value, we might
eliminate this insn instead. */
if (s_info->const_rhs
&& const_rhs
- && offset >= s_info->begin
- && offset + width <= s_info->end
- && all_positions_needed_p (s_info, offset - s_info->begin,
+ && known_subrange_p (offset, width,
+ s_info->offset, s_info->width)
+ && all_positions_needed_p (s_info, offset - s_info->offset,
width))
{
if (GET_MODE (mem) == BLKmode)
@@ -1462,8 +1486,7 @@ record_store (rtx body, bb_info_t bb_inf
{
rtx val;
start_sequence ();
- val = get_stored_val (s_info, GET_MODE (mem),
- offset, offset + width,
+ val = get_stored_val (s_info, GET_MODE (mem), offset, width,
BLOCK_FOR_INSN (insn_info->insn),
true);
if (get_insns () != NULL)
@@ -1474,10 +1497,18 @@ record_store (rtx body, bb_info_t bb_inf
}
}
- for (i = MAX (offset, s_info->begin);
- i < offset + width && i < s_info->end;
- i++)
- set_position_unneeded (s_info, i - s_info->begin);
+ if (known_subrange_p (s_info->offset, s_info->width, offset, width))
+ /* The new store touches every byte that S_INFO does. */
+ set_all_positions_unneeded (s_info);
+ else
+ {
+ HOST_WIDE_INT begin_unneeded = offset - s_info->offset;
+ HOST_WIDE_INT end_unneeded = begin_unneeded + width;
+ begin_unneeded = MAX (begin_unneeded, 0);
+ end_unneeded = MIN (end_unneeded, s_info->width);
+ for (i = begin_unneeded; i < end_unneeded; ++i)
+ set_position_unneeded (s_info, i);
+ }
}
else if (s_info->rhs)
/* Need to see if it is possible for this store to overwrite
@@ -1535,8 +1566,8 @@ record_store (rtx body, bb_info_t bb_inf
store_info->positions_needed.small_bitmask = lowpart_bitmask (width);
}
store_info->group_id = group_id;
- store_info->begin = offset;
- store_info->end = offset + width;
+ store_info->offset = offset;
+ store_info->width = width;
store_info->is_set = GET_CODE (body) == SET;
store_info->rhs = rhs;
store_info->const_rhs = const_rhs;
@@ -1700,39 +1731,38 @@ look_for_hardregs (rtx x, const_rtx pat
}
/* Helper function for replace_read and record_store.
- Attempt to return a value stored in STORE_INFO, from READ_BEGIN
- to one before READ_END bytes read in READ_MODE. Return NULL
+ Attempt to return a value of mode READ_MODE stored in STORE_INFO,
+ consisting of READ_WIDTH bytes starting from READ_OFFSET. Return NULL
if not successful. If REQUIRE_CST is true, return always constant. */
static rtx
get_stored_val (store_info *store_info, machine_mode read_mode,
- HOST_WIDE_INT read_begin, HOST_WIDE_INT read_end,
+ HOST_WIDE_INT read_offset, HOST_WIDE_INT read_width,
basic_block bb, bool require_cst)
{
machine_mode store_mode = GET_MODE (store_info->mem);
- int shift;
- int access_size; /* In bytes. */
+ HOST_WIDE_INT gap;
rtx read_reg;
/* To get here the read is within the boundaries of the write so
shift will never be negative. Start out with the shift being in
bytes. */
if (store_mode == BLKmode)
- shift = 0;
+ gap = 0;
else if (BYTES_BIG_ENDIAN)
- shift = store_info->end - read_end;
+ gap = ((store_info->offset + store_info->width)
+ - (read_offset + read_width));
else
- shift = read_begin - store_info->begin;
-
- access_size = shift + GET_MODE_SIZE (read_mode);
-
- /* From now on it is bits. */
- shift *= BITS_PER_UNIT;
+ gap = read_offset - store_info->offset;
- if (shift)
- read_reg = find_shift_sequence (access_size, store_info, read_mode, shift,
- optimize_bb_for_speed_p (bb),
- require_cst);
+ if (gap != 0)
+ {
+ HOST_WIDE_INT shift = gap * BITS_PER_UNIT;
+ HOST_WIDE_INT access_size = GET_MODE_SIZE (read_mode) + gap;
+ read_reg = find_shift_sequence (access_size, store_info, read_mode,
+ shift, optimize_bb_for_speed_p (bb),
+ require_cst);
+ }
else if (store_mode == BLKmode)
{
/* The store is a memset (addr, const_val, const_size). */
@@ -1835,7 +1865,7 @@ replace_read (store_info *store_info, in
start_sequence ();
bb = BLOCK_FOR_INSN (read_insn->insn);
read_reg = get_stored_val (store_info,
- read_mode, read_info->begin, read_info->end,
+ read_mode, read_info->offset, read_info->width,
bb, false);
if (read_reg == NULL_RTX)
{
@@ -1986,8 +2016,8 @@ check_mem_read_rtx (rtx *loc, bb_info_t
read_info = read_info_type_pool.allocate ();
read_info->group_id = group_id;
read_info->mem = mem;
- read_info->begin = offset;
- read_info->end = offset + width;
+ read_info->offset = offset;
+ read_info->width = width;
read_info->next = insn_info->read_rec;
insn_info->read_rec = read_info;
if (group_id < 0)
@@ -2013,8 +2043,11 @@ check_mem_read_rtx (rtx *loc, bb_info_t
fprintf (dump_file, " processing const load gid=%d[BLK]\n",
group_id);
else
- fprintf (dump_file, " processing const load gid=%d[%d..%d)\n",
- group_id, (int)offset, (int)(offset+width));
+ {
+ fprintf (dump_file, " processing const load gid=%d", group_id);
+ print_range (dump_file, offset, width);
+ fprintf (dump_file, "\n");
+ }
}
while (i_ptr)
@@ -2052,19 +2085,19 @@ check_mem_read_rtx (rtx *loc, bb_info_t
else
{
if (store_info->rhs
- && offset >= store_info->begin
- && offset + width <= store_info->end
+ && known_subrange_p (offset, width, store_info->offset,
+ store_info->width)
&& all_positions_needed_p (store_info,
- offset - store_info->begin,
+ offset - store_info->offset,
width)
&& replace_read (store_info, i_ptr, read_info,
insn_info, loc, bb_info->regs_live))
return;
/* The bases are the same, just see if the offsets
- overlap. */
- if ((offset < store_info->end)
- && (offset + width > store_info->begin))
+ could overlap. */
+ if (ranges_may_overlap_p (offset, width, store_info->offset,
+ store_info->width))
remove = true;
}
}
@@ -2119,11 +2152,10 @@ check_mem_read_rtx (rtx *loc, bb_info_t
if (store_info->rhs
&& store_info->group_id == -1
&& store_info->cse_base == base
- && width != -1
- && offset >= store_info->begin
- && offset + width <= store_info->end
+ && known_subrange_p (offset, width, store_info->offset,
+ store_info->width)
&& all_positions_needed_p (store_info,
- offset - store_info->begin, width)
+ offset - store_info->offset, width)
&& replace_read (store_info, i_ptr, read_info, insn_info, loc,
bb_info->regs_live))
return;
@@ -2775,16 +2807,19 @@ scan_stores (store_info *store_info, bit
group_info *group_info
= rtx_group_vec[store_info->group_id];
if (group_info->process_globally)
- for (i = store_info->begin; i < store_info->end; i++)
- {
- int index = get_bitmap_index (group_info, i);
- if (index != 0)
- {
- bitmap_set_bit (gen, index);
- if (kill)
- bitmap_clear_bit (kill, index);
- }
- }
+ {
+ HOST_WIDE_INT end = store_info->offset + store_info->width;
+ for (i = store_info->offset; i < end; i++)
+ {
+ int index = get_bitmap_index (group_info, i);
+ if (index != 0)
+ {
+ bitmap_set_bit (gen, index);
+ if (kill)
+ bitmap_clear_bit (kill, index);
+ }
+ }
+ }
store_info = store_info->next;
}
}
@@ -2834,9 +2869,9 @@ scan_reads (insn_info_t insn_info, bitma
{
if (i == read_info->group_id)
{
- if (read_info->begin > read_info->end)
+ if (!known_size_p (read_info->width))
{
- /* Begin > end for block mode reads. */
+ /* Handle block mode reads. */
if (kill)
bitmap_ior_into (kill, group->group_kill);
bitmap_and_compl_into (gen, group->group_kill);
@@ -2846,7 +2881,8 @@ scan_reads (insn_info_t insn_info, bitma
/* The groups are the same, just process the
offsets. */
HOST_WIDE_INT j;
- for (j = read_info->begin; j < read_info->end; j++)
+ HOST_WIDE_INT end = read_info->offset + read_info->width;
+ for (j = read_info->offset; j < end; j++)
{
int index = get_bitmap_index (group, j);
if (index != 0)
@@ -3265,7 +3301,8 @@ dse_step5 (void)
HOST_WIDE_INT i;
group_info *group_info = rtx_group_vec[store_info->group_id];
- for (i = store_info->begin; i < store_info->end; i++)
+ HOST_WIDE_INT end = store_info->offset + store_info->width;
+ for (i = store_info->offset; i < end; i++)
{
int index = get_bitmap_index (group_info, i);
* Re: [01/nn] Add gen_(const_)vec_duplicate helpers
2017-10-23 11:17 ` [01/nn] Add gen_(const_)vec_duplicate helpers Richard Sandiford
@ 2017-10-25 16:29 ` Jeff Law
2017-10-27 16:12 ` Richard Sandiford
0 siblings, 1 reply; 90+ messages in thread
From: Jeff Law @ 2017-10-25 16:29 UTC (permalink / raw)
To: gcc-patches, richard.sandiford
On 10/23/2017 05:16 AM, Richard Sandiford wrote:
> This patch adds helper functions for generating constant and
> non-constant vector duplicates. These routines help with SVE because
> it is then easier to use:
>
> (const:M (vec_duplicate:M X))
>
> for a broadcast of X, even if the number of elements in M isn't known
> at compile time. It also makes it easier for general rtx code to treat
> constant and non-constant duplicates in the same way.
>
> In the target code, the patch uses gen_vec_duplicate instead of
> gen_rtx_VEC_DUPLICATE if handling constants correctly is potentially
> useful. It might be that some or all of the call sites only handle
> non-constants in practice, in which case the change is a harmless
> no-op (and a saving of a few characters).
>
> Otherwise, the target changes use gen_const_vec_duplicate instead
> of gen_rtx_CONST_VECTOR if the constant is obviously a duplicate.
> They also include some changes to use CONSTxx_RTX for easy global
> constants.
>
>
> 2017-10-23 Richard Sandiford <richard.sandiford@linaro.org>
> Alan Hayward <alan.hayward@arm.com>
> David Sherwood <david.sherwood@arm.com>
>
> gcc/
> * emit-rtl.h (gen_const_vec_duplicate): Declare.
> (gen_vec_duplicate): Likewise.
> * emit-rtl.c (gen_const_vec_duplicate_1): New function, split
> out from...
> (gen_const_vector): ...here.
> (gen_const_vec_duplicate, gen_vec_duplicate): New functions.
> (gen_rtx_CONST_VECTOR): Use gen_const_vec_duplicate for constants
> whose elements are all equal.
> * optabs.c (expand_vector_broadcast): Use gen_const_vec_duplicate.
> * simplify-rtx.c (simplify_const_unary_operation): Likewise.
> (simplify_relational_operation): Likewise.
> * config/aarch64/aarch64.c (aarch64_simd_gen_const_vector_dup):
> Likewise.
> (aarch64_simd_dup_constant): Use gen_vec_duplicate.
> (aarch64_expand_vector_init): Likewise.
> * config/arm/arm.c (neon_vdup_constant): Likewise.
> (neon_expand_vector_init): Likewise.
> (arm_expand_vec_perm): Use gen_const_vec_duplicate.
> (arm_block_set_unaligned_vect): Likewise.
> (arm_block_set_aligned_vect): Likewise.
> * config/arm/neon.md (neon_copysignf<mode>): Likewise.
> * config/i386/i386.c (ix86_expand_vec_perm): Likewise.
> (expand_vec_perm_even_odd_pack): Likewise.
> (ix86_vector_duplicate_value): Use gen_vec_duplicate.
> * config/i386/sse.md (one_cmpl<mode>2): Use CONSTM1_RTX.
> * config/ia64/ia64.c (ia64_expand_vecint_compare): Use
> gen_const_vec_duplicate.
> * config/ia64/vect.md (addv2sf3, subv2sf3): Use CONST1_RTX.
> * config/mips/mips.c (mips_gen_const_int_vector): Use
> gen_const_vec_duplicate.
> (mips_expand_vector_init): Use CONST0_RTX.
> * config/powerpcspe/altivec.md (abs<mode>2, nabs<mode>2): Likewise.
> (define_split): Use gen_const_vec_duplicate.
> * config/rs6000/altivec.md (abs<mode>2, nabs<mode>2): Use CONST0_RTX.
> (define_split): Use gen_const_vec_duplicate.
> * config/s390/vx-builtins.md (vec_genmask<mode>): Likewise.
> (vec_ctd_s64, vec_ctd_u64, vec_ctsl, vec_ctul): Likewise.
> * config/spu/spu.c (spu_const): Likewise.
I'd started looking at this a couple times when it was originally
submitted, but never seemed to get through it. It seems like a nice
cleanup.
So in gen_const_vector we had an assert to verify that const_tiny_rtx
was set up. That seems to have been lost. It's probably not a big
deal, but I mention it in case the loss was unintentional.
OK. Your call whether or not to re-introduce the assert for const_tiny_rtx.
Jeff
* Re: [02/nn] Add more vec_duplicate simplifications
2017-10-23 11:19 ` [02/nn] Add more vec_duplicate simplifications Richard Sandiford
@ 2017-10-25 16:35 ` Jeff Law
2017-11-10 9:42 ` Christophe Lyon
0 siblings, 1 reply; 90+ messages in thread
From: Jeff Law @ 2017-10-25 16:35 UTC (permalink / raw)
To: gcc-patches, richard.sandiford
On 10/23/2017 05:17 AM, Richard Sandiford wrote:
> This patch adds a vec_duplicate_p helper that tests for constant
> or non-constant vector duplicates. Together with the existing
> const_vec_duplicate_p, this complements the gen_vec_duplicate
> and gen_const_vec_duplicate added by a previous patch.
>
> The patch uses the new routines to add more rtx simplifications
> involving vector duplicates. These mirror simplifications that
> we already do for CONST_VECTOR broadcasts and are needed for
> variable-length SVE, which uses:
>
> (const:M (vec_duplicate:M X))
>
> to represent constant broadcasts instead. The simplifications do
> trigger on the testsuite for variable duplicates too, and in each
> case I saw the change was an improvement. E.g.:
>
[ snip ]
>
> The best way of testing the new simplifications seemed to be
> via selftests. The patch cribs part of David's patch here:
> https://gcc.gnu.org/ml/gcc-patches/2016-07/msg00270.html .
Cool. I really wish I had more time to promote David's work by adding
selftests to various things. There are certainly cases where it's the
most direct and useful way to test certain bits of lower level
infrastructure we have. Glad to see you found it useful here.
>
>
> 2017-10-23 Richard Sandiford <richard.sandiford@linaro.org>
> David Malcolm <dmalcolm@redhat.com>
> Alan Hayward <alan.hayward@arm.com>
> David Sherwood <david.sherwood@arm.com>
>
> gcc/
> * rtl.h (vec_duplicate_p): New function.
> * selftest-rtl.c (assert_rtx_eq_at): New function.
> * selftest-rtl.h (ASSERT_RTX_EQ): New macro.
> (assert_rtx_eq_at): Declare.
> * selftest.h (selftest::simplify_rtx_c_tests): Declare.
> * selftest-run-tests.c (selftest::run_tests): Call it.
> * simplify-rtx.c: Include selftest.h and selftest-rtl.h.
> (simplify_unary_operation_1): Recursively handle vector duplicates.
> (simplify_binary_operation_1): Likewise. Handle VEC_SELECTs of
> vector duplicates.
> (simplify_subreg): Handle subregs of vector duplicates.
> (make_test_reg, test_vector_ops_duplicate, test_vector_ops)
> (selftest::simplify_rtx_c_tests): New functions.
Thanks for the examples of how this affects various targets. Seems like
it ought to be a consistent win when they trigger.
jeff
* Re: [03/nn] Allow vector CONSTs
2017-10-23 11:19 ` [03/nn] Allow vector CONSTs Richard Sandiford
@ 2017-10-25 16:59 ` Jeff Law
2017-10-27 16:19 ` Richard Sandiford
0 siblings, 1 reply; 90+ messages in thread
From: Jeff Law @ 2017-10-25 16:59 UTC (permalink / raw)
To: gcc-patches, richard.sandiford
On 10/23/2017 05:18 AM, Richard Sandiford wrote:
> This patch allows (const ...) wrappers to be used for rtx vector
> constants, as an alternative to const_vector. This is useful
> for SVE, where the number of elements isn't known until runtime.
Right. It's constant, but not knowable at compile time. That seems an
exact match for how we've used CONST.
>
> It could also be useful in future for fixed-length vectors, to
> reduce the amount of memory needed to represent simple constants
> with high element counts. However, one nice thing about keeping
> it restricted to variable-length vectors is that there is never
> any need to handle combinations of (const ...) and CONST_VECTOR.
Yea, but is the memory consumption of these large vectors a real
problem? I suspect that, relative to other memory issues, they're in the noise.
>
>
> 2017-10-23 Richard Sandiford <richard.sandiford@linaro.org>
> Alan Hayward <alan.hayward@arm.com>
> David Sherwood <david.sherwood@arm.com>
>
> gcc/
> * doc/rtl.texi (const): Update description of address constants.
> Say that vector constants are allowed too.
> * common.md (E, F): Use CONSTANT_P instead of checking for
> CONST_VECTOR.
> * emit-rtl.c (gen_lowpart_common): Use const_vec_p instead of
> checking for CONST_VECTOR.
> * expmed.c (make_tree): Use build_vector_from_val for a CONST
> VEC_DUPLICATE.
> * expr.c (expand_expr_real_2): Check for vector modes instead
> of checking for CONST_VECTOR.
> * rtl.h (const_vec_p): New function.
> (const_vec_duplicate_p): Check for a CONST VEC_DUPLICATE.
> (unwrap_const_vec_duplicate): Handle them here too.
My only worry here is code that is a bit loose in checking for a CONST,
but not the innards, and perhaps isn't prepared for the new forms
that appear inside the CONST.
If we have such problems I'd expect it's in the targets, as the targets
have traditionally had to validate the innards of a CONST to ensure
it could be handled by the assembler/linker. Hmm, that may save the
targets, since they'd likely need an update to LEGITIMATE_CONSTANT_P to
ever see these new forms.
Presumably an aarch64 specific patch to recognize these as valid
constants in LEGITIMATE_CONSTANT_P is in the works?
OK for the trunk.
jeff
* Re: [04/nn] Add a VEC_SERIES rtl code
2017-10-23 11:20 ` [04/nn] Add a VEC_SERIES rtl code Richard Sandiford
@ 2017-10-26 11:49 ` Richard Biener
0 siblings, 0 replies; 90+ messages in thread
From: Richard Biener @ 2017-10-26 11:49 UTC (permalink / raw)
To: GCC Patches, Richard Sandiford
On Mon, Oct 23, 2017 at 1:19 PM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> This patch adds an rtl representation of a vector linear series
> of the form:
>
> a[I] = BASE + I * STEP
>
> Like vec_duplicate:
>
> - the new rtx can be used for both constant and non-constant vectors
> - when used for constant vectors it is wrapped in a (const ...)
> - the constant form is only used for variable-length vectors;
> fixed-length vectors still use CONST_VECTOR
>
> At the moment the code is restricted to integer elements, to avoid
> concerns over floating-point rounding.
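[Editorial note: a minimal plain-C illustration of the quoted definition, using an array as a stand-in for an rtx vector; the function name is mine, not GCC's.]

```c
#include <assert.h>

/* Expand out[i] = base + i * step for an nelts-element integer series,
   mirroring the definition of vec_series quoted above.  Illustrative
   only: real rtl elements carry modes and may be non-constant.  */
static void
expand_series (long *out, int nelts, long base, long step)
{
  for (int i = 0; i < nelts; ++i)
    out[i] = base + i * step;
}
```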
Ok.
Richard.
>
> 2017-10-23 Richard Sandiford <richard.sandiford@linaro.org>
> Alan Hayward <alan.hayward@arm.com>
> David Sherwood <david.sherwood@arm.com>
>
> gcc/
> * doc/rtl.texi (vec_series): Document.
> (const): Say that the operand can be a vec_series.
> * rtl.def (VEC_SERIES): New rtx code.
> * rtl.h (const_vec_series_p_1): Declare.
> (const_vec_series_p): New function.
> * emit-rtl.h (gen_const_vec_series): Declare.
> (gen_vec_series): Likewise.
> * emit-rtl.c (const_vec_series_p_1, gen_const_vec_series)
> (gen_vec_series): Likewise.
> * optabs.c (expand_mult_highpart): Use gen_const_vec_series.
> * simplify-rtx.c (simplify_unary_operation): Handle negations
> of vector series.
> (simplify_binary_operation_series): New function.
> (simplify_binary_operation_1): Use it. Handle VEC_SERIES.
> (test_vector_ops_series): New function.
> (test_vector_ops): Call it.
> * config/powerpcspe/altivec.md (altivec_lvsl): Use
> gen_const_vec_series.
> (altivec_lvsr): Likewise.
> * config/rs6000/altivec.md (altivec_lvsl, altivec_lvsr): Likewise.
>
> Index: gcc/doc/rtl.texi
> ===================================================================
> --- gcc/doc/rtl.texi 2017-10-23 11:41:39.185050437 +0100
> +++ gcc/doc/rtl.texi 2017-10-23 11:41:41.547050496 +0100
> @@ -1677,7 +1677,8 @@ are target-specific and typically repres
> operator. @var{m} should be a valid address mode.
>
> The second use of @code{const} is to wrap a vector operation.
> -In this case @var{exp} must be a @code{vec_duplicate} expression.
> +In this case @var{exp} must be a @code{vec_duplicate} or
> +@code{vec_series} expression.
>
> @findex high
> @item (high:@var{m} @var{exp})
> @@ -2722,6 +2723,10 @@ the same submodes as the input vector mo
> number of output parts must be an integer multiple of the number of input
> parts.
>
> +@findex vec_series
> +@item (vec_series:@var{m} @var{base} @var{step})
> +This operation creates a vector in which element @var{i} is equal to
> +@samp{@var{base} + @var{i}*@var{step}}. @var{m} must be a vector integer mode.
> @end table
>
> @node Conversions
> Index: gcc/rtl.def
> ===================================================================
> --- gcc/rtl.def 2017-10-23 11:40:11.378243915 +0100
> +++ gcc/rtl.def 2017-10-23 11:41:41.549050496 +0100
> @@ -710,6 +710,11 @@ DEF_RTL_EXPR(VEC_CONCAT, "vec_concat", "
> an integer multiple of the number of input parts. */
> DEF_RTL_EXPR(VEC_DUPLICATE, "vec_duplicate", "e", RTX_UNARY)
>
> +/* Creation of a vector in which element I has the value BASE + I * STEP,
> + where BASE is the first operand and STEP is the second. The result
> + must have a vector integer mode. */
> +DEF_RTL_EXPR(VEC_SERIES, "vec_series", "ee", RTX_BIN_ARITH)
> +
> /* Addition with signed saturation */
> DEF_RTL_EXPR(SS_PLUS, "ss_plus", "ee", RTX_COMM_ARITH)
>
> Index: gcc/rtl.h
> ===================================================================
> --- gcc/rtl.h 2017-10-23 11:41:39.188050437 +0100
> +++ gcc/rtl.h 2017-10-23 11:41:41.549050496 +0100
> @@ -2816,6 +2816,51 @@ unwrap_const_vec_duplicate (T x)
> return x;
> }
>
> +/* In emit-rtl.c. */
> +extern bool const_vec_series_p_1 (const_rtx, rtx *, rtx *);
> +
> +/* Return true if X is a constant vector that contains a linear series
> + of the form:
> +
> + { B, B + S, B + 2 * S, B + 3 * S, ... }
> +
> + for a nonzero S. Store B and S in *BASE_OUT and *STEP_OUT on success. */
> +
> +inline bool
> +const_vec_series_p (const_rtx x, rtx *base_out, rtx *step_out)
> +{
> + if (GET_CODE (x) == CONST_VECTOR
> + && GET_MODE_CLASS (GET_MODE (x)) == MODE_VECTOR_INT)
> + return const_vec_series_p_1 (x, base_out, step_out);
> + if (GET_CODE (x) == CONST && GET_CODE (XEXP (x, 0)) == VEC_SERIES)
> + {
> + *base_out = XEXP (XEXP (x, 0), 0);
> + *step_out = XEXP (XEXP (x, 0), 1);
> + return true;
> + }
> + return false;
> +}
> +
> +/* Return true if X is a vector that contains a linear series of the
> + form:
> +
> + { B, B + S, B + 2 * S, B + 3 * S, ... }
> +
> + where B and S are constant or nonconstant. Store B and S in
> + *BASE_OUT and *STEP_OUT on success. */
> +
> +inline bool
> +vec_series_p (const_rtx x, rtx *base_out, rtx *step_out)
> +{
> + if (GET_CODE (x) == VEC_SERIES)
> + {
> + *base_out = XEXP (x, 0);
> + *step_out = XEXP (x, 1);
> + return true;
> + }
> + return const_vec_series_p (x, base_out, step_out);
> +}
> +
> /* Return the unpromoted (outer) mode of SUBREG_PROMOTED_VAR_P subreg X. */
>
> inline scalar_int_mode
> Index: gcc/emit-rtl.h
> ===================================================================
> --- gcc/emit-rtl.h 2017-10-23 11:41:32.369050264 +0100
> +++ gcc/emit-rtl.h 2017-10-23 11:41:41.548050496 +0100
> @@ -441,6 +441,9 @@ get_max_uid (void)
> extern rtx gen_const_vec_duplicate (machine_mode, rtx);
> extern rtx gen_vec_duplicate (machine_mode, rtx);
>
> +extern rtx gen_const_vec_series (machine_mode, rtx, rtx);
> +extern rtx gen_vec_series (machine_mode, rtx, rtx);
> +
> extern void set_decl_incoming_rtl (tree, rtx, bool);
>
> /* Return a memory reference like MEMREF, but with its mode changed
> Index: gcc/emit-rtl.c
> ===================================================================
> --- gcc/emit-rtl.c 2017-10-23 11:41:39.186050437 +0100
> +++ gcc/emit-rtl.c 2017-10-23 11:41:41.548050496 +0100
> @@ -5796,6 +5796,69 @@ gen_vec_duplicate (machine_mode mode, rt
> return gen_rtx_VEC_DUPLICATE (mode, x);
> }
>
> +/* A subroutine of const_vec_series_p that handles the case in which
> + X is known to be an integer CONST_VECTOR. */
> +
> +bool
> +const_vec_series_p_1 (const_rtx x, rtx *base_out, rtx *step_out)
> +{
> + unsigned int nelts = CONST_VECTOR_NUNITS (x);
> + if (nelts < 2)
> + return false;
> +
> + scalar_mode inner = GET_MODE_INNER (GET_MODE (x));
> + rtx base = CONST_VECTOR_ELT (x, 0);
> + rtx step = simplify_binary_operation (MINUS, inner,
> + CONST_VECTOR_ELT (x, 1), base);
> + if (rtx_equal_p (step, CONST0_RTX (inner)))
> + return false;
> +
> + for (unsigned int i = 2; i < nelts; ++i)
> + {
> + rtx diff = simplify_binary_operation (MINUS, inner,
> + CONST_VECTOR_ELT (x, i),
> + CONST_VECTOR_ELT (x, i - 1));
> + if (!rtx_equal_p (step, diff))
> + return false;
> + }
> +
> + *base_out = base;
> + *step_out = step;
> + return true;
> +}
> +
> +/* Generate a vector constant of mode MODE in which element I has
> + the value BASE + I * STEP. */
> +
> +rtx
> +gen_const_vec_series (machine_mode mode, rtx base, rtx step)
> +{
> + gcc_assert (CONSTANT_P (base) && CONSTANT_P (step));
> +
> + int nunits = GET_MODE_NUNITS (mode);
> + rtvec v = rtvec_alloc (nunits);
> + scalar_mode inner_mode = GET_MODE_INNER (mode);
> + RTVEC_ELT (v, 0) = base;
> + for (int i = 1; i < nunits; ++i)
> + RTVEC_ELT (v, i) = simplify_gen_binary (PLUS, inner_mode,
> + RTVEC_ELT (v, i - 1), step);
> + return gen_rtx_raw_CONST_VECTOR (mode, v);
> +}
> +
> +/* Generate a vector of mode MODE in which element I has the value
> + BASE + I * STEP. The result will be a constant if BASE and STEP
> + are both constants. */
> +
> +rtx
> +gen_vec_series (machine_mode mode, rtx base, rtx step)
> +{
> + if (step == const0_rtx)
> + return gen_vec_duplicate (mode, base);
> + if (CONSTANT_P (base) && CONSTANT_P (step))
> + return gen_const_vec_series (mode, base, step);
> + return gen_rtx_VEC_SERIES (mode, base, step);
> +}
> +
> /* Generate a new vector constant for mode MODE and constant value
> CONSTANT. */
>
> Index: gcc/optabs.c
> ===================================================================
> --- gcc/optabs.c 2017-10-23 11:41:32.369050264 +0100
> +++ gcc/optabs.c 2017-10-23 11:41:41.549050496 +0100
> @@ -5784,13 +5784,13 @@ expand_mult_highpart (machine_mode mode,
> for (i = 0; i < nunits; ++i)
> RTVEC_ELT (v, i) = GEN_INT (!BYTES_BIG_ENDIAN + (i & ~1)
> + ((i & 1) ? nunits : 0));
> + perm = gen_rtx_CONST_VECTOR (mode, v);
> }
> else
> {
> - for (i = 0; i < nunits; ++i)
> - RTVEC_ELT (v, i) = GEN_INT (2 * i + (BYTES_BIG_ENDIAN ? 0 : 1));
> + int base = BYTES_BIG_ENDIAN ? 0 : 1;
> + perm = gen_const_vec_series (mode, GEN_INT (base), GEN_INT (2));
> }
> - perm = gen_rtx_CONST_VECTOR (mode, v);
>
> return expand_vec_perm (mode, m1, m2, perm, target);
> }
> Index: gcc/simplify-rtx.c
> ===================================================================
> --- gcc/simplify-rtx.c 2017-10-23 11:41:36.309050364 +0100
> +++ gcc/simplify-rtx.c 2017-10-23 11:41:41.550050496 +0100
> @@ -927,7 +927,7 @@ exact_int_to_float_conversion_p (const_r
> simplify_unary_operation_1 (enum rtx_code code, machine_mode mode, rtx op)
> {
> enum rtx_code reversed;
> - rtx temp, elt;
> + rtx temp, elt, base, step;
> scalar_int_mode inner, int_mode, op_mode, op0_mode;
>
> switch (code)
> @@ -1185,6 +1185,22 @@ simplify_unary_operation_1 (enum rtx_cod
> return simplify_gen_unary (TRUNCATE, int_mode, temp, inner);
> }
> }
> +
> + if (vec_series_p (op, &base, &step))
> + {
> + /* Only create a new series if we can simplify both parts. In other
> + cases this isn't really a simplification, and it's not necessarily
> + a win to replace a vector operation with a scalar operation. */
> + scalar_mode inner_mode = GET_MODE_INNER (mode);
> + base = simplify_unary_operation (NEG, inner_mode, base, inner_mode);
> + if (base)
> + {
> + step = simplify_unary_operation (NEG, inner_mode,
> + step, inner_mode);
> + if (step)
> + return gen_vec_series (mode, base, step);
> + }
> + }
> break;
>
> case TRUNCATE:
> @@ -2153,6 +2169,46 @@ simplify_binary_operation (enum rtx_code
> return NULL_RTX;
> }
>
> +/* Subroutine of simplify_binary_operation_1 that looks for cases in
> + which OP0 and OP1 are both vector series or vector duplicates
> + (which are really just series with a step of 0). If so, try to
> + form a new series by applying CODE to the bases and to the steps.
> + Return null if no simplification is possible.
> +
> + MODE is the mode of the operation and is known to be a vector
> + integer mode. */
> +
> +static rtx
> +simplify_binary_operation_series (rtx_code code, machine_mode mode,
> + rtx op0, rtx op1)
> +{
> + rtx base0, step0;
> + if (vec_duplicate_p (op0, &base0))
> + step0 = const0_rtx;
> + else if (!vec_series_p (op0, &base0, &step0))
> + return NULL_RTX;
> +
> + rtx base1, step1;
> + if (vec_duplicate_p (op1, &base1))
> + step1 = const0_rtx;
> + else if (!vec_series_p (op1, &base1, &step1))
> + return NULL_RTX;
> +
> + /* Only create a new series if we can simplify both parts. In other
> + cases this isn't really a simplification, and it's not necessarily
> + a win to replace a vector operation with a scalar operation. */
> + scalar_mode inner_mode = GET_MODE_INNER (mode);
> + rtx new_base = simplify_binary_operation (code, inner_mode, base0, base1);
> + if (!new_base)
> + return NULL_RTX;
> +
> + rtx new_step = simplify_binary_operation (code, inner_mode, step0, step1);
> + if (!new_step)
> + return NULL_RTX;
> +
> + return gen_vec_series (mode, new_base, new_step);
> +}
> +
> /* Subroutine of simplify_binary_operation. Simplify a binary operation
> CODE with result mode MODE, operating on OP0 and OP1. If OP0 and/or
> OP1 are constant pool references, TRUEOP0 and TRUEOP1 represent the
> @@ -2333,6 +2389,14 @@ simplify_binary_operation_1 (enum rtx_co
> if (tem)
> return tem;
> }
> +
> + /* Handle vector series. */
> + if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT)
> + {
> + tem = simplify_binary_operation_series (code, mode, op0, op1);
> + if (tem)
> + return tem;
> + }
> break;
>
> case COMPARE:
> @@ -2544,6 +2608,14 @@ simplify_binary_operation_1 (enum rtx_co
> || plus_minus_operand_p (op1))
> && (tem = simplify_plus_minus (code, mode, op0, op1)) != 0)
> return tem;
> +
> + /* Handle vector series. */
> + if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT)
> + {
> + tem = simplify_binary_operation_series (code, mode, op0, op1);
> + if (tem)
> + return tem;
> + }
> break;
>
> case MULT:
> @@ -3495,6 +3567,11 @@ simplify_binary_operation_1 (enum rtx_co
> /* ??? There are simplifications that can be done. */
> return 0;
>
> + case VEC_SERIES:
> + if (op1 == CONST0_RTX (GET_MODE_INNER (mode)))
> + return gen_vec_duplicate (mode, op0);
> + return 0;
> +
> case VEC_SELECT:
> if (!VECTOR_MODE_P (mode))
> {
> @@ -6490,6 +6567,60 @@ test_vector_ops_duplicate (machine_mode
> }
> }
>
> +/* Test vector simplifications involving VEC_SERIES in which the
> + operands and result have vector mode MODE. SCALAR_REG is a pseudo
> + register that holds one element of MODE. */
> +
> +static void
> +test_vector_ops_series (machine_mode mode, rtx scalar_reg)
> +{
> + /* Test unary cases with VEC_SERIES arguments. */
> + scalar_mode inner_mode = GET_MODE_INNER (mode);
> + rtx duplicate = gen_rtx_VEC_DUPLICATE (mode, scalar_reg);
> + rtx neg_scalar_reg = gen_rtx_NEG (inner_mode, scalar_reg);
> + rtx series_0_r = gen_rtx_VEC_SERIES (mode, const0_rtx, scalar_reg);
> + rtx series_0_nr = gen_rtx_VEC_SERIES (mode, const0_rtx, neg_scalar_reg);
> + rtx series_nr_1 = gen_rtx_VEC_SERIES (mode, neg_scalar_reg, const1_rtx);
> + rtx series_r_m1 = gen_rtx_VEC_SERIES (mode, scalar_reg, constm1_rtx);
> + rtx series_r_r = gen_rtx_VEC_SERIES (mode, scalar_reg, scalar_reg);
> + rtx series_nr_nr = gen_rtx_VEC_SERIES (mode, neg_scalar_reg,
> + neg_scalar_reg);
> + ASSERT_RTX_EQ (series_0_r,
> + simplify_unary_operation (NEG, mode, series_0_nr, mode));
> + ASSERT_RTX_EQ (series_r_m1,
> + simplify_unary_operation (NEG, mode, series_nr_1, mode));
> + ASSERT_RTX_EQ (series_r_r,
> + simplify_unary_operation (NEG, mode, series_nr_nr, mode));
> +
> + /* Test that a VEC_SERIES with a zero step is simplified away. */
> + ASSERT_RTX_EQ (duplicate,
> + simplify_binary_operation (VEC_SERIES, mode,
> + scalar_reg, const0_rtx));
> +
> + /* Test PLUS and MINUS with VEC_SERIES. */
> + rtx series_0_1 = gen_const_vec_series (mode, const0_rtx, const1_rtx);
> + rtx series_0_m1 = gen_const_vec_series (mode, const0_rtx, constm1_rtx);
> + rtx series_r_1 = gen_rtx_VEC_SERIES (mode, scalar_reg, const1_rtx);
> + ASSERT_RTX_EQ (series_r_r,
> + simplify_binary_operation (PLUS, mode, series_0_r,
> + duplicate));
> + ASSERT_RTX_EQ (series_r_1,
> + simplify_binary_operation (PLUS, mode, duplicate,
> + series_0_1));
> + ASSERT_RTX_EQ (series_r_m1,
> + simplify_binary_operation (PLUS, mode, duplicate,
> + series_0_m1));
> + ASSERT_RTX_EQ (series_0_r,
> + simplify_binary_operation (MINUS, mode, series_r_r,
> + duplicate));
> + ASSERT_RTX_EQ (series_r_m1,
> + simplify_binary_operation (MINUS, mode, duplicate,
> + series_0_1));
> + ASSERT_RTX_EQ (series_r_1,
> + simplify_binary_operation (MINUS, mode, duplicate,
> + series_0_m1));
> +}
> +
> /* Verify some simplifications involving vectors. */
>
> static void
> @@ -6502,6 +6633,9 @@ test_vector_ops ()
> {
> rtx scalar_reg = make_test_reg (GET_MODE_INNER (mode));
> test_vector_ops_duplicate (mode, scalar_reg);
> + if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT
> + && GET_MODE_NUNITS (mode) > 2)
> + test_vector_ops_series (mode, scalar_reg);
> }
> }
> }
> Index: gcc/config/powerpcspe/altivec.md
> ===================================================================
> --- gcc/config/powerpcspe/altivec.md 2017-10-23 11:41:32.366050264 +0100
> +++ gcc/config/powerpcspe/altivec.md 2017-10-23 11:41:41.546050496 +0100
> @@ -2456,13 +2456,10 @@ (define_expand "altivec_lvsl"
> emit_insn (gen_altivec_lvsl_direct (operands[0], operands[1]));
> else
> {
> - int i;
> - rtx mask, perm[16], constv, vperm;
> + rtx mask, constv, vperm;
> mask = gen_reg_rtx (V16QImode);
> emit_insn (gen_altivec_lvsl_direct (mask, operands[1]));
> - for (i = 0; i < 16; ++i)
> - perm[i] = GEN_INT (i);
> - constv = gen_rtx_CONST_VECTOR (V16QImode, gen_rtvec_v (16, perm));
> + constv = gen_const_vec_series (V16QImode, const0_rtx, const1_rtx);
> constv = force_reg (V16QImode, constv);
> vperm = gen_rtx_UNSPEC (V16QImode, gen_rtvec (3, mask, mask, constv),
> UNSPEC_VPERM);
> @@ -2488,13 +2485,10 @@ (define_expand "altivec_lvsr"
> emit_insn (gen_altivec_lvsr_direct (operands[0], operands[1]));
> else
> {
> - int i;
> - rtx mask, perm[16], constv, vperm;
> + rtx mask, constv, vperm;
> mask = gen_reg_rtx (V16QImode);
> emit_insn (gen_altivec_lvsr_direct (mask, operands[1]));
> - for (i = 0; i < 16; ++i)
> - perm[i] = GEN_INT (i);
> - constv = gen_rtx_CONST_VECTOR (V16QImode, gen_rtvec_v (16, perm));
> + constv = gen_const_vec_series (V16QImode, const0_rtx, const1_rtx);
> constv = force_reg (V16QImode, constv);
> vperm = gen_rtx_UNSPEC (V16QImode, gen_rtvec (3, mask, mask, constv),
> UNSPEC_VPERM);
> Index: gcc/config/rs6000/altivec.md
> ===================================================================
> --- gcc/config/rs6000/altivec.md 2017-10-23 11:41:32.366050264 +0100
> +++ gcc/config/rs6000/altivec.md 2017-10-23 11:41:41.547050496 +0100
> @@ -2573,13 +2573,10 @@ (define_expand "altivec_lvsl"
> emit_insn (gen_altivec_lvsl_direct (operands[0], operands[1]));
> else
> {
> - int i;
> - rtx mask, perm[16], constv, vperm;
> + rtx mask, constv, vperm;
> mask = gen_reg_rtx (V16QImode);
> emit_insn (gen_altivec_lvsl_direct (mask, operands[1]));
> - for (i = 0; i < 16; ++i)
> - perm[i] = GEN_INT (i);
> - constv = gen_rtx_CONST_VECTOR (V16QImode, gen_rtvec_v (16, perm));
> + constv = gen_const_vec_series (V16QImode, const0_rtx, const1_rtx);
> constv = force_reg (V16QImode, constv);
> vperm = gen_rtx_UNSPEC (V16QImode, gen_rtvec (3, mask, mask, constv),
> UNSPEC_VPERM);
> @@ -2614,13 +2611,10 @@ (define_expand "altivec_lvsr"
> emit_insn (gen_altivec_lvsr_direct (operands[0], operands[1]));
> else
> {
> - int i;
> - rtx mask, perm[16], constv, vperm;
> + rtx mask, constv, vperm;
> mask = gen_reg_rtx (V16QImode);
> emit_insn (gen_altivec_lvsr_direct (mask, operands[1]));
> - for (i = 0; i < 16; ++i)
> - perm[i] = GEN_INT (i);
> - constv = gen_rtx_CONST_VECTOR (V16QImode, gen_rtvec_v (16, perm));
> + constv = gen_const_vec_series (V16QImode, const0_rtx, const1_rtx);
> constv = force_reg (V16QImode, constv);
> vperm = gen_rtx_UNSPEC (V16QImode, gen_rtvec (3, mask, mask, constv),
> UNSPEC_VPERM);
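[Editorial note: the detection logic of const_vec_series_p_1 in the patch above can be paraphrased on plain integers as follows. This is a sketch under my own names; the real code works on rtx elements via simplify_binary_operation and rtx_equal_p, which this elides.]

```c
#include <assert.h>
#include <stdbool.h>

/* Return true if ELTS[0..NELTS-1] forms a linear series
   { B, B + S, B + 2*S, ... } with a nonzero step S, storing B and S
   in *BASE_OUT and *STEP_OUT on success.  */
static bool
int_series_p (const long *elts, int nelts, long *base_out, long *step_out)
{
  if (nelts < 2)
    return false;

  long step = elts[1] - elts[0];
  if (step == 0)
    return false;

  /* Every consecutive difference must match the first one.  */
  for (int i = 2; i < nelts; ++i)
    if (elts[i] - elts[i - 1] != step)
      return false;

  *base_out = elts[0];
  *step_out = step;
  return true;
}
```

A zero step is rejected here, as in the patch, because that case is a vec_duplicate rather than a vec_series.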
* Re: [05/nn] Add VEC_DUPLICATE_{CST,EXPR} and associated optab
2017-10-23 11:21 ` [05/nn] Add VEC_DUPLICATE_{CST,EXPR} and associated optab Richard Sandiford
@ 2017-10-26 11:53 ` Richard Biener
2017-11-06 15:09 ` Richard Sandiford
2017-12-15 0:29 ` Richard Sandiford
1 sibling, 1 reply; 90+ messages in thread
From: Richard Biener @ 2017-10-26 11:53 UTC (permalink / raw)
To: GCC Patches, Richard Sandiford
On Mon, Oct 23, 2017 at 1:20 PM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> SVE needs a way of broadcasting a scalar to a variable-length vector.
> This patch adds VEC_DUPLICATE_CST for when VECTOR_CST would be used for
> fixed-length vectors and VEC_DUPLICATE_EXPR for when CONSTRUCTOR would
> be used for fixed-length vectors. VEC_DUPLICATE_EXPR is the tree
> equivalent of the existing rtl code VEC_DUPLICATE.
>
> Originally we had a single VEC_DUPLICATE_EXPR and used TREE_CONSTANT
> to mark constant nodes, but in response to last year's RFC, Richard B.
> suggested it would be better to have separate codes for the constant
> and non-constant cases. This allows VEC_DUPLICATE_EXPR to be treated
> as a normal unary operation and avoids the previous need for treating
> it as a GIMPLE_SINGLE_RHS.
>
> It might make sense to use VEC_DUPLICATE_CST for all duplicated
> vector constants, since it's a bit more compact than VECTOR_CST
> in that case, and is potentially more efficient to process.
> However, the nice thing about keeping it restricted to variable-length
> vectors is that there is then no need to handle combinations of
> VECTOR_CST and VEC_DUPLICATE_CST; a vector type will always use
> VECTOR_CST or never use it.
>
> The patch also adds a vec_duplicate_optab to go with VEC_DUPLICATE_EXPR.
Index: gcc/tree-vect-generic.c
===================================================================
--- gcc/tree-vect-generic.c 2017-10-23 11:38:53.934094740 +0100
+++ gcc/tree-vect-generic.c 2017-10-23 11:41:51.773953100 +0100
@@ -1419,6 +1419,7 @@ lower_vec_perm (gimple_stmt_iterator *gs
ssa_uniform_vector_p (tree op)
{
if (TREE_CODE (op) == VECTOR_CST
+ || TREE_CODE (op) == VEC_DUPLICATE_CST
|| TREE_CODE (op) == CONSTRUCTOR)
return uniform_vector_p (op);
VEC_DUPLICATE_EXPR handling? Looks like for VEC_DUPLICATE_CST
it could directly return true.
I didn't see uniform_vector_p being updated?
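[Editorial note: for context, the uniformity check under discussion amounts to something like the following sketch on plain integers — this paraphrases the contract of uniform_vector_p, not its actual GCC implementation.]

```c
#include <assert.h>
#include <stddef.h>

/* Return a pointer to the repeated element if all NELTS elements of
   ELTS compare equal, else NULL.  A VEC_DUPLICATE_CST would satisfy
   this trivially, since it stores only the one repeated element.  */
static const long *
uniform_elt (const long *elts, int nelts)
{
  if (nelts == 0)
    return NULL;
  for (int i = 1; i < nelts; ++i)
    if (elts[i] != elts[0])
      return NULL;
  return &elts[0];
}
```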
Can you add verification to either verify_expr or build_vec_duplicate_cst
that the type is one of variable size? And amend the tree.def docs
accordingly, because otherwise we miss a lot of cases in constant
folding (mixing VEC_DUPLICATE_CST and VECTOR_CST).
Otherwise looks ok to me.
Thanks,
Richard.
>
> 2017-10-23 Richard Sandiford <richard.sandiford@linaro.org>
> Alan Hayward <alan.hayward@arm.com>
> David Sherwood <david.sherwood@arm.com>
>
> gcc/
> * doc/generic.texi (VEC_DUPLICATE_CST, VEC_DUPLICATE_EXPR): Document.
> (VEC_COND_EXPR): Add missing @tindex.
> * doc/md.texi (vec_duplicate@var{m}): Document.
> * tree.def (VEC_DUPLICATE_CST, VEC_DUPLICATE_EXPR): New tree codes.
> * tree-core.h (tree_base): Document that u.nelts and TREE_OVERFLOW
> are used for VEC_DUPLICATE_CST as well.
> (tree_vector): Access base.n.nelts directly.
> * tree.h (TREE_OVERFLOW): Add VEC_DUPLICATE_CST to the list of
> valid codes.
> (VEC_DUPLICATE_CST_ELT): New macro.
> (build_vec_duplicate_cst): Declare.
> * tree.c (tree_node_structure_for_code, tree_code_size, tree_size)
> (integer_zerop, integer_onep, integer_all_onesp, integer_truep)
> (real_zerop, real_onep, real_minus_onep, add_expr, initializer_zerop)
> (walk_tree_1, drop_tree_overflow): Handle VEC_DUPLICATE_CST.
> (build_vec_duplicate_cst): New function.
> (uniform_vector_p): Handle the new codes.
> (test_vec_duplicate_predicates_int): New function.
> (test_vec_duplicate_predicates_float): Likewise.
> (test_vec_duplicate_predicates): Likewise.
> (tree_c_tests): Call test_vec_duplicate_predicates.
> * cfgexpand.c (expand_debug_expr): Handle the new codes.
> * tree-pretty-print.c (dump_generic_node): Likewise.
> * dwarf2out.c (rtl_for_decl_init): Handle VEC_DUPLICATE_CST.
> * gimple-expr.h (is_gimple_constant): Likewise.
> * gimplify.c (gimplify_expr): Likewise.
> * graphite-isl-ast-to-gimple.c
> (translate_isl_ast_to_gimple::is_constant): Likewise.
> * graphite-scop-detection.c (scan_tree_for_params): Likewise.
> * ipa-icf-gimple.c (func_checker::compare_cst_or_decl): Likewise.
> (func_checker::compare_operand): Likewise.
> * ipa-icf.c (sem_item::add_expr, sem_variable::equals): Likewise.
> * match.pd (negate_expr_p): Likewise.
> * print-tree.c (print_node): Likewise.
> * tree-chkp.c (chkp_find_bounds_1): Likewise.
> * tree-loop-distribution.c (const_with_all_bytes_same): Likewise.
> * tree-ssa-loop.c (for_each_index): Likewise.
> * tree-ssa-pre.c (create_component_ref_by_pieces_1): Likewise.
> * tree-ssa-sccvn.c (copy_reference_ops_from_ref): Likewise.
> (ao_ref_init_from_vn_reference): Likewise.
> * tree-vect-generic.c (ssa_uniform_vector_p): Likewise.
> * varasm.c (const_hash_1, compare_constant): Likewise.
> * fold-const.c (negate_expr_p, fold_negate_expr_1, const_binop)
> (fold_convert_const, operand_equal_p, fold_view_convert_expr)
> (exact_inverse, fold_checksum_tree): Likewise.
> (const_unop): Likewise. Fold VEC_DUPLICATE_EXPRs of a constant.
> (test_vec_duplicate_folding): New function.
> (fold_const_c_tests): Call it.
> * optabs.def (vec_duplicate_optab): New optab.
> * optabs-tree.c (optab_for_tree_code): Handle VEC_DUPLICATE_EXPR.
> * optabs.h (expand_vector_broadcast): Declare.
> * optabs.c (expand_vector_broadcast): Make non-static. Try using
> vec_duplicate_optab.
> * expr.c (store_constructor): Try using vec_duplicate_optab for
> uniform vectors.
> (const_vector_element): New function, split out from...
> (const_vector_from_tree): ...here.
> (expand_expr_real_2): Handle VEC_DUPLICATE_EXPR.
> (expand_expr_real_1): Handle VEC_DUPLICATE_CST.
> * internal-fn.c (expand_vector_ubsan_overflow): Use CONSTANT_P
> instead of checking for VECTOR_CST.
> * tree-cfg.c (verify_gimple_assign_unary): Handle VEC_DUPLICATE_EXPR.
> (verify_gimple_assign_single): Handle VEC_DUPLICATE_CST.
> * tree-inline.c (estimate_operator_cost): Handle VEC_DUPLICATE_EXPR.
>
> Index: gcc/doc/generic.texi
> ===================================================================
> --- gcc/doc/generic.texi 2017-10-23 11:38:53.934094740 +0100
> +++ gcc/doc/generic.texi 2017-10-23 11:41:51.760448406 +0100
> @@ -1036,6 +1036,7 @@ As this example indicates, the operands
> @tindex FIXED_CST
> @tindex COMPLEX_CST
> @tindex VECTOR_CST
> +@tindex VEC_DUPLICATE_CST
> @tindex STRING_CST
> @findex TREE_STRING_LENGTH
> @findex TREE_STRING_POINTER
> @@ -1089,6 +1090,14 @@ constant nodes. Each individual constan
> double constant node. The first operand is a @code{TREE_LIST} of the
> constant nodes and is accessed through @code{TREE_VECTOR_CST_ELTS}.
>
> +@item VEC_DUPLICATE_CST
> +These nodes represent a vector constant in which every element has the
> +same scalar value. At present only variable-length vectors use
> +@code{VEC_DUPLICATE_CST}; constant-length vectors use @code{VECTOR_CST}
> +instead. The scalar element value is given by
> +@code{VEC_DUPLICATE_CST_ELT} and has the same restrictions as the
> +element of a @code{VECTOR_CST}.
> +
> @item STRING_CST
> These nodes represent string-constants. The @code{TREE_STRING_LENGTH}
> returns the length of the string, as an @code{int}. The
> @@ -1692,6 +1701,7 @@ a value from @code{enum annot_expr_kind}
>
> @node Vectors
> @subsection Vectors
> +@tindex VEC_DUPLICATE_EXPR
> @tindex VEC_LSHIFT_EXPR
> @tindex VEC_RSHIFT_EXPR
> @tindex VEC_WIDEN_MULT_HI_EXPR
> @@ -1703,9 +1713,14 @@ a value from @code{enum annot_expr_kind}
> @tindex VEC_PACK_TRUNC_EXPR
> @tindex VEC_PACK_SAT_EXPR
> @tindex VEC_PACK_FIX_TRUNC_EXPR
> +@tindex VEC_COND_EXPR
> @tindex SAD_EXPR
>
> @table @code
> +@item VEC_DUPLICATE_EXPR
> +This node has a single operand and represents a vector in which every
> +element is equal to that operand.
> +
> @item VEC_LSHIFT_EXPR
> @itemx VEC_RSHIFT_EXPR
> These nodes represent whole vector left and right shifts, respectively.
> Index: gcc/doc/md.texi
> ===================================================================
> --- gcc/doc/md.texi 2017-10-23 11:41:22.189466342 +0100
> +++ gcc/doc/md.texi 2017-10-23 11:41:51.761413027 +0100
> @@ -4888,6 +4888,17 @@ and operand 1 is parallel containing val
> the vector mode @var{m}, or a vector mode with the same element mode and
> smaller number of elements.
>
> +@cindex @code{vec_duplicate@var{m}} instruction pattern
> +@item @samp{vec_duplicate@var{m}}
> +Initialize vector output operand 0 so that each element has the value given
> +by scalar input operand 1. The vector has mode @var{m} and the scalar has
> +the mode appropriate for one element of @var{m}.
> +
> +This pattern only handles duplicates of non-constant inputs. Constant
> +vectors go through the @code{mov@var{m}} pattern instead.
> +
> +This pattern is not allowed to @code{FAIL}.
> +
> @cindex @code{vec_cmp@var{m}@var{n}} instruction pattern
> @item @samp{vec_cmp@var{m}@var{n}}
> Output a vector comparison. Operand 0 of mode @var{n} is the destination for
> Index: gcc/tree.def
> ===================================================================
> --- gcc/tree.def 2017-10-23 11:38:53.934094740 +0100
> +++ gcc/tree.def 2017-10-23 11:41:51.774917721 +0100
> @@ -304,6 +304,10 @@ DEFTREECODE (COMPLEX_CST, "complex_cst",
> /* Contents are in VECTOR_CST_ELTS field. */
> DEFTREECODE (VECTOR_CST, "vector_cst", tcc_constant, 0)
>
> +/* Represents a vector constant in which every element is equal to
> + VEC_DUPLICATE_CST_ELT. */
> +DEFTREECODE (VEC_DUPLICATE_CST, "vec_duplicate_cst", tcc_constant, 0)
> +
> /* Contents are TREE_STRING_LENGTH and the actual contents of the string. */
> DEFTREECODE (STRING_CST, "string_cst", tcc_constant, 0)
>
> @@ -534,6 +538,9 @@ DEFTREECODE (TARGET_EXPR, "target_expr",
> 1 and 2 are NULL. The operands are then taken from the cfg edges. */
> DEFTREECODE (COND_EXPR, "cond_expr", tcc_expression, 3)
>
> +/* Represents a vector in which every element is equal to operand 0. */
> +DEFTREECODE (VEC_DUPLICATE_EXPR, "vec_duplicate_expr", tcc_unary, 1)
> +
> /* Vector conditional expression. It is like COND_EXPR, but with
> vector operands.
>
> Index: gcc/tree-core.h
> ===================================================================
> --- gcc/tree-core.h 2017-10-23 11:41:25.862065318 +0100
> +++ gcc/tree-core.h 2017-10-23 11:41:51.771059237 +0100
> @@ -975,7 +975,8 @@ struct GTY(()) tree_base {
> /* VEC length. This field is only used with TREE_VEC. */
> int length;
>
> - /* Number of elements. This field is only used with VECTOR_CST. */
> + /* Number of elements. This field is only used with VECTOR_CST
> + and VEC_DUPLICATE_CST. It is always 1 for VEC_DUPLICATE_CST. */
> unsigned int nelts;
>
> /* SSA version number. This field is only used with SSA_NAME. */
> @@ -1065,7 +1066,7 @@ struct GTY(()) tree_base {
> public_flag:
>
> TREE_OVERFLOW in
> - INTEGER_CST, REAL_CST, COMPLEX_CST, VECTOR_CST
> + INTEGER_CST, REAL_CST, COMPLEX_CST, VECTOR_CST, VEC_DUPLICATE_CST
>
> TREE_PUBLIC in
> VAR_DECL, FUNCTION_DECL
> @@ -1332,7 +1333,7 @@ struct GTY(()) tree_complex {
>
> struct GTY(()) tree_vector {
> struct tree_typed typed;
> - tree GTY ((length ("VECTOR_CST_NELTS ((tree) &%h)"))) elts[1];
> + tree GTY ((length ("((tree) &%h)->base.u.nelts"))) elts[1];
> };
>
> struct GTY(()) tree_identifier {
> Index: gcc/tree.h
> ===================================================================
> --- gcc/tree.h 2017-10-23 11:41:23.517482774 +0100
> +++ gcc/tree.h 2017-10-23 11:41:51.775882341 +0100
> @@ -730,8 +730,8 @@ #define TREE_SYMBOL_REFERENCED(NODE) \
> #define TYPE_REF_CAN_ALIAS_ALL(NODE) \
> (PTR_OR_REF_CHECK (NODE)->base.static_flag)
>
> -/* In an INTEGER_CST, REAL_CST, COMPLEX_CST, or VECTOR_CST, this means
> - there was an overflow in folding. */
> +/* In an INTEGER_CST, REAL_CST, COMPLEX_CST, VECTOR_CST or VEC_DUPLICATE_CST,
> + this means there was an overflow in folding. */
>
> #define TREE_OVERFLOW(NODE) (CST_CHECK (NODE)->base.public_flag)
>
> @@ -1030,6 +1030,10 @@ #define VECTOR_CST_NELTS(NODE) (VECTOR_C
> #define VECTOR_CST_ELTS(NODE) (VECTOR_CST_CHECK (NODE)->vector.elts)
> #define VECTOR_CST_ELT(NODE,IDX) (VECTOR_CST_CHECK (NODE)->vector.elts[IDX])
>
> +/* In a VEC_DUPLICATE_CST node. */
> +#define VEC_DUPLICATE_CST_ELT(NODE) \
> + (VEC_DUPLICATE_CST_CHECK (NODE)->vector.elts[0])
> +
> /* Define fields and accessors for some special-purpose tree nodes. */
>
> #define IDENTIFIER_LENGTH(NODE) \
> @@ -4025,6 +4029,7 @@ extern tree build_int_cst (tree, HOST_WI
> extern tree build_int_cstu (tree type, unsigned HOST_WIDE_INT cst);
> extern tree build_int_cst_type (tree, HOST_WIDE_INT);
> extern tree make_vector (unsigned CXX_MEM_STAT_INFO);
> +extern tree build_vec_duplicate_cst (tree, tree CXX_MEM_STAT_INFO);
> extern tree build_vector (tree, vec<tree> CXX_MEM_STAT_INFO);
> extern tree build_vector_from_ctor (tree, vec<constructor_elt, va_gc> *);
> extern tree build_vector_from_val (tree, tree);
> Index: gcc/tree.c
> ===================================================================
> --- gcc/tree.c 2017-10-23 11:41:23.515548300 +0100
> +++ gcc/tree.c 2017-10-23 11:41:51.774917721 +0100
> @@ -464,6 +464,7 @@ tree_node_structure_for_code (enum tree_
> case FIXED_CST: return TS_FIXED_CST;
> case COMPLEX_CST: return TS_COMPLEX;
> case VECTOR_CST: return TS_VECTOR;
> + case VEC_DUPLICATE_CST: return TS_VECTOR;
> case STRING_CST: return TS_STRING;
> /* tcc_exceptional cases. */
> case ERROR_MARK: return TS_COMMON;
> @@ -816,6 +817,7 @@ tree_code_size (enum tree_code code)
> case FIXED_CST: return sizeof (struct tree_fixed_cst);
> case COMPLEX_CST: return sizeof (struct tree_complex);
> case VECTOR_CST: return sizeof (struct tree_vector);
> + case VEC_DUPLICATE_CST: return sizeof (struct tree_vector);
> case STRING_CST: gcc_unreachable ();
> default:
> return lang_hooks.tree_size (code);
> @@ -875,6 +877,9 @@ tree_size (const_tree node)
> return (sizeof (struct tree_vector)
> + (VECTOR_CST_NELTS (node) - 1) * sizeof (tree));
>
> + case VEC_DUPLICATE_CST:
> + return sizeof (struct tree_vector);
> +
> case STRING_CST:
> return TREE_STRING_LENGTH (node) + offsetof (struct tree_string, str) + 1;
>
> @@ -1682,6 +1687,30 @@ cst_and_fits_in_hwi (const_tree x)
> && (tree_fits_shwi_p (x) || tree_fits_uhwi_p (x)));
> }
>
> +/* Build a new VEC_DUPLICATE_CST with type TYPE and operand EXP.
> +
> + Note that this function is only suitable for callers that specifically
> + need a VEC_DUPLICATE_CST node. Use build_vector_from_val to duplicate
> + a general scalar into a general vector type. */
> +
> +tree
> +build_vec_duplicate_cst (tree type, tree exp MEM_STAT_DECL)
> +{
> + int length = sizeof (struct tree_vector);
> +
> + record_node_allocation_statistics (VEC_DUPLICATE_CST, length);
> +
> + tree t = ggc_alloc_cleared_tree_node_stat (length PASS_MEM_STAT);
> +
> + TREE_SET_CODE (t, VEC_DUPLICATE_CST);
> + TREE_TYPE (t) = type;
> + t->base.u.nelts = 1;
> + VEC_DUPLICATE_CST_ELT (t) = exp;
> + TREE_CONSTANT (t) = 1;
> +
> + return t;
> +}
> +
> /* Build a newly constructed VECTOR_CST node of length LEN. */
>
> tree
> @@ -2343,6 +2372,8 @@ integer_zerop (const_tree expr)
> return false;
> return true;
> }
> + case VEC_DUPLICATE_CST:
> + return integer_zerop (VEC_DUPLICATE_CST_ELT (expr));
> default:
> return false;
> }
> @@ -2369,6 +2400,8 @@ integer_onep (const_tree expr)
> return false;
> return true;
> }
> + case VEC_DUPLICATE_CST:
> + return integer_onep (VEC_DUPLICATE_CST_ELT (expr));
> default:
> return false;
> }
> @@ -2407,6 +2440,9 @@ integer_all_onesp (const_tree expr)
> return 1;
> }
>
> + else if (TREE_CODE (expr) == VEC_DUPLICATE_CST)
> + return integer_all_onesp (VEC_DUPLICATE_CST_ELT (expr));
> +
> else if (TREE_CODE (expr) != INTEGER_CST)
> return 0;
>
> @@ -2463,7 +2499,7 @@ integer_nonzerop (const_tree expr)
> int
> integer_truep (const_tree expr)
> {
> - if (TREE_CODE (expr) == VECTOR_CST)
> + if (TREE_CODE (expr) == VECTOR_CST || TREE_CODE (expr) == VEC_DUPLICATE_CST)
> return integer_all_onesp (expr);
> return integer_onep (expr);
> }
> @@ -2634,6 +2670,8 @@ real_zerop (const_tree expr)
> return false;
> return true;
> }
> + case VEC_DUPLICATE_CST:
> + return real_zerop (VEC_DUPLICATE_CST_ELT (expr));
> default:
> return false;
> }
> @@ -2662,6 +2700,8 @@ real_onep (const_tree expr)
> return false;
> return true;
> }
> + case VEC_DUPLICATE_CST:
> + return real_onep (VEC_DUPLICATE_CST_ELT (expr));
> default:
> return false;
> }
> @@ -2689,6 +2729,8 @@ real_minus_onep (const_tree expr)
> return false;
> return true;
> }
> + case VEC_DUPLICATE_CST:
> + return real_minus_onep (VEC_DUPLICATE_CST_ELT (expr));
> default:
> return false;
> }
> @@ -7091,6 +7133,9 @@ add_expr (const_tree t, inchash::hash &h
> inchash::add_expr (VECTOR_CST_ELT (t, i), hstate, flags);
> return;
> }
> + case VEC_DUPLICATE_CST:
> + inchash::add_expr (VEC_DUPLICATE_CST_ELT (t), hstate, flags);
> + return;
> case SSA_NAME:
> /* We can just compare by pointer. */
> hstate.add_wide_int (SSA_NAME_VERSION (t));
> @@ -10345,6 +10390,9 @@ initializer_zerop (const_tree init)
> return true;
> }
>
> + case VEC_DUPLICATE_CST:
> + return initializer_zerop (VEC_DUPLICATE_CST_ELT (init));
> +
> case CONSTRUCTOR:
> {
> unsigned HOST_WIDE_INT idx;
> @@ -10390,7 +10438,13 @@ uniform_vector_p (const_tree vec)
>
> gcc_assert (VECTOR_TYPE_P (TREE_TYPE (vec)));
>
> - if (TREE_CODE (vec) == VECTOR_CST)
> + if (TREE_CODE (vec) == VEC_DUPLICATE_CST)
> + return VEC_DUPLICATE_CST_ELT (vec);
> +
> + else if (TREE_CODE (vec) == VEC_DUPLICATE_EXPR)
> + return TREE_OPERAND (vec, 0);
> +
> + else if (TREE_CODE (vec) == VECTOR_CST)
> {
> first = VECTOR_CST_ELT (vec, 0);
> for (i = 1; i < VECTOR_CST_NELTS (vec); ++i)
> @@ -11095,6 +11149,7 @@ #define WALK_SUBTREE_TAIL(NODE) \
> case REAL_CST:
> case FIXED_CST:
> case VECTOR_CST:
> + case VEC_DUPLICATE_CST:
> case STRING_CST:
> case BLOCK:
> case PLACEHOLDER_EXPR:
> @@ -12381,6 +12436,12 @@ drop_tree_overflow (tree t)
> elt = drop_tree_overflow (elt);
> }
> }
> + if (TREE_CODE (t) == VEC_DUPLICATE_CST)
> + {
> + tree *elt = &VEC_DUPLICATE_CST_ELT (t);
> + if (TREE_OVERFLOW (*elt))
> + *elt = drop_tree_overflow (*elt);
> + }
> return t;
> }
>
> @@ -13798,6 +13859,92 @@ test_integer_constants ()
> ASSERT_EQ (type, TREE_TYPE (zero));
> }
>
> +/* Verify predicate handling of VEC_DUPLICATE_CSTs and VEC_DUPLICATE_EXPRs
> + for integral type TYPE. */
> +
> +static void
> +test_vec_duplicate_predicates_int (tree type)
> +{
> + tree vec_type = build_vector_type (type, 4);
> +
> + tree zero = build_zero_cst (type);
> + tree vec_zero = build_vec_duplicate_cst (vec_type, zero);
> + ASSERT_TRUE (integer_zerop (vec_zero));
> + ASSERT_FALSE (integer_onep (vec_zero));
> + ASSERT_FALSE (integer_minus_onep (vec_zero));
> + ASSERT_FALSE (integer_all_onesp (vec_zero));
> + ASSERT_FALSE (integer_truep (vec_zero));
> + ASSERT_TRUE (initializer_zerop (vec_zero));
> +
> + tree one = build_one_cst (type);
> + tree vec_one = build_vec_duplicate_cst (vec_type, one);
> + ASSERT_FALSE (integer_zerop (vec_one));
> + ASSERT_TRUE (integer_onep (vec_one));
> + ASSERT_FALSE (integer_minus_onep (vec_one));
> + ASSERT_FALSE (integer_all_onesp (vec_one));
> + ASSERT_FALSE (integer_truep (vec_one));
> + ASSERT_FALSE (initializer_zerop (vec_one));
> +
> + tree minus_one = build_minus_one_cst (type);
> + tree vec_minus_one = build_vec_duplicate_cst (vec_type, minus_one);
> + ASSERT_FALSE (integer_zerop (vec_minus_one));
> + ASSERT_FALSE (integer_onep (vec_minus_one));
> + ASSERT_TRUE (integer_minus_onep (vec_minus_one));
> + ASSERT_TRUE (integer_all_onesp (vec_minus_one));
> + ASSERT_TRUE (integer_truep (vec_minus_one));
> + ASSERT_FALSE (initializer_zerop (vec_minus_one));
> +
> + tree x = create_tmp_var_raw (type, "x");
> + tree vec_x = build1 (VEC_DUPLICATE_EXPR, vec_type, x);
> + ASSERT_EQ (uniform_vector_p (vec_zero), zero);
> + ASSERT_EQ (uniform_vector_p (vec_one), one);
> + ASSERT_EQ (uniform_vector_p (vec_minus_one), minus_one);
> + ASSERT_EQ (uniform_vector_p (vec_x), x);
> +}
> +
> +/* Verify predicate handling of VEC_DUPLICATE_CSTs for floating-point
> + type TYPE. */
> +
> +static void
> +test_vec_duplicate_predicates_float (tree type)
> +{
> + tree vec_type = build_vector_type (type, 4);
> +
> + tree zero = build_zero_cst (type);
> + tree vec_zero = build_vec_duplicate_cst (vec_type, zero);
> + ASSERT_TRUE (real_zerop (vec_zero));
> + ASSERT_FALSE (real_onep (vec_zero));
> + ASSERT_FALSE (real_minus_onep (vec_zero));
> + ASSERT_TRUE (initializer_zerop (vec_zero));
> +
> + tree one = build_one_cst (type);
> + tree vec_one = build_vec_duplicate_cst (vec_type, one);
> + ASSERT_FALSE (real_zerop (vec_one));
> + ASSERT_TRUE (real_onep (vec_one));
> + ASSERT_FALSE (real_minus_onep (vec_one));
> + ASSERT_FALSE (initializer_zerop (vec_one));
> +
> + tree minus_one = build_minus_one_cst (type);
> + tree vec_minus_one = build_vec_duplicate_cst (vec_type, minus_one);
> + ASSERT_FALSE (real_zerop (vec_minus_one));
> + ASSERT_FALSE (real_onep (vec_minus_one));
> + ASSERT_TRUE (real_minus_onep (vec_minus_one));
> + ASSERT_FALSE (initializer_zerop (vec_minus_one));
> +
> + ASSERT_EQ (uniform_vector_p (vec_zero), zero);
> + ASSERT_EQ (uniform_vector_p (vec_one), one);
> + ASSERT_EQ (uniform_vector_p (vec_minus_one), minus_one);
> +}
> +
> +/* Verify predicate handling of VEC_DUPLICATE_CSTs and VEC_DUPLICATE_EXPRs. */
> +
> +static void
> +test_vec_duplicate_predicates ()
> +{
> + test_vec_duplicate_predicates_int (integer_type_node);
> + test_vec_duplicate_predicates_float (float_type_node);
> +}
> +
> /* Verify identifiers. */
>
> static void
> @@ -13826,6 +13973,7 @@ test_labels ()
> tree_c_tests ()
> {
> test_integer_constants ();
> + test_vec_duplicate_predicates ();
> test_identifiers ();
> test_labels ();
> }
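[Editorial note: the uniform_vector_p change above returns the duplicated element directly for VEC_DUPLICATE_CST/VEC_DUPLICATE_EXPR and only falls back to scanning elements for a general VECTOR_CST. The dispatch can be sketched as standalone C — a model with a sentinel return value, not the GCC tree API, which returns NULL_TREE instead:

```c
#include <assert.h>
#include <stddef.h>

/* Sketch of the uniform_vector_p logic: a duplicated vector reports
   its element directly; a general element list must be scanned.
   Returns the uniform value, or -1 as a "not uniform" sentinel.  */
static int uniform_value (const int *elts, size_t n, int is_duplicate)
{
  if (is_duplicate)
    return elts[0];               /* VEC_DUPLICATE_*: uniform by construction */
  for (size_t i = 1; i < n; i++)  /* VECTOR_CST: compare against element 0 */
    if (elts[i] != elts[0])
      return -1;
  return elts[0];
}
```

The point of the new codes is the first branch: uniformity is known without inspecting (or even storing) every element, which matters once the element count is not a compile-time constant.]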
> Index: gcc/cfgexpand.c
> ===================================================================
> --- gcc/cfgexpand.c 2017-10-23 11:41:23.137358624 +0100
> +++ gcc/cfgexpand.c 2017-10-23 11:41:51.760448406 +0100
> @@ -5049,6 +5049,8 @@ expand_debug_expr (tree exp)
> case VEC_WIDEN_LSHIFT_HI_EXPR:
> case VEC_WIDEN_LSHIFT_LO_EXPR:
> case VEC_PERM_EXPR:
> + case VEC_DUPLICATE_CST:
> + case VEC_DUPLICATE_EXPR:
> return NULL;
>
> /* Misc codes. */
> Index: gcc/tree-pretty-print.c
> ===================================================================
> --- gcc/tree-pretty-print.c 2017-10-23 11:38:53.934094740 +0100
> +++ gcc/tree-pretty-print.c 2017-10-23 11:41:51.772023858 +0100
> @@ -1802,6 +1802,12 @@ dump_generic_node (pretty_printer *pp, t
> }
> break;
>
> + case VEC_DUPLICATE_CST:
> + pp_string (pp, "{ ");
> + dump_generic_node (pp, VEC_DUPLICATE_CST_ELT (node), spc, flags, false);
> + pp_string (pp, ", ... }");
> + break;
> +
> case FUNCTION_TYPE:
> case METHOD_TYPE:
> dump_generic_node (pp, TREE_TYPE (node), spc, flags, false);
> @@ -3231,6 +3237,15 @@ dump_generic_node (pretty_printer *pp, t
> pp_string (pp, " > ");
> break;
>
> + case VEC_DUPLICATE_EXPR:
> + pp_space (pp);
> + for (str = get_tree_code_name (code); *str; str++)
> + pp_character (pp, TOUPPER (*str));
> + pp_string (pp, " < ");
> + dump_generic_node (pp, TREE_OPERAND (node, 0), spc, flags, false);
> + pp_string (pp, " > ");
> + break;
> +
> case VEC_UNPACK_HI_EXPR:
> pp_string (pp, " VEC_UNPACK_HI_EXPR < ");
> dump_generic_node (pp, TREE_OPERAND (node, 0), spc, flags, false);
> Index: gcc/dwarf2out.c
> ===================================================================
> --- gcc/dwarf2out.c 2017-10-23 11:41:24.407340836 +0100
> +++ gcc/dwarf2out.c 2017-10-23 11:41:51.763342269 +0100
> @@ -18862,6 +18862,7 @@ rtl_for_decl_init (tree init, tree type)
> switch (TREE_CODE (init))
> {
> case VECTOR_CST:
> + case VEC_DUPLICATE_CST:
> break;
> case CONSTRUCTOR:
> if (TREE_CONSTANT (init))
> Index: gcc/gimple-expr.h
> ===================================================================
> --- gcc/gimple-expr.h 2017-10-23 11:38:53.934094740 +0100
> +++ gcc/gimple-expr.h 2017-10-23 11:41:51.765271511 +0100
> @@ -134,6 +134,7 @@ is_gimple_constant (const_tree t)
> case FIXED_CST:
> case COMPLEX_CST:
> case VECTOR_CST:
> + case VEC_DUPLICATE_CST:
> case STRING_CST:
> return true;
>
> Index: gcc/gimplify.c
> ===================================================================
> --- gcc/gimplify.c 2017-10-23 11:41:25.531270256 +0100
> +++ gcc/gimplify.c 2017-10-23 11:41:51.766236132 +0100
> @@ -11506,6 +11506,7 @@ gimplify_expr (tree *expr_p, gimple_seq
> case STRING_CST:
> case COMPLEX_CST:
> case VECTOR_CST:
> + case VEC_DUPLICATE_CST:
> /* Drop the overflow flag on constants, we do not want
> that in the GIMPLE IL. */
> if (TREE_OVERFLOW_P (*expr_p))
> Index: gcc/graphite-isl-ast-to-gimple.c
> ===================================================================
> --- gcc/graphite-isl-ast-to-gimple.c 2017-10-23 11:41:23.205065216 +0100
> +++ gcc/graphite-isl-ast-to-gimple.c 2017-10-23 11:41:51.767200753 +0100
> @@ -222,7 +222,8 @@ enum phi_node_kind
> return TREE_CODE (op) == INTEGER_CST
> || TREE_CODE (op) == REAL_CST
> || TREE_CODE (op) == COMPLEX_CST
> - || TREE_CODE (op) == VECTOR_CST;
> + || TREE_CODE (op) == VECTOR_CST
> + || TREE_CODE (op) == VEC_DUPLICATE_CST;
> }
>
> private:
> Index: gcc/graphite-scop-detection.c
> ===================================================================
> --- gcc/graphite-scop-detection.c 2017-10-23 11:41:25.533204730 +0100
> +++ gcc/graphite-scop-detection.c 2017-10-23 11:41:51.767200753 +0100
> @@ -1243,6 +1243,7 @@ scan_tree_for_params (sese_info_p s, tre
> case REAL_CST:
> case COMPLEX_CST:
> case VECTOR_CST:
> + case VEC_DUPLICATE_CST:
> break;
>
> default:
> Index: gcc/ipa-icf-gimple.c
> ===================================================================
> --- gcc/ipa-icf-gimple.c 2017-10-23 11:38:53.934094740 +0100
> +++ gcc/ipa-icf-gimple.c 2017-10-23 11:41:51.767200753 +0100
> @@ -333,6 +333,7 @@ func_checker::compare_cst_or_decl (tree
> case INTEGER_CST:
> case COMPLEX_CST:
> case VECTOR_CST:
> + case VEC_DUPLICATE_CST:
> case STRING_CST:
> case REAL_CST:
> {
> @@ -528,6 +529,7 @@ func_checker::compare_operand (tree t1,
> case INTEGER_CST:
> case COMPLEX_CST:
> case VECTOR_CST:
> + case VEC_DUPLICATE_CST:
> case STRING_CST:
> case REAL_CST:
> case FUNCTION_DECL:
> Index: gcc/ipa-icf.c
> ===================================================================
> --- gcc/ipa-icf.c 2017-10-23 11:41:25.874639400 +0100
> +++ gcc/ipa-icf.c 2017-10-23 11:41:51.768165374 +0100
> @@ -1478,6 +1478,7 @@ sem_item::add_expr (const_tree exp, inch
> case STRING_CST:
> case COMPLEX_CST:
> case VECTOR_CST:
> + case VEC_DUPLICATE_CST:
> inchash::add_expr (exp, hstate);
> break;
> case CONSTRUCTOR:
> @@ -2030,6 +2031,9 @@ sem_variable::equals (tree t1, tree t2)
>
> return 1;
> }
> + case VEC_DUPLICATE_CST:
> + return sem_variable::equals (VEC_DUPLICATE_CST_ELT (t1),
> + VEC_DUPLICATE_CST_ELT (t2));
> case ARRAY_REF:
> case ARRAY_RANGE_REF:
> {
> Index: gcc/match.pd
> ===================================================================
> --- gcc/match.pd 2017-10-23 11:38:53.934094740 +0100
> +++ gcc/match.pd 2017-10-23 11:41:51.768165374 +0100
> @@ -958,6 +958,9 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> (match negate_expr_p
> VECTOR_CST
> (if (FLOAT_TYPE_P (TREE_TYPE (type)) || TYPE_OVERFLOW_WRAPS (type))))
> +(match negate_expr_p
> + VEC_DUPLICATE_CST
> + (if (FLOAT_TYPE_P (TREE_TYPE (type)) || TYPE_OVERFLOW_WRAPS (type))))
>
> /* (-A) * (-B) -> A * B */
> (simplify
> Index: gcc/print-tree.c
> ===================================================================
> --- gcc/print-tree.c 2017-10-23 11:38:53.934094740 +0100
> +++ gcc/print-tree.c 2017-10-23 11:41:51.769129995 +0100
> @@ -783,6 +783,10 @@ print_node (FILE *file, const char *pref
> }
> break;
>
> + case VEC_DUPLICATE_CST:
> + print_node (file, "elt", VEC_DUPLICATE_CST_ELT (node), indent + 4);
> + break;
> +
> case COMPLEX_CST:
> print_node (file, "real", TREE_REALPART (node), indent + 4);
> print_node (file, "imag", TREE_IMAGPART (node), indent + 4);
> Index: gcc/tree-chkp.c
> ===================================================================
> --- gcc/tree-chkp.c 2017-10-23 11:41:23.201196268 +0100
> +++ gcc/tree-chkp.c 2017-10-23 11:41:51.770094616 +0100
> @@ -3800,6 +3800,7 @@ chkp_find_bounds_1 (tree ptr, tree ptr_s
> case INTEGER_CST:
> case COMPLEX_CST:
> case VECTOR_CST:
> + case VEC_DUPLICATE_CST:
> if (integer_zerop (ptr_src))
> bounds = chkp_get_none_bounds ();
> else
> Index: gcc/tree-loop-distribution.c
> ===================================================================
> --- gcc/tree-loop-distribution.c 2017-10-23 11:41:23.228278904 +0100
> +++ gcc/tree-loop-distribution.c 2017-10-23 11:41:51.771059237 +0100
> @@ -921,6 +921,9 @@ const_with_all_bytes_same (tree val)
> && CONSTRUCTOR_NELTS (val) == 0))
> return 0;
>
> + if (TREE_CODE (val) == VEC_DUPLICATE_CST)
> + return const_with_all_bytes_same (VEC_DUPLICATE_CST_ELT (val));
> +
> if (real_zerop (val))
> {
> /* Only return 0 for +0.0, not for -0.0, which doesn't have
> Index: gcc/tree-ssa-loop.c
> ===================================================================
> --- gcc/tree-ssa-loop.c 2017-10-23 11:38:53.934094740 +0100
> +++ gcc/tree-ssa-loop.c 2017-10-23 11:41:51.772023858 +0100
> @@ -616,6 +616,7 @@ for_each_index (tree *addr_p, bool (*cbc
> case STRING_CST:
> case RESULT_DECL:
> case VECTOR_CST:
> + case VEC_DUPLICATE_CST:
> case COMPLEX_CST:
> case INTEGER_CST:
> case REAL_CST:
> Index: gcc/tree-ssa-pre.c
> ===================================================================
> --- gcc/tree-ssa-pre.c 2017-10-23 11:41:25.549647760 +0100
> +++ gcc/tree-ssa-pre.c 2017-10-23 11:41:51.772023858 +0100
> @@ -2675,6 +2675,7 @@ create_component_ref_by_pieces_1 (basic_
> case INTEGER_CST:
> case COMPLEX_CST:
> case VECTOR_CST:
> + case VEC_DUPLICATE_CST:
> case REAL_CST:
> case CONSTRUCTOR:
> case VAR_DECL:
> Index: gcc/tree-ssa-sccvn.c
> ===================================================================
> --- gcc/tree-ssa-sccvn.c 2017-10-23 11:38:53.934094740 +0100
> +++ gcc/tree-ssa-sccvn.c 2017-10-23 11:41:51.773953100 +0100
> @@ -858,6 +858,7 @@ copy_reference_ops_from_ref (tree ref, v
> case INTEGER_CST:
> case COMPLEX_CST:
> case VECTOR_CST:
> + case VEC_DUPLICATE_CST:
> case REAL_CST:
> case FIXED_CST:
> case CONSTRUCTOR:
> @@ -1050,6 +1051,7 @@ ao_ref_init_from_vn_reference (ao_ref *r
> case INTEGER_CST:
> case COMPLEX_CST:
> case VECTOR_CST:
> + case VEC_DUPLICATE_CST:
> case REAL_CST:
> case CONSTRUCTOR:
> case CONST_DECL:
> Index: gcc/tree-vect-generic.c
> ===================================================================
> --- gcc/tree-vect-generic.c 2017-10-23 11:38:53.934094740 +0100
> +++ gcc/tree-vect-generic.c 2017-10-23 11:41:51.773953100 +0100
> @@ -1419,6 +1419,7 @@ lower_vec_perm (gimple_stmt_iterator *gs
> ssa_uniform_vector_p (tree op)
> {
> if (TREE_CODE (op) == VECTOR_CST
> + || TREE_CODE (op) == VEC_DUPLICATE_CST
> || TREE_CODE (op) == CONSTRUCTOR)
> return uniform_vector_p (op);
> if (TREE_CODE (op) == SSA_NAME)
> Index: gcc/varasm.c
> ===================================================================
> --- gcc/varasm.c 2017-10-23 11:41:25.822408600 +0100
> +++ gcc/varasm.c 2017-10-23 11:41:51.775882341 +0100
> @@ -3068,6 +3068,9 @@ const_hash_1 (const tree exp)
> CASE_CONVERT:
> return const_hash_1 (TREE_OPERAND (exp, 0)) * 7 + 2;
>
> + case VEC_DUPLICATE_CST:
> + return const_hash_1 (VEC_DUPLICATE_CST_ELT (exp)) * 7 + 3;
> +
> default:
> /* A language specific constant. Just hash the code. */
> return code;
> @@ -3158,6 +3161,10 @@ compare_constant (const tree t1, const t
> return 1;
> }
>
> + case VEC_DUPLICATE_CST:
> + return compare_constant (VEC_DUPLICATE_CST_ELT (t1),
> + VEC_DUPLICATE_CST_ELT (t2));
> +
> case CONSTRUCTOR:
> {
> vec<constructor_elt, va_gc> *v1, *v2;
> Index: gcc/fold-const.c
> ===================================================================
> --- gcc/fold-const.c 2017-10-23 11:41:23.535860278 +0100
> +++ gcc/fold-const.c 2017-10-23 11:41:51.765271511 +0100
> @@ -418,6 +418,9 @@ negate_expr_p (tree t)
> return true;
> }
>
> + case VEC_DUPLICATE_CST:
> + return negate_expr_p (VEC_DUPLICATE_CST_ELT (t));
> +
> case COMPLEX_EXPR:
> return negate_expr_p (TREE_OPERAND (t, 0))
> && negate_expr_p (TREE_OPERAND (t, 1));
> @@ -579,6 +582,14 @@ fold_negate_expr_1 (location_t loc, tree
> return build_vector (type, elts);
> }
>
> + case VEC_DUPLICATE_CST:
> + {
> + tree sub = fold_negate_expr (loc, VEC_DUPLICATE_CST_ELT (t));
> + if (!sub)
> + return NULL_TREE;
> + return build_vector_from_val (type, sub);
> + }
> +
> case COMPLEX_EXPR:
> if (negate_expr_p (t))
> return fold_build2_loc (loc, COMPLEX_EXPR, type,
> @@ -1436,6 +1447,16 @@ const_binop (enum tree_code code, tree a
> return build_vector (type, elts);
> }
>
> + if (TREE_CODE (arg1) == VEC_DUPLICATE_CST
> + && TREE_CODE (arg2) == VEC_DUPLICATE_CST)
> + {
> + tree sub = const_binop (code, VEC_DUPLICATE_CST_ELT (arg1),
> + VEC_DUPLICATE_CST_ELT (arg2));
> + if (!sub)
> + return NULL_TREE;
> + return build_vector_from_val (TREE_TYPE (arg1), sub);
> + }
> +
> /* Shifts allow a scalar offset for a vector. */
> if (TREE_CODE (arg1) == VECTOR_CST
> && TREE_CODE (arg2) == INTEGER_CST)
> @@ -1459,6 +1480,15 @@ const_binop (enum tree_code code, tree a
>
> return build_vector (type, elts);
> }
> +
> + if (TREE_CODE (arg1) == VEC_DUPLICATE_CST
> + && TREE_CODE (arg2) == INTEGER_CST)
> + {
> + tree sub = const_binop (code, VEC_DUPLICATE_CST_ELT (arg1), arg2);
> + if (!sub)
> + return NULL_TREE;
> + return build_vector_from_val (TREE_TYPE (arg1), sub);
> + }
> return NULL_TREE;
> }
>
> @@ -1652,6 +1682,13 @@ const_unop (enum tree_code code, tree ty
> if (i == count)
> return build_vector (type, elements);
> }
> + else if (TREE_CODE (arg0) == VEC_DUPLICATE_CST)
> + {
> + tree sub = const_unop (BIT_NOT_EXPR, TREE_TYPE (type),
> + VEC_DUPLICATE_CST_ELT (arg0));
> + if (sub)
> + return build_vector_from_val (type, sub);
> + }
> break;
>
> case TRUTH_NOT_EXPR:
> @@ -1737,6 +1774,11 @@ const_unop (enum tree_code code, tree ty
> return res;
> }
>
> + case VEC_DUPLICATE_EXPR:
> + if (CONSTANT_CLASS_P (arg0))
> + return build_vector_from_val (type, arg0);
> + return NULL_TREE;
> +
> default:
> break;
> }
> @@ -2167,6 +2209,15 @@ fold_convert_const (enum tree_code code,
> }
> return build_vector (type, v);
> }
> + if (TREE_CODE (arg1) == VEC_DUPLICATE_CST
> + && (TYPE_VECTOR_SUBPARTS (type)
> + == TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg1))))
> + {
> + tree sub = fold_convert_const (code, TREE_TYPE (type),
> + VEC_DUPLICATE_CST_ELT (arg1));
> + if (sub)
> + return build_vector_from_val (type, sub);
> + }
> }
> return NULL_TREE;
> }
> @@ -2953,6 +3004,10 @@ operand_equal_p (const_tree arg0, const_
> return 1;
> }
>
> + case VEC_DUPLICATE_CST:
> + return operand_equal_p (VEC_DUPLICATE_CST_ELT (arg0),
> + VEC_DUPLICATE_CST_ELT (arg1), flags);
> +
> case COMPLEX_CST:
> return (operand_equal_p (TREE_REALPART (arg0), TREE_REALPART (arg1),
> flags)
> @@ -7492,6 +7547,20 @@ can_native_interpret_type_p (tree type)
> static tree
> fold_view_convert_expr (tree type, tree expr)
> {
> + /* Recurse on duplicated vectors if the target type is also a vector
> + and if the elements line up. */
> + tree expr_type = TREE_TYPE (expr);
> + if (TREE_CODE (expr) == VEC_DUPLICATE_CST
> + && VECTOR_TYPE_P (type)
> + && TYPE_VECTOR_SUBPARTS (type) == TYPE_VECTOR_SUBPARTS (expr_type)
> + && TYPE_SIZE (TREE_TYPE (type)) == TYPE_SIZE (TREE_TYPE (expr_type)))
> + {
> + tree sub = fold_view_convert_expr (TREE_TYPE (type),
> + VEC_DUPLICATE_CST_ELT (expr));
> + if (sub)
> + return build_vector_from_val (type, sub);
> + }
> +
> /* We support up to 512-bit values (for V8DFmode). */
> unsigned char buffer[64];
> int len;
> @@ -8891,6 +8960,15 @@ exact_inverse (tree type, tree cst)
> return build_vector (type, elts);
> }
>
> + case VEC_DUPLICATE_CST:
> + {
> + tree sub = exact_inverse (TREE_TYPE (type),
> + VEC_DUPLICATE_CST_ELT (cst));
> + if (!sub)
> + return NULL_TREE;
> + return build_vector_from_val (type, sub);
> + }
> +
> default:
> return NULL_TREE;
> }
> @@ -11969,6 +12047,9 @@ fold_checksum_tree (const_tree expr, str
> for (i = 0; i < (int) VECTOR_CST_NELTS (expr); ++i)
> fold_checksum_tree (VECTOR_CST_ELT (expr, i), ctx, ht);
> break;
> + case VEC_DUPLICATE_CST:
> + fold_checksum_tree (VEC_DUPLICATE_CST_ELT (expr), ctx, ht);
> + break;
> default:
> break;
> }
> @@ -14436,6 +14517,36 @@ test_vector_folding ()
> ASSERT_FALSE (integer_nonzerop (fold_build2 (NE_EXPR, res_type, one, one)));
> }
>
> +/* Verify folding of VEC_DUPLICATE_CSTs and VEC_DUPLICATE_EXPRs. */
> +
> +static void
> +test_vec_duplicate_folding ()
> +{
> + tree type = build_vector_type (ssizetype, 4);
> + tree dup5 = build_vec_duplicate_cst (type, ssize_int (5));
> + tree dup3 = build_vec_duplicate_cst (type, ssize_int (3));
> +
> + tree neg_dup5 = fold_unary (NEGATE_EXPR, type, dup5);
> + ASSERT_EQ (uniform_vector_p (neg_dup5), ssize_int (-5));
> +
> + tree not_dup5 = fold_unary (BIT_NOT_EXPR, type, dup5);
> + ASSERT_EQ (uniform_vector_p (not_dup5), ssize_int (-6));
> +
> + tree dup5_plus_dup3 = fold_binary (PLUS_EXPR, type, dup5, dup3);
> + ASSERT_EQ (uniform_vector_p (dup5_plus_dup3), ssize_int (8));
> +
> + tree dup5_lsl_2 = fold_binary (LSHIFT_EXPR, type, dup5, ssize_int (2));
> + ASSERT_EQ (uniform_vector_p (dup5_lsl_2), ssize_int (20));
> +
> + tree size_vector = build_vector_type (sizetype, 4);
> + tree size_dup5 = fold_convert (size_vector, dup5);
> + ASSERT_EQ (uniform_vector_p (size_dup5), size_int (5));
> +
> + tree dup5_expr = fold_unary (VEC_DUPLICATE_EXPR, type, ssize_int (5));
> + tree dup5_cst = build_vector_from_val (type, ssize_int (5));
> + ASSERT_TRUE (operand_equal_p (dup5_expr, dup5_cst, 0));
> +}
> +
> /* Run all of the selftests within this file. */
>
> void
> @@ -14443,6 +14554,7 @@ fold_const_c_tests ()
> {
> test_arithmetic_folding ();
> test_vector_folding ();
> + test_vec_duplicate_folding ();
> }
>
> } // namespace selftest
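[Editorial note: the fold-const.c changes above all exploit the same identity — an elementwise operation on duplicated vectors equals duplicating the operation on the representative elements, e.g. dup(a) + dup(b) == dup(a + b). A standalone C check of that identity for addition (illustrative model, not GCC code):

```c
#include <assert.h>

#define NLANES 4
typedef struct { int lane[NLANES]; } vec4;

/* Broadcast a scalar into all lanes.  */
static vec4 dup (int s)
{
  vec4 v;
  for (int i = 0; i < NLANES; i++)
    v.lane[i] = s;
  return v;
}

/* Lanewise addition, done the long way.  */
static vec4 vadd (vec4 a, vec4 b)
{
  vec4 r;
  for (int i = 0; i < NLANES; i++)
    r.lane[i] = a.lane[i] + b.lane[i];
  return r;
}
```

Because the identity holds lanewise, const_binop and const_unop above never need to materialize the lanes: they fold the single representative element and rebuild the result with build_vector_from_val.]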
> Index: gcc/optabs.def
> ===================================================================
> --- gcc/optabs.def 2017-10-23 11:38:53.934094740 +0100
> +++ gcc/optabs.def 2017-10-23 11:41:51.769129995 +0100
> @@ -364,3 +364,5 @@ OPTAB_D (atomic_xor_optab, "atomic_xor$I
>
> OPTAB_D (get_thread_pointer_optab, "get_thread_pointer$I$a")
> OPTAB_D (set_thread_pointer_optab, "set_thread_pointer$I$a")
> +
> +OPTAB_DC (vec_duplicate_optab, "vec_duplicate$a", VEC_DUPLICATE)
> Index: gcc/optabs-tree.c
> ===================================================================
> --- gcc/optabs-tree.c 2017-10-23 11:38:53.934094740 +0100
> +++ gcc/optabs-tree.c 2017-10-23 11:41:51.768165374 +0100
> @@ -210,6 +210,9 @@ optab_for_tree_code (enum tree_code code
> return TYPE_UNSIGNED (type) ?
> vec_pack_ufix_trunc_optab : vec_pack_sfix_trunc_optab;
>
> + case VEC_DUPLICATE_EXPR:
> + return vec_duplicate_optab;
> +
> default:
> break;
> }
> Index: gcc/optabs.h
> ===================================================================
> --- gcc/optabs.h 2017-10-23 11:38:53.934094740 +0100
> +++ gcc/optabs.h 2017-10-23 11:41:51.769129995 +0100
> @@ -181,6 +181,7 @@ extern rtx simplify_expand_binop (machin
> enum optab_methods methods);
> extern bool force_expand_binop (machine_mode, optab, rtx, rtx, rtx, int,
> enum optab_methods);
> +extern rtx expand_vector_broadcast (machine_mode, rtx);
>
> /* Generate code for a simple binary or unary operation. "Simple" in
> this case means "can be unambiguously described by a (mode, code)
> Index: gcc/optabs.c
> ===================================================================
> --- gcc/optabs.c 2017-10-23 11:41:41.549050496 +0100
> +++ gcc/optabs.c 2017-10-23 11:41:51.769129995 +0100
> @@ -367,7 +367,7 @@ force_expand_binop (machine_mode mode, o
> mode of OP must be the element mode of VMODE. If OP is a constant,
> then the return value will be a constant. */
>
> -static rtx
> +rtx
> expand_vector_broadcast (machine_mode vmode, rtx op)
> {
> enum insn_code icode;
> @@ -380,6 +380,16 @@ expand_vector_broadcast (machine_mode vm
> if (CONSTANT_P (op))
> return gen_const_vec_duplicate (vmode, op);
>
> + icode = optab_handler (vec_duplicate_optab, vmode);
> + if (icode != CODE_FOR_nothing)
> + {
> + struct expand_operand ops[2];
> + create_output_operand (&ops[0], NULL_RTX, vmode);
> + create_input_operand (&ops[1], op, GET_MODE (op));
> + expand_insn (icode, 2, ops);
> + return ops[0].value;
> + }
> +
> /* ??? If the target doesn't have a vec_init, then we have no easy way
> of performing this operation. Most of this sort of generic support
> is hidden away in the vector lowering support in gimple. */
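[Editorial note: expand_vector_broadcast above now tries three strategies in order: fold a constant operand directly, use the target's vec_duplicate instruction if one exists, and otherwise fall back to the generic vec_init path. A standalone control-flow sketch of that ordering — hypothetical enum names standing in for the RTL machinery, not the real GCC API:

```c
#include <assert.h>

/* Broadcast-expansion strategies, in the order the patch tries them.  */
typedef enum
{
  FOLD_CONSTANT,              /* gen_const_vec_duplicate on a CONSTANT_P op */
  USE_VEC_DUPLICATE_INSN,     /* optab_handler (vec_duplicate_optab, vmode)
                                 returned something != CODE_FOR_nothing */
  FALL_BACK_TO_VEC_INIT       /* generic vec_init lowering path */
} strategy;

static strategy choose_broadcast_strategy (int op_is_constant,
                                           int target_has_vec_duplicate)
{
  if (op_is_constant)
    return FOLD_CONSTANT;
  if (target_has_vec_duplicate)
    return USE_VEC_DUPLICATE_INSN;
  return FALL_BACK_TO_VEC_INIT;
}
```

The same ordering reappears in the store_constructor hunk below: the vec_duplicate_optab path is preferred whenever the constructor is uniform and free of side effects.]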
> Index: gcc/expr.c
> ===================================================================
> --- gcc/expr.c 2017-10-23 11:41:39.187050437 +0100
> +++ gcc/expr.c 2017-10-23 11:41:51.764306890 +0100
> @@ -6572,7 +6572,8 @@ store_constructor (tree exp, rtx target,
> constructor_elt *ce;
> int i;
> int need_to_clear;
> - int icode = CODE_FOR_nothing;
> + insn_code icode = CODE_FOR_nothing;
> + tree elt;
> tree elttype = TREE_TYPE (type);
> int elt_size = tree_to_uhwi (TYPE_SIZE (elttype));
> machine_mode eltmode = TYPE_MODE (elttype);
> @@ -6582,13 +6583,30 @@ store_constructor (tree exp, rtx target,
> unsigned n_elts;
> alias_set_type alias;
> bool vec_vec_init_p = false;
> + machine_mode mode = GET_MODE (target);
>
> gcc_assert (eltmode != BLKmode);
>
> + /* Try using vec_duplicate_optab for uniform vectors. */
> + if (!TREE_SIDE_EFFECTS (exp)
> + && VECTOR_MODE_P (mode)
> + && eltmode == GET_MODE_INNER (mode)
> + && ((icode = optab_handler (vec_duplicate_optab, mode))
> + != CODE_FOR_nothing)
> + && (elt = uniform_vector_p (exp)))
> + {
> + struct expand_operand ops[2];
> + create_output_operand (&ops[0], target, mode);
> + create_input_operand (&ops[1], expand_normal (elt), eltmode);
> + expand_insn (icode, 2, ops);
> + if (!rtx_equal_p (target, ops[0].value))
> + emit_move_insn (target, ops[0].value);
> + break;
> + }
> +
> n_elts = TYPE_VECTOR_SUBPARTS (type);
> - if (REG_P (target) && VECTOR_MODE_P (GET_MODE (target)))
> + if (REG_P (target) && VECTOR_MODE_P (mode))
> {
> - machine_mode mode = GET_MODE (target);
> machine_mode emode = eltmode;
>
> if (CONSTRUCTOR_NELTS (exp)
> @@ -6600,7 +6618,7 @@ store_constructor (tree exp, rtx target,
> == n_elts);
> emode = TYPE_MODE (etype);
> }
> - icode = (int) convert_optab_handler (vec_init_optab, mode, emode);
> + icode = convert_optab_handler (vec_init_optab, mode, emode);
> if (icode != CODE_FOR_nothing)
> {
> unsigned int i, n = n_elts;
> @@ -6648,7 +6666,7 @@ store_constructor (tree exp, rtx target,
> if (need_to_clear && size > 0 && !vector)
> {
> if (REG_P (target))
> - emit_move_insn (target, CONST0_RTX (GET_MODE (target)));
> + emit_move_insn (target, CONST0_RTX (mode));
> else
> clear_storage (target, GEN_INT (size), BLOCK_OP_NORMAL);
> cleared = 1;
> @@ -6656,7 +6674,7 @@ store_constructor (tree exp, rtx target,
>
> /* Inform later passes that the old value is dead. */
> if (!cleared && !vector && REG_P (target))
> - emit_move_insn (target, CONST0_RTX (GET_MODE (target)));
> + emit_move_insn (target, CONST0_RTX (mode));
>
> if (MEM_P (target))
> alias = MEM_ALIAS_SET (target);
> @@ -6707,8 +6725,7 @@ store_constructor (tree exp, rtx target,
>
> if (vector)
> emit_insn (GEN_FCN (icode) (target,
> - gen_rtx_PARALLEL (GET_MODE (target),
> - vector)));
> + gen_rtx_PARALLEL (mode, vector)));
> break;
> }
>
> @@ -7686,6 +7703,19 @@ expand_operands (tree exp0, tree exp1, r
> }
>
>
> +/* Expand constant vector element ELT, which has mode MODE. This is used
> + for members of VECTOR_CST and VEC_DUPLICATE_CST. */
> +
> +static rtx
> +const_vector_element (scalar_mode mode, const_tree elt)
> +{
> + if (TREE_CODE (elt) == REAL_CST)
> + return const_double_from_real_value (TREE_REAL_CST (elt), mode);
> + if (TREE_CODE (elt) == FIXED_CST)
> + return CONST_FIXED_FROM_FIXED_VALUE (TREE_FIXED_CST (elt), mode);
> + return immed_wide_int_const (wi::to_wide (elt), mode);
> +}
> +
> /* Return a MEM that contains constant EXP. DEFER is as for
> output_constant_def and MODIFIER is as for expand_expr. */
>
> @@ -9551,6 +9581,12 @@ #define REDUCE_BIT_FIELD(expr) (reduce_b
> target = expand_vec_cond_expr (type, treeop0, treeop1, treeop2, target);
> return target;
>
> + case VEC_DUPLICATE_EXPR:
> + op0 = expand_expr (treeop0, NULL_RTX, VOIDmode, modifier);
> + target = expand_vector_broadcast (mode, op0);
> + gcc_assert (target);
> + return target;
> +
> case BIT_INSERT_EXPR:
> {
> unsigned bitpos = tree_to_uhwi (treeop2);
> @@ -10003,6 +10039,11 @@ expand_expr_real_1 (tree exp, rtx target
> tmode, modifier);
> }
>
> + case VEC_DUPLICATE_CST:
> + op0 = const_vector_element (GET_MODE_INNER (mode),
> + VEC_DUPLICATE_CST_ELT (exp));
> + return gen_const_vec_duplicate (mode, op0);
> +
> case CONST_DECL:
> if (modifier == EXPAND_WRITE)
> {
> @@ -11764,8 +11805,7 @@ const_vector_from_tree (tree exp)
> {
> rtvec v;
> unsigned i, units;
> - tree elt;
> - machine_mode inner, mode;
> + machine_mode mode;
>
> mode = TYPE_MODE (TREE_TYPE (exp));
>
> @@ -11776,23 +11816,12 @@ const_vector_from_tree (tree exp)
> return const_vector_mask_from_tree (exp);
>
> units = VECTOR_CST_NELTS (exp);
> - inner = GET_MODE_INNER (mode);
>
> v = rtvec_alloc (units);
>
> for (i = 0; i < units; ++i)
> - {
> - elt = VECTOR_CST_ELT (exp, i);
> -
> - if (TREE_CODE (elt) == REAL_CST)
> - RTVEC_ELT (v, i) = const_double_from_real_value (TREE_REAL_CST (elt),
> - inner);
> - else if (TREE_CODE (elt) == FIXED_CST)
> - RTVEC_ELT (v, i) = CONST_FIXED_FROM_FIXED_VALUE (TREE_FIXED_CST (elt),
> - inner);
> - else
> - RTVEC_ELT (v, i) = immed_wide_int_const (wi::to_wide (elt), inner);
> - }
> + RTVEC_ELT (v, i) = const_vector_element (GET_MODE_INNER (mode),
> + VECTOR_CST_ELT (exp, i));
>
> return gen_rtx_CONST_VECTOR (mode, v);
> }
> Index: gcc/internal-fn.c
> ===================================================================
> --- gcc/internal-fn.c 2017-10-23 11:41:23.529089619 +0100
> +++ gcc/internal-fn.c 2017-10-23 11:41:51.767200753 +0100
> @@ -1911,12 +1911,12 @@ expand_vector_ubsan_overflow (location_t
> emit_move_insn (cntvar, const0_rtx);
> emit_label (loop_lab);
> }
> - if (TREE_CODE (arg0) != VECTOR_CST)
> + if (!CONSTANT_CLASS_P (arg0))
> {
> rtx arg0r = expand_normal (arg0);
> arg0 = make_tree (TREE_TYPE (arg0), arg0r);
> }
> - if (TREE_CODE (arg1) != VECTOR_CST)
> + if (!CONSTANT_CLASS_P (arg1))
> {
> rtx arg1r = expand_normal (arg1);
> arg1 = make_tree (TREE_TYPE (arg1), arg1r);
> Index: gcc/tree-cfg.c
> ===================================================================
> --- gcc/tree-cfg.c 2017-10-23 11:41:25.864967029 +0100
> +++ gcc/tree-cfg.c 2017-10-23 11:41:51.770094616 +0100
> @@ -3803,6 +3803,17 @@ verify_gimple_assign_unary (gassign *stm
> case CONJ_EXPR:
> break;
>
> + case VEC_DUPLICATE_EXPR:
> + if (TREE_CODE (lhs_type) != VECTOR_TYPE
> + || !useless_type_conversion_p (TREE_TYPE (lhs_type), rhs1_type))
> + {
> + error ("vec_duplicate should be from a scalar to a like vector");
> + debug_generic_expr (lhs_type);
> + debug_generic_expr (rhs1_type);
> + return true;
> + }
> + return false;
> +
> default:
> gcc_unreachable ();
> }
> @@ -4473,6 +4484,7 @@ verify_gimple_assign_single (gassign *st
> case FIXED_CST:
> case COMPLEX_CST:
> case VECTOR_CST:
> + case VEC_DUPLICATE_CST:
> case STRING_CST:
> return res;
>
> Index: gcc/tree-inline.c
> ===================================================================
> --- gcc/tree-inline.c 2017-10-23 11:41:25.833048208 +0100
> +++ gcc/tree-inline.c 2017-10-23 11:41:51.771059237 +0100
> @@ -4002,6 +4002,7 @@ estimate_operator_cost (enum tree_code c
> case VEC_PACK_FIX_TRUNC_EXPR:
> case VEC_WIDEN_LSHIFT_HI_EXPR:
> case VEC_WIDEN_LSHIFT_LO_EXPR:
> + case VEC_DUPLICATE_EXPR:
>
> return 1;
>
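[Editorial note: the store_constructor hunk above short-circuits uniform vector constructors through vec_duplicate_optab before falling back to vec_init_optab. The control flow can be sketched in self-contained C++; the helper names are hypothetical and a plain std::vector stands in for RTL vectors.]

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for GCC's uniform_vector_p: if every element of
// the constructor is the same, store that element in *out and return true.
bool uniform_element (const std::vector<int> &elts, int *out)
{
  if (elts.empty ())
    return false;
  for (std::size_t i = 1; i < elts.size (); ++i)
    if (elts[i] != elts[0])
      return false;
  *out = elts[0];
  return true;
}

// Mirror of the control flow added to store_constructor: prefer a single
// "duplicate one element" operation (the vec_duplicate_optab path) and
// fall back to element-by-element initialization (the vec_init path).
std::vector<int> expand_constructor (const std::vector<int> &elts)
{
  int elt;
  if (uniform_element (elts, &elt))
    return std::vector<int> (elts.size (), elt);  // vec_duplicate path
  return elts;                                    // vec_init fallback
}
```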
^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: [08/nn] Add a fixed_size_mode class
2017-10-23 11:22 ` [08/nn] Add a fixed_size_mode class Richard Sandiford
@ 2017-10-26 11:57 ` Richard Biener
0 siblings, 0 replies; 90+ messages in thread
From: Richard Biener @ 2017-10-26 11:57 UTC (permalink / raw)
To: GCC Patches, Richard Sandiford
On Mon, Oct 23, 2017 at 1:22 PM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> This patch adds a fixed_size_mode machine_mode wrapper
> for modes that are known to have a fixed size. That applies
> to all current modes, but future patches will add support for
> variable-sized modes.
>
> The use of this class should be pretty restricted. One important
> use case is to hold the mode of static data, which can never be
> variable-sized with current file formats. Another is to hold
> the modes of registers involved in __builtin_apply and
> __builtin_result, since those interfaces don't cope well with
> variable-sized data.
>
> The class can also be useful when reinterpreting the contents of
> a fixed-length bit string as a different kind of value.
Ok.
Richard.
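[Editorial note: the mode-wrapper idiom used by fixed_size_mode (a thin class whose constructors encode an invariant, plus an includes_p predicate consulted by is_a/as_a) can be illustrated in self-contained C++. Everything below is a hypothetical miniature, not GCC's real machmode.h.]

```cpp
#include <cassert>

// A plain enum stands in for machine_mode.
enum machine_mode { VOIDmode, SImode, DImode, NUM_MODES };

// Miniature of the wrapper pattern: construction from a raw mode is only
// supposed to happen after includes_p has been checked.  As in the patch,
// every current mode is fixed-size, so includes_p is trivially true.
class fixed_size_mode
{
public:
  fixed_size_mode (machine_mode m) : m_mode (m) {}
  operator machine_mode () const { return m_mode; }
  static bool includes_p (machine_mode) { return true; }
private:
  machine_mode m_mode;
};

// as_a-style checked conversion: assert the property, then wrap.
fixed_size_mode as_fixed_size (machine_mode m)
{
  assert (fixed_size_mode::includes_p (m));
  return fixed_size_mode (m);
}
```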
>
> 2017-10-23 Richard Sandiford <richard.sandiford@linaro.org>
> Alan Hayward <alan.hayward@arm.com>
> David Sherwood <david.sherwood@arm.com>
>
> gcc/
> * machmode.h (fixed_size_mode): New class.
> * rtl.h (get_pool_mode): Return fixed_size_mode.
> * gengtype.c (main): Add fixed_size_mode.
> * target.def (get_raw_result_mode): Return a fixed_size_mode.
> (get_raw_arg_mode): Likewise.
> * doc/tm.texi: Regenerate.
> * targhooks.h (default_get_reg_raw_mode): Return a fixed_size_mode.
> * targhooks.c (default_get_reg_raw_mode): Likewise.
> * config/ia64/ia64.c (ia64_get_reg_raw_mode): Likewise.
> * config/mips/mips.c (mips_get_reg_raw_mode): Likewise.
> * config/msp430/msp430.c (msp430_get_raw_arg_mode): Likewise.
> (msp430_get_raw_result_mode): Likewise.
> * config/avr/avr-protos.h (regmask): Use as_a <fixed_size_mode>.
> * dbxout.c (dbxout_parms): Require fixed-size modes.
> * expr.c (copy_blkmode_from_reg, copy_blkmode_to_reg): Likewise.
> * gimple-ssa-store-merging.c (encode_tree_to_bitpos): Likewise.
> * omp-low.c (lower_oacc_reductions): Likewise.
> * simplify-rtx.c (simplify_immed_subreg): Take fixed_size_modes.
> (simplify_subreg): Update accordingly.
> * varasm.c (constant_descriptor_rtx::mode): Change to fixed_size_mode.
> (force_const_mem): Update accordingly. Return NULL_RTX for modes
> that aren't fixed-size.
> (get_pool_mode): Return a fixed_size_mode.
> (output_constant_pool_2): Take a fixed_size_mode.
>
> Index: gcc/machmode.h
> ===================================================================
> --- gcc/machmode.h 2017-09-15 14:47:33.184331588 +0100
> +++ gcc/machmode.h 2017-10-23 11:42:52.014721093 +0100
> @@ -652,6 +652,39 @@ GET_MODE_2XWIDER_MODE (const T &m)
> extern const unsigned char mode_complex[NUM_MACHINE_MODES];
> #define GET_MODE_COMPLEX_MODE(MODE) ((machine_mode) mode_complex[MODE])
>
> +/* Represents a machine mode that must have a fixed size. The main
> + use of this class is to represent the modes of objects that always
> + have static storage duration, such as constant pool entries.
> + (No current target supports the concept of variable-size static data.) */
> +class fixed_size_mode
> +{
> +public:
> + typedef mode_traits<fixed_size_mode>::from_int from_int;
> +
> + ALWAYS_INLINE fixed_size_mode () {}
> + ALWAYS_INLINE fixed_size_mode (from_int m) : m_mode (machine_mode (m)) {}
> + ALWAYS_INLINE fixed_size_mode (const scalar_mode &m) : m_mode (m) {}
> + ALWAYS_INLINE fixed_size_mode (const scalar_int_mode &m) : m_mode (m) {}
> + ALWAYS_INLINE fixed_size_mode (const scalar_float_mode &m) : m_mode (m) {}
> + ALWAYS_INLINE fixed_size_mode (const scalar_mode_pod &m) : m_mode (m) {}
> + ALWAYS_INLINE fixed_size_mode (const scalar_int_mode_pod &m) : m_mode (m) {}
> + ALWAYS_INLINE fixed_size_mode (const complex_mode &m) : m_mode (m) {}
> + ALWAYS_INLINE operator machine_mode () const { return m_mode; }
> +
> + static bool includes_p (machine_mode);
> +
> +protected:
> + machine_mode m_mode;
> +};
> +
> +/* Return true if MODE has a fixed size. */
> +
> +inline bool
> +fixed_size_mode::includes_p (machine_mode)
> +{
> + return true;
> +}
> +
> extern opt_machine_mode mode_for_size (unsigned int, enum mode_class, int);
>
> /* Return the machine mode to use for a MODE_INT of SIZE bits, if one
> Index: gcc/rtl.h
> ===================================================================
> --- gcc/rtl.h 2017-10-23 11:42:47.297720974 +0100
> +++ gcc/rtl.h 2017-10-23 11:42:52.015721094 +0100
> @@ -3020,7 +3020,7 @@ extern rtx force_const_mem (machine_mode
> struct function;
> extern rtx get_pool_constant (const_rtx);
> extern rtx get_pool_constant_mark (rtx, bool *);
> -extern machine_mode get_pool_mode (const_rtx);
> +extern fixed_size_mode get_pool_mode (const_rtx);
> extern rtx simplify_subtraction (rtx);
> extern void decide_function_section (tree);
>
> Index: gcc/gengtype.c
> ===================================================================
> --- gcc/gengtype.c 2017-05-23 19:29:56.919436344 +0100
> +++ gcc/gengtype.c 2017-10-23 11:42:52.014721093 +0100
> @@ -5197,6 +5197,7 @@ #define POS_HERE(Call) do { pos.file = t
> POS_HERE (do_scalar_typedef ("JCF_u2", &pos));
> POS_HERE (do_scalar_typedef ("void", &pos));
> POS_HERE (do_scalar_typedef ("machine_mode", &pos));
> + POS_HERE (do_scalar_typedef ("fixed_size_mode", &pos));
> POS_HERE (do_typedef ("PTR",
> create_pointer (resolve_typedef ("void", &pos)),
> &pos));
> Index: gcc/target.def
> ===================================================================
> --- gcc/target.def 2017-10-23 11:41:23.134456913 +0100
> +++ gcc/target.def 2017-10-23 11:42:52.017721094 +0100
> @@ -5021,7 +5021,7 @@ DEFHOOK
> "This target hook returns the mode to be used when accessing raw return\
> registers in @code{__builtin_return}. Define this macro if the value\
> in @var{reg_raw_mode} is not correct.",
> - machine_mode, (int regno),
> + fixed_size_mode, (int regno),
> default_get_reg_raw_mode)
>
> /* Return a mode wide enough to copy any argument value that might be
> @@ -5031,7 +5031,7 @@ DEFHOOK
> "This target hook returns the mode to be used when accessing raw argument\
> registers in @code{__builtin_apply_args}. Define this macro if the value\
> in @var{reg_raw_mode} is not correct.",
> - machine_mode, (int regno),
> + fixed_size_mode, (int regno),
> default_get_reg_raw_mode)
>
> HOOK_VECTOR_END (calls)
> Index: gcc/doc/tm.texi
> ===================================================================
> --- gcc/doc/tm.texi 2017-10-23 11:41:22.175925023 +0100
> +++ gcc/doc/tm.texi 2017-10-23 11:42:52.012721093 +0100
> @@ -4536,11 +4536,11 @@ This macro has effect in @option{-fpcc-s
> nothing when you use @option{-freg-struct-return} mode.
> @end defmac
>
> -@deftypefn {Target Hook} machine_mode TARGET_GET_RAW_RESULT_MODE (int @var{regno})
> +@deftypefn {Target Hook} fixed_size_mode TARGET_GET_RAW_RESULT_MODE (int @var{regno})
> This target hook returns the mode to be used when accessing raw return registers in @code{__builtin_return}. Define this macro if the value in @var{reg_raw_mode} is not correct.
> @end deftypefn
>
> -@deftypefn {Target Hook} machine_mode TARGET_GET_RAW_ARG_MODE (int @var{regno})
> +@deftypefn {Target Hook} fixed_size_mode TARGET_GET_RAW_ARG_MODE (int @var{regno})
> This target hook returns the mode to be used when accessing raw argument registers in @code{__builtin_apply_args}. Define this macro if the value in @var{reg_raw_mode} is not correct.
> @end deftypefn
>
> Index: gcc/targhooks.h
> ===================================================================
> --- gcc/targhooks.h 2017-10-02 09:08:43.318933786 +0100
> +++ gcc/targhooks.h 2017-10-23 11:42:52.017721094 +0100
> @@ -233,7 +233,7 @@ extern int default_jump_align_max_skip (
> extern section * default_function_section(tree decl, enum node_frequency freq,
> bool startup, bool exit);
> extern machine_mode default_dwarf_frame_reg_mode (int);
> -extern machine_mode default_get_reg_raw_mode (int);
> +extern fixed_size_mode default_get_reg_raw_mode (int);
> extern bool default_keep_leaf_when_profiled ();
>
> extern void *default_get_pch_validity (size_t *);
> Index: gcc/targhooks.c
> ===================================================================
> --- gcc/targhooks.c 2017-10-23 11:41:23.195392846 +0100
> +++ gcc/targhooks.c 2017-10-23 11:42:52.017721094 +0100
> @@ -1834,10 +1834,12 @@ default_dwarf_frame_reg_mode (int regno)
> /* To be used by targets where reg_raw_mode doesn't return the right
> mode for registers used in apply_builtin_return and apply_builtin_arg. */
>
> -machine_mode
> +fixed_size_mode
> default_get_reg_raw_mode (int regno)
> {
> - return reg_raw_mode[regno];
> + /* Targets must override this hook if the underlying register is
> + variable-sized. */
> + return as_a <fixed_size_mode> (reg_raw_mode[regno]);
> }
>
> /* Return true if a leaf function should stay leaf even with profiling
> Index: gcc/config/ia64/ia64.c
> ===================================================================
> --- gcc/config/ia64/ia64.c 2017-10-23 11:41:32.363050263 +0100
> +++ gcc/config/ia64/ia64.c 2017-10-23 11:42:52.009721093 +0100
> @@ -329,7 +329,7 @@ static tree ia64_fold_builtin (tree, int
> static tree ia64_builtin_decl (unsigned, bool);
>
> static reg_class_t ia64_preferred_reload_class (rtx, reg_class_t);
> -static machine_mode ia64_get_reg_raw_mode (int regno);
> +static fixed_size_mode ia64_get_reg_raw_mode (int regno);
> static section * ia64_hpux_function_section (tree, enum node_frequency,
> bool, bool);
>
> @@ -11328,7 +11328,7 @@ ia64_dconst_0_375 (void)
> return ia64_dconst_0_375_rtx;
> }
>
> -static machine_mode
> +static fixed_size_mode
> ia64_get_reg_raw_mode (int regno)
> {
> if (FR_REGNO_P (regno))
> Index: gcc/config/mips/mips.c
> ===================================================================
> --- gcc/config/mips/mips.c 2017-10-23 11:41:32.365050264 +0100
> +++ gcc/config/mips/mips.c 2017-10-23 11:42:52.010721093 +0100
> @@ -1132,7 +1132,6 @@ static rtx mips_find_pic_call_symbol (rt
> static int mips_register_move_cost (machine_mode, reg_class_t,
> reg_class_t);
> static unsigned int mips_function_arg_boundary (machine_mode, const_tree);
> -static machine_mode mips_get_reg_raw_mode (int regno);
> static rtx mips_gen_const_int_vector_shuffle (machine_mode, int);
>
> /* This hash table keeps track of implicit "mips16" and "nomips16" attributes
> @@ -6111,7 +6110,7 @@ mips_function_arg_boundary (machine_mode
>
> /* Implement TARGET_GET_RAW_RESULT_MODE and TARGET_GET_RAW_ARG_MODE. */
>
> -static machine_mode
> +static fixed_size_mode
> mips_get_reg_raw_mode (int regno)
> {
> if (TARGET_FLOATXX && FP_REG_P (regno))
> Index: gcc/config/msp430/msp430.c
> ===================================================================
> --- gcc/config/msp430/msp430.c 2017-10-23 11:41:23.047405581 +0100
> +++ gcc/config/msp430/msp430.c 2017-10-23 11:42:52.011721093 +0100
> @@ -1398,16 +1398,17 @@ msp430_return_in_memory (const_tree ret_
> #undef TARGET_GET_RAW_ARG_MODE
> #define TARGET_GET_RAW_ARG_MODE msp430_get_raw_arg_mode
>
> -static machine_mode
> +static fixed_size_mode
> msp430_get_raw_arg_mode (int regno)
> {
> - return (regno == ARG_POINTER_REGNUM) ? VOIDmode : Pmode;
> + return as_a <fixed_size_mode> (regno == ARG_POINTER_REGNUM
> + ? VOIDmode : Pmode);
> }
>
> #undef TARGET_GET_RAW_RESULT_MODE
> #define TARGET_GET_RAW_RESULT_MODE msp430_get_raw_result_mode
>
> -static machine_mode
> +static fixed_size_mode
> msp430_get_raw_result_mode (int regno ATTRIBUTE_UNUSED)
> {
> return Pmode;
> Index: gcc/config/avr/avr-protos.h
> ===================================================================
> --- gcc/config/avr/avr-protos.h 2017-10-23 11:41:22.812366984 +0100
> +++ gcc/config/avr/avr-protos.h 2017-10-23 11:42:52.007721093 +0100
> @@ -132,7 +132,7 @@ extern bool avr_casei_sequence_check_ope
> static inline unsigned
> regmask (machine_mode mode, unsigned regno)
> {
> - return ((1u << GET_MODE_SIZE (mode)) - 1) << regno;
> + return ((1u << GET_MODE_SIZE (as_a <fixed_size_mode> (mode))) - 1) << regno;
> }
>
> extern void avr_fix_inputs (rtx*, unsigned, unsigned);
> Index: gcc/dbxout.c
> ===================================================================
> --- gcc/dbxout.c 2017-10-10 17:55:22.088175460 +0100
> +++ gcc/dbxout.c 2017-10-23 11:42:52.011721093 +0100
> @@ -3393,12 +3393,16 @@ dbxout_parms (tree parms)
> {
> ++debug_nesting;
> emit_pending_bincls_if_required ();
> + fixed_size_mode rtl_mode, type_mode;
>
> for (; parms; parms = DECL_CHAIN (parms))
> if (DECL_NAME (parms)
> && TREE_TYPE (parms) != error_mark_node
> && DECL_RTL_SET_P (parms)
> - && DECL_INCOMING_RTL (parms))
> + && DECL_INCOMING_RTL (parms)
> + /* We can't represent variable-sized types in this format. */
> + && is_a <fixed_size_mode> (TYPE_MODE (TREE_TYPE (parms)), &type_mode)
> + && is_a <fixed_size_mode> (GET_MODE (DECL_RTL (parms)), &rtl_mode))
> {
> tree eff_type;
> char letter;
> @@ -3555,10 +3559,9 @@ dbxout_parms (tree parms)
> /* Make a big endian correction if the mode of the type of the
> parameter is not the same as the mode of the rtl. */
> if (BYTES_BIG_ENDIAN
> - && TYPE_MODE (TREE_TYPE (parms)) != GET_MODE (DECL_RTL (parms))
> - && GET_MODE_SIZE (TYPE_MODE (TREE_TYPE (parms))) < UNITS_PER_WORD)
> - number += (GET_MODE_SIZE (GET_MODE (DECL_RTL (parms)))
> - - GET_MODE_SIZE (TYPE_MODE (TREE_TYPE (parms))));
> + && type_mode != rtl_mode
> + && GET_MODE_SIZE (type_mode) < UNITS_PER_WORD)
> + number += GET_MODE_SIZE (rtl_mode) - GET_MODE_SIZE (type_mode);
> }
> else
> /* ??? We don't know how to represent this argument. */
> Index: gcc/expr.c
> ===================================================================
> --- gcc/expr.c 2017-10-23 11:42:34.915720660 +0100
> +++ gcc/expr.c 2017-10-23 11:42:52.013721093 +0100
> @@ -2628,9 +2628,10 @@ copy_blkmode_from_reg (rtx target, rtx s
> rtx src = NULL, dst = NULL;
> unsigned HOST_WIDE_INT bitsize = MIN (TYPE_ALIGN (type), BITS_PER_WORD);
> unsigned HOST_WIDE_INT bitpos, xbitpos, padding_correction = 0;
> - machine_mode mode = GET_MODE (srcreg);
> - machine_mode tmode = GET_MODE (target);
> - machine_mode copy_mode;
> + /* No current ABI uses variable-sized modes to pass a BLKmode type. */
> + fixed_size_mode mode = as_a <fixed_size_mode> (GET_MODE (srcreg));
> + fixed_size_mode tmode = as_a <fixed_size_mode> (GET_MODE (target));
> + fixed_size_mode copy_mode;
>
> /* BLKmode registers created in the back-end shouldn't have survived. */
> gcc_assert (mode != BLKmode);
> @@ -2728,19 +2729,21 @@ copy_blkmode_from_reg (rtx target, rtx s
> }
> }
>
> -/* Copy BLKmode value SRC into a register of mode MODE. Return the
> +/* Copy BLKmode value SRC into a register of mode MODE_IN. Return the
> register if it contains any data, otherwise return null.
>
> This is used on targets that return BLKmode values in registers. */
>
> rtx
> -copy_blkmode_to_reg (machine_mode mode, tree src)
> +copy_blkmode_to_reg (machine_mode mode_in, tree src)
> {
> int i, n_regs;
> unsigned HOST_WIDE_INT bitpos, xbitpos, padding_correction = 0, bytes;
> unsigned int bitsize;
> rtx *dst_words, dst, x, src_word = NULL_RTX, dst_word = NULL_RTX;
> - machine_mode dst_mode;
> + /* No current ABI uses variable-sized modes to pass a BLKmode type. */
> + fixed_size_mode mode = as_a <fixed_size_mode> (mode_in);
> + fixed_size_mode dst_mode;
>
> gcc_assert (TYPE_MODE (TREE_TYPE (src)) == BLKmode);
>
> Index: gcc/gimple-ssa-store-merging.c
> ===================================================================
> --- gcc/gimple-ssa-store-merging.c 2017-10-09 11:50:52.446411111 +0100
> +++ gcc/gimple-ssa-store-merging.c 2017-10-23 11:42:52.014721093 +0100
> @@ -401,8 +401,11 @@ encode_tree_to_bitpos (tree expr, unsign
> The awkwardness comes from the fact that bitpos is counted from the
> most significant bit of a byte. */
>
> + /* We must be dealing with fixed-size data at this point, since the
> + total size is also fixed. */
> + fixed_size_mode mode = as_a <fixed_size_mode> (TYPE_MODE (TREE_TYPE (expr)));
> /* Allocate an extra byte so that we have space to shift into. */
> - unsigned int byte_size = GET_MODE_SIZE (TYPE_MODE (TREE_TYPE (expr))) + 1;
> + unsigned int byte_size = GET_MODE_SIZE (mode) + 1;
> unsigned char *tmpbuf = XALLOCAVEC (unsigned char, byte_size);
> memset (tmpbuf, '\0', byte_size);
> /* The store detection code should only have allowed constants that are
> Index: gcc/omp-low.c
> ===================================================================
> --- gcc/omp-low.c 2017-10-10 17:55:22.100175459 +0100
> +++ gcc/omp-low.c 2017-10-23 11:42:52.015721094 +0100
> @@ -5067,8 +5067,10 @@ lower_oacc_reductions (location_t loc, t
> v1 = v2 = v3 = var;
>
> /* Determine position in reduction buffer, which may be used
> - by target. */
> - machine_mode mode = TYPE_MODE (TREE_TYPE (var));
> + by target. The parser has ensured that this is not a
> + variable-sized type. */
> + fixed_size_mode mode
> + = as_a <fixed_size_mode> (TYPE_MODE (TREE_TYPE (var)));
> unsigned align = GET_MODE_ALIGNMENT (mode) / BITS_PER_UNIT;
> offset = (offset + align - 1) & ~(align - 1);
> tree off = build_int_cst (sizetype, offset);
> Index: gcc/simplify-rtx.c
> ===================================================================
> --- gcc/simplify-rtx.c 2017-10-23 11:41:41.550050496 +0100
> +++ gcc/simplify-rtx.c 2017-10-23 11:42:52.016721094 +0100
> @@ -48,8 +48,6 @@ #define HWI_SIGN_EXTEND(low) \
> static rtx neg_const_int (machine_mode, const_rtx);
> static bool plus_minus_operand_p (const_rtx);
> static rtx simplify_plus_minus (enum rtx_code, machine_mode, rtx, rtx);
> -static rtx simplify_immed_subreg (machine_mode, rtx, machine_mode,
> - unsigned int);
> static rtx simplify_associative_operation (enum rtx_code, machine_mode,
> rtx, rtx);
> static rtx simplify_relational_operation_1 (enum rtx_code, machine_mode,
> @@ -5802,8 +5800,8 @@ simplify_ternary_operation (enum rtx_cod
> and then repacking them again for OUTERMODE. */
>
> static rtx
> -simplify_immed_subreg (machine_mode outermode, rtx op,
> - machine_mode innermode, unsigned int byte)
> +simplify_immed_subreg (fixed_size_mode outermode, rtx op,
> + fixed_size_mode innermode, unsigned int byte)
> {
> enum {
> value_bit = 8,
> @@ -6171,7 +6169,18 @@ simplify_subreg (machine_mode outermode,
> || CONST_DOUBLE_AS_FLOAT_P (op)
> || GET_CODE (op) == CONST_FIXED
> || GET_CODE (op) == CONST_VECTOR)
> - return simplify_immed_subreg (outermode, op, innermode, byte);
> + {
> + /* simplify_immed_subreg deconstructs OP into bytes and constructs
> + the result from bytes, so it only works if the sizes of the modes
> + are known at compile time. Cases that apply to general modes
> + should be handled here before calling simplify_immed_subreg. */
> + fixed_size_mode fs_outermode, fs_innermode;
> + if (is_a <fixed_size_mode> (outermode, &fs_outermode)
> + && is_a <fixed_size_mode> (innermode, &fs_innermode))
> + return simplify_immed_subreg (fs_outermode, op, fs_innermode, byte);
> +
> + return NULL_RTX;
> + }
>
> /* Changing mode twice with SUBREG => just change it once,
> or not at all if changing back op starting mode. */
> Index: gcc/varasm.c
> ===================================================================
> --- gcc/varasm.c 2017-10-23 11:42:34.927720660 +0100
> +++ gcc/varasm.c 2017-10-23 11:42:52.018721094 +0100
> @@ -3584,7 +3584,7 @@ struct GTY((chain_next ("%h.next"), for_
> rtx constant;
> HOST_WIDE_INT offset;
> hashval_t hash;
> - machine_mode mode;
> + fixed_size_mode mode;
> unsigned int align;
> int labelno;
> int mark;
> @@ -3760,10 +3760,11 @@ simplify_subtraction (rtx x)
> }
>
> /* Given a constant rtx X, make (or find) a memory constant for its value
> - and return a MEM rtx to refer to it in memory. */
> + and return a MEM rtx to refer to it in memory. IN_MODE is the mode
> + of X. */
>
> rtx
> -force_const_mem (machine_mode mode, rtx x)
> +force_const_mem (machine_mode in_mode, rtx x)
> {
> struct constant_descriptor_rtx *desc, tmp;
> struct rtx_constant_pool *pool;
> @@ -3772,6 +3773,11 @@ force_const_mem (machine_mode mode, rtx
> hashval_t hash;
> unsigned int align;
> constant_descriptor_rtx **slot;
> + fixed_size_mode mode;
> +
> + /* We can't force variable-sized objects to memory. */
> + if (!is_a <fixed_size_mode> (in_mode, &mode))
> + return NULL_RTX;
>
> /* If we're not allowed to drop X into the constant pool, don't. */
> if (targetm.cannot_force_const_mem (mode, x))
> @@ -3881,7 +3887,7 @@ get_pool_constant_mark (rtx addr, bool *
>
> /* Similar, return the mode. */
>
> -machine_mode
> +fixed_size_mode
> get_pool_mode (const_rtx addr)
> {
> return SYMBOL_REF_CONSTANT (addr)->mode;
> @@ -3901,7 +3907,7 @@ constant_pool_empty_p (void)
> in MODE with known alignment ALIGN. */
>
> static void
> -output_constant_pool_2 (machine_mode mode, rtx x, unsigned int align)
> +output_constant_pool_2 (fixed_size_mode mode, rtx x, unsigned int align)
> {
> switch (GET_MODE_CLASS (mode))
> {
* Re: [12/nn] Add an is_narrower_int_mode helper function
2017-10-23 11:25 ` [12/nn] Add an is_narrower_int_mode helper function Richard Sandiford
@ 2017-10-26 11:59 ` Richard Biener
0 siblings, 0 replies; 90+ messages in thread
From: Richard Biener @ 2017-10-26 11:59 UTC (permalink / raw)
To: GCC Patches, Richard Sandiford
On Mon, Oct 23, 2017 at 1:24 PM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> This patch adds a function for testing whether an arbitrary mode X
> is an integer mode that is narrower than integer mode Y. This is
> useful for code like expand_float and expand_fix that could in
> principle handle vectors as well as scalars.
Ok.
Richard.
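[Editorial note: a minimal sketch of the predicate's logic, with made-up mode records rather than GCC's real mode tables: a mode qualifies only when it is a scalar integer mode and its precision is strictly below the limit's, so vector and float modes never qualify regardless of size.]

```cpp
#include <cassert>

// Hypothetical miniature of GCC's mode classification.
enum mode_class { MODE_INT, MODE_FLOAT, MODE_VECTOR_INT };

struct mode { mode_class cls; unsigned precision; };

bool is_narrower_int_mode (mode m, mode limit)
{
  // Non-integer modes (vectors, floats) never qualify, whatever their size.
  if (m.cls != MODE_INT)
    return false;
  // Strict comparison: a mode is not narrower than itself.
  return m.precision < limit.precision;
}
```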
>
> 2017-10-23 Richard Sandiford <richard.sandiford@linaro.org>
> Alan Hayward <alan.hayward@arm.com>
> David Sherwood <david.sherwood@arm.com>
>
> gcc/
> * machmode.h (is_narrower_int_mode): New function.
> * optabs.c (expand_float, expand_fix): Use it.
> * dwarf2out.c (rotate_loc_descriptor): Likewise.
>
> Index: gcc/machmode.h
> ===================================================================
> --- gcc/machmode.h 2017-10-23 11:44:06.561720156 +0100
> +++ gcc/machmode.h 2017-10-23 11:44:23.979432614 +0100
> @@ -893,6 +893,17 @@ is_complex_float_mode (machine_mode mode
> return false;
> }
>
> +/* Return true if MODE is a scalar integer mode with a precision
> + smaller than LIMIT's precision. */
> +
> +inline bool
> +is_narrower_int_mode (machine_mode mode, scalar_int_mode limit)
> +{
> + scalar_int_mode int_mode;
> + return (is_a <scalar_int_mode> (mode, &int_mode)
> + && GET_MODE_PRECISION (int_mode) < GET_MODE_PRECISION (limit));
> +}
> +
> namespace mode_iterator
> {
> /* Start mode iterator *ITER at the first mode in class MCLASS, if any. */
> Index: gcc/optabs.c
> ===================================================================
> --- gcc/optabs.c 2017-10-23 11:44:07.732431531 +0100
> +++ gcc/optabs.c 2017-10-23 11:44:23.980398548 +0100
> @@ -4820,7 +4820,7 @@ expand_float (rtx to, rtx from, int unsi
> rtx value;
> convert_optab tab = unsignedp ? ufloat_optab : sfloat_optab;
>
> - if (GET_MODE_PRECISION (GET_MODE (from)) < GET_MODE_PRECISION (SImode))
> + if (is_narrower_int_mode (GET_MODE (from), SImode))
> from = convert_to_mode (SImode, from, unsignedp);
>
> libfunc = convert_optab_libfunc (tab, GET_MODE (to), GET_MODE (from));
> @@ -5002,7 +5002,7 @@ expand_fix (rtx to, rtx from, int unsign
> that the mode of TO is at least as wide as SImode, since those are the
> only library calls we know about. */
>
> - if (GET_MODE_PRECISION (GET_MODE (to)) < GET_MODE_PRECISION (SImode))
> + if (is_narrower_int_mode (GET_MODE (to), SImode))
> {
> target = gen_reg_rtx (SImode);
>
> Index: gcc/dwarf2out.c
> ===================================================================
> --- gcc/dwarf2out.c 2017-10-23 11:44:05.684652559 +0100
> +++ gcc/dwarf2out.c 2017-10-23 11:44:23.979432614 +0100
> @@ -14530,8 +14530,7 @@ rotate_loc_descriptor (rtx rtl, scalar_i
> dw_loc_descr_ref op0, op1, ret, mask[2] = { NULL, NULL };
> int i;
>
> - if (GET_MODE (rtlop1) != VOIDmode
> - && GET_MODE_BITSIZE (GET_MODE (rtlop1)) < GET_MODE_BITSIZE (mode))
> + if (is_narrower_int_mode (GET_MODE (rtlop1), mode))
> rtlop1 = gen_rtx_ZERO_EXTEND (mode, rtlop1);
> op0 = mem_loc_descriptor (XEXP (rtl, 0), mode, mem_mode,
> VAR_INIT_STATUS_INITIALIZED);
* Re: [09/nn] Add a fixed_size_mode_pod class
2017-10-23 11:23 ` [09/nn] Add a fixed_size_mode_pod class Richard Sandiford
@ 2017-10-26 11:59 ` Richard Biener
2017-10-26 12:18 ` Richard Sandiford
0 siblings, 1 reply; 90+ messages in thread
From: Richard Biener @ 2017-10-26 11:59 UTC (permalink / raw)
To: GCC Patches, Richard Sandiford
On Mon, Oct 23, 2017 at 1:22 PM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> This patch adds a POD version of fixed_size_mode. The only current use
> is for storing the __builtin_apply and __builtin_result register modes,
> which were made fixed_size_modes by the previous patch.
Bah - can we update our host compiler to C++11/14 please ...?
(maybe requiring that the build works with GCC 4.8 as the host compiler;
GCC 4.3 has -std=c++0x, but I'm quite sure that's not enough).
Ok.
Thanks,
Richard.
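[Editorial note: a self-contained illustration of why a separate POD variant matters with a pre-C++11 host compiler: a class with user-provided constructors is not trivial, so it cannot sit in brace-initialized global state such as target_builtins, while a plain-struct stand-in can. The types below are hypothetical miniatures of pod_mode<fixed_size_mode> and fixed_size_mode, not the real ones.]

```cpp
#include <cassert>
#include <type_traits>

enum machine_mode { VOIDmode, SImode };

// Trivial layout: fine in POD globals and brace initializers.
struct fixed_size_mode_pod
{
  machine_mode m_mode;
  operator machine_mode () const { return m_mode; }
};

// Has user-provided constructors: not trivial, so unsuitable for such
// globals under the old host-compiler constraints.
class fixed_size_mode
{
public:
  fixed_size_mode (machine_mode m) : m_mode (m) {}
  operator machine_mode () const { return m_mode; }
private:
  machine_mode m_mode;
};
```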
>
> 2017-10-23 Richard Sandiford <richard.sandiford@linaro.org>
> Alan Hayward <alan.hayward@arm.com>
> David Sherwood <david.sherwood@arm.com>
>
> gcc/
> * coretypes.h (fixed_size_mode): Declare.
> (fixed_size_mode_pod): New typedef.
> * builtins.h (target_builtins::x_apply_args_mode)
> (target_builtins::x_apply_result_mode): Change type to
> fixed_size_mode_pod.
> * builtins.c (apply_args_size, apply_result_size, result_vector)
> (expand_builtin_apply_args_1, expand_builtin_apply)
> (expand_builtin_return): Update accordingly.
>
> Index: gcc/coretypes.h
> ===================================================================
> --- gcc/coretypes.h 2017-09-11 17:10:58.656085547 +0100
> +++ gcc/coretypes.h 2017-10-23 11:42:57.592545063 +0100
> @@ -59,6 +59,7 @@ typedef const struct rtx_def *const_rtx;
> class scalar_int_mode;
> class scalar_float_mode;
> class complex_mode;
> +class fixed_size_mode;
> template<typename> class opt_mode;
> typedef opt_mode<scalar_mode> opt_scalar_mode;
> typedef opt_mode<scalar_int_mode> opt_scalar_int_mode;
> @@ -66,6 +67,7 @@ typedef opt_mode<scalar_float_mode> opt_
> template<typename> class pod_mode;
> typedef pod_mode<scalar_mode> scalar_mode_pod;
> typedef pod_mode<scalar_int_mode> scalar_int_mode_pod;
> +typedef pod_mode<fixed_size_mode> fixed_size_mode_pod;
>
> /* Subclasses of rtx_def, using indentation to show the class
> hierarchy, along with the relevant invariant.
> Index: gcc/builtins.h
> ===================================================================
> --- gcc/builtins.h 2017-08-30 12:18:46.602740973 +0100
> +++ gcc/builtins.h 2017-10-23 11:42:57.592545063 +0100
> @@ -29,14 +29,14 @@ struct target_builtins {
> the register is not used for calling a function. If the machine
> has register windows, this gives only the outbound registers.
> INCOMING_REGNO gives the corresponding inbound register. */
> - machine_mode x_apply_args_mode[FIRST_PSEUDO_REGISTER];
> + fixed_size_mode_pod x_apply_args_mode[FIRST_PSEUDO_REGISTER];
>
> /* For each register that may be used for returning values, this gives
> a mode used to copy the register's value. VOIDmode indicates the
> register is not used for returning values. If the machine has
> register windows, this gives only the outbound registers.
> INCOMING_REGNO gives the corresponding inbound register. */
> - machine_mode x_apply_result_mode[FIRST_PSEUDO_REGISTER];
> + fixed_size_mode_pod x_apply_result_mode[FIRST_PSEUDO_REGISTER];
> };
>
> extern struct target_builtins default_target_builtins;
> Index: gcc/builtins.c
> ===================================================================
> --- gcc/builtins.c 2017-10-23 11:41:23.140260335 +0100
> +++ gcc/builtins.c 2017-10-23 11:42:57.592545063 +0100
> @@ -1358,7 +1358,6 @@ apply_args_size (void)
> static int size = -1;
> int align;
> unsigned int regno;
> - machine_mode mode;
>
> /* The values computed by this function never change. */
> if (size < 0)
> @@ -1374,7 +1373,7 @@ apply_args_size (void)
> for (regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
> if (FUNCTION_ARG_REGNO_P (regno))
> {
> - mode = targetm.calls.get_raw_arg_mode (regno);
> + fixed_size_mode mode = targetm.calls.get_raw_arg_mode (regno);
>
> gcc_assert (mode != VOIDmode);
>
> @@ -1386,7 +1385,7 @@ apply_args_size (void)
> }
> else
> {
> - apply_args_mode[regno] = VOIDmode;
> + apply_args_mode[regno] = as_a <fixed_size_mode> (VOIDmode);
> }
> }
> return size;
> @@ -1400,7 +1399,6 @@ apply_result_size (void)
> {
> static int size = -1;
> int align, regno;
> - machine_mode mode;
>
> /* The values computed by this function never change. */
> if (size < 0)
> @@ -1410,7 +1408,7 @@ apply_result_size (void)
> for (regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
> if (targetm.calls.function_value_regno_p (regno))
> {
> - mode = targetm.calls.get_raw_result_mode (regno);
> + fixed_size_mode mode = targetm.calls.get_raw_result_mode (regno);
>
> gcc_assert (mode != VOIDmode);
>
> @@ -1421,7 +1419,7 @@ apply_result_size (void)
> apply_result_mode[regno] = mode;
> }
> else
> - apply_result_mode[regno] = VOIDmode;
> + apply_result_mode[regno] = as_a <fixed_size_mode> (VOIDmode);
>
> /* Allow targets that use untyped_call and untyped_return to override
> the size so that machine-specific information can be stored here. */
> @@ -1440,7 +1438,7 @@ apply_result_size (void)
> result_vector (int savep, rtx result)
> {
> int regno, size, align, nelts;
> - machine_mode mode;
> + fixed_size_mode mode;
> rtx reg, mem;
> rtx *savevec = XALLOCAVEC (rtx, FIRST_PSEUDO_REGISTER);
>
> @@ -1469,7 +1467,7 @@ expand_builtin_apply_args_1 (void)
> {
> rtx registers, tem;
> int size, align, regno;
> - machine_mode mode;
> + fixed_size_mode mode;
> rtx struct_incoming_value = targetm.calls.struct_value_rtx (cfun ? TREE_TYPE (cfun->decl) : 0, 1);
>
> /* Create a block where the arg-pointer, structure value address,
> @@ -1573,7 +1571,7 @@ expand_builtin_apply_args (void)
> expand_builtin_apply (rtx function, rtx arguments, rtx argsize)
> {
> int size, align, regno;
> - machine_mode mode;
> + fixed_size_mode mode;
> rtx incoming_args, result, reg, dest, src;
> rtx_call_insn *call_insn;
> rtx old_stack_level = 0;
> @@ -1734,7 +1732,7 @@ expand_builtin_apply (rtx function, rtx
> expand_builtin_return (rtx result)
> {
> int size, align, regno;
> - machine_mode mode;
> + fixed_size_mode mode;
> rtx reg;
> rtx_insn *call_fusage = 0;
>
^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: [13/nn] More is_a <scalar_int_mode>
2017-10-23 11:25 ` [13/nn] More is_a <scalar_int_mode> Richard Sandiford
@ 2017-10-26 12:03 ` Richard Biener
0 siblings, 0 replies; 90+ messages in thread
From: Richard Biener @ 2017-10-26 12:03 UTC (permalink / raw)
To: GCC Patches, Richard Sandiford
On Mon, Oct 23, 2017 at 1:25 PM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> alias.c:find_base_term and find_base_value checked:
>
> if (GET_MODE_SIZE (GET_MODE (src)) < GET_MODE_SIZE (Pmode))
>
> but (a) comparing the precision seems more correct, since it's possible
> for modes to have the same memory size as Pmode but fewer bits and
> (b) the functions are called on arbitrary rtl, so there's no guarantee
> that we're handling an integer truncation.
>
> Since there's no point processing truncations of anything other than an
> integer, this patch checks that first.
Ok.
Richard.
>
> 2017-10-23 Richard Sandiford <richard.sandiford@linaro.org>
> Alan Hayward <alan.hayward@arm.com>
> David Sherwood <david.sherwood@arm.com>
>
> gcc/
> * alias.c (find_base_value, find_base_term): Only process integer
> truncations. Check the precision rather than the size.
>
> Index: gcc/alias.c
> ===================================================================
> --- gcc/alias.c 2017-10-23 11:41:25.511925516 +0100
> +++ gcc/alias.c 2017-10-23 11:44:27.544693078 +0100
> @@ -1349,6 +1349,7 @@ known_base_value_p (rtx x)
> find_base_value (rtx src)
> {
> unsigned int regno;
> + scalar_int_mode int_mode;
>
> #if defined (FIND_BASE_TERM)
> /* Try machine-dependent ways to find the base term. */
> @@ -1475,7 +1476,8 @@ find_base_value (rtx src)
> address modes depending on the address space. */
> if (!target_default_pointer_address_modes_p ())
> break;
> - if (GET_MODE_SIZE (GET_MODE (src)) < GET_MODE_SIZE (Pmode))
> + if (!is_a <scalar_int_mode> (GET_MODE (src), &int_mode)
> + || GET_MODE_PRECISION (int_mode) < GET_MODE_PRECISION (Pmode))
> break;
> /* Fall through. */
> case HIGH:
> @@ -1876,6 +1878,7 @@ find_base_term (rtx x)
> cselib_val *val;
> struct elt_loc_list *l, *f;
> rtx ret;
> + scalar_int_mode int_mode;
>
> #if defined (FIND_BASE_TERM)
> /* Try machine-dependent ways to find the base term. */
> @@ -1893,7 +1896,8 @@ find_base_term (rtx x)
> address modes depending on the address space. */
> if (!target_default_pointer_address_modes_p ())
> return 0;
> - if (GET_MODE_SIZE (GET_MODE (x)) < GET_MODE_SIZE (Pmode))
> + if (!is_a <scalar_int_mode> (GET_MODE (x), &int_mode)
> + || GET_MODE_PRECISION (int_mode) < GET_MODE_PRECISION (Pmode))
> return 0;
> /* Fall through. */
> case HIGH:
* Re: [14/nn] Add helpers for shift count modes
2017-10-23 11:26 ` [14/nn] Add helpers for shift count modes Richard Sandiford
@ 2017-10-26 12:07 ` Richard Biener
2017-10-26 12:07 ` Richard Biener
2017-10-30 15:03 ` Jeff Law
0 siblings, 2 replies; 90+ messages in thread
From: Richard Biener @ 2017-10-26 12:07 UTC (permalink / raw)
To: GCC Patches, Richard Sandiford
On Mon, Oct 23, 2017 at 1:25 PM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> This patch adds a stub helper routine to provide the mode
> of a scalar shift amount, given the mode of the values
> being shifted.
>
> One long-standing problem has been to decide what this mode
> should be for arbitrary rtxes (as opposed to those directly
> tied to a target pattern). Is it the mode of the shifted
> elements? Is it word_mode? Or maybe QImode? Is it whatever
> the corresponding target pattern says? (In which case what
> should the mode be when the target doesn't have a pattern?)
>
> For now the patch picks word_mode, which should be safe on
> all targets but could perhaps become suboptimal if the helper
> routine is used more often than it is in this patch. As it
> stands the patch does not change the generated code.
>
> The patch also adds a helper function that constructs rtxes
> for constant shift amounts, again given the mode of the value
> being shifted. As well as helping with the SVE patches, this
> is one step towards allowing CONST_INTs to have a real mode.
I think get_shift_amount_mode is flawed, and while encapsulating
constant shift amount RTX generation into gen_int_shift_amount
looks good to me, I'd rather have that ??? comment in this function
(and I'd use the mode of the RTX being shifted, not word_mode...).
In the end it's up to insn recognition to convert the operand to the
expected mode, and for generic RTL it's us who should decide
on the mode -- in GENERIC the shift amount has to be an
integer, so why not simply use a mode that is large enough to
make the constant fit?
Just throwing in some comments here, RTL isn't my primary
expertise.
Richard.
>
> 2017-10-23 Richard Sandiford <richard.sandiford@linaro.org>
> Alan Hayward <alan.hayward@arm.com>
> David Sherwood <david.sherwood@arm.com>
>
> gcc/
> * target.h (get_shift_amount_mode): New function.
> * emit-rtl.h (gen_int_shift_amount): Declare.
> * emit-rtl.c (gen_int_shift_amount): New function.
> * asan.c (asan_emit_stack_protection): Use gen_int_shift_amount
> instead of GEN_INT.
> * calls.c (shift_return_value): Likewise.
> * cse.c (fold_rtx): Likewise.
> * dse.c (find_shift_sequence): Likewise.
> * expmed.c (init_expmed_one_mode, store_bit_field_1, expand_shift_1)
> (expand_shift, expand_smod_pow2): Likewise.
> * lower-subreg.c (shift_cost): Likewise.
> * simplify-rtx.c (simplify_unary_operation_1): Likewise.
> (simplify_binary_operation_1): Likewise.
> * combine.c (try_combine, find_split_point, force_int_to_mode)
> (simplify_shift_const_1, simplify_shift_const): Likewise.
> (change_zero_ext): Likewise. Use simplify_gen_binary.
> * optabs.c (expand_superword_shift, expand_doubleword_mult)
> (expand_unop): Use gen_int_shift_amount instead of GEN_INT.
> (expand_binop): Likewise. Use get_shift_amount_mode instead
> of word_mode as the mode of a CONST_INT shift amount.
> (shift_amt_for_vec_perm_mask): Add a machine_mode argument.
> Use gen_int_shift_amount instead of GEN_INT.
> (expand_vec_perm): Update caller accordingly. Use
> gen_int_shift_amount instead of GEN_INT.
>
> Index: gcc/target.h
> ===================================================================
> --- gcc/target.h 2017-10-23 11:47:06.643477568 +0100
> +++ gcc/target.h 2017-10-23 11:47:11.277288162 +0100
> @@ -209,6 +209,17 @@ #define HOOKSTRUCT(FRAGMENT) FRAGMENT
>
> extern struct gcc_target targetm;
>
> +/* Return the mode that should be used to hold a scalar shift amount
> + when shifting values of the given mode. */
> +/* ??? This could in principle be generated automatically from the .md
> + shift patterns, but for now word_mode should be universally OK. */
> +
> +inline scalar_int_mode
> +get_shift_amount_mode (machine_mode)
> +{
> + return word_mode;
> +}
> +
> #ifdef GCC_TM_H
>
> #ifndef CUMULATIVE_ARGS_MAGIC
> Index: gcc/emit-rtl.h
> ===================================================================
> --- gcc/emit-rtl.h 2017-10-23 11:47:06.643477568 +0100
> +++ gcc/emit-rtl.h 2017-10-23 11:47:11.274393237 +0100
> @@ -369,6 +369,7 @@ extern void set_reg_attrs_for_parm (rtx,
> extern void set_reg_attrs_for_decl_rtl (tree t, rtx x);
> extern void adjust_reg_mode (rtx, machine_mode);
> extern int mem_expr_equal_p (const_tree, const_tree);
> +extern rtx gen_int_shift_amount (machine_mode, HOST_WIDE_INT);
>
> extern bool need_atomic_barrier_p (enum memmodel, bool);
>
> Index: gcc/emit-rtl.c
> ===================================================================
> --- gcc/emit-rtl.c 2017-10-23 11:47:06.643477568 +0100
> +++ gcc/emit-rtl.c 2017-10-23 11:47:11.273428262 +0100
> @@ -6478,6 +6478,15 @@ need_atomic_barrier_p (enum memmodel mod
> }
> }
>
> +/* Return a constant shift amount for shifting a value of mode MODE
> + by VALUE bits. */
> +
> +rtx
> +gen_int_shift_amount (machine_mode mode, HOST_WIDE_INT value)
> +{
> + return gen_int_mode (value, get_shift_amount_mode (mode));
> +}
> +
> /* Initialize fields of rtl_data related to stack alignment. */
>
> void
> Index: gcc/asan.c
> ===================================================================
> --- gcc/asan.c 2017-10-23 11:47:06.643477568 +0100
> +++ gcc/asan.c 2017-10-23 11:47:11.270533336 +0100
> @@ -1388,7 +1388,7 @@ asan_emit_stack_protection (rtx base, rt
> TREE_ASM_WRITTEN (id) = 1;
> emit_move_insn (mem, expand_normal (build_fold_addr_expr (decl)));
> shadow_base = expand_binop (Pmode, lshr_optab, base,
> - GEN_INT (ASAN_SHADOW_SHIFT),
> + gen_int_shift_amount (Pmode, ASAN_SHADOW_SHIFT),
> NULL_RTX, 1, OPTAB_DIRECT);
> shadow_base
> = plus_constant (Pmode, shadow_base,
> Index: gcc/calls.c
> ===================================================================
> --- gcc/calls.c 2017-10-23 11:47:06.643477568 +0100
> +++ gcc/calls.c 2017-10-23 11:47:11.270533336 +0100
> @@ -2749,15 +2749,17 @@ shift_return_value (machine_mode mode, b
> HOST_WIDE_INT shift;
>
> gcc_assert (REG_P (value) && HARD_REGISTER_P (value));
> - shift = GET_MODE_BITSIZE (GET_MODE (value)) - GET_MODE_BITSIZE (mode);
> + machine_mode value_mode = GET_MODE (value);
> + shift = GET_MODE_BITSIZE (value_mode) - GET_MODE_BITSIZE (mode);
> if (shift == 0)
> return false;
>
> /* Use ashr rather than lshr for right shifts. This is for the benefit
> of the MIPS port, which requires SImode values to be sign-extended
> when stored in 64-bit registers. */
> - if (!force_expand_binop (GET_MODE (value), left_p ? ashl_optab : ashr_optab,
> - value, GEN_INT (shift), value, 1, OPTAB_WIDEN))
> + if (!force_expand_binop (value_mode, left_p ? ashl_optab : ashr_optab,
> + value, gen_int_shift_amount (value_mode, shift),
> + value, 1, OPTAB_WIDEN))
> gcc_unreachable ();
> return true;
> }
> Index: gcc/cse.c
> ===================================================================
> --- gcc/cse.c 2017-10-23 11:47:03.707058235 +0100
> +++ gcc/cse.c 2017-10-23 11:47:11.273428262 +0100
> @@ -3611,9 +3611,9 @@ fold_rtx (rtx x, rtx_insn *insn)
> || INTVAL (const_arg1) < 0))
> {
> if (SHIFT_COUNT_TRUNCATED)
> - canon_const_arg1 = GEN_INT (INTVAL (const_arg1)
> - & (GET_MODE_UNIT_BITSIZE (mode)
> - - 1));
> + canon_const_arg1 = gen_int_shift_amount
> + (mode, (INTVAL (const_arg1)
> + & (GET_MODE_UNIT_BITSIZE (mode) - 1)));
> else
> break;
> }
> @@ -3660,9 +3660,9 @@ fold_rtx (rtx x, rtx_insn *insn)
> || INTVAL (inner_const) < 0))
> {
> if (SHIFT_COUNT_TRUNCATED)
> - inner_const = GEN_INT (INTVAL (inner_const)
> - & (GET_MODE_UNIT_BITSIZE (mode)
> - - 1));
> + inner_const = gen_int_shift_amount
> + (mode, (INTVAL (inner_const)
> + & (GET_MODE_UNIT_BITSIZE (mode) - 1)));
> else
> break;
> }
> @@ -3692,7 +3692,8 @@ fold_rtx (rtx x, rtx_insn *insn)
> /* As an exception, we can turn an ASHIFTRT of this
> form into a shift of the number of bits - 1. */
> if (code == ASHIFTRT)
> - new_const = GEN_INT (GET_MODE_UNIT_BITSIZE (mode) - 1);
> + new_const = gen_int_shift_amount
> + (mode, GET_MODE_UNIT_BITSIZE (mode) - 1);
> else if (!side_effects_p (XEXP (y, 0)))
> return CONST0_RTX (mode);
> else
> Index: gcc/dse.c
> ===================================================================
> --- gcc/dse.c 2017-10-23 11:47:06.643477568 +0100
> +++ gcc/dse.c 2017-10-23 11:47:11.273428262 +0100
> @@ -1605,8 +1605,9 @@ find_shift_sequence (int access_size,
> store_mode, byte);
> if (ret && CONSTANT_P (ret))
> {
> + rtx shift_rtx = gen_int_shift_amount (new_mode, shift);
> ret = simplify_const_binary_operation (LSHIFTRT, new_mode,
> - ret, GEN_INT (shift));
> + ret, shift_rtx);
> if (ret && CONSTANT_P (ret))
> {
> byte = subreg_lowpart_offset (read_mode, new_mode);
> @@ -1642,7 +1643,8 @@ find_shift_sequence (int access_size,
> of one dsp where the cost of these two was not the same. But
> this really is a rare case anyway. */
> target = expand_binop (new_mode, lshr_optab, new_reg,
> - GEN_INT (shift), new_reg, 1, OPTAB_DIRECT);
> + gen_int_shift_amount (new_mode, shift),
> + new_reg, 1, OPTAB_DIRECT);
>
> shift_seq = get_insns ();
> end_sequence ();
> Index: gcc/expmed.c
> ===================================================================
> --- gcc/expmed.c 2017-10-23 11:47:06.643477568 +0100
> +++ gcc/expmed.c 2017-10-23 11:47:11.274393237 +0100
> @@ -222,7 +222,8 @@ init_expmed_one_mode (struct init_expmed
> PUT_MODE (all->zext, wider_mode);
> PUT_MODE (all->wide_mult, wider_mode);
> PUT_MODE (all->wide_lshr, wider_mode);
> - XEXP (all->wide_lshr, 1) = GEN_INT (mode_bitsize);
> + XEXP (all->wide_lshr, 1)
> + = gen_int_shift_amount (wider_mode, mode_bitsize);
>
> set_mul_widen_cost (speed, wider_mode,
> set_src_cost (all->wide_mult, wider_mode, speed));
> @@ -908,12 +909,14 @@ store_bit_field_1 (rtx str_rtx, unsigned
> to make sure that for big-endian machines the higher order
> bits are used. */
> if (new_bitsize < BITS_PER_WORD && BYTES_BIG_ENDIAN && !backwards)
> - value_word = simplify_expand_binop (word_mode, lshr_optab,
> - value_word,
> - GEN_INT (BITS_PER_WORD
> - - new_bitsize),
> - NULL_RTX, true,
> - OPTAB_LIB_WIDEN);
> + {
> + int shift = BITS_PER_WORD - new_bitsize;
> + rtx shift_rtx = gen_int_shift_amount (word_mode, shift);
> + value_word = simplify_expand_binop (word_mode, lshr_optab,
> + value_word, shift_rtx,
> + NULL_RTX, true,
> + OPTAB_LIB_WIDEN);
> + }
>
> if (!store_bit_field_1 (op0, new_bitsize,
> bitnum + bit_offset,
> @@ -2366,8 +2369,9 @@ expand_shift_1 (enum tree_code code, mac
> if (CONST_INT_P (op1)
> && ((unsigned HOST_WIDE_INT) INTVAL (op1) >=
> (unsigned HOST_WIDE_INT) GET_MODE_BITSIZE (scalar_mode)))
> - op1 = GEN_INT ((unsigned HOST_WIDE_INT) INTVAL (op1)
> - % GET_MODE_BITSIZE (scalar_mode));
> + op1 = gen_int_shift_amount (mode,
> + (unsigned HOST_WIDE_INT) INTVAL (op1)
> + % GET_MODE_BITSIZE (scalar_mode));
> else if (GET_CODE (op1) == SUBREG
> && subreg_lowpart_p (op1)
> && SCALAR_INT_MODE_P (GET_MODE (SUBREG_REG (op1)))
> @@ -2384,7 +2388,8 @@ expand_shift_1 (enum tree_code code, mac
> && IN_RANGE (INTVAL (op1), GET_MODE_BITSIZE (scalar_mode) / 2 + left,
> GET_MODE_BITSIZE (scalar_mode) - 1))
> {
> - op1 = GEN_INT (GET_MODE_BITSIZE (scalar_mode) - INTVAL (op1));
> + op1 = gen_int_shift_amount (mode, (GET_MODE_BITSIZE (scalar_mode)
> + - INTVAL (op1)));
> left = !left;
> code = left ? LROTATE_EXPR : RROTATE_EXPR;
> }
> @@ -2464,8 +2469,8 @@ expand_shift_1 (enum tree_code code, mac
> if (op1 == const0_rtx)
> return shifted;
> else if (CONST_INT_P (op1))
> - other_amount = GEN_INT (GET_MODE_BITSIZE (scalar_mode)
> - - INTVAL (op1));
> + other_amount = gen_int_shift_amount
> + (mode, GET_MODE_BITSIZE (scalar_mode) - INTVAL (op1));
> else
> {
> other_amount
> @@ -2538,8 +2543,9 @@ expand_shift_1 (enum tree_code code, mac
> expand_shift (enum tree_code code, machine_mode mode, rtx shifted,
> int amount, rtx target, int unsignedp)
> {
> - return expand_shift_1 (code, mode,
> - shifted, GEN_INT (amount), target, unsignedp);
> + return expand_shift_1 (code, mode, shifted,
> + gen_int_shift_amount (mode, amount),
> + target, unsignedp);
> }
>
> /* Likewise, but return 0 if that cannot be done. */
> @@ -3855,7 +3861,7 @@ expand_smod_pow2 (scalar_int_mode mode,
> {
> HOST_WIDE_INT masklow = (HOST_WIDE_INT_1 << logd) - 1;
> signmask = force_reg (mode, signmask);
> - shift = GEN_INT (GET_MODE_BITSIZE (mode) - logd);
> + shift = gen_int_shift_amount (mode, GET_MODE_BITSIZE (mode) - logd);
>
> /* Use the rtx_cost of a LSHIFTRT instruction to determine
> which instruction sequence to use. If logical right shifts
> Index: gcc/lower-subreg.c
> ===================================================================
> --- gcc/lower-subreg.c 2017-10-23 11:47:06.643477568 +0100
> +++ gcc/lower-subreg.c 2017-10-23 11:47:11.274393237 +0100
> @@ -129,7 +129,7 @@ shift_cost (bool speed_p, struct cost_rt
> PUT_CODE (rtxes->shift, code);
> PUT_MODE (rtxes->shift, mode);
> PUT_MODE (rtxes->source, mode);
> - XEXP (rtxes->shift, 1) = GEN_INT (op1);
> + XEXP (rtxes->shift, 1) = gen_int_shift_amount (mode, op1);
> return set_src_cost (rtxes->shift, mode, speed_p);
> }
>
> Index: gcc/simplify-rtx.c
> ===================================================================
> --- gcc/simplify-rtx.c 2017-10-23 11:47:06.643477568 +0100
> +++ gcc/simplify-rtx.c 2017-10-23 11:47:11.277288162 +0100
> @@ -1165,7 +1165,8 @@ simplify_unary_operation_1 (enum rtx_cod
> if (STORE_FLAG_VALUE == 1)
> {
> temp = simplify_gen_binary (ASHIFTRT, inner, XEXP (op, 0),
> - GEN_INT (isize - 1));
> + gen_int_shift_amount (inner,
> + isize - 1));
> if (int_mode == inner)
> return temp;
> if (GET_MODE_PRECISION (int_mode) > isize)
> @@ -1175,7 +1176,8 @@ simplify_unary_operation_1 (enum rtx_cod
> else if (STORE_FLAG_VALUE == -1)
> {
> temp = simplify_gen_binary (LSHIFTRT, inner, XEXP (op, 0),
> - GEN_INT (isize - 1));
> + gen_int_shift_amount (inner,
> + isize - 1));
> if (int_mode == inner)
> return temp;
> if (GET_MODE_PRECISION (int_mode) > isize)
> @@ -2679,7 +2681,8 @@ simplify_binary_operation_1 (enum rtx_co
> {
> val = wi::exact_log2 (rtx_mode_t (trueop1, mode));
> if (val >= 0)
> - return simplify_gen_binary (ASHIFT, mode, op0, GEN_INT (val));
> + return simplify_gen_binary (ASHIFT, mode, op0,
> + gen_int_shift_amount (mode, val));
> }
>
> /* x*2 is x+x and x*(-1) is -x */
> @@ -3303,7 +3306,8 @@ simplify_binary_operation_1 (enum rtx_co
> /* Convert divide by power of two into shift. */
> if (CONST_INT_P (trueop1)
> && (val = exact_log2 (UINTVAL (trueop1))) > 0)
> - return simplify_gen_binary (LSHIFTRT, mode, op0, GEN_INT (val));
> + return simplify_gen_binary (LSHIFTRT, mode, op0,
> + gen_int_shift_amount (mode, val));
> break;
>
> case DIV:
> @@ -3423,10 +3427,12 @@ simplify_binary_operation_1 (enum rtx_co
> && IN_RANGE (INTVAL (trueop1),
> GET_MODE_UNIT_PRECISION (mode) / 2 + (code == ROTATE),
> GET_MODE_UNIT_PRECISION (mode) - 1))
> - return simplify_gen_binary (code == ROTATE ? ROTATERT : ROTATE,
> - mode, op0,
> - GEN_INT (GET_MODE_UNIT_PRECISION (mode)
> - - INTVAL (trueop1)));
> + {
> + int new_amount = GET_MODE_UNIT_PRECISION (mode) - INTVAL (trueop1);
> + rtx new_amount_rtx = gen_int_shift_amount (mode, new_amount);
> + return simplify_gen_binary (code == ROTATE ? ROTATERT : ROTATE,
> + mode, op0, new_amount_rtx);
> + }
> #endif
> /* FALLTHRU */
> case ASHIFTRT:
> @@ -3466,8 +3472,8 @@ simplify_binary_operation_1 (enum rtx_co
> == GET_MODE_BITSIZE (inner_mode) - GET_MODE_BITSIZE (int_mode))
> && subreg_lowpart_p (op0))
> {
> - rtx tmp = GEN_INT (INTVAL (XEXP (SUBREG_REG (op0), 1))
> - + INTVAL (op1));
> + rtx tmp = gen_int_shift_amount
> + (inner_mode, INTVAL (XEXP (SUBREG_REG (op0), 1)) + INTVAL (op1));
> tmp = simplify_gen_binary (code, inner_mode,
> XEXP (SUBREG_REG (op0), 0),
> tmp);
> @@ -3478,7 +3484,8 @@ simplify_binary_operation_1 (enum rtx_co
> {
> val = INTVAL (op1) & (GET_MODE_UNIT_PRECISION (mode) - 1);
> if (val != INTVAL (op1))
> - return simplify_gen_binary (code, mode, op0, GEN_INT (val));
> + return simplify_gen_binary (code, mode, op0,
> + gen_int_shift_amount (mode, val));
> }
> break;
>
> Index: gcc/combine.c
> ===================================================================
> --- gcc/combine.c 2017-10-23 11:47:06.643477568 +0100
> +++ gcc/combine.c 2017-10-23 11:47:11.272463287 +0100
> @@ -3773,8 +3773,9 @@ try_combine (rtx_insn *i3, rtx_insn *i2,
> && INTVAL (XEXP (*split, 1)) > 0
> && (i = exact_log2 (UINTVAL (XEXP (*split, 1)))) >= 0)
> {
> + rtx i_rtx = gen_int_shift_amount (split_mode, i);
> SUBST (*split, gen_rtx_ASHIFT (split_mode,
> - XEXP (*split, 0), GEN_INT (i)));
> + XEXP (*split, 0), i_rtx));
> /* Update split_code because we may not have a multiply
> anymore. */
> split_code = GET_CODE (*split);
> @@ -3788,8 +3789,10 @@ try_combine (rtx_insn *i3, rtx_insn *i2,
> && (i = exact_log2 (UINTVAL (XEXP (XEXP (*split, 0), 1)))) >= 0)
> {
> rtx nsplit = XEXP (*split, 0);
> + rtx i_rtx = gen_int_shift_amount (GET_MODE (nsplit), i);
> SUBST (XEXP (*split, 0), gen_rtx_ASHIFT (GET_MODE (nsplit),
> - XEXP (nsplit, 0), GEN_INT (i)));
> + XEXP (nsplit, 0),
> + i_rtx));
> /* Update split_code because we may not have a multiply
> anymore. */
> split_code = GET_CODE (*split);
> @@ -5057,12 +5060,12 @@ find_split_point (rtx *loc, rtx_insn *in
> GET_MODE (XEXP (SET_SRC (x), 0))))))
> {
> machine_mode mode = GET_MODE (XEXP (SET_SRC (x), 0));
> -
> + rtx pos_rtx = gen_int_shift_amount (mode, pos);
> SUBST (SET_SRC (x),
> gen_rtx_NEG (mode,
> gen_rtx_LSHIFTRT (mode,
> XEXP (SET_SRC (x), 0),
> - GEN_INT (pos))));
> + pos_rtx)));
>
> split = find_split_point (&SET_SRC (x), insn, true);
> if (split && split != &SET_SRC (x))
> @@ -5120,11 +5123,11 @@ find_split_point (rtx *loc, rtx_insn *in
> {
> unsigned HOST_WIDE_INT mask
> = (HOST_WIDE_INT_1U << len) - 1;
> + rtx pos_rtx = gen_int_shift_amount (mode, pos);
> SUBST (SET_SRC (x),
> gen_rtx_AND (mode,
> gen_rtx_LSHIFTRT
> - (mode, gen_lowpart (mode, inner),
> - GEN_INT (pos)),
> + (mode, gen_lowpart (mode, inner), pos_rtx),
> gen_int_mode (mask, mode)));
>
> split = find_split_point (&SET_SRC (x), insn, true);
> @@ -5133,14 +5136,15 @@ find_split_point (rtx *loc, rtx_insn *in
> }
> else
> {
> + int left_bits = GET_MODE_PRECISION (mode) - len - pos;
> + int right_bits = GET_MODE_PRECISION (mode) - len;
> SUBST (SET_SRC (x),
> gen_rtx_fmt_ee
> (unsignedp ? LSHIFTRT : ASHIFTRT, mode,
> gen_rtx_ASHIFT (mode,
> gen_lowpart (mode, inner),
> - GEN_INT (GET_MODE_PRECISION (mode)
> - - len - pos)),
> - GEN_INT (GET_MODE_PRECISION (mode) - len)));
> + gen_int_shift_amount (mode, left_bits)),
> + gen_int_shift_amount (mode, right_bits)));
>
> split = find_split_point (&SET_SRC (x), insn, true);
> if (split && split != &SET_SRC (x))
> @@ -8915,10 +8919,11 @@ force_int_to_mode (rtx x, scalar_int_mod
> /* Must be more sign bit copies than the mask needs. */
> && ((int) num_sign_bit_copies (XEXP (x, 0), GET_MODE (XEXP (x, 0)))
> >= exact_log2 (mask + 1)))
> - x = simplify_gen_binary (LSHIFTRT, xmode, XEXP (x, 0),
> - GEN_INT (GET_MODE_PRECISION (xmode)
> - - exact_log2 (mask + 1)));
> -
> + {
> + int nbits = GET_MODE_PRECISION (xmode) - exact_log2 (mask + 1);
> + x = simplify_gen_binary (LSHIFTRT, xmode, XEXP (x, 0),
> + gen_int_shift_amount (xmode, nbits));
> + }
> goto shiftrt;
>
> case ASHIFTRT:
> @@ -10415,7 +10420,7 @@ simplify_shift_const_1 (enum rtx_code co
> {
> enum rtx_code orig_code = code;
> rtx orig_varop = varop;
> - int count;
> + int count, log2;
> machine_mode mode = result_mode;
> machine_mode shift_mode;
> scalar_int_mode tmode, inner_mode, int_mode, int_varop_mode, int_result_mode;
> @@ -10618,13 +10623,11 @@ simplify_shift_const_1 (enum rtx_code co
> is cheaper. But it is still better on those machines to
> merge two shifts into one. */
> if (CONST_INT_P (XEXP (varop, 1))
> - && exact_log2 (UINTVAL (XEXP (varop, 1))) >= 0)
> + && (log2 = exact_log2 (UINTVAL (XEXP (varop, 1)))) >= 0)
> {
> - varop
> - = simplify_gen_binary (ASHIFT, GET_MODE (varop),
> - XEXP (varop, 0),
> - GEN_INT (exact_log2 (
> - UINTVAL (XEXP (varop, 1)))));
> + rtx log2_rtx = gen_int_shift_amount (GET_MODE (varop), log2);
> + varop = simplify_gen_binary (ASHIFT, GET_MODE (varop),
> + XEXP (varop, 0), log2_rtx);
> continue;
> }
> break;
> @@ -10632,13 +10635,11 @@ simplify_shift_const_1 (enum rtx_code co
> case UDIV:
> /* Similar, for when divides are cheaper. */
> if (CONST_INT_P (XEXP (varop, 1))
> - && exact_log2 (UINTVAL (XEXP (varop, 1))) >= 0)
> + && (log2 = exact_log2 (UINTVAL (XEXP (varop, 1)))) >= 0)
> {
> - varop
> - = simplify_gen_binary (LSHIFTRT, GET_MODE (varop),
> - XEXP (varop, 0),
> - GEN_INT (exact_log2 (
> - UINTVAL (XEXP (varop, 1)))));
> + rtx log2_rtx = gen_int_shift_amount (GET_MODE (varop), log2);
> + varop = simplify_gen_binary (LSHIFTRT, GET_MODE (varop),
> + XEXP (varop, 0), log2_rtx);
> continue;
> }
> break;
> @@ -10773,10 +10774,10 @@ simplify_shift_const_1 (enum rtx_code co
>
> mask_rtx = gen_int_mode (nonzero_bits (varop, int_varop_mode),
> int_result_mode);
> -
> + rtx count_rtx = gen_int_shift_amount (int_result_mode, count);
> mask_rtx
> = simplify_const_binary_operation (code, int_result_mode,
> - mask_rtx, GEN_INT (count));
> + mask_rtx, count_rtx);
>
> /* Give up if we can't compute an outer operation to use. */
> if (mask_rtx == 0
> @@ -10832,9 +10833,10 @@ simplify_shift_const_1 (enum rtx_code co
> if (code == ASHIFTRT && int_mode != int_result_mode)
> break;
>
> + rtx count_rtx = gen_int_shift_amount (int_result_mode, count);
> rtx new_rtx = simplify_const_binary_operation (code, int_mode,
> XEXP (varop, 0),
> - GEN_INT (count));
> + count_rtx);
> varop = gen_rtx_fmt_ee (code, int_mode, new_rtx, XEXP (varop, 1));
> count = 0;
> continue;
> @@ -10900,7 +10902,7 @@ simplify_shift_const_1 (enum rtx_code co
> && (new_rtx = simplify_const_binary_operation
> (code, int_result_mode,
> gen_int_mode (INTVAL (XEXP (varop, 1)), int_result_mode),
> - GEN_INT (count))) != 0
> + gen_int_shift_amount (int_result_mode, count))) != 0
> && CONST_INT_P (new_rtx)
> && merge_outer_ops (&outer_op, &outer_const, GET_CODE (varop),
> INTVAL (new_rtx), int_result_mode,
> @@ -11043,7 +11045,7 @@ simplify_shift_const_1 (enum rtx_code co
> && (new_rtx = simplify_const_binary_operation
> (ASHIFT, int_result_mode,
> gen_int_mode (INTVAL (XEXP (varop, 1)), int_result_mode),
> - GEN_INT (count))) != 0
> + gen_int_shift_amount (int_result_mode, count))) != 0
> && CONST_INT_P (new_rtx)
> && merge_outer_ops (&outer_op, &outer_const, PLUS,
> INTVAL (new_rtx), int_result_mode,
> @@ -11064,7 +11066,7 @@ simplify_shift_const_1 (enum rtx_code co
> && (new_rtx = simplify_const_binary_operation
> (code, int_result_mode,
> gen_int_mode (INTVAL (XEXP (varop, 1)), int_result_mode),
> - GEN_INT (count))) != 0
> + gen_int_shift_amount (int_result_mode, count))) != 0
> && CONST_INT_P (new_rtx)
> && merge_outer_ops (&outer_op, &outer_const, XOR,
> INTVAL (new_rtx), int_result_mode,
> @@ -11119,12 +11121,12 @@ simplify_shift_const_1 (enum rtx_code co
> - GET_MODE_UNIT_PRECISION (GET_MODE (varop)))))
> {
> rtx varop_inner = XEXP (varop, 0);
> -
> - varop_inner
> - = gen_rtx_LSHIFTRT (GET_MODE (varop_inner),
> - XEXP (varop_inner, 0),
> - GEN_INT
> - (count + INTVAL (XEXP (varop_inner, 1))));
> + int new_count = count + INTVAL (XEXP (varop_inner, 1));
> + rtx new_count_rtx = gen_int_shift_amount (GET_MODE (varop_inner),
> + new_count);
> + varop_inner = gen_rtx_LSHIFTRT (GET_MODE (varop_inner),
> + XEXP (varop_inner, 0),
> + new_count_rtx);
> varop = gen_rtx_TRUNCATE (GET_MODE (varop), varop_inner);
> count = 0;
> continue;
> @@ -11176,7 +11178,8 @@ simplify_shift_const_1 (enum rtx_code co
> x = NULL_RTX;
>
> if (x == NULL_RTX)
> - x = simplify_gen_binary (code, shift_mode, varop, GEN_INT (count));
> + x = simplify_gen_binary (code, shift_mode, varop,
> + gen_int_shift_amount (shift_mode, count));
>
> /* If we were doing an LSHIFTRT in a wider mode than it was originally,
> turn off all the bits that the shift would have turned off. */
> @@ -11238,7 +11241,8 @@ simplify_shift_const (rtx x, enum rtx_co
> return tem;
>
> if (!x)
> - x = simplify_gen_binary (code, GET_MODE (varop), varop, GEN_INT (count));
> + x = simplify_gen_binary (code, GET_MODE (varop), varop,
> + gen_int_shift_amount (GET_MODE (varop), count));
> if (GET_MODE (x) != result_mode)
> x = gen_lowpart (result_mode, x);
> return x;
> @@ -11429,8 +11433,9 @@ change_zero_ext (rtx pat)
> if (BITS_BIG_ENDIAN)
> start = GET_MODE_PRECISION (inner_mode) - size - start;
>
> - if (start)
> - x = gen_rtx_LSHIFTRT (inner_mode, XEXP (x, 0), GEN_INT (start));
> + if (start != 0)
> + x = gen_rtx_LSHIFTRT (inner_mode, XEXP (x, 0),
> + gen_int_shift_amount (inner_mode, start));
> else
> x = XEXP (x, 0);
> if (mode != inner_mode)
> Index: gcc/optabs.c
> ===================================================================
> --- gcc/optabs.c 2017-10-23 11:47:06.643477568 +0100
> +++ gcc/optabs.c 2017-10-23 11:47:11.276323187 +0100
> @@ -431,8 +431,9 @@ expand_superword_shift (optab binoptab,
> if (binoptab != ashr_optab)
> emit_move_insn (outof_target, CONST0_RTX (word_mode));
> else
> - if (!force_expand_binop (word_mode, binoptab,
> - outof_input, GEN_INT (BITS_PER_WORD - 1),
> + if (!force_expand_binop (word_mode, binoptab, outof_input,
> + gen_int_shift_amount (word_mode,
> + BITS_PER_WORD - 1),
> outof_target, unsignedp, methods))
> return false;
> }
> @@ -789,7 +790,8 @@ expand_doubleword_mult (machine_mode mod
> {
> int low = (WORDS_BIG_ENDIAN ? 1 : 0);
> int high = (WORDS_BIG_ENDIAN ? 0 : 1);
> - rtx wordm1 = umulp ? NULL_RTX : GEN_INT (BITS_PER_WORD - 1);
> + rtx wordm1 = (umulp ? NULL_RTX
> + : gen_int_shift_amount (word_mode, BITS_PER_WORD - 1));
> rtx product, adjust, product_high, temp;
>
> rtx op0_high = operand_subword_force (op0, high, mode);
> @@ -1185,7 +1187,7 @@ expand_binop (machine_mode mode, optab b
> unsigned int bits = GET_MODE_PRECISION (int_mode);
>
> if (CONST_INT_P (op1))
> - newop1 = GEN_INT (bits - INTVAL (op1));
> + newop1 = gen_int_shift_amount (int_mode, bits - INTVAL (op1));
> else if (targetm.shift_truncation_mask (int_mode) == bits - 1)
> newop1 = negate_rtx (GET_MODE (op1), op1);
> else
> @@ -1399,11 +1401,11 @@ expand_binop (machine_mode mode, optab b
> shift_mask = targetm.shift_truncation_mask (word_mode);
> op1_mode = (GET_MODE (op1) != VOIDmode
> ? as_a <scalar_int_mode> (GET_MODE (op1))
> - : word_mode);
> + : get_shift_amount_mode (word_mode));
>
> /* Apply the truncation to constant shifts. */
> if (double_shift_mask > 0 && CONST_INT_P (op1))
> - op1 = GEN_INT (INTVAL (op1) & double_shift_mask);
> + op1 = gen_int_mode (INTVAL (op1) & double_shift_mask, op1_mode);
>
> if (op1 == CONST0_RTX (op1_mode))
> return op0;
> @@ -1513,7 +1515,7 @@ expand_binop (machine_mode mode, optab b
> else
> {
> rtx into_temp1, into_temp2, outof_temp1, outof_temp2;
> - rtx first_shift_count, second_shift_count;
> + HOST_WIDE_INT first_shift_count, second_shift_count;
> optab reverse_unsigned_shift, unsigned_shift;
>
> reverse_unsigned_shift = (left_shift ^ (shift_count < BITS_PER_WORD)
> @@ -1524,20 +1526,24 @@ expand_binop (machine_mode mode, optab b
>
> if (shift_count > BITS_PER_WORD)
> {
> - first_shift_count = GEN_INT (shift_count - BITS_PER_WORD);
> - second_shift_count = GEN_INT (2 * BITS_PER_WORD - shift_count);
> + first_shift_count = shift_count - BITS_PER_WORD;
> + second_shift_count = 2 * BITS_PER_WORD - shift_count;
> }
> else
> {
> - first_shift_count = GEN_INT (BITS_PER_WORD - shift_count);
> - second_shift_count = GEN_INT (shift_count);
> + first_shift_count = BITS_PER_WORD - shift_count;
> + second_shift_count = shift_count;
> }
> + rtx first_shift_count_rtx
> + = gen_int_shift_amount (word_mode, first_shift_count);
> + rtx second_shift_count_rtx
> + = gen_int_shift_amount (word_mode, second_shift_count);
>
> into_temp1 = expand_binop (word_mode, unsigned_shift,
> - outof_input, first_shift_count,
> + outof_input, first_shift_count_rtx,
> NULL_RTX, unsignedp, next_methods);
> into_temp2 = expand_binop (word_mode, reverse_unsigned_shift,
> - into_input, second_shift_count,
> + into_input, second_shift_count_rtx,
> NULL_RTX, unsignedp, next_methods);
>
> if (into_temp1 != 0 && into_temp2 != 0)
> @@ -1550,10 +1556,10 @@ expand_binop (machine_mode mode, optab b
> emit_move_insn (into_target, inter);
>
> outof_temp1 = expand_binop (word_mode, unsigned_shift,
> - into_input, first_shift_count,
> + into_input, first_shift_count_rtx,
> NULL_RTX, unsignedp, next_methods);
> outof_temp2 = expand_binop (word_mode, reverse_unsigned_shift,
> - outof_input, second_shift_count,
> + outof_input, second_shift_count_rtx,
> NULL_RTX, unsignedp, next_methods);
>
> if (inter != 0 && outof_temp1 != 0 && outof_temp2 != 0)
> @@ -2793,25 +2799,29 @@ expand_unop (machine_mode mode, optab un
>
> if (optab_handler (rotl_optab, mode) != CODE_FOR_nothing)
> {
> - temp = expand_binop (mode, rotl_optab, op0, GEN_INT (8), target,
> - unsignedp, OPTAB_DIRECT);
> + temp = expand_binop (mode, rotl_optab, op0,
> + gen_int_shift_amount (mode, 8),
> + target, unsignedp, OPTAB_DIRECT);
> if (temp)
> return temp;
> }
>
> if (optab_handler (rotr_optab, mode) != CODE_FOR_nothing)
> {
> - temp = expand_binop (mode, rotr_optab, op0, GEN_INT (8), target,
> - unsignedp, OPTAB_DIRECT);
> + temp = expand_binop (mode, rotr_optab, op0,
> + gen_int_shift_amount (mode, 8),
> + target, unsignedp, OPTAB_DIRECT);
> if (temp)
> return temp;
> }
>
> last = get_last_insn ();
>
> - temp1 = expand_binop (mode, ashl_optab, op0, GEN_INT (8), NULL_RTX,
> + temp1 = expand_binop (mode, ashl_optab, op0,
> + gen_int_shift_amount (mode, 8), NULL_RTX,
> unsignedp, OPTAB_WIDEN);
> - temp2 = expand_binop (mode, lshr_optab, op0, GEN_INT (8), NULL_RTX,
> + temp2 = expand_binop (mode, lshr_optab, op0,
> + gen_int_shift_amount (mode, 8), NULL_RTX,
> unsignedp, OPTAB_WIDEN);
> if (temp1 && temp2)
> {
> @@ -5369,11 +5379,11 @@ vector_compare_rtx (machine_mode cmp_mod
> }
>
> /* Checks if vec_perm mask SEL is a constant equivalent to a shift of the first
> - vec_perm operand, assuming the second operand is a constant vector of zeroes.
> - Return the shift distance in bits if so, or NULL_RTX if the vec_perm is not a
> - shift. */
> + vec_perm operand (which has mode OP0_MODE), assuming the second
> + operand is a constant vector of zeroes. Return the shift distance in
> + bits if so, or NULL_RTX if the vec_perm is not a shift. */
> static rtx
> -shift_amt_for_vec_perm_mask (rtx sel)
> +shift_amt_for_vec_perm_mask (machine_mode op0_mode, rtx sel)
> {
> unsigned int i, first, nelt = GET_MODE_NUNITS (GET_MODE (sel));
> unsigned int bitsize = GET_MODE_UNIT_BITSIZE (GET_MODE (sel));
> @@ -5393,7 +5403,7 @@ shift_amt_for_vec_perm_mask (rtx sel)
> return NULL_RTX;
> }
>
> - return GEN_INT (first * bitsize);
> + return gen_int_shift_amount (op0_mode, first * bitsize);
> }
>
> /* A subroutine of expand_vec_perm for expanding one vec_perm insn. */
> @@ -5473,7 +5483,7 @@ expand_vec_perm (machine_mode mode, rtx
> && (shift_code != CODE_FOR_nothing
> || shift_code_qi != CODE_FOR_nothing))
> {
> - shift_amt = shift_amt_for_vec_perm_mask (sel);
> + shift_amt = shift_amt_for_vec_perm_mask (mode, sel);
> if (shift_amt)
> {
> struct expand_operand ops[3];
> @@ -5563,7 +5573,8 @@ expand_vec_perm (machine_mode mode, rtx
> NULL, 0, OPTAB_DIRECT);
> else
> sel = expand_simple_binop (selmode, ASHIFT, sel,
> - GEN_INT (exact_log2 (u)),
> + gen_int_shift_amount (selmode,
> + exact_log2 (u)),
> NULL, 0, OPTAB_DIRECT);
> gcc_assert (sel != NULL);
>
* Re: [14/nn] Add helpers for shift count modes
2017-10-26 12:07 ` Richard Biener
@ 2017-10-26 12:07 ` Richard Biener
2017-11-20 21:04 ` Richard Sandiford
2017-10-30 15:03 ` Jeff Law
1 sibling, 1 reply; 90+ messages in thread
From: Richard Biener @ 2017-10-26 12:07 UTC (permalink / raw)
To: GCC Patches, Richard Sandiford
On Thu, Oct 26, 2017 at 2:06 PM, Richard Biener
<richard.guenther@gmail.com> wrote:
> On Mon, Oct 23, 2017 at 1:25 PM, Richard Sandiford
> <richard.sandiford@linaro.org> wrote:
>> This patch adds a stub helper routine to provide the mode
>> of a scalar shift amount, given the mode of the values
>> being shifted.
>>
>> One long-standing problem has been to decide what this mode
>> should be for arbitrary rtxes (as opposed to those directly
>> tied to a target pattern). Is it the mode of the shifted
>> elements? Is it word_mode? Or maybe QImode? Is it whatever
>> the corresponding target pattern says? (In which case what
>> should the mode be when the target doesn't have a pattern?)
>>
>> For now the patch picks word_mode, which should be safe on
>> all targets but could perhaps become suboptimal if the helper
>> routine is used more often than it is in this patch. As it
>> stands the patch does not change the generated code.
>>
>> The patch also adds a helper function that constructs rtxes
>> for constant shift amounts, again given the mode of the value
>> being shifted. As well as helping with the SVE patches, this
>> is one step towards allowing CONST_INTs to have a real mode.
>
> I think get_shift_amount_mode is flawed, and while encapsulating
> constant shift amount RTX generation into a gen_int_shift_amount
> looks good to me, I'd rather have that ??? comment in this function
> (and I'd use the mode of the RTX shifted, not word_mode...).
>
> In the end it's up to insn recognition to convert the op to the
> expected mode, and for generic RTL it's us who should decide
> on the mode -- on GENERIC the shift amount has to be an
> integer, so why not simply use a mode that is large enough to
> make the constant fit?
>
> Just throwing in some comments here, RTL isn't my primary
> expertise.
To add a little bit - shift amounts are maybe the only(?) place
where a modeless CONST_INT makes sense! So "fixing"
that first sounds backwards.
Richard.
> Richard.
>
>>
>> 2017-10-23 Richard Sandiford <richard.sandiford@linaro.org>
>> Alan Hayward <alan.hayward@arm.com>
>> David Sherwood <david.sherwood@arm.com>
>>
>> gcc/
>> * target.h (get_shift_amount_mode): New function.
>> * emit-rtl.h (gen_int_shift_amount): Declare.
>> * emit-rtl.c (gen_int_shift_amount): New function.
>> * asan.c (asan_emit_stack_protection): Use gen_int_shift_amount
>> instead of GEN_INT.
>> * calls.c (shift_return_value): Likewise.
>> * cse.c (fold_rtx): Likewise.
>> * dse.c (find_shift_sequence): Likewise.
>> * expmed.c (init_expmed_one_mode, store_bit_field_1, expand_shift_1)
>> (expand_shift, expand_smod_pow2): Likewise.
>> * lower-subreg.c (shift_cost): Likewise.
>> * simplify-rtx.c (simplify_unary_operation_1): Likewise.
>> (simplify_binary_operation_1): Likewise.
>> * combine.c (try_combine, find_split_point, force_int_to_mode)
>> (simplify_shift_const_1, simplify_shift_const): Likewise.
>> (change_zero_ext): Likewise. Use simplify_gen_binary.
>> * optabs.c (expand_superword_shift, expand_doubleword_mult)
>> (expand_unop): Use gen_int_shift_amount instead of GEN_INT.
>> (expand_binop): Likewise. Use get_shift_amount_mode instead
>> of word_mode as the mode of a CONST_INT shift amount.
>> (shift_amt_for_vec_perm_mask): Add a machine_mode argument.
>> Use gen_int_shift_amount instead of GEN_INT.
>> (expand_vec_perm): Update caller accordingly. Use
>> gen_int_shift_amount instead of GEN_INT.
>>
>> Index: gcc/target.h
>> ===================================================================
>> --- gcc/target.h 2017-10-23 11:47:06.643477568 +0100
>> +++ gcc/target.h 2017-10-23 11:47:11.277288162 +0100
>> @@ -209,6 +209,17 @@ #define HOOKSTRUCT(FRAGMENT) FRAGMENT
>>
>> extern struct gcc_target targetm;
>>
>> +/* Return the mode that should be used to hold a scalar shift amount
>> + when shifting values of the given mode. */
>> +/* ??? This could in principle be generated automatically from the .md
>> + shift patterns, but for now word_mode should be universally OK. */
>> +
>> +inline scalar_int_mode
>> +get_shift_amount_mode (machine_mode)
>> +{
>> + return word_mode;
>> +}
>> +
>> #ifdef GCC_TM_H
>>
>> #ifndef CUMULATIVE_ARGS_MAGIC
>> Index: gcc/emit-rtl.h
>> ===================================================================
>> --- gcc/emit-rtl.h 2017-10-23 11:47:06.643477568 +0100
>> +++ gcc/emit-rtl.h 2017-10-23 11:47:11.274393237 +0100
>> @@ -369,6 +369,7 @@ extern void set_reg_attrs_for_parm (rtx,
>> extern void set_reg_attrs_for_decl_rtl (tree t, rtx x);
>> extern void adjust_reg_mode (rtx, machine_mode);
>> extern int mem_expr_equal_p (const_tree, const_tree);
>> +extern rtx gen_int_shift_amount (machine_mode, HOST_WIDE_INT);
>>
>> extern bool need_atomic_barrier_p (enum memmodel, bool);
>>
>> Index: gcc/emit-rtl.c
>> ===================================================================
>> --- gcc/emit-rtl.c 2017-10-23 11:47:06.643477568 +0100
>> +++ gcc/emit-rtl.c 2017-10-23 11:47:11.273428262 +0100
>> @@ -6478,6 +6478,15 @@ need_atomic_barrier_p (enum memmodel mod
>> }
>> }
>>
>> +/* Return a constant shift amount for shifting a value of mode MODE
>> + by VALUE bits. */
>> +
>> +rtx
>> +gen_int_shift_amount (machine_mode mode, HOST_WIDE_INT value)
>> +{
>> + return gen_int_mode (value, get_shift_amount_mode (mode));
>> +}
>> +
>> /* Initialize fields of rtl_data related to stack alignment. */
>>
>> void
>> Index: gcc/asan.c
>> ===================================================================
>> --- gcc/asan.c 2017-10-23 11:47:06.643477568 +0100
>> +++ gcc/asan.c 2017-10-23 11:47:11.270533336 +0100
>> @@ -1388,7 +1388,7 @@ asan_emit_stack_protection (rtx base, rt
>> TREE_ASM_WRITTEN (id) = 1;
>> emit_move_insn (mem, expand_normal (build_fold_addr_expr (decl)));
>> shadow_base = expand_binop (Pmode, lshr_optab, base,
>> - GEN_INT (ASAN_SHADOW_SHIFT),
>> + gen_int_shift_amount (Pmode, ASAN_SHADOW_SHIFT),
>> NULL_RTX, 1, OPTAB_DIRECT);
>> shadow_base
>> = plus_constant (Pmode, shadow_base,
>> Index: gcc/calls.c
>> ===================================================================
>> --- gcc/calls.c 2017-10-23 11:47:06.643477568 +0100
>> +++ gcc/calls.c 2017-10-23 11:47:11.270533336 +0100
>> @@ -2749,15 +2749,17 @@ shift_return_value (machine_mode mode, b
>> HOST_WIDE_INT shift;
>>
>> gcc_assert (REG_P (value) && HARD_REGISTER_P (value));
>> - shift = GET_MODE_BITSIZE (GET_MODE (value)) - GET_MODE_BITSIZE (mode);
>> + machine_mode value_mode = GET_MODE (value);
>> + shift = GET_MODE_BITSIZE (value_mode) - GET_MODE_BITSIZE (mode);
>> if (shift == 0)
>> return false;
>>
>> /* Use ashr rather than lshr for right shifts. This is for the benefit
>> of the MIPS port, which requires SImode values to be sign-extended
>> when stored in 64-bit registers. */
>> - if (!force_expand_binop (GET_MODE (value), left_p ? ashl_optab : ashr_optab,
>> - value, GEN_INT (shift), value, 1, OPTAB_WIDEN))
>> + if (!force_expand_binop (value_mode, left_p ? ashl_optab : ashr_optab,
>> + value, gen_int_shift_amount (value_mode, shift),
>> + value, 1, OPTAB_WIDEN))
>> gcc_unreachable ();
>> return true;
>> }
>> Index: gcc/cse.c
>> ===================================================================
>> --- gcc/cse.c 2017-10-23 11:47:03.707058235 +0100
>> +++ gcc/cse.c 2017-10-23 11:47:11.273428262 +0100
>> @@ -3611,9 +3611,9 @@ fold_rtx (rtx x, rtx_insn *insn)
>> || INTVAL (const_arg1) < 0))
>> {
>> if (SHIFT_COUNT_TRUNCATED)
>> - canon_const_arg1 = GEN_INT (INTVAL (const_arg1)
>> - & (GET_MODE_UNIT_BITSIZE (mode)
>> - - 1));
>> + canon_const_arg1 = gen_int_shift_amount
>> + (mode, (INTVAL (const_arg1)
>> + & (GET_MODE_UNIT_BITSIZE (mode) - 1)));
>> else
>> break;
>> }
>> @@ -3660,9 +3660,9 @@ fold_rtx (rtx x, rtx_insn *insn)
>> || INTVAL (inner_const) < 0))
>> {
>> if (SHIFT_COUNT_TRUNCATED)
>> - inner_const = GEN_INT (INTVAL (inner_const)
>> - & (GET_MODE_UNIT_BITSIZE (mode)
>> - - 1));
>> + inner_const = gen_int_shift_amount
>> + (mode, (INTVAL (inner_const)
>> + & (GET_MODE_UNIT_BITSIZE (mode) - 1)));
>> else
>> break;
>> }
>> @@ -3692,7 +3692,8 @@ fold_rtx (rtx x, rtx_insn *insn)
>> /* As an exception, we can turn an ASHIFTRT of this
>> form into a shift of the number of bits - 1. */
>> if (code == ASHIFTRT)
>> - new_const = GEN_INT (GET_MODE_UNIT_BITSIZE (mode) - 1);
>> + new_const = gen_int_shift_amount
>> + (mode, GET_MODE_UNIT_BITSIZE (mode) - 1);
>> else if (!side_effects_p (XEXP (y, 0)))
>> return CONST0_RTX (mode);
>> else
>> Index: gcc/dse.c
>> ===================================================================
>> --- gcc/dse.c 2017-10-23 11:47:06.643477568 +0100
>> +++ gcc/dse.c 2017-10-23 11:47:11.273428262 +0100
>> @@ -1605,8 +1605,9 @@ find_shift_sequence (int access_size,
>> store_mode, byte);
>> if (ret && CONSTANT_P (ret))
>> {
>> + rtx shift_rtx = gen_int_shift_amount (new_mode, shift);
>> ret = simplify_const_binary_operation (LSHIFTRT, new_mode,
>> - ret, GEN_INT (shift));
>> + ret, shift_rtx);
>> if (ret && CONSTANT_P (ret))
>> {
>> byte = subreg_lowpart_offset (read_mode, new_mode);
>> @@ -1642,7 +1643,8 @@ find_shift_sequence (int access_size,
>> of one dsp where the cost of these two was not the same. But
>> this really is a rare case anyway. */
>> target = expand_binop (new_mode, lshr_optab, new_reg,
>> - GEN_INT (shift), new_reg, 1, OPTAB_DIRECT);
>> + gen_int_shift_amount (new_mode, shift),
>> + new_reg, 1, OPTAB_DIRECT);
>>
>> shift_seq = get_insns ();
>> end_sequence ();
>> Index: gcc/expmed.c
>> ===================================================================
>> --- gcc/expmed.c 2017-10-23 11:47:06.643477568 +0100
>> +++ gcc/expmed.c 2017-10-23 11:47:11.274393237 +0100
>> @@ -222,7 +222,8 @@ init_expmed_one_mode (struct init_expmed
>> PUT_MODE (all->zext, wider_mode);
>> PUT_MODE (all->wide_mult, wider_mode);
>> PUT_MODE (all->wide_lshr, wider_mode);
>> - XEXP (all->wide_lshr, 1) = GEN_INT (mode_bitsize);
>> + XEXP (all->wide_lshr, 1)
>> + = gen_int_shift_amount (wider_mode, mode_bitsize);
>>
>> set_mul_widen_cost (speed, wider_mode,
>> set_src_cost (all->wide_mult, wider_mode, speed));
>> @@ -908,12 +909,14 @@ store_bit_field_1 (rtx str_rtx, unsigned
>> to make sure that for big-endian machines the higher order
>> bits are used. */
>> if (new_bitsize < BITS_PER_WORD && BYTES_BIG_ENDIAN && !backwards)
>> - value_word = simplify_expand_binop (word_mode, lshr_optab,
>> - value_word,
>> - GEN_INT (BITS_PER_WORD
>> - - new_bitsize),
>> - NULL_RTX, true,
>> - OPTAB_LIB_WIDEN);
>> + {
>> + int shift = BITS_PER_WORD - new_bitsize;
>> + rtx shift_rtx = gen_int_shift_amount (word_mode, shift);
>> + value_word = simplify_expand_binop (word_mode, lshr_optab,
>> + value_word, shift_rtx,
>> + NULL_RTX, true,
>> + OPTAB_LIB_WIDEN);
>> + }
>>
>> if (!store_bit_field_1 (op0, new_bitsize,
>> bitnum + bit_offset,
>> @@ -2366,8 +2369,9 @@ expand_shift_1 (enum tree_code code, mac
>> if (CONST_INT_P (op1)
>> && ((unsigned HOST_WIDE_INT) INTVAL (op1) >=
>> (unsigned HOST_WIDE_INT) GET_MODE_BITSIZE (scalar_mode)))
>> - op1 = GEN_INT ((unsigned HOST_WIDE_INT) INTVAL (op1)
>> - % GET_MODE_BITSIZE (scalar_mode));
>> + op1 = gen_int_shift_amount (mode,
>> + (unsigned HOST_WIDE_INT) INTVAL (op1)
>> + % GET_MODE_BITSIZE (scalar_mode));
>> else if (GET_CODE (op1) == SUBREG
>> && subreg_lowpart_p (op1)
>> && SCALAR_INT_MODE_P (GET_MODE (SUBREG_REG (op1)))
>> @@ -2384,7 +2388,8 @@ expand_shift_1 (enum tree_code code, mac
>> && IN_RANGE (INTVAL (op1), GET_MODE_BITSIZE (scalar_mode) / 2 + left,
>> GET_MODE_BITSIZE (scalar_mode) - 1))
>> {
>> - op1 = GEN_INT (GET_MODE_BITSIZE (scalar_mode) - INTVAL (op1));
>> + op1 = gen_int_shift_amount (mode, (GET_MODE_BITSIZE (scalar_mode)
>> + - INTVAL (op1)));
>> left = !left;
>> code = left ? LROTATE_EXPR : RROTATE_EXPR;
>> }
>> @@ -2464,8 +2469,8 @@ expand_shift_1 (enum tree_code code, mac
>> if (op1 == const0_rtx)
>> return shifted;
>> else if (CONST_INT_P (op1))
>> - other_amount = GEN_INT (GET_MODE_BITSIZE (scalar_mode)
>> - - INTVAL (op1));
>> + other_amount = gen_int_shift_amount
>> + (mode, GET_MODE_BITSIZE (scalar_mode) - INTVAL (op1));
>> else
>> {
>> other_amount
>> @@ -2538,8 +2543,9 @@ expand_shift_1 (enum tree_code code, mac
>> expand_shift (enum tree_code code, machine_mode mode, rtx shifted,
>> int amount, rtx target, int unsignedp)
>> {
>> - return expand_shift_1 (code, mode,
>> - shifted, GEN_INT (amount), target, unsignedp);
>> + return expand_shift_1 (code, mode, shifted,
>> + gen_int_shift_amount (mode, amount),
>> + target, unsignedp);
>> }
>>
>> /* Likewise, but return 0 if that cannot be done. */
>> @@ -3855,7 +3861,7 @@ expand_smod_pow2 (scalar_int_mode mode,
>> {
>> HOST_WIDE_INT masklow = (HOST_WIDE_INT_1 << logd) - 1;
>> signmask = force_reg (mode, signmask);
>> - shift = GEN_INT (GET_MODE_BITSIZE (mode) - logd);
>> + shift = gen_int_shift_amount (mode, GET_MODE_BITSIZE (mode) - logd);
>>
>> /* Use the rtx_cost of a LSHIFTRT instruction to determine
>> which instruction sequence to use. If logical right shifts
>> Index: gcc/lower-subreg.c
>> ===================================================================
>> --- gcc/lower-subreg.c 2017-10-23 11:47:06.643477568 +0100
>> +++ gcc/lower-subreg.c 2017-10-23 11:47:11.274393237 +0100
>> @@ -129,7 +129,7 @@ shift_cost (bool speed_p, struct cost_rt
>> PUT_CODE (rtxes->shift, code);
>> PUT_MODE (rtxes->shift, mode);
>> PUT_MODE (rtxes->source, mode);
>> - XEXP (rtxes->shift, 1) = GEN_INT (op1);
>> + XEXP (rtxes->shift, 1) = gen_int_shift_amount (mode, op1);
>> return set_src_cost (rtxes->shift, mode, speed_p);
>> }
>>
>> Index: gcc/simplify-rtx.c
>> ===================================================================
>> --- gcc/simplify-rtx.c 2017-10-23 11:47:06.643477568 +0100
>> +++ gcc/simplify-rtx.c 2017-10-23 11:47:11.277288162 +0100
>> @@ -1165,7 +1165,8 @@ simplify_unary_operation_1 (enum rtx_cod
>> if (STORE_FLAG_VALUE == 1)
>> {
>> temp = simplify_gen_binary (ASHIFTRT, inner, XEXP (op, 0),
>> - GEN_INT (isize - 1));
>> + gen_int_shift_amount (inner,
>> + isize - 1));
>> if (int_mode == inner)
>> return temp;
>> if (GET_MODE_PRECISION (int_mode) > isize)
>> @@ -1175,7 +1176,8 @@ simplify_unary_operation_1 (enum rtx_cod
>> else if (STORE_FLAG_VALUE == -1)
>> {
>> temp = simplify_gen_binary (LSHIFTRT, inner, XEXP (op, 0),
>> - GEN_INT (isize - 1));
>> + gen_int_shift_amount (inner,
>> + isize - 1));
>> if (int_mode == inner)
>> return temp;
>> if (GET_MODE_PRECISION (int_mode) > isize)
>> @@ -2679,7 +2681,8 @@ simplify_binary_operation_1 (enum rtx_co
>> {
>> val = wi::exact_log2 (rtx_mode_t (trueop1, mode));
>> if (val >= 0)
>> - return simplify_gen_binary (ASHIFT, mode, op0, GEN_INT (val));
>> + return simplify_gen_binary (ASHIFT, mode, op0,
>> + gen_int_shift_amount (mode, val));
>> }
>>
>> /* x*2 is x+x and x*(-1) is -x */
>> @@ -3303,7 +3306,8 @@ simplify_binary_operation_1 (enum rtx_co
>> /* Convert divide by power of two into shift. */
>> if (CONST_INT_P (trueop1)
>> && (val = exact_log2 (UINTVAL (trueop1))) > 0)
>> - return simplify_gen_binary (LSHIFTRT, mode, op0, GEN_INT (val));
>> + return simplify_gen_binary (LSHIFTRT, mode, op0,
>> + gen_int_shift_amount (mode, val));
>> break;
>>
>> case DIV:
>> @@ -3423,10 +3427,12 @@ simplify_binary_operation_1 (enum rtx_co
>> && IN_RANGE (INTVAL (trueop1),
>> GET_MODE_UNIT_PRECISION (mode) / 2 + (code == ROTATE),
>> GET_MODE_UNIT_PRECISION (mode) - 1))
>> - return simplify_gen_binary (code == ROTATE ? ROTATERT : ROTATE,
>> - mode, op0,
>> - GEN_INT (GET_MODE_UNIT_PRECISION (mode)
>> - - INTVAL (trueop1)));
>> + {
>> + int new_amount = GET_MODE_UNIT_PRECISION (mode) - INTVAL (trueop1);
>> + rtx new_amount_rtx = gen_int_shift_amount (mode, new_amount);
>> + return simplify_gen_binary (code == ROTATE ? ROTATERT : ROTATE,
>> + mode, op0, new_amount_rtx);
>> + }
>> #endif
>> /* FALLTHRU */
>> case ASHIFTRT:
>> @@ -3466,8 +3472,8 @@ simplify_binary_operation_1 (enum rtx_co
>> == GET_MODE_BITSIZE (inner_mode) - GET_MODE_BITSIZE (int_mode))
>> && subreg_lowpart_p (op0))
>> {
>> - rtx tmp = GEN_INT (INTVAL (XEXP (SUBREG_REG (op0), 1))
>> - + INTVAL (op1));
>> + rtx tmp = gen_int_shift_amount
>> + (inner_mode, INTVAL (XEXP (SUBREG_REG (op0), 1)) + INTVAL (op1));
>> tmp = simplify_gen_binary (code, inner_mode,
>> XEXP (SUBREG_REG (op0), 0),
>> tmp);
>> @@ -3478,7 +3484,8 @@ simplify_binary_operation_1 (enum rtx_co
>> {
>> val = INTVAL (op1) & (GET_MODE_UNIT_PRECISION (mode) - 1);
>> if (val != INTVAL (op1))
>> - return simplify_gen_binary (code, mode, op0, GEN_INT (val));
>> + return simplify_gen_binary (code, mode, op0,
>> + gen_int_shift_amount (mode, val));
>> }
>> break;
>>
>> Index: gcc/combine.c
>> ===================================================================
>> --- gcc/combine.c 2017-10-23 11:47:06.643477568 +0100
>> +++ gcc/combine.c 2017-10-23 11:47:11.272463287 +0100
>> @@ -3773,8 +3773,9 @@ try_combine (rtx_insn *i3, rtx_insn *i2,
>> && INTVAL (XEXP (*split, 1)) > 0
>> && (i = exact_log2 (UINTVAL (XEXP (*split, 1)))) >= 0)
>> {
>> + rtx i_rtx = gen_int_shift_amount (split_mode, i);
>> SUBST (*split, gen_rtx_ASHIFT (split_mode,
>> - XEXP (*split, 0), GEN_INT (i)));
>> + XEXP (*split, 0), i_rtx));
>> /* Update split_code because we may not have a multiply
>> anymore. */
>> split_code = GET_CODE (*split);
>> @@ -3788,8 +3789,10 @@ try_combine (rtx_insn *i3, rtx_insn *i2,
>> && (i = exact_log2 (UINTVAL (XEXP (XEXP (*split, 0), 1)))) >= 0)
>> {
>> rtx nsplit = XEXP (*split, 0);
>> + rtx i_rtx = gen_int_shift_amount (GET_MODE (nsplit), i);
>> SUBST (XEXP (*split, 0), gen_rtx_ASHIFT (GET_MODE (nsplit),
>> - XEXP (nsplit, 0), GEN_INT (i)));
>> + XEXP (nsplit, 0),
>> + i_rtx));
>> /* Update split_code because we may not have a multiply
>> anymore. */
>> split_code = GET_CODE (*split);
>> @@ -5057,12 +5060,12 @@ find_split_point (rtx *loc, rtx_insn *in
>> GET_MODE (XEXP (SET_SRC (x), 0))))))
>> {
>> machine_mode mode = GET_MODE (XEXP (SET_SRC (x), 0));
>> -
>> + rtx pos_rtx = gen_int_shift_amount (mode, pos);
>> SUBST (SET_SRC (x),
>> gen_rtx_NEG (mode,
>> gen_rtx_LSHIFTRT (mode,
>> XEXP (SET_SRC (x), 0),
>> - GEN_INT (pos))));
>> + pos_rtx)));
>>
>> split = find_split_point (&SET_SRC (x), insn, true);
>> if (split && split != &SET_SRC (x))
>> @@ -5120,11 +5123,11 @@ find_split_point (rtx *loc, rtx_insn *in
>> {
>> unsigned HOST_WIDE_INT mask
>> = (HOST_WIDE_INT_1U << len) - 1;
>> + rtx pos_rtx = gen_int_shift_amount (mode, pos);
>> SUBST (SET_SRC (x),
>> gen_rtx_AND (mode,
>> gen_rtx_LSHIFTRT
>> - (mode, gen_lowpart (mode, inner),
>> - GEN_INT (pos)),
>> + (mode, gen_lowpart (mode, inner), pos_rtx),
>> gen_int_mode (mask, mode)));
>>
>> split = find_split_point (&SET_SRC (x), insn, true);
>> @@ -5133,14 +5136,15 @@ find_split_point (rtx *loc, rtx_insn *in
>> }
>> else
>> {
>> + int left_bits = GET_MODE_PRECISION (mode) - len - pos;
>> + int right_bits = GET_MODE_PRECISION (mode) - len;
>> SUBST (SET_SRC (x),
>> gen_rtx_fmt_ee
>> (unsignedp ? LSHIFTRT : ASHIFTRT, mode,
>> gen_rtx_ASHIFT (mode,
>> gen_lowpart (mode, inner),
>> - GEN_INT (GET_MODE_PRECISION (mode)
>> - - len - pos)),
>> - GEN_INT (GET_MODE_PRECISION (mode) - len)));
>> + gen_int_shift_amount (mode, left_bits)),
>> + gen_int_shift_amount (mode, right_bits)));
>>
>> split = find_split_point (&SET_SRC (x), insn, true);
>> if (split && split != &SET_SRC (x))
>> @@ -8915,10 +8919,11 @@ force_int_to_mode (rtx x, scalar_int_mod
>> /* Must be more sign bit copies than the mask needs. */
>> && ((int) num_sign_bit_copies (XEXP (x, 0), GET_MODE (XEXP (x, 0)))
>> >= exact_log2 (mask + 1)))
>> - x = simplify_gen_binary (LSHIFTRT, xmode, XEXP (x, 0),
>> - GEN_INT (GET_MODE_PRECISION (xmode)
>> - - exact_log2 (mask + 1)));
>> -
>> + {
>> + int nbits = GET_MODE_PRECISION (xmode) - exact_log2 (mask + 1);
>> + x = simplify_gen_binary (LSHIFTRT, xmode, XEXP (x, 0),
>> + gen_int_shift_amount (xmode, nbits));
>> + }
>> goto shiftrt;
>>
>> case ASHIFTRT:
>> @@ -10415,7 +10420,7 @@ simplify_shift_const_1 (enum rtx_code co
>> {
>> enum rtx_code orig_code = code;
>> rtx orig_varop = varop;
>> - int count;
>> + int count, log2;
>> machine_mode mode = result_mode;
>> machine_mode shift_mode;
>> scalar_int_mode tmode, inner_mode, int_mode, int_varop_mode, int_result_mode;
>> @@ -10618,13 +10623,11 @@ simplify_shift_const_1 (enum rtx_code co
>> is cheaper. But it is still better on those machines to
>> merge two shifts into one. */
>> if (CONST_INT_P (XEXP (varop, 1))
>> - && exact_log2 (UINTVAL (XEXP (varop, 1))) >= 0)
>> + && (log2 = exact_log2 (UINTVAL (XEXP (varop, 1)))) >= 0)
>> {
>> - varop
>> - = simplify_gen_binary (ASHIFT, GET_MODE (varop),
>> - XEXP (varop, 0),
>> - GEN_INT (exact_log2 (
>> - UINTVAL (XEXP (varop, 1)))));
>> + rtx log2_rtx = gen_int_shift_amount (GET_MODE (varop), log2);
>> + varop = simplify_gen_binary (ASHIFT, GET_MODE (varop),
>> + XEXP (varop, 0), log2_rtx);
>> continue;
>> }
>> break;
>> @@ -10632,13 +10635,11 @@ simplify_shift_const_1 (enum rtx_code co
>> case UDIV:
>> /* Similar, for when divides are cheaper. */
>> if (CONST_INT_P (XEXP (varop, 1))
>> - && exact_log2 (UINTVAL (XEXP (varop, 1))) >= 0)
>> + && (log2 = exact_log2 (UINTVAL (XEXP (varop, 1)))) >= 0)
>> {
>> - varop
>> - = simplify_gen_binary (LSHIFTRT, GET_MODE (varop),
>> - XEXP (varop, 0),
>> - GEN_INT (exact_log2 (
>> - UINTVAL (XEXP (varop, 1)))));
>> + rtx log2_rtx = gen_int_shift_amount (GET_MODE (varop), log2);
>> + varop = simplify_gen_binary (LSHIFTRT, GET_MODE (varop),
>> + XEXP (varop, 0), log2_rtx);
>> continue;
>> }
>> break;
>> @@ -10773,10 +10774,10 @@ simplify_shift_const_1 (enum rtx_code co
>>
>> mask_rtx = gen_int_mode (nonzero_bits (varop, int_varop_mode),
>> int_result_mode);
>> -
>> + rtx count_rtx = gen_int_shift_amount (int_result_mode, count);
>> mask_rtx
>> = simplify_const_binary_operation (code, int_result_mode,
>> - mask_rtx, GEN_INT (count));
>> + mask_rtx, count_rtx);
>>
>> /* Give up if we can't compute an outer operation to use. */
>> if (mask_rtx == 0
>> @@ -10832,9 +10833,10 @@ simplify_shift_const_1 (enum rtx_code co
>> if (code == ASHIFTRT && int_mode != int_result_mode)
>> break;
>>
>> + rtx count_rtx = gen_int_shift_amount (int_result_mode, count);
>> rtx new_rtx = simplify_const_binary_operation (code, int_mode,
>> XEXP (varop, 0),
>> - GEN_INT (count));
>> + count_rtx);
>> varop = gen_rtx_fmt_ee (code, int_mode, new_rtx, XEXP (varop, 1));
>> count = 0;
>> continue;
>> @@ -10900,7 +10902,7 @@ simplify_shift_const_1 (enum rtx_code co
>> && (new_rtx = simplify_const_binary_operation
>> (code, int_result_mode,
>> gen_int_mode (INTVAL (XEXP (varop, 1)), int_result_mode),
>> - GEN_INT (count))) != 0
>> + gen_int_shift_amount (int_result_mode, count))) != 0
>> && CONST_INT_P (new_rtx)
>> && merge_outer_ops (&outer_op, &outer_const, GET_CODE (varop),
>> INTVAL (new_rtx), int_result_mode,
>> @@ -11043,7 +11045,7 @@ simplify_shift_const_1 (enum rtx_code co
>> && (new_rtx = simplify_const_binary_operation
>> (ASHIFT, int_result_mode,
>> gen_int_mode (INTVAL (XEXP (varop, 1)), int_result_mode),
>> - GEN_INT (count))) != 0
>> + gen_int_shift_amount (int_result_mode, count))) != 0
>> && CONST_INT_P (new_rtx)
>> && merge_outer_ops (&outer_op, &outer_const, PLUS,
>> INTVAL (new_rtx), int_result_mode,
>> @@ -11064,7 +11066,7 @@ simplify_shift_const_1 (enum rtx_code co
>> && (new_rtx = simplify_const_binary_operation
>> (code, int_result_mode,
>> gen_int_mode (INTVAL (XEXP (varop, 1)), int_result_mode),
>> - GEN_INT (count))) != 0
>> + gen_int_shift_amount (int_result_mode, count))) != 0
>> && CONST_INT_P (new_rtx)
>> && merge_outer_ops (&outer_op, &outer_const, XOR,
>> INTVAL (new_rtx), int_result_mode,
>> @@ -11119,12 +11121,12 @@ simplify_shift_const_1 (enum rtx_code co
>> - GET_MODE_UNIT_PRECISION (GET_MODE (varop)))))
>> {
>> rtx varop_inner = XEXP (varop, 0);
>> -
>> - varop_inner
>> - = gen_rtx_LSHIFTRT (GET_MODE (varop_inner),
>> - XEXP (varop_inner, 0),
>> - GEN_INT
>> - (count + INTVAL (XEXP (varop_inner, 1))));
>> + int new_count = count + INTVAL (XEXP (varop_inner, 1));
>> + rtx new_count_rtx = gen_int_shift_amount (GET_MODE (varop_inner),
>> + new_count);
>> + varop_inner = gen_rtx_LSHIFTRT (GET_MODE (varop_inner),
>> + XEXP (varop_inner, 0),
>> + new_count_rtx);
>> varop = gen_rtx_TRUNCATE (GET_MODE (varop), varop_inner);
>> count = 0;
>> continue;
>> @@ -11176,7 +11178,8 @@ simplify_shift_const_1 (enum rtx_code co
>> x = NULL_RTX;
>>
>> if (x == NULL_RTX)
>> - x = simplify_gen_binary (code, shift_mode, varop, GEN_INT (count));
>> + x = simplify_gen_binary (code, shift_mode, varop,
>> + gen_int_shift_amount (shift_mode, count));
>>
>> /* If we were doing an LSHIFTRT in a wider mode than it was originally,
>> turn off all the bits that the shift would have turned off. */
>> @@ -11238,7 +11241,8 @@ simplify_shift_const (rtx x, enum rtx_co
>> return tem;
>>
>> if (!x)
>> - x = simplify_gen_binary (code, GET_MODE (varop), varop, GEN_INT (count));
>> + x = simplify_gen_binary (code, GET_MODE (varop), varop,
>> + gen_int_shift_amount (GET_MODE (varop), count));
>> if (GET_MODE (x) != result_mode)
>> x = gen_lowpart (result_mode, x);
>> return x;
>> @@ -11429,8 +11433,9 @@ change_zero_ext (rtx pat)
>> if (BITS_BIG_ENDIAN)
>> start = GET_MODE_PRECISION (inner_mode) - size - start;
>>
>> - if (start)
>> - x = gen_rtx_LSHIFTRT (inner_mode, XEXP (x, 0), GEN_INT (start));
>> + if (start != 0)
>> + x = gen_rtx_LSHIFTRT (inner_mode, XEXP (x, 0),
>> + gen_int_shift_amount (inner_mode, start));
>> else
>> x = XEXP (x, 0);
>> if (mode != inner_mode)
>> Index: gcc/optabs.c
>> ===================================================================
>> --- gcc/optabs.c 2017-10-23 11:47:06.643477568 +0100
>> +++ gcc/optabs.c 2017-10-23 11:47:11.276323187 +0100
>> @@ -431,8 +431,9 @@ expand_superword_shift (optab binoptab,
>> if (binoptab != ashr_optab)
>> emit_move_insn (outof_target, CONST0_RTX (word_mode));
>> else
>> - if (!force_expand_binop (word_mode, binoptab,
>> - outof_input, GEN_INT (BITS_PER_WORD - 1),
>> + if (!force_expand_binop (word_mode, binoptab, outof_input,
>> + gen_int_shift_amount (word_mode,
>> + BITS_PER_WORD - 1),
>> outof_target, unsignedp, methods))
>> return false;
>> }
>> @@ -789,7 +790,8 @@ expand_doubleword_mult (machine_mode mod
>> {
>> int low = (WORDS_BIG_ENDIAN ? 1 : 0);
>> int high = (WORDS_BIG_ENDIAN ? 0 : 1);
>> - rtx wordm1 = umulp ? NULL_RTX : GEN_INT (BITS_PER_WORD - 1);
>> + rtx wordm1 = (umulp ? NULL_RTX
>> + : gen_int_shift_amount (word_mode, BITS_PER_WORD - 1));
>> rtx product, adjust, product_high, temp;
>>
>> rtx op0_high = operand_subword_force (op0, high, mode);
>> @@ -1185,7 +1187,7 @@ expand_binop (machine_mode mode, optab b
>> unsigned int bits = GET_MODE_PRECISION (int_mode);
>>
>> if (CONST_INT_P (op1))
>> - newop1 = GEN_INT (bits - INTVAL (op1));
>> + newop1 = gen_int_shift_amount (int_mode, bits - INTVAL (op1));
>> else if (targetm.shift_truncation_mask (int_mode) == bits - 1)
>> newop1 = negate_rtx (GET_MODE (op1), op1);
>> else
>> @@ -1399,11 +1401,11 @@ expand_binop (machine_mode mode, optab b
>> shift_mask = targetm.shift_truncation_mask (word_mode);
>> op1_mode = (GET_MODE (op1) != VOIDmode
>> ? as_a <scalar_int_mode> (GET_MODE (op1))
>> - : word_mode);
>> + : get_shift_amount_mode (word_mode));
>>
>> /* Apply the truncation to constant shifts. */
>> if (double_shift_mask > 0 && CONST_INT_P (op1))
>> - op1 = GEN_INT (INTVAL (op1) & double_shift_mask);
>> + op1 = gen_int_mode (INTVAL (op1) & double_shift_mask, op1_mode);
>>
>> if (op1 == CONST0_RTX (op1_mode))
>> return op0;
>> @@ -1513,7 +1515,7 @@ expand_binop (machine_mode mode, optab b
>> else
>> {
>> rtx into_temp1, into_temp2, outof_temp1, outof_temp2;
>> - rtx first_shift_count, second_shift_count;
>> + HOST_WIDE_INT first_shift_count, second_shift_count;
>> optab reverse_unsigned_shift, unsigned_shift;
>>
>> reverse_unsigned_shift = (left_shift ^ (shift_count < BITS_PER_WORD)
>> @@ -1524,20 +1526,24 @@ expand_binop (machine_mode mode, optab b
>>
>> if (shift_count > BITS_PER_WORD)
>> {
>> - first_shift_count = GEN_INT (shift_count - BITS_PER_WORD);
>> - second_shift_count = GEN_INT (2 * BITS_PER_WORD - shift_count);
>> + first_shift_count = shift_count - BITS_PER_WORD;
>> + second_shift_count = 2 * BITS_PER_WORD - shift_count;
>> }
>> else
>> {
>> - first_shift_count = GEN_INT (BITS_PER_WORD - shift_count);
>> - second_shift_count = GEN_INT (shift_count);
>> + first_shift_count = BITS_PER_WORD - shift_count;
>> + second_shift_count = shift_count;
>> }
>> + rtx first_shift_count_rtx
>> + = gen_int_shift_amount (word_mode, first_shift_count);
>> + rtx second_shift_count_rtx
>> + = gen_int_shift_amount (word_mode, second_shift_count);
>>
>> into_temp1 = expand_binop (word_mode, unsigned_shift,
>> - outof_input, first_shift_count,
>> + outof_input, first_shift_count_rtx,
>> NULL_RTX, unsignedp, next_methods);
>> into_temp2 = expand_binop (word_mode, reverse_unsigned_shift,
>> - into_input, second_shift_count,
>> + into_input, second_shift_count_rtx,
>> NULL_RTX, unsignedp, next_methods);
>>
>> if (into_temp1 != 0 && into_temp2 != 0)
>> @@ -1550,10 +1556,10 @@ expand_binop (machine_mode mode, optab b
>> emit_move_insn (into_target, inter);
>>
>> outof_temp1 = expand_binop (word_mode, unsigned_shift,
>> - into_input, first_shift_count,
>> + into_input, first_shift_count_rtx,
>> NULL_RTX, unsignedp, next_methods);
>> outof_temp2 = expand_binop (word_mode, reverse_unsigned_shift,
>> - outof_input, second_shift_count,
>> + outof_input, second_shift_count_rtx,
>> NULL_RTX, unsignedp, next_methods);
>>
>> if (inter != 0 && outof_temp1 != 0 && outof_temp2 != 0)
>> @@ -2793,25 +2799,29 @@ expand_unop (machine_mode mode, optab un
>>
>> if (optab_handler (rotl_optab, mode) != CODE_FOR_nothing)
>> {
>> - temp = expand_binop (mode, rotl_optab, op0, GEN_INT (8), target,
>> - unsignedp, OPTAB_DIRECT);
>> + temp = expand_binop (mode, rotl_optab, op0,
>> + gen_int_shift_amount (mode, 8),
>> + target, unsignedp, OPTAB_DIRECT);
>> if (temp)
>> return temp;
>> }
>>
>> if (optab_handler (rotr_optab, mode) != CODE_FOR_nothing)
>> {
>> - temp = expand_binop (mode, rotr_optab, op0, GEN_INT (8), target,
>> - unsignedp, OPTAB_DIRECT);
>> + temp = expand_binop (mode, rotr_optab, op0,
>> + gen_int_shift_amount (mode, 8),
>> + target, unsignedp, OPTAB_DIRECT);
>> if (temp)
>> return temp;
>> }
>>
>> last = get_last_insn ();
>>
>> - temp1 = expand_binop (mode, ashl_optab, op0, GEN_INT (8), NULL_RTX,
>> + temp1 = expand_binop (mode, ashl_optab, op0,
>> + gen_int_shift_amount (mode, 8), NULL_RTX,
>> unsignedp, OPTAB_WIDEN);
>> - temp2 = expand_binop (mode, lshr_optab, op0, GEN_INT (8), NULL_RTX,
>> + temp2 = expand_binop (mode, lshr_optab, op0,
>> + gen_int_shift_amount (mode, 8), NULL_RTX,
>> unsignedp, OPTAB_WIDEN);
>> if (temp1 && temp2)
>> {
>> @@ -5369,11 +5379,11 @@ vector_compare_rtx (machine_mode cmp_mod
>> }
>>
>> /* Checks if vec_perm mask SEL is a constant equivalent to a shift of the first
>> - vec_perm operand, assuming the second operand is a constant vector of zeroes.
>> - Return the shift distance in bits if so, or NULL_RTX if the vec_perm is not a
>> - shift. */
>> + vec_perm operand (which has mode OP0_MODE), assuming the second
>> + operand is a constant vector of zeroes. Return the shift distance in
>> + bits if so, or NULL_RTX if the vec_perm is not a shift. */
>> static rtx
>> -shift_amt_for_vec_perm_mask (rtx sel)
>> +shift_amt_for_vec_perm_mask (machine_mode op0_mode, rtx sel)
>> {
>> unsigned int i, first, nelt = GET_MODE_NUNITS (GET_MODE (sel));
>> unsigned int bitsize = GET_MODE_UNIT_BITSIZE (GET_MODE (sel));
>> @@ -5393,7 +5403,7 @@ shift_amt_for_vec_perm_mask (rtx sel)
>> return NULL_RTX;
>> }
>>
>> - return GEN_INT (first * bitsize);
>> + return gen_int_shift_amount (op0_mode, first * bitsize);
>> }
>>
>> /* A subroutine of expand_vec_perm for expanding one vec_perm insn. */
>> @@ -5473,7 +5483,7 @@ expand_vec_perm (machine_mode mode, rtx
>> && (shift_code != CODE_FOR_nothing
>> || shift_code_qi != CODE_FOR_nothing))
>> {
>> - shift_amt = shift_amt_for_vec_perm_mask (sel);
>> + shift_amt = shift_amt_for_vec_perm_mask (mode, sel);
>> if (shift_amt)
>> {
>> struct expand_operand ops[3];
>> @@ -5563,7 +5573,8 @@ expand_vec_perm (machine_mode mode, rtx
>> NULL, 0, OPTAB_DIRECT);
>> else
>> sel = expand_simple_binop (selmode, ASHIFT, sel,
>> - GEN_INT (exact_log2 (u)),
>> + gen_int_shift_amount (selmode,
>> + exact_log2 (u)),
>> NULL, 0, OPTAB_DIRECT);
>> gcc_assert (sel != NULL);
>>
^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: [15/nn] Use more specific hash functions in rtlhash.c
2017-10-23 11:27 ` [15/nn] Use more specific hash functions in rtlhash.c Richard Sandiford
@ 2017-10-26 12:08 ` Richard Biener
0 siblings, 0 replies; 90+ messages in thread
From: Richard Biener @ 2017-10-26 12:08 UTC (permalink / raw)
To: GCC Patches, Richard Sandiford
On Mon, Oct 23, 2017 at 1:26 PM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> Avoid using add_object when we have more specific routines available.
Ok.
>
> 2017-10-23 Richard Sandiford <richard.sandiford@linaro.org>
> Alan Hayward <alan.hayward@arm.com>
> David Sherwood <david.sherwood@arm.com>
>
> gcc/
> * rtlhash.c (add_rtx): Use add_hwi for 'w' and add_int for 'i'.
>
> Index: gcc/rtlhash.c
> ===================================================================
> --- gcc/rtlhash.c 2017-02-23 19:54:03.000000000 +0000
> +++ gcc/rtlhash.c 2017-10-23 11:47:20.120201389 +0100
> @@ -77,11 +77,11 @@ add_rtx (const_rtx x, hash &hstate)
> switch (fmt[i])
> {
> case 'w':
> - hstate.add_object (XWINT (x, i));
> + hstate.add_hwi (XWINT (x, i));
> break;
> case 'n':
> case 'i':
> - hstate.add_object (XINT (x, i));
> + hstate.add_int (XINT (x, i));
> break;
> case 'V':
> case 'E':
* Re: [16/nn] Factor out the mode handling in lower-subreg.c
2017-10-23 11:27 ` [16/nn] Factor out the mode handling in lower-subreg.c Richard Sandiford
@ 2017-10-26 12:09 ` Richard Biener
0 siblings, 0 replies; 90+ messages in thread
From: Richard Biener @ 2017-10-26 12:09 UTC (permalink / raw)
To: GCC Patches, Richard Sandiford
On Mon, Oct 23, 2017 at 1:27 PM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> This patch adds a helper routine (interesting_mode_p) to lower-subreg.c
> to decide whether a mode can be split and, if so, calculate the number
> of bytes and words in the mode. At present this function always
> returns true; a later patch will add cases in which it can return false.
Ok.
Richard.
>
> 2017-10-23 Richard Sandiford <richard.sandiford@linaro.org>
> Alan Hayward <alan.hayward@arm.com>
> David Sherwood <david.sherwood@arm.com>
>
> gcc/
> * lower-subreg.c (interesting_mode_p): New function.
> (compute_costs, find_decomposable_subregs, decompose_register)
> (simplify_subreg_concatn, can_decompose_p, resolve_simple_move)
> (resolve_clobber, dump_choices): Use it.
>
> Index: gcc/lower-subreg.c
> ===================================================================
> --- gcc/lower-subreg.c 2017-10-23 11:47:11.274393237 +0100
> +++ gcc/lower-subreg.c 2017-10-23 11:47:23.555013148 +0100
> @@ -103,6 +103,18 @@ #define twice_word_mode \
> #define choices \
> this_target_lower_subreg->x_choices
>
> +/* Return true if MODE is a mode we know how to lower. When returning true,
> + store its byte size in *BYTES and its word size in *WORDS. */
> +
> +static inline bool
> +interesting_mode_p (machine_mode mode, unsigned int *bytes,
> + unsigned int *words)
> +{
> + *bytes = GET_MODE_SIZE (mode);
> + *words = CEIL (*bytes, UNITS_PER_WORD);
> + return true;
> +}
> +
> /* RTXes used while computing costs. */
> struct cost_rtxes {
> /* Source and target registers. */
> @@ -199,10 +211,10 @@ compute_costs (bool speed_p, struct cost
> for (i = 0; i < MAX_MACHINE_MODE; i++)
> {
> machine_mode mode = (machine_mode) i;
> - int factor = GET_MODE_SIZE (mode) / UNITS_PER_WORD;
> - if (factor > 1)
> + unsigned int size, factor;
> + if (interesting_mode_p (mode, &size, &factor) && factor > 1)
> {
> - int mode_move_cost;
> + unsigned int mode_move_cost;
>
> PUT_MODE (rtxes->target, mode);
> PUT_MODE (rtxes->source, mode);
> @@ -469,10 +481,10 @@ find_decomposable_subregs (rtx *loc, enu
> continue;
> }
>
> - outer_size = GET_MODE_SIZE (GET_MODE (x));
> - inner_size = GET_MODE_SIZE (GET_MODE (inner));
> - outer_words = (outer_size + UNITS_PER_WORD - 1) / UNITS_PER_WORD;
> - inner_words = (inner_size + UNITS_PER_WORD - 1) / UNITS_PER_WORD;
> + if (!interesting_mode_p (GET_MODE (x), &outer_size, &outer_words)
> + || !interesting_mode_p (GET_MODE (inner), &inner_size,
> + &inner_words))
> + continue;
>
> /* We only try to decompose single word subregs of multi-word
> registers. When we find one, we return -1 to avoid iterating
> @@ -507,7 +519,7 @@ find_decomposable_subregs (rtx *loc, enu
> }
> else if (REG_P (x))
> {
> - unsigned int regno;
> + unsigned int regno, size, words;
>
> /* We will see an outer SUBREG before we see the inner REG, so
> when we see a plain REG here it means a direct reference to
> @@ -527,7 +539,8 @@ find_decomposable_subregs (rtx *loc, enu
>
> regno = REGNO (x);
> if (!HARD_REGISTER_NUM_P (regno)
> - && GET_MODE_SIZE (GET_MODE (x)) > UNITS_PER_WORD)
> + && interesting_mode_p (GET_MODE (x), &size, &words)
> + && words > 1)
> {
> switch (*pcmi)
> {
> @@ -567,15 +580,15 @@ find_decomposable_subregs (rtx *loc, enu
> decompose_register (unsigned int regno)
> {
> rtx reg;
> - unsigned int words, i;
> + unsigned int size, words, i;
> rtvec v;
>
> reg = regno_reg_rtx[regno];
>
> regno_reg_rtx[regno] = NULL_RTX;
>
> - words = GET_MODE_SIZE (GET_MODE (reg));
> - words = (words + UNITS_PER_WORD - 1) / UNITS_PER_WORD;
> + if (!interesting_mode_p (GET_MODE (reg), &size, &words))
> + gcc_unreachable ();
>
> v = rtvec_alloc (words);
> for (i = 0; i < words; ++i)
> @@ -599,25 +612,29 @@ decompose_register (unsigned int regno)
> simplify_subreg_concatn (machine_mode outermode, rtx op,
> unsigned int byte)
> {
> - unsigned int inner_size;
> + unsigned int outer_size, outer_words, inner_size, inner_words;
> machine_mode innermode, partmode;
> rtx part;
> unsigned int final_offset;
>
> + innermode = GET_MODE (op);
> + if (!interesting_mode_p (outermode, &outer_size, &outer_words)
> + || !interesting_mode_p (innermode, &inner_size, &inner_words))
> + gcc_unreachable ();
> +
> gcc_assert (GET_CODE (op) == CONCATN);
> - gcc_assert (byte % GET_MODE_SIZE (outermode) == 0);
> + gcc_assert (byte % outer_size == 0);
>
> - innermode = GET_MODE (op);
> - gcc_assert (byte < GET_MODE_SIZE (innermode));
> - if (GET_MODE_SIZE (outermode) > GET_MODE_SIZE (innermode))
> + gcc_assert (byte < inner_size);
> + if (outer_size > inner_size)
> return NULL_RTX;
>
> - inner_size = GET_MODE_SIZE (innermode) / XVECLEN (op, 0);
> + inner_size /= XVECLEN (op, 0);
> part = XVECEXP (op, 0, byte / inner_size);
> partmode = GET_MODE (part);
>
> final_offset = byte % inner_size;
> - if (final_offset + GET_MODE_SIZE (outermode) > inner_size)
> + if (final_offset + outer_size > inner_size)
> return NULL_RTX;
>
> /* VECTOR_CSTs in debug expressions are expanded into CONCATN instead of
> @@ -801,9 +818,10 @@ can_decompose_p (rtx x)
>
> if (HARD_REGISTER_NUM_P (regno))
> {
> - unsigned int byte, num_bytes;
> + unsigned int byte, num_bytes, num_words;
>
> - num_bytes = GET_MODE_SIZE (GET_MODE (x));
> + if (!interesting_mode_p (GET_MODE (x), &num_bytes, &num_words))
> + return false;
> for (byte = 0; byte < num_bytes; byte += UNITS_PER_WORD)
> if (simplify_subreg_regno (regno, GET_MODE (x), byte, word_mode) < 0)
> return false;
> @@ -826,14 +844,15 @@ resolve_simple_move (rtx set, rtx_insn *
> rtx src, dest, real_dest;
> rtx_insn *insns;
> machine_mode orig_mode, dest_mode;
> - unsigned int words;
> + unsigned int orig_size, words;
> bool pushing;
>
> src = SET_SRC (set);
> dest = SET_DEST (set);
> orig_mode = GET_MODE (dest);
>
> - words = (GET_MODE_SIZE (orig_mode) + UNITS_PER_WORD - 1) / UNITS_PER_WORD;
> + if (!interesting_mode_p (orig_mode, &orig_size, &words))
> + gcc_unreachable ();
> gcc_assert (words > 1);
>
> start_sequence ();
> @@ -964,7 +983,7 @@ resolve_simple_move (rtx set, rtx_insn *
> {
> unsigned int i, j, jinc;
>
> - gcc_assert (GET_MODE_SIZE (orig_mode) % UNITS_PER_WORD == 0);
> + gcc_assert (orig_size % UNITS_PER_WORD == 0);
> gcc_assert (GET_CODE (XEXP (dest, 0)) != PRE_MODIFY);
> gcc_assert (GET_CODE (XEXP (dest, 0)) != POST_MODIFY);
>
> @@ -1059,7 +1078,7 @@ resolve_clobber (rtx pat, rtx_insn *insn
> {
> rtx reg;
> machine_mode orig_mode;
> - unsigned int words, i;
> + unsigned int orig_size, words, i;
> int ret;
>
> reg = XEXP (pat, 0);
> @@ -1067,8 +1086,8 @@ resolve_clobber (rtx pat, rtx_insn *insn
> return false;
>
> orig_mode = GET_MODE (reg);
> - words = GET_MODE_SIZE (orig_mode);
> - words = (words + UNITS_PER_WORD - 1) / UNITS_PER_WORD;
> + if (!interesting_mode_p (orig_mode, &orig_size, &words))
> + gcc_unreachable ();
>
> ret = validate_change (NULL_RTX, &XEXP (pat, 0),
> simplify_gen_subreg_concatn (word_mode, reg,
> @@ -1332,12 +1351,13 @@ dump_shift_choices (enum rtx_code code,
> static void
> dump_choices (bool speed_p, const char *description)
> {
> - unsigned int i;
> + unsigned int size, factor, i;
>
> fprintf (dump_file, "Choices when optimizing for %s:\n", description);
>
> for (i = 0; i < MAX_MACHINE_MODE; i++)
> - if (GET_MODE_SIZE ((machine_mode) i) > UNITS_PER_WORD)
> + if (interesting_mode_p ((machine_mode) i, &size, &factor)
> + && factor > 1)
> fprintf (dump_file, " %s mode %s for copy lowering.\n",
> choices[speed_p].move_modes_to_split[i]
> ? "Splitting"
* Re: [17/nn] Turn var-tracking.c:INT_MEM_OFFSET into a function
2017-10-23 11:28 ` [17/nn] Turn var-tracking.c:INT_MEM_OFFSET into a function Richard Sandiford
@ 2017-10-26 12:10 ` Richard Biener
0 siblings, 0 replies; 90+ messages in thread
From: Richard Biener @ 2017-10-26 12:10 UTC (permalink / raw)
To: GCC Patches, Richard Sandiford
On Mon, Oct 23, 2017 at 1:27 PM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> This avoids the double evaluation mentioned in the comments and
> simplifies the change to make MEM_OFFSET variable.
Ok.
Richard.
>
> 2017-10-23 Richard Sandiford <richard.sandiford@linaro.org>
> Alan Hayward <alan.hayward@arm.com>
> David Sherwood <david.sherwood@arm.com>
>
> gcc/
> * var-tracking.c (INT_MEM_OFFSET): Replace with...
> (int_mem_offset): ...this new function.
> (var_mem_set, var_mem_delete_and_set, var_mem_delete)
> (find_mem_expr_in_1pdv, dataflow_set_preserve_mem_locs)
> (same_variable_part_p, use_type, add_stores, vt_get_decl_and_offset):
> Update accordingly.
>
> Index: gcc/var-tracking.c
> ===================================================================
> --- gcc/var-tracking.c 2017-09-12 14:28:56.401824826 +0100
> +++ gcc/var-tracking.c 2017-10-23 11:47:27.197231712 +0100
> @@ -390,8 +390,15 @@ struct variable
> /* Pointer to the BB's information specific to variable tracking pass. */
> #define VTI(BB) ((variable_tracking_info *) (BB)->aux)
>
> -/* Macro to access MEM_OFFSET as an HOST_WIDE_INT. Evaluates MEM twice. */
> -#define INT_MEM_OFFSET(mem) (MEM_OFFSET_KNOWN_P (mem) ? MEM_OFFSET (mem) : 0)
> +/* Return MEM_OFFSET (MEM) as a HOST_WIDE_INT, or 0 if we can't. */
> +
> +static inline HOST_WIDE_INT
> +int_mem_offset (const_rtx mem)
> +{
> + if (MEM_OFFSET_KNOWN_P (mem))
> + return MEM_OFFSET (mem);
> + return 0;
> +}
>
> #if CHECKING_P && (GCC_VERSION >= 2007)
>
> @@ -2336,7 +2343,7 @@ var_mem_set (dataflow_set *set, rtx loc,
> rtx set_src)
> {
> tree decl = MEM_EXPR (loc);
> - HOST_WIDE_INT offset = INT_MEM_OFFSET (loc);
> + HOST_WIDE_INT offset = int_mem_offset (loc);
>
> var_mem_decl_set (set, loc, initialized,
> dv_from_decl (decl), offset, set_src, INSERT);
> @@ -2354,7 +2361,7 @@ var_mem_delete_and_set (dataflow_set *se
> enum var_init_status initialized, rtx set_src)
> {
> tree decl = MEM_EXPR (loc);
> - HOST_WIDE_INT offset = INT_MEM_OFFSET (loc);
> + HOST_WIDE_INT offset = int_mem_offset (loc);
>
> clobber_overlapping_mems (set, loc);
> decl = var_debug_decl (decl);
> @@ -2375,7 +2382,7 @@ var_mem_delete_and_set (dataflow_set *se
> var_mem_delete (dataflow_set *set, rtx loc, bool clobber)
> {
> tree decl = MEM_EXPR (loc);
> - HOST_WIDE_INT offset = INT_MEM_OFFSET (loc);
> + HOST_WIDE_INT offset = int_mem_offset (loc);
>
> clobber_overlapping_mems (set, loc);
> decl = var_debug_decl (decl);
> @@ -4618,7 +4625,7 @@ find_mem_expr_in_1pdv (tree expr, rtx va
> for (node = var->var_part[0].loc_chain; node; node = node->next)
> if (MEM_P (node->loc)
> && MEM_EXPR (node->loc) == expr
> - && INT_MEM_OFFSET (node->loc) == 0)
> + && int_mem_offset (node->loc) == 0)
> {
> where = node;
> break;
> @@ -4683,7 +4690,7 @@ dataflow_set_preserve_mem_locs (variable
> /* We want to remove dying MEMs that don't refer to DECL. */
> if (GET_CODE (loc->loc) == MEM
> && (MEM_EXPR (loc->loc) != decl
> - || INT_MEM_OFFSET (loc->loc) != 0)
> + || int_mem_offset (loc->loc) != 0)
> && mem_dies_at_call (loc->loc))
> break;
> /* We want to move here MEMs that do refer to DECL. */
> @@ -4727,7 +4734,7 @@ dataflow_set_preserve_mem_locs (variable
>
> if (GET_CODE (loc->loc) != MEM
> || (MEM_EXPR (loc->loc) == decl
> - && INT_MEM_OFFSET (loc->loc) == 0)
> + && int_mem_offset (loc->loc) == 0)
> || !mem_dies_at_call (loc->loc))
> {
> if (old_loc != loc->loc && emit_notes)
> @@ -5254,7 +5261,7 @@ same_variable_part_p (rtx loc, tree expr
> else if (MEM_P (loc))
> {
> expr2 = MEM_EXPR (loc);
> - offset2 = INT_MEM_OFFSET (loc);
> + offset2 = int_mem_offset (loc);
> }
> else
> return false;
> @@ -5522,7 +5529,7 @@ use_type (rtx loc, struct count_use_info
> return MO_CLOBBER;
> else if (target_for_debug_bind (var_debug_decl (expr)))
> return MO_CLOBBER;
> - else if (track_loc_p (loc, expr, INT_MEM_OFFSET (loc),
> + else if (track_loc_p (loc, expr, int_mem_offset (loc),
> false, modep, NULL)
> /* Multi-part variables shouldn't refer to one-part
> variable names such as VALUEs (never happens) or
> @@ -6017,7 +6024,7 @@ add_stores (rtx loc, const_rtx expr, voi
> rtx xexpr = gen_rtx_SET (loc, src);
> if (same_variable_part_p (SET_SRC (xexpr),
> MEM_EXPR (loc),
> - INT_MEM_OFFSET (loc)))
> + int_mem_offset (loc)))
> mo.type = MO_COPY;
> else
> mo.type = MO_SET;
> @@ -9579,7 +9586,7 @@ vt_get_decl_and_offset (rtx rtl, tree *d
> if (MEM_ATTRS (rtl))
> {
> *declp = MEM_EXPR (rtl);
> - *offsetp = INT_MEM_OFFSET (rtl);
> + *offsetp = int_mem_offset (rtl);
> return true;
> }
> }
* Re: [18/nn] Use (CONST_VECTOR|GET_MODE)_NUNITS in simplify-rtx.c
2017-10-23 11:29 ` [18/nn] Use (CONST_VECTOR|GET_MODE)_NUNITS in simplify-rtx.c Richard Sandiford
@ 2017-10-26 12:13 ` Richard Biener
0 siblings, 0 replies; 90+ messages in thread
From: Richard Biener @ 2017-10-26 12:13 UTC (permalink / raw)
To: GCC Patches, Richard Sandiford
On Mon, Oct 23, 2017 at 1:28 PM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> This patch avoids some calculations of the form:
>
> GET_MODE_SIZE (vector_mode) / GET_MODE_SIZE (element_mode)
>
> in simplify-rtx.c. If we're dealing with CONST_VECTORs, it's better
> to use CONST_VECTOR_NUNITS, since that remains constant even after the
> SVE patches. In other cases we can get the number from GET_MODE_NUNITS.
Ok.
Richard.
>
> 2017-10-23 Richard Sandiford <richard.sandiford@linaro.org>
> Alan Hayward <alan.hayward@arm.com>
> David Sherwood <david.sherwood@arm.com>
>
> gcc/
> * simplify-rtx.c (simplify_const_unary_operation): Use GET_MODE_NUNITS
> and CONST_VECTOR_NUNITS instead of computing the number of units from
> the byte sizes of the vector and element.
> (simplify_binary_operation_1): Likewise.
> (simplify_const_binary_operation): Likewise.
> (simplify_ternary_operation): Likewise.
>
> Index: gcc/simplify-rtx.c
> ===================================================================
> --- gcc/simplify-rtx.c 2017-10-23 11:47:11.277288162 +0100
> +++ gcc/simplify-rtx.c 2017-10-23 11:47:32.868935554 +0100
> @@ -1752,18 +1752,12 @@ simplify_const_unary_operation (enum rtx
> return gen_const_vec_duplicate (mode, op);
> if (GET_CODE (op) == CONST_VECTOR)
> {
> - int elt_size = GET_MODE_UNIT_SIZE (mode);
> - unsigned n_elts = (GET_MODE_SIZE (mode) / elt_size);
> - rtvec v = rtvec_alloc (n_elts);
> - unsigned int i;
> -
> - machine_mode inmode = GET_MODE (op);
> - int in_elt_size = GET_MODE_UNIT_SIZE (inmode);
> - unsigned in_n_elts = (GET_MODE_SIZE (inmode) / in_elt_size);
> -
> + unsigned int n_elts = GET_MODE_NUNITS (mode);
> + unsigned int in_n_elts = CONST_VECTOR_NUNITS (op);
> gcc_assert (in_n_elts < n_elts);
> gcc_assert ((n_elts % in_n_elts) == 0);
> - for (i = 0; i < n_elts; i++)
> + rtvec v = rtvec_alloc (n_elts);
> + for (unsigned i = 0; i < n_elts; i++)
> RTVEC_ELT (v, i) = CONST_VECTOR_ELT (op, i % in_n_elts);
> return gen_rtx_CONST_VECTOR (mode, v);
> }
> @@ -3608,9 +3602,7 @@ simplify_binary_operation_1 (enum rtx_co
> rtx op0 = XEXP (trueop0, 0);
> rtx op1 = XEXP (trueop0, 1);
>
> - machine_mode opmode = GET_MODE (op0);
> - int elt_size = GET_MODE_UNIT_SIZE (opmode);
> - int n_elts = GET_MODE_SIZE (opmode) / elt_size;
> + int n_elts = GET_MODE_NUNITS (GET_MODE (op0));
>
> int i = INTVAL (XVECEXP (trueop1, 0, 0));
> int elem;
> @@ -3637,21 +3629,8 @@ simplify_binary_operation_1 (enum rtx_co
> mode01 = GET_MODE (op01);
>
> /* Find out number of elements of each operand. */
> - if (VECTOR_MODE_P (mode00))
> - {
> - elt_size = GET_MODE_UNIT_SIZE (mode00);
> - n_elts00 = GET_MODE_SIZE (mode00) / elt_size;
> - }
> - else
> - n_elts00 = 1;
> -
> - if (VECTOR_MODE_P (mode01))
> - {
> - elt_size = GET_MODE_UNIT_SIZE (mode01);
> - n_elts01 = GET_MODE_SIZE (mode01) / elt_size;
> - }
> - else
> - n_elts01 = 1;
> + n_elts00 = GET_MODE_NUNITS (mode00);
> + n_elts01 = GET_MODE_NUNITS (mode01);
>
> gcc_assert (n_elts == n_elts00 + n_elts01);
>
> @@ -3771,9 +3750,8 @@ simplify_binary_operation_1 (enum rtx_co
> rtx subop1 = XEXP (trueop0, 1);
> machine_mode mode0 = GET_MODE (subop0);
> machine_mode mode1 = GET_MODE (subop1);
> - int li = GET_MODE_UNIT_SIZE (mode0);
> - int l0 = GET_MODE_SIZE (mode0) / li;
> - int l1 = GET_MODE_SIZE (mode1) / li;
> + int l0 = GET_MODE_NUNITS (mode0);
> + int l1 = GET_MODE_NUNITS (mode1);
> int i0 = INTVAL (XVECEXP (trueop1, 0, 0));
> if (i0 == 0 && !side_effects_p (op1) && mode == mode0)
> {
> @@ -3931,14 +3909,10 @@ simplify_binary_operation_1 (enum rtx_co
> || CONST_SCALAR_INT_P (trueop1)
> || CONST_DOUBLE_AS_FLOAT_P (trueop1)))
> {
> - int elt_size = GET_MODE_UNIT_SIZE (mode);
> - unsigned n_elts = (GET_MODE_SIZE (mode) / elt_size);
> + unsigned n_elts = GET_MODE_NUNITS (mode);
> + unsigned in_n_elts = GET_MODE_NUNITS (op0_mode);
> rtvec v = rtvec_alloc (n_elts);
> unsigned int i;
> - unsigned in_n_elts = 1;
> -
> - if (VECTOR_MODE_P (op0_mode))
> - in_n_elts = (GET_MODE_SIZE (op0_mode) / elt_size);
> for (i = 0; i < n_elts; i++)
> {
> if (i < in_n_elts)
> @@ -4026,16 +4000,12 @@ simplify_const_binary_operation (enum rt
> && GET_CODE (op0) == CONST_VECTOR
> && GET_CODE (op1) == CONST_VECTOR)
> {
> - unsigned n_elts = GET_MODE_NUNITS (mode);
> - machine_mode op0mode = GET_MODE (op0);
> - unsigned op0_n_elts = GET_MODE_NUNITS (op0mode);
> - machine_mode op1mode = GET_MODE (op1);
> - unsigned op1_n_elts = GET_MODE_NUNITS (op1mode);
> + unsigned int n_elts = CONST_VECTOR_NUNITS (op0);
> + gcc_assert (n_elts == (unsigned int) CONST_VECTOR_NUNITS (op1));
> + gcc_assert (n_elts == GET_MODE_NUNITS (mode));
> rtvec v = rtvec_alloc (n_elts);
> unsigned int i;
>
> - gcc_assert (op0_n_elts == n_elts);
> - gcc_assert (op1_n_elts == n_elts);
> for (i = 0; i < n_elts; i++)
> {
> rtx x = simplify_binary_operation (code, GET_MODE_INNER (mode),
> @@ -5712,8 +5682,7 @@ simplify_ternary_operation (enum rtx_cod
> trueop2 = avoid_constant_pool_reference (op2);
> if (CONST_INT_P (trueop2))
> {
> - int elt_size = GET_MODE_UNIT_SIZE (mode);
> - unsigned n_elts = (GET_MODE_SIZE (mode) / elt_size);
> + unsigned n_elts = GET_MODE_NUNITS (mode);
> unsigned HOST_WIDE_INT sel = UINTVAL (trueop2);
> unsigned HOST_WIDE_INT mask;
> if (n_elts == HOST_BITS_PER_WIDE_INT)
* Re: [19/nn] Don't treat zero-sized ranges as overlapping
2017-10-23 11:29 ` [19/nn] Don't treat zero-sized ranges as overlapping Richard Sandiford
@ 2017-10-26 12:14 ` Richard Biener
0 siblings, 0 replies; 90+ messages in thread
From: Richard Biener @ 2017-10-26 12:14 UTC (permalink / raw)
To: GCC Patches, Richard Sandiford
On Mon, Oct 23, 2017 at 1:29 PM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> Most GCC ranges seem to be represented as an offset and a size (rather
> than a start and inclusive end or start and exclusive end). The usual
> test for whether X is in a range is of course:
>
> x >= start && x < start + size
> or:
> x >= start && x - start < size
>
> which means that an empty range of size 0 contains nothing. But other
> range tests aren't as obvious.
>
> The usual test for whether one range is contained within another
> range is:
>
> start1 >= start2 && start1 + size1 <= start2 + size2
>
> while the test for whether two ranges overlap (from ranges_overlap_p) is:
>
> (start1 >= start2 && start1 < start2 + size2)
> || (start2 >= start1 && start2 < start1 + size1)
>
> i.e. the ranges overlap if one range contains the start of the other
> range. This leads to strange results like:
>
> (start X, size 0) is a subrange of (start X, size 0) but
> (start X, size 0) does not overlap (start X, size 0)
>
> Similarly:
>
> (start 4, size 0) is a subrange of (start 2, size 2) but
> (start 4, size 0) does not overlap (start 2, size 2)
>
> It seems like "X is a subrange of Y" should imply "X overlaps Y".
>
> This becomes harder to ignore with the runtime sizes and offsets
> added for SVE. The most obvious fix seemed to be to say that
> an empty range does not overlap anything, and is therefore not
> a subrange of anything.
>
> Using the new definition of subranges didn't seem to cause any
> codegen differences in the testsuite. But there was one change
> with the new definition of overlapping ranges. strncpy-chk.c has:
>
> memset (dst, 0, sizeof (dst));
> if (strncpy (dst, src, 0) != dst || strcmp (dst, ""))
> abort();
>
> The strncpy is detected as a zero-size write, and so with the new
> definition of overlapping ranges, we treat the strncpy as having
> no effect on the strcmp (which is true). The reaching definition
> is the memset instead.
>
> This patch makes ranges_overlap_p return false for zero-sized
> ranges, even if the other range has an unknown size.
Ok.
Thanks,
Richard.
>
> 2017-10-23 Richard Sandiford <richard.sandiford@linaro.org>
>
> gcc/
> * tree-ssa-alias.h (ranges_overlap_p): Return false if either
> range is known to be empty.
>
> Index: gcc/tree-ssa-alias.h
> ===================================================================
> --- gcc/tree-ssa-alias.h 2017-03-28 16:19:22.000000000 +0100
> +++ gcc/tree-ssa-alias.h 2017-10-23 11:47:38.181155696 +0100
> @@ -171,6 +171,8 @@ ranges_overlap_p (HOST_WIDE_INT pos1,
> HOST_WIDE_INT pos2,
> unsigned HOST_WIDE_INT size2)
> {
> + if (size1 == 0 || size2 == 0)
> + return false;
> if (pos1 >= pos2
> && (size2 == (unsigned HOST_WIDE_INT)-1
> || pos1 < (pos2 + (HOST_WIDE_INT) size2)))
* Re: [22/nn] Make dse.c use offset/width instead of start/end
2017-10-23 11:45 ` [22/nn] Make dse.c use offset/width instead of start/end Richard Sandiford
@ 2017-10-26 12:18 ` Richard Biener
0 siblings, 0 replies; 90+ messages in thread
From: Richard Biener @ 2017-10-26 12:18 UTC (permalink / raw)
To: GCC Patches, Richard Sandiford
On Mon, Oct 23, 2017 at 1:30 PM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> store_info and read_info_type in dse.c represented the ranges as
> start/end, but a lot of the internal code used offset/width instead.
> Using offset/width throughout fits better with the poly_int.h
> range-checking functions.
Ok.
Richard.
>
> 2017-10-23 Richard Sandiford <richard.sandiford@linaro.org>
> Alan Hayward <alan.hayward@arm.com>
> David Sherwood <david.sherwood@arm.com>
>
> gcc/
> * dse.c (store_info, read_info_type): Replace begin and end with
> offset and width.
> (print_range): New function.
> (set_all_positions_unneeded, any_positions_needed_p)
> (check_mem_read_rtx, scan_stores, scan_reads, dse_step5): Update
> accordingly.
> (record_store): Likewise. Optimize the case in which all positions
> are unneeded.
> (get_stored_val): Replace read_begin and read_end with read_offset
> and read_width.
> (replace_read): Update call accordingly.
>
> Index: gcc/dse.c
> ===================================================================
> --- gcc/dse.c 2017-10-23 11:47:11.273428262 +0100
> +++ gcc/dse.c 2017-10-23 11:47:48.294155952 +0100
> @@ -243,9 +243,12 @@ struct store_info
> /* Canonized MEM address for use by canon_true_dependence. */
> rtx mem_addr;
>
> - /* The offset of the first and byte before the last byte associated
> - with the operation. */
> - HOST_WIDE_INT begin, end;
> + /* The offset of the first byte associated with the operation. */
> + HOST_WIDE_INT offset;
> +
> + /* The number of bytes covered by the operation. This is always exact
> + and known (rather than -1). */
> + HOST_WIDE_INT width;
>
> union
> {
> @@ -261,7 +264,7 @@ struct store_info
> bitmap bmap;
>
> /* Number of set bits (i.e. unneeded bytes) in BITMAP. If it is
> - equal to END - BEGIN, the whole store is unused. */
> + equal to WIDTH, the whole store is unused. */
> int count;
> } large;
> } positions_needed;
> @@ -304,10 +307,11 @@ struct read_info_type
> /* The id of the mem group of the base address. */
> int group_id;
>
> - /* The offset of the first and byte after the last byte associated
> - with the operation. If begin == end == 0, the read did not have
> - a constant offset. */
> - int begin, end;
> + /* The offset of the first byte associated with the operation. */
> + HOST_WIDE_INT offset;
> +
> + /* The number of bytes covered by the operation, or -1 if not known. */
> + HOST_WIDE_INT width;
>
> /* The mem being read. */
> rtx mem;
> @@ -586,6 +590,18 @@ static deferred_change *deferred_change_
>
> /* The number of bits used in the global bitmaps. */
> static unsigned int current_position;
> +
> +/* Print offset range [OFFSET, OFFSET + WIDTH) to FILE. */
> +
> +static void
> +print_range (FILE *file, poly_int64 offset, poly_int64 width)
> +{
> + fprintf (file, "[");
> + print_dec (offset, file, SIGNED);
> + fprintf (file, "..");
> + print_dec (offset + width, file, SIGNED);
> + fprintf (file, ")");
> +}
>
> /*----------------------------------------------------------------------------
> Zeroth step.
> @@ -1212,10 +1228,9 @@ set_all_positions_unneeded (store_info *
> {
> if (__builtin_expect (s_info->is_large, false))
> {
> - int pos, end = s_info->end - s_info->begin;
> - for (pos = 0; pos < end; pos++)
> - bitmap_set_bit (s_info->positions_needed.large.bmap, pos);
> - s_info->positions_needed.large.count = end;
> + bitmap_set_range (s_info->positions_needed.large.bmap,
> + 0, s_info->width);
> + s_info->positions_needed.large.count = s_info->width;
> }
> else
> s_info->positions_needed.small_bitmask = HOST_WIDE_INT_0U;
> @@ -1227,8 +1242,7 @@ set_all_positions_unneeded (store_info *
> any_positions_needed_p (store_info *s_info)
> {
> if (__builtin_expect (s_info->is_large, false))
> - return (s_info->positions_needed.large.count
> - < s_info->end - s_info->begin);
> + return s_info->positions_needed.large.count < s_info->width;
> else
> return (s_info->positions_needed.small_bitmask != HOST_WIDE_INT_0U);
> }
> @@ -1355,8 +1369,12 @@ record_store (rtx body, bb_info_t bb_inf
> set_usage_bits (group, offset, width, expr);
>
> if (dump_file && (dump_flags & TDF_DETAILS))
> - fprintf (dump_file, " processing const base store gid=%d[%d..%d)\n",
> - group_id, (int)offset, (int)(offset+width));
> + {
> + fprintf (dump_file, " processing const base store gid=%d",
> + group_id);
> + print_range (dump_file, offset, width);
> + fprintf (dump_file, "\n");
> + }
> }
> else
> {
> @@ -1368,8 +1386,11 @@ record_store (rtx body, bb_info_t bb_inf
> group_id = -1;
>
> if (dump_file && (dump_flags & TDF_DETAILS))
> - fprintf (dump_file, " processing cselib store [%d..%d)\n",
> - (int)offset, (int)(offset+width));
> + {
> + fprintf (dump_file, " processing cselib store ");
> + print_range (dump_file, offset, width);
> + fprintf (dump_file, "\n");
> + }
> }
>
> const_rhs = rhs = NULL_RTX;
> @@ -1435,18 +1456,21 @@ record_store (rtx body, bb_info_t bb_inf
> {
> HOST_WIDE_INT i;
> if (dump_file && (dump_flags & TDF_DETAILS))
> - fprintf (dump_file, " trying store in insn=%d gid=%d[%d..%d)\n",
> - INSN_UID (ptr->insn), s_info->group_id,
> - (int)s_info->begin, (int)s_info->end);
> + {
> + fprintf (dump_file, " trying store in insn=%d gid=%d",
> + INSN_UID (ptr->insn), s_info->group_id);
> + print_range (dump_file, s_info->offset, s_info->width);
> + fprintf (dump_file, "\n");
> + }
>
> /* Even if PTR won't be eliminated as unneeded, if both
> PTR and this insn store the same constant value, we might
> eliminate this insn instead. */
> if (s_info->const_rhs
> && const_rhs
> - && offset >= s_info->begin
> - && offset + width <= s_info->end
> - && all_positions_needed_p (s_info, offset - s_info->begin,
> + && known_subrange_p (offset, width,
> + s_info->offset, s_info->width)
> + && all_positions_needed_p (s_info, offset - s_info->offset,
> width))
> {
> if (GET_MODE (mem) == BLKmode)
> @@ -1462,8 +1486,7 @@ record_store (rtx body, bb_info_t bb_inf
> {
> rtx val;
> start_sequence ();
> - val = get_stored_val (s_info, GET_MODE (mem),
> - offset, offset + width,
> + val = get_stored_val (s_info, GET_MODE (mem), offset, width,
> BLOCK_FOR_INSN (insn_info->insn),
> true);
> if (get_insns () != NULL)
> @@ -1474,10 +1497,18 @@ record_store (rtx body, bb_info_t bb_inf
> }
> }
>
> - for (i = MAX (offset, s_info->begin);
> - i < offset + width && i < s_info->end;
> - i++)
> - set_position_unneeded (s_info, i - s_info->begin);
> + if (known_subrange_p (s_info->offset, s_info->width, offset, width))
> + /* The new store touches every byte that S_INFO does. */
> + set_all_positions_unneeded (s_info);
> + else
> + {
> + HOST_WIDE_INT begin_unneeded = offset - s_info->offset;
> + HOST_WIDE_INT end_unneeded = begin_unneeded + width;
> + begin_unneeded = MAX (begin_unneeded, 0);
> + end_unneeded = MIN (end_unneeded, s_info->width);
> + for (i = begin_unneeded; i < end_unneeded; ++i)
> + set_position_unneeded (s_info, i);
> + }
> }
> else if (s_info->rhs)
> /* Need to see if it is possible for this store to overwrite
> @@ -1535,8 +1566,8 @@ record_store (rtx body, bb_info_t bb_inf
> store_info->positions_needed.small_bitmask = lowpart_bitmask (width);
> }
> store_info->group_id = group_id;
> - store_info->begin = offset;
> - store_info->end = offset + width;
> + store_info->offset = offset;
> + store_info->width = width;
> store_info->is_set = GET_CODE (body) == SET;
> store_info->rhs = rhs;
> store_info->const_rhs = const_rhs;
> @@ -1700,39 +1731,38 @@ look_for_hardregs (rtx x, const_rtx pat
> }
>
> /* Helper function for replace_read and record_store.
> - Attempt to return a value stored in STORE_INFO, from READ_BEGIN
> - to one before READ_END bytes read in READ_MODE. Return NULL
> + Attempt to return a value of mode READ_MODE stored in STORE_INFO,
> + consisting of READ_WIDTH bytes starting from READ_OFFSET. Return NULL
> if not successful. If REQUIRE_CST is true, return always constant. */
>
> static rtx
> get_stored_val (store_info *store_info, machine_mode read_mode,
> - HOST_WIDE_INT read_begin, HOST_WIDE_INT read_end,
> + HOST_WIDE_INT read_offset, HOST_WIDE_INT read_width,
> basic_block bb, bool require_cst)
> {
> machine_mode store_mode = GET_MODE (store_info->mem);
> - int shift;
> - int access_size; /* In bytes. */
> + HOST_WIDE_INT gap;
> rtx read_reg;
>
> /* To get here the read is within the boundaries of the write so
> shift will never be negative. Start out with the shift being in
> bytes. */
> if (store_mode == BLKmode)
> - shift = 0;
> + gap = 0;
> else if (BYTES_BIG_ENDIAN)
> - shift = store_info->end - read_end;
> + gap = ((store_info->offset + store_info->width)
> + - (read_offset + read_width));
> else
> - shift = read_begin - store_info->begin;
> -
> - access_size = shift + GET_MODE_SIZE (read_mode);
> -
> - /* From now on it is bits. */
> - shift *= BITS_PER_UNIT;
> + gap = read_offset - store_info->offset;
>
> - if (shift)
> - read_reg = find_shift_sequence (access_size, store_info, read_mode, shift,
> - optimize_bb_for_speed_p (bb),
> - require_cst);
> + if (gap != 0)
> + {
> + HOST_WIDE_INT shift = gap * BITS_PER_UNIT;
> + HOST_WIDE_INT access_size = GET_MODE_SIZE (read_mode) + gap;
> + read_reg = find_shift_sequence (access_size, store_info, read_mode,
> + shift, optimize_bb_for_speed_p (bb),
> + require_cst);
> + }
> else if (store_mode == BLKmode)
> {
> /* The store is a memset (addr, const_val, const_size). */
> @@ -1835,7 +1865,7 @@ replace_read (store_info *store_info, in
> start_sequence ();
> bb = BLOCK_FOR_INSN (read_insn->insn);
> read_reg = get_stored_val (store_info,
> - read_mode, read_info->begin, read_info->end,
> + read_mode, read_info->offset, read_info->width,
> bb, false);
> if (read_reg == NULL_RTX)
> {
> @@ -1986,8 +2016,8 @@ check_mem_read_rtx (rtx *loc, bb_info_t
> read_info = read_info_type_pool.allocate ();
> read_info->group_id = group_id;
> read_info->mem = mem;
> - read_info->begin = offset;
> - read_info->end = offset + width;
> + read_info->offset = offset;
> + read_info->width = width;
> read_info->next = insn_info->read_rec;
> insn_info->read_rec = read_info;
> if (group_id < 0)
> @@ -2013,8 +2043,11 @@ check_mem_read_rtx (rtx *loc, bb_info_t
> fprintf (dump_file, " processing const load gid=%d[BLK]\n",
> group_id);
> else
> - fprintf (dump_file, " processing const load gid=%d[%d..%d)\n",
> - group_id, (int)offset, (int)(offset+width));
> + {
> + fprintf (dump_file, " processing const load gid=%d", group_id);
> + print_range (dump_file, offset, width);
> + fprintf (dump_file, "\n");
> + }
> }
>
> while (i_ptr)
> @@ -2052,19 +2085,19 @@ check_mem_read_rtx (rtx *loc, bb_info_t
> else
> {
> if (store_info->rhs
> - && offset >= store_info->begin
> - && offset + width <= store_info->end
> + && known_subrange_p (offset, width, store_info->offset,
> + store_info->width)
> && all_positions_needed_p (store_info,
> - offset - store_info->begin,
> + offset - store_info->offset,
> width)
> && replace_read (store_info, i_ptr, read_info,
> insn_info, loc, bb_info->regs_live))
> return;
>
> /* The bases are the same, just see if the offsets
> - overlap. */
> - if ((offset < store_info->end)
> - && (offset + width > store_info->begin))
> + could overlap. */
> + if (ranges_may_overlap_p (offset, width, store_info->offset,
> + store_info->width))
> remove = true;
> }
> }
> @@ -2119,11 +2152,10 @@ check_mem_read_rtx (rtx *loc, bb_info_t
> if (store_info->rhs
> && store_info->group_id == -1
> && store_info->cse_base == base
> - && width != -1
> - && offset >= store_info->begin
> - && offset + width <= store_info->end
> + && known_subrange_p (offset, width, store_info->offset,
> + store_info->width)
> && all_positions_needed_p (store_info,
> - offset - store_info->begin, width)
> + offset - store_info->offset, width)
> && replace_read (store_info, i_ptr, read_info, insn_info, loc,
> bb_info->regs_live))
> return;
> @@ -2775,16 +2807,19 @@ scan_stores (store_info *store_info, bit
> group_info *group_info
> = rtx_group_vec[store_info->group_id];
> if (group_info->process_globally)
> - for (i = store_info->begin; i < store_info->end; i++)
> - {
> - int index = get_bitmap_index (group_info, i);
> - if (index != 0)
> - {
> - bitmap_set_bit (gen, index);
> - if (kill)
> - bitmap_clear_bit (kill, index);
> - }
> - }
> + {
> + HOST_WIDE_INT end = store_info->offset + store_info->width;
> + for (i = store_info->offset; i < end; i++)
> + {
> + int index = get_bitmap_index (group_info, i);
> + if (index != 0)
> + {
> + bitmap_set_bit (gen, index);
> + if (kill)
> + bitmap_clear_bit (kill, index);
> + }
> + }
> + }
> store_info = store_info->next;
> }
> }
> @@ -2834,9 +2869,9 @@ scan_reads (insn_info_t insn_info, bitma
> {
> if (i == read_info->group_id)
> {
> - if (read_info->begin > read_info->end)
> + if (!known_size_p (read_info->width))
> {
> - /* Begin > end for block mode reads. */
> + /* Handle block mode reads. */
> if (kill)
> bitmap_ior_into (kill, group->group_kill);
> bitmap_and_compl_into (gen, group->group_kill);
> @@ -2846,7 +2881,8 @@ scan_reads (insn_info_t insn_info, bitma
> /* The groups are the same, just process the
> offsets. */
> HOST_WIDE_INT j;
> - for (j = read_info->begin; j < read_info->end; j++)
> + HOST_WIDE_INT end = read_info->offset + read_info->width;
> + for (j = read_info->offset; j < end; j++)
> {
> int index = get_bitmap_index (group, j);
> if (index != 0)
> @@ -3265,7 +3301,8 @@ dse_step5 (void)
> HOST_WIDE_INT i;
> group_info *group_info = rtx_group_vec[store_info->group_id];
>
> - for (i = store_info->begin; i < store_info->end; i++)
> + HOST_WIDE_INT end = store_info->offset + store_info->width;
> + for (i = store_info->offset; i < end; i++)
> {
> int index = get_bitmap_index (group_info, i);
>
* Re: [09/nn] Add a fixed_size_mode_pod class
2017-10-26 11:59 ` Richard Biener
@ 2017-10-26 12:18 ` Richard Sandiford
2017-10-26 12:46 ` Richard Biener
0 siblings, 1 reply; 90+ messages in thread
From: Richard Sandiford @ 2017-10-26 12:18 UTC (permalink / raw)
To: Richard Biener; +Cc: GCC Patches
Richard Biener <richard.guenther@gmail.com> writes:
> On Mon, Oct 23, 2017 at 1:22 PM, Richard Sandiford
> <richard.sandiford@linaro.org> wrote:
>> This patch adds a POD version of fixed_size_mode. The only current use
>> is for storing the __builtin_apply and __builtin_result register modes,
>> which were made fixed_size_modes by the previous patch.
>
> Bah - can we update our host compiler to C++11/14 please ...?
> (maybe requiring that the build with GCC 4.8 as host compiler works;
> GCC 4.3 has -std=c++0x, but I'm quite sure that's not enough).
That'd be great :-) It would avoid all the poly_int_pod stuff too,
and allow some clean-up of wide-int.h.
Thanks for the reviews,
Richard
>
> Ok.
>
> Thanks,
> Richard.
>
>>
>> 2017-10-23 Richard Sandiford <richard.sandiford@linaro.org>
>> Alan Hayward <alan.hayward@arm.com>
>> David Sherwood <david.sherwood@arm.com>
>>
>> gcc/
>> * coretypes.h (fixed_size_mode): Declare.
>> (fixed_size_mode_pod): New typedef.
>> * builtins.h (target_builtins::x_apply_args_mode)
>> (target_builtins::x_apply_result_mode): Change type to
>> fixed_size_mode_pod.
>> * builtins.c (apply_args_size, apply_result_size, result_vector)
>> (expand_builtin_apply_args_1, expand_builtin_apply)
>> (expand_builtin_return): Update accordingly.
>>
>> Index: gcc/coretypes.h
>> ===================================================================
>> --- gcc/coretypes.h 2017-09-11 17:10:58.656085547 +0100
>> +++ gcc/coretypes.h 2017-10-23 11:42:57.592545063 +0100
>> @@ -59,6 +59,7 @@ typedef const struct rtx_def *const_rtx;
>> class scalar_int_mode;
>> class scalar_float_mode;
>> class complex_mode;
>> +class fixed_size_mode;
>> template<typename> class opt_mode;
>> typedef opt_mode<scalar_mode> opt_scalar_mode;
>> typedef opt_mode<scalar_int_mode> opt_scalar_int_mode;
>> @@ -66,6 +67,7 @@ typedef opt_mode<scalar_float_mode> opt_
>> template<typename> class pod_mode;
>> typedef pod_mode<scalar_mode> scalar_mode_pod;
>> typedef pod_mode<scalar_int_mode> scalar_int_mode_pod;
>> +typedef pod_mode<fixed_size_mode> fixed_size_mode_pod;
>>
>> /* Subclasses of rtx_def, using indentation to show the class
>> hierarchy, along with the relevant invariant.
>> Index: gcc/builtins.h
>> ===================================================================
>> --- gcc/builtins.h 2017-08-30 12:18:46.602740973 +0100
>> +++ gcc/builtins.h 2017-10-23 11:42:57.592545063 +0100
>> @@ -29,14 +29,14 @@ struct target_builtins {
>> the register is not used for calling a function. If the machine
>> has register windows, this gives only the outbound registers.
>> INCOMING_REGNO gives the corresponding inbound register. */
>> - machine_mode x_apply_args_mode[FIRST_PSEUDO_REGISTER];
>> + fixed_size_mode_pod x_apply_args_mode[FIRST_PSEUDO_REGISTER];
>>
>> /* For each register that may be used for returning values, this gives
>> a mode used to copy the register's value. VOIDmode indicates the
>> register is not used for returning values. If the machine has
>> register windows, this gives only the outbound registers.
>> INCOMING_REGNO gives the corresponding inbound register. */
>> - machine_mode x_apply_result_mode[FIRST_PSEUDO_REGISTER];
>> + fixed_size_mode_pod x_apply_result_mode[FIRST_PSEUDO_REGISTER];
>> };
>>
>> extern struct target_builtins default_target_builtins;
>> Index: gcc/builtins.c
>> ===================================================================
>> --- gcc/builtins.c 2017-10-23 11:41:23.140260335 +0100
>> +++ gcc/builtins.c 2017-10-23 11:42:57.592545063 +0100
>> @@ -1358,7 +1358,6 @@ apply_args_size (void)
>> static int size = -1;
>> int align;
>> unsigned int regno;
>> - machine_mode mode;
>>
>> /* The values computed by this function never change. */
>> if (size < 0)
>> @@ -1374,7 +1373,7 @@ apply_args_size (void)
>> for (regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
>> if (FUNCTION_ARG_REGNO_P (regno))
>> {
>> - mode = targetm.calls.get_raw_arg_mode (regno);
>> + fixed_size_mode mode = targetm.calls.get_raw_arg_mode (regno);
>>
>> gcc_assert (mode != VOIDmode);
>>
>> @@ -1386,7 +1385,7 @@ apply_args_size (void)
>> }
>> else
>> {
>> - apply_args_mode[regno] = VOIDmode;
>> + apply_args_mode[regno] = as_a <fixed_size_mode> (VOIDmode);
>> }
>> }
>> return size;
>> @@ -1400,7 +1399,6 @@ apply_result_size (void)
>> {
>> static int size = -1;
>> int align, regno;
>> - machine_mode mode;
>>
>> /* The values computed by this function never change. */
>> if (size < 0)
>> @@ -1410,7 +1408,7 @@ apply_result_size (void)
>> for (regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
>> if (targetm.calls.function_value_regno_p (regno))
>> {
>> - mode = targetm.calls.get_raw_result_mode (regno);
>> + fixed_size_mode mode = targetm.calls.get_raw_result_mode (regno);
>>
>> gcc_assert (mode != VOIDmode);
>>
>> @@ -1421,7 +1419,7 @@ apply_result_size (void)
>> apply_result_mode[regno] = mode;
>> }
>> else
>> - apply_result_mode[regno] = VOIDmode;
>> + apply_result_mode[regno] = as_a <fixed_size_mode> (VOIDmode);
>>
>> /* Allow targets that use untyped_call and untyped_return to override
>> the size so that machine-specific information can be stored here. */
>> @@ -1440,7 +1438,7 @@ apply_result_size (void)
>> result_vector (int savep, rtx result)
>> {
>> int regno, size, align, nelts;
>> - machine_mode mode;
>> + fixed_size_mode mode;
>> rtx reg, mem;
>> rtx *savevec = XALLOCAVEC (rtx, FIRST_PSEUDO_REGISTER);
>>
>> @@ -1469,7 +1467,7 @@ expand_builtin_apply_args_1 (void)
>> {
>> rtx registers, tem;
>> int size, align, regno;
>> - machine_mode mode;
>> + fixed_size_mode mode;
>> rtx struct_incoming_value = targetm.calls.struct_value_rtx (cfun ? TREE_TYPE (cfun->decl) : 0, 1);
>>
>> /* Create a block where the arg-pointer, structure value address,
>> @@ -1573,7 +1571,7 @@ expand_builtin_apply_args (void)
>> expand_builtin_apply (rtx function, rtx arguments, rtx argsize)
>> {
>> int size, align, regno;
>> - machine_mode mode;
>> + fixed_size_mode mode;
>> rtx incoming_args, result, reg, dest, src;
>> rtx_call_insn *call_insn;
>> rtx old_stack_level = 0;
>> @@ -1734,7 +1732,7 @@ expand_builtin_apply (rtx function, rtx
>> expand_builtin_return (rtx result)
>> {
>> int size, align, regno;
>> - machine_mode mode;
>> + fixed_size_mode mode;
>> rtx reg;
>> rtx_insn *call_fusage = 0;
>>
* Re: [21/nn] Minor vn_reference_lookup_3 tweak
2017-10-23 11:31 ` [21/nn] Minor vn_reference_lookup_3 tweak Richard Sandiford
@ 2017-10-26 12:18 ` Richard Biener
0 siblings, 0 replies; 90+ messages in thread
From: Richard Biener @ 2017-10-26 12:18 UTC (permalink / raw)
To: GCC Patches, Richard Sandiford
On Mon, Oct 23, 2017 at 1:30 PM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> The repeated checks for MEM_REF made this code hard to convert to
> poly_ints as-is. Hopefully the new structure also makes it clearer
> at a glance what the two cases are.
>
>
> 2017-10-23 Richard Sandiford <richard.sandiford@linaro.org>
> Alan Hayward <alan.hayward@arm.com>
> David Sherwood <david.sherwood@arm.com>
>
> gcc/
> * tree-ssa-sccvn.c (vn_reference_lookup_3): Avoid repeated
> checks for MEM_REF.
>
> Index: gcc/tree-ssa-sccvn.c
> ===================================================================
> --- gcc/tree-ssa-sccvn.c 2017-10-23 11:47:03.852769480 +0100
> +++ gcc/tree-ssa-sccvn.c 2017-10-23 11:47:44.596155858 +0100
> @@ -2234,6 +2234,7 @@ vn_reference_lookup_3 (ao_ref *ref, tree
> || offset % BITS_PER_UNIT != 0
> || ref->size % BITS_PER_UNIT != 0)
> return (void *)-1;
> + at = offset / BITS_PER_UNIT;
can you move this just
> /* Extract a pointer base and an offset for the destination. */
> lhs = gimple_call_arg (def_stmt, 0);
> @@ -2301,19 +2302,18 @@ vn_reference_lookup_3 (ao_ref *ref, tree
> copy_size = tree_to_uhwi (gimple_call_arg (def_stmt, 2));
>
> /* The bases of the destination and the references have to agree. */
here? Ok with that change.
Richard.
> - if ((TREE_CODE (base) != MEM_REF
> - && !DECL_P (base))
> - || (TREE_CODE (base) == MEM_REF
> - && (TREE_OPERAND (base, 0) != lhs
> - || !tree_fits_uhwi_p (TREE_OPERAND (base, 1))))
> - || (DECL_P (base)
> - && (TREE_CODE (lhs) != ADDR_EXPR
> - || TREE_OPERAND (lhs, 0) != base)))
> + if (TREE_CODE (base) == MEM_REF)
> + {
> + if (TREE_OPERAND (base, 0) != lhs
> + || !tree_fits_uhwi_p (TREE_OPERAND (base, 1)))
> + return (void *) -1;
> + at += tree_to_uhwi (TREE_OPERAND (base, 1));
> + }
> + else if (!DECL_P (base)
> + || TREE_CODE (lhs) != ADDR_EXPR
> + || TREE_OPERAND (lhs, 0) != base)
> return (void *)-1;
>
> - at = offset / BITS_PER_UNIT;
> - if (TREE_CODE (base) == MEM_REF)
> - at += tree_to_uhwi (TREE_OPERAND (base, 1));
> /* If the access is completely outside of the memcpy destination
> area there is no aliasing. */
> if (lhs_offset >= at + maxsize / BITS_PER_UNIT
* Re: [06/nn] Add VEC_SERIES_{CST,EXPR} and associated optab
2017-10-23 11:22 ` [06/nn] Add VEC_SERIES_{CST,EXPR} " Richard Sandiford
@ 2017-10-26 12:26 ` Richard Biener
2017-10-26 12:43 ` Richard Biener
2017-12-15 0:34 ` Richard Sandiford
1 sibling, 1 reply; 90+ messages in thread
From: Richard Biener @ 2017-10-26 12:26 UTC (permalink / raw)
To: GCC Patches, Richard Sandiford
On Mon, Oct 23, 2017 at 1:20 PM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> Similarly to the VEC_DUPLICATE_{CST,EXPR}, this patch adds two
> tree code equivalents of the VEC_SERIES rtx code. VEC_SERIES_EXPR
> is for non-constant inputs and is a normal tcc_binary. VEC_SERIES_CST
> is a tcc_constant.
>
> Like VEC_DUPLICATE_CST, VEC_SERIES_CST is only used for variable-length
> vectors. This avoids the need to handle combinations of VECTOR_CST
> and VEC_SERIES_CST.
Similar to the other patch, can you document and verify that VEC_SERIES_CST
is only used on variable-length vectors?
Ok with that change.
Thanks,
Richard.
>
> 2017-10-23 Richard Sandiford <richard.sandiford@linaro.org>
> Alan Hayward <alan.hayward@arm.com>
> David Sherwood <david.sherwood@arm.com>
>
> gcc/
> * doc/generic.texi (VEC_SERIES_CST, VEC_SERIES_EXPR): Document.
> * doc/md.texi (vec_series@var{m}): Document.
> * tree.def (VEC_SERIES_CST, VEC_SERIES_EXPR): New tree codes.
> * tree.h (TREE_OVERFLOW): Add VEC_SERIES_CST to the list of valid
> codes.
> (VEC_SERIES_CST_BASE, VEC_SERIES_CST_STEP): New macros.
> (build_vec_series_cst, build_vec_series): Declare.
> * tree.c (tree_node_structure_for_code, tree_code_size, tree_size)
> (add_expr, walk_tree_1, drop_tree_overflow): Handle VEC_SERIES_CST.
> (build_vec_series_cst, build_vec_series): New functions.
> * cfgexpand.c (expand_debug_expr): Handle the new codes.
> * tree-pretty-print.c (dump_generic_node): Likewise.
> * dwarf2out.c (rtl_for_decl_init): Handle VEC_SERIES_CST.
> * gimple-expr.h (is_gimple_constant): Likewise.
> * gimplify.c (gimplify_expr): Likewise.
> * graphite-scop-detection.c (scan_tree_for_params): Likewise.
> * ipa-icf-gimple.c (func_checker::compare_cst_or_decl): Likewise.
> (func_checker::compare_operand): Likewise.
> * ipa-icf.c (sem_item::add_expr, sem_variable::equals): Likewise.
> * print-tree.c (print_node): Likewise.
> * tree-ssa-loop.c (for_each_index): Likewise.
> * tree-ssa-pre.c (create_component_ref_by_pieces_1): Likewise.
> * tree-ssa-sccvn.c (copy_reference_ops_from_ref): Likewise.
> (ao_ref_init_from_vn_reference): Likewise.
> * varasm.c (const_hash_1, compare_constant): Likewise.
> * fold-const.c (negate_expr_p, fold_negate_expr_1, operand_equal_p)
> (fold_checksum_tree): Likewise.
> (vec_series_equivalent_p): New function.
> (const_binop): Use it. Fold VEC_SERIES_EXPRs of constants.
> * expmed.c (make_tree): Handle VEC_SERIES.
> * gimple-pretty-print.c (dump_binary_rhs): Likewise.
> * tree-inline.c (estimate_operator_cost): Likewise.
> * expr.c (const_vector_element): Include VEC_SERIES_CST in comment.
> (expand_expr_real_2): Handle VEC_SERIES_EXPR.
> (expand_expr_real_1): Handle VEC_SERIES_CST.
> * optabs.def (vec_series_optab): New optab.
> * optabs.h (expand_vec_series_expr): Declare.
> * optabs.c (expand_vec_series_expr): New function.
> * optabs-tree.c (optab_for_tree_code): Handle VEC_SERIES_EXPR.
> * tree-cfg.c (verify_gimple_assign_binary): Handle VEC_SERIES_EXPR.
> (verify_gimple_assign_single): Handle VEC_SERIES_CST.
> * tree-vect-generic.c (expand_vector_operations_1): Check that
> the operands also have vector type.
>
> Index: gcc/doc/generic.texi
> ===================================================================
> --- gcc/doc/generic.texi 2017-10-23 11:41:51.760448406 +0100
> +++ gcc/doc/generic.texi 2017-10-23 11:42:34.910720660 +0100
> @@ -1037,6 +1037,7 @@ As this example indicates, the operands
> @tindex COMPLEX_CST
> @tindex VECTOR_CST
> @tindex VEC_DUPLICATE_CST
> +@tindex VEC_SERIES_CST
> @tindex STRING_CST
> @findex TREE_STRING_LENGTH
> @findex TREE_STRING_POINTER
> @@ -1098,6 +1099,16 @@ instead. The scalar element value is gi
> @code{VEC_DUPLICATE_CST_ELT} and has the same restrictions as the
> element of a @code{VECTOR_CST}.
>
> +@item VEC_SERIES_CST
> +These nodes represent a vector constant in which element @var{i}
> +has the value @samp{@var{base} + @var{i} * @var{step}}, for some
> +constant @var{base} and @var{step}. The value of @var{base} is
> +given by @code{VEC_SERIES_CST_BASE} and the value of @var{step} is
> +given by @code{VEC_SERIES_CST_STEP}.
> +
> +These nodes are restricted to integral types, in order to avoid
> +specifying the rounding behavior for floating-point types.
> +
> @item STRING_CST
> These nodes represent string-constants. The @code{TREE_STRING_LENGTH}
> returns the length of the string, as an @code{int}. The
> @@ -1702,6 +1713,7 @@ a value from @code{enum annot_expr_kind}
> @node Vectors
> @subsection Vectors
> @tindex VEC_DUPLICATE_EXPR
> +@tindex VEC_SERIES_EXPR
> @tindex VEC_LSHIFT_EXPR
> @tindex VEC_RSHIFT_EXPR
> @tindex VEC_WIDEN_MULT_HI_EXPR
> @@ -1721,6 +1733,14 @@ a value from @code{enum annot_expr_kind}
> This node has a single operand and represents a vector in which every
> element is equal to that operand.
>
> +@item VEC_SERIES_EXPR
> +This node represents a vector formed from a scalar base and step,
> +given as the first and second operands respectively. Element @var{i}
> +of the result is equal to @samp{@var{base} + @var{i}*@var{step}}.
> +
> +This node is restricted to integral types, in order to avoid
> +specifying the rounding behavior for floating-point types.
> +
> @item VEC_LSHIFT_EXPR
> @itemx VEC_RSHIFT_EXPR
> These nodes represent whole vector left and right shifts, respectively.
> Index: gcc/doc/md.texi
> ===================================================================
> --- gcc/doc/md.texi 2017-10-23 11:41:51.761413027 +0100
> +++ gcc/doc/md.texi 2017-10-23 11:42:34.911720660 +0100
> @@ -4899,6 +4899,19 @@ vectors go through the @code{mov@var{m}}
>
> This pattern is not allowed to @code{FAIL}.
>
> +@cindex @code{vec_series@var{m}} instruction pattern
> +@item @samp{vec_series@var{m}}
> +Initialize vector output operand 0 so that element @var{i} is equal to
> +operand 1 plus @var{i} times operand 2. In other words, create a linear
> +series whose base value is operand 1 and whose step is operand 2.
> +
> +The vector output has mode @var{m} and the scalar inputs have the mode
> +appropriate for one element of @var{m}. This pattern is not used for
> +floating-point vectors, in order to avoid having to specify the
> +rounding behavior for @var{i} > 1.
> +
> +This pattern is not allowed to @code{FAIL}.
> +
> @cindex @code{vec_cmp@var{m}@var{n}} instruction pattern
> @item @samp{vec_cmp@var{m}@var{n}}
> Output a vector comparison. Operand 0 of mode @var{n} is the destination for
> Index: gcc/tree.def
> ===================================================================
> --- gcc/tree.def 2017-10-23 11:41:51.774917721 +0100
> +++ gcc/tree.def 2017-10-23 11:42:34.924720660 +0100
> @@ -308,6 +308,10 @@ DEFTREECODE (VECTOR_CST, "vector_cst", t
> VEC_DUPLICATE_CST_ELT. */
> DEFTREECODE (VEC_DUPLICATE_CST, "vec_duplicate_cst", tcc_constant, 0)
>
> +/* Represents a vector constant in which element i is equal to
> + VEC_SERIES_CST_BASE + i * VEC_SERIES_CST_STEP. */
> +DEFTREECODE (VEC_SERIES_CST, "vec_series_cst", tcc_constant, 0)
> +
> /* Contents are TREE_STRING_LENGTH and the actual contents of the string. */
> DEFTREECODE (STRING_CST, "string_cst", tcc_constant, 0)
>
> @@ -541,6 +545,16 @@ DEFTREECODE (COND_EXPR, "cond_expr", tcc
> /* Represents a vector in which every element is equal to operand 0. */
> DEFTREECODE (VEC_DUPLICATE_EXPR, "vec_duplicate_expr", tcc_unary, 1)
>
> +/* Vector series created from a start (base) value and a step.
> +
> + A = VEC_SERIES_EXPR (B, C)
> +
> + means
> +
> + for (i = 0; i < N; i++)
> + A[i] = B + C * i; */
> +DEFTREECODE (VEC_SERIES_EXPR, "vec_series_expr", tcc_binary, 2)
> +
> /* Vector conditional expression. It is like COND_EXPR, but with
> vector operands.
>
> Index: gcc/tree.h
> ===================================================================
> --- gcc/tree.h 2017-10-23 11:41:51.775882341 +0100
> +++ gcc/tree.h 2017-10-23 11:42:34.925720660 +0100
> @@ -730,8 +730,8 @@ #define TREE_SYMBOL_REFERENCED(NODE) \
> #define TYPE_REF_CAN_ALIAS_ALL(NODE) \
> (PTR_OR_REF_CHECK (NODE)->base.static_flag)
>
> -/* In an INTEGER_CST, REAL_CST, COMPLEX_CST, VECTOR_CST or VEC_DUPLICATE_CST,
> - this means there was an overflow in folding. */
> +/* In an INTEGER_CST, REAL_CST, COMPLEX_CST, VECTOR_CST, VEC_DUPLICATE_CST
> + or VEC_SERIES_CST, this means there was an overflow in folding. */
>
> #define TREE_OVERFLOW(NODE) (CST_CHECK (NODE)->base.public_flag)
>
> @@ -1034,6 +1034,12 @@ #define VECTOR_CST_ELT(NODE,IDX) (VECTOR
> #define VEC_DUPLICATE_CST_ELT(NODE) \
> (VEC_DUPLICATE_CST_CHECK (NODE)->vector.elts[0])
>
> +/* In a VEC_SERIES_CST node. */
> +#define VEC_SERIES_CST_BASE(NODE) \
> + (VEC_SERIES_CST_CHECK (NODE)->vector.elts[0])
> +#define VEC_SERIES_CST_STEP(NODE) \
> + (VEC_SERIES_CST_CHECK (NODE)->vector.elts[1])
> +
> /* Define fields and accessors for some special-purpose tree nodes. */
>
> #define IDENTIFIER_LENGTH(NODE) \
> @@ -4030,9 +4036,11 @@ extern tree build_int_cstu (tree type, u
> extern tree build_int_cst_type (tree, HOST_WIDE_INT);
> extern tree make_vector (unsigned CXX_MEM_STAT_INFO);
> extern tree build_vec_duplicate_cst (tree, tree CXX_MEM_STAT_INFO);
> +extern tree build_vec_series_cst (tree, tree, tree CXX_MEM_STAT_INFO);
> extern tree build_vector (tree, vec<tree> CXX_MEM_STAT_INFO);
> extern tree build_vector_from_ctor (tree, vec<constructor_elt, va_gc> *);
> extern tree build_vector_from_val (tree, tree);
> +extern tree build_vec_series (tree, tree, tree);
> extern void recompute_constructor_flags (tree);
> extern void verify_constructor_flags (tree);
> extern tree build_constructor (tree, vec<constructor_elt, va_gc> *);
> Index: gcc/tree.c
> ===================================================================
> --- gcc/tree.c 2017-10-23 11:41:51.774917721 +0100
> +++ gcc/tree.c 2017-10-23 11:42:34.924720660 +0100
> @@ -465,6 +465,7 @@ tree_node_structure_for_code (enum tree_
> case COMPLEX_CST: return TS_COMPLEX;
> case VECTOR_CST: return TS_VECTOR;
> case VEC_DUPLICATE_CST: return TS_VECTOR;
> + case VEC_SERIES_CST: return TS_VECTOR;
> case STRING_CST: return TS_STRING;
> /* tcc_exceptional cases. */
> case ERROR_MARK: return TS_COMMON;
> @@ -818,6 +819,8 @@ tree_code_size (enum tree_code code)
> case COMPLEX_CST: return sizeof (struct tree_complex);
> case VECTOR_CST: return sizeof (struct tree_vector);
> case VEC_DUPLICATE_CST: return sizeof (struct tree_vector);
> + case VEC_SERIES_CST:
> + return sizeof (struct tree_vector) + sizeof (tree);
> case STRING_CST: gcc_unreachable ();
> default:
> return lang_hooks.tree_size (code);
> @@ -880,6 +883,9 @@ tree_size (const_tree node)
> case VEC_DUPLICATE_CST:
> return sizeof (struct tree_vector);
>
> + case VEC_SERIES_CST:
> + return sizeof (struct tree_vector) + sizeof (tree);
> +
> case STRING_CST:
> return TREE_STRING_LENGTH (node) + offsetof (struct tree_string, str) + 1;
>
> @@ -1711,6 +1717,31 @@ build_vec_duplicate_cst (tree type, tree
> return t;
> }
>
> +/* Build a new VEC_SERIES_CST with type TYPE, base BASE and step STEP.
> +
> + Note that this function is only suitable for callers that specifically
> + need a VEC_SERIES_CST node. Use build_vec_series to build a general
> + series vector from a general base and step. */
> +
> +tree
> +build_vec_series_cst (tree type, tree base, tree step MEM_STAT_DECL)
> +{
> + int length = sizeof (struct tree_vector) + sizeof (tree);
> +
> + record_node_allocation_statistics (VEC_SERIES_CST, length);
> +
> + tree t = ggc_alloc_cleared_tree_node_stat (length PASS_MEM_STAT);
> +
> + TREE_SET_CODE (t, VEC_SERIES_CST);
> + TREE_TYPE (t) = type;
> + t->base.u.nelts = 2;
> + VEC_SERIES_CST_BASE (t) = base;
> + VEC_SERIES_CST_STEP (t) = step;
> + TREE_CONSTANT (t) = 1;
> +
> + return t;
> +}
> +
> /* Build a newly constructed VECTOR_CST node of length LEN. */
>
> tree
> @@ -1821,6 +1852,19 @@ build_vector_from_val (tree vectype, tre
> }
> }
>
> +/* Build a vector series of type TYPE in which element I has the value
> + BASE + I * STEP. */
> +
> +tree
> +build_vec_series (tree type, tree base, tree step)
> +{
> + if (integer_zerop (step))
> + return build_vector_from_val (type, base);
> + if (CONSTANT_CLASS_P (base) && CONSTANT_CLASS_P (step))
> + return build_vec_series_cst (type, base, step);
> + return build2 (VEC_SERIES_EXPR, type, base, step);
> +}
> +
> /* Something has messed with the elements of CONSTRUCTOR C after it was built;
> calculate TREE_CONSTANT and TREE_SIDE_EFFECTS. */
>
> @@ -7136,6 +7180,10 @@ add_expr (const_tree t, inchash::hash &h
> case VEC_DUPLICATE_CST:
> inchash::add_expr (VEC_DUPLICATE_CST_ELT (t), hstate);
> return;
> + case VEC_SERIES_CST:
> + inchash::add_expr (VEC_SERIES_CST_BASE (t), hstate);
> + inchash::add_expr (VEC_SERIES_CST_STEP (t), hstate);
> + return;
> case SSA_NAME:
> /* We can just compare by pointer. */
> hstate.add_wide_int (SSA_NAME_VERSION (t));
> @@ -11150,6 +11198,7 @@ #define WALK_SUBTREE_TAIL(NODE) \
> case FIXED_CST:
> case VECTOR_CST:
> case VEC_DUPLICATE_CST:
> + case VEC_SERIES_CST:
> case STRING_CST:
> case BLOCK:
> case PLACEHOLDER_EXPR:
> @@ -12442,6 +12491,15 @@ drop_tree_overflow (tree t)
> if (TREE_OVERFLOW (*elt))
> *elt = drop_tree_overflow (*elt);
> }
> + if (TREE_CODE (t) == VEC_SERIES_CST)
> + {
> + tree *elt = &VEC_SERIES_CST_BASE (t);
> + if (TREE_OVERFLOW (*elt))
> + *elt = drop_tree_overflow (*elt);
> + elt = &VEC_SERIES_CST_STEP (t);
> + if (TREE_OVERFLOW (*elt))
> + *elt = drop_tree_overflow (*elt);
> + }
> return t;
> }
>
> Index: gcc/cfgexpand.c
> ===================================================================
> --- gcc/cfgexpand.c 2017-10-23 11:41:51.760448406 +0100
> +++ gcc/cfgexpand.c 2017-10-23 11:42:34.909720660 +0100
> @@ -5051,6 +5051,8 @@ expand_debug_expr (tree exp)
> case VEC_PERM_EXPR:
> case VEC_DUPLICATE_CST:
> case VEC_DUPLICATE_EXPR:
> + case VEC_SERIES_CST:
> + case VEC_SERIES_EXPR:
> return NULL;
>
> /* Misc codes. */
> Index: gcc/tree-pretty-print.c
> ===================================================================
> --- gcc/tree-pretty-print.c 2017-10-23 11:41:51.772023858 +0100
> +++ gcc/tree-pretty-print.c 2017-10-23 11:42:34.921720660 +0100
> @@ -1808,6 +1808,14 @@ dump_generic_node (pretty_printer *pp, t
> pp_string (pp, ", ... }");
> break;
>
> + case VEC_SERIES_CST:
> + pp_string (pp, "{ ");
> + dump_generic_node (pp, VEC_SERIES_CST_BASE (node), spc, flags, false);
> + pp_string (pp, ", +, ");
> + dump_generic_node (pp, VEC_SERIES_CST_STEP (node), spc, flags, false);
> + pp_string (pp, "}");
> + break;
> +
> case FUNCTION_TYPE:
> case METHOD_TYPE:
> dump_generic_node (pp, TREE_TYPE (node), spc, flags, false);
> @@ -3221,6 +3229,7 @@ dump_generic_node (pretty_printer *pp, t
> pp_string (pp, " > ");
> break;
>
> + case VEC_SERIES_EXPR:
> case VEC_WIDEN_MULT_HI_EXPR:
> case VEC_WIDEN_MULT_LO_EXPR:
> case VEC_WIDEN_MULT_EVEN_EXPR:
> Index: gcc/dwarf2out.c
> ===================================================================
> --- gcc/dwarf2out.c 2017-10-23 11:41:51.763342269 +0100
> +++ gcc/dwarf2out.c 2017-10-23 11:42:34.913720660 +0100
> @@ -18863,6 +18863,7 @@ rtl_for_decl_init (tree init, tree type)
> {
> case VECTOR_CST:
> case VEC_DUPLICATE_CST:
> + case VEC_SERIES_CST:
> break;
> case CONSTRUCTOR:
> if (TREE_CONSTANT (init))
> Index: gcc/gimple-expr.h
> ===================================================================
> --- gcc/gimple-expr.h 2017-10-23 11:41:51.765271511 +0100
> +++ gcc/gimple-expr.h 2017-10-23 11:42:34.916720660 +0100
> @@ -135,6 +135,7 @@ is_gimple_constant (const_tree t)
> case COMPLEX_CST:
> case VECTOR_CST:
> case VEC_DUPLICATE_CST:
> + case VEC_SERIES_CST:
> case STRING_CST:
> return true;
>
> Index: gcc/gimplify.c
> ===================================================================
> --- gcc/gimplify.c 2017-10-23 11:41:51.766236132 +0100
> +++ gcc/gimplify.c 2017-10-23 11:42:34.917720660 +0100
> @@ -11507,6 +11507,7 @@ gimplify_expr (tree *expr_p, gimple_seq
> case COMPLEX_CST:
> case VECTOR_CST:
> case VEC_DUPLICATE_CST:
> + case VEC_SERIES_CST:
> /* Drop the overflow flag on constants, we do not want
> that in the GIMPLE IL. */
> if (TREE_OVERFLOW_P (*expr_p))
> Index: gcc/graphite-scop-detection.c
> ===================================================================
> --- gcc/graphite-scop-detection.c 2017-10-23 11:41:51.767200753 +0100
> +++ gcc/graphite-scop-detection.c 2017-10-23 11:42:34.917720660 +0100
> @@ -1244,6 +1244,7 @@ scan_tree_for_params (sese_info_p s, tre
> case COMPLEX_CST:
> case VECTOR_CST:
> case VEC_DUPLICATE_CST:
> + case VEC_SERIES_CST:
> break;
>
> default:
> Index: gcc/ipa-icf-gimple.c
> ===================================================================
> --- gcc/ipa-icf-gimple.c 2017-10-23 11:41:51.767200753 +0100
> +++ gcc/ipa-icf-gimple.c 2017-10-23 11:42:34.917720660 +0100
> @@ -334,6 +334,7 @@ func_checker::compare_cst_or_decl (tree
> case COMPLEX_CST:
> case VECTOR_CST:
> case VEC_DUPLICATE_CST:
> + case VEC_SERIES_CST:
> case STRING_CST:
> case REAL_CST:
> {
> @@ -530,6 +531,7 @@ func_checker::compare_operand (tree t1,
> case COMPLEX_CST:
> case VECTOR_CST:
> case VEC_DUPLICATE_CST:
> + case VEC_SERIES_CST:
> case STRING_CST:
> case REAL_CST:
> case FUNCTION_DECL:
> Index: gcc/ipa-icf.c
> ===================================================================
> --- gcc/ipa-icf.c 2017-10-23 11:41:51.768165374 +0100
> +++ gcc/ipa-icf.c 2017-10-23 11:42:34.918720660 +0100
> @@ -1479,6 +1479,7 @@ sem_item::add_expr (const_tree exp, inch
> case COMPLEX_CST:
> case VECTOR_CST:
> case VEC_DUPLICATE_CST:
> + case VEC_SERIES_CST:
> inchash::add_expr (exp, hstate);
> break;
> case CONSTRUCTOR:
> @@ -2034,6 +2035,11 @@ sem_variable::equals (tree t1, tree t2)
> case VEC_DUPLICATE_CST:
> return sem_variable::equals (VEC_DUPLICATE_CST_ELT (t1),
> VEC_DUPLICATE_CST_ELT (t2));
> + case VEC_SERIES_CST:
> + return (sem_variable::equals (VEC_SERIES_CST_BASE (t1),
> + VEC_SERIES_CST_BASE (t2))
> + && sem_variable::equals (VEC_SERIES_CST_STEP (t1),
> + VEC_SERIES_CST_STEP (t2)));
> case ARRAY_REF:
> case ARRAY_RANGE_REF:
> {
> Index: gcc/print-tree.c
> ===================================================================
> --- gcc/print-tree.c 2017-10-23 11:41:51.769129995 +0100
> +++ gcc/print-tree.c 2017-10-23 11:42:34.919720660 +0100
> @@ -787,6 +787,11 @@ print_node (FILE *file, const char *pref
> print_node (file, "elt", VEC_DUPLICATE_CST_ELT (node), indent + 4);
> break;
>
> + case VEC_SERIES_CST:
> + print_node (file, "base", VEC_SERIES_CST_BASE (node), indent + 4);
> + print_node (file, "step", VEC_SERIES_CST_STEP (node), indent + 4);
> + break;
> +
> case COMPLEX_CST:
> print_node (file, "real", TREE_REALPART (node), indent + 4);
> print_node (file, "imag", TREE_IMAGPART (node), indent + 4);
> Index: gcc/tree-ssa-loop.c
> ===================================================================
> --- gcc/tree-ssa-loop.c 2017-10-23 11:41:51.772023858 +0100
> +++ gcc/tree-ssa-loop.c 2017-10-23 11:42:34.921720660 +0100
> @@ -617,6 +617,7 @@ for_each_index (tree *addr_p, bool (*cbc
> case RESULT_DECL:
> case VECTOR_CST:
> case VEC_DUPLICATE_CST:
> + case VEC_SERIES_CST:
> case COMPLEX_CST:
> case INTEGER_CST:
> case REAL_CST:
> Index: gcc/tree-ssa-pre.c
> ===================================================================
> --- gcc/tree-ssa-pre.c 2017-10-23 11:41:51.772023858 +0100
> +++ gcc/tree-ssa-pre.c 2017-10-23 11:42:34.922720660 +0100
> @@ -2676,6 +2676,7 @@ create_component_ref_by_pieces_1 (basic_
> case COMPLEX_CST:
> case VECTOR_CST:
> case VEC_DUPLICATE_CST:
> + case VEC_SERIES_CST:
> case REAL_CST:
> case CONSTRUCTOR:
> case VAR_DECL:
> Index: gcc/tree-ssa-sccvn.c
> ===================================================================
> --- gcc/tree-ssa-sccvn.c 2017-10-23 11:41:51.773953100 +0100
> +++ gcc/tree-ssa-sccvn.c 2017-10-23 11:42:34.922720660 +0100
> @@ -859,6 +859,7 @@ copy_reference_ops_from_ref (tree ref, v
> case COMPLEX_CST:
> case VECTOR_CST:
> case VEC_DUPLICATE_CST:
> + case VEC_SERIES_CST:
> case REAL_CST:
> case FIXED_CST:
> case CONSTRUCTOR:
> @@ -1052,6 +1053,7 @@ ao_ref_init_from_vn_reference (ao_ref *r
> case COMPLEX_CST:
> case VECTOR_CST:
> case VEC_DUPLICATE_CST:
> + case VEC_SERIES_CST:
> case REAL_CST:
> case CONSTRUCTOR:
> case CONST_DECL:
> Index: gcc/varasm.c
> ===================================================================
> --- gcc/varasm.c 2017-10-23 11:41:51.775882341 +0100
> +++ gcc/varasm.c 2017-10-23 11:42:34.927720660 +0100
> @@ -3065,6 +3065,10 @@ const_hash_1 (const tree exp)
> return (const_hash_1 (TREE_OPERAND (exp, 0)) * 9
> + const_hash_1 (TREE_OPERAND (exp, 1)));
>
> + case VEC_SERIES_CST:
> + return (const_hash_1 (VEC_SERIES_CST_BASE (exp)) * 11
> + + const_hash_1 (VEC_SERIES_CST_STEP (exp)));
> +
> CASE_CONVERT:
> return const_hash_1 (TREE_OPERAND (exp, 0)) * 7 + 2;
>
> @@ -3165,6 +3169,12 @@ compare_constant (const tree t1, const t
> return compare_constant (VEC_DUPLICATE_CST_ELT (t1),
> VEC_DUPLICATE_CST_ELT (t2));
>
> + case VEC_SERIES_CST:
> + return (compare_constant (VEC_SERIES_CST_BASE (t1),
> + VEC_SERIES_CST_BASE (t2))
> + && compare_constant (VEC_SERIES_CST_STEP (t1),
> + VEC_SERIES_CST_STEP (t2)));
> +
> case CONSTRUCTOR:
> {
> vec<constructor_elt, va_gc> *v1, *v2;
> Index: gcc/fold-const.c
> ===================================================================
> --- gcc/fold-const.c 2017-10-23 11:41:51.765271511 +0100
> +++ gcc/fold-const.c 2017-10-23 11:42:34.916720660 +0100
> @@ -421,6 +421,10 @@ negate_expr_p (tree t)
> case VEC_DUPLICATE_CST:
> return negate_expr_p (VEC_DUPLICATE_CST_ELT (t));
>
> + case VEC_SERIES_CST:
> + return (negate_expr_p (VEC_SERIES_CST_BASE (t))
> + && negate_expr_p (VEC_SERIES_CST_STEP (t)));
> +
> case COMPLEX_EXPR:
> return negate_expr_p (TREE_OPERAND (t, 0))
> && negate_expr_p (TREE_OPERAND (t, 1));
> @@ -590,6 +594,17 @@ fold_negate_expr_1 (location_t loc, tree
> return build_vector_from_val (type, sub);
> }
>
> + case VEC_SERIES_CST:
> + {
> + tree neg_base = fold_negate_expr (loc, VEC_SERIES_CST_BASE (t));
> + if (!neg_base)
> + return NULL_TREE;
> + tree neg_step = fold_negate_expr (loc, VEC_SERIES_CST_STEP (t));
> + if (!neg_step)
> + return NULL_TREE;
> + return build_vec_series (type, neg_base, neg_step);
> + }
> +
> case COMPLEX_EXPR:
> if (negate_expr_p (t))
> return fold_build2_loc (loc, COMPLEX_EXPR, type,
> @@ -1131,6 +1146,28 @@ int_const_binop (enum tree_code code, co
> return int_const_binop_1 (code, arg1, arg2, 1);
> }
>
> +/* Return true if EXP is a VEC_DUPLICATE_CST or a VEC_SERIES_CST,
> + and if so express it as a linear series in *BASE_OUT and *STEP_OUT.
> + The step will be zero for VEC_DUPLICATE_CST. */
> +
> +static bool
> +vec_series_equivalent_p (const_tree exp, tree *base_out, tree *step_out)
> +{
> + if (TREE_CODE (exp) == VEC_SERIES_CST)
> + {
> + *base_out = VEC_SERIES_CST_BASE (exp);
> + *step_out = VEC_SERIES_CST_STEP (exp);
> + return true;
> + }
> + if (TREE_CODE (exp) == VEC_DUPLICATE_CST)
> + {
> + *base_out = VEC_DUPLICATE_CST_ELT (exp);
> + *step_out = build_zero_cst (TREE_TYPE (*base_out));
> + return true;
> + }
> + return false;
> +}
> +
> /* Combine two constants ARG1 and ARG2 under operation CODE to produce a new
> constant. We assume ARG1 and ARG2 have the same data type, or at least
> are the same kind of constant and the same machine mode. Return zero if
> @@ -1457,6 +1494,20 @@ const_binop (enum tree_code code, tree a
> return build_vector_from_val (TREE_TYPE (arg1), sub);
> }
>
> + tree base1, step1, base2, step2;
> + if ((code == PLUS_EXPR || code == MINUS_EXPR)
> + && vec_series_equivalent_p (arg1, &base1, &step1)
> + && vec_series_equivalent_p (arg2, &base2, &step2))
> + {
> + tree new_base = const_binop (code, base1, base2);
> + if (!new_base)
> + return NULL_TREE;
> + tree new_step = const_binop (code, step1, step2);
> + if (!new_step)
> + return NULL_TREE;
> + return build_vec_series (TREE_TYPE (arg1), new_base, new_step);
> + }
> +
> /* Shifts allow a scalar offset for a vector. */
> if (TREE_CODE (arg1) == VECTOR_CST
> && TREE_CODE (arg2) == INTEGER_CST)
> @@ -1505,6 +1556,12 @@ const_binop (enum tree_code code, tree t
> result as argument put those cases that need it here. */
> switch (code)
> {
> + case VEC_SERIES_EXPR:
> + if (CONSTANT_CLASS_P (arg1)
> + && CONSTANT_CLASS_P (arg2))
> + return build_vec_series (type, arg1, arg2);
> + return NULL_TREE;
> +
> case COMPLEX_EXPR:
> if ((TREE_CODE (arg1) == REAL_CST
> && TREE_CODE (arg2) == REAL_CST)
> @@ -3008,6 +3065,12 @@ operand_equal_p (const_tree arg0, const_
> return operand_equal_p (VEC_DUPLICATE_CST_ELT (arg0),
> VEC_DUPLICATE_CST_ELT (arg1), flags);
>
> + case VEC_SERIES_CST:
> + return (operand_equal_p (VEC_SERIES_CST_BASE (arg0),
> + VEC_SERIES_CST_BASE (arg1), flags)
> + && operand_equal_p (VEC_SERIES_CST_STEP (arg0),
> + VEC_SERIES_CST_STEP (arg1), flags));
> +
> case COMPLEX_CST:
> return (operand_equal_p (TREE_REALPART (arg0), TREE_REALPART (arg1),
> flags)
> @@ -12050,6 +12113,10 @@ fold_checksum_tree (const_tree expr, str
> case VEC_DUPLICATE_CST:
> fold_checksum_tree (VEC_DUPLICATE_CST_ELT (expr), ctx, ht);
> break;
> + case VEC_SERIES_CST:
> + fold_checksum_tree (VEC_SERIES_CST_BASE (expr), ctx, ht);
> + fold_checksum_tree (VEC_SERIES_CST_STEP (expr), ctx, ht);
> + break;
> default:
> break;
> }
> Index: gcc/expmed.c
> ===================================================================
> --- gcc/expmed.c 2017-10-23 11:41:39.186050437 +0100
> +++ gcc/expmed.c 2017-10-23 11:42:34.914720660 +0100
> @@ -5253,6 +5253,13 @@ make_tree (tree type, rtx x)
> tree elt_tree = make_tree (TREE_TYPE (type), XEXP (op, 0));
> return build_vector_from_val (type, elt_tree);
> }
> + if (GET_CODE (op) == VEC_SERIES)
> + {
> + tree itype = TREE_TYPE (type);
> + tree base_tree = make_tree (itype, XEXP (op, 0));
> + tree step_tree = make_tree (itype, XEXP (op, 1));
> + return build_vec_series (type, base_tree, step_tree);
> + }
> return make_tree (type, op);
> }
>
> Index: gcc/gimple-pretty-print.c
> ===================================================================
> --- gcc/gimple-pretty-print.c 2017-10-23 11:41:25.500318672 +0100
> +++ gcc/gimple-pretty-print.c 2017-10-23 11:42:34.916720660 +0100
> @@ -438,6 +438,7 @@ dump_binary_rhs (pretty_printer *buffer,
> case VEC_PACK_FIX_TRUNC_EXPR:
> case VEC_WIDEN_LSHIFT_HI_EXPR:
> case VEC_WIDEN_LSHIFT_LO_EXPR:
> + case VEC_SERIES_EXPR:
> for (p = get_tree_code_name (code); *p; p++)
> pp_character (buffer, TOUPPER (*p));
> pp_string (buffer, " <");
> Index: gcc/tree-inline.c
> ===================================================================
> --- gcc/tree-inline.c 2017-10-23 11:41:51.771059237 +0100
> +++ gcc/tree-inline.c 2017-10-23 11:42:34.921720660 +0100
> @@ -4003,6 +4003,7 @@ estimate_operator_cost (enum tree_code c
> case VEC_WIDEN_LSHIFT_HI_EXPR:
> case VEC_WIDEN_LSHIFT_LO_EXPR:
> case VEC_DUPLICATE_EXPR:
> + case VEC_SERIES_EXPR:
>
> return 1;
>
> Index: gcc/expr.c
> ===================================================================
> --- gcc/expr.c 2017-10-23 11:41:51.764306890 +0100
> +++ gcc/expr.c 2017-10-23 11:42:34.915720660 +0100
> @@ -7704,7 +7704,7 @@ expand_operands (tree exp0, tree exp1, r
>
>
> /* Expand constant vector element ELT, which has mode MODE. This is used
> - for members of VECTOR_CST and VEC_DUPLICATE_CST. */
> + for members of VECTOR_CST, VEC_DUPLICATE_CST and VEC_SERIES_CST. */
>
> static rtx
> const_vector_element (scalar_mode mode, const_tree elt)
> @@ -9587,6 +9587,10 @@ #define REDUCE_BIT_FIELD(expr) (reduce_b
> gcc_assert (target);
> return target;
>
> + case VEC_SERIES_EXPR:
> + expand_operands (treeop0, treeop1, NULL_RTX, &op0, &op1, modifier);
> + return expand_vec_series_expr (mode, op0, op1, target);
> +
> case BIT_INSERT_EXPR:
> {
> unsigned bitpos = tree_to_uhwi (treeop2);
> @@ -10044,6 +10048,13 @@ expand_expr_real_1 (tree exp, rtx target
> VEC_DUPLICATE_CST_ELT (exp));
> return gen_const_vec_duplicate (mode, op0);
>
> + case VEC_SERIES_CST:
> + op0 = const_vector_element (GET_MODE_INNER (mode),
> + VEC_SERIES_CST_BASE (exp));
> + op1 = const_vector_element (GET_MODE_INNER (mode),
> + VEC_SERIES_CST_STEP (exp));
> + return gen_const_vec_series (mode, op0, op1);
> +
> case CONST_DECL:
> if (modifier == EXPAND_WRITE)
> {
> Index: gcc/optabs.def
> ===================================================================
> --- gcc/optabs.def 2017-10-23 11:41:51.769129995 +0100
> +++ gcc/optabs.def 2017-10-23 11:42:34.919720660 +0100
> @@ -366,3 +366,4 @@ OPTAB_D (get_thread_pointer_optab, "get_
> OPTAB_D (set_thread_pointer_optab, "set_thread_pointer$I$a")
>
> OPTAB_DC (vec_duplicate_optab, "vec_duplicate$a", VEC_DUPLICATE)
> +OPTAB_DC (vec_series_optab, "vec_series$a", VEC_SERIES)
> Index: gcc/optabs.h
> ===================================================================
> --- gcc/optabs.h 2017-10-23 11:41:51.769129995 +0100
> +++ gcc/optabs.h 2017-10-23 11:42:34.919720660 +0100
> @@ -316,6 +316,9 @@ extern rtx expand_vec_cmp_expr (tree, tr
> /* Generate code for VEC_COND_EXPR. */
> extern rtx expand_vec_cond_expr (tree, tree, tree, tree, rtx);
>
> +/* Generate code for VEC_SERIES_EXPR. */
> +extern rtx expand_vec_series_expr (machine_mode, rtx, rtx, rtx);
> +
> /* Generate code for MULT_HIGHPART_EXPR. */
> extern rtx expand_mult_highpart (machine_mode, rtx, rtx, rtx, bool);
>
> Index: gcc/optabs.c
> ===================================================================
> --- gcc/optabs.c 2017-10-23 11:41:51.769129995 +0100
> +++ gcc/optabs.c 2017-10-23 11:42:34.919720660 +0100
> @@ -5693,6 +5693,27 @@ expand_vec_cond_expr (tree vec_cond_type
> return ops[0].value;
> }
>
> +/* Generate VEC_SERIES_EXPR <OP0, OP1>, returning a value of mode VMODE.
> + Use TARGET for the result if nonnull and convenient. */
> +
> +rtx
> +expand_vec_series_expr (machine_mode vmode, rtx op0, rtx op1, rtx target)
> +{
> + struct expand_operand ops[3];
> + enum insn_code icode;
> + machine_mode emode = GET_MODE_INNER (vmode);
> +
> + icode = direct_optab_handler (vec_series_optab, vmode);
> + gcc_assert (icode != CODE_FOR_nothing);
> +
> + create_output_operand (&ops[0], target, vmode);
> + create_input_operand (&ops[1], op0, emode);
> + create_input_operand (&ops[2], op1, emode);
> +
> + expand_insn (icode, 3, ops);
> + return ops[0].value;
> +}
> +
> /* Generate insns for a vector comparison into a mask. */
>
> rtx
> Index: gcc/optabs-tree.c
> ===================================================================
> --- gcc/optabs-tree.c 2017-10-23 11:41:51.768165374 +0100
> +++ gcc/optabs-tree.c 2017-10-23 11:42:34.918720660 +0100
> @@ -213,6 +213,9 @@ optab_for_tree_code (enum tree_code code
> case VEC_DUPLICATE_EXPR:
> return vec_duplicate_optab;
>
> + case VEC_SERIES_EXPR:
> + return vec_series_optab;
> +
> default:
> break;
> }
> Index: gcc/tree-cfg.c
> ===================================================================
> --- gcc/tree-cfg.c 2017-10-23 11:41:51.770094616 +0100
> +++ gcc/tree-cfg.c 2017-10-23 11:42:34.920720660 +0100
> @@ -4119,6 +4119,23 @@ verify_gimple_assign_binary (gassign *st
> /* Continue with generic binary expression handling. */
> break;
>
> + case VEC_SERIES_EXPR:
> + if (!useless_type_conversion_p (rhs1_type, rhs2_type))
> + {
> + error ("type mismatch in series expression");
> + debug_generic_expr (rhs1_type);
> + debug_generic_expr (rhs2_type);
> + return true;
> + }
> + if (TREE_CODE (lhs_type) != VECTOR_TYPE
> + || !useless_type_conversion_p (TREE_TYPE (lhs_type), rhs1_type))
> + {
> + error ("vector type expected in series expression");
> + debug_generic_expr (lhs_type);
> + return true;
> + }
> + return false;
> +
> default:
> gcc_unreachable ();
> }
> @@ -4485,6 +4502,7 @@ verify_gimple_assign_single (gassign *st
> case COMPLEX_CST:
> case VECTOR_CST:
> case VEC_DUPLICATE_CST:
> + case VEC_SERIES_CST:
> case STRING_CST:
> return res;
>
> Index: gcc/tree-vect-generic.c
> ===================================================================
> --- gcc/tree-vect-generic.c 2017-10-23 11:41:51.773953100 +0100
> +++ gcc/tree-vect-generic.c 2017-10-23 11:42:34.922720660 +0100
> @@ -1595,7 +1595,8 @@ expand_vector_operations_1 (gimple_stmt_
> if (rhs_class == GIMPLE_BINARY_RHS)
> rhs2 = gimple_assign_rhs2 (stmt);
>
> - if (TREE_CODE (type) != VECTOR_TYPE)
> + if (!VECTOR_TYPE_P (type)
> + || !VECTOR_TYPE_P (TREE_TYPE (rhs1)))
> return;
>
> /* If the vector operation is operating on all same vector elements
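The patch defines VEC_SERIES_EXPR/VEC_SERIES_CST so that element i of the result is base + i * step (see the tree.def hunk above). As a reference model only — not GCC internals — here is a minimal fixed-length sketch of those semantics in C; `N` is a stand-in for the vector length, which for SVE would be a runtime value:

```c
#include <assert.h>

enum { N = 4 }; /* stand-in for the (possibly runtime) vector length */

/* Reference semantics of VEC_SERIES_EXPR <base, step> from tree.def:
   for (i = 0; i < N; i++) A[i] = base + step * i;  */
static void
vec_series_ref (long base, long step, long out[N])
{
  for (int i = 0; i < N; ++i)
    out[i] = base + step * i;
}
```

For example, `vec_series_ref (3, 2, v)` fills `v` with {3, 5, 7, 9}, matching the vec_series@var{m} pattern's definition of operand 1 plus @var{i} times operand 2.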
^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: [06/nn] Add VEC_SERIES_{CST,EXPR} and associated optab
2017-10-26 12:26 ` Richard Biener
@ 2017-10-26 12:43 ` Richard Biener
2017-11-06 15:21 ` Richard Sandiford
0 siblings, 1 reply; 90+ messages in thread
From: Richard Biener @ 2017-10-26 12:43 UTC (permalink / raw)
To: GCC Patches, Richard Sandiford
On Thu, Oct 26, 2017 at 2:23 PM, Richard Biener
<richard.guenther@gmail.com> wrote:
> On Mon, Oct 23, 2017 at 1:20 PM, Richard Sandiford
> <richard.sandiford@linaro.org> wrote:
>> Similarly to the VEC_DUPLICATE_{CST,EXPR}, this patch adds two
>> tree code equivalents of the VEC_SERIES rtx code. VEC_SERIES_EXPR
>> is for non-constant inputs and is a normal tcc_binary. VEC_SERIES_CST
>> is a tcc_constant.
>>
>> Like VEC_DUPLICATE_CST, VEC_SERIES_CST is only used for variable-length
>> vectors. This avoids the need to handle combinations of VECTOR_CST
>> and VEC_SERIES_CST.
>
> Similar to the other patch, can you document and verify that VEC_SERIES_CST
> is only used on variable length vectors?
>
> Ok with that change.
Btw, did you think of merging VEC_DUPLICATE_CST with VEC_SERIES_CST
via setting step == 0? I think you can do {1, 1, 1, 1, ...} + {1, 2, 3, 4, 5, ...}
constant folding but you don't implement that. Propagation can also turn
VEC_SERIES_EXPR into VEC_SERIES_CST and VEC_DUPLICATE_EXPR
into VEC_DUPLICATE_CST (didn't see the former, don't remember the latter).
Richard.
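The folding Richard Biener describes can be sketched as follows — a toy C model, not GCC internals: treating a duplicate as a series with step == 0 (as the patch's vec_series_equivalent_p does) lets PLUS fold any mix of duplicate and series constants by adding bases and steps:

```c
#include <assert.h>

/* Toy model of a length-agnostic vector constant:
   element i has the value base + i * step.  */
struct vec_series { long base, step; };

/* A VEC_DUPLICATE_CST is just the step == 0 special case.  */
static struct vec_series
make_duplicate (long elt)
{
  struct vec_series v = { elt, 0 };
  return v;
}

static struct vec_series
make_series (long base, long step)
{
  struct vec_series v = { base, step };
  return v;
}

/* Element i of the series, for any vector length.  */
static long
series_elt (struct vec_series v, long i)
{
  return v.base + i * v.step;
}

/* Folding a PLUS of two series adds the bases and the steps; because a
   duplicate is a step-0 series, this covers duplicate + duplicate,
   duplicate + series and series + series in one rule.  */
static struct vec_series
fold_plus (struct vec_series a, struct vec_series b)
{
  struct vec_series v = { a.base + b.base, a.step + b.step };
  return v;
}
```

For example, {1, 1, 1, ...} + {1, 2, 3, ...} folds to the series with base 2 and step 1, i.e. {2, 3, 4, ...}.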
> Thanks,
> Richard.
>
>>
>> 2017-10-23 Richard Sandiford <richard.sandiford@linaro.org>
>> Alan Hayward <alan.hayward@arm.com>
>> David Sherwood <david.sherwood@arm.com>
>>
>> gcc/
>> * doc/generic.texi (VEC_SERIES_CST, VEC_SERIES_EXPR): Document.
>> * doc/md.texi (vec_series@var{m}): Document.
>> * tree.def (VEC_SERIES_CST, VEC_SERIES_EXPR): New tree codes.
>> * tree.h (TREE_OVERFLOW): Add VEC_SERIES_CST to the list of valid
>> codes.
>> (VEC_SERIES_CST_BASE, VEC_SERIES_CST_STEP): New macros.
>> (build_vec_series_cst, build_vec_series): Declare.
>> * tree.c (tree_node_structure_for_code, tree_code_size, tree_size)
>> (add_expr, walk_tree_1, drop_tree_overflow): Handle VEC_SERIES_CST.
>> (build_vec_series_cst, build_vec_series): New functions.
>> * cfgexpand.c (expand_debug_expr): Handle the new codes.
>> * tree-pretty-print.c (dump_generic_node): Likewise.
>> * dwarf2out.c (rtl_for_decl_init): Handle VEC_SERIES_CST.
>> * gimple-expr.h (is_gimple_constant): Likewise.
>> * gimplify.c (gimplify_expr): Likewise.
>> * graphite-scop-detection.c (scan_tree_for_params): Likewise.
>> * ipa-icf-gimple.c (func_checker::compare_cst_or_decl): Likewise.
>> (func_checker::compare_operand): Likewise.
>> * ipa-icf.c (sem_item::add_expr, sem_variable::equals): Likewise.
>> * print-tree.c (print_node): Likewise.
>> * tree-ssa-loop.c (for_each_index): Likewise.
>> * tree-ssa-pre.c (create_component_ref_by_pieces_1): Likewise.
>> * tree-ssa-sccvn.c (copy_reference_ops_from_ref): Likewise.
>> (ao_ref_init_from_vn_reference): Likewise.
>> * varasm.c (const_hash_1, compare_constant): Likewise.
>> * fold-const.c (negate_expr_p, fold_negate_expr_1, operand_equal_p)
>> (fold_checksum_tree): Likewise.
>> (vec_series_equivalent_p): New function.
>> (const_binop): Use it. Fold VEC_SERIES_EXPRs of constants.
>> * expmed.c (make_tree): Handle VEC_SERIES.
>> * gimple-pretty-print.c (dump_binary_rhs): Likewise.
>> * tree-inline.c (estimate_operator_cost): Likewise.
>> * expr.c (const_vector_element): Include VEC_SERIES_CST in comment.
>> (expand_expr_real_2): Handle VEC_SERIES_EXPR.
>> (expand_expr_real_1): Handle VEC_SERIES_CST.
>> * optabs.def (vec_series_optab): New optab.
>> * optabs.h (expand_vec_series_expr): Declare.
>> * optabs.c (expand_vec_series_expr): New function.
>> * optabs-tree.c (optab_for_tree_code): Handle VEC_SERIES_EXPR.
>> * tree-cfg.c (verify_gimple_assign_binary): Handle VEC_SERIES_EXPR.
>> (verify_gimple_assign_single): Handle VEC_SERIES_CST.
>> * tree-vect-generic.c (expand_vector_operations_1): Check that
>> the operands also have vector type.
>>
>> Index: gcc/doc/generic.texi
>> ===================================================================
>> --- gcc/doc/generic.texi 2017-10-23 11:41:51.760448406 +0100
>> +++ gcc/doc/generic.texi 2017-10-23 11:42:34.910720660 +0100
>> @@ -1037,6 +1037,7 @@ As this example indicates, the operands
>> @tindex COMPLEX_CST
>> @tindex VECTOR_CST
>> @tindex VEC_DUPLICATE_CST
>> +@tindex VEC_SERIES_CST
>> @tindex STRING_CST
>> @findex TREE_STRING_LENGTH
>> @findex TREE_STRING_POINTER
>> @@ -1098,6 +1099,16 @@ instead. The scalar element value is gi
>> @code{VEC_DUPLICATE_CST_ELT} and has the same restrictions as the
>> element of a @code{VECTOR_CST}.
>>
>> +@item VEC_SERIES_CST
>> +These nodes represent a vector constant in which element @var{i}
>> +has the value @samp{@var{base} + @var{i} * @var{step}}, for some
>> +constant @var{base} and @var{step}. The value of @var{base} is
>> +given by @code{VEC_SERIES_CST_BASE} and the value of @var{step} is
>> +given by @code{VEC_SERIES_CST_STEP}.
>> +
>> +These nodes are restricted to integral types, in order to avoid
>> +specifying the rounding behavior for floating-point types.
>> +
>> @item STRING_CST
>> These nodes represent string-constants. The @code{TREE_STRING_LENGTH}
>> returns the length of the string, as an @code{int}. The
>> @@ -1702,6 +1713,7 @@ a value from @code{enum annot_expr_kind}
>> @node Vectors
>> @subsection Vectors
>> @tindex VEC_DUPLICATE_EXPR
>> +@tindex VEC_SERIES_EXPR
>> @tindex VEC_LSHIFT_EXPR
>> @tindex VEC_RSHIFT_EXPR
>> @tindex VEC_WIDEN_MULT_HI_EXPR
>> @@ -1721,6 +1733,14 @@ a value from @code{enum annot_expr_kind}
>> This node has a single operand and represents a vector in which every
>> element is equal to that operand.
>>
>> +@item VEC_SERIES_EXPR
>> +This node represents a vector formed from a scalar base and step,
>> +given as the first and second operands respectively. Element @var{i}
>> +of the result is equal to @samp{@var{base} + @var{i}*@var{step}}.
>> +
>> +This node is restricted to integral types, in order to avoid
>> +specifying the rounding behavior for floating-point types.
>> +
>> @item VEC_LSHIFT_EXPR
>> @itemx VEC_RSHIFT_EXPR
>> These nodes represent whole vector left and right shifts, respectively.
>> Index: gcc/doc/md.texi
>> ===================================================================
>> --- gcc/doc/md.texi 2017-10-23 11:41:51.761413027 +0100
>> +++ gcc/doc/md.texi 2017-10-23 11:42:34.911720660 +0100
>> @@ -4899,6 +4899,19 @@ vectors go through the @code{mov@var{m}}
>>
>> This pattern is not allowed to @code{FAIL}.
>>
>> +@cindex @code{vec_series@var{m}} instruction pattern
>> +@item @samp{vec_series@var{m}}
>> +Initialize vector output operand 0 so that element @var{i} is equal to
>> +operand 1 plus @var{i} times operand 2. In other words, create a linear
>> +series whose base value is operand 1 and whose step is operand 2.
>> +
>> +The vector output has mode @var{m} and the scalar inputs have the mode
>> +appropriate for one element of @var{m}. This pattern is not used for
>> +floating-point vectors, in order to avoid having to specify the
>> +rounding behavior for @var{i} > 1.
>> +
>> +This pattern is not allowed to @code{FAIL}.
>> +
>> @cindex @code{vec_cmp@var{m}@var{n}} instruction pattern
>> @item @samp{vec_cmp@var{m}@var{n}}
>> Output a vector comparison. Operand 0 of mode @var{n} is the destination for
>> Index: gcc/tree.def
>> ===================================================================
>> --- gcc/tree.def 2017-10-23 11:41:51.774917721 +0100
>> +++ gcc/tree.def 2017-10-23 11:42:34.924720660 +0100
>> @@ -308,6 +308,10 @@ DEFTREECODE (VECTOR_CST, "vector_cst", t
>> VEC_DUPLICATE_CST_ELT. */
>> DEFTREECODE (VEC_DUPLICATE_CST, "vec_duplicate_cst", tcc_constant, 0)
>>
>> +/* Represents a vector constant in which element i is equal to
>> + VEC_SERIES_CST_BASE + i * VEC_SERIES_CST_STEP. */
>> +DEFTREECODE (VEC_SERIES_CST, "vec_series_cst", tcc_constant, 0)
>> +
>> /* Contents are TREE_STRING_LENGTH and the actual contents of the string. */
>> DEFTREECODE (STRING_CST, "string_cst", tcc_constant, 0)
>>
>> @@ -541,6 +545,16 @@ DEFTREECODE (COND_EXPR, "cond_expr", tcc
>> /* Represents a vector in which every element is equal to operand 0. */
>> DEFTREECODE (VEC_DUPLICATE_EXPR, "vec_duplicate_expr", tcc_unary, 1)
>>
>> +/* Vector series created from a start (base) value and a step.
>> +
>> + A = VEC_SERIES_EXPR (B, C)
>> +
>> + means
>> +
>> + for (i = 0; i < N; i++)
>> + A[i] = B + C * i; */
>> +DEFTREECODE (VEC_SERIES_EXPR, "vec_series_expr", tcc_binary, 2)
>> +
>> /* Vector conditional expression. It is like COND_EXPR, but with
>> vector operands.
>>
>> Index: gcc/tree.h
>> ===================================================================
>> --- gcc/tree.h 2017-10-23 11:41:51.775882341 +0100
>> +++ gcc/tree.h 2017-10-23 11:42:34.925720660 +0100
>> @@ -730,8 +730,8 @@ #define TREE_SYMBOL_REFERENCED(NODE) \
>> #define TYPE_REF_CAN_ALIAS_ALL(NODE) \
>> (PTR_OR_REF_CHECK (NODE)->base.static_flag)
>>
>> -/* In an INTEGER_CST, REAL_CST, COMPLEX_CST, VECTOR_CST or VEC_DUPLICATE_CST,
>> - this means there was an overflow in folding. */
>> +/* In an INTEGER_CST, REAL_CST, COMPLEX_CST, VECTOR_CST, VEC_DUPLICATE_CST
>> + or VEC_SERIES_CST, this means there was an overflow in folding. */
>>
>> #define TREE_OVERFLOW(NODE) (CST_CHECK (NODE)->base.public_flag)
>>
>> @@ -1034,6 +1034,12 @@ #define VECTOR_CST_ELT(NODE,IDX) (VECTOR
>> #define VEC_DUPLICATE_CST_ELT(NODE) \
>> (VEC_DUPLICATE_CST_CHECK (NODE)->vector.elts[0])
>>
>> +/* In a VEC_SERIES_CST node. */
>> +#define VEC_SERIES_CST_BASE(NODE) \
>> + (VEC_SERIES_CST_CHECK (NODE)->vector.elts[0])
>> +#define VEC_SERIES_CST_STEP(NODE) \
>> + (VEC_SERIES_CST_CHECK (NODE)->vector.elts[1])
>> +
>> /* Define fields and accessors for some special-purpose tree nodes. */
>>
>> #define IDENTIFIER_LENGTH(NODE) \
>> @@ -4030,9 +4036,11 @@ extern tree build_int_cstu (tree type, u
>> extern tree build_int_cst_type (tree, HOST_WIDE_INT);
>> extern tree make_vector (unsigned CXX_MEM_STAT_INFO);
>> extern tree build_vec_duplicate_cst (tree, tree CXX_MEM_STAT_INFO);
>> +extern tree build_vec_series_cst (tree, tree, tree CXX_MEM_STAT_INFO);
>> extern tree build_vector (tree, vec<tree> CXX_MEM_STAT_INFO);
>> extern tree build_vector_from_ctor (tree, vec<constructor_elt, va_gc> *);
>> extern tree build_vector_from_val (tree, tree);
>> +extern tree build_vec_series (tree, tree, tree);
>> extern void recompute_constructor_flags (tree);
>> extern void verify_constructor_flags (tree);
>> extern tree build_constructor (tree, vec<constructor_elt, va_gc> *);
>> Index: gcc/tree.c
>> ===================================================================
>> --- gcc/tree.c 2017-10-23 11:41:51.774917721 +0100
>> +++ gcc/tree.c 2017-10-23 11:42:34.924720660 +0100
>> @@ -465,6 +465,7 @@ tree_node_structure_for_code (enum tree_
>> case COMPLEX_CST: return TS_COMPLEX;
>> case VECTOR_CST: return TS_VECTOR;
>> case VEC_DUPLICATE_CST: return TS_VECTOR;
>> + case VEC_SERIES_CST: return TS_VECTOR;
>> case STRING_CST: return TS_STRING;
>> /* tcc_exceptional cases. */
>> case ERROR_MARK: return TS_COMMON;
>> @@ -818,6 +819,8 @@ tree_code_size (enum tree_code code)
>> case COMPLEX_CST: return sizeof (struct tree_complex);
>> case VECTOR_CST: return sizeof (struct tree_vector);
>> case VEC_DUPLICATE_CST: return sizeof (struct tree_vector);
>> + case VEC_SERIES_CST:
>> + return sizeof (struct tree_vector) + sizeof (tree);
>> case STRING_CST: gcc_unreachable ();
>> default:
>> return lang_hooks.tree_size (code);
>> @@ -880,6 +883,9 @@ tree_size (const_tree node)
>> case VEC_DUPLICATE_CST:
>> return sizeof (struct tree_vector);
>>
>> + case VEC_SERIES_CST:
>> + return sizeof (struct tree_vector) + sizeof (tree);
>> +
>> case STRING_CST:
>> return TREE_STRING_LENGTH (node) + offsetof (struct tree_string, str) + 1;
>>
>> @@ -1711,6 +1717,31 @@ build_vec_duplicate_cst (tree type, tree
>> return t;
>> }
>>
>> +/* Build a new VEC_SERIES_CST with type TYPE, base BASE and step STEP.
>> +
>> + Note that this function is only suitable for callers that specifically
>> + need a VEC_SERIES_CST node. Use build_vec_series to build a general
>> + series vector from a general base and step. */
>> +
>> +tree
>> +build_vec_series_cst (tree type, tree base, tree step MEM_STAT_DECL)
>> +{
>> + int length = sizeof (struct tree_vector) + sizeof (tree);
>> +
>> + record_node_allocation_statistics (VEC_SERIES_CST, length);
>> +
>> + tree t = ggc_alloc_cleared_tree_node_stat (length PASS_MEM_STAT);
>> +
>> + TREE_SET_CODE (t, VEC_SERIES_CST);
>> + TREE_TYPE (t) = type;
>> + t->base.u.nelts = 2;
>> + VEC_SERIES_CST_BASE (t) = base;
>> + VEC_SERIES_CST_STEP (t) = step;
>> + TREE_CONSTANT (t) = 1;
>> +
>> + return t;
>> +}
>> +
>> /* Build a newly constructed VECTOR_CST node of length LEN. */
>>
>> tree
>> @@ -1821,6 +1852,19 @@ build_vector_from_val (tree vectype, tre
>> }
>> }
>>
>> +/* Build a vector series of type TYPE in which element I has the value
>> + BASE + I * STEP. */
>> +
>> +tree
>> +build_vec_series (tree type, tree base, tree step)
>> +{
>> + if (integer_zerop (step))
>> + return build_vector_from_val (type, base);
>> + if (CONSTANT_CLASS_P (base) && CONSTANT_CLASS_P (step))
>> + return build_vec_series_cst (type, base, step);
>> + return build2 (VEC_SERIES_EXPR, type, base, step);
>> +}
>> +
>> /* Something has messed with the elements of CONSTRUCTOR C after it was built;
>> calculate TREE_CONSTANT and TREE_SIDE_EFFECTS. */
>>
>> @@ -7136,6 +7180,10 @@ add_expr (const_tree t, inchash::hash &h
>> case VEC_DUPLICATE_CST:
>> inchash::add_expr (VEC_DUPLICATE_CST_ELT (t), hstate);
>> return;
>> + case VEC_SERIES_CST:
>> + inchash::add_expr (VEC_SERIES_CST_BASE (t), hstate);
>> + inchash::add_expr (VEC_SERIES_CST_STEP (t), hstate);
>> + return;
>> case SSA_NAME:
>> /* We can just compare by pointer. */
>> hstate.add_wide_int (SSA_NAME_VERSION (t));
>> @@ -11150,6 +11198,7 @@ #define WALK_SUBTREE_TAIL(NODE) \
>> case FIXED_CST:
>> case VECTOR_CST:
>> case VEC_DUPLICATE_CST:
>> + case VEC_SERIES_CST:
>> case STRING_CST:
>> case BLOCK:
>> case PLACEHOLDER_EXPR:
>> @@ -12442,6 +12491,15 @@ drop_tree_overflow (tree t)
>> if (TREE_OVERFLOW (*elt))
>> *elt = drop_tree_overflow (*elt);
>> }
>> + if (TREE_CODE (t) == VEC_SERIES_CST)
>> + {
>> + tree *elt = &VEC_SERIES_CST_BASE (t);
>> + if (TREE_OVERFLOW (*elt))
>> + *elt = drop_tree_overflow (*elt);
>> + elt = &VEC_SERIES_CST_STEP (t);
>> + if (TREE_OVERFLOW (*elt))
>> + *elt = drop_tree_overflow (*elt);
>> + }
>> return t;
>> }
>>
>> Index: gcc/cfgexpand.c
>> ===================================================================
>> --- gcc/cfgexpand.c 2017-10-23 11:41:51.760448406 +0100
>> +++ gcc/cfgexpand.c 2017-10-23 11:42:34.909720660 +0100
>> @@ -5051,6 +5051,8 @@ expand_debug_expr (tree exp)
>> case VEC_PERM_EXPR:
>> case VEC_DUPLICATE_CST:
>> case VEC_DUPLICATE_EXPR:
>> + case VEC_SERIES_CST:
>> + case VEC_SERIES_EXPR:
>> return NULL;
>>
>> /* Misc codes. */
>> Index: gcc/tree-pretty-print.c
>> ===================================================================
>> --- gcc/tree-pretty-print.c 2017-10-23 11:41:51.772023858 +0100
>> +++ gcc/tree-pretty-print.c 2017-10-23 11:42:34.921720660 +0100
>> @@ -1808,6 +1808,14 @@ dump_generic_node (pretty_printer *pp, t
>> pp_string (pp, ", ... }");
>> break;
>>
>> + case VEC_SERIES_CST:
>> + pp_string (pp, "{ ");
>> + dump_generic_node (pp, VEC_SERIES_CST_BASE (node), spc, flags, false);
>> + pp_string (pp, ", +, ");
>> + dump_generic_node (pp, VEC_SERIES_CST_STEP (node), spc, flags, false);
>> + pp_string (pp, "}");
>> + break;
>> +
>> case FUNCTION_TYPE:
>> case METHOD_TYPE:
>> dump_generic_node (pp, TREE_TYPE (node), spc, flags, false);
>> @@ -3221,6 +3229,7 @@ dump_generic_node (pretty_printer *pp, t
>> pp_string (pp, " > ");
>> break;
>>
>> + case VEC_SERIES_EXPR:
>> case VEC_WIDEN_MULT_HI_EXPR:
>> case VEC_WIDEN_MULT_LO_EXPR:
>> case VEC_WIDEN_MULT_EVEN_EXPR:
>> Index: gcc/dwarf2out.c
>> ===================================================================
>> --- gcc/dwarf2out.c 2017-10-23 11:41:51.763342269 +0100
>> +++ gcc/dwarf2out.c 2017-10-23 11:42:34.913720660 +0100
>> @@ -18863,6 +18863,7 @@ rtl_for_decl_init (tree init, tree type)
>> {
>> case VECTOR_CST:
>> case VEC_DUPLICATE_CST:
>> + case VEC_SERIES_CST:
>> break;
>> case CONSTRUCTOR:
>> if (TREE_CONSTANT (init))
>> Index: gcc/gimple-expr.h
>> ===================================================================
>> --- gcc/gimple-expr.h 2017-10-23 11:41:51.765271511 +0100
>> +++ gcc/gimple-expr.h 2017-10-23 11:42:34.916720660 +0100
>> @@ -135,6 +135,7 @@ is_gimple_constant (const_tree t)
>> case COMPLEX_CST:
>> case VECTOR_CST:
>> case VEC_DUPLICATE_CST:
>> + case VEC_SERIES_CST:
>> case STRING_CST:
>> return true;
>>
>> Index: gcc/gimplify.c
>> ===================================================================
>> --- gcc/gimplify.c 2017-10-23 11:41:51.766236132 +0100
>> +++ gcc/gimplify.c 2017-10-23 11:42:34.917720660 +0100
>> @@ -11507,6 +11507,7 @@ gimplify_expr (tree *expr_p, gimple_seq
>> case COMPLEX_CST:
>> case VECTOR_CST:
>> case VEC_DUPLICATE_CST:
>> + case VEC_SERIES_CST:
>> /* Drop the overflow flag on constants, we do not want
>> that in the GIMPLE IL. */
>> if (TREE_OVERFLOW_P (*expr_p))
>> Index: gcc/graphite-scop-detection.c
>> ===================================================================
>> --- gcc/graphite-scop-detection.c 2017-10-23 11:41:51.767200753 +0100
>> +++ gcc/graphite-scop-detection.c 2017-10-23 11:42:34.917720660 +0100
>> @@ -1244,6 +1244,7 @@ scan_tree_for_params (sese_info_p s, tre
>> case COMPLEX_CST:
>> case VECTOR_CST:
>> case VEC_DUPLICATE_CST:
>> + case VEC_SERIES_CST:
>> break;
>>
>> default:
>> Index: gcc/ipa-icf-gimple.c
>> ===================================================================
>> --- gcc/ipa-icf-gimple.c 2017-10-23 11:41:51.767200753 +0100
>> +++ gcc/ipa-icf-gimple.c 2017-10-23 11:42:34.917720660 +0100
>> @@ -334,6 +334,7 @@ func_checker::compare_cst_or_decl (tree
>> case COMPLEX_CST:
>> case VECTOR_CST:
>> case VEC_DUPLICATE_CST:
>> + case VEC_SERIES_CST:
>> case STRING_CST:
>> case REAL_CST:
>> {
>> @@ -530,6 +531,7 @@ func_checker::compare_operand (tree t1,
>> case COMPLEX_CST:
>> case VECTOR_CST:
>> case VEC_DUPLICATE_CST:
>> + case VEC_SERIES_CST:
>> case STRING_CST:
>> case REAL_CST:
>> case FUNCTION_DECL:
>> Index: gcc/ipa-icf.c
>> ===================================================================
>> --- gcc/ipa-icf.c 2017-10-23 11:41:51.768165374 +0100
>> +++ gcc/ipa-icf.c 2017-10-23 11:42:34.918720660 +0100
>> @@ -1479,6 +1479,7 @@ sem_item::add_expr (const_tree exp, inch
>> case COMPLEX_CST:
>> case VECTOR_CST:
>> case VEC_DUPLICATE_CST:
>> + case VEC_SERIES_CST:
>> inchash::add_expr (exp, hstate);
>> break;
>> case CONSTRUCTOR:
>> @@ -2034,6 +2035,11 @@ sem_variable::equals (tree t1, tree t2)
>> case VEC_DUPLICATE_CST:
>> return sem_variable::equals (VEC_DUPLICATE_CST_ELT (t1),
>> VEC_DUPLICATE_CST_ELT (t2));
>> + case VEC_SERIES_CST:
>> + return (sem_variable::equals (VEC_SERIES_CST_BASE (t1),
>> + VEC_SERIES_CST_BASE (t2))
>> + && sem_variable::equals (VEC_SERIES_CST_STEP (t1),
>> + VEC_SERIES_CST_STEP (t2)));
>> case ARRAY_REF:
>> case ARRAY_RANGE_REF:
>> {
>> Index: gcc/print-tree.c
>> ===================================================================
>> --- gcc/print-tree.c 2017-10-23 11:41:51.769129995 +0100
>> +++ gcc/print-tree.c 2017-10-23 11:42:34.919720660 +0100
>> @@ -787,6 +787,11 @@ print_node (FILE *file, const char *pref
>> print_node (file, "elt", VEC_DUPLICATE_CST_ELT (node), indent + 4);
>> break;
>>
>> + case VEC_SERIES_CST:
>> + print_node (file, "base", VEC_SERIES_CST_BASE (node), indent + 4);
>> + print_node (file, "step", VEC_SERIES_CST_STEP (node), indent + 4);
>> + break;
>> +
>> case COMPLEX_CST:
>> print_node (file, "real", TREE_REALPART (node), indent + 4);
>> print_node (file, "imag", TREE_IMAGPART (node), indent + 4);
>> Index: gcc/tree-ssa-loop.c
>> ===================================================================
>> --- gcc/tree-ssa-loop.c 2017-10-23 11:41:51.772023858 +0100
>> +++ gcc/tree-ssa-loop.c 2017-10-23 11:42:34.921720660 +0100
>> @@ -617,6 +617,7 @@ for_each_index (tree *addr_p, bool (*cbc
>> case RESULT_DECL:
>> case VECTOR_CST:
>> case VEC_DUPLICATE_CST:
>> + case VEC_SERIES_CST:
>> case COMPLEX_CST:
>> case INTEGER_CST:
>> case REAL_CST:
>> Index: gcc/tree-ssa-pre.c
>> ===================================================================
>> --- gcc/tree-ssa-pre.c 2017-10-23 11:41:51.772023858 +0100
>> +++ gcc/tree-ssa-pre.c 2017-10-23 11:42:34.922720660 +0100
>> @@ -2676,6 +2676,7 @@ create_component_ref_by_pieces_1 (basic_
>> case COMPLEX_CST:
>> case VECTOR_CST:
>> case VEC_DUPLICATE_CST:
>> + case VEC_SERIES_CST:
>> case REAL_CST:
>> case CONSTRUCTOR:
>> case VAR_DECL:
>> Index: gcc/tree-ssa-sccvn.c
>> ===================================================================
>> --- gcc/tree-ssa-sccvn.c 2017-10-23 11:41:51.773953100 +0100
>> +++ gcc/tree-ssa-sccvn.c 2017-10-23 11:42:34.922720660 +0100
>> @@ -859,6 +859,7 @@ copy_reference_ops_from_ref (tree ref, v
>> case COMPLEX_CST:
>> case VECTOR_CST:
>> case VEC_DUPLICATE_CST:
>> + case VEC_SERIES_CST:
>> case REAL_CST:
>> case FIXED_CST:
>> case CONSTRUCTOR:
>> @@ -1052,6 +1053,7 @@ ao_ref_init_from_vn_reference (ao_ref *r
>> case COMPLEX_CST:
>> case VECTOR_CST:
>> case VEC_DUPLICATE_CST:
>> + case VEC_SERIES_CST:
>> case REAL_CST:
>> case CONSTRUCTOR:
>> case CONST_DECL:
>> Index: gcc/varasm.c
>> ===================================================================
>> --- gcc/varasm.c 2017-10-23 11:41:51.775882341 +0100
>> +++ gcc/varasm.c 2017-10-23 11:42:34.927720660 +0100
>> @@ -3065,6 +3065,10 @@ const_hash_1 (const tree exp)
>> return (const_hash_1 (TREE_OPERAND (exp, 0)) * 9
>> + const_hash_1 (TREE_OPERAND (exp, 1)));
>>
>> + case VEC_SERIES_CST:
>> + return (const_hash_1 (VEC_SERIES_CST_BASE (exp)) * 11
>> + + const_hash_1 (VEC_SERIES_CST_STEP (exp)));
>> +
>> CASE_CONVERT:
>> return const_hash_1 (TREE_OPERAND (exp, 0)) * 7 + 2;
>>
>> @@ -3165,6 +3169,12 @@ compare_constant (const tree t1, const t
>> return compare_constant (VEC_DUPLICATE_CST_ELT (t1),
>> VEC_DUPLICATE_CST_ELT (t2));
>>
>> + case VEC_SERIES_CST:
>> + return (compare_constant (VEC_SERIES_CST_BASE (t1),
>> + VEC_SERIES_CST_BASE (t2))
>> + && compare_constant (VEC_SERIES_CST_STEP (t1),
>> + VEC_SERIES_CST_STEP (t2)));
>> +
>> case CONSTRUCTOR:
>> {
>> vec<constructor_elt, va_gc> *v1, *v2;
>> Index: gcc/fold-const.c
>> ===================================================================
>> --- gcc/fold-const.c 2017-10-23 11:41:51.765271511 +0100
>> +++ gcc/fold-const.c 2017-10-23 11:42:34.916720660 +0100
>> @@ -421,6 +421,10 @@ negate_expr_p (tree t)
>> case VEC_DUPLICATE_CST:
>> return negate_expr_p (VEC_DUPLICATE_CST_ELT (t));
>>
>> + case VEC_SERIES_CST:
>> + return (negate_expr_p (VEC_SERIES_CST_BASE (t))
>> + && negate_expr_p (VEC_SERIES_CST_STEP (t)));
>> +
>> case COMPLEX_EXPR:
>> return negate_expr_p (TREE_OPERAND (t, 0))
>> && negate_expr_p (TREE_OPERAND (t, 1));
>> @@ -590,6 +594,17 @@ fold_negate_expr_1 (location_t loc, tree
>> return build_vector_from_val (type, sub);
>> }
>>
>> + case VEC_SERIES_CST:
>> + {
>> + tree neg_base = fold_negate_expr (loc, VEC_SERIES_CST_BASE (t));
>> + if (!neg_base)
>> + return NULL_TREE;
>> + tree neg_step = fold_negate_expr (loc, VEC_SERIES_CST_STEP (t));
>> + if (!neg_step)
>> + return NULL_TREE;
>> + return build_vec_series (type, neg_base, neg_step);
>> + }
>> +
>> case COMPLEX_EXPR:
>> if (negate_expr_p (t))
>> return fold_build2_loc (loc, COMPLEX_EXPR, type,
>> @@ -1131,6 +1146,28 @@ int_const_binop (enum tree_code code, co
>> return int_const_binop_1 (code, arg1, arg2, 1);
>> }
>>
>> +/* Return true if EXP is a VEC_DUPLICATE_CST or a VEC_SERIES_CST,
>> + and if so express it as a linear series in *BASE_OUT and *STEP_OUT.
>> + The step will be zero for VEC_DUPLICATE_CST. */
>> +
>> +static bool
>> +vec_series_equivalent_p (const_tree exp, tree *base_out, tree *step_out)
>> +{
>> + if (TREE_CODE (exp) == VEC_SERIES_CST)
>> + {
>> + *base_out = VEC_SERIES_CST_BASE (exp);
>> + *step_out = VEC_SERIES_CST_STEP (exp);
>> + return true;
>> + }
>> + if (TREE_CODE (exp) == VEC_DUPLICATE_CST)
>> + {
>> + *base_out = VEC_DUPLICATE_CST_ELT (exp);
>> + *step_out = build_zero_cst (TREE_TYPE (*base_out));
>> + return true;
>> + }
>> + return false;
>> +}
>> +
>> /* Combine two constants ARG1 and ARG2 under operation CODE to produce a new
>> constant. We assume ARG1 and ARG2 have the same data type, or at least
>> are the same kind of constant and the same machine mode. Return zero if
>> @@ -1457,6 +1494,20 @@ const_binop (enum tree_code code, tree a
>> return build_vector_from_val (TREE_TYPE (arg1), sub);
>> }
>>
>> + tree base1, step1, base2, step2;
>> + if ((code == PLUS_EXPR || code == MINUS_EXPR)
>> + && vec_series_equivalent_p (arg1, &base1, &step1)
>> + && vec_series_equivalent_p (arg2, &base2, &step2))
>> + {
>> + tree new_base = const_binop (code, base1, base2);
>> + if (!new_base)
>> + return NULL_TREE;
>> + tree new_step = const_binop (code, step1, step2);
>> + if (!new_step)
>> + return NULL_TREE;
>> + return build_vec_series (TREE_TYPE (arg1), new_base, new_step);
>> + }
>> +
>> /* Shifts allow a scalar offset for a vector. */
>> if (TREE_CODE (arg1) == VECTOR_CST
>> && TREE_CODE (arg2) == INTEGER_CST)
>> @@ -1505,6 +1556,12 @@ const_binop (enum tree_code code, tree t
>> result as argument put those cases that need it here. */
>> switch (code)
>> {
>> + case VEC_SERIES_EXPR:
>> + if (CONSTANT_CLASS_P (arg1)
>> + && CONSTANT_CLASS_P (arg2))
>> + return build_vec_series (type, arg1, arg2);
>> + return NULL_TREE;
>> +
>> case COMPLEX_EXPR:
>> if ((TREE_CODE (arg1) == REAL_CST
>> && TREE_CODE (arg2) == REAL_CST)
>> @@ -3008,6 +3065,12 @@ operand_equal_p (const_tree arg0, const_
>> return operand_equal_p (VEC_DUPLICATE_CST_ELT (arg0),
>> VEC_DUPLICATE_CST_ELT (arg1), flags);
>>
>> + case VEC_SERIES_CST:
>> + return (operand_equal_p (VEC_SERIES_CST_BASE (arg0),
>> + VEC_SERIES_CST_BASE (arg1), flags)
>> + && operand_equal_p (VEC_SERIES_CST_STEP (arg0),
>> + VEC_SERIES_CST_STEP (arg1), flags));
>> +
>> case COMPLEX_CST:
>> return (operand_equal_p (TREE_REALPART (arg0), TREE_REALPART (arg1),
>> flags)
>> @@ -12050,6 +12113,10 @@ fold_checksum_tree (const_tree expr, str
>> case VEC_DUPLICATE_CST:
>> fold_checksum_tree (VEC_DUPLICATE_CST_ELT (expr), ctx, ht);
>> break;
>> + case VEC_SERIES_CST:
>> + fold_checksum_tree (VEC_SERIES_CST_BASE (expr), ctx, ht);
>> + fold_checksum_tree (VEC_SERIES_CST_STEP (expr), ctx, ht);
>> + break;
>> default:
>> break;
>> }
>> Index: gcc/expmed.c
>> ===================================================================
>> --- gcc/expmed.c 2017-10-23 11:41:39.186050437 +0100
>> +++ gcc/expmed.c 2017-10-23 11:42:34.914720660 +0100
>> @@ -5253,6 +5253,13 @@ make_tree (tree type, rtx x)
>> tree elt_tree = make_tree (TREE_TYPE (type), XEXP (op, 0));
>> return build_vector_from_val (type, elt_tree);
>> }
>> + if (GET_CODE (op) == VEC_SERIES)
>> + {
>> + tree itype = TREE_TYPE (type);
>> + tree base_tree = make_tree (itype, XEXP (op, 0));
>> + tree step_tree = make_tree (itype, XEXP (op, 1));
>> + return build_vec_series (type, base_tree, step_tree);
>> + }
>> return make_tree (type, op);
>> }
>>
>> Index: gcc/gimple-pretty-print.c
>> ===================================================================
>> --- gcc/gimple-pretty-print.c 2017-10-23 11:41:25.500318672 +0100
>> +++ gcc/gimple-pretty-print.c 2017-10-23 11:42:34.916720660 +0100
>> @@ -438,6 +438,7 @@ dump_binary_rhs (pretty_printer *buffer,
>> case VEC_PACK_FIX_TRUNC_EXPR:
>> case VEC_WIDEN_LSHIFT_HI_EXPR:
>> case VEC_WIDEN_LSHIFT_LO_EXPR:
>> + case VEC_SERIES_EXPR:
>> for (p = get_tree_code_name (code); *p; p++)
>> pp_character (buffer, TOUPPER (*p));
>> pp_string (buffer, " <");
>> Index: gcc/tree-inline.c
>> ===================================================================
>> --- gcc/tree-inline.c 2017-10-23 11:41:51.771059237 +0100
>> +++ gcc/tree-inline.c 2017-10-23 11:42:34.921720660 +0100
>> @@ -4003,6 +4003,7 @@ estimate_operator_cost (enum tree_code c
>> case VEC_WIDEN_LSHIFT_HI_EXPR:
>> case VEC_WIDEN_LSHIFT_LO_EXPR:
>> case VEC_DUPLICATE_EXPR:
>> + case VEC_SERIES_EXPR:
>>
>> return 1;
>>
>> Index: gcc/expr.c
>> ===================================================================
>> --- gcc/expr.c 2017-10-23 11:41:51.764306890 +0100
>> +++ gcc/expr.c 2017-10-23 11:42:34.915720660 +0100
>> @@ -7704,7 +7704,7 @@ expand_operands (tree exp0, tree exp1, r
>>
>>
>> /* Expand constant vector element ELT, which has mode MODE. This is used
>> - for members of VECTOR_CST and VEC_DUPLICATE_CST. */
>> + for members of VECTOR_CST, VEC_DUPLICATE_CST and VEC_SERIES_CST. */
>>
>> static rtx
>> const_vector_element (scalar_mode mode, const_tree elt)
>> @@ -9587,6 +9587,10 @@ #define REDUCE_BIT_FIELD(expr) (reduce_b
>> gcc_assert (target);
>> return target;
>>
>> + case VEC_SERIES_EXPR:
>> + expand_operands (treeop0, treeop1, NULL_RTX, &op0, &op1, modifier);
>> + return expand_vec_series_expr (mode, op0, op1, target);
>> +
>> case BIT_INSERT_EXPR:
>> {
>> unsigned bitpos = tree_to_uhwi (treeop2);
>> @@ -10044,6 +10048,13 @@ expand_expr_real_1 (tree exp, rtx target
>> VEC_DUPLICATE_CST_ELT (exp));
>> return gen_const_vec_duplicate (mode, op0);
>>
>> + case VEC_SERIES_CST:
>> + op0 = const_vector_element (GET_MODE_INNER (mode),
>> + VEC_SERIES_CST_BASE (exp));
>> + op1 = const_vector_element (GET_MODE_INNER (mode),
>> + VEC_SERIES_CST_STEP (exp));
>> + return gen_const_vec_series (mode, op0, op1);
>> +
>> case CONST_DECL:
>> if (modifier == EXPAND_WRITE)
>> {
>> Index: gcc/optabs.def
>> ===================================================================
>> --- gcc/optabs.def 2017-10-23 11:41:51.769129995 +0100
>> +++ gcc/optabs.def 2017-10-23 11:42:34.919720660 +0100
>> @@ -366,3 +366,4 @@ OPTAB_D (get_thread_pointer_optab, "get_
>> OPTAB_D (set_thread_pointer_optab, "set_thread_pointer$I$a")
>>
>> OPTAB_DC (vec_duplicate_optab, "vec_duplicate$a", VEC_DUPLICATE)
>> +OPTAB_DC (vec_series_optab, "vec_series$a", VEC_SERIES)
>> Index: gcc/optabs.h
>> ===================================================================
>> --- gcc/optabs.h 2017-10-23 11:41:51.769129995 +0100
>> +++ gcc/optabs.h 2017-10-23 11:42:34.919720660 +0100
>> @@ -316,6 +316,9 @@ extern rtx expand_vec_cmp_expr (tree, tr
>> /* Generate code for VEC_COND_EXPR. */
>> extern rtx expand_vec_cond_expr (tree, tree, tree, tree, rtx);
>>
>> +/* Generate code for VEC_SERIES_EXPR. */
>> +extern rtx expand_vec_series_expr (machine_mode, rtx, rtx, rtx);
>> +
>> /* Generate code for MULT_HIGHPART_EXPR. */
>> extern rtx expand_mult_highpart (machine_mode, rtx, rtx, rtx, bool);
>>
>> Index: gcc/optabs.c
>> ===================================================================
>> --- gcc/optabs.c 2017-10-23 11:41:51.769129995 +0100
>> +++ gcc/optabs.c 2017-10-23 11:42:34.919720660 +0100
>> @@ -5693,6 +5693,27 @@ expand_vec_cond_expr (tree vec_cond_type
>> return ops[0].value;
>> }
>>
>> +/* Generate VEC_SERIES_EXPR <OP0, OP1>, returning a value of mode VMODE.
>> + Use TARGET for the result if nonnull and convenient. */
>> +
>> +rtx
>> +expand_vec_series_expr (machine_mode vmode, rtx op0, rtx op1, rtx target)
>> +{
>> + struct expand_operand ops[3];
>> + enum insn_code icode;
>> + machine_mode emode = GET_MODE_INNER (vmode);
>> +
>> + icode = direct_optab_handler (vec_series_optab, vmode);
>> + gcc_assert (icode != CODE_FOR_nothing);
>> +
>> + create_output_operand (&ops[0], target, vmode);
>> + create_input_operand (&ops[1], op0, emode);
>> + create_input_operand (&ops[2], op1, emode);
>> +
>> + expand_insn (icode, 3, ops);
>> + return ops[0].value;
>> +}
>> +
>> /* Generate insns for a vector comparison into a mask. */
>>
>> rtx
>> Index: gcc/optabs-tree.c
>> ===================================================================
>> --- gcc/optabs-tree.c 2017-10-23 11:41:51.768165374 +0100
>> +++ gcc/optabs-tree.c 2017-10-23 11:42:34.918720660 +0100
>> @@ -213,6 +213,9 @@ optab_for_tree_code (enum tree_code code
>> case VEC_DUPLICATE_EXPR:
>> return vec_duplicate_optab;
>>
>> + case VEC_SERIES_EXPR:
>> + return vec_series_optab;
>> +
>> default:
>> break;
>> }
>> Index: gcc/tree-cfg.c
>> ===================================================================
>> --- gcc/tree-cfg.c 2017-10-23 11:41:51.770094616 +0100
>> +++ gcc/tree-cfg.c 2017-10-23 11:42:34.920720660 +0100
>> @@ -4119,6 +4119,23 @@ verify_gimple_assign_binary (gassign *st
>> /* Continue with generic binary expression handling. */
>> break;
>>
>> + case VEC_SERIES_EXPR:
>> + if (!useless_type_conversion_p (rhs1_type, rhs2_type))
>> + {
>> + error ("type mismatch in series expression");
>> + debug_generic_expr (rhs1_type);
>> + debug_generic_expr (rhs2_type);
>> + return true;
>> + }
>> + if (TREE_CODE (lhs_type) != VECTOR_TYPE
>> + || !useless_type_conversion_p (TREE_TYPE (lhs_type), rhs1_type))
>> + {
>> + error ("vector type expected in series expression");
>> + debug_generic_expr (lhs_type);
>> + return true;
>> + }
>> + return false;
>> +
>> default:
>> gcc_unreachable ();
>> }
>> @@ -4485,6 +4502,7 @@ verify_gimple_assign_single (gassign *st
>> case COMPLEX_CST:
>> case VECTOR_CST:
>> case VEC_DUPLICATE_CST:
>> + case VEC_SERIES_CST:
>> case STRING_CST:
>> return res;
>>
>> Index: gcc/tree-vect-generic.c
>> ===================================================================
>> --- gcc/tree-vect-generic.c 2017-10-23 11:41:51.773953100 +0100
>> +++ gcc/tree-vect-generic.c 2017-10-23 11:42:34.922720660 +0100
>> @@ -1595,7 +1595,8 @@ expand_vector_operations_1 (gimple_stmt_
>> if (rhs_class == GIMPLE_BINARY_RHS)
>> rhs2 = gimple_assign_rhs2 (stmt);
>>
>> - if (TREE_CODE (type) != VECTOR_TYPE)
>> + if (!VECTOR_TYPE_P (type)
>> + || !VECTOR_TYPE_P (TREE_TYPE (rhs1)))
>> return;
>>
>> /* If the vector operation is operating on all same vector elements
^ permalink raw reply [flat|nested] 90+ messages in thread
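For readers following the patch above, the semantics of VEC_SERIES_EXPR (A[i] = B + C * i) and the PLUS/MINUS folding rule added to const_binop (the sum of two series is a series whose base and step are the sums of the operands' bases and steps) can be sketched as follows. This is a minimal fixed-length illustration, not GCC's implementation: in GCC the element count may be a runtime quantity, and the names here are invented for the sketch.

```cpp
#include <cassert>
#include <vector>

/* A = VEC_SERIES_EXPR (B, C)  means  A[i] = B + C * i
   for each element i of the vector.  N is fixed here only
   for illustration.  */
static std::vector<int>
vec_series (int base, int step, int n)
{
  std::vector<int> a (n);
  for (int i = 0; i < n; i++)
    a[i] = base + step * i;
  return a;
}

/* The const_binop rule in the patch: folding PLUS_EXPR on two
   series operands combines the bases and the steps, rather than
   expanding any elements.  */
static std::vector<int>
add_series (int base1, int step1, int base2, int step2, int n)
{
  return vec_series (base1 + base2, step1 + step2, n);
}
```

A quick check that the folded form agrees with elementwise addition: `add_series (1, 2, 10, 3, 4)` yields the same vector as adding `vec_series (1, 2, 4)` and `vec_series (10, 3, 4)` element by element.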
* Re: [09/nn] Add a fixed_size_mode_pod class
2017-10-26 12:18 ` Richard Sandiford
@ 2017-10-26 12:46 ` Richard Biener
2017-10-26 19:42 ` Eric Botcazou
` (2 more replies)
0 siblings, 3 replies; 90+ messages in thread
From: Richard Biener @ 2017-10-26 12:46 UTC (permalink / raw)
To: Richard Biener, GCC Patches, Richard Sandiford
On Thu, Oct 26, 2017 at 2:18 PM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> Richard Biener <richard.guenther@gmail.com> writes:
>> On Mon, Oct 23, 2017 at 1:22 PM, Richard Sandiford
>> <richard.sandiford@linaro.org> wrote:
>>> This patch adds a POD version of fixed_size_mode. The only current use
>>> is for storing the __builtin_apply and __builtin_result register modes,
>>> which were made fixed_size_modes by the previous patch.
>>
>> Bah - can we update our host compiler to C++11/14 please ...?
>> (maybe requiring that build with GCC 4.8 as host compiler works,
>> GCC 4.3 has -std=c++0x, but I'm quite sure that's not enough).
>
> That'd be great :-) It would avoid all the poly_int_pod stuff too,
> and allow some clean-up of wide-int.h.
Can you figure out what the oldest GCC release is that supports the C++11/14
POD handling that would be required?
Richard.
> Thanks for the reviews,
> Richard
>
>
>>
>> Ok.
>>
>> Thanks,
>> Richard.
>>
>>>
>>> 2017-10-23 Richard Sandiford <richard.sandiford@linaro.org>
>>> Alan Hayward <alan.hayward@arm.com>
>>> David Sherwood <david.sherwood@arm.com>
>>>
>>> gcc/
>>> * coretypes.h (fixed_size_mode): Declare.
>>> (fixed_size_mode_pod): New typedef.
>>> * builtins.h (target_builtins::x_apply_args_mode)
>>> (target_builtins::x_apply_result_mode): Change type to
>>> fixed_size_mode_pod.
>>> * builtins.c (apply_args_size, apply_result_size, result_vector)
>>> (expand_builtin_apply_args_1, expand_builtin_apply)
>>> (expand_builtin_return): Update accordingly.
>>>
>>> Index: gcc/coretypes.h
>>> ===================================================================
>>> --- gcc/coretypes.h 2017-09-11 17:10:58.656085547 +0100
>>> +++ gcc/coretypes.h 2017-10-23 11:42:57.592545063 +0100
>>> @@ -59,6 +59,7 @@ typedef const struct rtx_def *const_rtx;
>>> class scalar_int_mode;
>>> class scalar_float_mode;
>>> class complex_mode;
>>> +class fixed_size_mode;
>>> template<typename> class opt_mode;
>>> typedef opt_mode<scalar_mode> opt_scalar_mode;
>>> typedef opt_mode<scalar_int_mode> opt_scalar_int_mode;
>>> @@ -66,6 +67,7 @@ typedef opt_mode<scalar_float_mode> opt_
>>> template<typename> class pod_mode;
>>> typedef pod_mode<scalar_mode> scalar_mode_pod;
>>> typedef pod_mode<scalar_int_mode> scalar_int_mode_pod;
>>> +typedef pod_mode<fixed_size_mode> fixed_size_mode_pod;
>>>
>>> /* Subclasses of rtx_def, using indentation to show the class
>>> hierarchy, along with the relevant invariant.
>>> Index: gcc/builtins.h
>>> ===================================================================
>>> --- gcc/builtins.h 2017-08-30 12:18:46.602740973 +0100
>>> +++ gcc/builtins.h 2017-10-23 11:42:57.592545063 +0100
>>> @@ -29,14 +29,14 @@ struct target_builtins {
>>> the register is not used for calling a function. If the machine
>>> has register windows, this gives only the outbound registers.
>>> INCOMING_REGNO gives the corresponding inbound register. */
>>> - machine_mode x_apply_args_mode[FIRST_PSEUDO_REGISTER];
>>> + fixed_size_mode_pod x_apply_args_mode[FIRST_PSEUDO_REGISTER];
>>>
>>> /* For each register that may be used for returning values, this gives
>>> a mode used to copy the register's value. VOIDmode indicates the
>>> register is not used for returning values. If the machine has
>>> register windows, this gives only the outbound registers.
>>> INCOMING_REGNO gives the corresponding inbound register. */
>>> - machine_mode x_apply_result_mode[FIRST_PSEUDO_REGISTER];
>>> + fixed_size_mode_pod x_apply_result_mode[FIRST_PSEUDO_REGISTER];
>>> };
>>>
>>> extern struct target_builtins default_target_builtins;
>>> Index: gcc/builtins.c
>>> ===================================================================
>>> --- gcc/builtins.c 2017-10-23 11:41:23.140260335 +0100
>>> +++ gcc/builtins.c 2017-10-23 11:42:57.592545063 +0100
>>> @@ -1358,7 +1358,6 @@ apply_args_size (void)
>>> static int size = -1;
>>> int align;
>>> unsigned int regno;
>>> - machine_mode mode;
>>>
>>> /* The values computed by this function never change. */
>>> if (size < 0)
>>> @@ -1374,7 +1373,7 @@ apply_args_size (void)
>>> for (regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
>>> if (FUNCTION_ARG_REGNO_P (regno))
>>> {
>>> - mode = targetm.calls.get_raw_arg_mode (regno);
>>> + fixed_size_mode mode = targetm.calls.get_raw_arg_mode (regno);
>>>
>>> gcc_assert (mode != VOIDmode);
>>>
>>> @@ -1386,7 +1385,7 @@ apply_args_size (void)
>>> }
>>> else
>>> {
>>> - apply_args_mode[regno] = VOIDmode;
>>> + apply_args_mode[regno] = as_a <fixed_size_mode> (VOIDmode);
>>> }
>>> }
>>> return size;
>>> @@ -1400,7 +1399,6 @@ apply_result_size (void)
>>> {
>>> static int size = -1;
>>> int align, regno;
>>> - machine_mode mode;
>>>
>>> /* The values computed by this function never change. */
>>> if (size < 0)
>>> @@ -1410,7 +1408,7 @@ apply_result_size (void)
>>> for (regno = 0; regno < FIRST_PSEUDO_REGISTER; regno++)
>>> if (targetm.calls.function_value_regno_p (regno))
>>> {
>>> - mode = targetm.calls.get_raw_result_mode (regno);
>>> + fixed_size_mode mode = targetm.calls.get_raw_result_mode (regno);
>>>
>>> gcc_assert (mode != VOIDmode);
>>>
>>> @@ -1421,7 +1419,7 @@ apply_result_size (void)
>>> apply_result_mode[regno] = mode;
>>> }
>>> else
>>> - apply_result_mode[regno] = VOIDmode;
>>> + apply_result_mode[regno] = as_a <fixed_size_mode> (VOIDmode);
>>>
>>> /* Allow targets that use untyped_call and untyped_return to override
>>> the size so that machine-specific information can be stored here. */
>>> @@ -1440,7 +1438,7 @@ apply_result_size (void)
>>> result_vector (int savep, rtx result)
>>> {
>>> int regno, size, align, nelts;
>>> - machine_mode mode;
>>> + fixed_size_mode mode;
>>> rtx reg, mem;
>>> rtx *savevec = XALLOCAVEC (rtx, FIRST_PSEUDO_REGISTER);
>>>
>>> @@ -1469,7 +1467,7 @@ expand_builtin_apply_args_1 (void)
>>> {
>>> rtx registers, tem;
>>> int size, align, regno;
>>> - machine_mode mode;
>>> + fixed_size_mode mode;
>>> rtx struct_incoming_value = targetm.calls.struct_value_rtx (cfun ? TREE_TYPE (cfun->decl) : 0, 1);
>>>
>>> /* Create a block where the arg-pointer, structure value address,
>>> @@ -1573,7 +1571,7 @@ expand_builtin_apply_args (void)
>>> expand_builtin_apply (rtx function, rtx arguments, rtx argsize)
>>> {
>>> int size, align, regno;
>>> - machine_mode mode;
>>> + fixed_size_mode mode;
>>> rtx incoming_args, result, reg, dest, src;
>>> rtx_call_insn *call_insn;
>>> rtx old_stack_level = 0;
>>> @@ -1734,7 +1732,7 @@ expand_builtin_apply (rtx function, rtx
>>> expand_builtin_return (rtx result)
>>> {
>>> int size, align, regno;
>>> - machine_mode mode;
>>> + fixed_size_mode mode;
>>> rtx reg;
>>> rtx_insn *call_fusage = 0;
>>>
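The pod_mode pattern this patch relies on (fixed_size_mode_pod in target_builtins) can be sketched as below. This is an illustration of the C++98 constraint being worked around, with invented "_sketch" names rather than GCC's actual classes: the mode wrapper has a constructor, so pre-C++11 it is non-POD and cannot sit in aggregates that need trivial initialization; the POD variant holds only the raw enum and converts implicitly to the wrapper.

```cpp
#include <cassert>

enum machine_mode_enum { E_VOIDmode, E_SImode, E_DImode };

/* Wrapper class with a constructor: convenient to use, but
   non-POD under C++98 rules.  */
class fixed_size_mode_sketch
{
public:
  explicit fixed_size_mode_sketch (machine_mode_enum m) : m_mode (m) {}
  machine_mode_enum value () const { return m_mode; }
private:
  machine_mode_enum m_mode;
};

/* POD stand-in: no user-declared constructor, so it remains an
   aggregate and supports static/zero initialization; a conversion
   operator recovers the wrapper on use.  */
struct fixed_size_mode_pod_sketch
{
  machine_mode_enum m_mode;
  operator fixed_size_mode_sketch () const
  { return fixed_size_mode_sketch (m_mode); }
};

/* Usable directly in a per-target structure, as in the patch.  */
struct target_builtins_sketch
{
  fixed_size_mode_pod_sketch x_apply_args_mode[4];
};
```

Zero-initializing the aggregate gives VOIDmode entries, and reading an entry converts transparently back to the wrapper class.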
* Re: [09/nn] Add a fixed_size_mode_pod class
2017-10-26 12:46 ` Richard Biener
@ 2017-10-26 19:42 ` Eric Botcazou
2017-10-27 8:34 ` Richard Biener
2017-10-30 3:14 ` Trevor Saunders
2017-10-26 19:44 ` Richard Sandiford
2017-10-26 19:45 ` Jakub Jelinek
2 siblings, 2 replies; 90+ messages in thread
From: Eric Botcazou @ 2017-10-26 19:42 UTC (permalink / raw)
To: Richard Biener; +Cc: gcc-patches, Richard Sandiford
> Can you figure out what the oldest GCC release is that supports the C++11/14
> POD handling that would be required?
GCC needs to be buildable by other compilers than itself though.
--
Eric Botcazou
* Re: [09/nn] Add a fixed_size_mode_pod class
2017-10-26 12:46 ` Richard Biener
2017-10-26 19:42 ` Eric Botcazou
@ 2017-10-26 19:44 ` Richard Sandiford
2017-10-26 19:45 ` Jakub Jelinek
2 siblings, 0 replies; 90+ messages in thread
From: Richard Sandiford @ 2017-10-26 19:44 UTC (permalink / raw)
To: Richard Biener; +Cc: GCC Patches
Richard Biener <richard.guenther@gmail.com> writes:
> On Thu, Oct 26, 2017 at 2:18 PM, Richard Sandiford
> <richard.sandiford@linaro.org> wrote:
>> Richard Biener <richard.guenther@gmail.com> writes:
>>> On Mon, Oct 23, 2017 at 1:22 PM, Richard Sandiford
>>> <richard.sandiford@linaro.org> wrote:
>>>> This patch adds a POD version of fixed_size_mode. The only current use
>>>> is for storing the __builtin_apply and __builtin_result register modes,
>>>> which were made fixed_size_modes by the previous patch.
>>>
>>> Bah - can we update our host compiler to C++11/14 please ...?
>>> (maybe requiring that build with GCC 4.8 as host compiler works,
>>> GCC 4.3 has -std=c++0x, but I'm quite sure that's not enough).
>>
>> That'd be great :-) It would avoid all the poly_int_pod stuff too,
>> and allow some clean-up of wide-int.h.
>
> Can you figure what oldest GCC release supports the C++11/14 POD handling
> that would be required?
Looks like GCC 4.7, which was also the first to support -std=c++11
as an option. I could bootstrap with that (after s/std=gnu..98/std=c++11/
in configure) without all the POD types. It also supports "= default"
and template using, which would get rid of some wide-int.h ugliness.
Being able to construct poly_int::coeffs directly might also allow
some optimisations, and should help avoid the POLY_SET_COEFF bug that
Martin found, but I haven't looked at that yet.
Thanks,
Richard
* Re: [09/nn] Add a fixed_size_mode_pod class
2017-10-26 12:46 ` Richard Biener
2017-10-26 19:42 ` Eric Botcazou
2017-10-26 19:44 ` Richard Sandiford
@ 2017-10-26 19:45 ` Jakub Jelinek
2017-10-27 8:43 ` Richard Biener
2 siblings, 1 reply; 90+ messages in thread
From: Jakub Jelinek @ 2017-10-26 19:45 UTC (permalink / raw)
To: Richard Biener; +Cc: GCC Patches, Richard Sandiford
On Thu, Oct 26, 2017 at 02:43:55PM +0200, Richard Biener wrote:
> On Thu, Oct 26, 2017 at 2:18 PM, Richard Sandiford
> <richard.sandiford@linaro.org> wrote:
> > Richard Biener <richard.guenther@gmail.com> writes:
> >> On Mon, Oct 23, 2017 at 1:22 PM, Richard Sandiford
> >> <richard.sandiford@linaro.org> wrote:
> >>> This patch adds a POD version of fixed_size_mode. The only current use
> >>> is for storing the __builtin_apply and __builtin_result register modes,
> >>> which were made fixed_size_modes by the previous patch.
> >>
> >> Bah - can we update our host compiler to C++11/14 please ...?
> >> (maybe requiring that build with GCC 4.8 as host compiler works,
> >> GCC 4.3 has -std=c++0x, but I'm quite sure that's not enough).
> >
> > That'd be great :-) It would avoid all the poly_int_pod stuff too,
> > and allow some clean-up of wide-int.h.
>
> Can you figure what oldest GCC release supports the C++11/14 POD handling
> that would be required?
I think it is too early for that, we aren't LLVM or Rust that don't really
care about what build requirements they impose on users.
Jakub
* Re: [09/nn] Add a fixed_size_mode_pod class
2017-10-26 19:42 ` Eric Botcazou
@ 2017-10-27 8:34 ` Richard Biener
2017-10-27 9:28 ` Eric Botcazou
2017-10-30 3:14 ` Trevor Saunders
1 sibling, 1 reply; 90+ messages in thread
From: Richard Biener @ 2017-10-27 8:34 UTC (permalink / raw)
To: Eric Botcazou; +Cc: GCC Patches, Richard Sandiford
On Thu, Oct 26, 2017 at 9:37 PM, Eric Botcazou <ebotcazou@adacore.com> wrote:
>> Can you figure what oldest GCC release supports the C++11/14 POD handling
>> that would be required?
>
> GCC needs to be buildable by other compilers than itself though.
There's always the possibility of building GCC 4.8 with the other compiler and
then GCC 9+ (?) with GCC 4.8.
What's the list of other compilers people routinely use? I see various comments
on other compilers in install.texi but those are already saying those cannot be
used to build GCC but you need to build an older GCC first (like xlc or the HP
compiler).
Richard.
> --
> Eric Botcazou
* Re: [09/nn] Add a fixed_size_mode_pod class
2017-10-26 19:45 ` Jakub Jelinek
@ 2017-10-27 8:43 ` Richard Biener
2017-10-27 8:45 ` Jakub Jelinek
` (2 more replies)
0 siblings, 3 replies; 90+ messages in thread
From: Richard Biener @ 2017-10-27 8:43 UTC (permalink / raw)
To: Jakub Jelinek; +Cc: GCC Patches, Richard Sandiford
On Thu, Oct 26, 2017 at 9:43 PM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Thu, Oct 26, 2017 at 02:43:55PM +0200, Richard Biener wrote:
>> On Thu, Oct 26, 2017 at 2:18 PM, Richard Sandiford
>> <richard.sandiford@linaro.org> wrote:
>> > Richard Biener <richard.guenther@gmail.com> writes:
>> >> On Mon, Oct 23, 2017 at 1:22 PM, Richard Sandiford
>> >> <richard.sandiford@linaro.org> wrote:
>> >>> This patch adds a POD version of fixed_size_mode. The only current use
>> >>> is for storing the __builtin_apply and __builtin_result register modes,
>> >>> which were made fixed_size_modes by the previous patch.
>> >>
>> >> Bah - can we update our host compiler to C++11/14 please ...?
>> >> (maybe requiring that build with GCC 4.8 as host compiler works,
>> >> GCC 4.3 has -std=c++0x, but I'm quite sure that's not enough).
>> >
>> > That'd be great :-) It would avoid all the poly_int_pod stuff too,
>> > and allow some clean-up of wide-int.h.
>>
>> Can you figure what oldest GCC release supports the C++11/14 POD handling
>> that would be required?
>
> I think it is too early for that, we aren't LLVM or Rust that don't really
> care about what build requirements they impose on users.
That's true, which is why I asked. For me requiring sth newer than GCC 4.8
would be a blocker given that's the system compiler on our latest server
(and "stable" OSS) product.
I guess it depends on the amount of pain we have going forward with C++
use in GCC. Given that gdb already requires C++11 people building
GCC are likely already experiencing the "issue".
Richard.
> Jakub
* Re: [09/nn] Add a fixed_size_mode_pod class
2017-10-27 8:43 ` Richard Biener
@ 2017-10-27 8:45 ` Jakub Jelinek
2017-10-27 10:19 ` Pedro Alves
2017-10-27 15:23 ` Jeff Law
2 siblings, 0 replies; 90+ messages in thread
From: Jakub Jelinek @ 2017-10-27 8:45 UTC (permalink / raw)
To: Richard Biener; +Cc: GCC Patches, Richard Sandiford
On Fri, Oct 27, 2017 at 10:35:56AM +0200, Richard Biener wrote:
> > I think it is too early for that, we aren't LLVM or Rust that don't really
> > care about what build requirements they impose on users.
>
> That's true, which is why I asked. For me requiring sth newer than GCC 4.8
> would be a blocker given that's the system compiler on our latest server
> (and "stable" OSS) product.
>
> I guess it depends on the amount of pain we have going forward with C++
> use in GCC. Given that gdb already requires C++11 people building
> GCC are likely already experiencing the "issue".
Well, they can always start by building a new GCC and then build GDB with
it. If they'd need to build an intermediate, already unsupported, GCC in
between as well, it might be a bigger pain.
GCC 4.8 as system compiler certainly needs to be supported, it is still
heavily used in the wild, but I'd say even e.g. GCC 4.4 or 4.3 isn't
something that can be ignored. And there are also non-GCC system compilers
we need to cope with.
Jakub
* Re: [09/nn] Add a fixed_size_mode_pod class
2017-10-27 8:34 ` Richard Biener
@ 2017-10-27 9:28 ` Eric Botcazou
0 siblings, 0 replies; 90+ messages in thread
From: Eric Botcazou @ 2017-10-27 9:28 UTC (permalink / raw)
To: Richard Biener; +Cc: gcc-patches, Richard Sandiford
> There's always the possibility of building GCC 4.8 with the other compiler
> and then GCC 9+ (?) with GCC 4.8.
What a user-friendly solution...
> What's the list of other compilers people routinely use? I see various
> comments on other compilers in install.texi but those are already saying
> those cannot be used to build GCC but you need to build an older GCC first
> (like xlc or the HP compiler).
I read the opposite for XLC:
"GCC can bootstrap with recent versions of IBM XLC, but bootstrapping with an
earlier release of GCC is recommended."
I think that the major supported compilers are IBM, Sun/Oracle and LLVM.
--
Eric Botcazou
* Re: [09/nn] Add a fixed_size_mode_pod class
2017-10-27 8:43 ` Richard Biener
2017-10-27 8:45 ` Jakub Jelinek
@ 2017-10-27 10:19 ` Pedro Alves
2017-10-27 15:23 ` Jeff Law
2 siblings, 0 replies; 90+ messages in thread
From: Pedro Alves @ 2017-10-27 10:19 UTC (permalink / raw)
To: Richard Biener, Jakub Jelinek; +Cc: GCC Patches, Richard Sandiford
On 10/27/2017 09:35 AM, Richard Biener wrote:
> On Thu, Oct 26, 2017 at 9:43 PM, Jakub Jelinek <jakub@redhat.com> wrote:
>> On Thu, Oct 26, 2017 at 02:43:55PM +0200, Richard Biener wrote:
>>> Can you figure what oldest GCC release supports the C++11/14 POD handling
>>> that would be required?
>>
>> I think it is too early for that, we aren't LLVM or Rust that don't really
>> care about what build requirements they impose on users.
>
> That's true, which is why I asked. For me requiring sth newer than GCC 4.8
> would be a blocker given that's the system compiler on our latest server
> (and "stable" OSS) product.
>
> I guess it depends on the amount of pain we have going forward with C++
> use in GCC. Given that gdb already requires C++11 people building
> GCC are likely already experiencing the "issue".
Right, GDB's baseline is GCC 4.8 too. When GDB was deciding whether
to start requiring full C++11 (about a year ago), we looked at the
latest stable release of all the "big" distros to see whether:
#1 - the system compiler was new enough (gcc >= 4.8), or failing
that,
#2 - whether there's an easy to install package providing a
new-enough compiler.
and it turns out that that was true for all. Meanwhile another year
has passed and there have been no complaints.
Thanks,
Pedro Alves
* Re: [09/nn] Add a fixed_size_mode_pod class
2017-10-27 8:43 ` Richard Biener
2017-10-27 8:45 ` Jakub Jelinek
2017-10-27 10:19 ` Pedro Alves
@ 2017-10-27 15:23 ` Jeff Law
2 siblings, 0 replies; 90+ messages in thread
From: Jeff Law @ 2017-10-27 15:23 UTC (permalink / raw)
To: Richard Biener, Jakub Jelinek; +Cc: GCC Patches, Richard Sandiford
On 10/27/2017 02:35 AM, Richard Biener wrote:
> On Thu, Oct 26, 2017 at 9:43 PM, Jakub Jelinek <jakub@redhat.com> wrote:
>> On Thu, Oct 26, 2017 at 02:43:55PM +0200, Richard Biener wrote:
>>> On Thu, Oct 26, 2017 at 2:18 PM, Richard Sandiford
>>> <richard.sandiford@linaro.org> wrote:
>>>> Richard Biener <richard.guenther@gmail.com> writes:
>>>>> On Mon, Oct 23, 2017 at 1:22 PM, Richard Sandiford
>>>>> <richard.sandiford@linaro.org> wrote:
>>>>>> This patch adds a POD version of fixed_size_mode. The only current use
>>>>>> is for storing the __builtin_apply and __builtin_result register modes,
>>>>>> which were made fixed_size_modes by the previous patch.
>>>>>
>>>>> Bah - can we update our host compiler to C++11/14 please ...?
>>>>> (maybe requiring that build with GCC 4.8 as host compiler works,
>>>>> GCC 4.3 has -std=c++0x, but I'm quite sure that's not enough).
>>>>
>>>> That'd be great :-) It would avoid all the poly_int_pod stuff too,
>>>> and allow some clean-up of wide-int.h.
>>>
>>> Can you figure what oldest GCC release supports the C++11/14 POD handling
>>> that would be required?
>>
>> I think it is too early for that, we aren't LLVM or Rust that don't really
>> care about what build requirements they impose on users.
>
> That's true, which is why I asked. For me requiring sth newer than GCC 4.8
> would be a blocker given that's the system compiler on our latest server
> (and "stable" OSS) product.
>
> I guess it depends on the amount of pain we have going forward with C++
> use in GCC. Given that gdb already requires C++11 people building
> GCC are likely already experiencing the "issue".
It's always going to be a balancing act. Clearly we don't want to go to
something like the Rust model. But we also don't want to limit
ourselves to such old tools that we end up hacking around compiler bugs
or avoiding features that can make the codebase easier to maintain and
improve or end up depending on dusty corners of C++98/C++03
implementations that nobody else uses/tests anymore because they've
moved on to C++11.
To be more concrete, if I had to put a stake in the ground, I'd want to
pick a semi-recent version of Sun, IBM and Clang/LLVM as well as GCC.
Ideally it'd be something that supports C++11 as a language, even if the
runtime isn't fully compliant. I suspect anything older than GCC 4.8
wouldn't have enough C++11, and anything newer would not work well for the
distros (Red Hat included).
Jeff
* Re: [07/nn] Add unique CONSTs
2017-10-23 11:22 ` [07/nn] Add unique CONSTs Richard Sandiford
@ 2017-10-27 15:51 ` Jeff Law
2017-10-27 15:58 ` Richard Sandiford
0 siblings, 1 reply; 90+ messages in thread
From: Jeff Law @ 2017-10-27 15:51 UTC (permalink / raw)
To: gcc-patches, richard.sandiford
On 10/23/2017 05:21 AM, Richard Sandiford wrote:
> This patch adds a way of treating certain kinds of CONST as unique,
> so that pointer equality is equivalent to value equality. For now it
> is restricted to VEC_DUPLICATE and VEC_SERIES, although the code to
> generate them remains in the else arm of an "if (1)" until a later
> patch.
>
> This is needed so that (const (vec_duplicate xx)) can be used as the
> CONSTxx_RTX of a variable-length vector.
You're brave :-) I know we looked at making CONST_INTs behave in this
manner eons ago in an effort to reduce memory consumption and it was
just plain painful. There may still be comments from that project
littering the source code.
I do wonder if we might want to revisit this again as we have better
infrastructure in place.
>
>
> 2017-10-23 Richard Sandiford <richard.sandiford@linaro.org>
> Alan Hayward <alan.hayward@arm.com>
> David Sherwood <david.sherwood@arm.com>
>
> gcc/
> * rtl.h (unique_const_p): New function.
> (gen_rtx_CONST): Declare.
> * emit-rtl.c (const_hasher): New struct.
> (const_htab): New variable.
> (init_emit_once): Initialize it.
> (const_hasher::hash, const_hasher::equal): New functions.
> (gen_rtx_CONST): New function.
> (spare_vec_duplicate, spare_vec_series): New variables.
> (gen_const_vec_duplicate_1): Add code to use (const (vec_duplicate)),
> but disable it for now.
> (gen_const_vec_series): Likewise (const (vec_series)).
> * gengenrtl.c (special_rtx): Return true for CONST.
> * rtl.c (shared_const_p): Return true if unique_const_p.
ISTM that you need to update rtl.texi's structure sharing
assumptions section to describe the new rules around CONSTs.
So what's the purpose of the spare_vec_* stuff that you're going to use
in the future? It looks like a single-element cache to me. Am I
missing something?
jeff
* Re: [07/nn] Add unique CONSTs
2017-10-27 15:51 ` Jeff Law
@ 2017-10-27 15:58 ` Richard Sandiford
2017-10-30 14:49 ` Jeff Law
0 siblings, 1 reply; 90+ messages in thread
From: Richard Sandiford @ 2017-10-27 15:58 UTC (permalink / raw)
To: Jeff Law; +Cc: gcc-patches
Jeff Law <law@redhat.com> writes:
> On 10/23/2017 05:21 AM, Richard Sandiford wrote:
>> This patch adds a way of treating certain kinds of CONST as unique,
>> so that pointer equality is equivalent to value equality. For now it
>> is restricted to VEC_DUPLICATE and VEC_SERIES, although the code to
>> generate them remains in the else arm of an "if (1)" until a later
>> patch.
>>
>> This is needed so that (const (vec_duplicate xx)) can be used as the
>> CONSTxx_RTX of a variable-length vector.
> You're brave :-) I know we looked at making CONST_INTs behave in this
> manner eons ago in an effort to reduce memory consumption and it was
> just plain painful. There may still be comments from that project
> littering the source code.
>
> I do wonder if we might want to revisit this again as we have better
> infrastructure in place.
For vectors it isn't so bad, since we already do the same thing
for CONST_VECTOR. Fortunately CONST_VECTOR and CONST always have
a mode, so there's no awkward sharing between modes...
>> 2017-10-23 Richard Sandiford <richard.sandiford@linaro.org>
>> Alan Hayward <alan.hayward@arm.com>
>> David Sherwood <david.sherwood@arm.com>
>>
>> gcc/
>> * rtl.h (unique_const_p): New function.
>> (gen_rtx_CONST): Declare.
>> * emit-rtl.c (const_hasher): New struct.
>> (const_htab): New variable.
>> (init_emit_once): Initialize it.
>> (const_hasher::hash, const_hasher::equal): New functions.
>> (gen_rtx_CONST): New function.
>> (spare_vec_duplicate, spare_vec_series): New variables.
>> (gen_const_vec_duplicate_1): Add code to use (const (vec_duplicate)),
>> but disable it for now.
>> (gen_const_vec_series): Likewise (const (vec_series)).
>> * gengenrtl.c (special_rtx): Return true for CONST.
>> * rtl.c (shared_const_p): Return true if unique_const_p.
> ISTM that you need to update rtl.texi's structure sharing
> assumptions section to describe the new rules around CONSTs.
Oops, yeah. How about the attached?
> So what's the purpose of the spare_vec_* stuff that you're going to use
> in the future? It looks like a single-element cache to me. Am I
> missing something?
No, that's right. When looking up the const for (vec_duplicate x), say,
it's easier to create the vec_duplicate rtx first. But if the lookup
succeeds (and so we already have an rtx with that value), we keep the
discarded vec_duplicate around so that we can reuse it for the next
lookup.
Thanks for the reviews,
Richard
2017-10-27 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* doc/rtl.texi: Document rtl sharing rules.
* rtl.h (unique_const_p): New function.
(gen_rtx_CONST): Declare.
* emit-rtl.c (const_hasher): New struct.
(const_htab): New variable.
(init_emit_once): Initialize it.
(const_hasher::hash, const_hasher::equal): New functions.
(gen_rtx_CONST): New function.
(spare_vec_duplicate, spare_vec_series): New variables.
(gen_const_vec_duplicate_1): Add code to use (const (vec_duplicate)),
but disable it for now.
(gen_const_vec_series): Likewise (const (vec_series)).
* gengenrtl.c (special_rtx): Return true for CONST.
* rtl.c (shared_const_p): Return true if unique_const_p.
Index: gcc/doc/rtl.texi
===================================================================
--- gcc/doc/rtl.texi 2017-10-27 16:48:35.827706696 +0100
+++ gcc/doc/rtl.texi 2017-10-27 16:48:37.617270148 +0100
@@ -4197,6 +4197,20 @@ There is only one @code{pc} expression.
@item
There is only one @code{cc0} expression.
+@cindex @code{const}, RTL sharing
+@item
+There is only one instance of the following structures for a given
+@var{m}, @var{x} and @var{y}:
+@example
+(const:@var{m} (vec_duplicate:@var{m} @var{x}))
+(const:@var{m} (vec_series:@var{m} @var{x} @var{y}))
+@end example
+This means, for example, that for a given @var{n} there is only ever a
+single instance of an expression like:
+@example
+(const:V@var{n}DI (vec_duplicate:V@var{n}DI (const_int 0)))
+@end example
+
@cindex @code{const_double}, RTL sharing
@item
There is only one @code{const_double} expression with value 0 for
Index: gcc/rtl.h
===================================================================
--- gcc/rtl.h 2017-10-27 16:48:37.433286940 +0100
+++ gcc/rtl.h 2017-10-27 16:48:37.619280894 +0100
@@ -2861,6 +2861,23 @@ vec_series_p (const_rtx x, rtx *base_out
return const_vec_series_p (x, base_out, step_out);
}
+/* Return true if there should only ever be one instance of (const X),
+ so that constants of this type can be compared using pointer equality. */
+
+inline bool
+unique_const_p (const_rtx x)
+{
+ switch (GET_CODE (x))
+ {
+ case VEC_DUPLICATE:
+ case VEC_SERIES:
+ return true;
+
+ default:
+ return false;
+ }
+}
+
/* Return the unpromoted (outer) mode of SUBREG_PROMOTED_VAR_P subreg X. */
inline scalar_int_mode
@@ -3560,6 +3577,7 @@ extern rtx_insn_list *gen_rtx_INSN_LIST
gen_rtx_INSN (machine_mode mode, rtx_insn *prev_insn, rtx_insn *next_insn,
basic_block bb, rtx pattern, int location, int code,
rtx reg_notes);
+extern rtx gen_rtx_CONST (machine_mode, rtx);
extern rtx gen_rtx_CONST_INT (machine_mode, HOST_WIDE_INT);
extern rtx gen_rtx_CONST_VECTOR (machine_mode, rtvec);
extern void set_mode_and_regno (rtx, machine_mode, unsigned int);
Index: gcc/emit-rtl.c
===================================================================
--- gcc/emit-rtl.c 2017-10-27 16:48:37.433286940 +0100
+++ gcc/emit-rtl.c 2017-10-27 16:48:37.618275521 +0100
@@ -175,6 +175,15 @@ struct const_fixed_hasher : ggc_cache_pt
static GTY ((cache)) hash_table<const_fixed_hasher> *const_fixed_htab;
+/* A hash table storing unique CONSTs. */
+struct const_hasher : ggc_cache_ptr_hash<rtx_def>
+{
+ static hashval_t hash (rtx x);
+ static bool equal (rtx x, rtx y);
+};
+
+static GTY ((cache)) hash_table<const_hasher> *const_htab;
+
#define cur_insn_uid (crtl->emit.x_cur_insn_uid)
#define cur_debug_insn_uid (crtl->emit.x_cur_debug_insn_uid)
#define first_label_num (crtl->emit.x_first_label_num)
@@ -310,6 +319,28 @@ const_fixed_hasher::equal (rtx x, rtx y)
return fixed_identical (CONST_FIXED_VALUE (a), CONST_FIXED_VALUE (b));
}
+/* Returns a hash code for X (which is either an existing unique CONST
+ or an operand to gen_rtx_CONST). */
+
+hashval_t
+const_hasher::hash (rtx x)
+{
+ if (GET_CODE (x) == CONST)
+ x = XEXP (x, 0);
+
+ int do_not_record_p = 0;
+ return hash_rtx (x, GET_MODE (x), &do_not_record_p, NULL, false);
+}
+
+/* Returns true if the operand of unique CONST X is equal to Y. */
+
+bool
+const_hasher::equal (rtx x, rtx y)
+{
+ gcc_checking_assert (GET_CODE (x) == CONST);
+ return rtx_equal_p (XEXP (x, 0), y);
+}
+
/* Return true if the given memory attributes are equal. */
bool
@@ -5772,16 +5803,55 @@ init_emit (void)
#endif
}
+rtx
+gen_rtx_CONST (machine_mode mode, rtx val)
+{
+ if (unique_const_p (val))
+ {
+ /* Look up the CONST in the hash table. */
+ rtx *slot = const_htab->find_slot (val, INSERT);
+ if (*slot == 0)
+ *slot = gen_rtx_raw_CONST (mode, val);
+ return *slot;
+ }
+
+ return gen_rtx_raw_CONST (mode, val);
+}
+
+/* Temporary rtx used by gen_const_vec_duplicate_1. */
+static GTY((deletable)) rtx spare_vec_duplicate;
+
/* Like gen_const_vec_duplicate, but ignore const_tiny_rtx. */
static rtx
gen_const_vec_duplicate_1 (machine_mode mode, rtx el)
{
int nunits = GET_MODE_NUNITS (mode);
- rtvec v = rtvec_alloc (nunits);
- for (int i = 0; i < nunits; ++i)
- RTVEC_ELT (v, i) = el;
- return gen_rtx_raw_CONST_VECTOR (mode, v);
+ if (1)
+ {
+ rtvec v = rtvec_alloc (nunits);
+
+ for (int i = 0; i < nunits; ++i)
+ RTVEC_ELT (v, i) = el;
+
+ return gen_rtx_raw_CONST_VECTOR (mode, v);
+ }
+ else
+ {
+ if (spare_vec_duplicate)
+ {
+ PUT_MODE (spare_vec_duplicate, mode);
+ XEXP (spare_vec_duplicate, 0) = el;
+ }
+ else
+ spare_vec_duplicate = gen_rtx_VEC_DUPLICATE (mode, el);
+
+ rtx res = gen_rtx_CONST (mode, spare_vec_duplicate);
+ if (XEXP (res, 0) == spare_vec_duplicate)
+ spare_vec_duplicate = NULL_RTX;
+
+ return res;
+ }
}
/* Generate a vector constant of mode MODE in which every element has
@@ -5843,6 +5913,9 @@ const_vec_series_p_1 (const_rtx x, rtx *
return true;
}
+/* Temporary rtx used by gen_const_vec_series. */
+static GTY((deletable)) rtx spare_vec_series;
+
/* Generate a vector constant of mode MODE in which element I has
the value BASE + I * STEP. */
@@ -5852,13 +5925,33 @@ gen_const_vec_series (machine_mode mode,
gcc_assert (CONSTANT_P (base) && CONSTANT_P (step));
int nunits = GET_MODE_NUNITS (mode);
- rtvec v = rtvec_alloc (nunits);
- scalar_mode inner_mode = GET_MODE_INNER (mode);
- RTVEC_ELT (v, 0) = base;
- for (int i = 1; i < nunits; ++i)
- RTVEC_ELT (v, i) = simplify_gen_binary (PLUS, inner_mode,
- RTVEC_ELT (v, i - 1), step);
- return gen_rtx_raw_CONST_VECTOR (mode, v);
+ if (1)
+ {
+ rtvec v = rtvec_alloc (nunits);
+ scalar_mode inner_mode = GET_MODE_INNER (mode);
+ RTVEC_ELT (v, 0) = base;
+ for (int i = 1; i < nunits; ++i)
+ RTVEC_ELT (v, i) = simplify_gen_binary (PLUS, inner_mode,
+ RTVEC_ELT (v, i - 1), step);
+ return gen_rtx_raw_CONST_VECTOR (mode, v);
+ }
+ else
+ {
+ if (spare_vec_series)
+ {
+ PUT_MODE (spare_vec_series, mode);
+ XEXP (spare_vec_series, 0) = base;
+ XEXP (spare_vec_series, 1) = step;
+ }
+ else
+ spare_vec_series = gen_rtx_VEC_SERIES (mode, base, step);
+
+ rtx res = gen_rtx_CONST (mode, spare_vec_series);
+ if (XEXP (res, 0) == spare_vec_series)
+ spare_vec_series = NULL_RTX;
+
+ return res;
+ }
}
/* Generate a vector of mode MODE in which element I has the value
@@ -6016,6 +6109,8 @@ init_emit_once (void)
reg_attrs_htab = hash_table<reg_attr_hasher>::create_ggc (37);
+ const_htab = hash_table<const_hasher>::create_ggc (37);
+
#ifdef INIT_EXPANDERS
/* This is to initialize {init|mark|free}_machine_status before the first
call to push_function_context_to. This is needed by the Chill front
Index: gcc/gengenrtl.c
===================================================================
--- gcc/gengenrtl.c 2017-10-27 16:48:37.433286940 +0100
+++ gcc/gengenrtl.c 2017-10-27 16:48:37.618275521 +0100
@@ -143,7 +143,8 @@ special_rtx (int idx)
|| strcmp (defs[idx].enumname, "CC0") == 0
|| strcmp (defs[idx].enumname, "RETURN") == 0
|| strcmp (defs[idx].enumname, "SIMPLE_RETURN") == 0
- || strcmp (defs[idx].enumname, "CONST_VECTOR") == 0);
+ || strcmp (defs[idx].enumname, "CONST_VECTOR") == 0
+ || strcmp (defs[idx].enumname, "CONST") == 0);
}
/* Return nonzero if the RTL code given by index IDX is one that we should
Index: gcc/rtl.c
===================================================================
--- gcc/rtl.c 2017-10-27 16:48:37.433286940 +0100
+++ gcc/rtl.c 2017-10-27 16:48:37.618275521 +0100
@@ -252,6 +252,9 @@ shared_const_p (const_rtx orig)
{
gcc_assert (GET_CODE (orig) == CONST);
+ if (unique_const_p (XEXP (orig, 0)))
+ return true;
+
/* CONST can be shared if it contains a SYMBOL_REF. If it contains
a LABEL_REF, it isn't sharable. */
return (GET_CODE (XEXP (orig, 0)) == PLUS
* Re: [01/nn] Add gen_(const_)vec_duplicate helpers
2017-10-25 16:29 ` Jeff Law
@ 2017-10-27 16:12 ` Richard Sandiford
0 siblings, 0 replies; 90+ messages in thread
From: Richard Sandiford @ 2017-10-27 16:12 UTC (permalink / raw)
To: Jeff Law; +Cc: gcc-patches
Jeff Law <law@redhat.com> writes:
> On 10/23/2017 05:16 AM, Richard Sandiford wrote:
>> This patch adds helper functions for generating constant and
>> non-constant vector duplicates. These routines help with SVE because
>> it is then easier to use:
>>
>> (const:M (vec_duplicate:M X))
>>
>> for a broadcast of X, even if the number of elements in M isn't known
>> at compile time. It also makes it easier for general rtx code to treat
>> constant and non-constant duplicates in the same way.
>>
>> In the target code, the patch uses gen_vec_duplicate instead of
>> gen_rtx_VEC_DUPLICATE if handling constants correctly is potentially
>> useful. It might be that some or all of the call sites only handle
>> non-constants in practice, in which case the change is a harmless
>> no-op (and a saving of a few characters).
>>
>> Otherwise, the target changes use gen_const_vec_duplicate instead
>> of gen_rtx_CONST_VECTOR if the constant is obviously a duplicate.
>> They also include some changes to use CONSTxx_RTX for easy global
>> constants.
>>
>>
>> 2017-10-23 Richard Sandiford <richard.sandiford@linaro.org>
>> Alan Hayward <alan.hayward@arm.com>
>> David Sherwood <david.sherwood@arm.com>
>>
>> gcc/
>> * emit-rtl.h (gen_const_vec_duplicate): Declare.
>> (gen_vec_duplicate): Likewise.
>> * emit-rtl.c (gen_const_vec_duplicate_1): New function, split
>> out from...
>> (gen_const_vector): ...here.
>> (gen_const_vec_duplicate, gen_vec_duplicate): New functions.
>> (gen_rtx_CONST_VECTOR): Use gen_const_vec_duplicate for constants
>> whose elements are all equal.
>> * optabs.c (expand_vector_broadcast): Use gen_const_vec_duplicate.
>> * simplify-rtx.c (simplify_const_unary_operation): Likewise.
>> (simplify_relational_operation): Likewise.
>> * config/aarch64/aarch64.c (aarch64_simd_gen_const_vector_dup):
>> Likewise.
>> (aarch64_simd_dup_constant): Use gen_vec_duplicate.
>> (aarch64_expand_vector_init): Likewise.
>> * config/arm/arm.c (neon_vdup_constant): Likewise.
>> (neon_expand_vector_init): Likewise.
>> (arm_expand_vec_perm): Use gen_const_vec_duplicate.
>> (arm_block_set_unaligned_vect): Likewise.
>> (arm_block_set_aligned_vect): Likewise.
>> * config/arm/neon.md (neon_copysignf<mode>): Likewise.
>> * config/i386/i386.c (ix86_expand_vec_perm): Likewise.
>> (expand_vec_perm_even_odd_pack): Likewise.
>> (ix86_vector_duplicate_value): Use gen_vec_duplicate.
>> * config/i386/sse.md (one_cmpl<mode>2): Use CONSTM1_RTX.
>> * config/ia64/ia64.c (ia64_expand_vecint_compare): Use
>> gen_const_vec_duplicate.
>> * config/ia64/vect.md (addv2sf3, subv2sf3): Use CONST1_RTX.
>> * config/mips/mips.c (mips_gen_const_int_vector): Use
>> gen_const_vec_duplicate.
>> (mips_expand_vector_init): Use CONST0_RTX.
>> * config/powerpcspe/altivec.md (abs<mode>2, nabs<mode>2): Likewise.
>> (define_split): Use gen_const_vec_duplicate.
>> * config/rs6000/altivec.md (abs<mode>2, nabs<mode>2): Use CONST0_RTX.
>> (define_split): Use gen_const_vec_duplicate.
>> * config/s390/vx-builtins.md (vec_genmask<mode>): Likewise.
>> (vec_ctd_s64, vec_ctd_u64, vec_ctsl, vec_ctul): Likewise.
>> * config/spu/spu.c (spu_const): Likewise.
> I'd started looking at this a couple times when it was originally
> submitted, but never seemed to get through it. It seems like a nice
> cleanup.
>
> So in gen_const_vector we had an assert to verify that const_tiny_rtx
> was set up. That seems to have been lost. It's probably not a big
> deal, but I mention it in case the loss was unintentional.
This morphed into:
+static rtx
+gen_const_vector (machine_mode mode, int constant)
+{
+ machine_mode inner = GET_MODE_INNER (mode);
+
+ gcc_assert (!DECIMAL_FLOAT_MODE_P (inner));
+
+ rtx el = const_tiny_rtx[constant][(int) inner];
+ gcc_assert (el);
but it wasn't obvious due to the way the unified diff mixed up the
functions. I should have posted that one as context, sorry...
Richard
* Re: [03/nn] Allow vector CONSTs
2017-10-25 16:59 ` Jeff Law
@ 2017-10-27 16:19 ` Richard Sandiford
0 siblings, 0 replies; 90+ messages in thread
From: Richard Sandiford @ 2017-10-27 16:19 UTC (permalink / raw)
To: Jeff Law; +Cc: gcc-patches
Jeff Law <law@redhat.com> writes:
> On 10/23/2017 05:18 AM, Richard Sandiford wrote:
>> This patch allows (const ...) wrappers to be used for rtx vector
>> constants, as an alternative to const_vector. This is useful
>> for SVE, where the number of elements isn't known until runtime.
> Right. It's constant, but not knowable at compile time. That seems an
> exact match for how we've used CONST.
>
>>
>> It could also be useful in future for fixed-length vectors, to
>> reduce the amount of memory needed to represent simple constants
>> with high element counts. However, one nice thing about keeping
>> it restricted to variable-length vectors is that there is never
>> any need to handle combinations of (const ...) and CONST_VECTOR.
> Yea, but is the memory consumption of these large vectors a real
> problem? I suspect, relative to other memory issues they're in the noise.
Yeah, maybe not, especially since the elements themselves are shared.
>> 2017-10-23 Richard Sandiford <richard.sandiford@linaro.org>
>> Alan Hayward <alan.hayward@arm.com>
>> David Sherwood <david.sherwood@arm.com>
>>
>> gcc/
>> * doc/rtl.texi (const): Update description of address constants.
>> Say that vector constants are allowed too.
>> * common.md (E, F): Use CONSTANT_P instead of checking for
>> CONST_VECTOR.
>> * emit-rtl.c (gen_lowpart_common): Use const_vec_p instead of
>> checking for CONST_VECTOR.
>> * expmed.c (make_tree): Use build_vector_from_val for a CONST
>> VEC_DUPLICATE.
>> * expr.c (expand_expr_real_2): Check for vector modes instead
>> of checking for CONST_VECTOR.
>> * rtl.h (const_vec_p): New function.
>> (const_vec_duplicate_p): Check for a CONST VEC_DUPLICATE.
>> (unwrap_const_vec_duplicate): Handle them here too.
> My only worry here is code that is a bit loose in checking for a CONST,
>> but not the innards, and perhaps isn't prepared for the new forms
> that appear inside the CONST.
>
> If we have such problems I'd expect it's in the targets as the targets
>> have traditionally had to validate the innards of a CONST to ensure
> it could be handled by the assembler/linker. Hmm, that may save the
> targets since they'd likely need an update to LEGITIMATE_CONSTANT_P to
> ever see these new forms.
>
> Presumably an aarch64 specific patch to recognize these as valid
> constants in LEGITIMATE_CONSTANT_P is in the works?
Yeah, via the const_vec_duplicate_p helper. For the default
variable-length mode of SVE we use the (const ...) while for the
fixed-length mode we use (const_vector ...) as normal. Advanced SIMD
always uses (const_vector ...).
> OK for the trunk.
>
> jeff
Thanks,
Richard
^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: [09/nn] Add a fixed_size_mode_pod class
2017-10-26 19:42 ` Eric Botcazou
2017-10-27 8:34 ` Richard Biener
@ 2017-10-30 3:14 ` Trevor Saunders
2017-10-30 8:52 ` Richard Sandiford
2017-10-30 10:13 ` Eric Botcazou
1 sibling, 2 replies; 90+ messages in thread
From: Trevor Saunders @ 2017-10-30 3:14 UTC (permalink / raw)
To: Eric Botcazou; +Cc: Richard Biener, gcc-patches, Richard Sandiford
On Thu, Oct 26, 2017 at 09:37:31PM +0200, Eric Botcazou wrote:
> > Can you figure what oldest GCC release supports the C++11/14 POD handling
> > that would be required?
>
> GCC needs to be buildable by other compilers than itself though.
It sounds like people are mostly concerned about sun studio and xlc? It
doesn't seem that hard to provide precompiled binaries for those two
platforms, and maybe 4.8 binaries for people who want to compile their
own gcc from source. If that would be enough to deal with people's
concerns, it seems doable by next stage 1?
Trev
>
> --
> Eric Botcazou
^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: [09/nn] Add a fixed_size_mode_pod class
2017-10-30 3:14 ` Trevor Saunders
@ 2017-10-30 8:52 ` Richard Sandiford
2017-10-30 10:13 ` Eric Botcazou
1 sibling, 0 replies; 90+ messages in thread
From: Richard Sandiford @ 2017-10-30 8:52 UTC (permalink / raw)
To: Trevor Saunders; +Cc: Eric Botcazou, Richard Biener, gcc-patches
Trevor Saunders <tbsaunde@tbsaunde.org> writes:
> On Thu, Oct 26, 2017 at 09:37:31PM +0200, Eric Botcazou wrote:
>> > Can you figure what oldest GCC release supports the C++11/14 POD handling
>> > that would be required?
>>
>> GCC needs to be buildable by other compilers than itself though.
>
> It sounds like people are mostly concerned about sun studio and xlc? It
> doesn't seem that hard to provide precompiled binaries for those two
> platforms, and maybe 4.8 binaries for people who want to compile their
> own gcc from source. If that would be enough to deal with people's
> concerns, it seems doable by next stage 1?
Would it be worth supporting a 4-stage bootstrap, with stage 0 being
built from older gcc sources? We could include a contrib/ script that
downloads sources for gcc-4.7 or whatever and patches it to build with
modern as well as old compilers. (When I tried gcc-4.7 last week,
I needed a couple of tweaks to get it to build.)
Not that I'd have time to try that before GCC 9...
Thanks,
Richard
^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: [09/nn] Add a fixed_size_mode_pod class
2017-10-30 3:14 ` Trevor Saunders
2017-10-30 8:52 ` Richard Sandiford
@ 2017-10-30 10:13 ` Eric Botcazou
2017-10-31 10:39 ` Trevor Saunders
1 sibling, 1 reply; 90+ messages in thread
From: Eric Botcazou @ 2017-10-30 10:13 UTC (permalink / raw)
To: Trevor Saunders; +Cc: gcc-patches, Richard Biener, Richard Sandiford
> It sounds like people are mostly concerned about sun studio and xlc? It
> doesn't seem that hard to provide precompiled binaries for those two
> platforms, and maybe 4.8 binaries for people who want to compile their
> own gcc from source.
I'm not sure that we want to enter the business of precompiled binaries.
Moreover, if we want people to contribute to GCC's development, especially
occasionally to fix a couple of bugs, we need to make it easier to build the
compiler, not the other way around.
--
Eric Botcazou
^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: [07/nn] Add unique CONSTs
2017-10-27 15:58 ` Richard Sandiford
@ 2017-10-30 14:49 ` Jeff Law
0 siblings, 0 replies; 90+ messages in thread
From: Jeff Law @ 2017-10-30 14:49 UTC (permalink / raw)
To: gcc-patches, richard.sandiford
On 10/27/2017 09:56 AM, Richard Sandiford wrote:
> Jeff Law <law@redhat.com> writes:
>> On 10/23/2017 05:21 AM, Richard Sandiford wrote:
>>> This patch adds a way of treating certain kinds of CONST as unique,
>>> so that pointer equality is equivalent to value equality. For now it
>>> is restricted to VEC_DUPLICATE and VEC_SERIES, although the code to
>>> generate them remains in the else arm of an "if (1)" until a later
>>> patch.
>>>
>>> This is needed so that (const (vec_duplicate xx)) can be used as the
>>> CONSTxx_RTX of a variable-length vector.
>> You're brave :-) I know we looked at making CONST_INTs behave in this
>> manner eons ago in an effort to reduce memory consumption and it was
>> just plain painful. There may still be comments from that project
>> littering the source code.
>>
>> I do wonder if we might want to revisit this again as we have better
>> infrastructure in place.
>
> For vectors it isn't so bad, since we already do the same thing
> for CONST_VECTOR. Fortunately CONST_VECTOR and CONST always have
> a mode, so there's no awkward sharing between modes...
>
>>> 2017-10-23 Richard Sandiford <richard.sandiford@linaro.org>
>>> Alan Hayward <alan.hayward@arm.com>
>>> David Sherwood <david.sherwood@arm.com>
>>>
>>> gcc/
>>> * rtl.h (unique_const_p): New function.
>>> (gen_rtx_CONST): Declare.
>>> * emit-rtl.c (const_hasher): New struct.
>>> (const_htab): New variable.
>>> (init_emit_once): Initialize it.
>>> (const_hasher::hash, const_hasher::equal): New functions.
>>> (gen_rtx_CONST): New function.
>>> (spare_vec_duplicate, spare_vec_series): New variables.
>>> (gen_const_vec_duplicate_1): Add code to use (const (vec_duplicate)),
>>> but disable it for now.
>>> (gen_const_vec_series): Likewise (const (vec_series)).
>>> * gengenrtl.c (special_rtx): Return true for CONST.
>>> * rtl.c (shared_const_p): Return true if unique_const_p.
>> ISTM that you need to update rtl.texi's structure sharing
>> assumptions section to describe the new rules around CONSTs.
>
> Oops, yeah. How about the attached?
OK.
>
>> So what's the purpose of the spare_vec_* stuff that you're going to use
>> in the future? It looks like a single element cache to me. Am I
>> missing something?
>
> No, that's right. When looking up the const for (vec_duplicate x), say,
> it's easier to create the vec_duplicate rtx first. But if the lookup
> succeeds (and so we already have an rtx with that value), we keep the
> discarded vec_duplicate around so that we can reuse it for the next
> lookup.
OK.
Jeff
^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: [14/nn] Add helpers for shift count modes
2017-10-26 12:07 ` Richard Biener
2017-10-26 12:07 ` Richard Biener
@ 2017-10-30 15:03 ` Jeff Law
1 sibling, 0 replies; 90+ messages in thread
From: Jeff Law @ 2017-10-30 15:03 UTC (permalink / raw)
To: Richard Biener, GCC Patches, Richard Sandiford
On 10/26/2017 06:06 AM, Richard Biener wrote:
> On Mon, Oct 23, 2017 at 1:25 PM, Richard Sandiford
> <richard.sandiford@linaro.org> wrote:
>> This patch adds a stub helper routine to provide the mode
>> of a scalar shift amount, given the mode of the values
>> being shifted.
>>
>> One long-standing problem has been to decide what this mode
>> should be for arbitrary rtxes (as opposed to those directly
>> tied to a target pattern). Is it the mode of the shifted
>> elements? Is it word_mode? Or maybe QImode? Is it whatever
>> the corresponding target pattern says? (In which case what
>> should the mode be when the target doesn't have a pattern?)
>>
>> For now the patch picks word_mode, which should be safe on
>> all targets but could perhaps become suboptimal if the helper
>> routine is used more often than it is in this patch. As it
>> stands the patch does not change the generated code.
>>
>> The patch also adds a helper function that constructs rtxes
>> for constant shift amounts, again given the mode of the value
>> being shifted. As well as helping with the SVE patches, this
>> is one step towards allowing CONST_INTs to have a real mode.
>
> I think gen_shift_amount_mode is flawed and while encapsulating
> constant shift amount RTX generation into a gen_int_shift_amount
> looks good to me I'd rather have that ??? in this function (and
> I'd use the mode of the RTX shifted, not word_mode...).
>
> In the end it's up to insn recognizing to convert the op to the
> expected mode and for generic RTL it's us that should decide
> on the mode -- on GENERIC the shift amount has to be an
> integer so why not simply use a mode that is large enough to
> make the constant fit?
>
> Just throwing in some comments here, RTL isn't my primary
> expertise.
I wonder if encapsulation + a target hook to specify the mode would be
better? We'd then have to argue over word_mode vs. QImode vs. something
else for the default, but at least we'd have a way for the target to
specify the mode that is generally best when working on shift counts.
In the end I doubt there's a single definition that is overall better.
Largely because I suspect there are times when the narrowest mode is
best, or the mode of the operand being shifted.
So thoughts on doing the encapsulation with a target hook to specify the
desired mode? Does that get us what we need for SVE and does it provide
us a path forward on this issue if we were to try to move towards
CONST_INTs with modes?
jeff
^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: [11/nn] Add narrower_subreg_mode helper function
2017-10-23 11:24 ` [11/nn] Add narrower_subreg_mode helper function Richard Sandiford
@ 2017-10-30 15:06 ` Jeff Law
0 siblings, 0 replies; 90+ messages in thread
From: Jeff Law @ 2017-10-30 15:06 UTC (permalink / raw)
To: gcc-patches, richard.sandiford
On 10/23/2017 05:24 AM, Richard Sandiford wrote:
> This patch adds a narrowing equivalent of wider_subreg_mode. At present
> there is only one user.
>
>
> 2017-10-23 Richard Sandiford <richard.sandiford@linaro.org>
> Alan Hayward <alan.hayward@arm.com>
> David Sherwood <david.sherwood@arm.com>
>
> gcc/
> * rtl.h (narrower_subreg_mode): New function.
> * ira-color.c (update_costs_from_allocno): Use it.
OK. I'm going to assume further uses will show up :-)
jeff
^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: [20/nn] Make tree-ssa-dse.c:normalize_ref return a bool
2017-10-23 11:30 ` [20/nn] Make tree-ssa-dse.c:normalize_ref return a bool Richard Sandiford
@ 2017-10-30 17:49 ` Jeff Law
0 siblings, 0 replies; 90+ messages in thread
From: Jeff Law @ 2017-10-30 17:49 UTC (permalink / raw)
To: gcc-patches, richard.sandiford
On 10/23/2017 05:29 AM, Richard Sandiford wrote:
> This patch moves the check for an overlapping byte to normalize_ref
> from its callers, so that it's easier to convert to poly_ints later.
> It's not really worth it on its own.
>
>
> 2017-10-23 Richard Sandiford <richard.sandiford@linaro.org>
>
> gcc/
> * tree-ssa-dse.c (normalize_ref): Check whether the ranges overlap
> and return false if not.
> (clear_bytes_written_by, live_bytes_read): Update accordingly.
OK.
jeff
^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: [10/nn] Widening optab cleanup
2017-10-23 11:24 ` [10/nn] Widening optab cleanup Richard Sandiford
@ 2017-10-30 18:32 ` Jeff Law
0 siblings, 0 replies; 90+ messages in thread
From: Jeff Law @ 2017-10-30 18:32 UTC (permalink / raw)
To: gcc-patches, richard.sandiford
On 10/23/2017 05:23 AM, Richard Sandiford wrote:
> widening_optab_handler had the comment:
>
> /* ??? Why does find_widening_optab_handler_and_mode attempt to
> widen things that can't be widened? E.g. add_optab... */
> if (op > LAST_CONV_OPTAB)
> return CODE_FOR_nothing;
>
> I think it comes from expand_binop using
> find_widening_optab_handler_and_mode for two things: to test whether
> a "normal" optab like add_optab is supported for a standard binary
> operation and to test whether a "convert" optab is supported for a
> widening operation like umul_widen_optab. In the former case from_mode
> and to_mode must be the same, in the latter from_mode must be narrower
> than to_mode.
>
> For the former case, find_widening_optab_handler_and_mode is only really
> testing the modes that are passed in. permit_non_widening must be true
> here.
>
> For the latter case, find_widening_optab_handler_and_mode should only
> really consider new from_modes that are wider than the original
> from_mode and narrower than the original to_mode. Logically
> permit_non_widening should be false, since widening optabs aren't
> supposed to take operands that are the same width as the destination.
> We get away with permit_non_widening being true because no target
> would/should define a widening .md pattern with matching modes.
>
> But really, it seems better for expand_binop to handle these two
> cases itself rather than pushing them down. With that change,
> find_widening_optab_handler_and_mode is only ever called with
> permit_non_widening set to false and is only ever called with
> a "proper" convert optab. We then no longer need widening_optab_handler,
> we can just use convert_optab_handler directly.
>
> The patch also passes the instruction code down to expand_binop_directly.
> This should be more efficient and removes an extra call to
> find_widening_optab_handler_and_mode.
>
>
> 2017-10-23 Richard Sandiford <richard.sandiford@linaro.org>
> Alan Hayward <alan.hayward@arm.com>
> David Sherwood <david.sherwood@arm.com>
>
> gcc/
> * optabs-query.h (convert_optab_p): New function, split out from...
> (convert_optab_handler): ...here.
> (widening_optab_handler): Delete.
> (find_widening_optab_handler): Remove permit_non_widening parameter.
> (find_widening_optab_handler_and_mode): Likewise. Provide an
> override that operates on mode class wrappers.
> * optabs-query.c (widening_optab_handler): Delete.
> (find_widening_optab_handler_and_mode): Remove permit_non_widening
> parameter. Assert that the two modes are the same class and that
> the "from" mode is narrower than the "to" mode. Use
> convert_optab_handler instead of widening_optab_handler.
> * expmed.c (expmed_mult_highpart_optab): Use convert_optab_handler
> instead of widening_optab_handler.
> * expr.c (expand_expr_real_2): Update calls to
> find_widening_optab_handler.
> * optabs.c (expand_widen_pattern_expr): Likewise.
> (expand_binop_directly): Take the insn_code as a parameter.
> (expand_binop): Only call find_widening_optab_handler for
> conversion optabs; use optab_handler otherwise. Update calls
> to find_widening_optab_handler and expand_binop_directly.
> Use convert_optab_handler instead of widening_optab_handler.
> * tree-ssa-math-opts.c (convert_mult_to_widen): Update calls to
> find_widening_optab_handler and use scalar_mode rather than
> machine_mode.
> (convert_plusminus_to_widen): Likewise.
OK.
jeff
^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: [09/nn] Add a fixed_size_mode_pod class
2017-10-30 10:13 ` Eric Botcazou
@ 2017-10-31 10:39 ` Trevor Saunders
2017-10-31 17:29 ` Eric Botcazou
0 siblings, 1 reply; 90+ messages in thread
From: Trevor Saunders @ 2017-10-31 10:39 UTC (permalink / raw)
To: Eric Botcazou; +Cc: gcc-patches, Richard Biener, Richard Sandiford
On Mon, Oct 30, 2017 at 11:11:12AM +0100, Eric Botcazou wrote:
> > It sounds like people are mostly concerned about sun studio and xlc? It
> > doesn't seem that hard to provide precompiled binaries for those two
> > platforms, and maybe 4.8 binaries for people who want to compile their
> > own gcc from source.
>
> I'm not sure that we want to enter the business of precompiled binaries.
I don't see a reason not to other than a pretty small amount of work
each time we make a release.
> Moreover, if we want people to contribute to GCC's development, especially
> occasionally to fix a couple of bugs, we need to make it easier to build the
> compiler, not the other way around.
Well first this would only matter to the 0.01% of people who want to do
that on AIX or Solaris machines, not the vast majority of possible
contributors who already use clang or gcc as their system compiler.
Secondly, downloading a tarball isn't very difficult, and arguably
providing them makes it easier for people to test gcc on those systems
without having to build it themselves.
Thirdly, making it easier to work on the compiler and understand it makes
things easier for those possible contributors, so if being able to use
C++11 advances that goal, things could be better overall for possible
contributors with different system compilers.
Trev
>
> --
> Eric Botcazou
^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: [09/nn] Add a fixed_size_mode_pod class
2017-10-31 10:39 ` Trevor Saunders
@ 2017-10-31 17:29 ` Eric Botcazou
2017-10-31 17:57 ` Jeff Law
0 siblings, 1 reply; 90+ messages in thread
From: Eric Botcazou @ 2017-10-31 17:29 UTC (permalink / raw)
To: Trevor Saunders; +Cc: gcc-patches, Richard Biener, Richard Sandiford
> I don't see a reason not to other than a pretty small amount of work
> each time we make a release.
I'm not sure it would be so small an amount of work, especially on non-Linux
platforms, so this would IMO divert our resources for little benefit.
> Well first this would only matter to the 0.01% of people who want to do
> that on AIX or Solaris machines, not the vast majority of possible
> contributors who already use clang or gcc as their system compiler.
Yes, but we're GCC, not Clang, and we support more than Linux and Darwin.
> Thirdly making it easier to work on the compiler and understand it makes
> things easier for those possible contributors, so if being able to use
> C++11 advances that goal, things could be better overall for possible
> contributors with different system compilers.
I don't buy this at all. You don't need bleeding edge C++ features to build a
compiler and people don't work on compilers to use bleeding edge C++. Using a
narrow and sensible set of C++ features was one of the conditions under which
the switch to C++ as implementation language was accepted at the time.
--
Eric Botcazou
^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: [09/nn] Add a fixed_size_mode_pod class
2017-10-31 17:29 ` Eric Botcazou
@ 2017-10-31 17:57 ` Jeff Law
2017-11-01 2:50 ` Trevor Saunders
0 siblings, 1 reply; 90+ messages in thread
From: Jeff Law @ 2017-10-31 17:57 UTC (permalink / raw)
To: Eric Botcazou, Trevor Saunders
Cc: gcc-patches, Richard Biener, Richard Sandiford
On 10/31/2017 11:22 AM, Eric Botcazou wrote:
>> I don't see a reason not to other than a pretty small amount of work
>> each time we make a release.
>
> I'm not sure it would be so small an amount of work, especially on non-Linux
> platforms, so this would IMO divert our resources for little benefit.
Having done this for years on HPUX, yes, it takes more time than one
could imagine. Then I went to work for a company that did this for
HP-UX, Solaris, AIX, IRIX and others, and well, it was very painful.
>
>> Well first this would only matter to the 0.01% of people who want to do
>> that on AIX or Solaris machines, not the vast majority of possible
>> contributors who already use clang or gcc as their system compiler.
>
> Yes, but we're GCC, not Clang, and we support more than Linux and Darwin.
Very true.
>
>> Thirdly making it easier to work on the compiler and understand it makes
>> things easier for those possible contributors, so if being able to use
>> C++11 advances that goal, things could be better overall for possible
>> contributors with different system compilers.
>
> I don't buy this at all. You don't need bleeding edge C++ features to build a
> compiler and people don't work on compilers to use bleeding edge C++. Using a
> narrow and sensible set of C++ features was one of the conditions under which
> the switch to C++ as implementation language was accepted at the time.
Agreed that we need to stick with a sensible set of features. But the
sensible set isn't necessarily fixed forever.
Jeff
^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: [09/nn] Add a fixed_size_mode_pod class
2017-10-31 17:57 ` Jeff Law
@ 2017-11-01 2:50 ` Trevor Saunders
2017-11-01 16:30 ` Jeff Law
0 siblings, 1 reply; 90+ messages in thread
From: Trevor Saunders @ 2017-11-01 2:50 UTC (permalink / raw)
To: Jeff Law; +Cc: Eric Botcazou, gcc-patches, Richard Biener, Richard Sandiford
On Tue, Oct 31, 2017 at 11:38:48AM -0600, Jeff Law wrote:
> On 10/31/2017 11:22 AM, Eric Botcazou wrote:
> >> I don't see a reason not to other than a pretty small amount of work
> >> each time we make a release.
> >
> > I'm not sure it would be so small an amount of work, especially on non-Linux
> > platforms, so this would IMO divert our resources for little benefit.
> Having done this for years on HPUX, yes, it takes more time than one
> could imagine. Then I went to work for a company that did this for
> HP-UX, Solaris, AIX, IRIX and others, and well, it was very painful.
I'm sure it's a project one can spend arbitrary amounts of time on if one
wishes or is paid to do so. That said, I'm considering the scope here
limited to running configure / make / make install with the defaults
and tarring up the result. I'll admit I've only done that on Linux,
where it was easy, but people do keep AIX and Solaris building, and they
really are supposed to be buildable in a release. However, at some point
it can be less work to do this than to beat C++98 into doing what is
desired.
> >> Well first this would only matter to the 0.01% of people who want to do
> >> that on AIX or Solaris machines, not the vast majority of possible
> >> contributors who already use clang or gcc as their system compiler.
> >
> > Yes, but we're GCC, not Clang, and we support more than Linux and Darwin.
> Very true.
Certainly, but I think it makes sense to understand how many people
might be negatively affected by a change, and to what degree, before
making that decision.
> >> Thirdly making it easier to work on the compiler and understand it makes
> >> things easier for those possible contributors, so if being able to use
> >> C++11 advances that goal, things could be better overall for possible
> >> contributors with different system compilers.
> >
> > I don't buy this at all. You don't need bleeding edge C++ features to build a
> > compiler and people don't work on compilers to use bleeding edge C++. Using a
> > narrow and sensible set of C++ features was one of the conditions under which
> > the switch to C++ as implementation language was accepted at the time.
> Agreed that we need to stick with a sensible set of features. But the
> sensible set isn't necessarily fixed forever.
Also, as a counterexample, what brought this thread up is Richard wanting
to use something from C++11. So in that particular case it probably
would make something better.
thanks
Trev
>
> Jeff
^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: [09/nn] Add a fixed_size_mode_pod class
2017-11-01 2:50 ` Trevor Saunders
@ 2017-11-01 16:30 ` Jeff Law
2017-11-02 4:28 ` Trevor Saunders
0 siblings, 1 reply; 90+ messages in thread
From: Jeff Law @ 2017-11-01 16:30 UTC (permalink / raw)
To: Trevor Saunders
Cc: Eric Botcazou, gcc-patches, Richard Biener, Richard Sandiford
On 10/31/2017 08:47 PM, Trevor Saunders wrote:
> On Tue, Oct 31, 2017 at 11:38:48AM -0600, Jeff Law wrote:
>> On 10/31/2017 11:22 AM, Eric Botcazou wrote:
>>>> I don't see a reason not to other than a pretty small amount of work
>>>> each time we make a release.
>>>
>>> I'm not sure it would be so small an amount of work, especially on non-Linux
>>> platforms, so this would IMO divert our resources for little benefit.
>> Having done this for years on HPUX, yes, it takes more time than one
>> could imagine. Then I went to work for a company that did this for
>> HP-UX, Solaris, AIX, IRIX and others, and well, it was very painful.
>
> I'm sure it's a project one can spend arbitrary amounts of time on if one
> wishes or is paid to do so. That said, I'm considering the scope here
> limited to running configure / make / make install with the defaults
> and tarring up the result. I'll admit I've only done that on Linux,
> where it was easy, but people do keep AIX and Solaris building, and they
> really are supposed to be buildable in a release. However, at some point
> it can be less work to do this than to beat C++98 into doing what is
> desired.
It sounds so easy, but it does get more complex than just building and
tarring the result up. How (for example) do you handle DSOs that may or
may not be on the system where the bits get installed? Do you embed them
or tell the user to go get them? That's just one example of a gotcha;
there are many.
It's really not something I'd suggest we pursue all that deeply. Been
there, done that, wouldn't want to do it again.
>>>> Thirdly making it easier to work on the compiler and understand it makes
>>>> things easier for those possible contributors, so if being able to use
>>>> C++11 advances that goal, things could be better overall for possible
>>>> contributors with different system compilers.
>>>
>>> I don't buy this at all. You don't need bleeding edge C++ features to build a
>>> compiler and people don't work on compilers to use bleeding edge C++. Using a
>>> narrow and sensible set of C++ features was one of the conditions under which
>>> the switch to C++ as implementation language was accepted at the time.
>> Agreed that we need to stick with a sensible set of features. But the
>> sensible set isn't necessarily fixed forever.
>
> Also, as a counterexample, what brought this thread up is Richard wanting
> to use something from C++11. So in that particular case it probably
> would make something better.
In my particular case I could use certain C++11 features to make the
code cleaner/easier to prove right -- particularly rvalue references and
move semantics. I've got an object with a chunk of allocated memory. I
want to move ownership of the memory to another object.
C++11 handles this cleanly and gracefully and in doing so makes it very
hard to get it wrong.
However, I don't think my case, in and of itself, is enough to push us
into the C++11 world. Nor am I convinced that the aggregate of these
things is enough to push us into the C++11 world. But I do think we'll
be there at some point.
jeff
^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: [09/nn] Add a fixed_size_mode_pod class
2017-11-01 16:30 ` Jeff Law
@ 2017-11-02 4:28 ` Trevor Saunders
0 siblings, 0 replies; 90+ messages in thread
From: Trevor Saunders @ 2017-11-02 4:28 UTC (permalink / raw)
To: Jeff Law; +Cc: Eric Botcazou, gcc-patches, Richard Biener, Richard Sandiford
On Wed, Nov 01, 2017 at 10:30:29AM -0600, Jeff Law wrote:
> On 10/31/2017 08:47 PM, Trevor Saunders wrote:
> > On Tue, Oct 31, 2017 at 11:38:48AM -0600, Jeff Law wrote:
> > > On 10/31/2017 11:22 AM, Eric Botcazou wrote:
> > > > > I don't see a reason not to other than a pretty small amount of work
> > > > > each time we make a release.
> > > >
> > > > I'm not sure it would be so small an amount of work, especially on non-Linux
> > > > platforms, so this would IMO divert our resources for little benefit.
> > > Having done this for years on HPUX, yes, it takes more time than one
> > > could imagine. Then I went to work for a company that did this for
> > > HP-UX, Solaris, AIX, IRIX and others, and well, it was very painful.
> >
> > I'm sure it's a project one can spend arbitrary amounts of time on if one
> > wishes or is paid to do so. That said, I'm considering the scope here
> > limited to running configure / make / make install with the defaults
> > and tarring up the result. I'll admit I've only done that on Linux,
> > where it was easy, but people do keep AIX and Solaris building, and they
> > really are supposed to be buildable in a release. However, at some point
> > it can be less work to do this than to beat C++98 into doing what is
> > desired.
> It sounds so easy, but it does get more complex than just building and
> tarring the result up. How (for example) do you handle DSOs that may or
> may not be on the system where the bits get installed? Do you embed them
> or tell the user to go get them? That's just one example of a gotcha;
> there are many.
>
> It's really not something I'd suggest we pursue all that deeply. Been
> there, done that, wouldn't want to do it again.
>
> > > > > Thirdly making it easier to work on the compiler and understand it makes
> > > > > things easier for those possible contributors, so if being able to use
> > > > > C++11 advances that goal, things could be better overall for possible
> > > > > contributors with different system compilers.
> > > >
> > > > I don't buy this at all. You don't need bleeding edge C++ features to build a
> > > > compiler and people don't work on compilers to use bleeding edge C++. Using a
> > > > narrow and sensible set of C++ features was one of the conditions under which
> > > > the switch to C++ as implementation language was accepted at the time.
> > > Agreed that we need to stick with a sensible set of features. But the
> > > sensible set isn't necessarily fixed forever.
> >
> > Also, as a counterexample, what brought this thread up is Richard wanting
> > to use something from C++11. So in that particular case it probably
> > would make something better.
> In my particular case I could use certain C++11 features to make the code
> cleaner/easier to prove right -- particularly rvalue references and move
> semantics. I've got an object with a chunk of allocated memory. I want to
> move ownership of the memory to another object.
>
> C++11 handles this cleanly and gracefully and in doing so makes it very hard
> to get it wrong.
You may want to look at how the unique_ptr shim deals with that, though
maybe you don't want to copy the ifdef hackery to actually use rval refs
when possible.
Trev
>
> However, I don't think my case, in and of itself, is enough to push us into
> the C++11 world. Nor am I convinced that the aggregate of these things is
> enough to push us into the C++11 world. But I do think we'll be there at
> some point.
>
> jeff
^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: [05/nn] Add VEC_DUPLICATE_{CST,EXPR} and associated optab
2017-10-26 11:53 ` Richard Biener
@ 2017-11-06 15:09 ` Richard Sandiford
2017-11-07 10:37 ` Richard Biener
0 siblings, 1 reply; 90+ messages in thread
From: Richard Sandiford @ 2017-11-06 15:09 UTC (permalink / raw)
To: Richard Biener; +Cc: GCC Patches
Richard Biener <richard.guenther@gmail.com> writes:
> On Mon, Oct 23, 2017 at 1:20 PM, Richard Sandiford
> <richard.sandiford@linaro.org> wrote:
>> SVE needs a way of broadcasting a scalar to a variable-length vector.
>> This patch adds VEC_DUPLICATE_CST for when VECTOR_CST would be used for
>> fixed-length vectors and VEC_DUPLICATE_EXPR for when CONSTRUCTOR would
>> be used for fixed-length vectors. VEC_DUPLICATE_EXPR is the tree
>> equivalent of the existing rtl code VEC_DUPLICATE.
>>
>> Originally we had a single VEC_DUPLICATE_EXPR and used TREE_CONSTANT
>> to mark constant nodes, but in response to last year's RFC, Richard B.
>> suggested it would be better to have separate codes for the constant
>> and non-constant cases. This allows VEC_DUPLICATE_EXPR to be treated
>> as a normal unary operation and avoids the previous need for treating
>> it as a GIMPLE_SINGLE_RHS.
>>
>> It might make sense to use VEC_DUPLICATE_CST for all duplicated
>> vector constants, since it's a bit more compact than VECTOR_CST
>> in that case, and is potentially more efficient to process.
>> However, the nice thing about keeping it restricted to variable-length
>> vectors is that there is then no need to handle combinations of
>> VECTOR_CST and VEC_DUPLICATE_CST; a vector type will always use
>> VECTOR_CST or never use it.
>>
>> The patch also adds a vec_duplicate_optab to go with VEC_DUPLICATE_EXPR.
>
> Index: gcc/tree-vect-generic.c
> ===================================================================
> --- gcc/tree-vect-generic.c 2017-10-23 11:38:53.934094740 +0100
> +++ gcc/tree-vect-generic.c 2017-10-23 11:41:51.773953100 +0100
> @@ -1419,6 +1419,7 @@ lower_vec_perm (gimple_stmt_iterator *gs
> ssa_uniform_vector_p (tree op)
> {
> if (TREE_CODE (op) == VECTOR_CST
> + || TREE_CODE (op) == VEC_DUPLICATE_CST
> || TREE_CODE (op) == CONSTRUCTOR)
> return uniform_vector_p (op);
>
> VEC_DUPLICATE_EXPR handling?
Oops, yeah. I could have sworn it was there at one time...
> Looks like for VEC_DUPLICATE_CST it could directly return true.
The function is a bit misnamed: it returns the duplicated tree value
rather than a bool.
> I didn't see uniform_vector_p being updated?
That part was there FWIW (for tree.c).
> Can you add verification to either verify_expr or build_vec_duplicate_cst
> that the type is one of variable size? And amend tree.def docs
> accordingly. Because otherwise we miss a lot of cases in constant
> folding (mixing VEC_DUPLICATE_CST and VECTOR_CST).
OK, done in the patch below with a gcc_unreachable () bomb in
build_vec_duplicate_cst, which becomes a gcc_assert when variable-length
vectors are added. This meant changing the selftests to use
build_vector_from_val rather than build_vec_duplicate_cst,
but to still get testing of VEC_DUPLICATE_*, we then need to use
the target's preferred vector length instead of always using 4.
Tested as before. OK (given the slightly different selftests)?
Thanks,
Richard
2017-11-06 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* doc/generic.texi (VEC_DUPLICATE_CST, VEC_DUPLICATE_EXPR): Document.
(VEC_COND_EXPR): Add missing @tindex.
* doc/md.texi (vec_duplicate@var{m}): Document.
* tree.def (VEC_DUPLICATE_CST, VEC_DUPLICATE_EXPR): New tree codes.
* tree-core.h (tree_base): Document that u.nelts and TREE_OVERFLOW
are used for VEC_DUPLICATE_CST as well.
(tree_vector): Access base.n.nelts directly.
* tree.h (TREE_OVERFLOW): Add VEC_DUPLICATE_CST to the list of
valid codes.
(VEC_DUPLICATE_CST_ELT): New macro.
* tree.c (tree_node_structure_for_code, tree_code_size, tree_size)
(integer_zerop, integer_onep, integer_all_onesp, integer_truep)
(real_zerop, real_onep, real_minus_onep, add_expr, initializer_zerop)
(walk_tree_1, drop_tree_overflow): Handle VEC_DUPLICATE_CST.
(build_vec_duplicate_cst): New function.
(build_vector_from_val): Add stubbed-out handling of variable-length
vectors, using build_vec_duplicate_cst and VEC_DUPLICATE_EXPR.
(uniform_vector_p): Handle the new codes.
(test_vec_duplicate_predicates_int): New function.
(test_vec_duplicate_predicates_float): Likewise.
(test_vec_duplicate_predicates): Likewise.
(tree_c_tests): Call test_vec_duplicate_predicates.
* cfgexpand.c (expand_debug_expr): Handle the new codes.
* tree-pretty-print.c (dump_generic_node): Likewise.
* tree-vect-generic.c (ssa_uniform_vector_p): Likewise.
* dwarf2out.c (rtl_for_decl_init): Handle VEC_DUPLICATE_CST.
* gimple-expr.h (is_gimple_constant): Likewise.
* gimplify.c (gimplify_expr): Likewise.
* graphite-isl-ast-to-gimple.c
(translate_isl_ast_to_gimple::is_constant): Likewise.
* graphite-scop-detection.c (scan_tree_for_params): Likewise.
* ipa-icf-gimple.c (func_checker::compare_cst_or_decl): Likewise.
(func_checker::compare_operand): Likewise.
* ipa-icf.c (sem_item::add_expr, sem_variable::equals): Likewise.
* match.pd (negate_expr_p): Likewise.
* print-tree.c (print_node): Likewise.
* tree-chkp.c (chkp_find_bounds_1): Likewise.
* tree-loop-distribution.c (const_with_all_bytes_same): Likewise.
* tree-ssa-loop.c (for_each_index): Likewise.
* tree-ssa-pre.c (create_component_ref_by_pieces_1): Likewise.
* tree-ssa-sccvn.c (copy_reference_ops_from_ref): Likewise.
(ao_ref_init_from_vn_reference): Likewise.
* varasm.c (const_hash_1, compare_constant): Likewise.
* fold-const.c (negate_expr_p, fold_negate_expr_1, const_binop)
(fold_convert_const, operand_equal_p, fold_view_convert_expr)
(exact_inverse, fold_checksum_tree): Likewise.
(const_unop): Likewise. Fold VEC_DUPLICATE_EXPRs of a constant.
(test_vec_duplicate_folding): New function.
(fold_const_c_tests): Call it.
* optabs.def (vec_duplicate_optab): New optab.
* optabs-tree.c (optab_for_tree_code): Handle VEC_DUPLICATE_EXPR.
* optabs.h (expand_vector_broadcast): Declare.
* optabs.c (expand_vector_broadcast): Make non-static. Try using
vec_duplicate_optab.
* expr.c (store_constructor): Try using vec_duplicate_optab for
uniform vectors.
(const_vector_element): New function, split out from...
(const_vector_from_tree): ...here.
(expand_expr_real_2): Handle VEC_DUPLICATE_EXPR.
(expand_expr_real_1): Handle VEC_DUPLICATE_CST.
* internal-fn.c (expand_vector_ubsan_overflow): Use CONSTANT_P
instead of checking for VECTOR_CST.
* tree-cfg.c (verify_gimple_assign_unary): Handle VEC_DUPLICATE_EXPR.
(verify_gimple_assign_single): Handle VEC_DUPLICATE_CST.
* tree-inline.c (estimate_operator_cost): Handle VEC_DUPLICATE_EXPR.
Index: gcc/doc/generic.texi
===================================================================
--- gcc/doc/generic.texi 2017-11-06 12:40:39.845713389 +0000
+++ gcc/doc/generic.texi 2017-11-06 12:40:40.277637153 +0000
@@ -1036,6 +1036,7 @@ As this example indicates, the operands
@tindex FIXED_CST
@tindex COMPLEX_CST
@tindex VECTOR_CST
+@tindex VEC_DUPLICATE_CST
@tindex STRING_CST
@findex TREE_STRING_LENGTH
@findex TREE_STRING_POINTER
@@ -1089,6 +1090,14 @@ constant nodes. Each individual constan
double constant node. The first operand is a @code{TREE_LIST} of the
constant nodes and is accessed through @code{TREE_VECTOR_CST_ELTS}.
+@item VEC_DUPLICATE_CST
+These nodes represent a vector constant in which every element has the
+same scalar value. At present only variable-length vectors use
+@code{VEC_DUPLICATE_CST}; constant-length vectors use @code{VECTOR_CST}
+instead. The scalar element value is given by
+@code{VEC_DUPLICATE_CST_ELT} and has the same restrictions as the
+element of a @code{VECTOR_CST}.
+
@item STRING_CST
These nodes represent string-constants. The @code{TREE_STRING_LENGTH}
returns the length of the string, as an @code{int}. The
@@ -1692,6 +1701,7 @@ a value from @code{enum annot_expr_kind}
@node Vectors
@subsection Vectors
+@tindex VEC_DUPLICATE_EXPR
@tindex VEC_LSHIFT_EXPR
@tindex VEC_RSHIFT_EXPR
@tindex VEC_WIDEN_MULT_HI_EXPR
@@ -1703,9 +1713,14 @@ a value from @code{enum annot_expr_kind}
@tindex VEC_PACK_TRUNC_EXPR
@tindex VEC_PACK_SAT_EXPR
@tindex VEC_PACK_FIX_TRUNC_EXPR
+@tindex VEC_COND_EXPR
@tindex SAD_EXPR
@table @code
+@item VEC_DUPLICATE_EXPR
+This node has a single operand and represents a vector in which every
+element is equal to that operand.
+
@item VEC_LSHIFT_EXPR
@itemx VEC_RSHIFT_EXPR
These nodes represent whole vector left and right shifts, respectively.
Index: gcc/doc/md.texi
===================================================================
--- gcc/doc/md.texi 2017-11-06 12:40:39.845713389 +0000
+++ gcc/doc/md.texi 2017-11-06 12:40:40.278630081 +0000
@@ -4888,6 +4888,17 @@ and operand 1 is parallel containing val
the vector mode @var{m}, or a vector mode with the same element mode and
smaller number of elements.
+@cindex @code{vec_duplicate@var{m}} instruction pattern
+@item @samp{vec_duplicate@var{m}}
+Initialize vector output operand 0 so that each element has the value given
+by scalar input operand 1. The vector has mode @var{m} and the scalar has
+the mode appropriate for one element of @var{m}.
+
+This pattern only handles duplicates of non-constant inputs. Constant
+vectors go through the @code{mov@var{m}} pattern instead.
+
+This pattern is not allowed to @code{FAIL}.
+
@cindex @code{vec_cmp@var{m}@var{n}} instruction pattern
@item @samp{vec_cmp@var{m}@var{n}}
Output a vector comparison. Operand 0 of mode @var{n} is the destination for
Index: gcc/tree.def
===================================================================
--- gcc/tree.def 2017-11-06 12:40:39.845713389 +0000
+++ gcc/tree.def 2017-11-06 12:40:40.292531076 +0000
@@ -304,6 +304,11 @@ DEFTREECODE (COMPLEX_CST, "complex_cst",
/* Contents are in VECTOR_CST_ELTS field. */
DEFTREECODE (VECTOR_CST, "vector_cst", tcc_constant, 0)
+/* Represents a vector constant in which every element is equal to
+ VEC_DUPLICATE_CST_ELT. This is only ever used for variable-length
+ vectors; fixed-length vectors must use VECTOR_CST instead. */
+DEFTREECODE (VEC_DUPLICATE_CST, "vec_duplicate_cst", tcc_constant, 0)
+
/* Contents are TREE_STRING_LENGTH and the actual contents of the string. */
DEFTREECODE (STRING_CST, "string_cst", tcc_constant, 0)
@@ -534,6 +539,9 @@ DEFTREECODE (TARGET_EXPR, "target_expr",
1 and 2 are NULL. The operands are then taken from the cfg edges. */
DEFTREECODE (COND_EXPR, "cond_expr", tcc_expression, 3)
+/* Represents a vector in which every element is equal to operand 0. */
+DEFTREECODE (VEC_DUPLICATE_EXPR, "vec_duplicate_expr", tcc_unary, 1)
+
/* Vector conditional expression. It is like COND_EXPR, but with
vector operands.
Index: gcc/tree-core.h
===================================================================
--- gcc/tree-core.h 2017-11-06 12:40:39.845713389 +0000
+++ gcc/tree-core.h 2017-11-06 12:40:40.288559363 +0000
@@ -975,7 +975,8 @@ struct GTY(()) tree_base {
/* VEC length. This field is only used with TREE_VEC. */
int length;
- /* Number of elements. This field is only used with VECTOR_CST. */
+ /* Number of elements. This field is only used with VECTOR_CST
+ and VEC_DUPLICATE_CST. It is always 1 for VEC_DUPLICATE_CST. */
unsigned int nelts;
/* SSA version number. This field is only used with SSA_NAME. */
@@ -1065,7 +1066,7 @@ struct GTY(()) tree_base {
public_flag:
TREE_OVERFLOW in
- INTEGER_CST, REAL_CST, COMPLEX_CST, VECTOR_CST
+ INTEGER_CST, REAL_CST, COMPLEX_CST, VECTOR_CST, VEC_DUPLICATE_CST
TREE_PUBLIC in
VAR_DECL, FUNCTION_DECL
@@ -1332,7 +1333,7 @@ struct GTY(()) tree_complex {
struct GTY(()) tree_vector {
struct tree_typed typed;
- tree GTY ((length ("VECTOR_CST_NELTS ((tree) &%h)"))) elts[1];
+ tree GTY ((length ("((tree) &%h)->base.u.nelts"))) elts[1];
};
struct GTY(()) tree_identifier {
Index: gcc/tree.h
===================================================================
--- gcc/tree.h 2017-11-06 12:40:39.845713389 +0000
+++ gcc/tree.h 2017-11-06 12:40:40.293524004 +0000
@@ -709,8 +709,8 @@ #define TREE_SYMBOL_REFERENCED(NODE) \
#define TYPE_REF_CAN_ALIAS_ALL(NODE) \
(PTR_OR_REF_CHECK (NODE)->base.static_flag)
-/* In an INTEGER_CST, REAL_CST, COMPLEX_CST, or VECTOR_CST, this means
- there was an overflow in folding. */
+/* In an INTEGER_CST, REAL_CST, COMPLEX_CST, VECTOR_CST or VEC_DUPLICATE_CST,
+ this means there was an overflow in folding. */
#define TREE_OVERFLOW(NODE) (CST_CHECK (NODE)->base.public_flag)
@@ -1009,6 +1009,10 @@ #define VECTOR_CST_NELTS(NODE) (VECTOR_C
#define VECTOR_CST_ELTS(NODE) (VECTOR_CST_CHECK (NODE)->vector.elts)
#define VECTOR_CST_ELT(NODE,IDX) (VECTOR_CST_CHECK (NODE)->vector.elts[IDX])
+/* In a VEC_DUPLICATE_CST node. */
+#define VEC_DUPLICATE_CST_ELT(NODE) \
+ (VEC_DUPLICATE_CST_CHECK (NODE)->vector.elts[0])
+
/* Define fields and accessors for some special-purpose tree nodes. */
#define IDENTIFIER_LENGTH(NODE) \
Index: gcc/tree.c
===================================================================
--- gcc/tree.c 2017-11-06 12:40:39.845713389 +0000
+++ gcc/tree.c 2017-11-06 12:40:40.292531076 +0000
@@ -464,6 +464,7 @@ tree_node_structure_for_code (enum tree_
case FIXED_CST: return TS_FIXED_CST;
case COMPLEX_CST: return TS_COMPLEX;
case VECTOR_CST: return TS_VECTOR;
+ case VEC_DUPLICATE_CST: return TS_VECTOR;
case STRING_CST: return TS_STRING;
/* tcc_exceptional cases. */
case ERROR_MARK: return TS_COMMON;
@@ -829,6 +830,7 @@ tree_code_size (enum tree_code code)
case FIXED_CST: return sizeof (tree_fixed_cst);
case COMPLEX_CST: return sizeof (tree_complex);
case VECTOR_CST: return sizeof (tree_vector);
+ case VEC_DUPLICATE_CST: return sizeof (tree_vector);
case STRING_CST: gcc_unreachable ();
default:
gcc_checking_assert (code >= NUM_TREE_CODES);
@@ -890,6 +892,9 @@ tree_size (const_tree node)
return (sizeof (struct tree_vector)
+ (VECTOR_CST_NELTS (node) - 1) * sizeof (tree));
+ case VEC_DUPLICATE_CST:
+ return sizeof (struct tree_vector);
+
case STRING_CST:
return TREE_STRING_LENGTH (node) + offsetof (struct tree_string, str) + 1;
@@ -1697,6 +1702,34 @@ cst_and_fits_in_hwi (const_tree x)
&& (tree_fits_shwi_p (x) || tree_fits_uhwi_p (x)));
}
+/* Build a new VEC_DUPLICATE_CST with type TYPE and operand EXP.
+
+ This function is only suitable for callers that know TYPE is a
+ variable-length vector and specifically need a VEC_DUPLICATE_CST node.
+ Use build_vector_from_val to duplicate a general scalar into a general
+ vector type. */
+
+static tree
+build_vec_duplicate_cst (tree type, tree exp MEM_STAT_DECL)
+{
+ /* Shouldn't be used until we have variable-length vectors. */
+ gcc_unreachable ();
+
+ int length = sizeof (struct tree_vector);
+
+ record_node_allocation_statistics (VEC_DUPLICATE_CST, length);
+
+ tree t = ggc_alloc_cleared_tree_node_stat (length PASS_MEM_STAT);
+
+ TREE_SET_CODE (t, VEC_DUPLICATE_CST);
+ TREE_TYPE (t) = type;
+ t->base.u.nelts = 1;
+ VEC_DUPLICATE_CST_ELT (t) = exp;
+ TREE_CONSTANT (t) = 1;
+
+ return t;
+}
+
/* Build a newly constructed VECTOR_CST node of length LEN. */
tree
@@ -1790,6 +1823,13 @@ build_vector_from_val (tree vectype, tre
gcc_checking_assert (types_compatible_p (TYPE_MAIN_VARIANT (TREE_TYPE (sc)),
TREE_TYPE (vectype)));
+ if (0)
+ {
+ if (CONSTANT_CLASS_P (sc))
+ return build_vec_duplicate_cst (vectype, sc);
+ return fold_build1 (VEC_DUPLICATE_EXPR, vectype, sc);
+ }
+
if (CONSTANT_CLASS_P (sc))
{
auto_vec<tree, 32> v (nunits);
@@ -2358,6 +2398,8 @@ integer_zerop (const_tree expr)
return false;
return true;
}
+ case VEC_DUPLICATE_CST:
+ return integer_zerop (VEC_DUPLICATE_CST_ELT (expr));
default:
return false;
}
@@ -2384,6 +2426,8 @@ integer_onep (const_tree expr)
return false;
return true;
}
+ case VEC_DUPLICATE_CST:
+ return integer_onep (VEC_DUPLICATE_CST_ELT (expr));
default:
return false;
}
@@ -2422,6 +2466,9 @@ integer_all_onesp (const_tree expr)
return 1;
}
+ else if (TREE_CODE (expr) == VEC_DUPLICATE_CST)
+ return integer_all_onesp (VEC_DUPLICATE_CST_ELT (expr));
+
else if (TREE_CODE (expr) != INTEGER_CST)
return 0;
@@ -2478,7 +2525,7 @@ integer_nonzerop (const_tree expr)
int
integer_truep (const_tree expr)
{
- if (TREE_CODE (expr) == VECTOR_CST)
+ if (TREE_CODE (expr) == VECTOR_CST || TREE_CODE (expr) == VEC_DUPLICATE_CST)
return integer_all_onesp (expr);
return integer_onep (expr);
}
@@ -2649,6 +2696,8 @@ real_zerop (const_tree expr)
return false;
return true;
}
+ case VEC_DUPLICATE_CST:
+ return real_zerop (VEC_DUPLICATE_CST_ELT (expr));
default:
return false;
}
@@ -2677,6 +2726,8 @@ real_onep (const_tree expr)
return false;
return true;
}
+ case VEC_DUPLICATE_CST:
+ return real_onep (VEC_DUPLICATE_CST_ELT (expr));
default:
return false;
}
@@ -2704,6 +2755,8 @@ real_minus_onep (const_tree expr)
return false;
return true;
}
+ case VEC_DUPLICATE_CST:
+ return real_minus_onep (VEC_DUPLICATE_CST_ELT (expr));
default:
return false;
}
@@ -7106,6 +7159,9 @@ add_expr (const_tree t, inchash::hash &h
inchash::add_expr (VECTOR_CST_ELT (t, i), hstate, flags);
return;
}
+ case VEC_DUPLICATE_CST:
+ inchash::add_expr (VEC_DUPLICATE_CST_ELT (t), hstate);
+ return;
case SSA_NAME:
/* We can just compare by pointer. */
hstate.add_hwi (SSA_NAME_VERSION (t));
@@ -10367,6 +10423,9 @@ initializer_zerop (const_tree init)
return true;
}
+ case VEC_DUPLICATE_CST:
+ return initializer_zerop (VEC_DUPLICATE_CST_ELT (init));
+
case CONSTRUCTOR:
{
unsigned HOST_WIDE_INT idx;
@@ -10412,7 +10471,13 @@ uniform_vector_p (const_tree vec)
gcc_assert (VECTOR_TYPE_P (TREE_TYPE (vec)));
- if (TREE_CODE (vec) == VECTOR_CST)
+ if (TREE_CODE (vec) == VEC_DUPLICATE_CST)
+ return VEC_DUPLICATE_CST_ELT (vec);
+
+ else if (TREE_CODE (vec) == VEC_DUPLICATE_EXPR)
+ return TREE_OPERAND (vec, 0);
+
+ else if (TREE_CODE (vec) == VECTOR_CST)
{
first = VECTOR_CST_ELT (vec, 0);
for (i = 1; i < VECTOR_CST_NELTS (vec); ++i)
@@ -11144,6 +11209,7 @@ #define WALK_SUBTREE_TAIL(NODE) \
case REAL_CST:
case FIXED_CST:
case VECTOR_CST:
+ case VEC_DUPLICATE_CST:
case STRING_CST:
case BLOCK:
case PLACEHOLDER_EXPR:
@@ -12430,6 +12496,12 @@ drop_tree_overflow (tree t)
elt = drop_tree_overflow (elt);
}
}
+ if (TREE_CODE (t) == VEC_DUPLICATE_CST)
+ {
+ tree *elt = &VEC_DUPLICATE_CST_ELT (t);
+ if (TREE_OVERFLOW (*elt))
+ *elt = drop_tree_overflow (*elt);
+ }
return t;
}
@@ -13850,6 +13922,102 @@ test_integer_constants ()
ASSERT_EQ (type, TREE_TYPE (zero));
}
+/* Verify predicate handling of VEC_DUPLICATE_CSTs and VEC_DUPLICATE_EXPRs
+ for integral type TYPE. */
+
+static void
+test_vec_duplicate_predicates_int (tree type)
+{
+ scalar_int_mode int_mode = SCALAR_INT_TYPE_MODE (type);
+ machine_mode vec_mode = targetm.vectorize.preferred_simd_mode (int_mode);
+ /* This will be 1 if VEC_MODE isn't a vector mode. */
+ unsigned int nunits = GET_MODE_NUNITS (vec_mode);
+
+ tree vec_type = build_vector_type (type, nunits);
+
+ tree zero = build_zero_cst (type);
+ tree vec_zero = build_vector_from_val (vec_type, zero);
+ ASSERT_TRUE (integer_zerop (vec_zero));
+ ASSERT_FALSE (integer_onep (vec_zero));
+ ASSERT_FALSE (integer_minus_onep (vec_zero));
+ ASSERT_FALSE (integer_all_onesp (vec_zero));
+ ASSERT_FALSE (integer_truep (vec_zero));
+ ASSERT_TRUE (initializer_zerop (vec_zero));
+
+ tree one = build_one_cst (type);
+ tree vec_one = build_vector_from_val (vec_type, one);
+ ASSERT_FALSE (integer_zerop (vec_one));
+ ASSERT_TRUE (integer_onep (vec_one));
+ ASSERT_FALSE (integer_minus_onep (vec_one));
+ ASSERT_FALSE (integer_all_onesp (vec_one));
+ ASSERT_FALSE (integer_truep (vec_one));
+ ASSERT_FALSE (initializer_zerop (vec_one));
+
+ tree minus_one = build_minus_one_cst (type);
+ tree vec_minus_one = build_vector_from_val (vec_type, minus_one);
+ ASSERT_FALSE (integer_zerop (vec_minus_one));
+ ASSERT_FALSE (integer_onep (vec_minus_one));
+ ASSERT_TRUE (integer_minus_onep (vec_minus_one));
+ ASSERT_TRUE (integer_all_onesp (vec_minus_one));
+ ASSERT_TRUE (integer_truep (vec_minus_one));
+ ASSERT_FALSE (initializer_zerop (vec_minus_one));
+
+ tree x = create_tmp_var_raw (type, "x");
+ tree vec_x = build1 (VEC_DUPLICATE_EXPR, vec_type, x);
+ ASSERT_EQ (uniform_vector_p (vec_zero), zero);
+ ASSERT_EQ (uniform_vector_p (vec_one), one);
+ ASSERT_EQ (uniform_vector_p (vec_minus_one), minus_one);
+ ASSERT_EQ (uniform_vector_p (vec_x), x);
+}
+
+/* Verify predicate handling of VEC_DUPLICATE_CSTs for floating-point
+ type TYPE. */
+
+static void
+test_vec_duplicate_predicates_float (tree type)
+{
+ scalar_float_mode float_mode = SCALAR_FLOAT_TYPE_MODE (type);
+ machine_mode vec_mode = targetm.vectorize.preferred_simd_mode (float_mode);
+ /* This will be 1 if VEC_MODE isn't a vector mode. */
+ unsigned int nunits = GET_MODE_NUNITS (vec_mode);
+
+ tree vec_type = build_vector_type (type, nunits);
+
+ tree zero = build_zero_cst (type);
+ tree vec_zero = build_vector_from_val (vec_type, zero);
+ ASSERT_TRUE (real_zerop (vec_zero));
+ ASSERT_FALSE (real_onep (vec_zero));
+ ASSERT_FALSE (real_minus_onep (vec_zero));
+ ASSERT_TRUE (initializer_zerop (vec_zero));
+
+ tree one = build_one_cst (type);
+ tree vec_one = build_vector_from_val (vec_type, one);
+ ASSERT_FALSE (real_zerop (vec_one));
+ ASSERT_TRUE (real_onep (vec_one));
+ ASSERT_FALSE (real_minus_onep (vec_one));
+ ASSERT_FALSE (initializer_zerop (vec_one));
+
+ tree minus_one = build_minus_one_cst (type);
+ tree vec_minus_one = build_vector_from_val (vec_type, minus_one);
+ ASSERT_FALSE (real_zerop (vec_minus_one));
+ ASSERT_FALSE (real_onep (vec_minus_one));
+ ASSERT_TRUE (real_minus_onep (vec_minus_one));
+ ASSERT_FALSE (initializer_zerop (vec_minus_one));
+
+ ASSERT_EQ (uniform_vector_p (vec_zero), zero);
+ ASSERT_EQ (uniform_vector_p (vec_one), one);
+ ASSERT_EQ (uniform_vector_p (vec_minus_one), minus_one);
+}
+
+/* Verify predicate handling of VEC_DUPLICATE_CSTs and VEC_DUPLICATE_EXPRs. */
+
+static void
+test_vec_duplicate_predicates ()
+{
+ test_vec_duplicate_predicates_int (integer_type_node);
+ test_vec_duplicate_predicates_float (float_type_node);
+}
+
/* Verify identifiers. */
static void
@@ -13878,6 +14046,7 @@ test_labels ()
tree_c_tests ()
{
test_integer_constants ();
+ test_vec_duplicate_predicates ();
test_identifiers ();
test_labels ();
}
Index: gcc/cfgexpand.c
===================================================================
--- gcc/cfgexpand.c 2017-11-06 12:40:39.845713389 +0000
+++ gcc/cfgexpand.c 2017-11-06 12:40:40.276644225 +0000
@@ -5068,6 +5068,8 @@ expand_debug_expr (tree exp)
case VEC_WIDEN_LSHIFT_HI_EXPR:
case VEC_WIDEN_LSHIFT_LO_EXPR:
case VEC_PERM_EXPR:
+ case VEC_DUPLICATE_CST:
+ case VEC_DUPLICATE_EXPR:
return NULL;
/* Misc codes. */
Index: gcc/tree-pretty-print.c
===================================================================
--- gcc/tree-pretty-print.c 2017-11-06 12:40:39.845713389 +0000
+++ gcc/tree-pretty-print.c 2017-11-06 12:40:40.289552291 +0000
@@ -1802,6 +1802,12 @@ dump_generic_node (pretty_printer *pp, t
}
break;
+ case VEC_DUPLICATE_CST:
+ pp_string (pp, "{ ");
+ dump_generic_node (pp, VEC_DUPLICATE_CST_ELT (node), spc, flags, false);
+ pp_string (pp, ", ... }");
+ break;
+
case FUNCTION_TYPE:
case METHOD_TYPE:
dump_generic_node (pp, TREE_TYPE (node), spc, flags, false);
@@ -3231,6 +3237,15 @@ dump_generic_node (pretty_printer *pp, t
pp_string (pp, " > ");
break;
+ case VEC_DUPLICATE_EXPR:
+ pp_space (pp);
+ for (str = get_tree_code_name (code); *str; str++)
+ pp_character (pp, TOUPPER (*str));
+ pp_string (pp, " < ");
+ dump_generic_node (pp, TREE_OPERAND (node, 0), spc, flags, false);
+ pp_string (pp, " > ");
+ break;
+
case VEC_UNPACK_HI_EXPR:
pp_string (pp, " VEC_UNPACK_HI_EXPR < ");
dump_generic_node (pp, TREE_OPERAND (node, 0), spc, flags, false);
Index: gcc/tree-vect-generic.c
===================================================================
--- gcc/tree-vect-generic.c 2017-11-06 12:40:39.845713389 +0000
+++ gcc/tree-vect-generic.c 2017-11-06 12:40:40.291538147 +0000
@@ -1419,6 +1419,8 @@ lower_vec_perm (gimple_stmt_iterator *gs
ssa_uniform_vector_p (tree op)
{
if (TREE_CODE (op) == VECTOR_CST
+ || TREE_CODE (op) == VEC_DUPLICATE_CST
+ || TREE_CODE (op) == VEC_DUPLICATE_EXPR
|| TREE_CODE (op) == CONSTRUCTOR)
return uniform_vector_p (op);
if (TREE_CODE (op) == SSA_NAME)
Index: gcc/dwarf2out.c
===================================================================
--- gcc/dwarf2out.c 2017-11-06 12:40:39.845713389 +0000
+++ gcc/dwarf2out.c 2017-11-06 12:40:40.280615937 +0000
@@ -18878,6 +18878,7 @@ rtl_for_decl_init (tree init, tree type)
switch (TREE_CODE (init))
{
case VECTOR_CST:
+ case VEC_DUPLICATE_CST:
break;
case CONSTRUCTOR:
if (TREE_CONSTANT (init))
Index: gcc/gimple-expr.h
===================================================================
--- gcc/gimple-expr.h 2017-11-06 12:40:39.845713389 +0000
+++ gcc/gimple-expr.h 2017-11-06 12:40:40.282601794 +0000
@@ -134,6 +134,7 @@ is_gimple_constant (const_tree t)
case FIXED_CST:
case COMPLEX_CST:
case VECTOR_CST:
+ case VEC_DUPLICATE_CST:
case STRING_CST:
return true;
Index: gcc/gimplify.c
===================================================================
--- gcc/gimplify.c 2017-11-06 12:40:39.845713389 +0000
+++ gcc/gimplify.c 2017-11-06 12:40:40.283594722 +0000
@@ -11507,6 +11507,7 @@ gimplify_expr (tree *expr_p, gimple_seq
case STRING_CST:
case COMPLEX_CST:
case VECTOR_CST:
+ case VEC_DUPLICATE_CST:
/* Drop the overflow flag on constants, we do not want
that in the GIMPLE IL. */
if (TREE_OVERFLOW_P (*expr_p))
Index: gcc/graphite-isl-ast-to-gimple.c
===================================================================
--- gcc/graphite-isl-ast-to-gimple.c 2017-11-06 12:40:39.845713389 +0000
+++ gcc/graphite-isl-ast-to-gimple.c 2017-11-06 12:40:40.284587650 +0000
@@ -211,7 +211,8 @@ enum phi_node_kind
return TREE_CODE (op) == INTEGER_CST
|| TREE_CODE (op) == REAL_CST
|| TREE_CODE (op) == COMPLEX_CST
- || TREE_CODE (op) == VECTOR_CST;
+ || TREE_CODE (op) == VECTOR_CST
+ || TREE_CODE (op) == VEC_DUPLICATE_CST;
}
private:
Index: gcc/graphite-scop-detection.c
===================================================================
--- gcc/graphite-scop-detection.c 2017-11-06 12:40:39.845713389 +0000
+++ gcc/graphite-scop-detection.c 2017-11-06 12:40:40.284587650 +0000
@@ -1212,6 +1212,7 @@ scan_tree_for_params (sese_info_p s, tre
case REAL_CST:
case COMPLEX_CST:
case VECTOR_CST:
+ case VEC_DUPLICATE_CST:
break;
default:
Index: gcc/ipa-icf-gimple.c
===================================================================
--- gcc/ipa-icf-gimple.c 2017-11-06 12:40:39.845713389 +0000
+++ gcc/ipa-icf-gimple.c 2017-11-06 12:40:40.285580578 +0000
@@ -333,6 +333,7 @@ func_checker::compare_cst_or_decl (tree
case INTEGER_CST:
case COMPLEX_CST:
case VECTOR_CST:
+ case VEC_DUPLICATE_CST:
case STRING_CST:
case REAL_CST:
{
@@ -528,6 +529,7 @@ func_checker::compare_operand (tree t1,
case INTEGER_CST:
case COMPLEX_CST:
case VECTOR_CST:
+ case VEC_DUPLICATE_CST:
case STRING_CST:
case REAL_CST:
case FUNCTION_DECL:
Index: gcc/ipa-icf.c
===================================================================
--- gcc/ipa-icf.c 2017-11-06 12:40:39.845713389 +0000
+++ gcc/ipa-icf.c 2017-11-06 12:40:40.285580578 +0000
@@ -1479,6 +1479,7 @@ sem_item::add_expr (const_tree exp, inch
case STRING_CST:
case COMPLEX_CST:
case VECTOR_CST:
+ case VEC_DUPLICATE_CST:
inchash::add_expr (exp, hstate);
break;
case CONSTRUCTOR:
@@ -2036,6 +2037,9 @@ sem_variable::equals (tree t1, tree t2)
return 1;
}
+ case VEC_DUPLICATE_CST:
+ return sem_variable::equals (VEC_DUPLICATE_CST_ELT (t1),
+ VEC_DUPLICATE_CST_ELT (t2));
case ARRAY_REF:
case ARRAY_RANGE_REF:
{
Index: gcc/match.pd
===================================================================
--- gcc/match.pd 2017-11-06 12:40:39.845713389 +0000
+++ gcc/match.pd 2017-11-06 12:40:40.285580578 +0000
@@ -958,6 +958,9 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
(match negate_expr_p
VECTOR_CST
(if (FLOAT_TYPE_P (TREE_TYPE (type)) || TYPE_OVERFLOW_WRAPS (type))))
+(match negate_expr_p
+ VEC_DUPLICATE_CST
+ (if (FLOAT_TYPE_P (TREE_TYPE (type)) || TYPE_OVERFLOW_WRAPS (type))))
/* (-A) * (-B) -> A * B */
(simplify
Index: gcc/print-tree.c
===================================================================
--- gcc/print-tree.c 2017-11-06 12:40:39.845713389 +0000
+++ gcc/print-tree.c 2017-11-06 12:40:40.287566435 +0000
@@ -783,6 +783,10 @@ print_node (FILE *file, const char *pref
}
break;
+ case VEC_DUPLICATE_CST:
+ print_node (file, "elt", VEC_DUPLICATE_CST_ELT (node), indent + 4);
+ break;
+
case COMPLEX_CST:
print_node (file, "real", TREE_REALPART (node), indent + 4);
print_node (file, "imag", TREE_IMAGPART (node), indent + 4);
Index: gcc/tree-chkp.c
===================================================================
--- gcc/tree-chkp.c 2017-11-06 12:40:39.845713389 +0000
+++ gcc/tree-chkp.c 2017-11-06 12:40:40.288559363 +0000
@@ -3799,6 +3799,7 @@ chkp_find_bounds_1 (tree ptr, tree ptr_s
case INTEGER_CST:
case COMPLEX_CST:
case VECTOR_CST:
+ case VEC_DUPLICATE_CST:
if (integer_zerop (ptr_src))
bounds = chkp_get_none_bounds ();
else
Index: gcc/tree-loop-distribution.c
===================================================================
--- gcc/tree-loop-distribution.c 2017-11-06 12:40:39.845713389 +0000
+++ gcc/tree-loop-distribution.c 2017-11-06 12:40:40.289552291 +0000
@@ -927,6 +927,9 @@ const_with_all_bytes_same (tree val)
&& CONSTRUCTOR_NELTS (val) == 0))
return 0;
+ if (TREE_CODE (val) == VEC_DUPLICATE_CST)
+ return const_with_all_bytes_same (VEC_DUPLICATE_CST_ELT (val));
+
if (real_zerop (val))
{
/* Only return 0 for +0.0, not for -0.0, which doesn't have
Index: gcc/tree-ssa-loop.c
===================================================================
--- gcc/tree-ssa-loop.c 2017-11-06 12:40:39.845713389 +0000
+++ gcc/tree-ssa-loop.c 2017-11-06 12:40:40.290545219 +0000
@@ -616,6 +616,7 @@ for_each_index (tree *addr_p, bool (*cbc
case STRING_CST:
case RESULT_DECL:
case VECTOR_CST:
+ case VEC_DUPLICATE_CST:
case COMPLEX_CST:
case INTEGER_CST:
case REAL_CST:
Index: gcc/tree-ssa-pre.c
===================================================================
--- gcc/tree-ssa-pre.c 2017-11-06 12:40:39.845713389 +0000
+++ gcc/tree-ssa-pre.c 2017-11-06 12:40:40.290545219 +0000
@@ -2627,6 +2627,7 @@ create_component_ref_by_pieces_1 (basic_
case INTEGER_CST:
case COMPLEX_CST:
case VECTOR_CST:
+ case VEC_DUPLICATE_CST:
case REAL_CST:
case CONSTRUCTOR:
case VAR_DECL:
Index: gcc/tree-ssa-sccvn.c
===================================================================
--- gcc/tree-ssa-sccvn.c 2017-11-06 12:40:39.845713389 +0000
+++ gcc/tree-ssa-sccvn.c 2017-11-06 12:40:40.291538147 +0000
@@ -866,6 +866,7 @@ copy_reference_ops_from_ref (tree ref, v
case INTEGER_CST:
case COMPLEX_CST:
case VECTOR_CST:
+ case VEC_DUPLICATE_CST:
case REAL_CST:
case FIXED_CST:
case CONSTRUCTOR:
@@ -1058,6 +1059,7 @@ ao_ref_init_from_vn_reference (ao_ref *r
case INTEGER_CST:
case COMPLEX_CST:
case VECTOR_CST:
+ case VEC_DUPLICATE_CST:
case REAL_CST:
case CONSTRUCTOR:
case CONST_DECL:
Index: gcc/varasm.c
===================================================================
--- gcc/varasm.c 2017-11-06 12:40:39.845713389 +0000
+++ gcc/varasm.c 2017-11-06 12:40:40.293524004 +0000
@@ -3068,6 +3068,9 @@ const_hash_1 (const tree exp)
CASE_CONVERT:
return const_hash_1 (TREE_OPERAND (exp, 0)) * 7 + 2;
+ case VEC_DUPLICATE_CST:
+ return const_hash_1 (VEC_DUPLICATE_CST_ELT (exp)) * 7 + 3;
+
default:
/* A language specific constant. Just hash the code. */
return code;
@@ -3158,6 +3161,10 @@ compare_constant (const tree t1, const t
return 1;
}
+ case VEC_DUPLICATE_CST:
+ return compare_constant (VEC_DUPLICATE_CST_ELT (t1),
+ VEC_DUPLICATE_CST_ELT (t2));
+
case CONSTRUCTOR:
{
vec<constructor_elt, va_gc> *v1, *v2;
Index: gcc/fold-const.c
===================================================================
--- gcc/fold-const.c 2017-11-06 12:40:39.845713389 +0000
+++ gcc/fold-const.c 2017-11-06 12:40:40.282601794 +0000
@@ -418,6 +418,9 @@ negate_expr_p (tree t)
return true;
}
+ case VEC_DUPLICATE_CST:
+ return negate_expr_p (VEC_DUPLICATE_CST_ELT (t));
+
case COMPLEX_EXPR:
return negate_expr_p (TREE_OPERAND (t, 0))
&& negate_expr_p (TREE_OPERAND (t, 1));
@@ -579,6 +582,14 @@ fold_negate_expr_1 (location_t loc, tree
return build_vector (type, elts);
}
+ case VEC_DUPLICATE_CST:
+ {
+ tree sub = fold_negate_expr (loc, VEC_DUPLICATE_CST_ELT (t));
+ if (!sub)
+ return NULL_TREE;
+ return build_vector_from_val (type, sub);
+ }
+
case COMPLEX_EXPR:
if (negate_expr_p (t))
return fold_build2_loc (loc, COMPLEX_EXPR, type,
@@ -1436,6 +1447,16 @@ const_binop (enum tree_code code, tree a
return build_vector (type, elts);
}
+ if (TREE_CODE (arg1) == VEC_DUPLICATE_CST
+ && TREE_CODE (arg2) == VEC_DUPLICATE_CST)
+ {
+ tree sub = const_binop (code, VEC_DUPLICATE_CST_ELT (arg1),
+ VEC_DUPLICATE_CST_ELT (arg2));
+ if (!sub)
+ return NULL_TREE;
+ return build_vector_from_val (TREE_TYPE (arg1), sub);
+ }
+
/* Shifts allow a scalar offset for a vector. */
if (TREE_CODE (arg1) == VECTOR_CST
&& TREE_CODE (arg2) == INTEGER_CST)
@@ -1459,6 +1480,15 @@ const_binop (enum tree_code code, tree a
return build_vector (type, elts);
}
+
+ if (TREE_CODE (arg1) == VEC_DUPLICATE_CST
+ && TREE_CODE (arg2) == INTEGER_CST)
+ {
+ tree sub = const_binop (code, VEC_DUPLICATE_CST_ELT (arg1), arg2);
+ if (!sub)
+ return NULL_TREE;
+ return build_vector_from_val (TREE_TYPE (arg1), sub);
+ }
return NULL_TREE;
}
@@ -1652,6 +1682,13 @@ const_unop (enum tree_code code, tree ty
if (i == count)
return build_vector (type, elements);
}
+ else if (TREE_CODE (arg0) == VEC_DUPLICATE_CST)
+ {
+ tree sub = const_unop (BIT_NOT_EXPR, TREE_TYPE (type),
+ VEC_DUPLICATE_CST_ELT (arg0));
+ if (sub)
+ return build_vector_from_val (type, sub);
+ }
break;
case TRUTH_NOT_EXPR:
@@ -1737,6 +1774,11 @@ const_unop (enum tree_code code, tree ty
return res;
}
+ case VEC_DUPLICATE_EXPR:
+ if (CONSTANT_CLASS_P (arg0))
+ return build_vector_from_val (type, arg0);
+ return NULL_TREE;
+
default:
break;
}
@@ -2167,6 +2209,15 @@ fold_convert_const (enum tree_code code,
}
return build_vector (type, v);
}
+ if (TREE_CODE (arg1) == VEC_DUPLICATE_CST
+ && (TYPE_VECTOR_SUBPARTS (type)
+ == TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg1))))
+ {
+ tree sub = fold_convert_const (code, TREE_TYPE (type),
+ VEC_DUPLICATE_CST_ELT (arg1));
+ if (sub)
+ return build_vector_from_val (type, sub);
+ }
}
return NULL_TREE;
}
@@ -2953,6 +3004,10 @@ operand_equal_p (const_tree arg0, const_
return 1;
}
+ case VEC_DUPLICATE_CST:
+ return operand_equal_p (VEC_DUPLICATE_CST_ELT (arg0),
+ VEC_DUPLICATE_CST_ELT (arg1), flags);
+
case COMPLEX_CST:
return (operand_equal_p (TREE_REALPART (arg0), TREE_REALPART (arg1),
flags)
@@ -7475,6 +7530,20 @@ can_native_interpret_type_p (tree type)
static tree
fold_view_convert_expr (tree type, tree expr)
{
+ /* Recurse on duplicated vectors if the target type is also a vector
+ and if the elements line up. */
+ tree expr_type = TREE_TYPE (expr);
+ if (TREE_CODE (expr) == VEC_DUPLICATE_CST
+ && VECTOR_TYPE_P (type)
+ && TYPE_VECTOR_SUBPARTS (type) == TYPE_VECTOR_SUBPARTS (expr_type)
+ && TYPE_SIZE (TREE_TYPE (type)) == TYPE_SIZE (TREE_TYPE (expr_type)))
+ {
+ tree sub = fold_view_convert_expr (TREE_TYPE (type),
+ VEC_DUPLICATE_CST_ELT (expr));
+ if (sub)
+ return build_vector_from_val (type, sub);
+ }
+
/* We support up to 512-bit values (for V8DFmode). */
unsigned char buffer[64];
int len;
@@ -8874,6 +8943,15 @@ exact_inverse (tree type, tree cst)
return build_vector (type, elts);
}
+ case VEC_DUPLICATE_CST:
+ {
+ tree sub = exact_inverse (TREE_TYPE (type),
+ VEC_DUPLICATE_CST_ELT (cst));
+ if (!sub)
+ return NULL_TREE;
+ return build_vector_from_val (type, sub);
+ }
+
default:
return NULL_TREE;
}
@@ -11939,6 +12017,9 @@ fold_checksum_tree (const_tree expr, str
for (i = 0; i < (int) VECTOR_CST_NELTS (expr); ++i)
fold_checksum_tree (VECTOR_CST_ELT (expr, i), ctx, ht);
break;
+ case VEC_DUPLICATE_CST:
+ fold_checksum_tree (VEC_DUPLICATE_CST_ELT (expr), ctx, ht);
+ break;
default:
break;
}
@@ -14412,6 +14493,41 @@ test_vector_folding ()
ASSERT_FALSE (integer_nonzerop (fold_build2 (NE_EXPR, res_type, one, one)));
}
+/* Verify folding of VEC_DUPLICATE_CSTs and VEC_DUPLICATE_EXPRs. */
+
+static void
+test_vec_duplicate_folding ()
+{
+ scalar_int_mode int_mode = SCALAR_INT_TYPE_MODE (ssizetype);
+ machine_mode vec_mode = targetm.vectorize.preferred_simd_mode (int_mode);
+ /* This will be 1 if VEC_MODE isn't a vector mode. */
+ unsigned int nunits = GET_MODE_NUNITS (vec_mode);
+
+ tree type = build_vector_type (ssizetype, nunits);
+ tree dup5 = build_vector_from_val (type, ssize_int (5));
+ tree dup3 = build_vector_from_val (type, ssize_int (3));
+
+ tree neg_dup5 = fold_unary (NEGATE_EXPR, type, dup5);
+ ASSERT_EQ (uniform_vector_p (neg_dup5), ssize_int (-5));
+
+ tree not_dup5 = fold_unary (BIT_NOT_EXPR, type, dup5);
+ ASSERT_EQ (uniform_vector_p (not_dup5), ssize_int (-6));
+
+ tree dup5_plus_dup3 = fold_binary (PLUS_EXPR, type, dup5, dup3);
+ ASSERT_EQ (uniform_vector_p (dup5_plus_dup3), ssize_int (8));
+
+ tree dup5_lsl_2 = fold_binary (LSHIFT_EXPR, type, dup5, ssize_int (2));
+ ASSERT_EQ (uniform_vector_p (dup5_lsl_2), ssize_int (20));
+
+ tree size_vector = build_vector_type (sizetype, nunits);
+ tree size_dup5 = fold_convert (size_vector, dup5);
+ ASSERT_EQ (uniform_vector_p (size_dup5), size_int (5));
+
+ tree dup5_expr = fold_unary (VEC_DUPLICATE_EXPR, type, ssize_int (5));
+ tree dup5_cst = build_vector_from_val (type, ssize_int (5));
+ ASSERT_TRUE (operand_equal_p (dup5_expr, dup5_cst, 0));
+}
+
/* Run all of the selftests within this file. */
void
@@ -14419,6 +14535,7 @@ fold_const_c_tests ()
{
test_arithmetic_folding ();
test_vector_folding ();
+ test_vec_duplicate_folding ();
}
} // namespace selftest
Index: gcc/optabs.def
===================================================================
--- gcc/optabs.def 2017-11-06 12:40:39.845713389 +0000
+++ gcc/optabs.def 2017-11-06 12:40:40.286573506 +0000
@@ -364,3 +364,5 @@ OPTAB_D (atomic_xor_optab, "atomic_xor$I
OPTAB_D (get_thread_pointer_optab, "get_thread_pointer$I$a")
OPTAB_D (set_thread_pointer_optab, "set_thread_pointer$I$a")
+
+OPTAB_DC (vec_duplicate_optab, "vec_duplicate$a", VEC_DUPLICATE)
Index: gcc/optabs-tree.c
===================================================================
--- gcc/optabs-tree.c 2017-11-06 12:40:39.845713389 +0000
+++ gcc/optabs-tree.c 2017-11-06 12:40:40.286573506 +0000
@@ -210,6 +210,9 @@ optab_for_tree_code (enum tree_code code
return TYPE_UNSIGNED (type) ?
vec_pack_ufix_trunc_optab : vec_pack_sfix_trunc_optab;
+ case VEC_DUPLICATE_EXPR:
+ return vec_duplicate_optab;
+
default:
break;
}
Index: gcc/optabs.h
===================================================================
--- gcc/optabs.h 2017-11-06 12:40:39.845713389 +0000
+++ gcc/optabs.h 2017-11-06 12:40:40.287566435 +0000
@@ -181,6 +181,7 @@ extern rtx simplify_expand_binop (machin
enum optab_methods methods);
extern bool force_expand_binop (machine_mode, optab, rtx, rtx, rtx, int,
enum optab_methods);
+extern rtx expand_vector_broadcast (machine_mode, rtx);
/* Generate code for a simple binary or unary operation. "Simple" in
this case means "can be unambiguously described by a (mode, code)
Index: gcc/optabs.c
===================================================================
--- gcc/optabs.c 2017-11-06 12:40:39.845713389 +0000
+++ gcc/optabs.c 2017-11-06 12:40:40.286573506 +0000
@@ -367,7 +367,7 @@ force_expand_binop (machine_mode mode, o
mode of OP must be the element mode of VMODE. If OP is a constant,
then the return value will be a constant. */
-static rtx
+rtx
expand_vector_broadcast (machine_mode vmode, rtx op)
{
enum insn_code icode;
@@ -380,6 +380,16 @@ expand_vector_broadcast (machine_mode vm
if (valid_for_const_vec_duplicate_p (vmode, op))
return gen_const_vec_duplicate (vmode, op);
+ icode = optab_handler (vec_duplicate_optab, vmode);
+ if (icode != CODE_FOR_nothing)
+ {
+ struct expand_operand ops[2];
+ create_output_operand (&ops[0], NULL_RTX, vmode);
+ create_input_operand (&ops[1], op, GET_MODE (op));
+ expand_insn (icode, 2, ops);
+ return ops[0].value;
+ }
+
/* ??? If the target doesn't have a vec_init, then we have no easy way
of performing this operation. Most of this sort of generic support
is hidden away in the vector lowering support in gimple. */
Index: gcc/expr.c
===================================================================
--- gcc/expr.c 2017-11-06 12:40:39.845713389 +0000
+++ gcc/expr.c 2017-11-06 12:40:40.281608865 +0000
@@ -6576,7 +6576,8 @@ store_constructor (tree exp, rtx target,
constructor_elt *ce;
int i;
int need_to_clear;
- int icode = CODE_FOR_nothing;
+ insn_code icode = CODE_FOR_nothing;
+ tree elt;
tree elttype = TREE_TYPE (type);
int elt_size = tree_to_uhwi (TYPE_SIZE (elttype));
machine_mode eltmode = TYPE_MODE (elttype);
@@ -6586,13 +6587,30 @@ store_constructor (tree exp, rtx target,
unsigned n_elts;
alias_set_type alias;
bool vec_vec_init_p = false;
+ machine_mode mode = GET_MODE (target);
gcc_assert (eltmode != BLKmode);
+ /* Try using vec_duplicate_optab for uniform vectors. */
+ if (!TREE_SIDE_EFFECTS (exp)
+ && VECTOR_MODE_P (mode)
+ && eltmode == GET_MODE_INNER (mode)
+ && ((icode = optab_handler (vec_duplicate_optab, mode))
+ != CODE_FOR_nothing)
+ && (elt = uniform_vector_p (exp)))
+ {
+ struct expand_operand ops[2];
+ create_output_operand (&ops[0], target, mode);
+ create_input_operand (&ops[1], expand_normal (elt), eltmode);
+ expand_insn (icode, 2, ops);
+ if (!rtx_equal_p (target, ops[0].value))
+ emit_move_insn (target, ops[0].value);
+ break;
+ }
+
n_elts = TYPE_VECTOR_SUBPARTS (type);
- if (REG_P (target) && VECTOR_MODE_P (GET_MODE (target)))
+ if (REG_P (target) && VECTOR_MODE_P (mode))
{
- machine_mode mode = GET_MODE (target);
machine_mode emode = eltmode;
if (CONSTRUCTOR_NELTS (exp)
@@ -6604,7 +6622,7 @@ store_constructor (tree exp, rtx target,
== n_elts);
emode = TYPE_MODE (etype);
}
- icode = (int) convert_optab_handler (vec_init_optab, mode, emode);
+ icode = convert_optab_handler (vec_init_optab, mode, emode);
if (icode != CODE_FOR_nothing)
{
unsigned int i, n = n_elts;
@@ -6652,7 +6670,7 @@ store_constructor (tree exp, rtx target,
if (need_to_clear && size > 0 && !vector)
{
if (REG_P (target))
- emit_move_insn (target, CONST0_RTX (GET_MODE (target)));
+ emit_move_insn (target, CONST0_RTX (mode));
else
clear_storage (target, GEN_INT (size), BLOCK_OP_NORMAL);
cleared = 1;
@@ -6660,7 +6678,7 @@ store_constructor (tree exp, rtx target,
/* Inform later passes that the old value is dead. */
if (!cleared && !vector && REG_P (target))
- emit_move_insn (target, CONST0_RTX (GET_MODE (target)));
+ emit_move_insn (target, CONST0_RTX (mode));
if (MEM_P (target))
alias = MEM_ALIAS_SET (target);
@@ -6711,8 +6729,7 @@ store_constructor (tree exp, rtx target,
if (vector)
emit_insn (GEN_FCN (icode) (target,
- gen_rtx_PARALLEL (GET_MODE (target),
- vector)));
+ gen_rtx_PARALLEL (mode, vector)));
break;
}
@@ -7690,6 +7707,19 @@ expand_operands (tree exp0, tree exp1, r
}
\f
+/* Expand constant vector element ELT, which has mode MODE. This is used
+ for members of VECTOR_CST and VEC_DUPLICATE_CST. */
+
+static rtx
+const_vector_element (scalar_mode mode, const_tree elt)
+{
+ if (TREE_CODE (elt) == REAL_CST)
+ return const_double_from_real_value (TREE_REAL_CST (elt), mode);
+ if (TREE_CODE (elt) == FIXED_CST)
+ return CONST_FIXED_FROM_FIXED_VALUE (TREE_FIXED_CST (elt), mode);
+ return immed_wide_int_const (wi::to_wide (elt), mode);
+}
+
/* Return a MEM that contains constant EXP. DEFER is as for
output_constant_def and MODIFIER is as for expand_expr. */
@@ -9555,6 +9585,12 @@ #define REDUCE_BIT_FIELD(expr) (reduce_b
target = expand_vec_cond_expr (type, treeop0, treeop1, treeop2, target);
return target;
+ case VEC_DUPLICATE_EXPR:
+ op0 = expand_expr (treeop0, NULL_RTX, VOIDmode, modifier);
+ target = expand_vector_broadcast (mode, op0);
+ gcc_assert (target);
+ return target;
+
case BIT_INSERT_EXPR:
{
unsigned bitpos = tree_to_uhwi (treeop2);
@@ -9988,6 +10024,11 @@ expand_expr_real_1 (tree exp, rtx target
tmode, modifier);
}
+ case VEC_DUPLICATE_CST:
+ op0 = const_vector_element (GET_MODE_INNER (mode),
+ VEC_DUPLICATE_CST_ELT (exp));
+ return gen_const_vec_duplicate (mode, op0);
+
case CONST_DECL:
if (modifier == EXPAND_WRITE)
{
@@ -11749,8 +11790,7 @@ const_vector_from_tree (tree exp)
{
rtvec v;
unsigned i, units;
- tree elt;
- machine_mode inner, mode;
+ machine_mode mode;
mode = TYPE_MODE (TREE_TYPE (exp));
@@ -11761,23 +11801,12 @@ const_vector_from_tree (tree exp)
return const_vector_mask_from_tree (exp);
units = VECTOR_CST_NELTS (exp);
- inner = GET_MODE_INNER (mode);
v = rtvec_alloc (units);
for (i = 0; i < units; ++i)
- {
- elt = VECTOR_CST_ELT (exp, i);
-
- if (TREE_CODE (elt) == REAL_CST)
- RTVEC_ELT (v, i) = const_double_from_real_value (TREE_REAL_CST (elt),
- inner);
- else if (TREE_CODE (elt) == FIXED_CST)
- RTVEC_ELT (v, i) = CONST_FIXED_FROM_FIXED_VALUE (TREE_FIXED_CST (elt),
- inner);
- else
- RTVEC_ELT (v, i) = immed_wide_int_const (wi::to_wide (elt), inner);
- }
+ RTVEC_ELT (v, i) = const_vector_element (GET_MODE_INNER (mode),
+ VECTOR_CST_ELT (exp, i));
return gen_rtx_CONST_VECTOR (mode, v);
}
Index: gcc/internal-fn.c
===================================================================
--- gcc/internal-fn.c 2017-11-06 12:40:39.845713389 +0000
+++ gcc/internal-fn.c 2017-11-06 12:40:40.284587650 +0000
@@ -1911,12 +1911,12 @@ expand_vector_ubsan_overflow (location_t
emit_move_insn (cntvar, const0_rtx);
emit_label (loop_lab);
}
- if (TREE_CODE (arg0) != VECTOR_CST)
+ if (!CONSTANT_CLASS_P (arg0))
{
rtx arg0r = expand_normal (arg0);
arg0 = make_tree (TREE_TYPE (arg0), arg0r);
}
- if (TREE_CODE (arg1) != VECTOR_CST)
+ if (!CONSTANT_CLASS_P (arg1))
{
rtx arg1r = expand_normal (arg1);
arg1 = make_tree (TREE_TYPE (arg1), arg1r);
Index: gcc/tree-cfg.c
===================================================================
--- gcc/tree-cfg.c 2017-11-06 12:40:39.845713389 +0000
+++ gcc/tree-cfg.c 2017-11-06 12:40:40.287566435 +0000
@@ -3798,6 +3798,17 @@ verify_gimple_assign_unary (gassign *stm
case CONJ_EXPR:
break;
+ case VEC_DUPLICATE_EXPR:
+ if (TREE_CODE (lhs_type) != VECTOR_TYPE
+ || !useless_type_conversion_p (TREE_TYPE (lhs_type), rhs1_type))
+ {
+ error ("vec_duplicate should be from a scalar to a like vector");
+ debug_generic_expr (lhs_type);
+ debug_generic_expr (rhs1_type);
+ return true;
+ }
+ return false;
+
default:
gcc_unreachable ();
}
@@ -4468,6 +4479,7 @@ verify_gimple_assign_single (gassign *st
case FIXED_CST:
case COMPLEX_CST:
case VECTOR_CST:
+ case VEC_DUPLICATE_CST:
case STRING_CST:
return res;
Index: gcc/tree-inline.c
===================================================================
--- gcc/tree-inline.c 2017-11-06 12:40:39.845713389 +0000
+++ gcc/tree-inline.c 2017-11-06 12:40:40.289552291 +0000
@@ -3930,6 +3930,7 @@ estimate_operator_cost (enum tree_code c
case VEC_PACK_FIX_TRUNC_EXPR:
case VEC_WIDEN_LSHIFT_HI_EXPR:
case VEC_WIDEN_LSHIFT_LO_EXPR:
+ case VEC_DUPLICATE_EXPR:
return 1;
* Re: [06/nn] Add VEC_SERIES_{CST,EXPR} and associated optab
2017-10-26 12:43 ` Richard Biener
@ 2017-11-06 15:21 ` Richard Sandiford
2017-11-07 10:38 ` Richard Biener
0 siblings, 1 reply; 90+ messages in thread
From: Richard Sandiford @ 2017-11-06 15:21 UTC (permalink / raw)
To: Richard Biener; +Cc: GCC Patches
Richard Biener <richard.guenther@gmail.com> writes:
> On Thu, Oct 26, 2017 at 2:23 PM, Richard Biener
> <richard.guenther@gmail.com> wrote:
>> On Mon, Oct 23, 2017 at 1:20 PM, Richard Sandiford
>> <richard.sandiford@linaro.org> wrote:
>>> Similarly to the VEC_DUPLICATE_{CST,EXPR}, this patch adds two
>>> tree code equivalents of the VEC_SERIES rtx code. VEC_SERIES_EXPR
>>> is for non-constant inputs and is a normal tcc_binary. VEC_SERIES_CST
>>> is a tcc_constant.
>>>
>>> Like VEC_DUPLICATE_CST, VEC_SERIES_CST is only used for variable-length
>>> vectors. This avoids the need to handle combinations of VECTOR_CST
>>> and VEC_SERIES_CST.
>>
>> Similar to the other patch can you document and verify that VEC_SERIES_CST
>> is only used on variable length vectors?
OK, done with the below, which also makes build_vec_series create
a VECTOR_CST for fixed-length vectors. I also added some selftests.
>> Ok with that change.
>
> Btw, did you think of merging VEC_DUPLICATE_CST with VEC_SERIES_CST
> via setting step == 0?  I think you can do {1, 1, 1, 1, ...} + {1, 2, 3, 4, 5}
> constant folding but you don't implement that.
That was done via vec_series_equivalent_p.
The problem with using VEC_SERIES with a step of zero is that we'd
then have to define VEC_SERIES for floats too (even in strict math
modes), but probably only for the special case of a zero step.
I think that'd end up being more complicated overall.
> Propagation can also turn
> VEC_SERIES_EXPR into VEC_SERIES_CST and VEC_DUPLICATE_EXPR
> into VEC_DUPLICATE_CST (didn't see the former, don't remember the latter).
VEC_SERIES_EXPR -> VEC_SERIES_CST/VECTOR_CST was done by const_binop.
And yeah, VEC_DUPLICATE_EXPR -> VEC_DUPLICATE_CST/VECTOR_CST was done
by const_unop in the VEC_DUPLICATE patch.
Tested as before. OK to install?
Thanks,
Richard
2017-11-06 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* doc/generic.texi (VEC_SERIES_CST, VEC_SERIES_EXPR): Document.
* doc/md.texi (vec_series@var{m}): Document.
* tree.def (VEC_SERIES_CST, VEC_SERIES_EXPR): New tree codes.
* tree.h (TREE_OVERFLOW): Add VEC_SERIES_CST to the list of valid
codes.
(VEC_SERIES_CST_BASE, VEC_SERIES_CST_STEP): New macros.
(build_vec_series): Declare.
* tree.c (tree_node_structure_for_code, tree_code_size, tree_size)
(add_expr, walk_tree_1, drop_tree_overflow): Handle VEC_SERIES_CST.
(build_vec_series_cst, build_vec_series): New functions.
* cfgexpand.c (expand_debug_expr): Handle the new codes.
* tree-pretty-print.c (dump_generic_node): Likewise.
* dwarf2out.c (rtl_for_decl_init): Handle VEC_SERIES_CST.
* gimple-expr.h (is_gimple_constant): Likewise.
* gimplify.c (gimplify_expr): Likewise.
* graphite-scop-detection.c (scan_tree_for_params): Likewise.
* ipa-icf-gimple.c (func_checker::compare_cst_or_decl): Likewise.
(func_checker::compare_operand): Likewise.
* ipa-icf.c (sem_item::add_expr, sem_variable::equals): Likewise.
* print-tree.c (print_node): Likewise.
* tree-ssa-loop.c (for_each_index): Likewise.
* tree-ssa-pre.c (create_component_ref_by_pieces_1): Likewise.
* tree-ssa-sccvn.c (copy_reference_ops_from_ref): Likewise.
(ao_ref_init_from_vn_reference): Likewise.
* varasm.c (const_hash_1, compare_constant): Likewise.
* fold-const.c (negate_expr_p, fold_negate_expr_1, operand_equal_p)
(fold_checksum_tree): Likewise.
(vec_series_equivalent_p): New function.
(const_binop): Use it. Fold VEC_SERIES_EXPRs of constants.
(test_vec_series_folding): New function.
(fold_const_c_tests): Call it.
* expmed.c (make_tree): Handle VEC_SERIES.
* gimple-pretty-print.c (dump_binary_rhs): Likewise.
* tree-inline.c (estimate_operator_cost): Likewise.
* expr.c (const_vector_element): Include VEC_SERIES_CST in comment.
(expand_expr_real_2): Handle VEC_SERIES_EXPR.
(expand_expr_real_1): Handle VEC_SERIES_CST.
* optabs.def (vec_series_optab): New optab.
* optabs.h (expand_vec_series_expr): Declare.
* optabs.c (expand_vec_series_expr): New function.
* optabs-tree.c (optab_for_tree_code): Handle VEC_SERIES_EXPR.
* tree-cfg.c (verify_gimple_assign_binary): Handle VEC_SERIES_EXPR.
(verify_gimple_assign_single): Handle VEC_SERIES_CST.
* tree-vect-generic.c (expand_vector_operations_1): Check that
the operands also have vector type.
Index: gcc/doc/generic.texi
===================================================================
--- gcc/doc/generic.texi 2017-11-06 12:20:31.075167123 +0000
+++ gcc/doc/generic.texi 2017-11-06 12:21:29.321209826 +0000
@@ -1037,6 +1037,7 @@ As this example indicates, the operands
@tindex COMPLEX_CST
@tindex VECTOR_CST
@tindex VEC_DUPLICATE_CST
+@tindex VEC_SERIES_CST
@tindex STRING_CST
@findex TREE_STRING_LENGTH
@findex TREE_STRING_POINTER
@@ -1098,6 +1099,18 @@ instead. The scalar element value is gi
@code{VEC_DUPLICATE_CST_ELT} and has the same restrictions as the
element of a @code{VECTOR_CST}.
+@item VEC_SERIES_CST
+These nodes represent a vector constant in which element @var{i}
+has the value @samp{@var{base} + @var{i} * @var{step}}, for some
+constant @var{base} and @var{step}. The value of @var{base} is
+given by @code{VEC_SERIES_CST_BASE} and the value of @var{step} is
+given by @code{VEC_SERIES_CST_STEP}.
+
+At present only variable-length vectors use @code{VEC_SERIES_CST};
+constant-length vectors use @code{VECTOR_CST} instead. The nodes
+are also restricted to integral types, in order to avoid specifying
+the rounding behavior for floating-point types.
+
@item STRING_CST
These nodes represent string-constants. The @code{TREE_STRING_LENGTH}
returns the length of the string, as an @code{int}. The
@@ -1702,6 +1715,7 @@ a value from @code{enum annot_expr_kind}
@node Vectors
@subsection Vectors
@tindex VEC_DUPLICATE_EXPR
+@tindex VEC_SERIES_EXPR
@tindex VEC_LSHIFT_EXPR
@tindex VEC_RSHIFT_EXPR
@tindex VEC_WIDEN_MULT_HI_EXPR
@@ -1721,6 +1735,14 @@ a value from @code{enum annot_expr_kind}
This node has a single operand and represents a vector in which every
element is equal to that operand.
+@item VEC_SERIES_EXPR
+This node represents a vector formed from a scalar base and step,
+given as the first and second operands respectively. Element @var{i}
+of the result is equal to @samp{@var{base} + @var{i}*@var{step}}.
+
+This node is restricted to integral types, in order to avoid
+specifying the rounding behavior for floating-point types.
+
@item VEC_LSHIFT_EXPR
@itemx VEC_RSHIFT_EXPR
These nodes represent whole vector left and right shifts, respectively.
Index: gcc/doc/md.texi
===================================================================
--- gcc/doc/md.texi 2017-11-06 12:20:31.076995065 +0000
+++ gcc/doc/md.texi 2017-11-06 12:21:29.322209826 +0000
@@ -4899,6 +4899,19 @@ vectors go through the @code{mov@var{m}}
This pattern is not allowed to @code{FAIL}.
+@cindex @code{vec_series@var{m}} instruction pattern
+@item @samp{vec_series@var{m}}
+Initialize vector output operand 0 so that element @var{i} is equal to
+operand 1 plus @var{i} times operand 2. In other words, create a linear
+series whose base value is operand 1 and whose step is operand 2.
+
+The vector output has mode @var{m} and the scalar inputs have the mode
+appropriate for one element of @var{m}. This pattern is not used for
+floating-point vectors, in order to avoid having to specify the
+rounding behavior for @var{i} > 1.
+
+This pattern is not allowed to @code{FAIL}.
+
@cindex @code{vec_cmp@var{m}@var{n}} instruction pattern
@item @samp{vec_cmp@var{m}@var{n}}
Output a vector comparison. Operand 0 of mode @var{n} is the destination for
Index: gcc/tree.def
===================================================================
--- gcc/tree.def 2017-11-06 12:20:31.098930366 +0000
+++ gcc/tree.def 2017-11-06 12:21:29.335209826 +0000
@@ -309,6 +309,12 @@ DEFTREECODE (VECTOR_CST, "vector_cst", t
vectors; fixed-length vectors must use VECTOR_CST instead. */
DEFTREECODE (VEC_DUPLICATE_CST, "vec_duplicate_cst", tcc_constant, 0)
+/* Represents a vector constant in which element i is equal to
+ VEC_SERIES_CST_BASE + i * VEC_SERIES_CST_STEP. This is only ever
+ used for variable-length vectors; fixed-length vectors must use
+ VECTOR_CST instead. */
+DEFTREECODE (VEC_SERIES_CST, "vec_series_cst", tcc_constant, 0)
+
/* Contents are TREE_STRING_LENGTH and the actual contents of the string. */
DEFTREECODE (STRING_CST, "string_cst", tcc_constant, 0)
@@ -542,6 +548,16 @@ DEFTREECODE (COND_EXPR, "cond_expr", tcc
/* Represents a vector in which every element is equal to operand 0. */
DEFTREECODE (VEC_DUPLICATE_EXPR, "vec_duplicate_expr", tcc_unary, 1)
+/* Vector series created from a start (base) value and a step.
+
+ A = VEC_SERIES_EXPR (B, C)
+
+ means
+
+ for (i = 0; i < N; i++)
+ A[i] = B + C * i; */
+DEFTREECODE (VEC_SERIES_EXPR, "vec_series_expr", tcc_binary, 2)
+
/* Vector conditional expression. It is like COND_EXPR, but with
vector operands.
Index: gcc/tree.h
===================================================================
--- gcc/tree.h 2017-11-06 12:20:31.099844337 +0000
+++ gcc/tree.h 2017-11-06 12:21:29.336209826 +0000
@@ -709,8 +709,8 @@ #define TREE_SYMBOL_REFERENCED(NODE) \
#define TYPE_REF_CAN_ALIAS_ALL(NODE) \
(PTR_OR_REF_CHECK (NODE)->base.static_flag)
-/* In an INTEGER_CST, REAL_CST, COMPLEX_CST, VECTOR_CST or VEC_DUPLICATE_CST,
- this means there was an overflow in folding. */
+/* In an INTEGER_CST, REAL_CST, COMPLEX_CST, VECTOR_CST, VEC_DUPLICATE_CST
+ or VEC_SERIES_CST, this means there was an overflow in folding. */
#define TREE_OVERFLOW(NODE) (CST_CHECK (NODE)->base.public_flag)
@@ -1013,6 +1013,12 @@ #define VECTOR_CST_ELT(NODE,IDX) (VECTOR
#define VEC_DUPLICATE_CST_ELT(NODE) \
(VEC_DUPLICATE_CST_CHECK (NODE)->vector.elts[0])
+/* In a VEC_SERIES_CST node. */
+#define VEC_SERIES_CST_BASE(NODE) \
+ (VEC_SERIES_CST_CHECK (NODE)->vector.elts[0])
+#define VEC_SERIES_CST_STEP(NODE) \
+ (VEC_SERIES_CST_CHECK (NODE)->vector.elts[1])
+
/* Define fields and accessors for some special-purpose tree nodes. */
#define IDENTIFIER_LENGTH(NODE) \
@@ -4017,6 +4023,7 @@ extern tree make_vector (unsigned CXX_ME
extern tree build_vector (tree, vec<tree> CXX_MEM_STAT_INFO);
extern tree build_vector_from_ctor (tree, vec<constructor_elt, va_gc> *);
extern tree build_vector_from_val (tree, tree);
+extern tree build_vec_series (tree, tree, tree);
extern void recompute_constructor_flags (tree);
extern void verify_constructor_flags (tree);
extern tree build_constructor (tree, vec<constructor_elt, va_gc> *);
Index: gcc/tree.c
===================================================================
--- gcc/tree.c 2017-11-06 12:20:31.098930366 +0000
+++ gcc/tree.c 2017-11-06 12:21:29.335209826 +0000
@@ -465,6 +465,7 @@ tree_node_structure_for_code (enum tree_
case COMPLEX_CST: return TS_COMPLEX;
case VECTOR_CST: return TS_VECTOR;
case VEC_DUPLICATE_CST: return TS_VECTOR;
+ case VEC_SERIES_CST: return TS_VECTOR;
case STRING_CST: return TS_STRING;
/* tcc_exceptional cases. */
case ERROR_MARK: return TS_COMMON;
@@ -831,6 +832,7 @@ tree_code_size (enum tree_code code)
case COMPLEX_CST: return sizeof (tree_complex);
case VECTOR_CST: return sizeof (tree_vector);
case VEC_DUPLICATE_CST: return sizeof (tree_vector);
+ case VEC_SERIES_CST: return sizeof (tree_vector) + sizeof (tree);
case STRING_CST: gcc_unreachable ();
default:
gcc_checking_assert (code >= NUM_TREE_CODES);
@@ -895,6 +897,9 @@ tree_size (const_tree node)
case VEC_DUPLICATE_CST:
return sizeof (struct tree_vector);
+ case VEC_SERIES_CST:
+ return sizeof (struct tree_vector) + sizeof (tree);
+
case STRING_CST:
return TREE_STRING_LENGTH (node) + offsetof (struct tree_string, str) + 1;
@@ -1730,6 +1735,34 @@ build_vec_duplicate_cst (tree type, tree
return t;
}
+/* Build a new VEC_SERIES_CST with type TYPE, base BASE and step STEP.
+
+ Note that this function is only suitable for callers that specifically
+ need a VEC_SERIES_CST node. Use build_vec_series to build a general
+ series vector from a general base and step. */
+
+static tree
+build_vec_series_cst (tree type, tree base, tree step MEM_STAT_DECL)
+{
+ /* Shouldn't be used until we have variable-length vectors. */
+ gcc_unreachable ();
+
+ int length = sizeof (struct tree_vector) + sizeof (tree);
+
+ record_node_allocation_statistics (VEC_SERIES_CST, length);
+
+ tree t = ggc_alloc_cleared_tree_node_stat (length PASS_MEM_STAT);
+
+ TREE_SET_CODE (t, VEC_SERIES_CST);
+ TREE_TYPE (t) = type;
+ t->base.u.nelts = 2;
+ VEC_SERIES_CST_BASE (t) = base;
+ VEC_SERIES_CST_STEP (t) = step;
+ TREE_CONSTANT (t) = 1;
+
+ return t;
+}
+
/* Build a newly constructed VECTOR_CST node of length LEN. */
tree
@@ -1847,6 +1880,33 @@ build_vector_from_val (tree vectype, tre
}
}
+/* Build a vector series of type TYPE in which element I has the value
+ BASE + I * STEP. The result is a constant if BASE and STEP are constant
+ and a VEC_SERIES_EXPR otherwise. */
+
+tree
+build_vec_series (tree type, tree base, tree step)
+{
+ if (integer_zerop (step))
+ return build_vector_from_val (type, base);
+ if (CONSTANT_CLASS_P (base) && CONSTANT_CLASS_P (step))
+ {
+ unsigned int nunits = TYPE_VECTOR_SUBPARTS (type);
+ if (0)
+ return build_vec_series_cst (type, base, step);
+
+ auto_vec<tree, 32> v (nunits);
+ v.quick_push (base);
+ for (unsigned int i = 1; i < nunits; ++i)
+ {
+ base = const_binop (PLUS_EXPR, TREE_TYPE (base), base, step);
+ v.quick_push (base);
+ }
+ return build_vector (type, v);
+ }
+ return build2 (VEC_SERIES_EXPR, type, base, step);
+}
+
/* Something has messed with the elements of CONSTRUCTOR C after it was built;
calculate TREE_CONSTANT and TREE_SIDE_EFFECTS. */
@@ -7162,6 +7222,10 @@ add_expr (const_tree t, inchash::hash &h
case VEC_DUPLICATE_CST:
inchash::add_expr (VEC_DUPLICATE_CST_ELT (t), hstate);
return;
+ case VEC_SERIES_CST:
+ inchash::add_expr (VEC_SERIES_CST_BASE (t), hstate);
+ inchash::add_expr (VEC_SERIES_CST_STEP (t), hstate);
+ return;
case SSA_NAME:
/* We can just compare by pointer. */
hstate.add_hwi (SSA_NAME_VERSION (t));
@@ -11210,6 +11274,7 @@ #define WALK_SUBTREE_TAIL(NODE) \
case FIXED_CST:
case VECTOR_CST:
case VEC_DUPLICATE_CST:
+ case VEC_SERIES_CST:
case STRING_CST:
case BLOCK:
case PLACEHOLDER_EXPR:
@@ -12502,6 +12567,15 @@ drop_tree_overflow (tree t)
if (TREE_OVERFLOW (*elt))
*elt = drop_tree_overflow (*elt);
}
+ if (TREE_CODE (t) == VEC_SERIES_CST)
+ {
+ tree *elt = &VEC_SERIES_CST_BASE (t);
+ if (TREE_OVERFLOW (*elt))
+ *elt = drop_tree_overflow (*elt);
+ elt = &VEC_SERIES_CST_STEP (t);
+ if (TREE_OVERFLOW (*elt))
+ *elt = drop_tree_overflow (*elt);
+ }
return t;
}
Index: gcc/cfgexpand.c
===================================================================
--- gcc/cfgexpand.c 2017-11-06 12:20:31.074253152 +0000
+++ gcc/cfgexpand.c 2017-11-06 12:21:29.321209826 +0000
@@ -5070,6 +5070,8 @@ expand_debug_expr (tree exp)
case VEC_PERM_EXPR:
case VEC_DUPLICATE_CST:
case VEC_DUPLICATE_EXPR:
+ case VEC_SERIES_CST:
+ case VEC_SERIES_EXPR:
return NULL;
/* Misc codes. */
Index: gcc/tree-pretty-print.c
===================================================================
--- gcc/tree-pretty-print.c 2017-11-06 12:20:31.093446541 +0000
+++ gcc/tree-pretty-print.c 2017-11-06 12:21:29.333209826 +0000
@@ -1808,6 +1808,14 @@ dump_generic_node (pretty_printer *pp, t
pp_string (pp, ", ... }");
break;
+ case VEC_SERIES_CST:
+ pp_string (pp, "{ ");
+ dump_generic_node (pp, VEC_SERIES_CST_BASE (node), spc, flags, false);
+ pp_string (pp, ", +, ");
+ dump_generic_node (pp, VEC_SERIES_CST_STEP (node), spc, flags, false);
+ pp_string (pp, "}");
+ break;
+
case FUNCTION_TYPE:
case METHOD_TYPE:
dump_generic_node (pp, TREE_TYPE (node), spc, flags, false);
@@ -3221,6 +3229,7 @@ dump_generic_node (pretty_printer *pp, t
pp_string (pp, " > ");
break;
+ case VEC_SERIES_EXPR:
case VEC_WIDEN_MULT_HI_EXPR:
case VEC_WIDEN_MULT_LO_EXPR:
case VEC_WIDEN_MULT_EVEN_EXPR:
Index: gcc/dwarf2out.c
===================================================================
--- gcc/dwarf2out.c 2017-11-06 12:20:31.080650948 +0000
+++ gcc/dwarf2out.c 2017-11-06 12:21:29.325209826 +0000
@@ -18879,6 +18879,7 @@ rtl_for_decl_init (tree init, tree type)
{
case VECTOR_CST:
case VEC_DUPLICATE_CST:
+ case VEC_SERIES_CST:
break;
case CONSTRUCTOR:
if (TREE_CONSTANT (init))
Index: gcc/gimple-expr.h
===================================================================
--- gcc/gimple-expr.h 2017-11-06 12:20:31.087048745 +0000
+++ gcc/gimple-expr.h 2017-11-06 12:21:29.328209826 +0000
@@ -135,6 +135,7 @@ is_gimple_constant (const_tree t)
case COMPLEX_CST:
case VECTOR_CST:
case VEC_DUPLICATE_CST:
+ case VEC_SERIES_CST:
case STRING_CST:
return true;
Index: gcc/gimplify.c
===================================================================
--- gcc/gimplify.c 2017-11-06 12:20:31.088876686 +0000
+++ gcc/gimplify.c 2017-11-06 12:21:29.329209826 +0000
@@ -11508,6 +11508,7 @@ gimplify_expr (tree *expr_p, gimple_seq
case COMPLEX_CST:
case VECTOR_CST:
case VEC_DUPLICATE_CST:
+ case VEC_SERIES_CST:
/* Drop the overflow flag on constants, we do not want
that in the GIMPLE IL. */
if (TREE_OVERFLOW_P (*expr_p))
Index: gcc/graphite-scop-detection.c
===================================================================
--- gcc/graphite-scop-detection.c 2017-11-06 12:20:31.088876686 +0000
+++ gcc/graphite-scop-detection.c 2017-11-06 12:21:29.329209826 +0000
@@ -1213,6 +1213,7 @@ scan_tree_for_params (sese_info_p s, tre
case COMPLEX_CST:
case VECTOR_CST:
case VEC_DUPLICATE_CST:
+ case VEC_SERIES_CST:
break;
default:
Index: gcc/ipa-icf-gimple.c
===================================================================
--- gcc/ipa-icf-gimple.c 2017-11-06 12:20:31.088876686 +0000
+++ gcc/ipa-icf-gimple.c 2017-11-06 12:21:29.329209826 +0000
@@ -334,6 +334,7 @@ func_checker::compare_cst_or_decl (tree
case COMPLEX_CST:
case VECTOR_CST:
case VEC_DUPLICATE_CST:
+ case VEC_SERIES_CST:
case STRING_CST:
case REAL_CST:
{
@@ -530,6 +531,7 @@ func_checker::compare_operand (tree t1,
case COMPLEX_CST:
case VECTOR_CST:
case VEC_DUPLICATE_CST:
+ case VEC_SERIES_CST:
case STRING_CST:
case REAL_CST:
case FUNCTION_DECL:
Index: gcc/ipa-icf.c
===================================================================
--- gcc/ipa-icf.c 2017-11-06 12:20:31.089790657 +0000
+++ gcc/ipa-icf.c 2017-11-06 12:21:29.330209826 +0000
@@ -1480,6 +1480,7 @@ sem_item::add_expr (const_tree exp, inch
case COMPLEX_CST:
case VECTOR_CST:
case VEC_DUPLICATE_CST:
+ case VEC_SERIES_CST:
inchash::add_expr (exp, hstate);
break;
case CONSTRUCTOR:
@@ -2040,6 +2041,11 @@ sem_variable::equals (tree t1, tree t2)
case VEC_DUPLICATE_CST:
return sem_variable::equals (VEC_DUPLICATE_CST_ELT (t1),
VEC_DUPLICATE_CST_ELT (t2));
+ case VEC_SERIES_CST:
+ return (sem_variable::equals (VEC_SERIES_CST_BASE (t1),
+ VEC_SERIES_CST_BASE (t2))
+ && sem_variable::equals (VEC_SERIES_CST_STEP (t1),
+ VEC_SERIES_CST_STEP (t2)));
case ARRAY_REF:
case ARRAY_RANGE_REF:
{
Index: gcc/print-tree.c
===================================================================
--- gcc/print-tree.c 2017-11-06 12:20:31.090704628 +0000
+++ gcc/print-tree.c 2017-11-06 12:21:29.331209826 +0000
@@ -787,6 +787,11 @@ print_node (FILE *file, const char *pref
print_node (file, "elt", VEC_DUPLICATE_CST_ELT (node), indent + 4);
break;
+ case VEC_SERIES_CST:
+ print_node (file, "base", VEC_SERIES_CST_BASE (node), indent + 4);
+ print_node (file, "step", VEC_SERIES_CST_STEP (node), indent + 4);
+ break;
+
case COMPLEX_CST:
print_node (file, "real", TREE_REALPART (node), indent + 4);
print_node (file, "imag", TREE_IMAGPART (node), indent + 4);
Index: gcc/tree-ssa-loop.c
===================================================================
--- gcc/tree-ssa-loop.c 2017-11-06 12:20:31.093446541 +0000
+++ gcc/tree-ssa-loop.c 2017-11-06 12:21:29.333209826 +0000
@@ -617,6 +617,7 @@ for_each_index (tree *addr_p, bool (*cbc
case RESULT_DECL:
case VECTOR_CST:
case VEC_DUPLICATE_CST:
+ case VEC_SERIES_CST:
case COMPLEX_CST:
case INTEGER_CST:
case REAL_CST:
Index: gcc/tree-ssa-pre.c
===================================================================
--- gcc/tree-ssa-pre.c 2017-11-06 12:20:31.093446541 +0000
+++ gcc/tree-ssa-pre.c 2017-11-06 12:21:29.333209826 +0000
@@ -2628,6 +2628,7 @@ create_component_ref_by_pieces_1 (basic_
case COMPLEX_CST:
case VECTOR_CST:
case VEC_DUPLICATE_CST:
+ case VEC_SERIES_CST:
case REAL_CST:
case CONSTRUCTOR:
case VAR_DECL:
Index: gcc/tree-ssa-sccvn.c
===================================================================
--- gcc/tree-ssa-sccvn.c 2017-11-06 12:20:31.094360512 +0000
+++ gcc/tree-ssa-sccvn.c 2017-11-06 12:21:29.334209826 +0000
@@ -867,6 +867,7 @@ copy_reference_ops_from_ref (tree ref, v
case COMPLEX_CST:
case VECTOR_CST:
case VEC_DUPLICATE_CST:
+ case VEC_SERIES_CST:
case REAL_CST:
case FIXED_CST:
case CONSTRUCTOR:
@@ -1060,6 +1061,7 @@ ao_ref_init_from_vn_reference (ao_ref *r
case COMPLEX_CST:
case VECTOR_CST:
case VEC_DUPLICATE_CST:
+ case VEC_SERIES_CST:
case REAL_CST:
case CONSTRUCTOR:
case CONST_DECL:
Index: gcc/varasm.c
===================================================================
--- gcc/varasm.c 2017-11-06 12:20:31.100758308 +0000
+++ gcc/varasm.c 2017-11-06 12:21:29.337209826 +0000
@@ -3065,6 +3065,10 @@ const_hash_1 (const tree exp)
return (const_hash_1 (TREE_OPERAND (exp, 0)) * 9
+ const_hash_1 (TREE_OPERAND (exp, 1)));
+ case VEC_SERIES_CST:
+ return (const_hash_1 (VEC_SERIES_CST_BASE (exp)) * 11
+ + const_hash_1 (VEC_SERIES_CST_STEP (exp)));
+
CASE_CONVERT:
return const_hash_1 (TREE_OPERAND (exp, 0)) * 7 + 2;
@@ -3165,6 +3169,12 @@ compare_constant (const tree t1, const t
return compare_constant (VEC_DUPLICATE_CST_ELT (t1),
VEC_DUPLICATE_CST_ELT (t2));
+ case VEC_SERIES_CST:
+ return (compare_constant (VEC_SERIES_CST_BASE (t1),
+ VEC_SERIES_CST_BASE (t2))
+ && compare_constant (VEC_SERIES_CST_STEP (t1),
+ VEC_SERIES_CST_STEP (t2)));
+
case CONSTRUCTOR:
{
vec<constructor_elt, va_gc> *v1, *v2;
Index: gcc/fold-const.c
===================================================================
--- gcc/fold-const.c 2017-11-06 12:20:31.087048745 +0000
+++ gcc/fold-const.c 2017-11-06 12:21:29.328209826 +0000
@@ -421,6 +421,10 @@ negate_expr_p (tree t)
case VEC_DUPLICATE_CST:
return negate_expr_p (VEC_DUPLICATE_CST_ELT (t));
+ case VEC_SERIES_CST:
+ return (negate_expr_p (VEC_SERIES_CST_BASE (t))
+ && negate_expr_p (VEC_SERIES_CST_STEP (t)));
+
case COMPLEX_EXPR:
return negate_expr_p (TREE_OPERAND (t, 0))
&& negate_expr_p (TREE_OPERAND (t, 1));
@@ -590,6 +594,17 @@ fold_negate_expr_1 (location_t loc, tree
return build_vector_from_val (type, sub);
}
+ case VEC_SERIES_CST:
+ {
+ tree neg_base = fold_negate_expr (loc, VEC_SERIES_CST_BASE (t));
+ if (!neg_base)
+ return NULL_TREE;
+ tree neg_step = fold_negate_expr (loc, VEC_SERIES_CST_STEP (t));
+ if (!neg_step)
+ return NULL_TREE;
+ return build_vec_series (type, neg_base, neg_step);
+ }
+
case COMPLEX_EXPR:
if (negate_expr_p (t))
return fold_build2_loc (loc, COMPLEX_EXPR, type,
@@ -1131,6 +1146,28 @@ int_const_binop (enum tree_code code, co
return int_const_binop_1 (code, arg1, arg2, 1);
}
+/* Return true if EXP is a VEC_DUPLICATE_CST or a VEC_SERIES_CST,
+ and if so express it as a linear series in *BASE_OUT and *STEP_OUT.
+ The step will be zero for VEC_DUPLICATE_CST. */
+
+static bool
+vec_series_equivalent_p (const_tree exp, tree *base_out, tree *step_out)
+{
+ if (TREE_CODE (exp) == VEC_SERIES_CST)
+ {
+ *base_out = VEC_SERIES_CST_BASE (exp);
+ *step_out = VEC_SERIES_CST_STEP (exp);
+ return true;
+ }
+ if (TREE_CODE (exp) == VEC_DUPLICATE_CST)
+ {
+ *base_out = VEC_DUPLICATE_CST_ELT (exp);
+ *step_out = build_zero_cst (TREE_TYPE (*base_out));
+ return true;
+ }
+ return false;
+}
+
/* Combine two constants ARG1 and ARG2 under operation CODE to produce a new
constant. We assume ARG1 and ARG2 have the same data type, or at least
are the same kind of constant and the same machine mode. Return zero if
@@ -1457,6 +1494,20 @@ const_binop (enum tree_code code, tree a
return build_vector_from_val (TREE_TYPE (arg1), sub);
}
+ tree base1, step1, base2, step2;
+ if ((code == PLUS_EXPR || code == MINUS_EXPR)
+ && vec_series_equivalent_p (arg1, &base1, &step1)
+ && vec_series_equivalent_p (arg2, &base2, &step2))
+ {
+ tree new_base = const_binop (code, base1, base2);
+ if (!new_base)
+ return NULL_TREE;
+ tree new_step = const_binop (code, step1, step2);
+ if (!new_step)
+ return NULL_TREE;
+ return build_vec_series (TREE_TYPE (arg1), new_base, new_step);
+ }
+
/* Shifts allow a scalar offset for a vector. */
if (TREE_CODE (arg1) == VECTOR_CST
&& TREE_CODE (arg2) == INTEGER_CST)
@@ -1505,6 +1556,12 @@ const_binop (enum tree_code code, tree t
result as argument put those cases that need it here. */
switch (code)
{
+ case VEC_SERIES_EXPR:
+ if (CONSTANT_CLASS_P (arg1)
+ && CONSTANT_CLASS_P (arg2))
+ return build_vec_series (type, arg1, arg2);
+ return NULL_TREE;
+
case COMPLEX_EXPR:
if ((TREE_CODE (arg1) == REAL_CST
&& TREE_CODE (arg2) == REAL_CST)
@@ -3008,6 +3065,12 @@ operand_equal_p (const_tree arg0, const_
return operand_equal_p (VEC_DUPLICATE_CST_ELT (arg0),
VEC_DUPLICATE_CST_ELT (arg1), flags);
+ case VEC_SERIES_CST:
+ return (operand_equal_p (VEC_SERIES_CST_BASE (arg0),
+ VEC_SERIES_CST_BASE (arg1), flags)
+ && operand_equal_p (VEC_SERIES_CST_STEP (arg0),
+ VEC_SERIES_CST_STEP (arg1), flags));
+
case COMPLEX_CST:
return (operand_equal_p (TREE_REALPART (arg0), TREE_REALPART (arg1),
flags)
@@ -12020,6 +12083,10 @@ fold_checksum_tree (const_tree expr, str
case VEC_DUPLICATE_CST:
fold_checksum_tree (VEC_DUPLICATE_CST_ELT (expr), ctx, ht);
break;
+ case VEC_SERIES_CST:
+ fold_checksum_tree (VEC_SERIES_CST_BASE (expr), ctx, ht);
+ fold_checksum_tree (VEC_SERIES_CST_STEP (expr), ctx, ht);
+ break;
default:
break;
}
@@ -14528,6 +14595,54 @@ test_vec_duplicate_folding ()
ASSERT_TRUE (operand_equal_p (dup5_expr, dup5_cst, 0));
}
+/* Verify folding of VEC_SERIES_CSTs and VEC_SERIES_EXPRs. */
+
+static void
+test_vec_series_folding ()
+{
+ scalar_int_mode int_mode = SCALAR_INT_TYPE_MODE (ssizetype);
+ machine_mode vec_mode = targetm.vectorize.preferred_simd_mode (int_mode);
+ unsigned int nunits = GET_MODE_NUNITS (vec_mode);
+ if (nunits == 1)
+ nunits = 4;
+
+ tree type = build_vector_type (ssizetype, nunits);
+ tree s5_4 = build_vec_series (type, ssize_int (5), ssize_int (4));
+ tree s3_9 = build_vec_series (type, ssize_int (3), ssize_int (9));
+
+ tree neg_s5_4_a = fold_unary (NEGATE_EXPR, type, s5_4);
+ tree neg_s5_4_b = build_vec_series (type, ssize_int (-5), ssize_int (-4));
+ ASSERT_TRUE (operand_equal_p (neg_s5_4_a, neg_s5_4_b, 0));
+
+ tree s8_s13_a = fold_binary (PLUS_EXPR, type, s5_4, s3_9);
+ tree s8_s13_b = build_vec_series (type, ssize_int (8), ssize_int (13));
+ ASSERT_TRUE (operand_equal_p (s8_s13_a, s8_s13_b, 0));
+
+ tree s2_m5_a = fold_binary (MINUS_EXPR, type, s5_4, s3_9);
+ tree s2_m5_b = build_vec_series (type, ssize_int (2), ssize_int (-5));
+ ASSERT_TRUE (operand_equal_p (s2_m5_a, s2_m5_b, 0));
+
+ tree s11 = build_vector_from_val (type, ssize_int (11));
+ tree s16_4_a = fold_binary (PLUS_EXPR, type, s5_4, s11);
+ tree s16_4_b = fold_binary (PLUS_EXPR, type, s11, s5_4);
+ tree s16_4_c = build_vec_series (type, ssize_int (16), ssize_int (4));
+ ASSERT_TRUE (operand_equal_p (s16_4_a, s16_4_c, 0));
+ ASSERT_TRUE (operand_equal_p (s16_4_b, s16_4_c, 0));
+
+ tree sm6_4_a = fold_binary (MINUS_EXPR, type, s5_4, s11);
+ tree sm6_4_b = build_vec_series (type, ssize_int (-6), ssize_int (4));
+ ASSERT_TRUE (operand_equal_p (sm6_4_a, sm6_4_b, 0));
+
+ tree s6_m4_a = fold_binary (MINUS_EXPR, type, s11, s5_4);
+ tree s6_m4_b = build_vec_series (type, ssize_int (6), ssize_int (-4));
+ ASSERT_TRUE (operand_equal_p (s6_m4_a, s6_m4_b, 0));
+
+ tree s5_4_expr = fold_binary (VEC_SERIES_EXPR, type,
+ ssize_int (5), ssize_int (4));
+ ASSERT_TRUE (operand_equal_p (s5_4_expr, s5_4, 0));
+ ASSERT_FALSE (operand_equal_p (s5_4_expr, s3_9, 0));
+}
+
/* Run all of the selftests within this file. */
void
@@ -14536,6 +14651,7 @@ fold_const_c_tests ()
test_arithmetic_folding ();
test_vector_folding ();
test_vec_duplicate_folding ();
+ test_vec_series_folding ();
}
} // namespace selftest
Index: gcc/expmed.c
===================================================================
--- gcc/expmed.c 2017-11-06 12:20:31.081564919 +0000
+++ gcc/expmed.c 2017-11-06 12:21:29.325209826 +0000
@@ -5252,6 +5252,13 @@ make_tree (tree type, rtx x)
tree elt_tree = make_tree (TREE_TYPE (type), XEXP (op, 0));
return build_vector_from_val (type, elt_tree);
}
+ if (GET_CODE (op) == VEC_SERIES)
+ {
+ tree itype = TREE_TYPE (type);
+ tree base_tree = make_tree (itype, XEXP (op, 0));
+ tree step_tree = make_tree (itype, XEXP (op, 1));
+ return build_vec_series (type, base_tree, step_tree);
+ }
return make_tree (type, op);
}
Index: gcc/gimple-pretty-print.c
===================================================================
--- gcc/gimple-pretty-print.c 2017-11-06 12:20:31.087048745 +0000
+++ gcc/gimple-pretty-print.c 2017-11-06 12:21:29.328209826 +0000
@@ -431,6 +431,7 @@ dump_binary_rhs (pretty_printer *buffer,
case VEC_PACK_FIX_TRUNC_EXPR:
case VEC_WIDEN_LSHIFT_HI_EXPR:
case VEC_WIDEN_LSHIFT_LO_EXPR:
+ case VEC_SERIES_EXPR:
for (p = get_tree_code_name (code); *p; p++)
pp_character (buffer, TOUPPER (*p));
pp_string (buffer, " <");
Index: gcc/tree-inline.c
===================================================================
--- gcc/tree-inline.c 2017-11-06 12:20:31.092532570 +0000
+++ gcc/tree-inline.c 2017-11-06 12:21:29.332209826 +0000
@@ -3931,6 +3931,7 @@ estimate_operator_cost (enum tree_code c
case VEC_WIDEN_LSHIFT_HI_EXPR:
case VEC_WIDEN_LSHIFT_LO_EXPR:
case VEC_DUPLICATE_EXPR:
+ case VEC_SERIES_EXPR:
return 1;
Index: gcc/expr.c
===================================================================
--- gcc/expr.c 2017-11-06 12:20:31.082478890 +0000
+++ gcc/expr.c 2017-11-06 12:21:29.326209826 +0000
@@ -7708,7 +7708,7 @@ expand_operands (tree exp0, tree exp1, r
\f
/* Expand constant vector element ELT, which has mode MODE. This is used
- for members of VECTOR_CST and VEC_DUPLICATE_CST. */
+ for members of VECTOR_CST, VEC_DUPLICATE_CST and VEC_SERIES_CST. */
static rtx
const_vector_element (scalar_mode mode, const_tree elt)
@@ -9591,6 +9591,10 @@ #define REDUCE_BIT_FIELD(expr) (reduce_b
gcc_assert (target);
return target;
+ case VEC_SERIES_EXPR:
+ expand_operands (treeop0, treeop1, NULL_RTX, &op0, &op1, modifier);
+ return expand_vec_series_expr (mode, op0, op1, target);
+
case BIT_INSERT_EXPR:
{
unsigned bitpos = tree_to_uhwi (treeop2);
@@ -10029,6 +10033,13 @@ expand_expr_real_1 (tree exp, rtx target
VEC_DUPLICATE_CST_ELT (exp));
return gen_const_vec_duplicate (mode, op0);
+ case VEC_SERIES_CST:
+ op0 = const_vector_element (GET_MODE_INNER (mode),
+ VEC_SERIES_CST_BASE (exp));
+ op1 = const_vector_element (GET_MODE_INNER (mode),
+ VEC_SERIES_CST_STEP (exp));
+ return gen_const_vec_series (mode, op0, op1);
+
case CONST_DECL:
if (modifier == EXPAND_WRITE)
{
Index: gcc/optabs.def
===================================================================
--- gcc/optabs.def 2017-11-06 12:20:31.090704628 +0000
+++ gcc/optabs.def 2017-11-06 12:21:29.331209826 +0000
@@ -366,3 +366,4 @@ OPTAB_D (get_thread_pointer_optab, "get_
OPTAB_D (set_thread_pointer_optab, "set_thread_pointer$I$a")
OPTAB_DC (vec_duplicate_optab, "vec_duplicate$a", VEC_DUPLICATE)
+OPTAB_DC (vec_series_optab, "vec_series$a", VEC_SERIES)
Index: gcc/optabs.h
===================================================================
--- gcc/optabs.h 2017-11-06 12:20:31.090704628 +0000
+++ gcc/optabs.h 2017-11-06 12:21:29.331209826 +0000
@@ -316,6 +316,9 @@ extern rtx expand_vec_cmp_expr (tree, tr
/* Generate code for VEC_COND_EXPR. */
extern rtx expand_vec_cond_expr (tree, tree, tree, tree, rtx);
+/* Generate code for VEC_SERIES_EXPR. */
+extern rtx expand_vec_series_expr (machine_mode, rtx, rtx, rtx);
+
/* Generate code for MULT_HIGHPART_EXPR. */
extern rtx expand_mult_highpart (machine_mode, rtx, rtx, rtx, bool);
Index: gcc/optabs.c
===================================================================
--- gcc/optabs.c 2017-11-06 12:20:31.090704628 +0000
+++ gcc/optabs.c 2017-11-06 12:21:29.330209826 +0000
@@ -5703,6 +5703,27 @@ expand_vec_cond_expr (tree vec_cond_type
return ops[0].value;
}
+/* Generate VEC_SERIES_EXPR <OP0, OP1>, returning a value of mode VMODE.
+ Use TARGET for the result if nonnull and convenient. */
+
+rtx
+expand_vec_series_expr (machine_mode vmode, rtx op0, rtx op1, rtx target)
+{
+ struct expand_operand ops[3];
+ enum insn_code icode;
+ machine_mode emode = GET_MODE_INNER (vmode);
+
+ icode = direct_optab_handler (vec_series_optab, vmode);
+ gcc_assert (icode != CODE_FOR_nothing);
+
+ create_output_operand (&ops[0], target, vmode);
+ create_input_operand (&ops[1], op0, emode);
+ create_input_operand (&ops[2], op1, emode);
+
+ expand_insn (icode, 3, ops);
+ return ops[0].value;
+}
+
/* Generate insns for a vector comparison into a mask. */
rtx
Index: gcc/optabs-tree.c
===================================================================
--- gcc/optabs-tree.c 2017-11-06 12:20:31.089790657 +0000
+++ gcc/optabs-tree.c 2017-11-06 12:21:29.330209826 +0000
@@ -213,6 +213,9 @@ optab_for_tree_code (enum tree_code code
case VEC_DUPLICATE_EXPR:
return vec_duplicate_optab;
+ case VEC_SERIES_EXPR:
+ return vec_series_optab;
+
default:
break;
}
Index: gcc/tree-cfg.c
===================================================================
--- gcc/tree-cfg.c 2017-11-06 12:20:31.091618599 +0000
+++ gcc/tree-cfg.c 2017-11-06 12:21:29.332209826 +0000
@@ -4114,6 +4114,23 @@ verify_gimple_assign_binary (gassign *st
/* Continue with generic binary expression handling. */
break;
+ case VEC_SERIES_EXPR:
+ if (!useless_type_conversion_p (rhs1_type, rhs2_type))
+ {
+ error ("type mismatch in series expression");
+ debug_generic_expr (rhs1_type);
+ debug_generic_expr (rhs2_type);
+ return true;
+ }
+ if (TREE_CODE (lhs_type) != VECTOR_TYPE
+ || !useless_type_conversion_p (TREE_TYPE (lhs_type), rhs1_type))
+ {
+ error ("vector type expected in series expression");
+ debug_generic_expr (lhs_type);
+ return true;
+ }
+ return false;
+
default:
gcc_unreachable ();
}
@@ -4480,6 +4497,7 @@ verify_gimple_assign_single (gassign *st
case COMPLEX_CST:
case VECTOR_CST:
case VEC_DUPLICATE_CST:
+ case VEC_SERIES_CST:
case STRING_CST:
return res;
Index: gcc/tree-vect-generic.c
===================================================================
--- gcc/tree-vect-generic.c 2017-11-06 12:20:31.094360512 +0000
+++ gcc/tree-vect-generic.c 2017-11-06 12:21:29.334209826 +0000
@@ -1596,7 +1596,8 @@ expand_vector_operations_1 (gimple_stmt_
if (rhs_class == GIMPLE_BINARY_RHS)
rhs2 = gimple_assign_rhs2 (stmt);
- if (TREE_CODE (type) != VECTOR_TYPE)
+ if (!VECTOR_TYPE_P (type)
+ || !VECTOR_TYPE_P (TREE_TYPE (rhs1)))
return;
/* If the vector operation is operating on all same vector elements
* Re: [05/nn] Add VEC_DUPLICATE_{CST,EXPR} and associated optab
2017-11-06 15:09 ` Richard Sandiford
@ 2017-11-07 10:37 ` Richard Biener
From: Richard Biener @ 2017-11-07 10:37 UTC (permalink / raw)
To: Richard Biener, GCC Patches, Richard Sandiford
On Mon, Nov 6, 2017 at 4:09 PM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> Richard Biener <richard.guenther@gmail.com> writes:
>> On Mon, Oct 23, 2017 at 1:20 PM, Richard Sandiford
>> <richard.sandiford@linaro.org> wrote:
>>> SVE needs a way of broadcasting a scalar to a variable-length vector.
>>> This patch adds VEC_DUPLICATE_CST for when VECTOR_CST would be used for
>>> fixed-length vectors and VEC_DUPLICATE_EXPR for when CONSTRUCTOR would
>>> be used for fixed-length vectors. VEC_DUPLICATE_EXPR is the tree
>>> equivalent of the existing rtl code VEC_DUPLICATE.
>>>
>>> Originally we had a single VEC_DUPLICATE_EXPR and used TREE_CONSTANT
>>> to mark constant nodes, but in response to last year's RFC, Richard B.
>>> suggested it would be better to have separate codes for the constant
>>> and non-constant cases. This allows VEC_DUPLICATE_EXPR to be treated
>>> as a normal unary operation and avoids the previous need for treating
>>> it as a GIMPLE_SINGLE_RHS.
>>>
>>> It might make sense to use VEC_DUPLICATE_CST for all duplicated
>>> vector constants, since it's a bit more compact than VECTOR_CST
>>> in that case, and is potentially more efficient to process.
>>> However, the nice thing about keeping it restricted to variable-length
>>> vectors is that there is then no need to handle combinations of
>>> VECTOR_CST and VEC_DUPLICATE_CST; a vector type will always use
>>> VECTOR_CST or never use it.
>>>
>>> The patch also adds a vec_duplicate_optab to go with VEC_DUPLICATE_EXPR.
>>
>> Index: gcc/tree-vect-generic.c
>> ===================================================================
>> --- gcc/tree-vect-generic.c 2017-10-23 11:38:53.934094740 +0100
>> +++ gcc/tree-vect-generic.c 2017-10-23 11:41:51.773953100 +0100
>> @@ -1419,6 +1419,7 @@ lower_vec_perm (gimple_stmt_iterator *gs
>> ssa_uniform_vector_p (tree op)
>> {
>> if (TREE_CODE (op) == VECTOR_CST
>> + || TREE_CODE (op) == VEC_DUPLICATE_CST
>> || TREE_CODE (op) == CONSTRUCTOR)
>> return uniform_vector_p (op);
>>
>> VEC_DUPLICATE_EXPR handling?
>
> Oops, yeah. I could have sworn it was there at one time...
>
>> Looks like for VEC_DUPLICATE_CST it could directly return true.
>
> The function is a bit misnamed: it returns the duplicated tree value
> rather than a bool.
>
>> I didn't see uniform_vector_p being updated?
>
> That part was there FWIW (for tree.c).
>
>> Can you add verification to either verify_expr or build_vec_duplicate_cst
>> that the type is one of variable size? And amend tree.def docs
>> accordingly. Because otherwise we miss a lot of cases in constant
>> folding (mixing VEC_DUPLICATE_CST and VECTOR_CST).
>
> OK, done in the patch below with a gcc_unreachable () bomb in
> build_vec_duplicate_cst, which becomes a gcc_assert when variable-length
> vectors are added. This meant changing the selftests to use
> build_vector_from_val rather than build_vec_duplicate_cst,
> but to still get testing of VEC_DUPLICATE_*, we then need to use
> the target's preferred vector length instead of always using 4.
>
> Tested as before. OK (given the slightly different selftests)?
Ok. I'll leave the missed constant foldings to you to figure out.
Richard.
> Thanks,
> Richard
>
>
> 2017-11-06 Richard Sandiford <richard.sandiford@linaro.org>
> Alan Hayward <alan.hayward@arm.com>
> David Sherwood <david.sherwood@arm.com>
>
> gcc/
> * doc/generic.texi (VEC_DUPLICATE_CST, VEC_DUPLICATE_EXPR): Document.
> (VEC_COND_EXPR): Add missing @tindex.
> * doc/md.texi (vec_duplicate@var{m}): Document.
> * tree.def (VEC_DUPLICATE_CST, VEC_DUPLICATE_EXPR): New tree codes.
> * tree-core.h (tree_base): Document that u.nelts and TREE_OVERFLOW
> are used for VEC_DUPLICATE_CST as well.
> (tree_vector): Access base.n.nelts directly.
> * tree.h (TREE_OVERFLOW): Add VEC_DUPLICATE_CST to the list of
> valid codes.
> (VEC_DUPLICATE_CST_ELT): New macro.
> * tree.c (tree_node_structure_for_code, tree_code_size, tree_size)
> (integer_zerop, integer_onep, integer_all_onesp, integer_truep)
> (real_zerop, real_onep, real_minus_onep, add_expr, initializer_zerop)
> (walk_tree_1, drop_tree_overflow): Handle VEC_DUPLICATE_CST.
> (build_vec_duplicate_cst): New function.
> (build_vector_from_val): Add stubbed-out handling of variable-length
> vectors, using build_vec_duplicate_cst and VEC_DUPLICATE_EXPR.
> (uniform_vector_p): Handle the new codes.
> (test_vec_duplicate_predicates_int): New function.
> (test_vec_duplicate_predicates_float): Likewise.
> (test_vec_duplicate_predicates): Likewise.
> (tree_c_tests): Call test_vec_duplicate_predicates.
> * cfgexpand.c (expand_debug_expr): Handle the new codes.
> * tree-pretty-print.c (dump_generic_node): Likewise.
> * tree-vect-generic.c (ssa_uniform_vector_p): Likewise.
> * dwarf2out.c (rtl_for_decl_init): Handle VEC_DUPLICATE_CST.
> * gimple-expr.h (is_gimple_constant): Likewise.
> * gimplify.c (gimplify_expr): Likewise.
> * graphite-isl-ast-to-gimple.c
> (translate_isl_ast_to_gimple::is_constant): Likewise.
> * graphite-scop-detection.c (scan_tree_for_params): Likewise.
> * ipa-icf-gimple.c (func_checker::compare_cst_or_decl): Likewise.
> (func_checker::compare_operand): Likewise.
> * ipa-icf.c (sem_item::add_expr, sem_variable::equals): Likewise.
> * match.pd (negate_expr_p): Likewise.
> * print-tree.c (print_node): Likewise.
> * tree-chkp.c (chkp_find_bounds_1): Likewise.
> * tree-loop-distribution.c (const_with_all_bytes_same): Likewise.
> * tree-ssa-loop.c (for_each_index): Likewise.
> * tree-ssa-pre.c (create_component_ref_by_pieces_1): Likewise.
> * tree-ssa-sccvn.c (copy_reference_ops_from_ref): Likewise.
> (ao_ref_init_from_vn_reference): Likewise.
> * varasm.c (const_hash_1, compare_constant): Likewise.
> * fold-const.c (negate_expr_p, fold_negate_expr_1, const_binop)
> (fold_convert_const, operand_equal_p, fold_view_convert_expr)
> (exact_inverse, fold_checksum_tree): Likewise.
> (const_unop): Likewise. Fold VEC_DUPLICATE_EXPRs of a constant.
> (test_vec_duplicate_folding): New function.
> (fold_const_c_tests): Call it.
> * optabs.def (vec_duplicate_optab): New optab.
> * optabs-tree.c (optab_for_tree_code): Handle VEC_DUPLICATE_EXPR.
> * optabs.h (expand_vector_broadcast): Declare.
> * optabs.c (expand_vector_broadcast): Make non-static. Try using
> vec_duplicate_optab.
> * expr.c (store_constructor): Try using vec_duplicate_optab for
> uniform vectors.
> (const_vector_element): New function, split out from...
> (const_vector_from_tree): ...here.
> (expand_expr_real_2): Handle VEC_DUPLICATE_EXPR.
> (expand_expr_real_1): Handle VEC_DUPLICATE_CST.
> * internal-fn.c (expand_vector_ubsan_overflow): Use CONSTANT_P
> instead of checking for VECTOR_CST.
> * tree-cfg.c (verify_gimple_assign_unary): Handle VEC_DUPLICATE_EXPR.
> (verify_gimple_assign_single): Handle VEC_DUPLICATE_CST.
> * tree-inline.c (estimate_operator_cost): Handle VEC_DUPLICATE_EXPR.
>
> Index: gcc/doc/generic.texi
> ===================================================================
> --- gcc/doc/generic.texi 2017-11-06 12:40:39.845713389 +0000
> +++ gcc/doc/generic.texi 2017-11-06 12:40:40.277637153 +0000
> @@ -1036,6 +1036,7 @@ As this example indicates, the operands
> @tindex FIXED_CST
> @tindex COMPLEX_CST
> @tindex VECTOR_CST
> +@tindex VEC_DUPLICATE_CST
> @tindex STRING_CST
> @findex TREE_STRING_LENGTH
> @findex TREE_STRING_POINTER
> @@ -1089,6 +1090,14 @@ constant nodes. Each individual constan
> double constant node. The first operand is a @code{TREE_LIST} of the
> constant nodes and is accessed through @code{TREE_VECTOR_CST_ELTS}.
>
> +@item VEC_DUPLICATE_CST
> +These nodes represent a vector constant in which every element has the
> +same scalar value. At present only variable-length vectors use
> +@code{VEC_DUPLICATE_CST}; constant-length vectors use @code{VECTOR_CST}
> +instead. The scalar element value is given by
> +@code{VEC_DUPLICATE_CST_ELT} and has the same restrictions as the
> +element of a @code{VECTOR_CST}.
> +
> @item STRING_CST
> These nodes represent string-constants. The @code{TREE_STRING_LENGTH}
> returns the length of the string, as an @code{int}. The
> @@ -1692,6 +1701,7 @@ a value from @code{enum annot_expr_kind}
>
> @node Vectors
> @subsection Vectors
> +@tindex VEC_DUPLICATE_EXPR
> @tindex VEC_LSHIFT_EXPR
> @tindex VEC_RSHIFT_EXPR
> @tindex VEC_WIDEN_MULT_HI_EXPR
> @@ -1703,9 +1713,14 @@ a value from @code{enum annot_expr_kind}
> @tindex VEC_PACK_TRUNC_EXPR
> @tindex VEC_PACK_SAT_EXPR
> @tindex VEC_PACK_FIX_TRUNC_EXPR
> +@tindex VEC_COND_EXPR
> @tindex SAD_EXPR
>
> @table @code
> +@item VEC_DUPLICATE_EXPR
> +This node has a single operand and represents a vector in which every
> +element is equal to that operand.
> +
> @item VEC_LSHIFT_EXPR
> @itemx VEC_RSHIFT_EXPR
> These nodes represent whole vector left and right shifts, respectively.
> Index: gcc/doc/md.texi
> ===================================================================
> --- gcc/doc/md.texi 2017-11-06 12:40:39.845713389 +0000
> +++ gcc/doc/md.texi 2017-11-06 12:40:40.278630081 +0000
> @@ -4888,6 +4888,17 @@ and operand 1 is parallel containing val
> the vector mode @var{m}, or a vector mode with the same element mode and
> smaller number of elements.
>
> +@cindex @code{vec_duplicate@var{m}} instruction pattern
> +@item @samp{vec_duplicate@var{m}}
> +Initialize vector output operand 0 so that each element has the value given
> +by scalar input operand 1. The vector has mode @var{m} and the scalar has
> +the mode appropriate for one element of @var{m}.
> +
> +This pattern only handles duplicates of non-constant inputs. Constant
> +vectors go through the @code{mov@var{m}} pattern instead.
> +
> +This pattern is not allowed to @code{FAIL}.
> +
> @cindex @code{vec_cmp@var{m}@var{n}} instruction pattern
> @item @samp{vec_cmp@var{m}@var{n}}
> Output a vector comparison. Operand 0 of mode @var{n} is the destination for
> Index: gcc/tree.def
> ===================================================================
> --- gcc/tree.def 2017-11-06 12:40:39.845713389 +0000
> +++ gcc/tree.def 2017-11-06 12:40:40.292531076 +0000
> @@ -304,6 +304,11 @@ DEFTREECODE (COMPLEX_CST, "complex_cst",
> /* Contents are in VECTOR_CST_ELTS field. */
> DEFTREECODE (VECTOR_CST, "vector_cst", tcc_constant, 0)
>
> +/* Represents a vector constant in which every element is equal to
> + VEC_DUPLICATE_CST_ELT. This is only ever used for variable-length
> + vectors; fixed-length vectors must use VECTOR_CST instead. */
> +DEFTREECODE (VEC_DUPLICATE_CST, "vec_duplicate_cst", tcc_constant, 0)
> +
> /* Contents are TREE_STRING_LENGTH and the actual contents of the string. */
> DEFTREECODE (STRING_CST, "string_cst", tcc_constant, 0)
>
> @@ -534,6 +539,9 @@ DEFTREECODE (TARGET_EXPR, "target_expr",
> 1 and 2 are NULL. The operands are then taken from the cfg edges. */
> DEFTREECODE (COND_EXPR, "cond_expr", tcc_expression, 3)
>
> +/* Represents a vector in which every element is equal to operand 0. */
> +DEFTREECODE (VEC_DUPLICATE_EXPR, "vec_duplicate_expr", tcc_unary, 1)
> +
> /* Vector conditional expression. It is like COND_EXPR, but with
> vector operands.
>
> Index: gcc/tree-core.h
> ===================================================================
> --- gcc/tree-core.h 2017-11-06 12:40:39.845713389 +0000
> +++ gcc/tree-core.h 2017-11-06 12:40:40.288559363 +0000
> @@ -975,7 +975,8 @@ struct GTY(()) tree_base {
> /* VEC length. This field is only used with TREE_VEC. */
> int length;
>
> - /* Number of elements. This field is only used with VECTOR_CST. */
> + /* Number of elements. This field is only used with VECTOR_CST
> + and VEC_DUPLICATE_CST. It is always 1 for VEC_DUPLICATE_CST. */
> unsigned int nelts;
>
> /* SSA version number. This field is only used with SSA_NAME. */
> @@ -1065,7 +1066,7 @@ struct GTY(()) tree_base {
> public_flag:
>
> TREE_OVERFLOW in
> - INTEGER_CST, REAL_CST, COMPLEX_CST, VECTOR_CST
> + INTEGER_CST, REAL_CST, COMPLEX_CST, VECTOR_CST, VEC_DUPLICATE_CST
>
> TREE_PUBLIC in
> VAR_DECL, FUNCTION_DECL
> @@ -1332,7 +1333,7 @@ struct GTY(()) tree_complex {
>
> struct GTY(()) tree_vector {
> struct tree_typed typed;
> - tree GTY ((length ("VECTOR_CST_NELTS ((tree) &%h)"))) elts[1];
> + tree GTY ((length ("((tree) &%h)->base.u.nelts"))) elts[1];
> };
>
> struct GTY(()) tree_identifier {
> Index: gcc/tree.h
> ===================================================================
> --- gcc/tree.h 2017-11-06 12:40:39.845713389 +0000
> +++ gcc/tree.h 2017-11-06 12:40:40.293524004 +0000
> @@ -709,8 +709,8 @@ #define TREE_SYMBOL_REFERENCED(NODE) \
> #define TYPE_REF_CAN_ALIAS_ALL(NODE) \
> (PTR_OR_REF_CHECK (NODE)->base.static_flag)
>
> -/* In an INTEGER_CST, REAL_CST, COMPLEX_CST, or VECTOR_CST, this means
> - there was an overflow in folding. */
> +/* In an INTEGER_CST, REAL_CST, COMPLEX_CST, VECTOR_CST or VEC_DUPLICATE_CST,
> + this means there was an overflow in folding. */
>
> #define TREE_OVERFLOW(NODE) (CST_CHECK (NODE)->base.public_flag)
>
> @@ -1009,6 +1009,10 @@ #define VECTOR_CST_NELTS(NODE) (VECTOR_C
> #define VECTOR_CST_ELTS(NODE) (VECTOR_CST_CHECK (NODE)->vector.elts)
> #define VECTOR_CST_ELT(NODE,IDX) (VECTOR_CST_CHECK (NODE)->vector.elts[IDX])
>
> +/* In a VEC_DUPLICATE_CST node. */
> +#define VEC_DUPLICATE_CST_ELT(NODE) \
> + (VEC_DUPLICATE_CST_CHECK (NODE)->vector.elts[0])
> +
> /* Define fields and accessors for some special-purpose tree nodes. */
>
> #define IDENTIFIER_LENGTH(NODE) \
> Index: gcc/tree.c
> ===================================================================
> --- gcc/tree.c 2017-11-06 12:40:39.845713389 +0000
> +++ gcc/tree.c 2017-11-06 12:40:40.292531076 +0000
> @@ -464,6 +464,7 @@ tree_node_structure_for_code (enum tree_
> case FIXED_CST: return TS_FIXED_CST;
> case COMPLEX_CST: return TS_COMPLEX;
> case VECTOR_CST: return TS_VECTOR;
> + case VEC_DUPLICATE_CST: return TS_VECTOR;
> case STRING_CST: return TS_STRING;
> /* tcc_exceptional cases. */
> case ERROR_MARK: return TS_COMMON;
> @@ -829,6 +830,7 @@ tree_code_size (enum tree_code code)
> case FIXED_CST: return sizeof (tree_fixed_cst);
> case COMPLEX_CST: return sizeof (tree_complex);
> case VECTOR_CST: return sizeof (tree_vector);
> + case VEC_DUPLICATE_CST: return sizeof (tree_vector);
> case STRING_CST: gcc_unreachable ();
> default:
> gcc_checking_assert (code >= NUM_TREE_CODES);
> @@ -890,6 +892,9 @@ tree_size (const_tree node)
> return (sizeof (struct tree_vector)
> + (VECTOR_CST_NELTS (node) - 1) * sizeof (tree));
>
> + case VEC_DUPLICATE_CST:
> + return sizeof (struct tree_vector);
> +
> case STRING_CST:
> return TREE_STRING_LENGTH (node) + offsetof (struct tree_string, str) + 1;
>
> @@ -1697,6 +1702,34 @@ cst_and_fits_in_hwi (const_tree x)
> && (tree_fits_shwi_p (x) || tree_fits_uhwi_p (x)));
> }
>
> +/* Build a new VEC_DUPLICATE_CST with type TYPE and operand EXP.
> +
> + This function is only suitable for callers that know TYPE is a
> + variable-length vector and specifically need a VEC_DUPLICATE_CST node.
> + Use build_vector_from_val to duplicate a general scalar into a general
> + vector type. */
> +
> +static tree
> +build_vec_duplicate_cst (tree type, tree exp MEM_STAT_DECL)
> +{
> + /* Shouldn't be used until we have variable-length vectors. */
> + gcc_unreachable ();
> +
> + int length = sizeof (struct tree_vector);
> +
> + record_node_allocation_statistics (VEC_DUPLICATE_CST, length);
> +
> + tree t = ggc_alloc_cleared_tree_node_stat (length PASS_MEM_STAT);
> +
> + TREE_SET_CODE (t, VEC_DUPLICATE_CST);
> + TREE_TYPE (t) = type;
> + t->base.u.nelts = 1;
> + VEC_DUPLICATE_CST_ELT (t) = exp;
> + TREE_CONSTANT (t) = 1;
> +
> + return t;
> +}
> +
> /* Build a newly constructed VECTOR_CST node of length LEN. */
>
> tree
> @@ -1790,6 +1823,13 @@ build_vector_from_val (tree vectype, tre
> gcc_checking_assert (types_compatible_p (TYPE_MAIN_VARIANT (TREE_TYPE (sc)),
> TREE_TYPE (vectype)));
>
> + if (0)
> + {
> + if (CONSTANT_CLASS_P (sc))
> + return build_vec_duplicate_cst (vectype, sc);
> + return fold_build1 (VEC_DUPLICATE_EXPR, vectype, sc);
> + }
> +
> if (CONSTANT_CLASS_P (sc))
> {
> auto_vec<tree, 32> v (nunits);
> @@ -2358,6 +2398,8 @@ integer_zerop (const_tree expr)
> return false;
> return true;
> }
> + case VEC_DUPLICATE_CST:
> + return integer_zerop (VEC_DUPLICATE_CST_ELT (expr));
> default:
> return false;
> }
> @@ -2384,6 +2426,8 @@ integer_onep (const_tree expr)
> return false;
> return true;
> }
> + case VEC_DUPLICATE_CST:
> + return integer_onep (VEC_DUPLICATE_CST_ELT (expr));
> default:
> return false;
> }
> @@ -2422,6 +2466,9 @@ integer_all_onesp (const_tree expr)
> return 1;
> }
>
> + else if (TREE_CODE (expr) == VEC_DUPLICATE_CST)
> + return integer_all_onesp (VEC_DUPLICATE_CST_ELT (expr));
> +
> else if (TREE_CODE (expr) != INTEGER_CST)
> return 0;
>
> @@ -2478,7 +2525,7 @@ integer_nonzerop (const_tree expr)
> int
> integer_truep (const_tree expr)
> {
> - if (TREE_CODE (expr) == VECTOR_CST)
> + if (TREE_CODE (expr) == VECTOR_CST || TREE_CODE (expr) == VEC_DUPLICATE_CST)
> return integer_all_onesp (expr);
> return integer_onep (expr);
> }
> @@ -2649,6 +2696,8 @@ real_zerop (const_tree expr)
> return false;
> return true;
> }
> + case VEC_DUPLICATE_CST:
> + return real_zerop (VEC_DUPLICATE_CST_ELT (expr));
> default:
> return false;
> }
> @@ -2677,6 +2726,8 @@ real_onep (const_tree expr)
> return false;
> return true;
> }
> + case VEC_DUPLICATE_CST:
> + return real_onep (VEC_DUPLICATE_CST_ELT (expr));
> default:
> return false;
> }
> @@ -2704,6 +2755,8 @@ real_minus_onep (const_tree expr)
> return false;
> return true;
> }
> + case VEC_DUPLICATE_CST:
> + return real_minus_onep (VEC_DUPLICATE_CST_ELT (expr));
> default:
> return false;
> }
> @@ -7106,6 +7159,9 @@ add_expr (const_tree t, inchash::hash &h
> inchash::add_expr (VECTOR_CST_ELT (t, i), hstate, flags);
> return;
> }
> + case VEC_DUPLICATE_CST:
> +      inchash::add_expr (VEC_DUPLICATE_CST_ELT (t), hstate, flags);
> + return;
> case SSA_NAME:
> /* We can just compare by pointer. */
> hstate.add_hwi (SSA_NAME_VERSION (t));
> @@ -10367,6 +10423,9 @@ initializer_zerop (const_tree init)
> return true;
> }
>
> + case VEC_DUPLICATE_CST:
> + return initializer_zerop (VEC_DUPLICATE_CST_ELT (init));
> +
> case CONSTRUCTOR:
> {
> unsigned HOST_WIDE_INT idx;
> @@ -10412,7 +10471,13 @@ uniform_vector_p (const_tree vec)
>
> gcc_assert (VECTOR_TYPE_P (TREE_TYPE (vec)));
>
> - if (TREE_CODE (vec) == VECTOR_CST)
> + if (TREE_CODE (vec) == VEC_DUPLICATE_CST)
> + return VEC_DUPLICATE_CST_ELT (vec);
> +
> + else if (TREE_CODE (vec) == VEC_DUPLICATE_EXPR)
> + return TREE_OPERAND (vec, 0);
> +
> + else if (TREE_CODE (vec) == VECTOR_CST)
> {
> first = VECTOR_CST_ELT (vec, 0);
> for (i = 1; i < VECTOR_CST_NELTS (vec); ++i)
> @@ -11144,6 +11209,7 @@ #define WALK_SUBTREE_TAIL(NODE) \
> case REAL_CST:
> case FIXED_CST:
> case VECTOR_CST:
> + case VEC_DUPLICATE_CST:
> case STRING_CST:
> case BLOCK:
> case PLACEHOLDER_EXPR:
> @@ -12430,6 +12496,12 @@ drop_tree_overflow (tree t)
> elt = drop_tree_overflow (elt);
> }
> }
> + if (TREE_CODE (t) == VEC_DUPLICATE_CST)
> + {
> + tree *elt = &VEC_DUPLICATE_CST_ELT (t);
> + if (TREE_OVERFLOW (*elt))
> + *elt = drop_tree_overflow (*elt);
> + }
> return t;
> }
>
> @@ -13850,6 +13922,102 @@ test_integer_constants ()
> ASSERT_EQ (type, TREE_TYPE (zero));
> }
>
> +/* Verify predicate handling of VEC_DUPLICATE_CSTs and VEC_DUPLICATE_EXPRs
> + for integral type TYPE. */
> +
> +static void
> +test_vec_duplicate_predicates_int (tree type)
> +{
> + scalar_int_mode int_mode = SCALAR_INT_TYPE_MODE (type);
> + machine_mode vec_mode = targetm.vectorize.preferred_simd_mode (int_mode);
> + /* This will be 1 if VEC_MODE isn't a vector mode. */
> + unsigned int nunits = GET_MODE_NUNITS (vec_mode);
> +
> + tree vec_type = build_vector_type (type, nunits);
> +
> + tree zero = build_zero_cst (type);
> + tree vec_zero = build_vector_from_val (vec_type, zero);
> + ASSERT_TRUE (integer_zerop (vec_zero));
> + ASSERT_FALSE (integer_onep (vec_zero));
> + ASSERT_FALSE (integer_minus_onep (vec_zero));
> + ASSERT_FALSE (integer_all_onesp (vec_zero));
> + ASSERT_FALSE (integer_truep (vec_zero));
> + ASSERT_TRUE (initializer_zerop (vec_zero));
> +
> + tree one = build_one_cst (type);
> + tree vec_one = build_vector_from_val (vec_type, one);
> + ASSERT_FALSE (integer_zerop (vec_one));
> + ASSERT_TRUE (integer_onep (vec_one));
> + ASSERT_FALSE (integer_minus_onep (vec_one));
> + ASSERT_FALSE (integer_all_onesp (vec_one));
> + ASSERT_FALSE (integer_truep (vec_one));
> + ASSERT_FALSE (initializer_zerop (vec_one));
> +
> + tree minus_one = build_minus_one_cst (type);
> + tree vec_minus_one = build_vector_from_val (vec_type, minus_one);
> + ASSERT_FALSE (integer_zerop (vec_minus_one));
> + ASSERT_FALSE (integer_onep (vec_minus_one));
> + ASSERT_TRUE (integer_minus_onep (vec_minus_one));
> + ASSERT_TRUE (integer_all_onesp (vec_minus_one));
> + ASSERT_TRUE (integer_truep (vec_minus_one));
> + ASSERT_FALSE (initializer_zerop (vec_minus_one));
> +
> + tree x = create_tmp_var_raw (type, "x");
> + tree vec_x = build1 (VEC_DUPLICATE_EXPR, vec_type, x);
> + ASSERT_EQ (uniform_vector_p (vec_zero), zero);
> + ASSERT_EQ (uniform_vector_p (vec_one), one);
> + ASSERT_EQ (uniform_vector_p (vec_minus_one), minus_one);
> + ASSERT_EQ (uniform_vector_p (vec_x), x);
> +}
> +
> +/* Verify predicate handling of VEC_DUPLICATE_CSTs for floating-point
> + type TYPE. */
> +
> +static void
> +test_vec_duplicate_predicates_float (tree type)
> +{
> + scalar_float_mode float_mode = SCALAR_FLOAT_TYPE_MODE (type);
> + machine_mode vec_mode = targetm.vectorize.preferred_simd_mode (float_mode);
> + /* This will be 1 if VEC_MODE isn't a vector mode. */
> + unsigned int nunits = GET_MODE_NUNITS (vec_mode);
> +
> + tree vec_type = build_vector_type (type, nunits);
> +
> + tree zero = build_zero_cst (type);
> + tree vec_zero = build_vector_from_val (vec_type, zero);
> + ASSERT_TRUE (real_zerop (vec_zero));
> + ASSERT_FALSE (real_onep (vec_zero));
> + ASSERT_FALSE (real_minus_onep (vec_zero));
> + ASSERT_TRUE (initializer_zerop (vec_zero));
> +
> + tree one = build_one_cst (type);
> + tree vec_one = build_vector_from_val (vec_type, one);
> + ASSERT_FALSE (real_zerop (vec_one));
> + ASSERT_TRUE (real_onep (vec_one));
> + ASSERT_FALSE (real_minus_onep (vec_one));
> + ASSERT_FALSE (initializer_zerop (vec_one));
> +
> + tree minus_one = build_minus_one_cst (type);
> + tree vec_minus_one = build_vector_from_val (vec_type, minus_one);
> + ASSERT_FALSE (real_zerop (vec_minus_one));
> + ASSERT_FALSE (real_onep (vec_minus_one));
> + ASSERT_TRUE (real_minus_onep (vec_minus_one));
> + ASSERT_FALSE (initializer_zerop (vec_minus_one));
> +
> + ASSERT_EQ (uniform_vector_p (vec_zero), zero);
> + ASSERT_EQ (uniform_vector_p (vec_one), one);
> + ASSERT_EQ (uniform_vector_p (vec_minus_one), minus_one);
> +}
> +
> +/* Verify predicate handling of VEC_DUPLICATE_CSTs and VEC_DUPLICATE_EXPRs. */
> +
> +static void
> +test_vec_duplicate_predicates ()
> +{
> + test_vec_duplicate_predicates_int (integer_type_node);
> + test_vec_duplicate_predicates_float (float_type_node);
> +}
> +
> /* Verify identifiers. */
>
> static void
> @@ -13878,6 +14046,7 @@ test_labels ()
> tree_c_tests ()
> {
> test_integer_constants ();
> + test_vec_duplicate_predicates ();
> test_identifiers ();
> test_labels ();
> }
> Index: gcc/cfgexpand.c
> ===================================================================
> --- gcc/cfgexpand.c 2017-11-06 12:40:39.845713389 +0000
> +++ gcc/cfgexpand.c 2017-11-06 12:40:40.276644225 +0000
> @@ -5068,6 +5068,8 @@ expand_debug_expr (tree exp)
> case VEC_WIDEN_LSHIFT_HI_EXPR:
> case VEC_WIDEN_LSHIFT_LO_EXPR:
> case VEC_PERM_EXPR:
> + case VEC_DUPLICATE_CST:
> + case VEC_DUPLICATE_EXPR:
> return NULL;
>
> /* Misc codes. */
> Index: gcc/tree-pretty-print.c
> ===================================================================
> --- gcc/tree-pretty-print.c 2017-11-06 12:40:39.845713389 +0000
> +++ gcc/tree-pretty-print.c 2017-11-06 12:40:40.289552291 +0000
> @@ -1802,6 +1802,12 @@ dump_generic_node (pretty_printer *pp, t
> }
> break;
>
> + case VEC_DUPLICATE_CST:
> + pp_string (pp, "{ ");
> + dump_generic_node (pp, VEC_DUPLICATE_CST_ELT (node), spc, flags, false);
> + pp_string (pp, ", ... }");
> + break;
> +
> case FUNCTION_TYPE:
> case METHOD_TYPE:
> dump_generic_node (pp, TREE_TYPE (node), spc, flags, false);
> @@ -3231,6 +3237,15 @@ dump_generic_node (pretty_printer *pp, t
> pp_string (pp, " > ");
> break;
>
> + case VEC_DUPLICATE_EXPR:
> + pp_space (pp);
> + for (str = get_tree_code_name (code); *str; str++)
> + pp_character (pp, TOUPPER (*str));
> + pp_string (pp, " < ");
> + dump_generic_node (pp, TREE_OPERAND (node, 0), spc, flags, false);
> + pp_string (pp, " > ");
> + break;
> +
> case VEC_UNPACK_HI_EXPR:
> pp_string (pp, " VEC_UNPACK_HI_EXPR < ");
> dump_generic_node (pp, TREE_OPERAND (node, 0), spc, flags, false);
> Index: gcc/tree-vect-generic.c
> ===================================================================
> --- gcc/tree-vect-generic.c 2017-11-06 12:40:39.845713389 +0000
> +++ gcc/tree-vect-generic.c 2017-11-06 12:40:40.291538147 +0000
> @@ -1419,6 +1419,8 @@ lower_vec_perm (gimple_stmt_iterator *gs
> ssa_uniform_vector_p (tree op)
> {
> if (TREE_CODE (op) == VECTOR_CST
> + || TREE_CODE (op) == VEC_DUPLICATE_CST
> + || TREE_CODE (op) == VEC_DUPLICATE_EXPR
> || TREE_CODE (op) == CONSTRUCTOR)
> return uniform_vector_p (op);
> if (TREE_CODE (op) == SSA_NAME)
> Index: gcc/dwarf2out.c
> ===================================================================
> --- gcc/dwarf2out.c 2017-11-06 12:40:39.845713389 +0000
> +++ gcc/dwarf2out.c 2017-11-06 12:40:40.280615937 +0000
> @@ -18878,6 +18878,7 @@ rtl_for_decl_init (tree init, tree type)
> switch (TREE_CODE (init))
> {
> case VECTOR_CST:
> + case VEC_DUPLICATE_CST:
> break;
> case CONSTRUCTOR:
> if (TREE_CONSTANT (init))
> Index: gcc/gimple-expr.h
> ===================================================================
> --- gcc/gimple-expr.h 2017-11-06 12:40:39.845713389 +0000
> +++ gcc/gimple-expr.h 2017-11-06 12:40:40.282601794 +0000
> @@ -134,6 +134,7 @@ is_gimple_constant (const_tree t)
> case FIXED_CST:
> case COMPLEX_CST:
> case VECTOR_CST:
> + case VEC_DUPLICATE_CST:
> case STRING_CST:
> return true;
>
> Index: gcc/gimplify.c
> ===================================================================
> --- gcc/gimplify.c 2017-11-06 12:40:39.845713389 +0000
> +++ gcc/gimplify.c 2017-11-06 12:40:40.283594722 +0000
> @@ -11507,6 +11507,7 @@ gimplify_expr (tree *expr_p, gimple_seq
> case STRING_CST:
> case COMPLEX_CST:
> case VECTOR_CST:
> + case VEC_DUPLICATE_CST:
> /* Drop the overflow flag on constants, we do not want
> that in the GIMPLE IL. */
> if (TREE_OVERFLOW_P (*expr_p))
> Index: gcc/graphite-isl-ast-to-gimple.c
> ===================================================================
> --- gcc/graphite-isl-ast-to-gimple.c 2017-11-06 12:40:39.845713389 +0000
> +++ gcc/graphite-isl-ast-to-gimple.c 2017-11-06 12:40:40.284587650 +0000
> @@ -211,7 +211,8 @@ enum phi_node_kind
> return TREE_CODE (op) == INTEGER_CST
> || TREE_CODE (op) == REAL_CST
> || TREE_CODE (op) == COMPLEX_CST
> - || TREE_CODE (op) == VECTOR_CST;
> + || TREE_CODE (op) == VECTOR_CST
> + || TREE_CODE (op) == VEC_DUPLICATE_CST;
> }
>
> private:
> Index: gcc/graphite-scop-detection.c
> ===================================================================
> --- gcc/graphite-scop-detection.c 2017-11-06 12:40:39.845713389 +0000
> +++ gcc/graphite-scop-detection.c 2017-11-06 12:40:40.284587650 +0000
> @@ -1212,6 +1212,7 @@ scan_tree_for_params (sese_info_p s, tre
> case REAL_CST:
> case COMPLEX_CST:
> case VECTOR_CST:
> + case VEC_DUPLICATE_CST:
> break;
>
> default:
> Index: gcc/ipa-icf-gimple.c
> ===================================================================
> --- gcc/ipa-icf-gimple.c 2017-11-06 12:40:39.845713389 +0000
> +++ gcc/ipa-icf-gimple.c 2017-11-06 12:40:40.285580578 +0000
> @@ -333,6 +333,7 @@ func_checker::compare_cst_or_decl (tree
> case INTEGER_CST:
> case COMPLEX_CST:
> case VECTOR_CST:
> + case VEC_DUPLICATE_CST:
> case STRING_CST:
> case REAL_CST:
> {
> @@ -528,6 +529,7 @@ func_checker::compare_operand (tree t1,
> case INTEGER_CST:
> case COMPLEX_CST:
> case VECTOR_CST:
> + case VEC_DUPLICATE_CST:
> case STRING_CST:
> case REAL_CST:
> case FUNCTION_DECL:
> Index: gcc/ipa-icf.c
> ===================================================================
> --- gcc/ipa-icf.c 2017-11-06 12:40:39.845713389 +0000
> +++ gcc/ipa-icf.c 2017-11-06 12:40:40.285580578 +0000
> @@ -1479,6 +1479,7 @@ sem_item::add_expr (const_tree exp, inch
> case STRING_CST:
> case COMPLEX_CST:
> case VECTOR_CST:
> + case VEC_DUPLICATE_CST:
> inchash::add_expr (exp, hstate);
> break;
> case CONSTRUCTOR:
> @@ -2036,6 +2037,9 @@ sem_variable::equals (tree t1, tree t2)
>
> return 1;
> }
> + case VEC_DUPLICATE_CST:
> + return sem_variable::equals (VEC_DUPLICATE_CST_ELT (t1),
> + VEC_DUPLICATE_CST_ELT (t2));
> case ARRAY_REF:
> case ARRAY_RANGE_REF:
> {
> Index: gcc/match.pd
> ===================================================================
> --- gcc/match.pd 2017-11-06 12:40:39.845713389 +0000
> +++ gcc/match.pd 2017-11-06 12:40:40.285580578 +0000
> @@ -958,6 +958,9 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> (match negate_expr_p
> VECTOR_CST
> (if (FLOAT_TYPE_P (TREE_TYPE (type)) || TYPE_OVERFLOW_WRAPS (type))))
> +(match negate_expr_p
> + VEC_DUPLICATE_CST
> + (if (FLOAT_TYPE_P (TREE_TYPE (type)) || TYPE_OVERFLOW_WRAPS (type))))
>
> /* (-A) * (-B) -> A * B */
> (simplify
> Index: gcc/print-tree.c
> ===================================================================
> --- gcc/print-tree.c 2017-11-06 12:40:39.845713389 +0000
> +++ gcc/print-tree.c 2017-11-06 12:40:40.287566435 +0000
> @@ -783,6 +783,10 @@ print_node (FILE *file, const char *pref
> }
> break;
>
> + case VEC_DUPLICATE_CST:
> + print_node (file, "elt", VEC_DUPLICATE_CST_ELT (node), indent + 4);
> + break;
> +
> case COMPLEX_CST:
> print_node (file, "real", TREE_REALPART (node), indent + 4);
> print_node (file, "imag", TREE_IMAGPART (node), indent + 4);
> Index: gcc/tree-chkp.c
> ===================================================================
> --- gcc/tree-chkp.c 2017-11-06 12:40:39.845713389 +0000
> +++ gcc/tree-chkp.c 2017-11-06 12:40:40.288559363 +0000
> @@ -3799,6 +3799,7 @@ chkp_find_bounds_1 (tree ptr, tree ptr_s
> case INTEGER_CST:
> case COMPLEX_CST:
> case VECTOR_CST:
> + case VEC_DUPLICATE_CST:
> if (integer_zerop (ptr_src))
> bounds = chkp_get_none_bounds ();
> else
> Index: gcc/tree-loop-distribution.c
> ===================================================================
> --- gcc/tree-loop-distribution.c 2017-11-06 12:40:39.845713389 +0000
> +++ gcc/tree-loop-distribution.c 2017-11-06 12:40:40.289552291 +0000
> @@ -927,6 +927,9 @@ const_with_all_bytes_same (tree val)
> && CONSTRUCTOR_NELTS (val) == 0))
> return 0;
>
> + if (TREE_CODE (val) == VEC_DUPLICATE_CST)
> + return const_with_all_bytes_same (VEC_DUPLICATE_CST_ELT (val));
> +
> if (real_zerop (val))
> {
> /* Only return 0 for +0.0, not for -0.0, which doesn't have
> Index: gcc/tree-ssa-loop.c
> ===================================================================
> --- gcc/tree-ssa-loop.c 2017-11-06 12:40:39.845713389 +0000
> +++ gcc/tree-ssa-loop.c 2017-11-06 12:40:40.290545219 +0000
> @@ -616,6 +616,7 @@ for_each_index (tree *addr_p, bool (*cbc
> case STRING_CST:
> case RESULT_DECL:
> case VECTOR_CST:
> + case VEC_DUPLICATE_CST:
> case COMPLEX_CST:
> case INTEGER_CST:
> case REAL_CST:
> Index: gcc/tree-ssa-pre.c
> ===================================================================
> --- gcc/tree-ssa-pre.c 2017-11-06 12:40:39.845713389 +0000
> +++ gcc/tree-ssa-pre.c 2017-11-06 12:40:40.290545219 +0000
> @@ -2627,6 +2627,7 @@ create_component_ref_by_pieces_1 (basic_
> case INTEGER_CST:
> case COMPLEX_CST:
> case VECTOR_CST:
> + case VEC_DUPLICATE_CST:
> case REAL_CST:
> case CONSTRUCTOR:
> case VAR_DECL:
> Index: gcc/tree-ssa-sccvn.c
> ===================================================================
> --- gcc/tree-ssa-sccvn.c 2017-11-06 12:40:39.845713389 +0000
> +++ gcc/tree-ssa-sccvn.c 2017-11-06 12:40:40.291538147 +0000
> @@ -866,6 +866,7 @@ copy_reference_ops_from_ref (tree ref, v
> case INTEGER_CST:
> case COMPLEX_CST:
> case VECTOR_CST:
> + case VEC_DUPLICATE_CST:
> case REAL_CST:
> case FIXED_CST:
> case CONSTRUCTOR:
> @@ -1058,6 +1059,7 @@ ao_ref_init_from_vn_reference (ao_ref *r
> case INTEGER_CST:
> case COMPLEX_CST:
> case VECTOR_CST:
> + case VEC_DUPLICATE_CST:
> case REAL_CST:
> case CONSTRUCTOR:
> case CONST_DECL:
> Index: gcc/varasm.c
> ===================================================================
> --- gcc/varasm.c 2017-11-06 12:40:39.845713389 +0000
> +++ gcc/varasm.c 2017-11-06 12:40:40.293524004 +0000
> @@ -3068,6 +3068,9 @@ const_hash_1 (const tree exp)
> CASE_CONVERT:
> return const_hash_1 (TREE_OPERAND (exp, 0)) * 7 + 2;
>
> + case VEC_DUPLICATE_CST:
> + return const_hash_1 (VEC_DUPLICATE_CST_ELT (exp)) * 7 + 3;
> +
> default:
> /* A language specific constant. Just hash the code. */
> return code;
> @@ -3158,6 +3161,10 @@ compare_constant (const tree t1, const t
> return 1;
> }
>
> + case VEC_DUPLICATE_CST:
> + return compare_constant (VEC_DUPLICATE_CST_ELT (t1),
> + VEC_DUPLICATE_CST_ELT (t2));
> +
> case CONSTRUCTOR:
> {
> vec<constructor_elt, va_gc> *v1, *v2;
> Index: gcc/fold-const.c
> ===================================================================
> --- gcc/fold-const.c 2017-11-06 12:40:39.845713389 +0000
> +++ gcc/fold-const.c 2017-11-06 12:40:40.282601794 +0000
> @@ -418,6 +418,9 @@ negate_expr_p (tree t)
> return true;
> }
>
> + case VEC_DUPLICATE_CST:
> + return negate_expr_p (VEC_DUPLICATE_CST_ELT (t));
> +
> case COMPLEX_EXPR:
> return negate_expr_p (TREE_OPERAND (t, 0))
> && negate_expr_p (TREE_OPERAND (t, 1));
> @@ -579,6 +582,14 @@ fold_negate_expr_1 (location_t loc, tree
> return build_vector (type, elts);
> }
>
> + case VEC_DUPLICATE_CST:
> + {
> + tree sub = fold_negate_expr (loc, VEC_DUPLICATE_CST_ELT (t));
> + if (!sub)
> + return NULL_TREE;
> + return build_vector_from_val (type, sub);
> + }
> +
> case COMPLEX_EXPR:
> if (negate_expr_p (t))
> return fold_build2_loc (loc, COMPLEX_EXPR, type,
> @@ -1436,6 +1447,16 @@ const_binop (enum tree_code code, tree a
> return build_vector (type, elts);
> }
>
> + if (TREE_CODE (arg1) == VEC_DUPLICATE_CST
> + && TREE_CODE (arg2) == VEC_DUPLICATE_CST)
> + {
> + tree sub = const_binop (code, VEC_DUPLICATE_CST_ELT (arg1),
> + VEC_DUPLICATE_CST_ELT (arg2));
> + if (!sub)
> + return NULL_TREE;
> + return build_vector_from_val (TREE_TYPE (arg1), sub);
> + }
> +
> /* Shifts allow a scalar offset for a vector. */
> if (TREE_CODE (arg1) == VECTOR_CST
> && TREE_CODE (arg2) == INTEGER_CST)
> @@ -1459,6 +1480,15 @@ const_binop (enum tree_code code, tree a
>
> return build_vector (type, elts);
> }
> +
> + if (TREE_CODE (arg1) == VEC_DUPLICATE_CST
> + && TREE_CODE (arg2) == INTEGER_CST)
> + {
> + tree sub = const_binop (code, VEC_DUPLICATE_CST_ELT (arg1), arg2);
> + if (!sub)
> + return NULL_TREE;
> + return build_vector_from_val (TREE_TYPE (arg1), sub);
> + }
> return NULL_TREE;
> }
>
> @@ -1652,6 +1682,13 @@ const_unop (enum tree_code code, tree ty
> if (i == count)
> return build_vector (type, elements);
> }
> + else if (TREE_CODE (arg0) == VEC_DUPLICATE_CST)
> + {
> + tree sub = const_unop (BIT_NOT_EXPR, TREE_TYPE (type),
> + VEC_DUPLICATE_CST_ELT (arg0));
> + if (sub)
> + return build_vector_from_val (type, sub);
> + }
> break;
>
> case TRUTH_NOT_EXPR:
> @@ -1737,6 +1774,11 @@ const_unop (enum tree_code code, tree ty
> return res;
> }
>
> + case VEC_DUPLICATE_EXPR:
> + if (CONSTANT_CLASS_P (arg0))
> + return build_vector_from_val (type, arg0);
> + return NULL_TREE;
> +
> default:
> break;
> }
> @@ -2167,6 +2209,15 @@ fold_convert_const (enum tree_code code,
> }
> return build_vector (type, v);
> }
> + if (TREE_CODE (arg1) == VEC_DUPLICATE_CST
> + && (TYPE_VECTOR_SUBPARTS (type)
> + == TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg1))))
> + {
> + tree sub = fold_convert_const (code, TREE_TYPE (type),
> + VEC_DUPLICATE_CST_ELT (arg1));
> + if (sub)
> + return build_vector_from_val (type, sub);
> + }
> }
> return NULL_TREE;
> }
> @@ -2953,6 +3004,10 @@ operand_equal_p (const_tree arg0, const_
> return 1;
> }
>
> + case VEC_DUPLICATE_CST:
> + return operand_equal_p (VEC_DUPLICATE_CST_ELT (arg0),
> + VEC_DUPLICATE_CST_ELT (arg1), flags);
> +
> case COMPLEX_CST:
> return (operand_equal_p (TREE_REALPART (arg0), TREE_REALPART (arg1),
> flags)
> @@ -7475,6 +7530,20 @@ can_native_interpret_type_p (tree type)
> static tree
> fold_view_convert_expr (tree type, tree expr)
> {
> + /* Recurse on duplicated vectors if the target type is also a vector
> + and if the elements line up. */
> + tree expr_type = TREE_TYPE (expr);
> + if (TREE_CODE (expr) == VEC_DUPLICATE_CST
> + && VECTOR_TYPE_P (type)
> + && TYPE_VECTOR_SUBPARTS (type) == TYPE_VECTOR_SUBPARTS (expr_type)
> + && TYPE_SIZE (TREE_TYPE (type)) == TYPE_SIZE (TREE_TYPE (expr_type)))
> + {
> + tree sub = fold_view_convert_expr (TREE_TYPE (type),
> + VEC_DUPLICATE_CST_ELT (expr));
> + if (sub)
> + return build_vector_from_val (type, sub);
> + }
> +
> /* We support up to 512-bit values (for V8DFmode). */
> unsigned char buffer[64];
> int len;
> @@ -8874,6 +8943,15 @@ exact_inverse (tree type, tree cst)
> return build_vector (type, elts);
> }
>
> + case VEC_DUPLICATE_CST:
> + {
> + tree sub = exact_inverse (TREE_TYPE (type),
> + VEC_DUPLICATE_CST_ELT (cst));
> + if (!sub)
> + return NULL_TREE;
> + return build_vector_from_val (type, sub);
> + }
> +
> default:
> return NULL_TREE;
> }
> @@ -11939,6 +12017,9 @@ fold_checksum_tree (const_tree expr, str
> for (i = 0; i < (int) VECTOR_CST_NELTS (expr); ++i)
> fold_checksum_tree (VECTOR_CST_ELT (expr, i), ctx, ht);
> break;
> + case VEC_DUPLICATE_CST:
> + fold_checksum_tree (VEC_DUPLICATE_CST_ELT (expr), ctx, ht);
> + break;
> default:
> break;
> }
> @@ -14412,6 +14493,41 @@ test_vector_folding ()
> ASSERT_FALSE (integer_nonzerop (fold_build2 (NE_EXPR, res_type, one, one)));
> }
>
> +/* Verify folding of VEC_DUPLICATE_CSTs and VEC_DUPLICATE_EXPRs. */
> +
> +static void
> +test_vec_duplicate_folding ()
> +{
> + scalar_int_mode int_mode = SCALAR_INT_TYPE_MODE (ssizetype);
> + machine_mode vec_mode = targetm.vectorize.preferred_simd_mode (int_mode);
> + /* This will be 1 if VEC_MODE isn't a vector mode. */
> + unsigned int nunits = GET_MODE_NUNITS (vec_mode);
> +
> + tree type = build_vector_type (ssizetype, nunits);
> + tree dup5 = build_vector_from_val (type, ssize_int (5));
> + tree dup3 = build_vector_from_val (type, ssize_int (3));
> +
> + tree neg_dup5 = fold_unary (NEGATE_EXPR, type, dup5);
> + ASSERT_EQ (uniform_vector_p (neg_dup5), ssize_int (-5));
> +
> + tree not_dup5 = fold_unary (BIT_NOT_EXPR, type, dup5);
> + ASSERT_EQ (uniform_vector_p (not_dup5), ssize_int (-6));
> +
> + tree dup5_plus_dup3 = fold_binary (PLUS_EXPR, type, dup5, dup3);
> + ASSERT_EQ (uniform_vector_p (dup5_plus_dup3), ssize_int (8));
> +
> + tree dup5_lsl_2 = fold_binary (LSHIFT_EXPR, type, dup5, ssize_int (2));
> + ASSERT_EQ (uniform_vector_p (dup5_lsl_2), ssize_int (20));
> +
> + tree size_vector = build_vector_type (sizetype, nunits);
> + tree size_dup5 = fold_convert (size_vector, dup5);
> + ASSERT_EQ (uniform_vector_p (size_dup5), size_int (5));
> +
> + tree dup5_expr = fold_unary (VEC_DUPLICATE_EXPR, type, ssize_int (5));
> + tree dup5_cst = build_vector_from_val (type, ssize_int (5));
> + ASSERT_TRUE (operand_equal_p (dup5_expr, dup5_cst, 0));
> +}
> +
> /* Run all of the selftests within this file. */
>
> void
> @@ -14419,6 +14535,7 @@ fold_const_c_tests ()
> {
> test_arithmetic_folding ();
> test_vector_folding ();
> + test_vec_duplicate_folding ();
> }
>
> } // namespace selftest
> Index: gcc/optabs.def
> ===================================================================
> --- gcc/optabs.def 2017-11-06 12:40:39.845713389 +0000
> +++ gcc/optabs.def 2017-11-06 12:40:40.286573506 +0000
> @@ -364,3 +364,5 @@ OPTAB_D (atomic_xor_optab, "atomic_xor$I
>
> OPTAB_D (get_thread_pointer_optab, "get_thread_pointer$I$a")
> OPTAB_D (set_thread_pointer_optab, "set_thread_pointer$I$a")
> +
> +OPTAB_DC (vec_duplicate_optab, "vec_duplicate$a", VEC_DUPLICATE)
> Index: gcc/optabs-tree.c
> ===================================================================
> --- gcc/optabs-tree.c 2017-11-06 12:40:39.845713389 +0000
> +++ gcc/optabs-tree.c 2017-11-06 12:40:40.286573506 +0000
> @@ -210,6 +210,9 @@ optab_for_tree_code (enum tree_code code
> return TYPE_UNSIGNED (type) ?
> vec_pack_ufix_trunc_optab : vec_pack_sfix_trunc_optab;
>
> + case VEC_DUPLICATE_EXPR:
> + return vec_duplicate_optab;
> +
> default:
> break;
> }
> Index: gcc/optabs.h
> ===================================================================
> --- gcc/optabs.h 2017-11-06 12:40:39.845713389 +0000
> +++ gcc/optabs.h 2017-11-06 12:40:40.287566435 +0000
> @@ -181,6 +181,7 @@ extern rtx simplify_expand_binop (machin
> enum optab_methods methods);
> extern bool force_expand_binop (machine_mode, optab, rtx, rtx, rtx, int,
> enum optab_methods);
> +extern rtx expand_vector_broadcast (machine_mode, rtx);
>
> /* Generate code for a simple binary or unary operation. "Simple" in
> this case means "can be unambiguously described by a (mode, code)
> Index: gcc/optabs.c
> ===================================================================
> --- gcc/optabs.c 2017-11-06 12:40:39.845713389 +0000
> +++ gcc/optabs.c 2017-11-06 12:40:40.286573506 +0000
> @@ -367,7 +367,7 @@ force_expand_binop (machine_mode mode, o
> mode of OP must be the element mode of VMODE. If OP is a constant,
> then the return value will be a constant. */
>
> -static rtx
> +rtx
> expand_vector_broadcast (machine_mode vmode, rtx op)
> {
> enum insn_code icode;
> @@ -380,6 +380,16 @@ expand_vector_broadcast (machine_mode vm
> if (valid_for_const_vec_duplicate_p (vmode, op))
> return gen_const_vec_duplicate (vmode, op);
>
> + icode = optab_handler (vec_duplicate_optab, vmode);
> + if (icode != CODE_FOR_nothing)
> + {
> + struct expand_operand ops[2];
> + create_output_operand (&ops[0], NULL_RTX, vmode);
> + create_input_operand (&ops[1], op, GET_MODE (op));
> + expand_insn (icode, 2, ops);
> + return ops[0].value;
> + }
> +
> /* ??? If the target doesn't have a vec_init, then we have no easy way
> of performing this operation. Most of this sort of generic support
> is hidden away in the vector lowering support in gimple. */
> Index: gcc/expr.c
> ===================================================================
> --- gcc/expr.c 2017-11-06 12:40:39.845713389 +0000
> +++ gcc/expr.c 2017-11-06 12:40:40.281608865 +0000
> @@ -6576,7 +6576,8 @@ store_constructor (tree exp, rtx target,
> constructor_elt *ce;
> int i;
> int need_to_clear;
> - int icode = CODE_FOR_nothing;
> + insn_code icode = CODE_FOR_nothing;
> + tree elt;
> tree elttype = TREE_TYPE (type);
> int elt_size = tree_to_uhwi (TYPE_SIZE (elttype));
> machine_mode eltmode = TYPE_MODE (elttype);
> @@ -6586,13 +6587,30 @@ store_constructor (tree exp, rtx target,
> unsigned n_elts;
> alias_set_type alias;
> bool vec_vec_init_p = false;
> + machine_mode mode = GET_MODE (target);
>
> gcc_assert (eltmode != BLKmode);
>
> + /* Try using vec_duplicate_optab for uniform vectors. */
> + if (!TREE_SIDE_EFFECTS (exp)
> + && VECTOR_MODE_P (mode)
> + && eltmode == GET_MODE_INNER (mode)
> + && ((icode = optab_handler (vec_duplicate_optab, mode))
> + != CODE_FOR_nothing)
> + && (elt = uniform_vector_p (exp)))
> + {
> + struct expand_operand ops[2];
> + create_output_operand (&ops[0], target, mode);
> + create_input_operand (&ops[1], expand_normal (elt), eltmode);
> + expand_insn (icode, 2, ops);
> + if (!rtx_equal_p (target, ops[0].value))
> + emit_move_insn (target, ops[0].value);
> + break;
> + }
> +
> n_elts = TYPE_VECTOR_SUBPARTS (type);
> - if (REG_P (target) && VECTOR_MODE_P (GET_MODE (target)))
> + if (REG_P (target) && VECTOR_MODE_P (mode))
> {
> - machine_mode mode = GET_MODE (target);
> machine_mode emode = eltmode;
>
> if (CONSTRUCTOR_NELTS (exp)
> @@ -6604,7 +6622,7 @@ store_constructor (tree exp, rtx target,
> == n_elts);
> emode = TYPE_MODE (etype);
> }
> - icode = (int) convert_optab_handler (vec_init_optab, mode, emode);
> + icode = convert_optab_handler (vec_init_optab, mode, emode);
> if (icode != CODE_FOR_nothing)
> {
> unsigned int i, n = n_elts;
> @@ -6652,7 +6670,7 @@ store_constructor (tree exp, rtx target,
> if (need_to_clear && size > 0 && !vector)
> {
> if (REG_P (target))
> - emit_move_insn (target, CONST0_RTX (GET_MODE (target)));
> + emit_move_insn (target, CONST0_RTX (mode));
> else
> clear_storage (target, GEN_INT (size), BLOCK_OP_NORMAL);
> cleared = 1;
> @@ -6660,7 +6678,7 @@ store_constructor (tree exp, rtx target,
>
> /* Inform later passes that the old value is dead. */
> if (!cleared && !vector && REG_P (target))
> - emit_move_insn (target, CONST0_RTX (GET_MODE (target)));
> + emit_move_insn (target, CONST0_RTX (mode));
>
> if (MEM_P (target))
> alias = MEM_ALIAS_SET (target);
> @@ -6711,8 +6729,7 @@ store_constructor (tree exp, rtx target,
>
> if (vector)
> emit_insn (GEN_FCN (icode) (target,
> - gen_rtx_PARALLEL (GET_MODE (target),
> - vector)));
> + gen_rtx_PARALLEL (mode, vector)));
> break;
> }
>
> @@ -7690,6 +7707,19 @@ expand_operands (tree exp0, tree exp1, r
> }
>
>
> +/* Expand constant vector element ELT, which has mode MODE. This is used
> + for members of VECTOR_CST and VEC_DUPLICATE_CST. */
> +
> +static rtx
> +const_vector_element (scalar_mode mode, const_tree elt)
> +{
> + if (TREE_CODE (elt) == REAL_CST)
> + return const_double_from_real_value (TREE_REAL_CST (elt), mode);
> + if (TREE_CODE (elt) == FIXED_CST)
> + return CONST_FIXED_FROM_FIXED_VALUE (TREE_FIXED_CST (elt), mode);
> + return immed_wide_int_const (wi::to_wide (elt), mode);
> +}
> +
> /* Return a MEM that contains constant EXP. DEFER is as for
> output_constant_def and MODIFIER is as for expand_expr. */
>
> @@ -9555,6 +9585,12 @@ #define REDUCE_BIT_FIELD(expr) (reduce_b
> target = expand_vec_cond_expr (type, treeop0, treeop1, treeop2, target);
> return target;
>
> + case VEC_DUPLICATE_EXPR:
> + op0 = expand_expr (treeop0, NULL_RTX, VOIDmode, modifier);
> + target = expand_vector_broadcast (mode, op0);
> + gcc_assert (target);
> + return target;
> +
> case BIT_INSERT_EXPR:
> {
> unsigned bitpos = tree_to_uhwi (treeop2);
> @@ -9988,6 +10024,11 @@ expand_expr_real_1 (tree exp, rtx target
> tmode, modifier);
> }
>
> + case VEC_DUPLICATE_CST:
> + op0 = const_vector_element (GET_MODE_INNER (mode),
> + VEC_DUPLICATE_CST_ELT (exp));
> + return gen_const_vec_duplicate (mode, op0);
> +
> case CONST_DECL:
> if (modifier == EXPAND_WRITE)
> {
> @@ -11749,8 +11790,7 @@ const_vector_from_tree (tree exp)
> {
> rtvec v;
> unsigned i, units;
> - tree elt;
> - machine_mode inner, mode;
> + machine_mode mode;
>
> mode = TYPE_MODE (TREE_TYPE (exp));
>
> @@ -11761,23 +11801,12 @@ const_vector_from_tree (tree exp)
> return const_vector_mask_from_tree (exp);
>
> units = VECTOR_CST_NELTS (exp);
> - inner = GET_MODE_INNER (mode);
>
> v = rtvec_alloc (units);
>
> for (i = 0; i < units; ++i)
> - {
> - elt = VECTOR_CST_ELT (exp, i);
> -
> - if (TREE_CODE (elt) == REAL_CST)
> - RTVEC_ELT (v, i) = const_double_from_real_value (TREE_REAL_CST (elt),
> - inner);
> - else if (TREE_CODE (elt) == FIXED_CST)
> - RTVEC_ELT (v, i) = CONST_FIXED_FROM_FIXED_VALUE (TREE_FIXED_CST (elt),
> - inner);
> - else
> - RTVEC_ELT (v, i) = immed_wide_int_const (wi::to_wide (elt), inner);
> - }
> + RTVEC_ELT (v, i) = const_vector_element (GET_MODE_INNER (mode),
> + VECTOR_CST_ELT (exp, i));
>
> return gen_rtx_CONST_VECTOR (mode, v);
> }
> Index: gcc/internal-fn.c
> ===================================================================
> --- gcc/internal-fn.c 2017-11-06 12:40:39.845713389 +0000
> +++ gcc/internal-fn.c 2017-11-06 12:40:40.284587650 +0000
> @@ -1911,12 +1911,12 @@ expand_vector_ubsan_overflow (location_t
> emit_move_insn (cntvar, const0_rtx);
> emit_label (loop_lab);
> }
> - if (TREE_CODE (arg0) != VECTOR_CST)
> + if (!CONSTANT_CLASS_P (arg0))
> {
> rtx arg0r = expand_normal (arg0);
> arg0 = make_tree (TREE_TYPE (arg0), arg0r);
> }
> - if (TREE_CODE (arg1) != VECTOR_CST)
> + if (!CONSTANT_CLASS_P (arg1))
> {
> rtx arg1r = expand_normal (arg1);
> arg1 = make_tree (TREE_TYPE (arg1), arg1r);
> Index: gcc/tree-cfg.c
> ===================================================================
> --- gcc/tree-cfg.c 2017-11-06 12:40:39.845713389 +0000
> +++ gcc/tree-cfg.c 2017-11-06 12:40:40.287566435 +0000
> @@ -3798,6 +3798,17 @@ verify_gimple_assign_unary (gassign *stm
> case CONJ_EXPR:
> break;
>
> + case VEC_DUPLICATE_EXPR:
> + if (TREE_CODE (lhs_type) != VECTOR_TYPE
> + || !useless_type_conversion_p (TREE_TYPE (lhs_type), rhs1_type))
> + {
> + error ("vec_duplicate should be from a scalar to a like vector");
> + debug_generic_expr (lhs_type);
> + debug_generic_expr (rhs1_type);
> + return true;
> + }
> + return false;
> +
> default:
> gcc_unreachable ();
> }
> @@ -4468,6 +4479,7 @@ verify_gimple_assign_single (gassign *st
> case FIXED_CST:
> case COMPLEX_CST:
> case VECTOR_CST:
> + case VEC_DUPLICATE_CST:
> case STRING_CST:
> return res;
>
> Index: gcc/tree-inline.c
> ===================================================================
> --- gcc/tree-inline.c 2017-11-06 12:40:39.845713389 +0000
> +++ gcc/tree-inline.c 2017-11-06 12:40:40.289552291 +0000
> @@ -3930,6 +3930,7 @@ estimate_operator_cost (enum tree_code c
> case VEC_PACK_FIX_TRUNC_EXPR:
> case VEC_WIDEN_LSHIFT_HI_EXPR:
> case VEC_WIDEN_LSHIFT_LO_EXPR:
> + case VEC_DUPLICATE_EXPR:
>
> return 1;
>
^ permalink raw reply [flat|nested] 90+ messages in thread
* Re: [06/nn] Add VEC_SERIES_{CST,EXPR} and associated optab
2017-11-06 15:21 ` Richard Sandiford
@ 2017-11-07 10:38 ` Richard Biener
0 siblings, 0 replies; 90+ messages in thread
From: Richard Biener @ 2017-11-07 10:38 UTC (permalink / raw)
To: Richard Biener, GCC Patches, Richard Sandiford
On Mon, Nov 6, 2017 at 4:21 PM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> Richard Biener <richard.guenther@gmail.com> writes:
>> On Thu, Oct 26, 2017 at 2:23 PM, Richard Biener
>> <richard.guenther@gmail.com> wrote:
>>> On Mon, Oct 23, 2017 at 1:20 PM, Richard Sandiford
>>> <richard.sandiford@linaro.org> wrote:
>>>> Similarly to the VEC_DUPLICATE_{CST,EXPR}, this patch adds two
>>>> tree code equivalents of the VEC_SERIES rtx code. VEC_SERIES_EXPR
>>>> is for non-constant inputs and is a normal tcc_binary. VEC_SERIES_CST
>>>> is a tcc_constant.
>>>>
>>>> Like VEC_DUPLICATE_CST, VEC_SERIES_CST is only used for variable-length
>>>> vectors. This avoids the need to handle combinations of VECTOR_CST
>>>> and VEC_SERIES_CST.
>>>
>>> Similar to the other patch can you document and verify that VEC_SERIES_CST
>>> is only used on variable length vectors?
>
> OK, done with the below, which also makes build_vec_series create
> a VECTOR_CST for fixed-length vectors. I also added some selftests.
>
>>> Ok with that change.
>>
>> Btw, did you think of merging VEC_DUPLICATE_CST with VEC_SERIES_CST
>> via setting step == 0? I think you can do {1, 1, 1, 1... } + {1, 2,3
>> ,4,5 } constant
>> folding but you don't implement that.
>
> That was done via vec_series_equivalent_p.
Constant folding of VEC_DUPLICATE_CST + VEC_SERIES_CST? Didn't see that.
> The problem with using VEC_SERIES with a step of zero is that we'd
> then have to define VEC_SERIES for floats too (even in strict math
> modes), but probably only for the special case of a zero step.
> I think that'd end up being more complicated overall.
>
>> Propagation can also turn
>> VEC_SERIES_EXPR into VEC_SERIES_CST and VEC_DUPLICATE_EXPR
>> into VEC_DUPLICATE_CST (didn't see the former, don't remember the latter).
>
> VEC_SERIES_EXPR -> VEC_SERIES_CST/VECTOR_CST was done by const_binop.
Ok, must have missed that. Would be nice to add comments before the
"transform".
> And yeah, VEC_DUPLICATE_EXPR -> VEC_DUPLICATE_CST/VECTOR_CST was done
> by const_unop in the VEC_DUPLICATE patch.
>
> Tested as before. OK to install?
Ok.
Thanks,
Richard.
> Thanks,
> Richard
>
>
> 2017-11-06 Richard Sandiford <richard.sandiford@linaro.org>
> Alan Hayward <alan.hayward@arm.com>
> David Sherwood <david.sherwood@arm.com>
>
> gcc/
> * doc/generic.texi (VEC_SERIES_CST, VEC_SERIES_EXPR): Document.
> * doc/md.texi (vec_series@var{m}): Document.
> * tree.def (VEC_SERIES_CST, VEC_SERIES_EXPR): New tree codes.
> * tree.h (TREE_OVERFLOW): Add VEC_SERIES_CST to the list of valid
> codes.
> (VEC_SERIES_CST_BASE, VEC_SERIES_CST_STEP): New macros.
> (build_vec_series): Declare.
> * tree.c (tree_node_structure_for_code, tree_code_size, tree_size)
> (add_expr, walk_tree_1, drop_tree_overflow): Handle VEC_SERIES_CST.
> (build_vec_series_cst, build_vec_series): New functions.
> * cfgexpand.c (expand_debug_expr): Handle the new codes.
> * tree-pretty-print.c (dump_generic_node): Likewise.
> * dwarf2out.c (rtl_for_decl_init): Handle VEC_SERIES_CST.
> * gimple-expr.h (is_gimple_constant): Likewise.
> * gimplify.c (gimplify_expr): Likewise.
> * graphite-scop-detection.c (scan_tree_for_params): Likewise.
> * ipa-icf-gimple.c (func_checker::compare_cst_or_decl): Likewise.
> (func_checker::compare_operand): Likewise.
> * ipa-icf.c (sem_item::add_expr, sem_variable::equals): Likewise.
> * print-tree.c (print_node): Likewise.
> * tree-ssa-loop.c (for_each_index): Likewise.
> * tree-ssa-pre.c (create_component_ref_by_pieces_1): Likewise.
> * tree-ssa-sccvn.c (copy_reference_ops_from_ref): Likewise.
> (ao_ref_init_from_vn_reference): Likewise.
> * varasm.c (const_hash_1, compare_constant): Likewise.
> * fold-const.c (negate_expr_p, fold_negate_expr_1, operand_equal_p)
> (fold_checksum_tree): Likewise.
> (vec_series_equivalent_p): New function.
> (const_binop): Use it. Fold VEC_SERIES_EXPRs of constants.
> (test_vec_series_folding): New function.
> (fold_const_c_tests): Call it.
> * expmed.c (make_tree): Handle VEC_SERIES.
> * gimple-pretty-print.c (dump_binary_rhs): Likewise.
> * tree-inline.c (estimate_operator_cost): Likewise.
> * expr.c (const_vector_element): Include VEC_SERIES_CST in comment.
> (expand_expr_real_2): Handle VEC_SERIES_EXPR.
> (expand_expr_real_1): Handle VEC_SERIES_CST.
> * optabs.def (vec_series_optab): New optab.
> * optabs.h (expand_vec_series_expr): Declare.
> * optabs.c (expand_vec_series_expr): New function.
> * optabs-tree.c (optab_for_tree_code): Handle VEC_SERIES_EXPR.
> * tree-cfg.c (verify_gimple_assign_binary): Handle VEC_SERIES_EXPR.
> (verify_gimple_assign_single): Handle VEC_SERIES_CST.
> * tree-vect-generic.c (expand_vector_operations_1): Check that
> the operands also have vector type.
>
> Index: gcc/doc/generic.texi
> ===================================================================
> --- gcc/doc/generic.texi 2017-11-06 12:20:31.075167123 +0000
> +++ gcc/doc/generic.texi 2017-11-06 12:21:29.321209826 +0000
> @@ -1037,6 +1037,7 @@ As this example indicates, the operands
> @tindex COMPLEX_CST
> @tindex VECTOR_CST
> @tindex VEC_DUPLICATE_CST
> +@tindex VEC_SERIES_CST
> @tindex STRING_CST
> @findex TREE_STRING_LENGTH
> @findex TREE_STRING_POINTER
> @@ -1098,6 +1099,18 @@ instead. The scalar element value is gi
> @code{VEC_DUPLICATE_CST_ELT} and has the same restrictions as the
> element of a @code{VECTOR_CST}.
>
> +@item VEC_SERIES_CST
> +These nodes represent a vector constant in which element @var{i}
> +has the value @samp{@var{base} + @var{i} * @var{step}}, for some
> +constant @var{base} and @var{step}. The value of @var{base} is
> +given by @code{VEC_SERIES_CST_BASE} and the value of @var{step} is
> +given by @code{VEC_SERIES_CST_STEP}.
> +
> +At present only variable-length vectors use @code{VEC_SERIES_CST};
> +constant-length vectors use @code{VECTOR_CST} instead. The nodes
> +are also restricted to integral types, in order to avoid specifying
> +the rounding behavior for floating-point types.
> +
> @item STRING_CST
> These nodes represent string-constants. The @code{TREE_STRING_LENGTH}
> returns the length of the string, as an @code{int}. The
> @@ -1702,6 +1715,7 @@ a value from @code{enum annot_expr_kind}
> @node Vectors
> @subsection Vectors
> @tindex VEC_DUPLICATE_EXPR
> +@tindex VEC_SERIES_EXPR
> @tindex VEC_LSHIFT_EXPR
> @tindex VEC_RSHIFT_EXPR
> @tindex VEC_WIDEN_MULT_HI_EXPR
> @@ -1721,6 +1735,14 @@ a value from @code{enum annot_expr_kind}
> This node has a single operand and represents a vector in which every
> element is equal to that operand.
>
> +@item VEC_SERIES_EXPR
> +This node represents a vector formed from a scalar base and step,
> +given as the first and second operands respectively. Element @var{i}
> +of the result is equal to @samp{@var{base} + @var{i}*@var{step}}.
> +
> +This node is restricted to integral types, in order to avoid
> +specifying the rounding behavior for floating-point types.
> +
> @item VEC_LSHIFT_EXPR
> @itemx VEC_RSHIFT_EXPR
> These nodes represent whole vector left and right shifts, respectively.
> Index: gcc/doc/md.texi
> ===================================================================
> --- gcc/doc/md.texi 2017-11-06 12:20:31.076995065 +0000
> +++ gcc/doc/md.texi 2017-11-06 12:21:29.322209826 +0000
> @@ -4899,6 +4899,19 @@ vectors go through the @code{mov@var{m}}
>
> This pattern is not allowed to @code{FAIL}.
>
> +@cindex @code{vec_series@var{m}} instruction pattern
> +@item @samp{vec_series@var{m}}
> +Initialize vector output operand 0 so that element @var{i} is equal to
> +operand 1 plus @var{i} times operand 2. In other words, create a linear
> +series whose base value is operand 1 and whose step is operand 2.
> +
> +The vector output has mode @var{m} and the scalar inputs have the mode
> +appropriate for one element of @var{m}. This pattern is not used for
> +floating-point vectors, in order to avoid having to specify the
> +rounding behavior for @var{i} > 1.
> +
> +This pattern is not allowed to @code{FAIL}.
> +
> @cindex @code{vec_cmp@var{m}@var{n}} instruction pattern
> @item @samp{vec_cmp@var{m}@var{n}}
> Output a vector comparison. Operand 0 of mode @var{n} is the destination for
> Index: gcc/tree.def
> ===================================================================
> --- gcc/tree.def 2017-11-06 12:20:31.098930366 +0000
> +++ gcc/tree.def 2017-11-06 12:21:29.335209826 +0000
> @@ -309,6 +309,12 @@ DEFTREECODE (VECTOR_CST, "vector_cst", t
> vectors; fixed-length vectors must use VECTOR_CST instead. */
> DEFTREECODE (VEC_DUPLICATE_CST, "vec_duplicate_cst", tcc_constant, 0)
>
> +/* Represents a vector constant in which element i is equal to
> + VEC_SERIES_CST_BASE + i * VEC_SERIES_CST_STEP. This is only ever
> + used for variable-length vectors; fixed-length vectors must use
> + VECTOR_CST instead. */
> +DEFTREECODE (VEC_SERIES_CST, "vec_series_cst", tcc_constant, 0)
> +
> /* Contents are TREE_STRING_LENGTH and the actual contents of the string. */
> DEFTREECODE (STRING_CST, "string_cst", tcc_constant, 0)
>
> @@ -542,6 +548,16 @@ DEFTREECODE (COND_EXPR, "cond_expr", tcc
> /* Represents a vector in which every element is equal to operand 0. */
> DEFTREECODE (VEC_DUPLICATE_EXPR, "vec_duplicate_expr", tcc_unary, 1)
>
> +/* Vector series created from a start (base) value and a step.
> +
> + A = VEC_SERIES_EXPR (B, C)
> +
> + means
> +
> + for (i = 0; i < N; i++)
> + A[i] = B + C * i; */
> +DEFTREECODE (VEC_SERIES_EXPR, "vec_series_expr", tcc_binary, 2)
> +
> /* Vector conditional expression. It is like COND_EXPR, but with
> vector operands.
>
> Index: gcc/tree.h
> ===================================================================
> --- gcc/tree.h 2017-11-06 12:20:31.099844337 +0000
> +++ gcc/tree.h 2017-11-06 12:21:29.336209826 +0000
> @@ -709,8 +709,8 @@ #define TREE_SYMBOL_REFERENCED(NODE) \
> #define TYPE_REF_CAN_ALIAS_ALL(NODE) \
> (PTR_OR_REF_CHECK (NODE)->base.static_flag)
>
> -/* In an INTEGER_CST, REAL_CST, COMPLEX_CST, VECTOR_CST or VEC_DUPLICATE_CST,
> - this means there was an overflow in folding. */
> +/* In an INTEGER_CST, REAL_CST, COMPLEX_CST, VECTOR_CST, VEC_DUPLICATE_CST
> + or VEC_SERIES_CST, this means there was an overflow in folding. */
>
> #define TREE_OVERFLOW(NODE) (CST_CHECK (NODE)->base.public_flag)
>
> @@ -1013,6 +1013,12 @@ #define VECTOR_CST_ELT(NODE,IDX) (VECTOR
> #define VEC_DUPLICATE_CST_ELT(NODE) \
> (VEC_DUPLICATE_CST_CHECK (NODE)->vector.elts[0])
>
> +/* In a VEC_SERIES_CST node. */
> +#define VEC_SERIES_CST_BASE(NODE) \
> + (VEC_SERIES_CST_CHECK (NODE)->vector.elts[0])
> +#define VEC_SERIES_CST_STEP(NODE) \
> + (VEC_SERIES_CST_CHECK (NODE)->vector.elts[1])
> +
> /* Define fields and accessors for some special-purpose tree nodes. */
>
> #define IDENTIFIER_LENGTH(NODE) \
> @@ -4017,6 +4023,7 @@ extern tree make_vector (unsigned CXX_ME
> extern tree build_vector (tree, vec<tree> CXX_MEM_STAT_INFO);
> extern tree build_vector_from_ctor (tree, vec<constructor_elt, va_gc> *);
> extern tree build_vector_from_val (tree, tree);
> +extern tree build_vec_series (tree, tree, tree);
> extern void recompute_constructor_flags (tree);
> extern void verify_constructor_flags (tree);
> extern tree build_constructor (tree, vec<constructor_elt, va_gc> *);
> Index: gcc/tree.c
> ===================================================================
> --- gcc/tree.c 2017-11-06 12:20:31.098930366 +0000
> +++ gcc/tree.c 2017-11-06 12:21:29.335209826 +0000
> @@ -465,6 +465,7 @@ tree_node_structure_for_code (enum tree_
> case COMPLEX_CST: return TS_COMPLEX;
> case VECTOR_CST: return TS_VECTOR;
> case VEC_DUPLICATE_CST: return TS_VECTOR;
> + case VEC_SERIES_CST: return TS_VECTOR;
> case STRING_CST: return TS_STRING;
> /* tcc_exceptional cases. */
> case ERROR_MARK: return TS_COMMON;
> @@ -831,6 +832,7 @@ tree_code_size (enum tree_code code)
> case COMPLEX_CST: return sizeof (tree_complex);
> case VECTOR_CST: return sizeof (tree_vector);
> case VEC_DUPLICATE_CST: return sizeof (tree_vector);
> + case VEC_SERIES_CST: return sizeof (tree_vector) + sizeof (tree);
> case STRING_CST: gcc_unreachable ();
> default:
> gcc_checking_assert (code >= NUM_TREE_CODES);
> @@ -895,6 +897,9 @@ tree_size (const_tree node)
> case VEC_DUPLICATE_CST:
> return sizeof (struct tree_vector);
>
> + case VEC_SERIES_CST:
> + return sizeof (struct tree_vector) + sizeof (tree);
> +
> case STRING_CST:
> return TREE_STRING_LENGTH (node) + offsetof (struct tree_string, str) + 1;
>
> @@ -1730,6 +1735,34 @@ build_vec_duplicate_cst (tree type, tree
> return t;
> }
>
> +/* Build a new VEC_SERIES_CST with type TYPE, base BASE and step STEP.
> +
> + Note that this function is only suitable for callers that specifically
> + need a VEC_SERIES_CST node. Use build_vec_series to build a general
> + series vector from a general base and step. */
> +
> +static tree
> +build_vec_series_cst (tree type, tree base, tree step MEM_STAT_DECL)
> +{
> + /* Shouldn't be used until we have variable-length vectors. */
> + gcc_unreachable ();
> +
> + int length = sizeof (struct tree_vector) + sizeof (tree);
> +
> + record_node_allocation_statistics (VEC_SERIES_CST, length);
> +
> + tree t = ggc_alloc_cleared_tree_node_stat (length PASS_MEM_STAT);
> +
> + TREE_SET_CODE (t, VEC_SERIES_CST);
> + TREE_TYPE (t) = type;
> + t->base.u.nelts = 2;
> + VEC_SERIES_CST_BASE (t) = base;
> + VEC_SERIES_CST_STEP (t) = step;
> + TREE_CONSTANT (t) = 1;
> +
> + return t;
> +}
> +
> /* Build a newly constructed VECTOR_CST node of length LEN. */
>
> tree
> @@ -1847,6 +1880,33 @@ build_vector_from_val (tree vectype, tre
> }
> }
>
> +/* Build a vector series of type TYPE in which element I has the value
> + BASE + I * STEP. The result is a constant if BASE and STEP are constant
> + and a VEC_SERIES_EXPR otherwise. */
> +
> +tree
> +build_vec_series (tree type, tree base, tree step)
> +{
> + if (integer_zerop (step))
> + return build_vector_from_val (type, base);
> + if (CONSTANT_CLASS_P (base) && CONSTANT_CLASS_P (step))
> + {
> + unsigned int nunits = TYPE_VECTOR_SUBPARTS (type);
> + if (0)
> + return build_vec_series_cst (type, base, step);
> +
> + auto_vec<tree, 32> v (nunits);
> + v.quick_push (base);
> + for (unsigned int i = 1; i < nunits; ++i)
> + {
> + base = const_binop (PLUS_EXPR, TREE_TYPE (base), base, step);
> + v.quick_push (base);
> + }
> + return build_vector (type, v);
> + }
> + return build2 (VEC_SERIES_EXPR, type, base, step);
> +}
> +
> /* Something has messed with the elements of CONSTRUCTOR C after it was built;
> calculate TREE_CONSTANT and TREE_SIDE_EFFECTS. */
>
> @@ -7162,6 +7222,10 @@ add_expr (const_tree t, inchash::hash &h
> case VEC_DUPLICATE_CST:
> inchash::add_expr (VEC_DUPLICATE_CST_ELT (t), hstate);
> return;
> + case VEC_SERIES_CST:
> + inchash::add_expr (VEC_SERIES_CST_BASE (t), hstate);
> + inchash::add_expr (VEC_SERIES_CST_STEP (t), hstate);
> + return;
> case SSA_NAME:
> /* We can just compare by pointer. */
> hstate.add_hwi (SSA_NAME_VERSION (t));
> @@ -11210,6 +11274,7 @@ #define WALK_SUBTREE_TAIL(NODE) \
> case FIXED_CST:
> case VECTOR_CST:
> case VEC_DUPLICATE_CST:
> + case VEC_SERIES_CST:
> case STRING_CST:
> case BLOCK:
> case PLACEHOLDER_EXPR:
> @@ -12502,6 +12567,15 @@ drop_tree_overflow (tree t)
> if (TREE_OVERFLOW (*elt))
> *elt = drop_tree_overflow (*elt);
> }
> + if (TREE_CODE (t) == VEC_SERIES_CST)
> + {
> + tree *elt = &VEC_SERIES_CST_BASE (t);
> + if (TREE_OVERFLOW (*elt))
> + *elt = drop_tree_overflow (*elt);
> + elt = &VEC_SERIES_CST_STEP (t);
> + if (TREE_OVERFLOW (*elt))
> + *elt = drop_tree_overflow (*elt);
> + }
> return t;
> }
>
> Index: gcc/cfgexpand.c
> ===================================================================
> --- gcc/cfgexpand.c 2017-11-06 12:20:31.074253152 +0000
> +++ gcc/cfgexpand.c 2017-11-06 12:21:29.321209826 +0000
> @@ -5070,6 +5070,8 @@ expand_debug_expr (tree exp)
> case VEC_PERM_EXPR:
> case VEC_DUPLICATE_CST:
> case VEC_DUPLICATE_EXPR:
> + case VEC_SERIES_CST:
> + case VEC_SERIES_EXPR:
> return NULL;
>
> /* Misc codes. */
> Index: gcc/tree-pretty-print.c
> ===================================================================
> --- gcc/tree-pretty-print.c 2017-11-06 12:20:31.093446541 +0000
> +++ gcc/tree-pretty-print.c 2017-11-06 12:21:29.333209826 +0000
> @@ -1808,6 +1808,14 @@ dump_generic_node (pretty_printer *pp, t
> pp_string (pp, ", ... }");
> break;
>
> + case VEC_SERIES_CST:
> + pp_string (pp, "{ ");
> + dump_generic_node (pp, VEC_SERIES_CST_BASE (node), spc, flags, false);
> + pp_string (pp, ", +, ");
> + dump_generic_node (pp, VEC_SERIES_CST_STEP (node), spc, flags, false);
> + pp_string (pp, "}");
> + break;
> +
> case FUNCTION_TYPE:
> case METHOD_TYPE:
> dump_generic_node (pp, TREE_TYPE (node), spc, flags, false);
> @@ -3221,6 +3229,7 @@ dump_generic_node (pretty_printer *pp, t
> pp_string (pp, " > ");
> break;
>
> + case VEC_SERIES_EXPR:
> case VEC_WIDEN_MULT_HI_EXPR:
> case VEC_WIDEN_MULT_LO_EXPR:
> case VEC_WIDEN_MULT_EVEN_EXPR:
> Index: gcc/dwarf2out.c
> ===================================================================
> --- gcc/dwarf2out.c 2017-11-06 12:20:31.080650948 +0000
> +++ gcc/dwarf2out.c 2017-11-06 12:21:29.325209826 +0000
> @@ -18879,6 +18879,7 @@ rtl_for_decl_init (tree init, tree type)
> {
> case VECTOR_CST:
> case VEC_DUPLICATE_CST:
> + case VEC_SERIES_CST:
> break;
> case CONSTRUCTOR:
> if (TREE_CONSTANT (init))
> Index: gcc/gimple-expr.h
> ===================================================================
> --- gcc/gimple-expr.h 2017-11-06 12:20:31.087048745 +0000
> +++ gcc/gimple-expr.h 2017-11-06 12:21:29.328209826 +0000
> @@ -135,6 +135,7 @@ is_gimple_constant (const_tree t)
> case COMPLEX_CST:
> case VECTOR_CST:
> case VEC_DUPLICATE_CST:
> + case VEC_SERIES_CST:
> case STRING_CST:
> return true;
>
> Index: gcc/gimplify.c
> ===================================================================
> --- gcc/gimplify.c 2017-11-06 12:20:31.088876686 +0000
> +++ gcc/gimplify.c 2017-11-06 12:21:29.329209826 +0000
> @@ -11508,6 +11508,7 @@ gimplify_expr (tree *expr_p, gimple_seq
> case COMPLEX_CST:
> case VECTOR_CST:
> case VEC_DUPLICATE_CST:
> + case VEC_SERIES_CST:
> /* Drop the overflow flag on constants, we do not want
> that in the GIMPLE IL. */
> if (TREE_OVERFLOW_P (*expr_p))
> Index: gcc/graphite-scop-detection.c
> ===================================================================
> --- gcc/graphite-scop-detection.c 2017-11-06 12:20:31.088876686 +0000
> +++ gcc/graphite-scop-detection.c 2017-11-06 12:21:29.329209826 +0000
> @@ -1213,6 +1213,7 @@ scan_tree_for_params (sese_info_p s, tre
> case COMPLEX_CST:
> case VECTOR_CST:
> case VEC_DUPLICATE_CST:
> + case VEC_SERIES_CST:
> break;
>
> default:
> Index: gcc/ipa-icf-gimple.c
> ===================================================================
> --- gcc/ipa-icf-gimple.c 2017-11-06 12:20:31.088876686 +0000
> +++ gcc/ipa-icf-gimple.c 2017-11-06 12:21:29.329209826 +0000
> @@ -334,6 +334,7 @@ func_checker::compare_cst_or_decl (tree
> case COMPLEX_CST:
> case VECTOR_CST:
> case VEC_DUPLICATE_CST:
> + case VEC_SERIES_CST:
> case STRING_CST:
> case REAL_CST:
> {
> @@ -530,6 +531,7 @@ func_checker::compare_operand (tree t1,
> case COMPLEX_CST:
> case VECTOR_CST:
> case VEC_DUPLICATE_CST:
> + case VEC_SERIES_CST:
> case STRING_CST:
> case REAL_CST:
> case FUNCTION_DECL:
> Index: gcc/ipa-icf.c
> ===================================================================
> --- gcc/ipa-icf.c 2017-11-06 12:20:31.089790657 +0000
> +++ gcc/ipa-icf.c 2017-11-06 12:21:29.330209826 +0000
> @@ -1480,6 +1480,7 @@ sem_item::add_expr (const_tree exp, inch
> case COMPLEX_CST:
> case VECTOR_CST:
> case VEC_DUPLICATE_CST:
> + case VEC_SERIES_CST:
> inchash::add_expr (exp, hstate);
> break;
> case CONSTRUCTOR:
> @@ -2040,6 +2041,11 @@ sem_variable::equals (tree t1, tree t2)
> case VEC_DUPLICATE_CST:
> return sem_variable::equals (VEC_DUPLICATE_CST_ELT (t1),
> VEC_DUPLICATE_CST_ELT (t2));
> + case VEC_SERIES_CST:
> + return (sem_variable::equals (VEC_SERIES_CST_BASE (t1),
> + VEC_SERIES_CST_BASE (t2))
> + && sem_variable::equals (VEC_SERIES_CST_STEP (t1),
> + VEC_SERIES_CST_STEP (t2)));
> case ARRAY_REF:
> case ARRAY_RANGE_REF:
> {
> Index: gcc/print-tree.c
> ===================================================================
> --- gcc/print-tree.c 2017-11-06 12:20:31.090704628 +0000
> +++ gcc/print-tree.c 2017-11-06 12:21:29.331209826 +0000
> @@ -787,6 +787,11 @@ print_node (FILE *file, const char *pref
> print_node (file, "elt", VEC_DUPLICATE_CST_ELT (node), indent + 4);
> break;
>
> + case VEC_SERIES_CST:
> + print_node (file, "base", VEC_SERIES_CST_BASE (node), indent + 4);
> + print_node (file, "step", VEC_SERIES_CST_STEP (node), indent + 4);
> + break;
> +
> case COMPLEX_CST:
> print_node (file, "real", TREE_REALPART (node), indent + 4);
> print_node (file, "imag", TREE_IMAGPART (node), indent + 4);
> Index: gcc/tree-ssa-loop.c
> ===================================================================
> --- gcc/tree-ssa-loop.c 2017-11-06 12:20:31.093446541 +0000
> +++ gcc/tree-ssa-loop.c 2017-11-06 12:21:29.333209826 +0000
> @@ -617,6 +617,7 @@ for_each_index (tree *addr_p, bool (*cbc
> case RESULT_DECL:
> case VECTOR_CST:
> case VEC_DUPLICATE_CST:
> + case VEC_SERIES_CST:
> case COMPLEX_CST:
> case INTEGER_CST:
> case REAL_CST:
> Index: gcc/tree-ssa-pre.c
> ===================================================================
> --- gcc/tree-ssa-pre.c 2017-11-06 12:20:31.093446541 +0000
> +++ gcc/tree-ssa-pre.c 2017-11-06 12:21:29.333209826 +0000
> @@ -2628,6 +2628,7 @@ create_component_ref_by_pieces_1 (basic_
> case COMPLEX_CST:
> case VECTOR_CST:
> case VEC_DUPLICATE_CST:
> + case VEC_SERIES_CST:
> case REAL_CST:
> case CONSTRUCTOR:
> case VAR_DECL:
> Index: gcc/tree-ssa-sccvn.c
> ===================================================================
> --- gcc/tree-ssa-sccvn.c 2017-11-06 12:20:31.094360512 +0000
> +++ gcc/tree-ssa-sccvn.c 2017-11-06 12:21:29.334209826 +0000
> @@ -867,6 +867,7 @@ copy_reference_ops_from_ref (tree ref, v
> case COMPLEX_CST:
> case VECTOR_CST:
> case VEC_DUPLICATE_CST:
> + case VEC_SERIES_CST:
> case REAL_CST:
> case FIXED_CST:
> case CONSTRUCTOR:
> @@ -1060,6 +1061,7 @@ ao_ref_init_from_vn_reference (ao_ref *r
> case COMPLEX_CST:
> case VECTOR_CST:
> case VEC_DUPLICATE_CST:
> + case VEC_SERIES_CST:
> case REAL_CST:
> case CONSTRUCTOR:
> case CONST_DECL:
> Index: gcc/varasm.c
> ===================================================================
> --- gcc/varasm.c 2017-11-06 12:20:31.100758308 +0000
> +++ gcc/varasm.c 2017-11-06 12:21:29.337209826 +0000
> @@ -3065,6 +3065,10 @@ const_hash_1 (const tree exp)
> return (const_hash_1 (TREE_OPERAND (exp, 0)) * 9
> + const_hash_1 (TREE_OPERAND (exp, 1)));
>
> + case VEC_SERIES_CST:
> + return (const_hash_1 (VEC_SERIES_CST_BASE (exp)) * 11
> + + const_hash_1 (VEC_SERIES_CST_STEP (exp)));
> +
> CASE_CONVERT:
> return const_hash_1 (TREE_OPERAND (exp, 0)) * 7 + 2;
>
> @@ -3165,6 +3169,12 @@ compare_constant (const tree t1, const t
> return compare_constant (VEC_DUPLICATE_CST_ELT (t1),
> VEC_DUPLICATE_CST_ELT (t2));
>
> + case VEC_SERIES_CST:
> + return (compare_constant (VEC_SERIES_CST_BASE (t1),
> + VEC_SERIES_CST_BASE (t2))
> + && compare_constant (VEC_SERIES_CST_STEP (t1),
> + VEC_SERIES_CST_STEP (t2)));
> +
> case CONSTRUCTOR:
> {
> vec<constructor_elt, va_gc> *v1, *v2;
> Index: gcc/fold-const.c
> ===================================================================
> --- gcc/fold-const.c 2017-11-06 12:20:31.087048745 +0000
> +++ gcc/fold-const.c 2017-11-06 12:21:29.328209826 +0000
> @@ -421,6 +421,10 @@ negate_expr_p (tree t)
> case VEC_DUPLICATE_CST:
> return negate_expr_p (VEC_DUPLICATE_CST_ELT (t));
>
> + case VEC_SERIES_CST:
> + return (negate_expr_p (VEC_SERIES_CST_BASE (t))
> + && negate_expr_p (VEC_SERIES_CST_STEP (t)));
> +
> case COMPLEX_EXPR:
> return negate_expr_p (TREE_OPERAND (t, 0))
> && negate_expr_p (TREE_OPERAND (t, 1));
> @@ -590,6 +594,17 @@ fold_negate_expr_1 (location_t loc, tree
> return build_vector_from_val (type, sub);
> }
>
> + case VEC_SERIES_CST:
> + {
> + tree neg_base = fold_negate_expr (loc, VEC_SERIES_CST_BASE (t));
> + if (!neg_base)
> + return NULL_TREE;
> + tree neg_step = fold_negate_expr (loc, VEC_SERIES_CST_STEP (t));
> + if (!neg_step)
> + return NULL_TREE;
> + return build_vec_series (type, neg_base, neg_step);
> + }
> +
> case COMPLEX_EXPR:
> if (negate_expr_p (t))
> return fold_build2_loc (loc, COMPLEX_EXPR, type,
> @@ -1131,6 +1146,28 @@ int_const_binop (enum tree_code code, co
> return int_const_binop_1 (code, arg1, arg2, 1);
> }
>
> +/* Return true if EXP is a VEC_DUPLICATE_CST or a VEC_SERIES_CST,
> + and if so express it as a linear series in *BASE_OUT and *STEP_OUT.
> + The step will be zero for VEC_DUPLICATE_CST. */
> +
> +static bool
> +vec_series_equivalent_p (const_tree exp, tree *base_out, tree *step_out)
> +{
> + if (TREE_CODE (exp) == VEC_SERIES_CST)
> + {
> + *base_out = VEC_SERIES_CST_BASE (exp);
> + *step_out = VEC_SERIES_CST_STEP (exp);
> + return true;
> + }
> + if (TREE_CODE (exp) == VEC_DUPLICATE_CST)
> + {
> + *base_out = VEC_DUPLICATE_CST_ELT (exp);
> + *step_out = build_zero_cst (TREE_TYPE (*base_out));
> + return true;
> + }
> + return false;
> +}
> +
> /* Combine two constants ARG1 and ARG2 under operation CODE to produce a new
> constant. We assume ARG1 and ARG2 have the same data type, or at least
> are the same kind of constant and the same machine mode. Return zero if
> @@ -1457,6 +1494,20 @@ const_binop (enum tree_code code, tree a
> return build_vector_from_val (TREE_TYPE (arg1), sub);
> }
>
> + tree base1, step1, base2, step2;
> + if ((code == PLUS_EXPR || code == MINUS_EXPR)
> + && vec_series_equivalent_p (arg1, &base1, &step1)
> + && vec_series_equivalent_p (arg2, &base2, &step2))
> + {
> + tree new_base = const_binop (code, base1, base2);
> + if (!new_base)
> + return NULL_TREE;
> + tree new_step = const_binop (code, step1, step2);
> + if (!new_step)
> + return NULL_TREE;
> + return build_vec_series (TREE_TYPE (arg1), new_base, new_step);
> + }
> +
> /* Shifts allow a scalar offset for a vector. */
> if (TREE_CODE (arg1) == VECTOR_CST
> && TREE_CODE (arg2) == INTEGER_CST)
> @@ -1505,6 +1556,12 @@ const_binop (enum tree_code code, tree t
> result as argument put those cases that need it here. */
> switch (code)
> {
> + case VEC_SERIES_EXPR:
> + if (CONSTANT_CLASS_P (arg1)
> + && CONSTANT_CLASS_P (arg2))
> + return build_vec_series (type, arg1, arg2);
> + return NULL_TREE;
> +
> case COMPLEX_EXPR:
> if ((TREE_CODE (arg1) == REAL_CST
> && TREE_CODE (arg2) == REAL_CST)
> @@ -3008,6 +3065,12 @@ operand_equal_p (const_tree arg0, const_
> return operand_equal_p (VEC_DUPLICATE_CST_ELT (arg0),
> VEC_DUPLICATE_CST_ELT (arg1), flags);
>
> + case VEC_SERIES_CST:
> + return (operand_equal_p (VEC_SERIES_CST_BASE (arg0),
> + VEC_SERIES_CST_BASE (arg1), flags)
> + && operand_equal_p (VEC_SERIES_CST_STEP (arg0),
> + VEC_SERIES_CST_STEP (arg1), flags));
> +
> case COMPLEX_CST:
> return (operand_equal_p (TREE_REALPART (arg0), TREE_REALPART (arg1),
> flags)
> @@ -12020,6 +12083,10 @@ fold_checksum_tree (const_tree expr, str
> case VEC_DUPLICATE_CST:
> fold_checksum_tree (VEC_DUPLICATE_CST_ELT (expr), ctx, ht);
> break;
> + case VEC_SERIES_CST:
> + fold_checksum_tree (VEC_SERIES_CST_BASE (expr), ctx, ht);
> + fold_checksum_tree (VEC_SERIES_CST_STEP (expr), ctx, ht);
> + break;
> default:
> break;
> }
> @@ -14528,6 +14595,54 @@ test_vec_duplicate_folding ()
> ASSERT_TRUE (operand_equal_p (dup5_expr, dup5_cst, 0));
> }
>
> +/* Verify folding of VEC_SERIES_CSTs and VEC_SERIES_EXPRs. */
> +
> +static void
> +test_vec_series_folding ()
> +{
> + scalar_int_mode int_mode = SCALAR_INT_TYPE_MODE (ssizetype);
> + machine_mode vec_mode = targetm.vectorize.preferred_simd_mode (int_mode);
> + unsigned int nunits = GET_MODE_NUNITS (vec_mode);
> + if (nunits == 1)
> + nunits = 4;
> +
> + tree type = build_vector_type (ssizetype, nunits);
> + tree s5_4 = build_vec_series (type, ssize_int (5), ssize_int (4));
> + tree s3_9 = build_vec_series (type, ssize_int (3), ssize_int (9));
> +
> + tree neg_s5_4_a = fold_unary (NEGATE_EXPR, type, s5_4);
> + tree neg_s5_4_b = build_vec_series (type, ssize_int (-5), ssize_int (-4));
> + ASSERT_TRUE (operand_equal_p (neg_s5_4_a, neg_s5_4_b, 0));
> +
> + tree s8_s13_a = fold_binary (PLUS_EXPR, type, s5_4, s3_9);
> + tree s8_s13_b = build_vec_series (type, ssize_int (8), ssize_int (13));
> + ASSERT_TRUE (operand_equal_p (s8_s13_a, s8_s13_b, 0));
> +
> + tree s2_m5_a = fold_binary (MINUS_EXPR, type, s5_4, s3_9);
> + tree s2_m5_b = build_vec_series (type, ssize_int (2), ssize_int (-5));
> + ASSERT_TRUE (operand_equal_p (s2_m5_a, s2_m5_b, 0));
> +
> + tree s11 = build_vector_from_val (type, ssize_int (11));
> + tree s16_4_a = fold_binary (PLUS_EXPR, type, s5_4, s11);
> + tree s16_4_b = fold_binary (PLUS_EXPR, type, s11, s5_4);
> + tree s16_4_c = build_vec_series (type, ssize_int (16), ssize_int (4));
> + ASSERT_TRUE (operand_equal_p (s16_4_a, s16_4_c, 0));
> + ASSERT_TRUE (operand_equal_p (s16_4_b, s16_4_c, 0));
> +
> + tree sm6_4_a = fold_binary (MINUS_EXPR, type, s5_4, s11);
> + tree sm6_4_b = build_vec_series (type, ssize_int (-6), ssize_int (4));
> + ASSERT_TRUE (operand_equal_p (sm6_4_a, sm6_4_b, 0));
> +
> + tree s6_m4_a = fold_binary (MINUS_EXPR, type, s11, s5_4);
> + tree s6_m4_b = build_vec_series (type, ssize_int (6), ssize_int (-4));
> + ASSERT_TRUE (operand_equal_p (s6_m4_a, s6_m4_b, 0));
> +
> + tree s5_4_expr = fold_binary (VEC_SERIES_EXPR, type,
> + ssize_int (5), ssize_int (4));
> + ASSERT_TRUE (operand_equal_p (s5_4_expr, s5_4, 0));
> + ASSERT_FALSE (operand_equal_p (s5_4_expr, s3_9, 0));
> +}
> +
> /* Run all of the selftests within this file. */
>
> void
> @@ -14536,6 +14651,7 @@ fold_const_c_tests ()
> test_arithmetic_folding ();
> test_vector_folding ();
> test_vec_duplicate_folding ();
> + test_vec_series_folding ();
> }
>
> } // namespace selftest
> Index: gcc/expmed.c
> ===================================================================
> --- gcc/expmed.c 2017-11-06 12:20:31.081564919 +0000
> +++ gcc/expmed.c 2017-11-06 12:21:29.325209826 +0000
> @@ -5252,6 +5252,13 @@ make_tree (tree type, rtx x)
> tree elt_tree = make_tree (TREE_TYPE (type), XEXP (op, 0));
> return build_vector_from_val (type, elt_tree);
> }
> + if (GET_CODE (op) == VEC_SERIES)
> + {
> + tree itype = TREE_TYPE (type);
> + tree base_tree = make_tree (itype, XEXP (op, 0));
> + tree step_tree = make_tree (itype, XEXP (op, 1));
> + return build_vec_series (type, base_tree, step_tree);
> + }
> return make_tree (type, op);
> }
>
> Index: gcc/gimple-pretty-print.c
> ===================================================================
> --- gcc/gimple-pretty-print.c 2017-11-06 12:20:31.087048745 +0000
> +++ gcc/gimple-pretty-print.c 2017-11-06 12:21:29.328209826 +0000
> @@ -431,6 +431,7 @@ dump_binary_rhs (pretty_printer *buffer,
> case VEC_PACK_FIX_TRUNC_EXPR:
> case VEC_WIDEN_LSHIFT_HI_EXPR:
> case VEC_WIDEN_LSHIFT_LO_EXPR:
> + case VEC_SERIES_EXPR:
> for (p = get_tree_code_name (code); *p; p++)
> pp_character (buffer, TOUPPER (*p));
> pp_string (buffer, " <");
> Index: gcc/tree-inline.c
> ===================================================================
> --- gcc/tree-inline.c 2017-11-06 12:20:31.092532570 +0000
> +++ gcc/tree-inline.c 2017-11-06 12:21:29.332209826 +0000
> @@ -3931,6 +3931,7 @@ estimate_operator_cost (enum tree_code c
> case VEC_WIDEN_LSHIFT_HI_EXPR:
> case VEC_WIDEN_LSHIFT_LO_EXPR:
> case VEC_DUPLICATE_EXPR:
> + case VEC_SERIES_EXPR:
>
> return 1;
>
> Index: gcc/expr.c
> ===================================================================
> --- gcc/expr.c 2017-11-06 12:20:31.082478890 +0000
> +++ gcc/expr.c 2017-11-06 12:21:29.326209826 +0000
> @@ -7708,7 +7708,7 @@ expand_operands (tree exp0, tree exp1, r
>
>
> /* Expand constant vector element ELT, which has mode MODE. This is used
> - for members of VECTOR_CST and VEC_DUPLICATE_CST. */
> + for members of VECTOR_CST, VEC_DUPLICATE_CST and VEC_SERIES_CST. */
>
> static rtx
> const_vector_element (scalar_mode mode, const_tree elt)
> @@ -9591,6 +9591,10 @@ #define REDUCE_BIT_FIELD(expr) (reduce_b
> gcc_assert (target);
> return target;
>
> + case VEC_SERIES_EXPR:
> + expand_operands (treeop0, treeop1, NULL_RTX, &op0, &op1, modifier);
> + return expand_vec_series_expr (mode, op0, op1, target);
> +
> case BIT_INSERT_EXPR:
> {
> unsigned bitpos = tree_to_uhwi (treeop2);
> @@ -10029,6 +10033,13 @@ expand_expr_real_1 (tree exp, rtx target
> VEC_DUPLICATE_CST_ELT (exp));
> return gen_const_vec_duplicate (mode, op0);
>
> + case VEC_SERIES_CST:
> + op0 = const_vector_element (GET_MODE_INNER (mode),
> + VEC_SERIES_CST_BASE (exp));
> + op1 = const_vector_element (GET_MODE_INNER (mode),
> + VEC_SERIES_CST_STEP (exp));
> + return gen_const_vec_series (mode, op0, op1);
> +
> case CONST_DECL:
> if (modifier == EXPAND_WRITE)
> {
> Index: gcc/optabs.def
> ===================================================================
> --- gcc/optabs.def 2017-11-06 12:20:31.090704628 +0000
> +++ gcc/optabs.def 2017-11-06 12:21:29.331209826 +0000
> @@ -366,3 +366,4 @@ OPTAB_D (get_thread_pointer_optab, "get_
> OPTAB_D (set_thread_pointer_optab, "set_thread_pointer$I$a")
>
> OPTAB_DC (vec_duplicate_optab, "vec_duplicate$a", VEC_DUPLICATE)
> +OPTAB_DC (vec_series_optab, "vec_series$a", VEC_SERIES)
> Index: gcc/optabs.h
> ===================================================================
> --- gcc/optabs.h 2017-11-06 12:20:31.090704628 +0000
> +++ gcc/optabs.h 2017-11-06 12:21:29.331209826 +0000
> @@ -316,6 +316,9 @@ extern rtx expand_vec_cmp_expr (tree, tr
> /* Generate code for VEC_COND_EXPR. */
> extern rtx expand_vec_cond_expr (tree, tree, tree, tree, rtx);
>
> +/* Generate code for VEC_SERIES_EXPR. */
> +extern rtx expand_vec_series_expr (machine_mode, rtx, rtx, rtx);
> +
> /* Generate code for MULT_HIGHPART_EXPR. */
> extern rtx expand_mult_highpart (machine_mode, rtx, rtx, rtx, bool);
>
> Index: gcc/optabs.c
> ===================================================================
> --- gcc/optabs.c 2017-11-06 12:20:31.090704628 +0000
> +++ gcc/optabs.c 2017-11-06 12:21:29.330209826 +0000
> @@ -5703,6 +5703,27 @@ expand_vec_cond_expr (tree vec_cond_type
> return ops[0].value;
> }
>
> +/* Generate VEC_SERIES_EXPR <OP0, OP1>, returning a value of mode VMODE.
> + Use TARGET for the result if nonnull and convenient. */
> +
> +rtx
> +expand_vec_series_expr (machine_mode vmode, rtx op0, rtx op1, rtx target)
> +{
> + struct expand_operand ops[3];
> + enum insn_code icode;
> + machine_mode emode = GET_MODE_INNER (vmode);
> +
> + icode = direct_optab_handler (vec_series_optab, vmode);
> + gcc_assert (icode != CODE_FOR_nothing);
> +
> + create_output_operand (&ops[0], target, vmode);
> + create_input_operand (&ops[1], op0, emode);
> + create_input_operand (&ops[2], op1, emode);
> +
> + expand_insn (icode, 3, ops);
> + return ops[0].value;
> +}
> +
> /* Generate insns for a vector comparison into a mask. */
>
> rtx
> Index: gcc/optabs-tree.c
> ===================================================================
> --- gcc/optabs-tree.c 2017-11-06 12:20:31.089790657 +0000
> +++ gcc/optabs-tree.c 2017-11-06 12:21:29.330209826 +0000
> @@ -213,6 +213,9 @@ optab_for_tree_code (enum tree_code code
> case VEC_DUPLICATE_EXPR:
> return vec_duplicate_optab;
>
> + case VEC_SERIES_EXPR:
> + return vec_series_optab;
> +
> default:
> break;
> }
> Index: gcc/tree-cfg.c
> ===================================================================
> --- gcc/tree-cfg.c 2017-11-06 12:20:31.091618599 +0000
> +++ gcc/tree-cfg.c 2017-11-06 12:21:29.332209826 +0000
> @@ -4114,6 +4114,23 @@ verify_gimple_assign_binary (gassign *st
> /* Continue with generic binary expression handling. */
> break;
>
> + case VEC_SERIES_EXPR:
> + if (!useless_type_conversion_p (rhs1_type, rhs2_type))
> + {
> + error ("type mismatch in series expression");
> + debug_generic_expr (rhs1_type);
> + debug_generic_expr (rhs2_type);
> + return true;
> + }
> + if (TREE_CODE (lhs_type) != VECTOR_TYPE
> + || !useless_type_conversion_p (TREE_TYPE (lhs_type), rhs1_type))
> + {
> + error ("vector type expected in series expression");
> + debug_generic_expr (lhs_type);
> + return true;
> + }
> + return false;
> +
> default:
> gcc_unreachable ();
> }
> @@ -4480,6 +4497,7 @@ verify_gimple_assign_single (gassign *st
> case COMPLEX_CST:
> case VECTOR_CST:
> case VEC_DUPLICATE_CST:
> + case VEC_SERIES_CST:
> case STRING_CST:
> return res;
>
> Index: gcc/tree-vect-generic.c
> ===================================================================
> --- gcc/tree-vect-generic.c 2017-11-06 12:20:31.094360512 +0000
> +++ gcc/tree-vect-generic.c 2017-11-06 12:21:29.334209826 +0000
> @@ -1596,7 +1596,8 @@ expand_vector_operations_1 (gimple_stmt_
> if (rhs_class == GIMPLE_BINARY_RHS)
> rhs2 = gimple_assign_rhs2 (stmt);
>
> - if (TREE_CODE (type) != VECTOR_TYPE)
> + if (!VECTOR_TYPE_P (type)
> + || !VECTOR_TYPE_P (TREE_TYPE (rhs1)))
> return;
>
> /* If the vector operation is operating on all same vector elements
* Re: [02/nn] Add more vec_duplicate simplifications
2017-10-25 16:35 ` Jeff Law
@ 2017-11-10 9:42 ` Christophe Lyon
0 siblings, 0 replies; 90+ messages in thread
From: Christophe Lyon @ 2017-11-10 9:42 UTC (permalink / raw)
To: Jeff Law; +Cc: gcc-patches, Richard Sandiford
On 25 October 2017 at 18:29, Jeff Law <law@redhat.com> wrote:
> On 10/23/2017 05:17 AM, Richard Sandiford wrote:
>> This patch adds a vec_duplicate_p helper that tests for constant
>> or non-constant vector duplicates. Together with the existing
>> const_vec_duplicate_p, this complements the gen_vec_duplicate
>> and gen_const_vec_duplicate added by a previous patch.
>>
>> The patch uses the new routines to add more rtx simplifications
>> involving vector duplicates. These mirror simplifications that
>> we already do for CONST_VECTOR broadcasts and are needed for
>> variable-length SVE, which uses:
>>
>> (const:M (vec_duplicate:M X))
>>
>> to represent constant broadcasts instead. The simplifications do
>> trigger on the testsuite for variable duplicates too, and in each
>> case I saw the change was an improvement. E.g.:
>>
> [ snip ]
>
>>
>> The best way of testing the new simplifications seemed to be
>> via selftests. The patch cribs part of David's patch here:
>> https://gcc.gnu.org/ml/gcc-patches/2016-07/msg00270.html .
> Cool. I really wish I had more time to promote David's work by adding
> selftests to various things. There's certainly cases where it's the
> most direct and useful way to test certain bits of lower level
> infrastructure we have. Glad to see you found it useful here.
>
>
>
>>
>>
>> 2017-10-23 Richard Sandiford <richard.sandiford@linaro.org>
>> David Malcolm <dmalcolm@redhat.com>
>> Alan Hayward <alan.hayward@arm.com>
>> David Sherwood <david.sherwood@arm.com>
>>
>> gcc/
>> * rtl.h (vec_duplicate_p): New function.
>> * selftest-rtl.c (assert_rtx_eq_at): New function.
>> * selftest-rtl.h (ASSERT_RTX_EQ): New macro.
>> (assert_rtx_eq_at): Declare.
>> * selftest.h (selftest::simplify_rtx_c_tests): Declare.
>> * selftest-run-tests.c (selftest::run_tests): Call it.
>> * simplify-rtx.c: Include selftest.h and selftest-rtl.h.
>> (simplify_unary_operation_1): Recursively handle vector duplicates.
>> (simplify_binary_operation_1): Likewise. Handle VEC_SELECTs of
>> vector duplicates.
>> (simplify_subreg): Handle subregs of vector duplicates.
>> (make_test_reg, test_vector_ops_duplicate, test_vector_ops)
>> (selftest::simplify_rtx_c_tests): New functions.
Hi Richard,
I've noticed that this patch (r254294) causes
FAIL: gcc.dg/vect/vect-126.c (internal compiler error)
FAIL: gcc.dg/vect/vect-126.c -flto -ffat-lto-objects (internal compiler error)
on arm* targets.
Sorry if this has been reported before; I've restarted validations
only recently, so the process is still catching up.
gcc.log has this:
spawn -ignore SIGHUP
/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-linux-gnueabihf/gcc3/gcc/xgcc
-B/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-linux-gnueabihf/gcc3/gcc/
/gcc/testsuite/gcc.dg/vect/vect-126.c -fno-diagnostics-show-caret
-fdiagnostics-color=never -ffast-math -ftree-vectorize
-fno-vect-cost-model -fno-common -O2 -fdump-tree-vect-details -S -o
vect-126.s
during RTL pass: combine
/gcc/testsuite/gcc.dg/vect/vect-126.c: In function 'f5':
/gcc/testsuite/gcc.dg/vect/vect-126.c:53:1: internal compiler error:
in neon_valid_immediate, at config/arm/arm.c:11850
0xf3e6c8 neon_valid_immediate
/gcc/config/arm/arm.c:11850
0xf3ea9a neon_immediate_valid_for_move(rtx_def*, machine_mode, rtx_def**, int*)
/gcc/config/arm/arm.c:11968
0xf40a20 arm_rtx_costs_internal
/gcc/config/arm/arm.c:10695
0xf40a20 arm_rtx_costs
/gcc/config/arm/arm.c:10946
0xb113ef rtx_cost(rtx_def*, machine_mode, rtx_code, int, bool)
/gcc/rtlanal.c:4187
0xb1169f set_src_cost
/gcc/rtl.h:2700
0xb1169f pattern_cost(rtx_def*, bool)
/gcc/rtlanal.c:5315
0x128bb3b combine_validate_cost
/gcc/combine.c:893
0x128bb3b try_combine
/gcc/combine.c:4113
0x12923d5 combine_instructions
/gcc/combine.c:1452
0x12926ed rest_of_handle_combine
/gcc/combine.c:14795
0x12926ed execute
/gcc/combine.c:14840
Please submit a full bug report,
Thanks,
Christophe
> Thanks for the examples of how this affects various targets. Seems like
> it ought to be a consistent win when they trigger.
>
> jeff
* Re: [14/nn] Add helpers for shift count modes
2017-10-26 12:07 ` Richard Biener
@ 2017-11-20 21:04 ` Richard Sandiford
2017-11-21 15:00 ` Richard Biener
0 siblings, 1 reply; 90+ messages in thread
From: Richard Sandiford @ 2017-11-20 21:04 UTC (permalink / raw)
To: Richard Biener, Jeff Law; +Cc: GCC Patches
Richard Biener <richard.guenther@gmail.com> writes:
> On Thu, Oct 26, 2017 at 2:06 PM, Richard Biener
> <richard.guenther@gmail.com> wrote:
>> On Mon, Oct 23, 2017 at 1:25 PM, Richard Sandiford
>> <richard.sandiford@linaro.org> wrote:
>>> This patch adds a stub helper routine to provide the mode
>>> of a scalar shift amount, given the mode of the values
>>> being shifted.
>>>
>>> One long-standing problem has been to decide what this mode
>>> should be for arbitrary rtxes (as opposed to those directly
>>> tied to a target pattern). Is it the mode of the shifted
>>> elements? Is it word_mode? Or maybe QImode? Is it whatever
>>> the corresponding target pattern says? (In which case what
>>> should the mode be when the target doesn't have a pattern?)
>>>
>>> For now the patch picks word_mode, which should be safe on
>>> all targets but could perhaps become suboptimal if the helper
>>> routine is used more often than it is in this patch. As it
>>> stands the patch does not change the generated code.
>>>
>>> The patch also adds a helper function that constructs rtxes
>>> for constant shift amounts, again given the mode of the value
>>> being shifted. As well as helping with the SVE patches, this
>>> is one step towards allowing CONST_INTs to have a real mode.
>>
>> I think gen_shift_amount_mode is flawed and while encapsulating
>> constant shift amount RTX generation into a gen_int_shift_amount
>> looks good to me I'd rather have that ??? in this function (and
>> I'd use the mode of the RTX shifted, not word_mode...).
OK. I'd gone for word_mode because that's what expand_binop uses
for CONST_INTs:
op1_mode = (GET_MODE (op1) != VOIDmode
? as_a <scalar_int_mode> (GET_MODE (op1))
: word_mode);
But using the inner mode should be fine too. The patch below does that.
>> In the end it's up to insn recognizing to convert the op to the
>> expected mode and for generic RTL it's us that should decide
>> on the mode -- on GENERIC the shift amount has to be an
>> integer so why not simply use a mode that is large enough to
>> make the constant fit?
...but I can do that instead if you think it's better.
>> Just throwing in some comments here, RTL isn't my primary
>> expertise.
>
> To add a little bit - shift amounts is maybe the only(?) place
> where a modeless CONST_INT makes sense! So "fixing"
> that first sounds backwards.
But even here they have a mode conceptually, since out-of-range shift
amounts are target-defined rather than undefined. E.g. if the target
interprets the shift amount as unsigned, then for a shift amount
(const_int -1) it matters whether the mode is QImode (and so we're
shifting by 255) or HImode (and so we're shifting by 65535).
OK, so shifts by 65535 make no sense in practice, but *conceptually*... :-)
Jeff Law <law@redhat.com> writes:
> On 10/26/2017 06:06 AM, Richard Biener wrote:
>> On Mon, Oct 23, 2017 at 1:25 PM, Richard Sandiford
>> <richard.sandiford@linaro.org> wrote:
>>> This patch adds a stub helper routine to provide the mode
>>> of a scalar shift amount, given the mode of the values
>>> being shifted.
>>>
>>> One long-standing problem has been to decide what this mode
>>> should be for arbitrary rtxes (as opposed to those directly
>>> tied to a target pattern). Is it the mode of the shifted
>>> elements? Is it word_mode? Or maybe QImode? Is it whatever
>>> the corresponding target pattern says? (In which case what
>>> should the mode be when the target doesn't have a pattern?)
>>>
>>> For now the patch picks word_mode, which should be safe on
>>> all targets but could perhaps become suboptimal if the helper
>>> routine is used more often than it is in this patch. As it
>>> stands the patch does not change the generated code.
>>>
>>> The patch also adds a helper function that constructs rtxes
>>> for constant shift amounts, again given the mode of the value
>>> being shifted. As well as helping with the SVE patches, this
>>> is one step towards allowing CONST_INTs to have a real mode.
>>
>> I think gen_shift_amount_mode is flawed and while encapsulating
>> constant shift amount RTX generation into a gen_int_shift_amount
>> looks good to me I'd rather have that ??? in this function (and
>> I'd use the mode of the RTX shifted, not word_mode...).
>>
>> In the end it's up to insn recognizing to convert the op to the
>> expected mode and for generic RTL it's us that should decide
>> on the mode -- on GENERIC the shift amount has to be an
>> integer so why not simply use a mode that is large enough to
>> make the constant fit?
>>
>> Just throwing in some comments here, RTL isn't my primary
>> expertise.
> I wonder if encapsulation + a target hook to specify the mode would be
> better? We'd then have to argue over word_mode, vs QImode vs something
> else for the default, but at least we'd have a way for the target to
> specify the mode is generally best when working on shift counts.
>
> In the end I doubt there's a single definition that is overall better.
> Largely because I suspect there are times when the narrowest mode is
> best, or the mode of the operand being shifted.
>
> So thoughts on doing the encapsulation with a target hook to specify the
> desired mode? Does that get us what we need for SVE and does it provide
> us a path forward on this issue if we were to try to move towards
> CONST_INTs with modes?
I think it'd be better to do that only if we have a use case, since
it's hard to predict what the best way of handling it is until then.
E.g. I'd still like to hold out the possibility of doing this automatically
from the .md file instead, if some kind of override ends up being necessary.
Like you say, we have to argue over the default either way, and I think
that's been the sticking point.
Thanks,
Richard
2017-11-20 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* emit-rtl.h (gen_int_shift_amount): Declare.
* emit-rtl.c (gen_int_shift_amount): New function.
* asan.c (asan_emit_stack_protection): Use gen_int_shift_amount
instead of GEN_INT.
* calls.c (shift_return_value): Likewise.
* cse.c (fold_rtx): Likewise.
* dse.c (find_shift_sequence): Likewise.
* expmed.c (init_expmed_one_mode, store_bit_field_1, expand_shift_1)
(expand_shift, expand_smod_pow2): Likewise.
* lower-subreg.c (shift_cost): Likewise.
* simplify-rtx.c (simplify_unary_operation_1): Likewise.
(simplify_binary_operation_1): Likewise.
* combine.c (try_combine, find_split_point, force_int_to_mode)
(simplify_shift_const_1, simplify_shift_const): Likewise.
(change_zero_ext): Likewise. Use simplify_gen_binary.
* optabs.c (expand_superword_shift, expand_doubleword_mult)
(expand_unop, expand_binop): Use gen_int_shift_amount instead
of GEN_INT.
(shift_amt_for_vec_perm_mask): Add a machine_mode argument.
Use gen_int_shift_amount instead of GEN_INT.
(expand_vec_perm): Update caller accordingly. Use
gen_int_shift_amount instead of GEN_INT.
Index: gcc/emit-rtl.h
===================================================================
--- gcc/emit-rtl.h 2017-11-20 20:37:41.918226976 +0000
+++ gcc/emit-rtl.h 2017-11-20 20:37:51.661320782 +0000
@@ -369,6 +369,7 @@ extern void set_reg_attrs_for_parm (rtx,
extern void set_reg_attrs_for_decl_rtl (tree t, rtx x);
extern void adjust_reg_mode (rtx, machine_mode);
extern int mem_expr_equal_p (const_tree, const_tree);
+extern rtx gen_int_shift_amount (machine_mode, HOST_WIDE_INT);
extern bool need_atomic_barrier_p (enum memmodel, bool);
Index: gcc/emit-rtl.c
===================================================================
--- gcc/emit-rtl.c 2017-11-20 20:37:41.918226976 +0000
+++ gcc/emit-rtl.c 2017-11-20 20:37:51.660320782 +0000
@@ -6507,6 +6507,24 @@ need_atomic_barrier_p (enum memmodel mod
}
}
+/* Return a constant shift amount for shifting a value of mode MODE
+ by VALUE bits. */
+
+rtx
+gen_int_shift_amount (machine_mode mode, HOST_WIDE_INT value)
+{
+ /* ??? Using the inner mode should be wide enough for all useful
+ cases (e.g. QImode usually has 8 shiftable bits, while a QImode
+ shift amount has a range of [-128, 127]). But in principle
+ a target could require target-dependent behaviour for a
+ shift whose shift amount is wider than the shifted value.
+ Perhaps this should be automatically derived from the .md
+ files instead, or perhaps have a target hook. */
+ scalar_int_mode shift_mode
+ = int_mode_for_mode (GET_MODE_INNER (mode)).require ();
+ return gen_int_mode (value, shift_mode);
+}
+
/* Initialize fields of rtl_data related to stack alignment. */
void
Index: gcc/asan.c
===================================================================
--- gcc/asan.c 2017-11-20 20:37:41.918226976 +0000
+++ gcc/asan.c 2017-11-20 20:37:51.657320781 +0000
@@ -1386,7 +1386,7 @@ asan_emit_stack_protection (rtx base, rt
TREE_ASM_WRITTEN (id) = 1;
emit_move_insn (mem, expand_normal (build_fold_addr_expr (decl)));
shadow_base = expand_binop (Pmode, lshr_optab, base,
- GEN_INT (ASAN_SHADOW_SHIFT),
+ gen_int_shift_amount (Pmode, ASAN_SHADOW_SHIFT),
NULL_RTX, 1, OPTAB_DIRECT);
shadow_base
= plus_constant (Pmode, shadow_base,
Index: gcc/calls.c
===================================================================
--- gcc/calls.c 2017-11-20 20:37:41.918226976 +0000
+++ gcc/calls.c 2017-11-20 20:37:51.657320781 +0000
@@ -2742,15 +2742,17 @@ shift_return_value (machine_mode mode, b
HOST_WIDE_INT shift;
gcc_assert (REG_P (value) && HARD_REGISTER_P (value));
- shift = GET_MODE_BITSIZE (GET_MODE (value)) - GET_MODE_BITSIZE (mode);
+ machine_mode value_mode = GET_MODE (value);
+ shift = GET_MODE_BITSIZE (value_mode) - GET_MODE_BITSIZE (mode);
if (shift == 0)
return false;
/* Use ashr rather than lshr for right shifts. This is for the benefit
of the MIPS port, which requires SImode values to be sign-extended
when stored in 64-bit registers. */
- if (!force_expand_binop (GET_MODE (value), left_p ? ashl_optab : ashr_optab,
- value, GEN_INT (shift), value, 1, OPTAB_WIDEN))
+ if (!force_expand_binop (value_mode, left_p ? ashl_optab : ashr_optab,
+ value, gen_int_shift_amount (value_mode, shift),
+ value, 1, OPTAB_WIDEN))
gcc_unreachable ();
return true;
}
Index: gcc/cse.c
===================================================================
--- gcc/cse.c 2017-11-20 20:37:41.918226976 +0000
+++ gcc/cse.c 2017-11-20 20:37:51.660320782 +0000
@@ -3611,9 +3611,9 @@ fold_rtx (rtx x, rtx_insn *insn)
|| INTVAL (const_arg1) < 0))
{
if (SHIFT_COUNT_TRUNCATED)
- canon_const_arg1 = GEN_INT (INTVAL (const_arg1)
- & (GET_MODE_UNIT_BITSIZE (mode)
- - 1));
+ canon_const_arg1 = gen_int_shift_amount
+ (mode, (INTVAL (const_arg1)
+ & (GET_MODE_UNIT_BITSIZE (mode) - 1)));
else
break;
}
@@ -3660,9 +3660,9 @@ fold_rtx (rtx x, rtx_insn *insn)
|| INTVAL (inner_const) < 0))
{
if (SHIFT_COUNT_TRUNCATED)
- inner_const = GEN_INT (INTVAL (inner_const)
- & (GET_MODE_UNIT_BITSIZE (mode)
- - 1));
+ inner_const = gen_int_shift_amount
+ (mode, (INTVAL (inner_const)
+ & (GET_MODE_UNIT_BITSIZE (mode) - 1)));
else
break;
}
@@ -3692,7 +3692,8 @@ fold_rtx (rtx x, rtx_insn *insn)
/* As an exception, we can turn an ASHIFTRT of this
form into a shift of the number of bits - 1. */
if (code == ASHIFTRT)
- new_const = GEN_INT (GET_MODE_UNIT_BITSIZE (mode) - 1);
+ new_const = gen_int_shift_amount
+ (mode, GET_MODE_UNIT_BITSIZE (mode) - 1);
else if (!side_effects_p (XEXP (y, 0)))
return CONST0_RTX (mode);
else
Index: gcc/dse.c
===================================================================
--- gcc/dse.c 2017-11-20 20:37:41.918226976 +0000
+++ gcc/dse.c 2017-11-20 20:37:51.660320782 +0000
@@ -1605,8 +1605,9 @@ find_shift_sequence (int access_size,
store_mode, byte);
if (ret && CONSTANT_P (ret))
{
+ rtx shift_rtx = gen_int_shift_amount (new_mode, shift);
ret = simplify_const_binary_operation (LSHIFTRT, new_mode,
- ret, GEN_INT (shift));
+ ret, shift_rtx);
if (ret && CONSTANT_P (ret))
{
byte = subreg_lowpart_offset (read_mode, new_mode);
@@ -1642,7 +1643,8 @@ find_shift_sequence (int access_size,
of one dsp where the cost of these two was not the same. But
this really is a rare case anyway. */
target = expand_binop (new_mode, lshr_optab, new_reg,
- GEN_INT (shift), new_reg, 1, OPTAB_DIRECT);
+ gen_int_shift_amount (new_mode, shift),
+ new_reg, 1, OPTAB_DIRECT);
shift_seq = get_insns ();
end_sequence ();
Index: gcc/expmed.c
===================================================================
--- gcc/expmed.c 2017-11-20 20:37:41.918226976 +0000
+++ gcc/expmed.c 2017-11-20 20:37:51.661320782 +0000
@@ -222,7 +222,8 @@ init_expmed_one_mode (struct init_expmed
PUT_MODE (all->zext, wider_mode);
PUT_MODE (all->wide_mult, wider_mode);
PUT_MODE (all->wide_lshr, wider_mode);
- XEXP (all->wide_lshr, 1) = GEN_INT (mode_bitsize);
+ XEXP (all->wide_lshr, 1)
+ = gen_int_shift_amount (wider_mode, mode_bitsize);
set_mul_widen_cost (speed, wider_mode,
set_src_cost (all->wide_mult, wider_mode, speed));
@@ -909,12 +910,14 @@ store_bit_field_1 (rtx str_rtx, unsigned
to make sure that for big-endian machines the higher order
bits are used. */
if (new_bitsize < BITS_PER_WORD && BYTES_BIG_ENDIAN && !backwards)
- value_word = simplify_expand_binop (word_mode, lshr_optab,
- value_word,
- GEN_INT (BITS_PER_WORD
- - new_bitsize),
- NULL_RTX, true,
- OPTAB_LIB_WIDEN);
+ {
+ int shift = BITS_PER_WORD - new_bitsize;
+ rtx shift_rtx = gen_int_shift_amount (word_mode, shift);
+ value_word = simplify_expand_binop (word_mode, lshr_optab,
+ value_word, shift_rtx,
+ NULL_RTX, true,
+ OPTAB_LIB_WIDEN);
+ }
if (!store_bit_field_1 (op0, new_bitsize,
bitnum + bit_offset,
@@ -2365,8 +2368,9 @@ expand_shift_1 (enum tree_code code, mac
if (CONST_INT_P (op1)
&& ((unsigned HOST_WIDE_INT) INTVAL (op1) >=
(unsigned HOST_WIDE_INT) GET_MODE_BITSIZE (scalar_mode)))
- op1 = GEN_INT ((unsigned HOST_WIDE_INT) INTVAL (op1)
- % GET_MODE_BITSIZE (scalar_mode));