* [00/nn] AArch64 patches preparing for SVE
@ 2017-10-27 13:22 Richard Sandiford
2017-10-27 13:23 ` [01/nn] [AArch64] Generate permute patterns using rtx builders Richard Sandiford
` (11 more replies)
0 siblings, 12 replies; 29+ messages in thread
From: Richard Sandiford @ 2017-10-27 13:22 UTC (permalink / raw)
To: gcc-patches; +Cc: richard.earnshaw, james.greenhalgh, marcus.shawcroft
This series of patches makes the AArch64 changes needed before SVE
support goes in. It's based on top of:
https://gcc.gnu.org/ml/gcc-patches/2017-09/msg01163.html
and Jeff's latest stack-clash protection changes.
Series tested on aarch64-linux-gnu.
Richard
^ permalink raw reply [flat|nested] 29+ messages in thread
* [01/nn] [AArch64] Generate permute patterns using rtx builders
2017-10-27 13:22 [00/nn] AArch64 patches preparing for SVE Richard Sandiford
@ 2017-10-27 13:23 ` Richard Sandiford
2017-10-31 18:02 ` James Greenhalgh
2017-10-27 13:25 ` [02/nn] [AArch64] Move code around Richard Sandiford
` (10 subsequent siblings)
11 siblings, 1 reply; 29+ messages in thread
From: Richard Sandiford @ 2017-10-27 13:23 UTC (permalink / raw)
To: gcc-patches; +Cc: richard.earnshaw, james.greenhalgh, marcus.shawcroft
This patch replaces switch statements that call specific generator
functions with code that constructs the rtl pattern directly.
This seems to scale better to SVE and is also less error-prone.
As a side effect, the patch fixes the REV handling for diff==1,
vmode==E_V4HFmode and adds missing support for diff==3,
vmode==E_V4HFmode.
To compensate for the lack of switches that check for specific modes,
the patch makes aarch64_expand_vec_perm_const_1 reject permutes on
single-element vectors (specifically V1DImode).
2017-10-27 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* config/aarch64/aarch64.c (aarch64_evpc_trn, aarch64_evpc_uzp)
(aarch64_evpc_zip, aarch64_evpc_ext, aarch64_evpc_rev)
(aarch64_evpc_dup): Generate rtl directly, rather than using
named expanders.
(aarch64_expand_vec_perm_const_1): Explicitly check for permutes
of a single element.
Index: gcc/config/aarch64/aarch64.c
===================================================================
--- gcc/config/aarch64/aarch64.c 2017-10-27 14:10:08.337833963 +0100
+++ gcc/config/aarch64/aarch64.c 2017-10-27 14:10:14.622293803 +0100
@@ -13475,7 +13475,6 @@ aarch64_evpc_trn (struct expand_vec_perm
{
unsigned int i, odd, mask, nelt = d->perm.length ();
rtx out, in0, in1, x;
- rtx (*gen) (rtx, rtx, rtx);
machine_mode vmode = d->vmode;
if (GET_MODE_UNIT_SIZE (vmode) > 8)
@@ -13512,48 +13511,8 @@ aarch64_evpc_trn (struct expand_vec_perm
}
out = d->target;
- if (odd)
- {
- switch (vmode)
- {
- case E_V16QImode: gen = gen_aarch64_trn2v16qi; break;
- case E_V8QImode: gen = gen_aarch64_trn2v8qi; break;
- case E_V8HImode: gen = gen_aarch64_trn2v8hi; break;
- case E_V4HImode: gen = gen_aarch64_trn2v4hi; break;
- case E_V4SImode: gen = gen_aarch64_trn2v4si; break;
- case E_V2SImode: gen = gen_aarch64_trn2v2si; break;
- case E_V2DImode: gen = gen_aarch64_trn2v2di; break;
- case E_V4HFmode: gen = gen_aarch64_trn2v4hf; break;
- case E_V8HFmode: gen = gen_aarch64_trn2v8hf; break;
- case E_V4SFmode: gen = gen_aarch64_trn2v4sf; break;
- case E_V2SFmode: gen = gen_aarch64_trn2v2sf; break;
- case E_V2DFmode: gen = gen_aarch64_trn2v2df; break;
- default:
- return false;
- }
- }
- else
- {
- switch (vmode)
- {
- case E_V16QImode: gen = gen_aarch64_trn1v16qi; break;
- case E_V8QImode: gen = gen_aarch64_trn1v8qi; break;
- case E_V8HImode: gen = gen_aarch64_trn1v8hi; break;
- case E_V4HImode: gen = gen_aarch64_trn1v4hi; break;
- case E_V4SImode: gen = gen_aarch64_trn1v4si; break;
- case E_V2SImode: gen = gen_aarch64_trn1v2si; break;
- case E_V2DImode: gen = gen_aarch64_trn1v2di; break;
- case E_V4HFmode: gen = gen_aarch64_trn1v4hf; break;
- case E_V8HFmode: gen = gen_aarch64_trn1v8hf; break;
- case E_V4SFmode: gen = gen_aarch64_trn1v4sf; break;
- case E_V2SFmode: gen = gen_aarch64_trn1v2sf; break;
- case E_V2DFmode: gen = gen_aarch64_trn1v2df; break;
- default:
- return false;
- }
- }
-
- emit_insn (gen (out, in0, in1));
+ emit_set_insn (out, gen_rtx_UNSPEC (vmode, gen_rtvec (2, in0, in1),
+ odd ? UNSPEC_TRN2 : UNSPEC_TRN1));
return true;
}
@@ -13563,7 +13522,6 @@ aarch64_evpc_uzp (struct expand_vec_perm
{
unsigned int i, odd, mask, nelt = d->perm.length ();
rtx out, in0, in1, x;
- rtx (*gen) (rtx, rtx, rtx);
machine_mode vmode = d->vmode;
if (GET_MODE_UNIT_SIZE (vmode) > 8)
@@ -13599,48 +13557,8 @@ aarch64_evpc_uzp (struct expand_vec_perm
}
out = d->target;
- if (odd)
- {
- switch (vmode)
- {
- case E_V16QImode: gen = gen_aarch64_uzp2v16qi; break;
- case E_V8QImode: gen = gen_aarch64_uzp2v8qi; break;
- case E_V8HImode: gen = gen_aarch64_uzp2v8hi; break;
- case E_V4HImode: gen = gen_aarch64_uzp2v4hi; break;
- case E_V4SImode: gen = gen_aarch64_uzp2v4si; break;
- case E_V2SImode: gen = gen_aarch64_uzp2v2si; break;
- case E_V2DImode: gen = gen_aarch64_uzp2v2di; break;
- case E_V4HFmode: gen = gen_aarch64_uzp2v4hf; break;
- case E_V8HFmode: gen = gen_aarch64_uzp2v8hf; break;
- case E_V4SFmode: gen = gen_aarch64_uzp2v4sf; break;
- case E_V2SFmode: gen = gen_aarch64_uzp2v2sf; break;
- case E_V2DFmode: gen = gen_aarch64_uzp2v2df; break;
- default:
- return false;
- }
- }
- else
- {
- switch (vmode)
- {
- case E_V16QImode: gen = gen_aarch64_uzp1v16qi; break;
- case E_V8QImode: gen = gen_aarch64_uzp1v8qi; break;
- case E_V8HImode: gen = gen_aarch64_uzp1v8hi; break;
- case E_V4HImode: gen = gen_aarch64_uzp1v4hi; break;
- case E_V4SImode: gen = gen_aarch64_uzp1v4si; break;
- case E_V2SImode: gen = gen_aarch64_uzp1v2si; break;
- case E_V2DImode: gen = gen_aarch64_uzp1v2di; break;
- case E_V4HFmode: gen = gen_aarch64_uzp1v4hf; break;
- case E_V8HFmode: gen = gen_aarch64_uzp1v8hf; break;
- case E_V4SFmode: gen = gen_aarch64_uzp1v4sf; break;
- case E_V2SFmode: gen = gen_aarch64_uzp1v2sf; break;
- case E_V2DFmode: gen = gen_aarch64_uzp1v2df; break;
- default:
- return false;
- }
- }
-
- emit_insn (gen (out, in0, in1));
+ emit_set_insn (out, gen_rtx_UNSPEC (vmode, gen_rtvec (2, in0, in1),
+ odd ? UNSPEC_UZP2 : UNSPEC_UZP1));
return true;
}
@@ -13650,7 +13568,6 @@ aarch64_evpc_zip (struct expand_vec_perm
{
unsigned int i, high, mask, nelt = d->perm.length ();
rtx out, in0, in1, x;
- rtx (*gen) (rtx, rtx, rtx);
machine_mode vmode = d->vmode;
if (GET_MODE_UNIT_SIZE (vmode) > 8)
@@ -13691,48 +13608,8 @@ aarch64_evpc_zip (struct expand_vec_perm
}
out = d->target;
- if (high)
- {
- switch (vmode)
- {
- case E_V16QImode: gen = gen_aarch64_zip2v16qi; break;
- case E_V8QImode: gen = gen_aarch64_zip2v8qi; break;
- case E_V8HImode: gen = gen_aarch64_zip2v8hi; break;
- case E_V4HImode: gen = gen_aarch64_zip2v4hi; break;
- case E_V4SImode: gen = gen_aarch64_zip2v4si; break;
- case E_V2SImode: gen = gen_aarch64_zip2v2si; break;
- case E_V2DImode: gen = gen_aarch64_zip2v2di; break;
- case E_V4HFmode: gen = gen_aarch64_zip2v4hf; break;
- case E_V8HFmode: gen = gen_aarch64_zip2v8hf; break;
- case E_V4SFmode: gen = gen_aarch64_zip2v4sf; break;
- case E_V2SFmode: gen = gen_aarch64_zip2v2sf; break;
- case E_V2DFmode: gen = gen_aarch64_zip2v2df; break;
- default:
- return false;
- }
- }
- else
- {
- switch (vmode)
- {
- case E_V16QImode: gen = gen_aarch64_zip1v16qi; break;
- case E_V8QImode: gen = gen_aarch64_zip1v8qi; break;
- case E_V8HImode: gen = gen_aarch64_zip1v8hi; break;
- case E_V4HImode: gen = gen_aarch64_zip1v4hi; break;
- case E_V4SImode: gen = gen_aarch64_zip1v4si; break;
- case E_V2SImode: gen = gen_aarch64_zip1v2si; break;
- case E_V2DImode: gen = gen_aarch64_zip1v2di; break;
- case E_V4HFmode: gen = gen_aarch64_zip1v4hf; break;
- case E_V8HFmode: gen = gen_aarch64_zip1v8hf; break;
- case E_V4SFmode: gen = gen_aarch64_zip1v4sf; break;
- case E_V2SFmode: gen = gen_aarch64_zip1v2sf; break;
- case E_V2DFmode: gen = gen_aarch64_zip1v2df; break;
- default:
- return false;
- }
- }
-
- emit_insn (gen (out, in0, in1));
+ emit_set_insn (out, gen_rtx_UNSPEC (vmode, gen_rtvec (2, in0, in1),
+ high ? UNSPEC_ZIP2 : UNSPEC_ZIP1));
return true;
}
@@ -13742,7 +13619,6 @@ aarch64_evpc_zip (struct expand_vec_perm
aarch64_evpc_ext (struct expand_vec_perm_d *d)
{
unsigned int i, nelt = d->perm.length ();
- rtx (*gen) (rtx, rtx, rtx, rtx);
rtx offset;
unsigned int location = d->perm[0]; /* Always < nelt. */
@@ -13760,24 +13636,6 @@ aarch64_evpc_ext (struct expand_vec_perm
return false;
}
- switch (d->vmode)
- {
- case E_V16QImode: gen = gen_aarch64_extv16qi; break;
- case E_V8QImode: gen = gen_aarch64_extv8qi; break;
- case E_V4HImode: gen = gen_aarch64_extv4hi; break;
- case E_V8HImode: gen = gen_aarch64_extv8hi; break;
- case E_V2SImode: gen = gen_aarch64_extv2si; break;
- case E_V4SImode: gen = gen_aarch64_extv4si; break;
- case E_V4HFmode: gen = gen_aarch64_extv4hf; break;
- case E_V8HFmode: gen = gen_aarch64_extv8hf; break;
- case E_V2SFmode: gen = gen_aarch64_extv2sf; break;
- case E_V4SFmode: gen = gen_aarch64_extv4sf; break;
- case E_V2DImode: gen = gen_aarch64_extv2di; break;
- case E_V2DFmode: gen = gen_aarch64_extv2df; break;
- default:
- return false;
- }
-
/* Success! */
if (d->testing_p)
return true;
@@ -13796,7 +13654,10 @@ aarch64_evpc_ext (struct expand_vec_perm
}
offset = GEN_INT (location);
- emit_insn (gen (d->target, d->op0, d->op1, offset));
+ emit_set_insn (d->target,
+ gen_rtx_UNSPEC (d->vmode,
+ gen_rtvec (3, d->op0, d->op1, offset),
+ UNSPEC_EXT));
return true;
}
@@ -13805,55 +13666,21 @@ aarch64_evpc_ext (struct expand_vec_perm
static bool
aarch64_evpc_rev (struct expand_vec_perm_d *d)
{
- unsigned int i, j, diff, nelt = d->perm.length ();
- rtx (*gen) (rtx, rtx);
+ unsigned int i, j, diff, size, unspec, nelt = d->perm.length ();
if (!d->one_vector_p)
return false;
diff = d->perm[0];
- switch (diff)
- {
- case 7:
- switch (d->vmode)
- {
- case E_V16QImode: gen = gen_aarch64_rev64v16qi; break;
- case E_V8QImode: gen = gen_aarch64_rev64v8qi; break;
- default:
- return false;
- }
- break;
- case 3:
- switch (d->vmode)
- {
- case E_V16QImode: gen = gen_aarch64_rev32v16qi; break;
- case E_V8QImode: gen = gen_aarch64_rev32v8qi; break;
- case E_V8HImode: gen = gen_aarch64_rev64v8hi; break;
- case E_V4HImode: gen = gen_aarch64_rev64v4hi; break;
- default:
- return false;
- }
- break;
- case 1:
- switch (d->vmode)
- {
- case E_V16QImode: gen = gen_aarch64_rev16v16qi; break;
- case E_V8QImode: gen = gen_aarch64_rev16v8qi; break;
- case E_V8HImode: gen = gen_aarch64_rev32v8hi; break;
- case E_V4HImode: gen = gen_aarch64_rev32v4hi; break;
- case E_V4SImode: gen = gen_aarch64_rev64v4si; break;
- case E_V2SImode: gen = gen_aarch64_rev64v2si; break;
- case E_V4SFmode: gen = gen_aarch64_rev64v4sf; break;
- case E_V2SFmode: gen = gen_aarch64_rev64v2sf; break;
- case E_V8HFmode: gen = gen_aarch64_rev64v8hf; break;
- case E_V4HFmode: gen = gen_aarch64_rev64v4hf; break;
- default:
- return false;
- }
- break;
- default:
- return false;
- }
+ size = (diff + 1) * GET_MODE_UNIT_SIZE (d->vmode);
+ if (size == 8)
+ unspec = UNSPEC_REV64;
+ else if (size == 4)
+ unspec = UNSPEC_REV32;
+ else if (size == 2)
+ unspec = UNSPEC_REV16;
+ else
+ return false;
for (i = 0; i < nelt ; i += diff + 1)
for (j = 0; j <= diff; j += 1)
@@ -13872,14 +13699,14 @@ aarch64_evpc_rev (struct expand_vec_perm
if (d->testing_p)
return true;
- emit_insn (gen (d->target, d->op0));
+ emit_set_insn (d->target, gen_rtx_UNSPEC (d->vmode, gen_rtvec (1, d->op0),
+ unspec));
return true;
}
static bool
aarch64_evpc_dup (struct expand_vec_perm_d *d)
{
- rtx (*gen) (rtx, rtx, rtx);
rtx out = d->target;
rtx in0;
machine_mode vmode = d->vmode;
@@ -13901,25 +13728,9 @@ aarch64_evpc_dup (struct expand_vec_perm
in0 = d->op0;
lane = GEN_INT (elt); /* The pattern corrects for big-endian. */
- switch (vmode)
- {
- case E_V16QImode: gen = gen_aarch64_dup_lanev16qi; break;
- case E_V8QImode: gen = gen_aarch64_dup_lanev8qi; break;
- case E_V8HImode: gen = gen_aarch64_dup_lanev8hi; break;
- case E_V4HImode: gen = gen_aarch64_dup_lanev4hi; break;
- case E_V4SImode: gen = gen_aarch64_dup_lanev4si; break;
- case E_V2SImode: gen = gen_aarch64_dup_lanev2si; break;
- case E_V2DImode: gen = gen_aarch64_dup_lanev2di; break;
- case E_V8HFmode: gen = gen_aarch64_dup_lanev8hf; break;
- case E_V4HFmode: gen = gen_aarch64_dup_lanev4hf; break;
- case E_V4SFmode: gen = gen_aarch64_dup_lanev4sf; break;
- case E_V2SFmode: gen = gen_aarch64_dup_lanev2sf; break;
- case E_V2DFmode: gen = gen_aarch64_dup_lanev2df; break;
- default:
- return false;
- }
-
- emit_insn (gen (out, in0, lane));
+ rtx parallel = gen_rtx_PARALLEL (vmode, gen_rtvec (1, lane));
+ rtx select = gen_rtx_VEC_SELECT (GET_MODE_INNER (vmode), in0, parallel);
+ emit_set_insn (out, gen_rtx_VEC_DUPLICATE (vmode, select));
return true;
}
@@ -13972,7 +13783,7 @@ aarch64_expand_vec_perm_const_1 (struct
std::swap (d->op0, d->op1);
}
- if (TARGET_SIMD)
+ if (TARGET_SIMD && nelt > 1)
{
if (aarch64_evpc_rev (d))
return true;
* [02/nn] [AArch64] Move code around
2017-10-27 13:22 [00/nn] AArch64 patches preparing for SVE Richard Sandiford
2017-10-27 13:23 ` [01/nn] [AArch64] Generate permute patterns using rtx builders Richard Sandiford
@ 2017-10-27 13:25 ` Richard Sandiford
2017-10-31 18:03 ` James Greenhalgh
2017-10-27 13:26 ` [03/nn] [AArch64] Rework interface to add constant/offset routines Richard Sandiford
` (9 subsequent siblings)
11 siblings, 1 reply; 29+ messages in thread
From: Richard Sandiford @ 2017-10-27 13:25 UTC (permalink / raw)
To: gcc-patches; +Cc: richard.earnshaw, james.greenhalgh, marcus.shawcroft
This patch simply moves code around, in order to make the later
patches easier to read, and to avoid forward declarations.
It doesn't add the missing function comments because the interfaces
will change in a later patch.
2017-10-26 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* config/aarch64/aarch64.c (aarch64_add_constant_internal)
(aarch64_add_constant, aarch64_add_sp, aarch64_sub_sp): Move
earlier in file.
Index: gcc/config/aarch64/aarch64.c
===================================================================
--- gcc/config/aarch64/aarch64.c 2017-10-27 14:10:14.622293803 +0100
+++ gcc/config/aarch64/aarch64.c 2017-10-27 14:10:17.740863052 +0100
@@ -1966,6 +1966,87 @@ aarch64_internal_mov_immediate (rtx dest
return num_insns;
}
+/* Add DELTA to REGNUM in mode MODE. SCRATCHREG can be used to hold a
+ temporary value if necessary. FRAME_RELATED_P should be true if
+ the RTX_FRAME_RELATED flag should be set and CFA adjustments added
+ to the generated instructions. If SCRATCHREG is known to hold
+ abs (delta), EMIT_MOVE_IMM can be set to false to avoid emitting the
+ immediate again.
+
+ Since this function may be used to adjust the stack pointer, we must
+ ensure that it cannot cause transient stack deallocation (for example
+ by first incrementing SP and then decrementing when adjusting by a
+ large immediate). */
+
+static void
+aarch64_add_constant_internal (scalar_int_mode mode, int regnum,
+ int scratchreg, HOST_WIDE_INT delta,
+ bool frame_related_p, bool emit_move_imm)
+{
+ HOST_WIDE_INT mdelta = abs_hwi (delta);
+ rtx this_rtx = gen_rtx_REG (mode, regnum);
+ rtx_insn *insn;
+
+ if (!mdelta)
+ return;
+
+ /* Single instruction adjustment. */
+ if (aarch64_uimm12_shift (mdelta))
+ {
+ insn = emit_insn (gen_add2_insn (this_rtx, GEN_INT (delta)));
+ RTX_FRAME_RELATED_P (insn) = frame_related_p;
+ return;
+ }
+
+ /* Emit 2 additions/subtractions if the adjustment is less than 24 bits.
+ Only do this if mdelta is not a 16-bit move as adjusting using a move
+ is better. */
+ if (mdelta < 0x1000000 && !aarch64_move_imm (mdelta, mode))
+ {
+ HOST_WIDE_INT low_off = mdelta & 0xfff;
+
+ low_off = delta < 0 ? -low_off : low_off;
+ insn = emit_insn (gen_add2_insn (this_rtx, GEN_INT (low_off)));
+ RTX_FRAME_RELATED_P (insn) = frame_related_p;
+ insn = emit_insn (gen_add2_insn (this_rtx, GEN_INT (delta - low_off)));
+ RTX_FRAME_RELATED_P (insn) = frame_related_p;
+ return;
+ }
+
+ /* Emit a move immediate if required and an addition/subtraction. */
+ rtx scratch_rtx = gen_rtx_REG (mode, scratchreg);
+ if (emit_move_imm)
+ aarch64_internal_mov_immediate (scratch_rtx, GEN_INT (mdelta), true, mode);
+ insn = emit_insn (delta < 0 ? gen_sub2_insn (this_rtx, scratch_rtx)
+ : gen_add2_insn (this_rtx, scratch_rtx));
+ if (frame_related_p)
+ {
+ RTX_FRAME_RELATED_P (insn) = frame_related_p;
+ rtx adj = plus_constant (mode, this_rtx, delta);
+ add_reg_note (insn , REG_CFA_ADJUST_CFA, gen_rtx_SET (this_rtx, adj));
+ }
+}
+
+static inline void
+aarch64_add_constant (scalar_int_mode mode, int regnum, int scratchreg,
+ HOST_WIDE_INT delta)
+{
+ aarch64_add_constant_internal (mode, regnum, scratchreg, delta, false, true);
+}
+
+static inline void
+aarch64_add_sp (int scratchreg, HOST_WIDE_INT delta, bool emit_move_imm)
+{
+ aarch64_add_constant_internal (Pmode, SP_REGNUM, scratchreg, delta,
+ true, emit_move_imm);
+}
+
+static inline void
+aarch64_sub_sp (int scratchreg, HOST_WIDE_INT delta, bool frame_related_p)
+{
+ aarch64_add_constant_internal (Pmode, SP_REGNUM, scratchreg, -delta,
+ frame_related_p, true);
+}
void
aarch64_expand_mov_immediate (rtx dest, rtx imm)
@@ -2077,88 +2158,6 @@ aarch64_expand_mov_immediate (rtx dest,
as_a <scalar_int_mode> (mode));
}
-/* Add DELTA to REGNUM in mode MODE. SCRATCHREG can be used to hold a
- temporary value if necessary. FRAME_RELATED_P should be true if
- the RTX_FRAME_RELATED flag should be set and CFA adjustments added
- to the generated instructions. If SCRATCHREG is known to hold
- abs (delta), EMIT_MOVE_IMM can be set to false to avoid emitting the
- immediate again.
-
- Since this function may be used to adjust the stack pointer, we must
- ensure that it cannot cause transient stack deallocation (for example
- by first incrementing SP and then decrementing when adjusting by a
- large immediate). */
-
-static void
-aarch64_add_constant_internal (scalar_int_mode mode, int regnum,
- int scratchreg, HOST_WIDE_INT delta,
- bool frame_related_p, bool emit_move_imm)
-{
- HOST_WIDE_INT mdelta = abs_hwi (delta);
- rtx this_rtx = gen_rtx_REG (mode, regnum);
- rtx_insn *insn;
-
- if (!mdelta)
- return;
-
- /* Single instruction adjustment. */
- if (aarch64_uimm12_shift (mdelta))
- {
- insn = emit_insn (gen_add2_insn (this_rtx, GEN_INT (delta)));
- RTX_FRAME_RELATED_P (insn) = frame_related_p;
- return;
- }
-
- /* Emit 2 additions/subtractions if the adjustment is less than 24 bits.
- Only do this if mdelta is not a 16-bit move as adjusting using a move
- is better. */
- if (mdelta < 0x1000000 && !aarch64_move_imm (mdelta, mode))
- {
- HOST_WIDE_INT low_off = mdelta & 0xfff;
-
- low_off = delta < 0 ? -low_off : low_off;
- insn = emit_insn (gen_add2_insn (this_rtx, GEN_INT (low_off)));
- RTX_FRAME_RELATED_P (insn) = frame_related_p;
- insn = emit_insn (gen_add2_insn (this_rtx, GEN_INT (delta - low_off)));
- RTX_FRAME_RELATED_P (insn) = frame_related_p;
- return;
- }
-
- /* Emit a move immediate if required and an addition/subtraction. */
- rtx scratch_rtx = gen_rtx_REG (mode, scratchreg);
- if (emit_move_imm)
- aarch64_internal_mov_immediate (scratch_rtx, GEN_INT (mdelta), true, mode);
- insn = emit_insn (delta < 0 ? gen_sub2_insn (this_rtx, scratch_rtx)
- : gen_add2_insn (this_rtx, scratch_rtx));
- if (frame_related_p)
- {
- RTX_FRAME_RELATED_P (insn) = frame_related_p;
- rtx adj = plus_constant (mode, this_rtx, delta);
- add_reg_note (insn , REG_CFA_ADJUST_CFA, gen_rtx_SET (this_rtx, adj));
- }
-}
-
-static inline void
-aarch64_add_constant (scalar_int_mode mode, int regnum, int scratchreg,
- HOST_WIDE_INT delta)
-{
- aarch64_add_constant_internal (mode, regnum, scratchreg, delta, false, true);
-}
-
-static inline void
-aarch64_add_sp (int scratchreg, HOST_WIDE_INT delta, bool emit_move_imm)
-{
- aarch64_add_constant_internal (Pmode, SP_REGNUM, scratchreg, delta,
- true, emit_move_imm);
-}
-
-static inline void
-aarch64_sub_sp (int scratchreg, HOST_WIDE_INT delta, bool frame_related_p)
-{
- aarch64_add_constant_internal (Pmode, SP_REGNUM, scratchreg, -delta,
- frame_related_p, true);
-}
-
static bool
aarch64_function_ok_for_sibcall (tree decl ATTRIBUTE_UNUSED,
tree exp ATTRIBUTE_UNUSED)
* [03/nn] [AArch64] Rework interface to add constant/offset routines
2017-10-27 13:22 [00/nn] AArch64 patches preparing for SVE Richard Sandiford
2017-10-27 13:23 ` [01/nn] [AArch64] Generate permute patterns using rtx builders Richard Sandiford
2017-10-27 13:25 ` [02/nn] [AArch64] Move code around Richard Sandiford
@ 2017-10-27 13:26 ` Richard Sandiford
2017-10-30 11:03 ` Richard Sandiford
2017-10-27 13:27 ` [04/nn] [AArch64] Rename the internal "Upl" constraint Richard Sandiford
` (8 subsequent siblings)
11 siblings, 1 reply; 29+ messages in thread
From: Richard Sandiford @ 2017-10-27 13:26 UTC (permalink / raw)
To: gcc-patches; +Cc: richard.earnshaw, james.greenhalgh, marcus.shawcroft
The port had aarch64_add_offset and aarch64_add_constant routines
that did similar things. This patch replaces them with an expanded
version of aarch64_add_offset that takes separate source and
destination registers. The new routine also takes a poly_int64 offset
instead of a HOST_WIDE_INT offset, but it leaves the HOST_WIDE_INT
case to aarch64_add_offset_1, which is basically a repurposed
aarch64_add_constant_internal. The SVE patch will put the handling
of VL-based constants in aarch64_add_offset, while still using
aarch64_add_offset_1 for the constant part.
The vcall_offset == 0 path in aarch64_output_mi_thunk will use temp0
as well as temp1 once SVE is added.
A side effect of the patch is that we now generate:
mov x29, sp
instead of:
add x29, sp, 0
in the pr70044.c test.
2017-10-27 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* config/aarch64/aarch64.c (aarch64_force_temporary): Assert that
x exists before using it.
(aarch64_add_constant_internal): Rename to...
(aarch64_add_offset_1): ...this. Replace regnum with separate
src and dest rtxes. Handle the case in which they're different,
including when the offset is zero. Replace scratchreg with an rtx.
Use 2 additions if there is no spare register into which we can
move a 16-bit constant.
(aarch64_add_constant): Delete.
(aarch64_add_offset): Replace reg with separate src and dest
rtxes. Take a poly_int64 offset instead of a HOST_WIDE_INT.
Use aarch64_add_offset_1.
(aarch64_add_sp, aarch64_sub_sp): Take the scratch register as
an rtx rather than an int. Take the delta as a poly_int64
rather than a HOST_WIDE_INT. Use aarch64_add_offset.
(aarch64_expand_mov_immediate): Update uses of aarch64_add_offset.
(aarch64_allocate_and_probe_stack_space): Take the scratch register
as an rtx rather than an int. Use Pmode rather than word_mode
in the loop code. Update calls to aarch64_sub_sp.
(aarch64_expand_prologue): Update calls to aarch64_sub_sp,
aarch64_allocate_and_probe_stack_space and aarch64_add_offset.
(aarch64_expand_epilogue): Update calls to aarch64_add_offset
and aarch64_add_sp.
(aarch64_output_mi_thunk): Use aarch64_add_offset rather than
aarch64_add_constant.
gcc/testsuite/
* gcc.target/aarch64/pr70044.c: Allow "mov x29, sp" too.
Index: gcc/config/aarch64/aarch64.c
===================================================================
--- gcc/config/aarch64/aarch64.c 2017-10-27 14:10:17.740863052 +0100
+++ gcc/config/aarch64/aarch64.c 2017-10-27 14:11:14.425034427 +0100
@@ -1818,30 +1818,13 @@ aarch64_force_temporary (machine_mode mo
return force_reg (mode, value);
else
{
- x = aarch64_emit_move (x, value);
+ gcc_assert (x);
+ aarch64_emit_move (x, value);
return x;
}
}
-static rtx
-aarch64_add_offset (scalar_int_mode mode, rtx temp, rtx reg,
- HOST_WIDE_INT offset)
-{
- if (!aarch64_plus_immediate (GEN_INT (offset), mode))
- {
- rtx high;
- /* Load the full offset into a register. This
- might be improvable in the future. */
- high = GEN_INT (offset);
- offset = 0;
- high = aarch64_force_temporary (mode, temp, high);
- reg = aarch64_force_temporary (mode, temp,
- gen_rtx_PLUS (mode, high, reg));
- }
- return plus_constant (mode, reg, offset);
-}
-
static int
aarch64_internal_mov_immediate (rtx dest, rtx imm, bool generate,
scalar_int_mode mode)
@@ -1966,86 +1949,123 @@ aarch64_internal_mov_immediate (rtx dest
return num_insns;
}
-/* Add DELTA to REGNUM in mode MODE. SCRATCHREG can be used to hold a
- temporary value if necessary. FRAME_RELATED_P should be true if
- the RTX_FRAME_RELATED flag should be set and CFA adjustments added
- to the generated instructions. If SCRATCHREG is known to hold
- abs (delta), EMIT_MOVE_IMM can be set to false to avoid emitting the
- immediate again.
-
- Since this function may be used to adjust the stack pointer, we must
- ensure that it cannot cause transient stack deallocation (for example
- by first incrementing SP and then decrementing when adjusting by a
- large immediate). */
+/* A subroutine of aarch64_add_offset that handles the case in which
+ OFFSET is known at compile time. The arguments are otherwise the same. */
static void
-aarch64_add_constant_internal (scalar_int_mode mode, int regnum,
- int scratchreg, HOST_WIDE_INT delta,
- bool frame_related_p, bool emit_move_imm)
+aarch64_add_offset_1 (scalar_int_mode mode, rtx dest,
+ rtx src, HOST_WIDE_INT offset, rtx temp1,
+ bool frame_related_p, bool emit_move_imm)
{
- HOST_WIDE_INT mdelta = abs_hwi (delta);
- rtx this_rtx = gen_rtx_REG (mode, regnum);
+ gcc_assert (emit_move_imm || temp1 != NULL_RTX);
+ gcc_assert (temp1 == NULL_RTX || !reg_overlap_mentioned_p (temp1, src));
+
+ HOST_WIDE_INT moffset = abs_hwi (offset);
rtx_insn *insn;
- if (!mdelta)
- return;
+ if (!moffset)
+ {
+ if (!rtx_equal_p (dest, src))
+ {
+ insn = emit_insn (gen_rtx_SET (dest, src));
+ RTX_FRAME_RELATED_P (insn) = frame_related_p;
+ }
+ return;
+ }
/* Single instruction adjustment. */
- if (aarch64_uimm12_shift (mdelta))
+ if (aarch64_uimm12_shift (moffset))
{
- insn = emit_insn (gen_add2_insn (this_rtx, GEN_INT (delta)));
+ insn = emit_insn (gen_add3_insn (dest, src, GEN_INT (offset)));
RTX_FRAME_RELATED_P (insn) = frame_related_p;
return;
}
- /* Emit 2 additions/subtractions if the adjustment is less than 24 bits.
- Only do this if mdelta is not a 16-bit move as adjusting using a move
- is better. */
- if (mdelta < 0x1000000 && !aarch64_move_imm (mdelta, mode))
+ /* Emit 2 additions/subtractions if the adjustment is less than 24 bits
+ and either:
+
+ a) the offset cannot be loaded by a 16-bit move or
+ b) there is no spare register into which we can move it. */
+ if (moffset < 0x1000000
+ && ((!temp1 && !can_create_pseudo_p ())
+ || !aarch64_move_imm (moffset, mode)))
{
- HOST_WIDE_INT low_off = mdelta & 0xfff;
+ HOST_WIDE_INT low_off = moffset & 0xfff;
- low_off = delta < 0 ? -low_off : low_off;
- insn = emit_insn (gen_add2_insn (this_rtx, GEN_INT (low_off)));
+ low_off = offset < 0 ? -low_off : low_off;
+ insn = emit_insn (gen_add3_insn (dest, src, GEN_INT (low_off)));
RTX_FRAME_RELATED_P (insn) = frame_related_p;
- insn = emit_insn (gen_add2_insn (this_rtx, GEN_INT (delta - low_off)));
+ insn = emit_insn (gen_add2_insn (dest, GEN_INT (offset - low_off)));
RTX_FRAME_RELATED_P (insn) = frame_related_p;
return;
}
/* Emit a move immediate if required and an addition/subtraction. */
- rtx scratch_rtx = gen_rtx_REG (mode, scratchreg);
if (emit_move_imm)
- aarch64_internal_mov_immediate (scratch_rtx, GEN_INT (mdelta), true, mode);
- insn = emit_insn (delta < 0 ? gen_sub2_insn (this_rtx, scratch_rtx)
- : gen_add2_insn (this_rtx, scratch_rtx));
+ {
+ gcc_assert (temp1 != NULL_RTX || can_create_pseudo_p ());
+ temp1 = aarch64_force_temporary (mode, temp1, GEN_INT (moffset));
+ }
+ insn = emit_insn (offset < 0
+ ? gen_sub3_insn (dest, src, temp1)
+ : gen_add3_insn (dest, src, temp1));
if (frame_related_p)
{
RTX_FRAME_RELATED_P (insn) = frame_related_p;
- rtx adj = plus_constant (mode, this_rtx, delta);
- add_reg_note (insn , REG_CFA_ADJUST_CFA, gen_rtx_SET (this_rtx, adj));
+ rtx adj = plus_constant (mode, src, offset);
+ add_reg_note (insn, REG_CFA_ADJUST_CFA, gen_rtx_SET (dest, adj));
}
}
-static inline void
-aarch64_add_constant (scalar_int_mode mode, int regnum, int scratchreg,
- HOST_WIDE_INT delta)
-{
- aarch64_add_constant_internal (mode, regnum, scratchreg, delta, false, true);
-}
+/* Set DEST to SRC + OFFSET. MODE is the mode of the addition.
+ FRAME_RELATED_P is true if the RTX_FRAME_RELATED flag should
+ be set and CFA adjustments added to the generated instructions.
+
+ TEMP1, if nonnull, is a register of mode MODE that can be used as a
+ temporary if register allocation is already complete. This temporary
+ register may overlap DEST but must not overlap SRC. If TEMP1 is known
+ to hold abs (OFFSET), EMIT_MOVE_IMM can be set to false to avoid emitting
+ the immediate again.
+
+ Since this function may be used to adjust the stack pointer, we must
+ ensure that it cannot cause transient stack deallocation (for example
+ by first incrementing SP and then decrementing when adjusting by a
+ large immediate). */
+
+static void
+aarch64_add_offset (scalar_int_mode mode, rtx dest, rtx src,
+ poly_int64 offset, rtx temp1, bool frame_related_p,
+ bool emit_move_imm = true)
+{
+ gcc_assert (emit_move_imm || temp1 != NULL_RTX);
+ gcc_assert (temp1 == NULL_RTX || !reg_overlap_mentioned_p (temp1, src));
+
+ /* SVE support will go here. */
+ HOST_WIDE_INT constant = offset.to_constant ();
+ aarch64_add_offset_1 (mode, dest, src, constant, temp1,
+ frame_related_p, emit_move_imm);
+}
+
+/* Add DELTA to the stack pointer, marking the instructions frame-related.
+ TEMP1 is available as a temporary if nonnull. EMIT_MOVE_IMM is false
+ if TEMP1 already contains abs (DELTA). */
static inline void
-aarch64_add_sp (int scratchreg, HOST_WIDE_INT delta, bool emit_move_imm)
+aarch64_add_sp (rtx temp1, poly_int64 delta, bool emit_move_imm)
{
- aarch64_add_constant_internal (Pmode, SP_REGNUM, scratchreg, delta,
- true, emit_move_imm);
+ aarch64_add_offset (Pmode, stack_pointer_rtx, stack_pointer_rtx, delta,
+ temp1, true, emit_move_imm);
}
+/* Subtract DELTA from the stack pointer, marking the instructions
+ frame-related if FRAME_RELATED_P. TEMP1 is available as a temporary
+ if nonnull. */
+
static inline void
-aarch64_sub_sp (int scratchreg, HOST_WIDE_INT delta, bool frame_related_p)
+aarch64_sub_sp (rtx temp1, poly_int64 delta, bool frame_related_p)
{
- aarch64_add_constant_internal (Pmode, SP_REGNUM, scratchreg, -delta,
- frame_related_p, true);
+ aarch64_add_offset (Pmode, stack_pointer_rtx, stack_pointer_rtx, -delta,
+ temp1, frame_related_p);
}
void
@@ -2078,9 +2098,8 @@ aarch64_expand_mov_immediate (rtx dest,
{
gcc_assert (can_create_pseudo_p ());
base = aarch64_force_temporary (int_mode, dest, base);
- base = aarch64_add_offset (int_mode, NULL, base,
- INTVAL (offset));
- aarch64_emit_move (dest, base);
+ aarch64_add_offset (int_mode, dest, base, INTVAL (offset),
+ NULL_RTX, false);
return;
}
@@ -2119,9 +2138,8 @@ aarch64_expand_mov_immediate (rtx dest,
{
gcc_assert(can_create_pseudo_p ());
base = aarch64_force_temporary (int_mode, dest, base);
- base = aarch64_add_offset (int_mode, NULL, base,
- INTVAL (offset));
- aarch64_emit_move (dest, base);
+ aarch64_add_offset (int_mode, dest, base, INTVAL (offset),
+ NULL_RTX, false);
return;
}
/* FALLTHRU */
@@ -3613,11 +3631,10 @@ aarch64_set_handled_components (sbitmap
cfun->machine->reg_is_wrapped_separately[regno] = true;
}
-/* Allocate SIZE bytes of stack space using SCRATCH_REG as a scratch
- register. */
+/* Allocate SIZE bytes of stack space using TEMP1 as a scratch register. */
static void
-aarch64_allocate_and_probe_stack_space (int scratchreg, HOST_WIDE_INT size)
+aarch64_allocate_and_probe_stack_space (rtx temp1, HOST_WIDE_INT size)
{
HOST_WIDE_INT probe_interval
= 1 << PARAM_VALUE (PARAM_STACK_CLASH_PROTECTION_PROBE_INTERVAL);
@@ -3631,7 +3648,7 @@ aarch64_allocate_and_probe_stack_space (
We can allocate GUARD_SIZE - GUARD_USED_BY_CALLER as a single chunk
without any probing. */
gcc_assert (size >= guard_size - guard_used_by_caller);
- aarch64_sub_sp (scratchreg, guard_size - guard_used_by_caller, true);
+ aarch64_sub_sp (temp1, guard_size - guard_used_by_caller, true);
HOST_WIDE_INT orig_size = size;
size -= (guard_size - guard_used_by_caller);
@@ -3643,17 +3660,16 @@ aarch64_allocate_and_probe_stack_space (
if (rounded_size && rounded_size <= 4 * probe_interval)
{
/* We don't use aarch64_sub_sp here because we don't want to
- repeatedly load SCRATCHREG. */
- rtx scratch_rtx = gen_rtx_REG (Pmode, scratchreg);
+ repeatedly load TEMP1. */
if (probe_interval > ARITH_FACTOR)
- emit_move_insn (scratch_rtx, GEN_INT (-probe_interval));
+ emit_move_insn (temp1, GEN_INT (-probe_interval));
else
- scratch_rtx = GEN_INT (-probe_interval);
+ temp1 = GEN_INT (-probe_interval);
for (HOST_WIDE_INT i = 0; i < rounded_size; i += probe_interval)
{
rtx_insn *insn = emit_insn (gen_add2_insn (stack_pointer_rtx,
- scratch_rtx));
+ temp1));
add_reg_note (insn, REG_STACK_CHECK, const0_rtx);
if (probe_interval > ARITH_FACTOR)
@@ -3674,10 +3690,10 @@ aarch64_allocate_and_probe_stack_space (
else if (rounded_size)
{
/* Compute the ending address. */
- rtx temp = gen_rtx_REG (word_mode, scratchreg);
- emit_move_insn (temp, GEN_INT (-rounded_size));
+ unsigned int scratchreg = REGNO (temp1);
+ emit_move_insn (temp1, GEN_INT (-rounded_size));
rtx_insn *insn
- = emit_insn (gen_add3_insn (temp, stack_pointer_rtx, temp));
+ = emit_insn (gen_add3_insn (temp1, stack_pointer_rtx, temp1));
/* For the initial allocation, we don't have a frame pointer
set up, so we always need CFI notes. If we're doing the
@@ -3692,7 +3708,7 @@ aarch64_allocate_and_probe_stack_space (
/* We want the CFA independent of the stack pointer for the
duration of the loop. */
add_reg_note (insn, REG_CFA_DEF_CFA,
- plus_constant (Pmode, temp,
+ plus_constant (Pmode, temp1,
(rounded_size + (orig_size - size))));
RTX_FRAME_RELATED_P (insn) = 1;
}
@@ -3702,7 +3718,7 @@ aarch64_allocate_and_probe_stack_space (
It also probes at a 4k interval regardless of the value of
PARAM_STACK_CLASH_PROTECTION_PROBE_INTERVAL. */
insn = emit_insn (gen_probe_stack_range (stack_pointer_rtx,
- stack_pointer_rtx, temp));
+ stack_pointer_rtx, temp1));
/* Now reset the CFA register if needed. */
if (scratchreg == IP0_REGNUM || !frame_pointer_needed)
@@ -3723,7 +3739,7 @@ aarch64_allocate_and_probe_stack_space (
Note that any residual must be probed. */
if (residual)
{
- aarch64_sub_sp (scratchreg, residual, true);
+ aarch64_sub_sp (temp1, residual, true);
add_reg_note (get_last_insn (), REG_STACK_CHECK, const0_rtx);
emit_stack_probe (plus_constant (Pmode, stack_pointer_rtx,
(residual - GET_MODE_SIZE (word_mode))));
@@ -3814,6 +3830,9 @@ aarch64_expand_prologue (void)
aarch64_emit_probe_stack_range (get_stack_check_protect (), frame_size);
}
+ rtx ip0_rtx = gen_rtx_REG (Pmode, IP0_REGNUM);
+ rtx ip1_rtx = gen_rtx_REG (Pmode, IP1_REGNUM);
+
/* We do not fully protect aarch64 against stack clash style attacks
as doing so would be prohibitively expensive with less utility over
time as newer compilers are deployed.
@@ -3859,9 +3878,9 @@ aarch64_expand_prologue (void)
outgoing args. */
if (flag_stack_clash_protection
&& initial_adjust >= guard_size - guard_used_by_caller)
- aarch64_allocate_and_probe_stack_space (IP0_REGNUM, initial_adjust);
+ aarch64_allocate_and_probe_stack_space (ip0_rtx, initial_adjust);
else
- aarch64_sub_sp (IP0_REGNUM, initial_adjust, true);
+ aarch64_sub_sp (ip0_rtx, initial_adjust, true);
if (callee_adjust != 0)
aarch64_push_regs (reg1, reg2, callee_adjust);
@@ -3871,9 +3890,8 @@ aarch64_expand_prologue (void)
if (callee_adjust == 0)
aarch64_save_callee_saves (DImode, callee_offset, R29_REGNUM,
R30_REGNUM, false);
- insn = emit_insn (gen_add3_insn (hard_frame_pointer_rtx,
- stack_pointer_rtx,
- GEN_INT (callee_offset)));
+ aarch64_add_offset (Pmode, hard_frame_pointer_rtx,
+ stack_pointer_rtx, callee_offset, ip1_rtx, true);
RTX_FRAME_RELATED_P (insn) = frame_pointer_needed;
emit_insn (gen_stack_tie (stack_pointer_rtx, hard_frame_pointer_rtx));
}
@@ -3890,9 +3908,9 @@ aarch64_expand_prologue (void)
less the amount of the guard reserved for use by the caller's
outgoing args. */
if (final_adjust >= guard_size - guard_used_by_caller)
- aarch64_allocate_and_probe_stack_space (IP1_REGNUM, final_adjust);
+ aarch64_allocate_and_probe_stack_space (ip1_rtx, final_adjust);
else
- aarch64_sub_sp (IP1_REGNUM, final_adjust, !frame_pointer_needed);
+ aarch64_sub_sp (ip1_rtx, final_adjust, !frame_pointer_needed);
/* We must also probe if the final adjustment is larger than the guard
that is assumed used by the caller. This may be sub-optimal. */
@@ -3905,7 +3923,7 @@ aarch64_expand_prologue (void)
}
}
else
- aarch64_sub_sp (IP1_REGNUM, final_adjust, !frame_pointer_needed);
+ aarch64_sub_sp (ip1_rtx, final_adjust, !frame_pointer_needed);
}
/* Return TRUE if we can use a simple_return insn.
@@ -3961,17 +3979,16 @@ aarch64_expand_epilogue (bool for_sibcal
/* Restore the stack pointer from the frame pointer if it may not
be the same as the stack pointer. */
+ rtx ip0_rtx = gen_rtx_REG (Pmode, IP0_REGNUM);
+ rtx ip1_rtx = gen_rtx_REG (Pmode, IP1_REGNUM);
if (frame_pointer_needed && (final_adjust || cfun->calls_alloca))
- {
- insn = emit_insn (gen_add3_insn (stack_pointer_rtx,
- hard_frame_pointer_rtx,
- GEN_INT (-callee_offset)));
- /* If writeback is used when restoring callee-saves, the CFA
- is restored on the instruction doing the writeback. */
- RTX_FRAME_RELATED_P (insn) = callee_adjust == 0;
- }
+ /* If writeback is used when restoring callee-saves, the CFA
+ is restored on the instruction doing the writeback. */
+ aarch64_add_offset (Pmode, stack_pointer_rtx,
+ hard_frame_pointer_rtx, -callee_offset,
+ ip1_rtx, callee_adjust == 0);
else
- aarch64_add_sp (IP1_REGNUM, final_adjust,
+ aarch64_add_sp (ip1_rtx, final_adjust,
/* A stack clash protection prologue may not have
left IP1_REGNUM in a usable state. */
(flag_stack_clash_protection
@@ -4000,7 +4017,7 @@ aarch64_expand_epilogue (bool for_sibcal
/* A stack clash protection prologue may not have left IP0_REGNUM
in a usable state. */
- aarch64_add_sp (IP0_REGNUM, initial_adjust,
+ aarch64_add_sp (ip0_rtx, initial_adjust,
(flag_stack_clash_protection
|| df_regs_ever_live_p (IP0_REGNUM)));
@@ -4107,16 +4124,16 @@ aarch64_output_mi_thunk (FILE *file, tre
reload_completed = 1;
emit_note (NOTE_INSN_PROLOGUE_END);
+ this_rtx = gen_rtx_REG (Pmode, this_regno);
+ temp0 = gen_rtx_REG (Pmode, IP0_REGNUM);
+ temp1 = gen_rtx_REG (Pmode, IP1_REGNUM);
+
if (vcall_offset == 0)
- aarch64_add_constant (Pmode, this_regno, IP1_REGNUM, delta);
+ aarch64_add_offset (Pmode, this_rtx, this_rtx, delta, temp1, false);
else
{
gcc_assert ((vcall_offset & (POINTER_BYTES - 1)) == 0);
- this_rtx = gen_rtx_REG (Pmode, this_regno);
- temp0 = gen_rtx_REG (Pmode, IP0_REGNUM);
- temp1 = gen_rtx_REG (Pmode, IP1_REGNUM);
-
addr = this_rtx;
if (delta != 0)
{
@@ -4124,7 +4141,8 @@ aarch64_output_mi_thunk (FILE *file, tre
addr = gen_rtx_PRE_MODIFY (Pmode, this_rtx,
plus_constant (Pmode, this_rtx, delta));
else
- aarch64_add_constant (Pmode, this_regno, IP1_REGNUM, delta);
+ aarch64_add_offset (Pmode, this_rtx, this_rtx, delta, temp1,
+ false);
}
if (Pmode == ptr_mode)
Index: gcc/testsuite/gcc.target/aarch64/pr70044.c
===================================================================
--- gcc/testsuite/gcc.target/aarch64/pr70044.c 2017-10-27 14:06:52.606994276 +0100
+++ gcc/testsuite/gcc.target/aarch64/pr70044.c 2017-10-27 14:10:24.015116329 +0100
@@ -11,4 +11,4 @@ main (int argc, char **argv)
}
/* Check that the frame pointer really is created. */
-/* { dg-final { scan-lto-assembler "add x29, sp," } } */
+/* { dg-final { scan-lto-assembler "(mov|add) x29, sp" } } */
* [04/nn] [AArch64] Rename the internal "Upl" constraint
2017-10-27 13:22 [00/nn] AArch64 patches preparing for SVE Richard Sandiford
` (2 preceding siblings ...)
2017-10-27 13:26 ` [03/nn] [AArch64] Rework interface to add constant/offset routines Richard Sandiford
@ 2017-10-27 13:27 ` Richard Sandiford
2017-10-31 18:04 ` James Greenhalgh
2017-10-27 13:28 ` [05/nn] [AArch64] Rewrite aarch64_simd_valid_immediate Richard Sandiford
` (7 subsequent siblings)
11 siblings, 1 reply; 29+ messages in thread
From: Richard Sandiford @ 2017-10-27 13:27 UTC (permalink / raw)
To: gcc-patches; +Cc: richard.earnshaw, james.greenhalgh, marcus.shawcroft
The SVE port uses the public constraints "Upl" and "Upa" to mean
"low predicate register" and "any predicate register" respectively.
"Upl" was already used as an internal-only constraint by the
addition patterns, so this patch renames it to "Uaa" ("two adds
needed").
2017-10-27 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* config/aarch64/constraints.md (Upl): Rename to...
(Uaa): ...this.
* config/aarch64/aarch64.md
(*zero_extend<SHORT:mode><GPI:mode>2_aarch64, *addsi3_aarch64_uxtw):
Update accordingly.
Index: gcc/config/aarch64/constraints.md
===================================================================
--- gcc/config/aarch64/constraints.md 2017-10-27 14:06:16.159815485 +0100
+++ gcc/config/aarch64/constraints.md 2017-10-27 14:11:54.071011147 +0100
@@ -35,7 +35,7 @@ (define_constraint "I"
(and (match_code "const_int")
(match_test "aarch64_uimm12_shift (ival)")))
-(define_constraint "Upl"
+(define_constraint "Uaa"
"@internal A constant that matches two uses of add instructions."
(and (match_code "const_int")
(match_test "aarch64_pluslong_strict_immedate (op, VOIDmode)")))
Index: gcc/config/aarch64/aarch64.md
===================================================================
--- gcc/config/aarch64/aarch64.md 2017-10-27 14:07:01.875769946 +0100
+++ gcc/config/aarch64/aarch64.md 2017-10-27 14:11:54.071011147 +0100
@@ -1562,7 +1562,7 @@ (define_insn "*add<mode>3_aarch64"
(match_operand:GPI 0 "register_operand" "=rk,rk,w,rk,r")
(plus:GPI
(match_operand:GPI 1 "register_operand" "%rk,rk,w,rk,rk")
- (match_operand:GPI 2 "aarch64_pluslong_operand" "I,r,w,J,Upl")))]
+ (match_operand:GPI 2 "aarch64_pluslong_operand" "I,r,w,J,Uaa")))]
""
"@
add\\t%<w>0, %<w>1, %2
@@ -1580,7 +1580,7 @@ (define_insn "*addsi3_aarch64_uxtw"
(match_operand:DI 0 "register_operand" "=rk,rk,rk,r")
(zero_extend:DI
(plus:SI (match_operand:SI 1 "register_operand" "%rk,rk,rk,rk")
- (match_operand:SI 2 "aarch64_pluslong_operand" "I,r,J,Upl"))))]
+ (match_operand:SI 2 "aarch64_pluslong_operand" "I,r,J,Uaa"))))]
""
"@
add\\t%w0, %w1, %2
* [06/nn] [AArch64] Add an endian_lane_rtx helper routine
2017-10-27 13:22 [00/nn] AArch64 patches preparing for SVE Richard Sandiford
` (4 preceding siblings ...)
2017-10-27 13:28 ` [05/nn] [AArch64] Rewrite aarch64_simd_valid_immediate Richard Sandiford
@ 2017-10-27 13:28 ` Richard Sandiford
2017-11-02 9:55 ` James Greenhalgh
2017-10-27 13:29 ` [07/nn] [AArch64] Pass number of units to aarch64_reverse_mask Richard Sandiford
` (5 subsequent siblings)
11 siblings, 1 reply; 29+ messages in thread
From: Richard Sandiford @ 2017-10-27 13:28 UTC (permalink / raw)
To: gcc-patches; +Cc: richard.earnshaw, james.greenhalgh, marcus.shawcroft
Later patches turn the number of vector units into a poly_int.
We deliberately don't support applying GEN_INT to those (except
in target code that doesn't distinguish between poly_ints and normal
constants); gen_int_mode needs to be used instead.
This patch therefore replaces instances of:
GEN_INT (ENDIAN_LANE_N (builtin_mode, INTVAL (op[opc])))
with uses of a new endian_lane_rtx function.
2017-10-26 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* config/aarch64/aarch64-protos.h (aarch64_endian_lane_rtx): Declare.
* config/aarch64/aarch64.c (aarch64_endian_lane_rtx): New function.
* config/aarch64/aarch64.h (ENDIAN_LANE_N): Take the number
of units rather than the mode.
* config/aarch64/iterators.md (nunits): New mode attribute.
* config/aarch64/aarch64-builtins.c (aarch64_simd_expand_args):
Use aarch64_endian_lane_rtx instead of GEN_INT (ENDIAN_LANE_N ...).
* config/aarch64/aarch64-simd.md (aarch64_dup_lane<mode>)
(aarch64_dup_lane_<vswap_width_name><mode>, *aarch64_mul3_elt<mode>)
(*aarch64_mul3_elt_<vswap_width_name><mode>): Likewise.
(*aarch64_mul3_elt_to_64v2df, *aarch64_mla_elt<mode>): Likewise.
(*aarch64_mla_elt_<vswap_width_name><mode>, *aarch64_mls_elt<mode>)
(*aarch64_mls_elt_<vswap_width_name><mode>, *aarch64_fma4_elt<mode>)
(*aarch64_fma4_elt_<vswap_width_name><mode>): Likewise.
(*aarch64_fma4_elt_to_64v2df, *aarch64_fnma4_elt<mode>): Likewise.
(*aarch64_fnma4_elt_<vswap_width_name><mode>): Likewise.
(*aarch64_fnma4_elt_to_64v2df, reduc_plus_scal_<mode>): Likewise.
(reduc_plus_scal_v4sf, reduc_<maxmin_uns>_scal_<mode>): Likewise.
(reduc_<maxmin_uns>_scal_<mode>): Likewise.
(*aarch64_get_lane_extend<GPI:mode><VDQQH:mode>): Likewise.
(*aarch64_get_lane_zero_extendsi<mode>): Likewise.
(aarch64_get_lane<mode>, *aarch64_mulx_elt_<vswap_width_name><mode>)
(*aarch64_mulx_elt<mode>, *aarch64_vgetfmulx<mode>): Likewise.
(aarch64_sq<r>dmulh_lane<mode>, aarch64_sq<r>dmulh_laneq<mode>)
(aarch64_sqrdml<SQRDMLH_AS:rdma_as>h_lane<mode>): Likewise.
(aarch64_sqrdml<SQRDMLH_AS:rdma_as>h_laneq<mode>): Likewise.
(aarch64_sqdml<SBINQOPS:as>l_lane<mode>): Likewise.
(aarch64_sqdml<SBINQOPS:as>l_laneq<mode>): Likewise.
(aarch64_sqdml<SBINQOPS:as>l2_lane<mode>_internal): Likewise.
(aarch64_sqdml<SBINQOPS:as>l2_laneq<mode>_internal): Likewise.
(aarch64_sqdmull_lane<mode>, aarch64_sqdmull_laneq<mode>): Likewise.
(aarch64_sqdmull2_lane<mode>_internal): Likewise.
(aarch64_sqdmull2_laneq<mode>_internal): Likewise.
(aarch64_vec_load_lanesoi_lane<mode>): Likewise.
(aarch64_vec_store_lanesoi_lane<mode>): Likewise.
(aarch64_vec_load_lanesci_lane<mode>): Likewise.
(aarch64_vec_store_lanesci_lane<mode>): Likewise.
(aarch64_vec_load_lanesxi_lane<mode>): Likewise.
(aarch64_vec_store_lanesxi_lane<mode>): Likewise.
(aarch64_simd_vec_set<mode>): Update use of ENDIAN_LANE_N.
(aarch64_simd_vec_setv2di): Likewise.
Index: gcc/config/aarch64/aarch64-protos.h
===================================================================
--- gcc/config/aarch64/aarch64-protos.h 2017-10-27 14:11:56.993658452 +0100
+++ gcc/config/aarch64/aarch64-protos.h 2017-10-27 14:12:00.601693018 +0100
@@ -437,6 +437,7 @@ void aarch64_simd_emit_reg_reg_move (rtx
rtx aarch64_simd_expand_builtin (int, tree, rtx);
void aarch64_simd_lane_bounds (rtx, HOST_WIDE_INT, HOST_WIDE_INT, const_tree);
+rtx aarch64_endian_lane_rtx (machine_mode, unsigned int);
void aarch64_split_128bit_move (rtx, rtx);
Index: gcc/config/aarch64/aarch64.c
===================================================================
--- gcc/config/aarch64/aarch64.c 2017-10-27 14:11:56.995515870 +0100
+++ gcc/config/aarch64/aarch64.c 2017-10-27 14:12:00.603550436 +0100
@@ -12083,6 +12083,15 @@ aarch64_simd_lane_bounds (rtx operand, H
}
}
+/* Perform endian correction on lane number N, which indexes a vector
+ of mode MODE, and return the result as an SImode rtx. */
+
+rtx
+aarch64_endian_lane_rtx (machine_mode mode, unsigned int n)
+{
+ return gen_int_mode (ENDIAN_LANE_N (GET_MODE_NUNITS (mode), n), SImode);
+}
+
/* Return TRUE if OP is a valid vector addressing mode. */
bool
aarch64_simd_mem_operand_p (rtx op)
Index: gcc/config/aarch64/aarch64.h
===================================================================
--- gcc/config/aarch64/aarch64.h 2017-10-27 14:05:38.132936808 +0100
+++ gcc/config/aarch64/aarch64.h 2017-10-27 14:12:00.603550436 +0100
@@ -910,8 +910,8 @@ #define AARCH64_VALID_SIMD_QREG_MODE(MOD
|| (MODE) == V4SFmode || (MODE) == V8HFmode || (MODE) == V2DImode \
|| (MODE) == V2DFmode)
-#define ENDIAN_LANE_N(mode, n) \
- (BYTES_BIG_ENDIAN ? GET_MODE_NUNITS (mode) - 1 - n : n)
+#define ENDIAN_LANE_N(NUNITS, N) \
+ (BYTES_BIG_ENDIAN ? NUNITS - 1 - N : N)
/* Support for a configure-time default CPU, etc. We currently support
--with-arch and --with-cpu. Both are ignored if either is specified
Index: gcc/config/aarch64/iterators.md
===================================================================
--- gcc/config/aarch64/iterators.md 2017-10-27 14:11:56.995515870 +0100
+++ gcc/config/aarch64/iterators.md 2017-10-27 14:12:00.604479145 +0100
@@ -438,6 +438,17 @@ (define_mode_attr vw2 [(DI "") (QI "h")
(define_mode_attr rtn [(DI "d") (SI "")])
(define_mode_attr vas [(DI "") (SI ".2s")])
+;; Map a vector to the number of units in it, if the size of the mode
+;; is constant.
+(define_mode_attr nunits [(V8QI "8") (V16QI "16")
+ (V4HI "4") (V8HI "8")
+ (V2SI "2") (V4SI "4")
+ (V2DI "2")
+ (V4HF "4") (V8HF "8")
+ (V2SF "2") (V4SF "4")
+ (V1DF "1") (V2DF "2")
+ (DI "1") (DF "1")])
+
;; Map a mode to the number of bits in it, if the size of the mode
;; is constant.
(define_mode_attr bitsize [(V8QI "64") (V16QI "128")
Index: gcc/config/aarch64/aarch64-builtins.c
===================================================================
--- gcc/config/aarch64/aarch64-builtins.c 2017-10-27 14:05:38.132936808 +0100
+++ gcc/config/aarch64/aarch64-builtins.c 2017-10-27 14:12:00.601693018 +0100
@@ -1069,8 +1069,8 @@ aarch64_simd_expand_args (rtx target, in
GET_MODE_NUNITS (builtin_mode),
exp);
/* Keep to GCC-vector-extension lane indices in the RTL. */
- op[opc] =
- GEN_INT (ENDIAN_LANE_N (builtin_mode, INTVAL (op[opc])));
+ op[opc] = aarch64_endian_lane_rtx (builtin_mode,
+ INTVAL (op[opc]));
}
goto constant_arg;
@@ -1083,7 +1083,7 @@ aarch64_simd_expand_args (rtx target, in
aarch64_simd_lane_bounds (op[opc],
0, GET_MODE_NUNITS (vmode), exp);
/* Keep to GCC-vector-extension lane indices in the RTL. */
- op[opc] = GEN_INT (ENDIAN_LANE_N (vmode, INTVAL (op[opc])));
+ op[opc] = aarch64_endian_lane_rtx (vmode, INTVAL (op[opc]));
}
/* Fall through - if the lane index isn't a constant then
the next case will error. */
Index: gcc/config/aarch64/aarch64-simd.md
===================================================================
--- gcc/config/aarch64/aarch64-simd.md 2017-10-27 14:11:56.994587161 +0100
+++ gcc/config/aarch64/aarch64-simd.md 2017-10-27 14:12:00.602621727 +0100
@@ -80,7 +80,7 @@ (define_insn "aarch64_dup_lane<mode>"
)))]
"TARGET_SIMD"
{
- operands[2] = GEN_INT (ENDIAN_LANE_N (<MODE>mode, INTVAL (operands[2])));
+ operands[2] = aarch64_endian_lane_rtx (<MODE>mode, INTVAL (operands[2]));
return "dup\\t%0.<Vtype>, %1.<Vetype>[%2]";
}
[(set_attr "type" "neon_dup<q>")]
@@ -95,8 +95,7 @@ (define_insn "aarch64_dup_lane_<vswap_wi
)))]
"TARGET_SIMD"
{
- operands[2] = GEN_INT (ENDIAN_LANE_N (<VSWAP_WIDTH>mode,
- INTVAL (operands[2])));
+ operands[2] = aarch64_endian_lane_rtx (<VSWAP_WIDTH>mode, INTVAL (operands[2]));
return "dup\\t%0.<Vtype>, %1.<Vetype>[%2]";
}
[(set_attr "type" "neon_dup<q>")]
@@ -501,7 +500,7 @@ (define_insn "*aarch64_mul3_elt<mode>"
(match_operand:VMUL 3 "register_operand" "w")))]
"TARGET_SIMD"
{
- operands[2] = GEN_INT (ENDIAN_LANE_N (<MODE>mode, INTVAL (operands[2])));
+ operands[2] = aarch64_endian_lane_rtx (<MODE>mode, INTVAL (operands[2]));
return "<f>mul\\t%0.<Vtype>, %3.<Vtype>, %1.<Vetype>[%2]";
}
[(set_attr "type" "neon<fp>_mul_<stype>_scalar<q>")]
@@ -517,8 +516,7 @@ (define_insn "*aarch64_mul3_elt_<vswap_w
(match_operand:VMUL_CHANGE_NLANES 3 "register_operand" "w")))]
"TARGET_SIMD"
{
- operands[2] = GEN_INT (ENDIAN_LANE_N (<VSWAP_WIDTH>mode,
- INTVAL (operands[2])));
+ operands[2] = aarch64_endian_lane_rtx (<VSWAP_WIDTH>mode, INTVAL (operands[2]));
return "<f>mul\\t%0.<Vtype>, %3.<Vtype>, %1.<Vetype>[%2]";
}
[(set_attr "type" "neon<fp>_mul_<Vetype>_scalar<q>")]
@@ -571,7 +569,7 @@ (define_insn "*aarch64_mul3_elt_to_64v2d
(match_operand:DF 3 "register_operand" "w")))]
"TARGET_SIMD"
{
- operands[2] = GEN_INT (ENDIAN_LANE_N (V2DFmode, INTVAL (operands[2])));
+ operands[2] = aarch64_endian_lane_rtx (V2DFmode, INTVAL (operands[2]));
return "fmul\\t%0.2d, %3.2d, %1.d[%2]";
}
[(set_attr "type" "neon_fp_mul_d_scalar_q")]
@@ -706,7 +704,7 @@ (define_insn "aarch64_simd_vec_set<mode>
(match_operand:SI 2 "immediate_operand" "i,i,i")))]
"TARGET_SIMD"
{
- int elt = ENDIAN_LANE_N (<MODE>mode, exact_log2 (INTVAL (operands[2])));
+ int elt = ENDIAN_LANE_N (<nunits>, exact_log2 (INTVAL (operands[2])));
operands[2] = GEN_INT ((HOST_WIDE_INT) 1 << elt);
switch (which_alternative)
{
@@ -1072,7 +1070,7 @@ (define_insn "aarch64_simd_vec_setv2di"
(match_operand:SI 2 "immediate_operand" "i,i")))]
"TARGET_SIMD"
{
- int elt = ENDIAN_LANE_N (V2DImode, exact_log2 (INTVAL (operands[2])));
+ int elt = ENDIAN_LANE_N (2, exact_log2 (INTVAL (operands[2])));
operands[2] = GEN_INT ((HOST_WIDE_INT) 1 << elt);
switch (which_alternative)
{
@@ -1109,7 +1107,7 @@ (define_insn "aarch64_simd_vec_set<mode>
(match_operand:SI 2 "immediate_operand" "i")))]
"TARGET_SIMD"
{
- int elt = ENDIAN_LANE_N (<MODE>mode, exact_log2 (INTVAL (operands[2])));
+ int elt = ENDIAN_LANE_N (<nunits>, exact_log2 (INTVAL (operands[2])));
operands[2] = GEN_INT ((HOST_WIDE_INT)1 << elt);
return "ins\t%0.<Vetype>[%p2], %1.<Vetype>[0]";
@@ -1154,7 +1152,7 @@ (define_insn "*aarch64_mla_elt<mode>"
(match_operand:VDQHS 4 "register_operand" "0")))]
"TARGET_SIMD"
{
- operands[2] = GEN_INT (ENDIAN_LANE_N (<MODE>mode, INTVAL (operands[2])));
+ operands[2] = aarch64_endian_lane_rtx (<MODE>mode, INTVAL (operands[2]));
return "mla\t%0.<Vtype>, %3.<Vtype>, %1.<Vtype>[%2]";
}
[(set_attr "type" "neon_mla_<Vetype>_scalar<q>")]
@@ -1172,8 +1170,7 @@ (define_insn "*aarch64_mla_elt_<vswap_wi
(match_operand:VDQHS 4 "register_operand" "0")))]
"TARGET_SIMD"
{
- operands[2] = GEN_INT (ENDIAN_LANE_N (<VSWAP_WIDTH>mode,
- INTVAL (operands[2])));
+ operands[2] = aarch64_endian_lane_rtx (<VSWAP_WIDTH>mode, INTVAL (operands[2]));
return "mla\t%0.<Vtype>, %3.<Vtype>, %1.<Vtype>[%2]";
}
[(set_attr "type" "neon_mla_<Vetype>_scalar<q>")]
@@ -1213,7 +1210,7 @@ (define_insn "*aarch64_mls_elt<mode>"
(match_operand:VDQHS 3 "register_operand" "w"))))]
"TARGET_SIMD"
{
- operands[2] = GEN_INT (ENDIAN_LANE_N (<MODE>mode, INTVAL (operands[2])));
+ operands[2] = aarch64_endian_lane_rtx (<MODE>mode, INTVAL (operands[2]));
return "mls\t%0.<Vtype>, %3.<Vtype>, %1.<Vtype>[%2]";
}
[(set_attr "type" "neon_mla_<Vetype>_scalar<q>")]
@@ -1231,8 +1228,7 @@ (define_insn "*aarch64_mls_elt_<vswap_wi
(match_operand:VDQHS 3 "register_operand" "w"))))]
"TARGET_SIMD"
{
- operands[2] = GEN_INT (ENDIAN_LANE_N (<VSWAP_WIDTH>mode,
- INTVAL (operands[2])));
+ operands[2] = aarch64_endian_lane_rtx (<VSWAP_WIDTH>mode, INTVAL (operands[2]));
return "mls\t%0.<Vtype>, %3.<Vtype>, %1.<Vtype>[%2]";
}
[(set_attr "type" "neon_mla_<Vetype>_scalar<q>")]
@@ -1802,7 +1798,7 @@ (define_insn "*aarch64_fma4_elt<mode>"
(match_operand:VDQF 4 "register_operand" "0")))]
"TARGET_SIMD"
{
- operands[2] = GEN_INT (ENDIAN_LANE_N (<MODE>mode, INTVAL (operands[2])));
+ operands[2] = aarch64_endian_lane_rtx (<MODE>mode, INTVAL (operands[2]));
return "fmla\\t%0.<Vtype>, %3.<Vtype>, %1.<Vtype>[%2]";
}
[(set_attr "type" "neon_fp_mla_<Vetype>_scalar<q>")]
@@ -1819,8 +1815,7 @@ (define_insn "*aarch64_fma4_elt_<vswap_w
(match_operand:VDQSF 4 "register_operand" "0")))]
"TARGET_SIMD"
{
- operands[2] = GEN_INT (ENDIAN_LANE_N (<VSWAP_WIDTH>mode,
- INTVAL (operands[2])));
+ operands[2] = aarch64_endian_lane_rtx (<VSWAP_WIDTH>mode, INTVAL (operands[2]));
return "fmla\\t%0.<Vtype>, %3.<Vtype>, %1.<Vtype>[%2]";
}
[(set_attr "type" "neon_fp_mla_<Vetype>_scalar<q>")]
@@ -1848,7 +1843,7 @@ (define_insn "*aarch64_fma4_elt_to_64v2d
(match_operand:DF 4 "register_operand" "0")))]
"TARGET_SIMD"
{
- operands[2] = GEN_INT (ENDIAN_LANE_N (V2DFmode, INTVAL (operands[2])));
+ operands[2] = aarch64_endian_lane_rtx (V2DFmode, INTVAL (operands[2]));
return "fmla\\t%0.2d, %3.2d, %1.2d[%2]";
}
[(set_attr "type" "neon_fp_mla_d_scalar_q")]
@@ -1878,7 +1873,7 @@ (define_insn "*aarch64_fnma4_elt<mode>"
(match_operand:VDQF 4 "register_operand" "0")))]
"TARGET_SIMD"
{
- operands[2] = GEN_INT (ENDIAN_LANE_N (<MODE>mode, INTVAL (operands[2])));
+ operands[2] = aarch64_endian_lane_rtx (<MODE>mode, INTVAL (operands[2]));
return "fmls\\t%0.<Vtype>, %3.<Vtype>, %1.<Vtype>[%2]";
}
[(set_attr "type" "neon_fp_mla_<Vetype>_scalar<q>")]
@@ -1896,8 +1891,7 @@ (define_insn "*aarch64_fnma4_elt_<vswap_
(match_operand:VDQSF 4 "register_operand" "0")))]
"TARGET_SIMD"
{
- operands[2] = GEN_INT (ENDIAN_LANE_N (<VSWAP_WIDTH>mode,
- INTVAL (operands[2])));
+ operands[2] = aarch64_endian_lane_rtx (<VSWAP_WIDTH>mode, INTVAL (operands[2]));
return "fmls\\t%0.<Vtype>, %3.<Vtype>, %1.<Vtype>[%2]";
}
[(set_attr "type" "neon_fp_mla_<Vetype>_scalar<q>")]
@@ -1927,7 +1921,7 @@ (define_insn "*aarch64_fnma4_elt_to_64v2
(match_operand:DF 4 "register_operand" "0")))]
"TARGET_SIMD"
{
- operands[2] = GEN_INT (ENDIAN_LANE_N (V2DFmode, INTVAL (operands[2])));
+ operands[2] = aarch64_endian_lane_rtx (V2DFmode, INTVAL (operands[2]));
return "fmls\\t%0.2d, %3.2d, %1.2d[%2]";
}
[(set_attr "type" "neon_fp_mla_d_scalar_q")]
@@ -2260,7 +2254,7 @@ (define_expand "reduc_plus_scal_<mode>"
UNSPEC_ADDV)]
"TARGET_SIMD"
{
- rtx elt = GEN_INT (ENDIAN_LANE_N (<MODE>mode, 0));
+ rtx elt = aarch64_endian_lane_rtx (<MODE>mode, 0);
rtx scratch = gen_reg_rtx (<MODE>mode);
emit_insn (gen_aarch64_reduc_plus_internal<mode> (scratch, operands[1]));
emit_insn (gen_aarch64_get_lane<mode> (operands[0], scratch, elt));
@@ -2311,7 +2305,7 @@ (define_expand "reduc_plus_scal_v4sf"
UNSPEC_FADDV))]
"TARGET_SIMD"
{
- rtx elt = GEN_INT (ENDIAN_LANE_N (V4SFmode, 0));
+ rtx elt = aarch64_endian_lane_rtx (V4SFmode, 0);
rtx scratch = gen_reg_rtx (V4SFmode);
emit_insn (gen_aarch64_faddpv4sf (scratch, operands[1], operands[1]));
emit_insn (gen_aarch64_faddpv4sf (scratch, scratch, scratch));
@@ -2353,7 +2347,7 @@ (define_expand "reduc_<maxmin_uns>_scal_
FMAXMINV)]
"TARGET_SIMD"
{
- rtx elt = GEN_INT (ENDIAN_LANE_N (<MODE>mode, 0));
+ rtx elt = aarch64_endian_lane_rtx (<MODE>mode, 0);
rtx scratch = gen_reg_rtx (<MODE>mode);
emit_insn (gen_aarch64_reduc_<maxmin_uns>_internal<mode> (scratch,
operands[1]));
@@ -2369,7 +2363,7 @@ (define_expand "reduc_<maxmin_uns>_scal_
MAXMINV)]
"TARGET_SIMD"
{
- rtx elt = GEN_INT (ENDIAN_LANE_N (<MODE>mode, 0));
+ rtx elt = aarch64_endian_lane_rtx (<MODE>mode, 0);
rtx scratch = gen_reg_rtx (<MODE>mode);
emit_insn (gen_aarch64_reduc_<maxmin_uns>_internal<mode> (scratch,
operands[1]));
@@ -2894,7 +2888,7 @@ (define_insn "*aarch64_get_lane_extend<G
(parallel [(match_operand:SI 2 "immediate_operand" "i")]))))]
"TARGET_SIMD"
{
- operands[2] = GEN_INT (ENDIAN_LANE_N (<MODE>mode, INTVAL (operands[2])));
+ operands[2] = aarch64_endian_lane_rtx (<MODE>mode, INTVAL (operands[2]));
return "smov\\t%<GPI:w>0, %1.<VDQQH:Vetype>[%2]";
}
[(set_attr "type" "neon_to_gp<q>")]
@@ -2908,7 +2902,7 @@ (define_insn "*aarch64_get_lane_zero_ext
(parallel [(match_operand:SI 2 "immediate_operand" "i")]))))]
"TARGET_SIMD"
{
- operands[2] = GEN_INT (ENDIAN_LANE_N (<MODE>mode, INTVAL (operands[2])));
+ operands[2] = aarch64_endian_lane_rtx (<MODE>mode, INTVAL (operands[2]));
return "umov\\t%w0, %1.<Vetype>[%2]";
}
[(set_attr "type" "neon_to_gp<q>")]
@@ -2924,7 +2918,7 @@ (define_insn "aarch64_get_lane<mode>"
(parallel [(match_operand:SI 2 "immediate_operand" "i, i, i")])))]
"TARGET_SIMD"
{
- operands[2] = GEN_INT (ENDIAN_LANE_N (<MODE>mode, INTVAL (operands[2])));
+ operands[2] = aarch64_endian_lane_rtx (<MODE>mode, INTVAL (operands[2]));
switch (which_alternative)
{
case 0:
@@ -3300,8 +3294,7 @@ (define_insn "*aarch64_mulx_elt_<vswap_w
UNSPEC_FMULX))]
"TARGET_SIMD"
{
- operands[3] = GEN_INT (ENDIAN_LANE_N (<VSWAP_WIDTH>mode,
- INTVAL (operands[3])));
+ operands[3] = aarch64_endian_lane_rtx (<VSWAP_WIDTH>mode, INTVAL (operands[3]));
return "fmulx\t%<v>0<Vmtype>, %<v>1<Vmtype>, %2.<Vetype>[%3]";
}
[(set_attr "type" "neon_fp_mul_<Vetype>_scalar<q>")]
@@ -3320,7 +3313,7 @@ (define_insn "*aarch64_mulx_elt<mode>"
UNSPEC_FMULX))]
"TARGET_SIMD"
{
- operands[3] = GEN_INT (ENDIAN_LANE_N (<MODE>mode, INTVAL (operands[3])));
+ operands[3] = aarch64_endian_lane_rtx (<MODE>mode, INTVAL (operands[3]));
return "fmulx\t%<v>0<Vmtype>, %<v>1<Vmtype>, %2.<Vetype>[%3]";
}
[(set_attr "type" "neon_fp_mul_<Vetype><q>")]
@@ -3354,7 +3347,7 @@ (define_insn "*aarch64_vgetfmulx<mode>"
UNSPEC_FMULX))]
"TARGET_SIMD"
{
- operands[3] = GEN_INT (ENDIAN_LANE_N (<MODE>mode, INTVAL (operands[3])));
+ operands[3] = aarch64_endian_lane_rtx (<MODE>mode, INTVAL (operands[3]));
return "fmulx\t%<Vetype>0, %<Vetype>1, %2.<Vetype>[%3]";
}
[(set_attr "type" "fmul<Vetype>")]
@@ -3440,7 +3433,7 @@ (define_insn "aarch64_sq<r>dmulh_lane<mo
VQDMULH))]
"TARGET_SIMD"
"*
- operands[3] = GEN_INT (ENDIAN_LANE_N (<VCOND>mode, INTVAL (operands[3])));
+ operands[3] = aarch64_endian_lane_rtx (<VCOND>mode, INTVAL (operands[3]));
return \"sq<r>dmulh\\t%0.<Vtype>, %1.<Vtype>, %2.<Vetype>[%3]\";"
[(set_attr "type" "neon_sat_mul_<Vetype>_scalar<q>")]
)
@@ -3455,7 +3448,7 @@ (define_insn "aarch64_sq<r>dmulh_laneq<m
VQDMULH))]
"TARGET_SIMD"
"*
- operands[3] = GEN_INT (ENDIAN_LANE_N (<VCONQ>mode, INTVAL (operands[3])));
+ operands[3] = aarch64_endian_lane_rtx (<VCONQ>mode, INTVAL (operands[3]));
return \"sq<r>dmulh\\t%0.<Vtype>, %1.<Vtype>, %2.<Vetype>[%3]\";"
[(set_attr "type" "neon_sat_mul_<Vetype>_scalar<q>")]
)
@@ -3470,7 +3463,7 @@ (define_insn "aarch64_sq<r>dmulh_lane<mo
VQDMULH))]
"TARGET_SIMD"
"*
- operands[3] = GEN_INT (ENDIAN_LANE_N (<VCOND>mode, INTVAL (operands[3])));
+ operands[3] = aarch64_endian_lane_rtx (<VCOND>mode, INTVAL (operands[3]));
return \"sq<r>dmulh\\t%<v>0, %<v>1, %2.<v>[%3]\";"
[(set_attr "type" "neon_sat_mul_<Vetype>_scalar<q>")]
)
@@ -3485,7 +3478,7 @@ (define_insn "aarch64_sq<r>dmulh_laneq<m
VQDMULH))]
"TARGET_SIMD"
"*
- operands[3] = GEN_INT (ENDIAN_LANE_N (<VCONQ>mode, INTVAL (operands[3])));
+ operands[3] = aarch64_endian_lane_rtx (<VCONQ>mode, INTVAL (operands[3]));
return \"sq<r>dmulh\\t%<v>0, %<v>1, %2.<v>[%3]\";"
[(set_attr "type" "neon_sat_mul_<Vetype>_scalar<q>")]
)
@@ -3517,7 +3510,7 @@ (define_insn "aarch64_sqrdml<SQRDMLH_AS:
SQRDMLH_AS))]
"TARGET_SIMD_RDMA"
{
- operands[4] = GEN_INT (ENDIAN_LANE_N (<VCOND>mode, INTVAL (operands[4])));
+ operands[4] = aarch64_endian_lane_rtx (<VCOND>mode, INTVAL (operands[4]));
return
"sqrdml<SQRDMLH_AS:rdma_as>h\\t%0.<Vtype>, %2.<Vtype>, %3.<Vetype>[%4]";
}
@@ -3535,7 +3528,7 @@ (define_insn "aarch64_sqrdml<SQRDMLH_AS:
SQRDMLH_AS))]
"TARGET_SIMD_RDMA"
{
- operands[4] = GEN_INT (ENDIAN_LANE_N (<VCOND>mode, INTVAL (operands[4])));
+ operands[4] = aarch64_endian_lane_rtx (<VCOND>mode, INTVAL (operands[4]));
return
"sqrdml<SQRDMLH_AS:rdma_as>h\\t%<v>0, %<v>2, %3.<Vetype>[%4]";
}
@@ -3555,7 +3548,7 @@ (define_insn "aarch64_sqrdml<SQRDMLH_AS:
SQRDMLH_AS))]
"TARGET_SIMD_RDMA"
{
- operands[4] = GEN_INT (ENDIAN_LANE_N (<VCONQ>mode, INTVAL (operands[4])));
+ operands[4] = aarch64_endian_lane_rtx (<VCONQ>mode, INTVAL (operands[4]));
return
"sqrdml<SQRDMLH_AS:rdma_as>h\\t%0.<Vtype>, %2.<Vtype>, %3.<Vetype>[%4]";
}
@@ -3573,7 +3566,7 @@ (define_insn "aarch64_sqrdml<SQRDMLH_AS:
SQRDMLH_AS))]
"TARGET_SIMD_RDMA"
{
- operands[4] = GEN_INT (ENDIAN_LANE_N (<VCONQ>mode, INTVAL (operands[4])));
+ operands[4] = aarch64_endian_lane_rtx (<VCONQ>mode, INTVAL (operands[4]));
return
"sqrdml<SQRDMLH_AS:rdma_as>h\\t%<v>0, %<v>2, %3.<v>[%4]";
}
@@ -3617,7 +3610,7 @@ (define_insn "aarch64_sqdml<SBINQOPS:as>
(const_int 1))))]
"TARGET_SIMD"
{
- operands[4] = GEN_INT (ENDIAN_LANE_N (<VCOND>mode, INTVAL (operands[4])));
+ operands[4] = aarch64_endian_lane_rtx (<VCOND>mode, INTVAL (operands[4]));
return
"sqdml<SBINQOPS:as>l\\t%<vw2>0<Vmwtype>, %<v>2<Vmtype>, %3.<Vetype>[%4]";
}
@@ -3641,7 +3634,7 @@ (define_insn "aarch64_sqdml<SBINQOPS:as>
(const_int 1))))]
"TARGET_SIMD"
{
- operands[4] = GEN_INT (ENDIAN_LANE_N (<VCONQ>mode, INTVAL (operands[4])));
+ operands[4] = aarch64_endian_lane_rtx (<VCONQ>mode, INTVAL (operands[4]));
return
"sqdml<SBINQOPS:as>l\\t%<vw2>0<Vmwtype>, %<v>2<Vmtype>, %3.<Vetype>[%4]";
}
@@ -3664,7 +3657,7 @@ (define_insn "aarch64_sqdml<SBINQOPS:as>
(const_int 1))))]
"TARGET_SIMD"
{
- operands[4] = GEN_INT (ENDIAN_LANE_N (<VCOND>mode, INTVAL (operands[4])));
+ operands[4] = aarch64_endian_lane_rtx (<VCOND>mode, INTVAL (operands[4]));
return
"sqdml<SBINQOPS:as>l\\t%<vw2>0<Vmwtype>, %<v>2<Vmtype>, %3.<Vetype>[%4]";
}
@@ -3687,7 +3680,7 @@ (define_insn "aarch64_sqdml<SBINQOPS:as>
(const_int 1))))]
"TARGET_SIMD"
{
- operands[4] = GEN_INT (ENDIAN_LANE_N (<VCONQ>mode, INTVAL (operands[4])));
+ operands[4] = aarch64_endian_lane_rtx (<VCONQ>mode, INTVAL (operands[4]));
return
"sqdml<SBINQOPS:as>l\\t%<vw2>0<Vmwtype>, %<v>2<Vmtype>, %3.<Vetype>[%4]";
}
@@ -3782,7 +3775,7 @@ (define_insn "aarch64_sqdml<SBINQOPS:as>
(const_int 1))))]
"TARGET_SIMD"
{
- operands[4] = GEN_INT (ENDIAN_LANE_N (<VCOND>mode, INTVAL (operands[4])));
+ operands[4] = aarch64_endian_lane_rtx (<VCOND>mode, INTVAL (operands[4]));
return
"sqdml<SBINQOPS:as>l2\\t%<vw2>0<Vmwtype>, %<v>2<Vmtype>, %3.<Vetype>[%4]";
}
@@ -3808,7 +3801,7 @@ (define_insn "aarch64_sqdml<SBINQOPS:as>
(const_int 1))))]
"TARGET_SIMD"
{
- operands[4] = GEN_INT (ENDIAN_LANE_N (<VCONQ>mode, INTVAL (operands[4])));
+ operands[4] = aarch64_endian_lane_rtx (<VCONQ>mode, INTVAL (operands[4]));
return
"sqdml<SBINQOPS:as>l2\\t%<vw2>0<Vmwtype>, %<v>2<Vmtype>, %3.<Vetype>[%4]";
}
@@ -3955,7 +3948,7 @@ (define_insn "aarch64_sqdmull_lane<mode>
(const_int 1)))]
"TARGET_SIMD"
{
- operands[3] = GEN_INT (ENDIAN_LANE_N (<VCOND>mode, INTVAL (operands[3])));
+ operands[3] = aarch64_endian_lane_rtx (<VCOND>mode, INTVAL (operands[3]));
return "sqdmull\\t%<vw2>0<Vmwtype>, %<v>1<Vmtype>, %2.<Vetype>[%3]";
}
[(set_attr "type" "neon_sat_mul_<Vetype>_scalar_long")]
@@ -3976,7 +3969,7 @@ (define_insn "aarch64_sqdmull_laneq<mode
(const_int 1)))]
"TARGET_SIMD"
{
- operands[3] = GEN_INT (ENDIAN_LANE_N (<VCONQ>mode, INTVAL (operands[3])));
+ operands[3] = aarch64_endian_lane_rtx (<VCONQ>mode, INTVAL (operands[3]));
return "sqdmull\\t%<vw2>0<Vmwtype>, %<v>1<Vmtype>, %2.<Vetype>[%3]";
}
[(set_attr "type" "neon_sat_mul_<Vetype>_scalar_long")]
@@ -3996,7 +3989,7 @@ (define_insn "aarch64_sqdmull_lane<mode>
(const_int 1)))]
"TARGET_SIMD"
{
- operands[3] = GEN_INT (ENDIAN_LANE_N (<VCOND>mode, INTVAL (operands[3])));
+ operands[3] = aarch64_endian_lane_rtx (<VCOND>mode, INTVAL (operands[3]));
return "sqdmull\\t%<vw2>0<Vmwtype>, %<v>1<Vmtype>, %2.<Vetype>[%3]";
}
[(set_attr "type" "neon_sat_mul_<Vetype>_scalar_long")]
@@ -4016,7 +4009,7 @@ (define_insn "aarch64_sqdmull_laneq<mode
(const_int 1)))]
"TARGET_SIMD"
{
- operands[3] = GEN_INT (ENDIAN_LANE_N (<VCONQ>mode, INTVAL (operands[3])));
+ operands[3] = aarch64_endian_lane_rtx (<VCONQ>mode, INTVAL (operands[3]));
return "sqdmull\\t%<vw2>0<Vmwtype>, %<v>1<Vmtype>, %2.<Vetype>[%3]";
}
[(set_attr "type" "neon_sat_mul_<Vetype>_scalar_long")]
@@ -4094,7 +4087,7 @@ (define_insn "aarch64_sqdmull2_lane<mode
(const_int 1)))]
"TARGET_SIMD"
{
- operands[3] = GEN_INT (ENDIAN_LANE_N (<VCOND>mode, INTVAL (operands[3])));
+ operands[3] = aarch64_endian_lane_rtx (<VCOND>mode, INTVAL (operands[3]));
return "sqdmull2\\t%<vw2>0<Vmwtype>, %<v>1<Vmtype>, %2.<Vetype>[%3]";
}
[(set_attr "type" "neon_sat_mul_<Vetype>_scalar_long")]
@@ -4117,7 +4110,7 @@ (define_insn "aarch64_sqdmull2_laneq<mod
(const_int 1)))]
"TARGET_SIMD"
{
- operands[3] = GEN_INT (ENDIAN_LANE_N (<VCONQ>mode, INTVAL (operands[3])));
+ operands[3] = aarch64_endian_lane_rtx (<VCONQ>mode, INTVAL (operands[3]));
return "sqdmull2\\t%<vw2>0<Vmwtype>, %<v>1<Vmtype>, %2.<Vetype>[%3]";
}
[(set_attr "type" "neon_sat_mul_<Vetype>_scalar_long")]
@@ -4623,7 +4616,7 @@ (define_insn "aarch64_vec_load_lanesoi_l
UNSPEC_LD2_LANE))]
"TARGET_SIMD"
{
- operands[3] = GEN_INT (ENDIAN_LANE_N (<MODE>mode, INTVAL (operands[3])));
+ operands[3] = aarch64_endian_lane_rtx (<MODE>mode, INTVAL (operands[3]));
return "ld2\\t{%S0.<Vetype> - %T0.<Vetype>}[%3], %1";
}
[(set_attr "type" "neon_load2_one_lane")]
@@ -4667,7 +4660,7 @@ (define_insn "aarch64_vec_store_lanesoi_
UNSPEC_ST2_LANE))]
"TARGET_SIMD"
{
- operands[2] = GEN_INT (ENDIAN_LANE_N (<MODE>mode, INTVAL (operands[2])));
+ operands[2] = aarch64_endian_lane_rtx (<MODE>mode, INTVAL (operands[2]));
return "st2\\t{%S1.<Vetype> - %T1.<Vetype>}[%2], %0";
}
[(set_attr "type" "neon_store2_one_lane<q>")]
@@ -4721,7 +4714,7 @@ (define_insn "aarch64_vec_load_lanesci_l
UNSPEC_LD3_LANE))]
"TARGET_SIMD"
{
- operands[3] = GEN_INT (ENDIAN_LANE_N (<MODE>mode, INTVAL (operands[3])));
+ operands[3] = aarch64_endian_lane_rtx (<MODE>mode, INTVAL (operands[3]));
return "ld3\\t{%S0.<Vetype> - %U0.<Vetype>}[%3], %1";
}
[(set_attr "type" "neon_load3_one_lane")]
@@ -4765,7 +4758,7 @@ (define_insn "aarch64_vec_store_lanesci_
UNSPEC_ST3_LANE))]
"TARGET_SIMD"
{
- operands[2] = GEN_INT (ENDIAN_LANE_N (<MODE>mode, INTVAL (operands[2])));
+ operands[2] = aarch64_endian_lane_rtx (<MODE>mode, INTVAL (operands[2]));
return "st3\\t{%S1.<Vetype> - %U1.<Vetype>}[%2], %0";
}
[(set_attr "type" "neon_store3_one_lane<q>")]
@@ -4819,7 +4812,7 @@ (define_insn "aarch64_vec_load_lanesxi_l
UNSPEC_LD4_LANE))]
"TARGET_SIMD"
{
- operands[3] = GEN_INT (ENDIAN_LANE_N (<MODE>mode, INTVAL (operands[3])));
+ operands[3] = aarch64_endian_lane_rtx (<MODE>mode, INTVAL (operands[3]));
return "ld4\\t{%S0.<Vetype> - %V0.<Vetype>}[%3], %1";
}
[(set_attr "type" "neon_load4_one_lane")]
@@ -4863,7 +4856,7 @@ (define_insn "aarch64_vec_store_lanesxi_
UNSPEC_ST4_LANE))]
"TARGET_SIMD"
{
- operands[2] = GEN_INT (ENDIAN_LANE_N (<MODE>mode, INTVAL (operands[2])));
+ operands[2] = aarch64_endian_lane_rtx (<MODE>mode, INTVAL (operands[2]));
return "st4\\t{%S1.<Vetype> - %V1.<Vetype>}[%2], %0";
}
[(set_attr "type" "neon_store4_one_lane<q>")]
* [05/nn] [AArch64] Rewrite aarch64_simd_valid_immediate
2017-10-27 13:22 [00/nn] AArch64 patches preparing for SVE Richard Sandiford
` (3 preceding siblings ...)
2017-10-27 13:27 ` [04/nn] [AArch64] Rename the internal "Upl" constraint Richard Sandiford
@ 2017-10-27 13:28 ` Richard Sandiford
2017-11-10 11:20 ` James Greenhalgh
2017-10-27 13:28 ` [06/nn] [AArch64] Add an endian_lane_rtx helper routine Richard Sandiford
` (6 subsequent siblings)
11 siblings, 1 reply; 29+ messages in thread
From: Richard Sandiford @ 2017-10-27 13:28 UTC (permalink / raw)
To: gcc-patches; +Cc: richard.earnshaw, james.greenhalgh, marcus.shawcroft
This patch reworks aarch64_simd_valid_immediate so that
it's easier to add SVE support. The main changes are:
- make simd_immediate_info easier to construct
- replace the while (1) { ... break; } blocks with checks that use
the full 64-bit value of the constant
- treat floating-point modes as integers if they aren't valid
as floating-point values
2017-10-26 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* config/aarch64/aarch64-protos.h (aarch64_output_simd_mov_immediate):
Remove the mode argument.
(aarch64_simd_valid_immediate): Remove the mode and inverse
arguments.
* config/aarch64/iterators.md (bitsize): New iterator.
* config/aarch64/aarch64-simd.md (*aarch64_simd_mov<mode>, and<mode>3)
(ior<mode>3): Update calls to aarch64_output_simd_mov_immediate.
* config/aarch64/constraints.md (Do, Db, Dn): Update calls to
aarch64_simd_valid_immediate.
* config/aarch64/predicates.md (aarch64_reg_or_orr_imm): Likewise.
(aarch64_reg_or_bic_imm): Likewise.
* config/aarch64/aarch64.c (simd_immediate_info): Replace mvn
with an insn_type enum and msl with a modifier_type enum.
Replace element_width with a scalar_mode. Change the shift
to unsigned int. Add constructors for scalar_float_mode and
scalar_int_mode elements.
(aarch64_vect_float_const_representable_p): Delete.
(aarch64_can_const_movi_rtx_p, aarch64_legitimate_constant_p)
(aarch64_simd_scalar_immediate_valid_for_move)
(aarch64_simd_make_constant): Update call to
aarch64_simd_valid_immediate.
(aarch64_advsimd_valid_immediate_hs): New function.
(aarch64_advsimd_valid_immediate): Likewise.
(aarch64_simd_valid_immediate): Remove mode and inverse
arguments. Rewrite to use the above. Use const_vec_duplicate_p
to detect duplicated constants and use aarch64_float_const_zero_rtx_p
and aarch64_float_const_representable_p on the result.
(aarch64_output_simd_mov_immediate): Remove mode argument.
Update call to aarch64_simd_valid_immediate and use of
simd_immediate_info.
(aarch64_output_scalar_simd_mov_immediate): Update call
accordingly.
gcc/testsuite/
* gcc.target/aarch64/vect-movi.c (movi_float_lsl24): New function.
(main): Call it.
Index: gcc/config/aarch64/aarch64-protos.h
===================================================================
*** gcc/config/aarch64/aarch64-protos.h 2017-10-27 14:06:16.157803281 +0100
--- gcc/config/aarch64/aarch64-protos.h 2017-10-27 14:26:40.949165813 +0100
*************** bool aarch64_mov_operand_p (rtx, machine
*** 368,374 ****
rtx aarch64_reverse_mask (machine_mode);
bool aarch64_offset_7bit_signed_scaled_p (machine_mode, HOST_WIDE_INT);
char *aarch64_output_scalar_simd_mov_immediate (rtx, scalar_int_mode);
! char *aarch64_output_simd_mov_immediate (rtx, machine_mode, unsigned,
enum simd_immediate_check w = AARCH64_CHECK_MOV);
bool aarch64_pad_reg_upward (machine_mode, const_tree, bool);
bool aarch64_regno_ok_for_base_p (int, bool);
--- 368,374 ----
rtx aarch64_reverse_mask (machine_mode);
bool aarch64_offset_7bit_signed_scaled_p (machine_mode, HOST_WIDE_INT);
char *aarch64_output_scalar_simd_mov_immediate (rtx, scalar_int_mode);
! char *aarch64_output_simd_mov_immediate (rtx, unsigned,
enum simd_immediate_check w = AARCH64_CHECK_MOV);
bool aarch64_pad_reg_upward (machine_mode, const_tree, bool);
bool aarch64_regno_ok_for_base_p (int, bool);
*************** bool aarch64_simd_check_vect_par_cnst_ha
*** 379,386 ****
bool aarch64_simd_imm_zero_p (rtx, machine_mode);
bool aarch64_simd_scalar_immediate_valid_for_move (rtx, scalar_int_mode);
bool aarch64_simd_shift_imm_p (rtx, machine_mode, bool);
! bool aarch64_simd_valid_immediate (rtx, machine_mode, bool,
! struct simd_immediate_info *,
enum simd_immediate_check w = AARCH64_CHECK_MOV);
bool aarch64_split_dimode_const_store (rtx, rtx);
bool aarch64_symbolic_address_p (rtx);
--- 379,385 ----
bool aarch64_simd_imm_zero_p (rtx, machine_mode);
bool aarch64_simd_scalar_immediate_valid_for_move (rtx, scalar_int_mode);
bool aarch64_simd_shift_imm_p (rtx, machine_mode, bool);
! bool aarch64_simd_valid_immediate (rtx, struct simd_immediate_info *,
enum simd_immediate_check w = AARCH64_CHECK_MOV);
bool aarch64_split_dimode_const_store (rtx, rtx);
bool aarch64_symbolic_address_p (rtx);
Index: gcc/config/aarch64/iterators.md
===================================================================
*** gcc/config/aarch64/iterators.md 2017-10-27 14:05:38.185854661 +0100
--- gcc/config/aarch64/iterators.md 2017-10-27 14:26:40.949165813 +0100
*************** (define_mode_attr vw2 [(DI "") (QI "h")
*** 438,443 ****
--- 438,450 ----
(define_mode_attr rtn [(DI "d") (SI "")])
(define_mode_attr vas [(DI "") (SI ".2s")])
+ ;; Map a mode to the number of bits in it, if the size of the mode
+ ;; is constant.
+ (define_mode_attr bitsize [(V8QI "64") (V16QI "128")
+ (V4HI "64") (V8HI "128")
+ (V2SI "64") (V4SI "128")
+ (V2DI "128")])
+
;; Map a floating point mode to the appropriate register name prefix
(define_mode_attr s [(HF "h") (SF "s") (DF "d")])
Index: gcc/config/aarch64/aarch64-simd.md
===================================================================
*** gcc/config/aarch64/aarch64-simd.md 2017-10-27 14:05:38.185854661 +0100
--- gcc/config/aarch64/aarch64-simd.md 2017-10-27 14:26:40.949165813 +0100
*************** (define_insn "*aarch64_simd_mov<mode>"
*** 121,128 ****
case 5: return "fmov\t%d0, %1";
case 6: return "mov\t%0, %1";
case 7:
! return aarch64_output_simd_mov_immediate (operands[1],
! <MODE>mode, 64);
default: gcc_unreachable ();
}
}
--- 121,127 ----
case 5: return "fmov\t%d0, %1";
case 6: return "mov\t%0, %1";
case 7:
! return aarch64_output_simd_mov_immediate (operands[1], 64);
default: gcc_unreachable ();
}
}
*************** (define_insn "*aarch64_simd_mov<mode>"
*** 155,161 ****
case 6:
return "#";
case 7:
! return aarch64_output_simd_mov_immediate (operands[1], <MODE>mode, 128);
default:
gcc_unreachable ();
}
--- 154,160 ----
case 6:
return "#";
case 7:
! return aarch64_output_simd_mov_immediate (operands[1], 128);
default:
gcc_unreachable ();
}
*************** (define_insn "and<mode>3"
*** 651,658 ****
case 0:
return "and\t%0.<Vbtype>, %1.<Vbtype>, %2.<Vbtype>";
case 1:
! return aarch64_output_simd_mov_immediate (operands[2],
! <MODE>mode, GET_MODE_BITSIZE (<MODE>mode), AARCH64_CHECK_BIC);
default:
gcc_unreachable ();
}
--- 650,657 ----
case 0:
return "and\t%0.<Vbtype>, %1.<Vbtype>, %2.<Vbtype>";
case 1:
! return aarch64_output_simd_mov_immediate (operands[2], <bitsize>,
! AARCH64_CHECK_BIC);
default:
gcc_unreachable ();
}
*************** (define_insn "ior<mode>3"
*** 672,679 ****
case 0:
return "orr\t%0.<Vbtype>, %1.<Vbtype>, %2.<Vbtype>";
case 1:
! return aarch64_output_simd_mov_immediate (operands[2],
! <MODE>mode, GET_MODE_BITSIZE (<MODE>mode), AARCH64_CHECK_ORR);
default:
gcc_unreachable ();
}
--- 671,678 ----
case 0:
return "orr\t%0.<Vbtype>, %1.<Vbtype>, %2.<Vbtype>";
case 1:
! return aarch64_output_simd_mov_immediate (operands[2], <bitsize>,
! AARCH64_CHECK_ORR);
default:
gcc_unreachable ();
}
Index: gcc/config/aarch64/constraints.md
===================================================================
*** gcc/config/aarch64/constraints.md 2017-10-27 14:11:54.071011147 +0100
--- gcc/config/aarch64/constraints.md 2017-10-27 14:11:56.995515870 +0100
*************** (define_constraint "Do"
*** 194,215 ****
"@internal
A constraint that matches vector of immediates for orr."
(and (match_code "const_vector")
! (match_test "aarch64_simd_valid_immediate (op, mode, false,
! NULL, AARCH64_CHECK_ORR)")))
(define_constraint "Db"
"@internal
A constraint that matches vector of immediates for bic."
(and (match_code "const_vector")
! (match_test "aarch64_simd_valid_immediate (op, mode, false,
! NULL, AARCH64_CHECK_BIC)")))
(define_constraint "Dn"
"@internal
A constraint that matches vector of immediates."
(and (match_code "const_vector")
! (match_test "aarch64_simd_valid_immediate (op, GET_MODE (op),
! false, NULL)")))
(define_constraint "Dh"
"@internal
--- 194,214 ----
"@internal
A constraint that matches vector of immediates for orr."
(and (match_code "const_vector")
! (match_test "aarch64_simd_valid_immediate (op, NULL,
! AARCH64_CHECK_ORR)")))
(define_constraint "Db"
"@internal
A constraint that matches vector of immediates for bic."
(and (match_code "const_vector")
! (match_test "aarch64_simd_valid_immediate (op, NULL,
! AARCH64_CHECK_BIC)")))
(define_constraint "Dn"
"@internal
A constraint that matches vector of immediates."
(and (match_code "const_vector")
! (match_test "aarch64_simd_valid_immediate (op, NULL)")))
(define_constraint "Dh"
"@internal
Index: gcc/config/aarch64/predicates.md
===================================================================
*** gcc/config/aarch64/predicates.md 2017-10-27 14:06:16.159815485 +0100
--- gcc/config/aarch64/predicates.md 2017-10-27 14:11:56.995515870 +0100
*************** (define_predicate "aarch64_reg_zero_or_m
*** 72,85 ****
(define_predicate "aarch64_reg_or_orr_imm"
(ior (match_operand 0 "register_operand")
(and (match_code "const_vector")
! (match_test "aarch64_simd_valid_immediate (op, mode, false,
! NULL, AARCH64_CHECK_ORR)"))))
(define_predicate "aarch64_reg_or_bic_imm"
(ior (match_operand 0 "register_operand")
(and (match_code "const_vector")
! (match_test "aarch64_simd_valid_immediate (op, mode, false,
! NULL, AARCH64_CHECK_BIC)"))))
(define_predicate "aarch64_fp_compare_operand"
(ior (match_operand 0 "register_operand")
--- 72,85 ----
(define_predicate "aarch64_reg_or_orr_imm"
(ior (match_operand 0 "register_operand")
(and (match_code "const_vector")
! (match_test "aarch64_simd_valid_immediate (op, NULL,
! AARCH64_CHECK_ORR)"))))
(define_predicate "aarch64_reg_or_bic_imm"
(ior (match_operand 0 "register_operand")
(and (match_code "const_vector")
! (match_test "aarch64_simd_valid_immediate (op, NULL,
! AARCH64_CHECK_BIC)"))))
(define_predicate "aarch64_fp_compare_operand"
(ior (match_operand 0 "register_operand")
Index: gcc/config/aarch64/aarch64.c
===================================================================
*** gcc/config/aarch64/aarch64.c 2017-10-27 14:11:14.425034427 +0100
--- gcc/config/aarch64/aarch64.c 2017-10-27 14:26:40.949165813 +0100
*************** struct aarch64_address_info {
*** 117,130 ****
enum aarch64_symbol_type symbol_type;
};
struct simd_immediate_info
{
rtx value;
! int shift;
! int element_width;
! bool mvn;
! bool msl;
! };
/* The current code model. */
enum aarch64_code_model aarch64_cmodel;
--- 117,168 ----
enum aarch64_symbol_type symbol_type;
};
+ /* Information about a legitimate vector immediate operand. */
struct simd_immediate_info
{
+ enum insn_type { MOV, MVN };
+ enum modifier_type { LSL, MSL };
+
+ simd_immediate_info () {}
+ simd_immediate_info (scalar_float_mode, rtx);
+ simd_immediate_info (scalar_int_mode, unsigned HOST_WIDE_INT,
+ insn_type = MOV, modifier_type = LSL,
+ unsigned int = 0);
+
+ /* The mode of the elements. */
+ scalar_mode elt_mode;
+
+ /* The value of each element. */
rtx value;
!
! /* The instruction to use to move the immediate into a vector. */
! insn_type insn;
!
! /* The kind of shift modifier to use, and the number of bits to shift.
! This is (LSL, 0) if no shift is needed. */
! modifier_type modifier;
! unsigned int shift;
! };
!
! /* Construct a floating-point immediate in which each element has mode
! ELT_MODE_IN and value VALUE_IN. */
! inline simd_immediate_info
! ::simd_immediate_info (scalar_float_mode elt_mode_in, rtx value_in)
! : elt_mode (elt_mode_in), value (value_in), insn (MOV),
! modifier (LSL), shift (0)
! {}
!
! /* Construct an integer immediate in which each element has mode ELT_MODE_IN
! and value VALUE_IN. The other parameters are as for the structure
! fields. */
! inline simd_immediate_info
! ::simd_immediate_info (scalar_int_mode elt_mode_in,
! unsigned HOST_WIDE_INT value_in,
! insn_type insn_in, modifier_type modifier_in,
! unsigned int shift_in)
! : elt_mode (elt_mode_in), value (gen_int_mode (value_in, elt_mode_in)),
! insn (insn_in), modifier (modifier_in), shift (shift_in)
! {}
/* The current code model. */
enum aarch64_code_model aarch64_cmodel;
*************** aarch64_can_const_movi_rtx_p (rtx x, mac
*** 5083,5089 ****
vmode = aarch64_simd_container_mode (imode, width);
rtx v_op = aarch64_simd_gen_const_vector_dup (vmode, ival);
! return aarch64_simd_valid_immediate (v_op, vmode, false, NULL);
}
--- 5121,5127 ----
vmode = aarch64_simd_container_mode (imode, width);
rtx v_op = aarch64_simd_gen_const_vector_dup (vmode, ival);
! return aarch64_simd_valid_immediate (v_op, NULL);
}
*************** aarch64_legitimate_constant_p (machine_m
*** 10623,10629 ****
As such we have to prevent the compiler from forcing these
to memory. */
if ((GET_CODE (x) == CONST_VECTOR
! && aarch64_simd_valid_immediate (x, mode, false, NULL))
|| CONST_INT_P (x)
|| aarch64_valid_floating_const (x)
|| aarch64_can_const_movi_rtx_p (x, mode)
--- 10661,10667 ----
As such we have to prevent the compiler from forcing these
to memory. */
if ((GET_CODE (x) == CONST_VECTOR
! && aarch64_simd_valid_immediate (x, NULL))
|| CONST_INT_P (x)
|| aarch64_valid_floating_const (x)
|| aarch64_can_const_movi_rtx_p (x, mode)
*************** sizetochar (int size)
*** 11698,11897 ****
}
}
! /* Return true iff x is a uniform vector of floating-point
! constants, and the constant can be represented in
! quarter-precision form. Note, as aarch64_float_const_representable
! rejects both +0.0 and -0.0, we will also reject +0.0 and -0.0. */
! static bool
! aarch64_vect_float_const_representable_p (rtx x)
! {
! rtx elt;
! return (GET_MODE_CLASS (GET_MODE (x)) == MODE_VECTOR_FLOAT
! && const_vec_duplicate_p (x, &elt)
! && aarch64_float_const_representable_p (elt));
! }
!
! /* Return true for valid and false for invalid. */
! bool
! aarch64_simd_valid_immediate (rtx op, machine_mode mode, bool inverse,
! struct simd_immediate_info *info,
! enum simd_immediate_check which)
! {
! #define CHECK(STRIDE, ELSIZE, CLASS, TEST, SHIFT, NEG) \
! matches = 1; \
! for (i = 0; i < idx; i += (STRIDE)) \
! if (!(TEST)) \
! matches = 0; \
! if (matches) \
! { \
! immtype = (CLASS); \
! elsize = (ELSIZE); \
! eshift = (SHIFT); \
! emvn = (NEG); \
! break; \
! }
!
! unsigned int i, elsize = 0, idx = 0, n_elts = CONST_VECTOR_NUNITS (op);
! unsigned int innersize = GET_MODE_UNIT_SIZE (mode);
! unsigned char bytes[16];
! int immtype = -1, matches;
! unsigned int invmask = inverse ? 0xff : 0;
! int eshift, emvn;
!
! if (GET_MODE_CLASS (mode) == MODE_VECTOR_FLOAT)
! {
! if (! (aarch64_simd_imm_zero_p (op, mode)
! || aarch64_vect_float_const_representable_p (op)))
! return false;
! if (info)
! {
! rtx elt = CONST_VECTOR_ELT (op, 0);
! scalar_float_mode elt_mode
! = as_a <scalar_float_mode> (GET_MODE (elt));
!
! info->value = elt;
! info->element_width = GET_MODE_BITSIZE (elt_mode);
! info->mvn = false;
! info->shift = 0;
}
! return true;
! }
! /* Splat vector constant out into a byte vector. */
! for (i = 0; i < n_elts; i++)
! {
! /* The vector is provided in gcc endian-neutral fashion. For aarch64_be,
! it must be laid out in the vector register in reverse order. */
! rtx el = CONST_VECTOR_ELT (op, BYTES_BIG_ENDIAN ? (n_elts - 1 - i) : i);
! unsigned HOST_WIDE_INT elpart;
! gcc_assert (CONST_INT_P (el));
! elpart = INTVAL (el);
! for (unsigned int byte = 0; byte < innersize; byte++)
{
! bytes[idx++] = (elpart & 0xff) ^ invmask;
! elpart >>= BITS_PER_UNIT;
}
-
}
! /* Sanity check. */
! gcc_assert (idx == GET_MODE_SIZE (mode));
!
! do
{
! if (which & AARCH64_CHECK_ORR)
{
! CHECK (4, 32, 0, bytes[i] == bytes[0] && bytes[i + 1] == 0
! && bytes[i + 2] == 0 && bytes[i + 3] == 0, 0, 0);
!
! CHECK (4, 32, 1, bytes[i] == 0 && bytes[i + 1] == bytes[1]
! && bytes[i + 2] == 0 && bytes[i + 3] == 0, 8, 0);
!
! CHECK (4, 32, 2, bytes[i] == 0 && bytes[i + 1] == 0
! && bytes[i + 2] == bytes[2] && bytes[i + 3] == 0, 16, 0);
!
! CHECK (4, 32, 3, bytes[i] == 0 && bytes[i + 1] == 0
! && bytes[i + 2] == 0 && bytes[i + 3] == bytes[3], 24, 0);
!
! CHECK (2, 16, 4, bytes[i] == bytes[0] && bytes[i + 1] == 0, 0, 0);
!
! CHECK (2, 16, 5, bytes[i] == 0 && bytes[i + 1] == bytes[1], 8, 0);
! }
!
! if (which & AARCH64_CHECK_BIC)
! {
! CHECK (4, 32, 6, bytes[i] == bytes[0] && bytes[i + 1] == 0xff
! && bytes[i + 2] == 0xff && bytes[i + 3] == 0xff, 0, 1);
!
! CHECK (4, 32, 7, bytes[i] == 0xff && bytes[i + 1] == bytes[1]
! && bytes[i + 2] == 0xff && bytes[i + 3] == 0xff, 8, 1);
!
! CHECK (4, 32, 8, bytes[i] == 0xff && bytes[i + 1] == 0xff
! && bytes[i + 2] == bytes[2] && bytes[i + 3] == 0xff, 16, 1);
!
! CHECK (4, 32, 9, bytes[i] == 0xff && bytes[i + 1] == 0xff
! && bytes[i + 2] == 0xff && bytes[i + 3] == bytes[3], 24, 1);
!
! CHECK (2, 16, 10, bytes[i] == bytes[0] && bytes[i + 1] == 0xff, 0, 1);
!
! CHECK (2, 16, 11, bytes[i] == 0xff && bytes[i + 1] == bytes[1], 8, 1);
}
!
! /* Shifting ones / 8-bit / 64-bit variants only checked
! for 'ALL' (MOVI/MVNI). */
! if (which == AARCH64_CHECK_MOV)
{
! CHECK (4, 32, 12, bytes[i] == 0xff && bytes[i + 1] == bytes[1]
! && bytes[i + 2] == 0 && bytes[i + 3] == 0, 8, 0);
!
! CHECK (4, 32, 13, bytes[i] == 0 && bytes[i + 1] == bytes[1]
! && bytes[i + 2] == 0xff && bytes[i + 3] == 0xff, 8, 1);
!
! CHECK (4, 32, 14, bytes[i] == 0xff && bytes[i + 1] == 0xff
! && bytes[i + 2] == bytes[2] && bytes[i + 3] == 0, 16, 0);
!
! CHECK (4, 32, 15, bytes[i] == 0 && bytes[i + 1] == 0
! && bytes[i + 2] == bytes[2] && bytes[i + 3] == 0xff, 16, 1);
!
! CHECK (1, 8, 16, bytes[i] == bytes[0], 0, 0);
!
! CHECK (1, 64, 17, (bytes[i] == 0 || bytes[i] == 0xff)
! && bytes[i] == bytes[(i + 8) % idx], 0, 0);
}
}
! while (0);
! if (immtype == -1)
return false;
! if (info)
{
! info->element_width = elsize;
! info->mvn = emvn != 0;
! info->shift = eshift;
!
! unsigned HOST_WIDE_INT imm = 0;
! if (immtype >= 12 && immtype <= 15)
! info->msl = true;
! /* Un-invert bytes of recognized vector, if necessary. */
! if (invmask != 0)
! for (i = 0; i < idx; i++)
! bytes[i] ^= invmask;
! if (immtype == 17)
! {
! /* FIXME: Broken on 32-bit H_W_I hosts. */
! gcc_assert (sizeof (HOST_WIDE_INT) == 8);
! for (i = 0; i < 8; i++)
! imm |= (unsigned HOST_WIDE_INT) (bytes[i] ? 0xff : 0)
! << (i * BITS_PER_UNIT);
! info->value = GEN_INT (imm);
! }
! else
{
! for (i = 0; i < elsize / BITS_PER_UNIT; i++)
! imm |= (unsigned HOST_WIDE_INT) bytes[i] << (i * BITS_PER_UNIT);
!
! /* Construct 'abcdefgh' because the assembler cannot handle
! generic constants. */
! if (info->mvn)
! imm = ~imm;
! imm = (imm >> info->shift) & 0xff;
! info->value = GEN_INT (imm);
}
}
! return true;
! #undef CHECK
}
/* Check of immediate shift constants are within range. */
--- 11736,11920 ----
}
}
! /* Return true if replicating VAL32 is a valid 2-byte or 4-byte immediate
! for the Advanced SIMD operation described by WHICH and INSN. If INFO
! is nonnull, use it to describe valid immediates. */
! static bool
! aarch64_advsimd_valid_immediate_hs (unsigned int val32,
! simd_immediate_info *info,
! enum simd_immediate_check which,
! simd_immediate_info::insn_type insn)
! {
! /* Try a 4-byte immediate with LSL. */
! for (unsigned int shift = 0; shift < 32; shift += 8)
! if ((val32 & (0xff << shift)) == val32)
! {
! if (info)
! *info = simd_immediate_info (SImode, val32 >> shift, insn,
! simd_immediate_info::LSL, shift);
! return true;
! }
! /* Try a 2-byte immediate with LSL. */
! unsigned int imm16 = val32 & 0xffff;
! if (imm16 == (val32 >> 16))
! for (unsigned int shift = 0; shift < 16; shift += 8)
! if ((imm16 & (0xff << shift)) == imm16)
! {
! if (info)
! *info = simd_immediate_info (HImode, imm16 >> shift, insn,
! simd_immediate_info::LSL, shift);
! return true;
}
! /* Try a 4-byte immediate with MSL, except for cases that MVN
! can handle. */
! if (which == AARCH64_CHECK_MOV)
! for (unsigned int shift = 8; shift < 24; shift += 8)
! {
! unsigned int low = (1 << shift) - 1;
! if (((val32 & (0xff << shift)) | low) == val32)
! {
! if (info)
! *info = simd_immediate_info (SImode, val32 >> shift, insn,
! simd_immediate_info::MSL, shift);
! return true;
! }
! }
! return false;
! }
! /* Return true if replicating VAL64 is a valid immediate for the
! Advanced SIMD operation described by WHICH. If INFO is nonnull,
! use it to describe valid immediates. */
! static bool
! aarch64_advsimd_valid_immediate (unsigned HOST_WIDE_INT val64,
! simd_immediate_info *info,
! enum simd_immediate_check which)
! {
! unsigned int val32 = val64 & 0xffffffff;
! unsigned int val16 = val64 & 0xffff;
! unsigned int val8 = val64 & 0xff;
!
! if (val32 == (val64 >> 32))
! {
! if ((which & AARCH64_CHECK_ORR) != 0
! && aarch64_advsimd_valid_immediate_hs (val32, info, which,
! simd_immediate_info::MOV))
! return true;
!
! if ((which & AARCH64_CHECK_BIC) != 0
! && aarch64_advsimd_valid_immediate_hs (~val32, info, which,
! simd_immediate_info::MVN))
! return true;
! /* Try using a replicated byte. */
! if (which == AARCH64_CHECK_MOV
! && val16 == (val32 >> 16)
! && val8 == (val16 >> 8))
{
! if (info)
! *info = simd_immediate_info (QImode, val8);
! return true;
}
}
! /* Try using a bit-to-bytemask. */
! if (which == AARCH64_CHECK_MOV)
{
! unsigned int i;
! for (i = 0; i < 64; i += 8)
{
! unsigned char byte = (val64 >> i) & 0xff;
! if (byte != 0 && byte != 0xff)
! break;
}
! if (i == 64)
{
! if (info)
! *info = simd_immediate_info (DImode, val64);
! return true;
}
}
! return false;
! }
! /* Return true if OP is a valid SIMD immediate for the operation
! described by WHICH. If INFO is nonnull, use it to describe valid
! immediates. */
! bool
! aarch64_simd_valid_immediate (rtx op, simd_immediate_info *info,
! enum simd_immediate_check which)
! {
! rtx elt = NULL;
! unsigned int n_elts;
! if (const_vec_duplicate_p (op, &elt))
! n_elts = 1;
! else if (GET_CODE (op) == CONST_VECTOR)
! n_elts = CONST_VECTOR_NUNITS (op);
! else
return false;
! machine_mode mode = GET_MODE (op);
! scalar_mode elt_mode = GET_MODE_INNER (mode);
! scalar_float_mode elt_float_mode;
! if (elt
! && is_a <scalar_float_mode> (elt_mode, &elt_float_mode)
! && (aarch64_float_const_zero_rtx_p (elt)
! || aarch64_float_const_representable_p (elt)))
{
! if (info)
! *info = simd_immediate_info (elt_float_mode, elt);
! return true;
! }
! unsigned int elt_size = GET_MODE_SIZE (elt_mode);
! if (elt_size > 8)
! return false;
! scalar_int_mode elt_int_mode = int_mode_for_mode (elt_mode).require ();
! /* Expand the vector constant out into a byte vector, with the least
! significant byte of the register first. */
! auto_vec<unsigned char, 16> bytes;
! bytes.reserve (n_elts * elt_size);
! for (unsigned int i = 0; i < n_elts; i++)
! {
! if (!elt || n_elts != 1)
! /* The vector is provided in gcc endian-neutral fashion.
! For aarch64_be, it must be laid out in the vector register
! in reverse order. */
! elt = CONST_VECTOR_ELT (op, BYTES_BIG_ENDIAN ? (n_elts - 1 - i) : i);
! if (elt_mode != elt_int_mode)
! elt = gen_lowpart (elt_int_mode, elt);
+ if (!CONST_INT_P (elt))
+ return false;
! unsigned HOST_WIDE_INT elt_val = INTVAL (elt);
! for (unsigned int byte = 0; byte < elt_size; byte++)
{
! bytes.quick_push (elt_val & 0xff);
! elt_val >>= BITS_PER_UNIT;
}
}
! /* The immediate must repeat every eight bytes. */
! unsigned int nbytes = bytes.length ();
! for (unsigned i = 8; i < nbytes; ++i)
! if (bytes[i] != bytes[i - 8])
! return false;
!
! /* Get the repeating 8-byte value as an integer. No endian correction
! is needed here because bytes is already in lsb-first order. */
! unsigned HOST_WIDE_INT val64 = 0;
! for (unsigned int i = 0; i < 8; i++)
! val64 |= ((unsigned HOST_WIDE_INT) bytes[i % nbytes]
! << (i * BITS_PER_UNIT));
!
! return aarch64_advsimd_valid_immediate (val64, info, which);
}
/* Check of immediate shift constants are within range. */
*************** aarch64_simd_scalar_immediate_valid_for_
*** 11963,11969 ****
vmode = aarch64_preferred_simd_mode (mode);
rtx op_v = aarch64_simd_gen_const_vector_dup (vmode, INTVAL (op));
! return aarch64_simd_valid_immediate (op_v, vmode, false, NULL);
}
/* Construct and return a PARALLEL RTX vector with elements numbering the
--- 11986,11992 ----
vmode = aarch64_preferred_simd_mode (mode);
rtx op_v = aarch64_simd_gen_const_vector_dup (vmode, INTVAL (op));
! return aarch64_simd_valid_immediate (op_v, NULL);
}
/* Construct and return a PARALLEL RTX vector with elements numbering the
*************** aarch64_simd_make_constant (rtx vals)
*** 12201,12207 ****
gcc_unreachable ();
if (const_vec != NULL_RTX
! && aarch64_simd_valid_immediate (const_vec, mode, false, NULL))
/* Load using MOVI/MVNI. */
return const_vec;
else if ((const_dup = aarch64_simd_dup_constant (vals)) != NULL_RTX)
--- 12224,12230 ----
gcc_unreachable ();
if (const_vec != NULL_RTX
! && aarch64_simd_valid_immediate (const_vec, NULL))
/* Load using MOVI/MVNI. */
return const_vec;
else if ((const_dup = aarch64_simd_dup_constant (vals)) != NULL_RTX)
*************** aarch64_float_const_representable_p (rtx
*** 13239,13247 ****
immediate with a CONST_VECTOR of MODE and WIDTH. WHICH selects whether to
output MOVI/MVNI, ORR or BIC immediate. */
char*
! aarch64_output_simd_mov_immediate (rtx const_vector,
! machine_mode mode,
! unsigned width,
enum simd_immediate_check which)
{
bool is_valid;
--- 13262,13268 ----
immediate with a CONST_VECTOR of MODE and WIDTH. WHICH selects whether to
output MOVI/MVNI, ORR or BIC immediate. */
char*
! aarch64_output_simd_mov_immediate (rtx const_vector, unsigned width,
enum simd_immediate_check which)
{
bool is_valid;
*************** aarch64_output_simd_mov_immediate (rtx c
*** 13251,13273 ****
unsigned int lane_count = 0;
char element_char;
! struct simd_immediate_info info = { NULL_RTX, 0, 0, false, false };
/* This will return true to show const_vector is legal for use as either
a AdvSIMD MOVI instruction (or, implicitly, MVNI), ORR or BIC immediate.
It will also update INFO to show how the immediate should be generated.
WHICH selects whether to check for MOVI/MVNI, ORR or BIC. */
! is_valid = aarch64_simd_valid_immediate (const_vector, mode, false,
! &info, which);
gcc_assert (is_valid);
! element_char = sizetochar (info.element_width);
! lane_count = width / info.element_width;
! mode = GET_MODE_INNER (mode);
! if (GET_MODE_CLASS (mode) == MODE_FLOAT)
{
! gcc_assert (info.shift == 0 && ! info.mvn);
/* For FP zero change it to a CONST_INT 0 and use the integer SIMD
move immediate path. */
if (aarch64_float_const_zero_rtx_p (info.value))
--- 13272,13292 ----
unsigned int lane_count = 0;
char element_char;
! struct simd_immediate_info info;
/* This will return true to show const_vector is legal for use as either
a AdvSIMD MOVI instruction (or, implicitly, MVNI), ORR or BIC immediate.
It will also update INFO to show how the immediate should be generated.
WHICH selects whether to check for MOVI/MVNI, ORR or BIC. */
! is_valid = aarch64_simd_valid_immediate (const_vector, &info, which);
gcc_assert (is_valid);
! element_char = sizetochar (GET_MODE_BITSIZE (info.elt_mode));
! lane_count = width / GET_MODE_BITSIZE (info.elt_mode);
! if (GET_MODE_CLASS (info.elt_mode) == MODE_FLOAT)
{
! gcc_assert (info.shift == 0 && info.insn == simd_immediate_info::MOV);
/* For FP zero change it to a CONST_INT 0 and use the integer SIMD
move immediate path. */
if (aarch64_float_const_zero_rtx_p (info.value))
*************** aarch64_output_simd_mov_immediate (rtx c
*** 13278,13284 ****
char float_buf[buf_size] = {'\0'};
real_to_decimal_for_mode (float_buf,
CONST_DOUBLE_REAL_VALUE (info.value),
! buf_size, buf_size, 1, mode);
if (lane_count == 1)
snprintf (templ, sizeof (templ), "fmov\t%%d0, %s", float_buf);
--- 13297,13303 ----
char float_buf[buf_size] = {'\0'};
real_to_decimal_for_mode (float_buf,
CONST_DOUBLE_REAL_VALUE (info.value),
! buf_size, buf_size, 1, info.elt_mode);
if (lane_count == 1)
snprintf (templ, sizeof (templ), "fmov\t%%d0, %s", float_buf);
*************** aarch64_output_simd_mov_immediate (rtx c
*** 13293,13300 ****
if (which == AARCH64_CHECK_MOV)
{
! mnemonic = info.mvn ? "mvni" : "movi";
! shift_op = info.msl ? "msl" : "lsl";
if (lane_count == 1)
snprintf (templ, sizeof (templ), "%s\t%%d0, " HOST_WIDE_INT_PRINT_HEX,
mnemonic, UINTVAL (info.value));
--- 13312,13319 ----
if (which == AARCH64_CHECK_MOV)
{
! mnemonic = info.insn == simd_immediate_info::MVN ? "mvni" : "movi";
! shift_op = info.modifier == simd_immediate_info::MSL ? "msl" : "lsl";
if (lane_count == 1)
snprintf (templ, sizeof (templ), "%s\t%%d0, " HOST_WIDE_INT_PRINT_HEX,
mnemonic, UINTVAL (info.value));
*************** aarch64_output_simd_mov_immediate (rtx c
*** 13310,13316 ****
else
{
/* For AARCH64_CHECK_BIC and AARCH64_CHECK_ORR. */
! mnemonic = info.mvn ? "bic" : "orr";
if (info.shift)
snprintf (templ, sizeof (templ), "%s\t%%0.%d%c, #"
HOST_WIDE_INT_PRINT_DEC ", %s #%d", mnemonic, lane_count,
--- 13329,13335 ----
else
{
/* For AARCH64_CHECK_BIC and AARCH64_CHECK_ORR. */
! mnemonic = info.insn == simd_immediate_info::MVN ? "bic" : "orr";
if (info.shift)
snprintf (templ, sizeof (templ), "%s\t%%0.%d%c, #"
HOST_WIDE_INT_PRINT_DEC ", %s #%d", mnemonic, lane_count,
*************** aarch64_output_scalar_simd_mov_immediate
*** 13344,13350 ****
vmode = aarch64_simd_container_mode (mode, width);
rtx v_op = aarch64_simd_gen_const_vector_dup (vmode, INTVAL (immediate));
! return aarch64_output_simd_mov_immediate (v_op, vmode, width);
}
/* Split operands into moves from op[1] + op[2] into op[0]. */
--- 13363,13369 ----
vmode = aarch64_simd_container_mode (mode, width);
rtx v_op = aarch64_simd_gen_const_vector_dup (vmode, INTVAL (immediate));
! return aarch64_output_simd_mov_immediate (v_op, width);
}
/* Split operands into moves from op[1] + op[2] into op[0]. */
Index: gcc/testsuite/gcc.target/aarch64/vect-movi.c
===================================================================
*** gcc/testsuite/gcc.target/aarch64/vect-movi.c 2017-10-27 14:05:38.185854661 +0100
--- gcc/testsuite/gcc.target/aarch64/vect-movi.c 2017-10-27 14:11:56.995515870 +0100
*************** mvni_msl16 (int *__restrict a)
*** 45,54 ****
--- 45,65 ----
a[i] = 0xff540000;
}
+ static void
+ movi_float_lsl24 (float * a)
+ {
+ int i;
+
+ /* { dg-final { scan-assembler {\tmovi\tv[0-9]+\.[42]s, 0x43, lsl 24\n} } } */
+ for (i = 0; i < N; i++)
+ a[i] = 128.0;
+ }
+
int
main (void)
{
int a[N] = { 0 };
+ float b[N] = { 0 };
int i;
#define CHECK_ARRAY(a, val) \
*************** #define CHECK_ARRAY(a, val) \
*** 68,73 ****
--- 79,87 ----
mvni_msl16 (a);
CHECK_ARRAY (a, 0xff540000);
+ movi_float_lsl24 (b);
+ CHECK_ARRAY (b, 128.0);
+
return 0;
}
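(As an aside, not part of the patch: the new test expects "movi v<n>.[42]s, 0x43, lsl 24" because the IEEE-754 single-precision encoding of 128.0 is 0x43000000, i.e. the single byte 0x43 shifted left by 24 bits, which is exactly the shape the MOVI shifted-immediate path can produce.  A standalone sketch of that bit pattern:)

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Return the bit pattern of a float, as the MOVI immediate path
   would see it after the FP value is converted to an integer
   immediate.  */
static uint32_t
float_bits (float f)
{
  uint32_t u;
  memcpy (&u, &f, sizeof u);
  return u;
}
```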
* [08/nn] [AArch64] Pass number of units to aarch64_simd_vect_par_cnst_half
2017-10-27 13:22 [00/nn] AArch64 patches preparing for SVE Richard Sandiford
` (6 preceding siblings ...)
2017-10-27 13:29 ` [07/nn] [AArch64] Pass number of units to aarch64_reverse_mask Richard Sandiford
@ 2017-10-27 13:29 ` Richard Sandiford
2017-11-02 9:59 ` James Greenhalgh
2017-10-27 13:30 ` [09/nn] [AArch64] Pass number of units to aarch64_expand_vec_perm(_const) Richard Sandiford
` (3 subsequent siblings)
11 siblings, 1 reply; 29+ messages in thread
From: Richard Sandiford @ 2017-10-27 13:29 UTC (permalink / raw)
To: gcc-patches; +Cc: richard.earnshaw, james.greenhalgh, marcus.shawcroft
This patch passes the number of units to aarch64_simd_vect_par_cnst_half,
which avoids a to_constant () once GET_MODE_NUNITS is variable.
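(For illustration only, not GCC code: the index selection being parameterized here can be modelled as below for the little-endian case, matching the { 0, 1 } / { 2, 3 } V4SI masks in the comment being patched.  The point of the change is that NUNITS arrives as an argument, so the callee never needs GET_MODE_NUNITS, which stops being a compile-time constant for SVE modes.)

```c
#include <assert.h>

/* Little-endian-only model of the index selection in
   aarch64_simd_vect_par_cnst_half: the low half selects elements
   starting at 0, the high half starting at NUNITS / 2.  */
static void
vect_par_cnst_half (int *out, int nunits, int high)
{
  int base = high ? nunits / 2 : 0;
  for (int i = 0; i < nunits / 2; i++)
    out[i] = base + i;
}
```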
2017-10-27 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* config/aarch64/aarch64-protos.h (aarch64_simd_vect_par_cnst_half):
Take the number of units too.
* config/aarch64/aarch64.c (aarch64_simd_vect_par_cnst_half): Likewise.
(aarch64_simd_check_vect_par_cnst_half): Update call accordingly,
but check for a vector mode before rather than after the call.
* config/aarch64/aarch64-simd.md (aarch64_split_simd_mov<mode>)
(move_hi_quad_<mode>, vec_unpack<su>_hi_<mode>)
(vec_unpack<su>_lo_<mode, vec_widen_<su>mult_lo_<mode>)
(vec_widen_<su>mult_hi_<mode>, vec_unpacks_lo_<mode>)
(vec_unpacks_hi_<mode>, aarch64_saddl2<mode>, aarch64_uaddl2<mode>)
(aarch64_ssubl2<mode>, aarch64_usubl2<mode>, widen_ssum<mode>3)
(widen_usum<mode>3, aarch64_saddw2<mode>, aarch64_uaddw2<mode>)
(aarch64_ssubw2<mode>, aarch64_usubw2<mode>, aarch64_sqdmlal2<mode>)
(aarch64_sqdmlsl2<mode>, aarch64_sqdmlal2_lane<mode>)
(aarch64_sqdmlal2_laneq<mode>, aarch64_sqdmlsl2_lane<mode>)
(aarch64_sqdmlsl2_laneq<mode>, aarch64_sqdmlal2_n<mode>)
(aarch64_sqdmlsl2_n<mode>, aarch64_sqdmull2<mode>)
(aarch64_sqdmull2_lane<mode>, aarch64_sqdmull2_laneq<mode>)
(aarch64_sqdmull2_n<mode>): Update accordingly.
Index: gcc/config/aarch64/aarch64-protos.h
===================================================================
--- gcc/config/aarch64/aarch64-protos.h 2017-10-27 14:12:04.192082112 +0100
+++ gcc/config/aarch64/aarch64-protos.h 2017-10-27 14:12:07.203885483 +0100
@@ -403,7 +403,7 @@ const char *aarch64_output_move_struct (
rtx aarch64_return_addr (int, rtx);
rtx aarch64_simd_gen_const_vector_dup (machine_mode, HOST_WIDE_INT);
bool aarch64_simd_mem_operand_p (rtx);
-rtx aarch64_simd_vect_par_cnst_half (machine_mode, bool);
+rtx aarch64_simd_vect_par_cnst_half (machine_mode, int, bool);
rtx aarch64_tls_get_addr (void);
tree aarch64_fold_builtin (tree, int, tree *, bool);
unsigned aarch64_dbx_register_number (unsigned);
Index: gcc/config/aarch64/aarch64.c
===================================================================
--- gcc/config/aarch64/aarch64.c 2017-10-27 14:12:04.193939530 +0100
+++ gcc/config/aarch64/aarch64.c 2017-10-27 14:12:07.205742901 +0100
@@ -12007,12 +12007,12 @@ aarch64_simd_scalar_immediate_valid_for_
Low Mask: { 2, 3 } { 0, 1 }
High Mask: { 0, 1 } { 2, 3 }
-*/
+
+ MODE Is the mode of the vector and NUNITS is the number of units in it. */
rtx
-aarch64_simd_vect_par_cnst_half (machine_mode mode, bool high)
+aarch64_simd_vect_par_cnst_half (machine_mode mode, int nunits, bool high)
{
- int nunits = GET_MODE_NUNITS (mode);
rtvec v = rtvec_alloc (nunits / 2);
int high_base = nunits / 2;
int low_base = 0;
@@ -12041,14 +12041,15 @@ aarch64_simd_vect_par_cnst_half (machine
aarch64_simd_check_vect_par_cnst_half (rtx op, machine_mode mode,
bool high)
{
- rtx ideal = aarch64_simd_vect_par_cnst_half (mode, high);
+ if (!VECTOR_MODE_P (mode))
+ return false;
+
+ rtx ideal = aarch64_simd_vect_par_cnst_half (mode, GET_MODE_NUNITS (mode),
+ high);
HOST_WIDE_INT count_op = XVECLEN (op, 0);
HOST_WIDE_INT count_ideal = XVECLEN (ideal, 0);
int i = 0;
- if (!VECTOR_MODE_P (mode))
- return false;
-
if (count_op != count_ideal)
return false;
Index: gcc/config/aarch64/aarch64-simd.md
===================================================================
--- gcc/config/aarch64/aarch64-simd.md 2017-10-27 14:12:04.193010821 +0100
+++ gcc/config/aarch64/aarch64-simd.md 2017-10-27 14:12:07.203885483 +0100
@@ -252,8 +252,8 @@ (define_expand "aarch64_split_simd_mov<m
{
rtx dst_low_part = gen_lowpart (<VHALF>mode, dst);
rtx dst_high_part = gen_highpart (<VHALF>mode, dst);
- rtx lo = aarch64_simd_vect_par_cnst_half (<MODE>mode, false);
- rtx hi = aarch64_simd_vect_par_cnst_half (<MODE>mode, true);
+ rtx lo = aarch64_simd_vect_par_cnst_half (<MODE>mode, <nunits>, false);
+ rtx hi = aarch64_simd_vect_par_cnst_half (<MODE>mode, <nunits>, true);
emit_insn
(gen_aarch64_simd_mov_from_<mode>low (dst_low_part, src, lo));
@@ -1436,7 +1436,7 @@ (define_expand "move_hi_quad_<mode>"
(match_operand:<VHALF> 1 "register_operand" "")]
"TARGET_SIMD"
{
- rtx p = aarch64_simd_vect_par_cnst_half (<MODE>mode, false);
+ rtx p = aarch64_simd_vect_par_cnst_half (<MODE>mode, <nunits>, false);
if (BYTES_BIG_ENDIAN)
emit_insn (gen_aarch64_simd_move_hi_quad_be_<mode> (operands[0],
operands[1], p));
@@ -1520,7 +1520,7 @@ (define_expand "vec_unpack<su>_hi_<mode>
(ANY_EXTEND:<VWIDE> (match_operand:VQW 1 "register_operand"))]
"TARGET_SIMD"
{
- rtx p = aarch64_simd_vect_par_cnst_half (<MODE>mode, true);
+ rtx p = aarch64_simd_vect_par_cnst_half (<MODE>mode, <nunits>, true);
emit_insn (gen_aarch64_simd_vec_unpack<su>_hi_<mode> (operands[0],
operands[1], p));
DONE;
@@ -1532,7 +1532,7 @@ (define_expand "vec_unpack<su>_lo_<mode>
(ANY_EXTEND:<VWIDE> (match_operand:VQW 1 "register_operand" ""))]
"TARGET_SIMD"
{
- rtx p = aarch64_simd_vect_par_cnst_half (<MODE>mode, false);
+ rtx p = aarch64_simd_vect_par_cnst_half (<MODE>mode, <nunits>, false);
emit_insn (gen_aarch64_simd_vec_unpack<su>_lo_<mode> (operands[0],
operands[1], p));
DONE;
@@ -1652,7 +1652,7 @@ (define_expand "vec_widen_<su>mult_lo_<m
(ANY_EXTEND:<VWIDE> (match_operand:VQW 2 "register_operand" ""))]
"TARGET_SIMD"
{
- rtx p = aarch64_simd_vect_par_cnst_half (<MODE>mode, false);
+ rtx p = aarch64_simd_vect_par_cnst_half (<MODE>mode, <nunits>, false);
emit_insn (gen_aarch64_simd_vec_<su>mult_lo_<mode> (operands[0],
operands[1],
operands[2], p));
@@ -1679,7 +1679,7 @@ (define_expand "vec_widen_<su>mult_hi_<m
(ANY_EXTEND:<VWIDE> (match_operand:VQW 2 "register_operand" ""))]
"TARGET_SIMD"
{
- rtx p = aarch64_simd_vect_par_cnst_half (<MODE>mode, true);
+ rtx p = aarch64_simd_vect_par_cnst_half (<MODE>mode, <nunits>, true);
emit_insn (gen_aarch64_simd_vec_<su>mult_hi_<mode> (operands[0],
operands[1],
operands[2], p));
@@ -2083,7 +2083,7 @@ (define_expand "vec_unpacks_lo_<mode>"
(match_operand:VQ_HSF 1 "register_operand" "")]
"TARGET_SIMD"
{
- rtx p = aarch64_simd_vect_par_cnst_half (<MODE>mode, false);
+ rtx p = aarch64_simd_vect_par_cnst_half (<MODE>mode, <nunits>, false);
emit_insn (gen_aarch64_simd_vec_unpacks_lo_<mode> (operands[0],
operands[1], p));
DONE;
@@ -2106,7 +2106,7 @@ (define_expand "vec_unpacks_hi_<mode>"
(match_operand:VQ_HSF 1 "register_operand" "")]
"TARGET_SIMD"
{
- rtx p = aarch64_simd_vect_par_cnst_half (<MODE>mode, true);
+ rtx p = aarch64_simd_vect_par_cnst_half (<MODE>mode, <nunits>, true);
emit_insn (gen_aarch64_simd_vec_unpacks_lo_<mode> (operands[0],
operands[1], p));
DONE;
@@ -3027,7 +3027,7 @@ (define_expand "aarch64_saddl2<mode>"
(match_operand:VQW 2 "register_operand" "w")]
"TARGET_SIMD"
{
- rtx p = aarch64_simd_vect_par_cnst_half (<MODE>mode, true);
+ rtx p = aarch64_simd_vect_par_cnst_half (<MODE>mode, <nunits>, true);
emit_insn (gen_aarch64_saddl<mode>_hi_internal (operands[0], operands[1],
operands[2], p));
DONE;
@@ -3039,7 +3039,7 @@ (define_expand "aarch64_uaddl2<mode>"
(match_operand:VQW 2 "register_operand" "w")]
"TARGET_SIMD"
{
- rtx p = aarch64_simd_vect_par_cnst_half (<MODE>mode, true);
+ rtx p = aarch64_simd_vect_par_cnst_half (<MODE>mode, <nunits>, true);
emit_insn (gen_aarch64_uaddl<mode>_hi_internal (operands[0], operands[1],
operands[2], p));
DONE;
@@ -3051,7 +3051,7 @@ (define_expand "aarch64_ssubl2<mode>"
(match_operand:VQW 2 "register_operand" "w")]
"TARGET_SIMD"
{
- rtx p = aarch64_simd_vect_par_cnst_half (<MODE>mode, true);
+ rtx p = aarch64_simd_vect_par_cnst_half (<MODE>mode, <nunits>, true);
emit_insn (gen_aarch64_ssubl<mode>_hi_internal (operands[0], operands[1],
operands[2], p));
DONE;
@@ -3063,7 +3063,7 @@ (define_expand "aarch64_usubl2<mode>"
(match_operand:VQW 2 "register_operand" "w")]
"TARGET_SIMD"
{
- rtx p = aarch64_simd_vect_par_cnst_half (<MODE>mode, true);
+ rtx p = aarch64_simd_vect_par_cnst_half (<MODE>mode, <nunits>, true);
emit_insn (gen_aarch64_usubl<mode>_hi_internal (operands[0], operands[1],
operands[2], p));
DONE;
@@ -3089,7 +3089,7 @@ (define_expand "widen_ssum<mode>3"
(match_operand:<VDBLW> 2 "register_operand" "")))]
"TARGET_SIMD"
{
- rtx p = aarch64_simd_vect_par_cnst_half (<MODE>mode, false);
+ rtx p = aarch64_simd_vect_par_cnst_half (<MODE>mode, <nunits>, false);
rtx temp = gen_reg_rtx (GET_MODE (operands[0]));
emit_insn (gen_aarch64_saddw<mode>_internal (temp, operands[2],
@@ -3117,7 +3117,7 @@ (define_expand "widen_usum<mode>3"
(match_operand:<VDBLW> 2 "register_operand" "")))]
"TARGET_SIMD"
{
- rtx p = aarch64_simd_vect_par_cnst_half (<MODE>mode, false);
+ rtx p = aarch64_simd_vect_par_cnst_half (<MODE>mode, <nunits>, false);
rtx temp = gen_reg_rtx (GET_MODE (operands[0]));
emit_insn (gen_aarch64_uaddw<mode>_internal (temp, operands[2],
@@ -3178,7 +3178,7 @@ (define_expand "aarch64_saddw2<mode>"
(match_operand:VQW 2 "register_operand" "w")]
"TARGET_SIMD"
{
- rtx p = aarch64_simd_vect_par_cnst_half (<MODE>mode, true);
+ rtx p = aarch64_simd_vect_par_cnst_half (<MODE>mode, <nunits>, true);
emit_insn (gen_aarch64_saddw2<mode>_internal (operands[0], operands[1],
operands[2], p));
DONE;
@@ -3190,7 +3190,7 @@ (define_expand "aarch64_uaddw2<mode>"
(match_operand:VQW 2 "register_operand" "w")]
"TARGET_SIMD"
{
- rtx p = aarch64_simd_vect_par_cnst_half (<MODE>mode, true);
+ rtx p = aarch64_simd_vect_par_cnst_half (<MODE>mode, <nunits>, true);
emit_insn (gen_aarch64_uaddw2<mode>_internal (operands[0], operands[1],
operands[2], p));
DONE;
@@ -3203,7 +3203,7 @@ (define_expand "aarch64_ssubw2<mode>"
(match_operand:VQW 2 "register_operand" "w")]
"TARGET_SIMD"
{
- rtx p = aarch64_simd_vect_par_cnst_half (<MODE>mode, true);
+ rtx p = aarch64_simd_vect_par_cnst_half (<MODE>mode, <nunits>, true);
emit_insn (gen_aarch64_ssubw2<mode>_internal (operands[0], operands[1],
operands[2], p));
DONE;
@@ -3215,7 +3215,7 @@ (define_expand "aarch64_usubw2<mode>"
(match_operand:VQW 2 "register_operand" "w")]
"TARGET_SIMD"
{
- rtx p = aarch64_simd_vect_par_cnst_half (<MODE>mode, true);
+ rtx p = aarch64_simd_vect_par_cnst_half (<MODE>mode, <nunits>, true);
emit_insn (gen_aarch64_usubw2<mode>_internal (operands[0], operands[1],
operands[2], p));
DONE;
@@ -3735,7 +3735,7 @@ (define_expand "aarch64_sqdmlal2<mode>"
(match_operand:VQ_HSI 3 "register_operand" "w")]
"TARGET_SIMD"
{
- rtx p = aarch64_simd_vect_par_cnst_half (<MODE>mode, true);
+ rtx p = aarch64_simd_vect_par_cnst_half (<MODE>mode, <nunits>, true);
emit_insn (gen_aarch64_sqdmlal2<mode>_internal (operands[0], operands[1],
operands[2], operands[3], p));
DONE;
@@ -3748,7 +3748,7 @@ (define_expand "aarch64_sqdmlsl2<mode>"
(match_operand:VQ_HSI 3 "register_operand" "w")]
"TARGET_SIMD"
{
- rtx p = aarch64_simd_vect_par_cnst_half (<MODE>mode, true);
+ rtx p = aarch64_simd_vect_par_cnst_half (<MODE>mode, <nunits>, true);
emit_insn (gen_aarch64_sqdmlsl2<mode>_internal (operands[0], operands[1],
operands[2], operands[3], p));
DONE;
@@ -3816,7 +3816,7 @@ (define_expand "aarch64_sqdmlal2_lane<mo
(match_operand:SI 4 "immediate_operand" "i")]
"TARGET_SIMD"
{
- rtx p = aarch64_simd_vect_par_cnst_half (<MODE>mode, true);
+ rtx p = aarch64_simd_vect_par_cnst_half (<MODE>mode, <nunits>, true);
emit_insn (gen_aarch64_sqdmlal2_lane<mode>_internal (operands[0], operands[1],
operands[2], operands[3],
operands[4], p));
@@ -3831,7 +3831,7 @@ (define_expand "aarch64_sqdmlal2_laneq<m
(match_operand:SI 4 "immediate_operand" "i")]
"TARGET_SIMD"
{
- rtx p = aarch64_simd_vect_par_cnst_half (<MODE>mode, true);
+ rtx p = aarch64_simd_vect_par_cnst_half (<MODE>mode, <nunits>, true);
emit_insn (gen_aarch64_sqdmlal2_laneq<mode>_internal (operands[0], operands[1],
operands[2], operands[3],
operands[4], p));
@@ -3846,7 +3846,7 @@ (define_expand "aarch64_sqdmlsl2_lane<mo
(match_operand:SI 4 "immediate_operand" "i")]
"TARGET_SIMD"
{
- rtx p = aarch64_simd_vect_par_cnst_half (<MODE>mode, true);
+ rtx p = aarch64_simd_vect_par_cnst_half (<MODE>mode, <nunits>, true);
emit_insn (gen_aarch64_sqdmlsl2_lane<mode>_internal (operands[0], operands[1],
operands[2], operands[3],
operands[4], p));
@@ -3861,7 +3861,7 @@ (define_expand "aarch64_sqdmlsl2_laneq<m
(match_operand:SI 4 "immediate_operand" "i")]
"TARGET_SIMD"
{
- rtx p = aarch64_simd_vect_par_cnst_half (<MODE>mode, true);
+ rtx p = aarch64_simd_vect_par_cnst_half (<MODE>mode, <nunits>, true);
emit_insn (gen_aarch64_sqdmlsl2_laneq<mode>_internal (operands[0], operands[1],
operands[2], operands[3],
operands[4], p));
@@ -3894,7 +3894,7 @@ (define_expand "aarch64_sqdmlal2_n<mode>
(match_operand:<VEL> 3 "register_operand" "w")]
"TARGET_SIMD"
{
- rtx p = aarch64_simd_vect_par_cnst_half (<MODE>mode, true);
+ rtx p = aarch64_simd_vect_par_cnst_half (<MODE>mode, <nunits>, true);
emit_insn (gen_aarch64_sqdmlal2_n<mode>_internal (operands[0], operands[1],
operands[2], operands[3],
p));
@@ -3908,7 +3908,7 @@ (define_expand "aarch64_sqdmlsl2_n<mode>
(match_operand:<VEL> 3 "register_operand" "w")]
"TARGET_SIMD"
{
- rtx p = aarch64_simd_vect_par_cnst_half (<MODE>mode, true);
+ rtx p = aarch64_simd_vect_par_cnst_half (<MODE>mode, <nunits>, true);
emit_insn (gen_aarch64_sqdmlsl2_n<mode>_internal (operands[0], operands[1],
operands[2], operands[3],
p));
@@ -4062,7 +4062,7 @@ (define_expand "aarch64_sqdmull2<mode>"
(match_operand:VQ_HSI 2 "register_operand" "w")]
"TARGET_SIMD"
{
- rtx p = aarch64_simd_vect_par_cnst_half (<MODE>mode, true);
+ rtx p = aarch64_simd_vect_par_cnst_half (<MODE>mode, <nunits>, true);
emit_insn (gen_aarch64_sqdmull2<mode>_internal (operands[0], operands[1],
operands[2], p));
DONE;
@@ -4123,7 +4123,7 @@ (define_expand "aarch64_sqdmull2_lane<mo
(match_operand:SI 3 "immediate_operand" "i")]
"TARGET_SIMD"
{
- rtx p = aarch64_simd_vect_par_cnst_half (<MODE>mode, true);
+ rtx p = aarch64_simd_vect_par_cnst_half (<MODE>mode, <nunits>, true);
emit_insn (gen_aarch64_sqdmull2_lane<mode>_internal (operands[0], operands[1],
operands[2], operands[3],
p));
@@ -4137,7 +4137,7 @@ (define_expand "aarch64_sqdmull2_laneq<m
(match_operand:SI 3 "immediate_operand" "i")]
"TARGET_SIMD"
{
- rtx p = aarch64_simd_vect_par_cnst_half (<MODE>mode, true);
+ rtx p = aarch64_simd_vect_par_cnst_half (<MODE>mode, <nunits>, true);
emit_insn (gen_aarch64_sqdmull2_laneq<mode>_internal (operands[0], operands[1],
operands[2], operands[3],
p));
@@ -4170,7 +4170,7 @@ (define_expand "aarch64_sqdmull2_n<mode>
(match_operand:<VEL> 2 "register_operand" "w")]
"TARGET_SIMD"
{
- rtx p = aarch64_simd_vect_par_cnst_half (<MODE>mode, true);
+ rtx p = aarch64_simd_vect_par_cnst_half (<MODE>mode, <nunits>, true);
emit_insn (gen_aarch64_sqdmull2_n<mode>_internal (operands[0], operands[1],
operands[2], p));
DONE;
* [07/nn] [AArch64] Pass number of units to aarch64_reverse_mask
2017-10-27 13:22 [00/nn] AArch64 patches preparing for SVE Richard Sandiford
` (5 preceding siblings ...)
2017-10-27 13:28 ` [06/nn] [AArch64] Add an endian_lane_rtx helper routine Richard Sandiford
@ 2017-10-27 13:29 ` Richard Sandiford
2017-11-02 9:56 ` James Greenhalgh
2017-10-27 13:29 ` [08/nn] [AArch64] Pass number of units to aarch64_simd_vect_par_cnst_half Richard Sandiford
` (4 subsequent siblings)
11 siblings, 1 reply; 29+ messages in thread
From: Richard Sandiford @ 2017-10-27 13:29 UTC (permalink / raw)
To: gcc-patches; +Cc: richard.earnshaw, james.greenhalgh, marcus.shawcroft
This patch passes the number of units to aarch64_reverse_mask,
which avoids a to_constant () once GET_MODE_NUNITS is variable.
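(For illustration only — the loop body is not shown in full in the hunk below, so this is a plausible standalone model rather than the GCC code: the mask reverses the order of the NUNITS elements of a 128-bit Q register while keeping the bytes within each element in place, which is why only NUNITS and the unit size are needed.)

```c
#include <assert.h>

/* Sketch of the byte permute mask aarch64_reverse_mask builds for a
   big-endian reverse-load: element I of the result takes its bytes
   from element NUNITS - 1 - I of the input.  The element size in
   bytes follows from the 16-byte register width.  */
static void
reverse_mask_bytes (unsigned char *mask, unsigned int nunits)
{
  unsigned int usize = 16 / nunits;
  for (unsigned int i = 0; i < nunits; i++)
    for (unsigned int j = 0; j < usize; j++)
      mask[i * usize + j] = (nunits - 1 - i) * usize + j;
}
```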
2017-10-26 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* config/aarch64/aarch64-protos.h (aarch64_reverse_mask): Take
the number of units too.
* config/aarch64/aarch64.c (aarch64_reverse_mask): Likewise.
* config/aarch64/aarch64-simd.md (vec_load_lanesoi<mode>)
(vec_store_lanesoi<mode>, vec_load_lanesci<mode>)
(vec_store_lanesci<mode>, vec_load_lanesxi<mode>)
(vec_store_lanesxi<mode>): Update accordingly.
Index: gcc/config/aarch64/aarch64-protos.h
===================================================================
--- gcc/config/aarch64/aarch64-protos.h 2017-10-27 14:12:00.601693018 +0100
+++ gcc/config/aarch64/aarch64-protos.h 2017-10-27 14:12:04.192082112 +0100
@@ -365,7 +365,7 @@ bool aarch64_mask_and_shift_for_ubfiz_p
bool aarch64_zero_extend_const_eq (machine_mode, rtx, machine_mode, rtx);
bool aarch64_move_imm (HOST_WIDE_INT, machine_mode);
bool aarch64_mov_operand_p (rtx, machine_mode);
-rtx aarch64_reverse_mask (machine_mode);
+rtx aarch64_reverse_mask (machine_mode, unsigned int);
bool aarch64_offset_7bit_signed_scaled_p (machine_mode, HOST_WIDE_INT);
char *aarch64_output_scalar_simd_mov_immediate (rtx, scalar_int_mode);
char *aarch64_output_simd_mov_immediate (rtx, unsigned,
Index: gcc/config/aarch64/aarch64.c
===================================================================
--- gcc/config/aarch64/aarch64.c 2017-10-27 14:12:00.603550436 +0100
+++ gcc/config/aarch64/aarch64.c 2017-10-27 14:12:04.193939530 +0100
@@ -13945,16 +13945,18 @@ aarch64_vectorize_vec_perm_const_ok (mac
return ret;
}
+/* Generate a byte permute mask for a register of mode MODE,
+ which has NUNITS units. */
+
rtx
-aarch64_reverse_mask (machine_mode mode)
+aarch64_reverse_mask (machine_mode mode, unsigned int nunits)
{
/* We have to reverse each vector because we dont have
a permuted load that can reverse-load according to ABI rules. */
rtx mask;
rtvec v = rtvec_alloc (16);
- int i, j;
- int nunits = GET_MODE_NUNITS (mode);
- int usize = GET_MODE_UNIT_SIZE (mode);
+ unsigned int i, j;
+ unsigned int usize = GET_MODE_UNIT_SIZE (mode);
gcc_assert (BYTES_BIG_ENDIAN);
gcc_assert (AARCH64_VALID_SIMD_QREG_MODE (mode));
Index: gcc/config/aarch64/aarch64-simd.md
===================================================================
--- gcc/config/aarch64/aarch64-simd.md 2017-10-27 14:12:00.602621727 +0100
+++ gcc/config/aarch64/aarch64-simd.md 2017-10-27 14:12:04.193010821 +0100
@@ -4632,7 +4632,7 @@ (define_expand "vec_load_lanesoi<mode>"
if (BYTES_BIG_ENDIAN)
{
rtx tmp = gen_reg_rtx (OImode);
- rtx mask = aarch64_reverse_mask (<MODE>mode);
+ rtx mask = aarch64_reverse_mask (<MODE>mode, <nunits>);
emit_insn (gen_aarch64_simd_ld2<mode> (tmp, operands[1]));
emit_insn (gen_aarch64_rev_reglistoi (operands[0], tmp, mask));
}
@@ -4676,7 +4676,7 @@ (define_expand "vec_store_lanesoi<mode>"
if (BYTES_BIG_ENDIAN)
{
rtx tmp = gen_reg_rtx (OImode);
- rtx mask = aarch64_reverse_mask (<MODE>mode);
+ rtx mask = aarch64_reverse_mask (<MODE>mode, <nunits>);
emit_insn (gen_aarch64_rev_reglistoi (tmp, operands[1], mask));
emit_insn (gen_aarch64_simd_st2<mode> (operands[0], tmp));
}
@@ -4730,7 +4730,7 @@ (define_expand "vec_load_lanesci<mode>"
if (BYTES_BIG_ENDIAN)
{
rtx tmp = gen_reg_rtx (CImode);
- rtx mask = aarch64_reverse_mask (<MODE>mode);
+ rtx mask = aarch64_reverse_mask (<MODE>mode, <nunits>);
emit_insn (gen_aarch64_simd_ld3<mode> (tmp, operands[1]));
emit_insn (gen_aarch64_rev_reglistci (operands[0], tmp, mask));
}
@@ -4774,7 +4774,7 @@ (define_expand "vec_store_lanesci<mode>"
if (BYTES_BIG_ENDIAN)
{
rtx tmp = gen_reg_rtx (CImode);
- rtx mask = aarch64_reverse_mask (<MODE>mode);
+ rtx mask = aarch64_reverse_mask (<MODE>mode, <nunits>);
emit_insn (gen_aarch64_rev_reglistci (tmp, operands[1], mask));
emit_insn (gen_aarch64_simd_st3<mode> (operands[0], tmp));
}
@@ -4828,7 +4828,7 @@ (define_expand "vec_load_lanesxi<mode>"
if (BYTES_BIG_ENDIAN)
{
rtx tmp = gen_reg_rtx (XImode);
- rtx mask = aarch64_reverse_mask (<MODE>mode);
+ rtx mask = aarch64_reverse_mask (<MODE>mode, <nunits>);
emit_insn (gen_aarch64_simd_ld4<mode> (tmp, operands[1]));
emit_insn (gen_aarch64_rev_reglistxi (operands[0], tmp, mask));
}
@@ -4872,7 +4872,7 @@ (define_expand "vec_store_lanesxi<mode>"
if (BYTES_BIG_ENDIAN)
{
rtx tmp = gen_reg_rtx (XImode);
- rtx mask = aarch64_reverse_mask (<MODE>mode);
+ rtx mask = aarch64_reverse_mask (<MODE>mode, <nunits>);
emit_insn (gen_aarch64_rev_reglistxi (tmp, operands[1], mask));
emit_insn (gen_aarch64_simd_st4<mode> (operands[0], tmp));
}
* [09/nn] [AArch64] Pass number of units to aarch64_expand_vec_perm(_const)
2017-10-27 13:22 [00/nn] AArch64 patches preparing for SVE Richard Sandiford
` (7 preceding siblings ...)
2017-10-27 13:29 ` [08/nn] [AArch64] Pass number of units to aarch64_simd_vect_par_cnst_half Richard Sandiford
@ 2017-10-27 13:30 ` Richard Sandiford
2017-11-02 10:00 ` James Greenhalgh
2017-10-27 13:31 ` [10/nn] [AArch64] Minor rtx costs tweak Richard Sandiford
` (2 subsequent siblings)
11 siblings, 1 reply; 29+ messages in thread
From: Richard Sandiford @ 2017-10-27 13:30 UTC (permalink / raw)
To: gcc-patches; +Cc: richard.earnshaw, james.greenhalgh, marcus.shawcroft
This patch passes the number of units to aarch64_expand_vec_perm
and aarch64_expand_vec_perm_const, which avoids a to_constant ()
once GET_MODE_NUNITS is variable.
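(For illustration only, not GCC code: the selector scan that now receives NELT from the caller can be modelled as below.  Each selector index is reduced modulo 2 * NELT, and WHICH accumulates whether the permute reads from the first input (bit 0), the second (bit 1), or both.)

```c
#include <assert.h>

/* Model of the selector classification loop in
   aarch64_expand_vec_perm_const: indices 0..NELT-1 select from the
   first input, NELT..2*NELT-1 from the second.  */
static unsigned int
classify_perm (const int *sel, unsigned int nelt)
{
  unsigned int which = 0;
  for (unsigned int i = 0; i < nelt; i++)
    {
      unsigned int ei = sel[i] & (2 * nelt - 1);
      which |= (ei < nelt ? 1 : 2);
    }
  return which;
}
```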
2017-10-27 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* config/aarch64/aarch64-protos.h (aarch64_expand_vec_perm)
(aarch64_expand_vec_perm_const): Take the number of units too.
* config/aarch64/aarch64.c (aarch64_expand_vec_perm)
(aarch64_expand_vec_perm_const): Likewise.
* config/aarch64/aarch64-simd.md (vec_perm_const<mode>)
(vec_perm<mode>): Update accordingly.
Index: gcc/config/aarch64/aarch64-protos.h
===================================================================
--- gcc/config/aarch64/aarch64-protos.h 2017-10-27 14:12:07.203885483 +0100
+++ gcc/config/aarch64/aarch64-protos.h 2017-10-27 14:12:11.042239887 +0100
@@ -484,11 +484,11 @@ tree aarch64_builtin_rsqrt (unsigned int
tree aarch64_builtin_vectorized_function (unsigned int, tree, tree);
extern void aarch64_split_combinev16qi (rtx operands[3]);
-extern void aarch64_expand_vec_perm (rtx target, rtx op0, rtx op1, rtx sel);
+extern void aarch64_expand_vec_perm (rtx, rtx, rtx, rtx, unsigned int);
extern bool aarch64_madd_needs_nop (rtx_insn *);
extern void aarch64_final_prescan_insn (rtx_insn *);
extern bool
-aarch64_expand_vec_perm_const (rtx target, rtx op0, rtx op1, rtx sel);
+aarch64_expand_vec_perm_const (rtx, rtx, rtx, rtx, unsigned int);
void aarch64_atomic_assign_expand_fenv (tree *, tree *, tree *);
int aarch64_ccmp_mode_to_code (machine_mode mode);
Index: gcc/config/aarch64/aarch64.c
===================================================================
--- gcc/config/aarch64/aarch64.c 2017-10-27 14:12:07.205742901 +0100
+++ gcc/config/aarch64/aarch64.c 2017-10-27 14:12:11.045026014 +0100
@@ -13488,11 +13488,14 @@ aarch64_expand_vec_perm_1 (rtx target, r
}
}
+/* Expand a vec_perm with the operands given by TARGET, OP0, OP1 and SEL.
+ NELT is the number of elements in the vector. */
+
void
-aarch64_expand_vec_perm (rtx target, rtx op0, rtx op1, rtx sel)
+aarch64_expand_vec_perm (rtx target, rtx op0, rtx op1, rtx sel,
+ unsigned int nelt)
{
machine_mode vmode = GET_MODE (target);
- unsigned int nelt = GET_MODE_NUNITS (vmode);
bool one_vector_p = rtx_equal_p (op0, op1);
rtx mask;
@@ -13848,13 +13851,15 @@ aarch64_expand_vec_perm_const_1 (struct
return false;
}
-/* Expand a vec_perm_const pattern. */
+/* Expand a vec_perm_const pattern with the operands given by TARGET,
+ OP0, OP1 and SEL. NELT is the number of elements in the vector. */
bool
-aarch64_expand_vec_perm_const (rtx target, rtx op0, rtx op1, rtx sel)
+aarch64_expand_vec_perm_const (rtx target, rtx op0, rtx op1, rtx sel,
+ unsigned int nelt)
{
struct expand_vec_perm_d d;
- int i, nelt, which;
+ unsigned int i, which;
d.target = target;
d.op0 = op0;
@@ -13864,12 +13869,11 @@ aarch64_expand_vec_perm_const (rtx targe
gcc_assert (VECTOR_MODE_P (d.vmode));
d.testing_p = false;
- nelt = GET_MODE_NUNITS (d.vmode);
d.perm.reserve (nelt);
for (i = which = 0; i < nelt; ++i)
{
rtx e = XVECEXP (sel, 0, i);
- int ei = INTVAL (e) & (2 * nelt - 1);
+ unsigned int ei = INTVAL (e) & (2 * nelt - 1);
which |= (ei < nelt ? 1 : 2);
d.perm.quick_push (ei);
}
Index: gcc/config/aarch64/aarch64-simd.md
===================================================================
--- gcc/config/aarch64/aarch64-simd.md 2017-10-27 14:12:07.203885483 +0100
+++ gcc/config/aarch64/aarch64-simd.md 2017-10-27 14:12:11.043168596 +0100
@@ -5238,7 +5238,7 @@ (define_expand "vec_perm_const<mode>"
"TARGET_SIMD"
{
if (aarch64_expand_vec_perm_const (operands[0], operands[1],
- operands[2], operands[3]))
+ operands[2], operands[3], <nunits>))
DONE;
else
FAIL;
@@ -5252,7 +5252,7 @@ (define_expand "vec_perm<mode>"
"TARGET_SIMD"
{
aarch64_expand_vec_perm (operands[0], operands[1],
- operands[2], operands[3]);
+ operands[2], operands[3], <nunits>);
DONE;
})
* [11/nn] [AArch64] Set NUM_POLY_INT_COEFFS to 2
2017-10-27 13:22 [00/nn] AArch64 patches preparing for SVE Richard Sandiford
` (9 preceding siblings ...)
2017-10-27 13:31 ` [10/nn] [AArch64] Minor rtx costs tweak Richard Sandiford
@ 2017-10-27 13:31 ` Richard Sandiford
2018-01-05 11:27 ` PING: " Richard Sandiford
2017-10-27 13:37 ` [12/nn] [AArch64] Add const_offset field to aarch64_address_info Richard Sandiford
11 siblings, 1 reply; 29+ messages in thread
From: Richard Sandiford @ 2017-10-27 13:31 UTC (permalink / raw)
To: gcc-patches; +Cc: richard.earnshaw, james.greenhalgh, marcus.shawcroft
This patch switches the AArch64 port to use 2 poly_int coefficients
and updates code as necessary to keep it compiling.
One potentially-significant change is to
aarch64_hard_regno_caller_save_mode. The old implementation
was written in a pretty conservative way: it changed the default
behaviour for single-register values, but used the default handling
for multi-register values.
I don't think that's necessary, since the interesting cases for this
macro are usually the single-register ones. Multi-register modes take
up the whole of the constituent registers and the move patterns for all
multi-register modes should be equally good.
Using the original mode for multi-register cases stops us from using
SVE modes to spill multi-register NEON values. This was caught by
gcc.c-torture/execute/pr47538.c.
Also, aarch64_shift_truncation_mask used GET_MODE_BITSIZE - 1.
GET_MODE_UNIT_BITSIZE - 1 is equivalent for the cases that it handles
(which are all scalars), and I think it's more obvious, since if we ever
do use this for elementwise shifts of vector modes, the mask will depend
on the number of bits in each element rather than the number of bits in
the whole vector.
2017-10-27 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* config/aarch64/aarch64-modes.def (NUM_POLY_INT_COEFFS): Set to 2.
* config/aarch64/aarch64-protos.h (aarch64_initial_elimination_offset):
Return a poly_int64 rather than a HOST_WIDE_INT.
(aarch64_offset_7bit_signed_scaled_p): Take the offset as a poly_int64
rather than a HOST_WIDE_INT.
* config/aarch64/aarch64.h (aarch64_frame): Protect with
HAVE_POLY_INT_H rather than HOST_WIDE_INT. Change locals_offset,
hard_fp_offset, frame_size, initial_adjust, callee_offset and
final_adjust from HOST_WIDE_INT to poly_int64.
* config/aarch64/aarch64-builtins.c (aarch64_simd_expand_args): Use
to_constant when getting the number of units in an Advanced SIMD
mode.
(aarch64_builtin_vectorized_function): Check for a constant number
of units.
* config/aarch64/aarch64-simd.md (mov<mode>): Handle polynomial
GET_MODE_SIZE.
(aarch64_ld<VSTRUCT:nregs>_lane<VALLDIF:mode>): Use the nunits
attribute instead of GET_MODE_NUNITS.
* config/aarch64/aarch64.c (aarch64_hard_regno_nregs)
(aarch64_class_max_nregs): Use the constant_lowest_bound of the
GET_MODE_SIZE for fixed-size registers.
(aarch64_hard_regno_call_part_clobbered, aarch64_classify_index)
(aarch64_mode_valid_for_sched_fusion_p, aarch64_classify_address)
(aarch64_legitimize_address_displacement, aarch64_secondary_reload)
(aarch64_print_operand_address, aarch64_address_cost)
(aarch64_register_move_cost, aarch64_short_vector_p)
(aapcs_vfp_sub_candidate, aarch64_simd_attr_length_rglist)
(aarch64_operands_ok_for_ldpstp): Handle polynomial GET_MODE_SIZE.
(aarch64_hard_regno_caller_save_mode): Likewise. Return modes
wider than SImode without modification.
(tls_symbolic_operand_type): Use strip_offset instead of split_const.
(aarch64_pass_by_reference, aarch64_layout_arg, aarch64_pad_reg_upward)
(aarch64_gimplify_va_arg_expr): Assert that we don't yet handle
passing and returning SVE modes.
(aarch64_function_value, aarch64_layout_arg): Use gen_int_mode
rather than GEN_INT.
(aarch64_emit_probe_stack_range): Take the size as a poly_int64
rather than a HOST_WIDE_INT, but call sorry if it isn't constant.
(aarch64_allocate_and_probe_stack_space): Likewise.
(aarch64_layout_frame): Cope with polynomial offsets.
(aarch64_save_callee_saves, aarch64_restore_callee_saves): Take the
start_offset as a poly_int64 rather than a HOST_WIDE_INT. Track
polynomial offsets.
(offset_9bit_signed_unscaled_p, offset_12bit_unsigned_scaled_p)
(aarch64_offset_7bit_signed_scaled_p): Take the offset as a
poly_int64 rather than a HOST_WIDE_INT.
(aarch64_get_separate_components, aarch64_process_components)
(aarch64_expand_prologue, aarch64_expand_epilogue)
(aarch64_use_return_insn_p): Handle polynomial frame offsets.
(aarch64_anchor_offset): New function, split out from...
(aarch64_legitimize_address): ...here.
(aarch64_builtin_vectorization_cost): Handle polynomial
TYPE_VECTOR_SUBPARTS.
(aarch64_simd_check_vect_par_cnst_half): Handle polynomial
GET_MODE_NUNITS.
(aarch64_simd_make_constant, aarch64_expand_vector_init): Get the
number of elements from the PARALLEL rather than the mode.
(aarch64_shift_truncation_mask): Use GET_MODE_UNIT_BITSIZE
rather than GET_MODE_BITSIZE.
(aarch64_evpc_tbl): Use nelt rather than GET_MODE_NUNITS.
(aarch64_move_pointer): Take amount as a poly_int64 rather
than an int.
(aarch64_progress_pointer): Avoid temporary variable.
* config/aarch64/aarch64.md (aarch64_<crc_variant>): Use
the mode attribute instead of GET_MODE.
Index: gcc/config/aarch64/aarch64-modes.def
===================================================================
--- gcc/config/aarch64/aarch64-modes.def 2017-10-27 13:55:34.246963419 +0100
+++ gcc/config/aarch64/aarch64-modes.def 2017-10-27 14:12:17.397395751 +0100
@@ -46,3 +46,7 @@ INT_MODE (XI, 64);
/* Quad float: 128-bit floating mode for long doubles. */
FLOAT_MODE (TF, 16, ieee_quad_format);
+
+/* Coefficient 1 is multiplied by the number of 128-bit chunks in an
+ SVE vector (referred to as "VQ") minus one. */
+#define NUM_POLY_INT_COEFFS 2
Index: gcc/config/aarch64/aarch64-protos.h
===================================================================
--- gcc/config/aarch64/aarch64-protos.h 2017-10-27 14:12:11.042239887 +0100
+++ gcc/config/aarch64/aarch64-protos.h 2017-10-27 14:12:17.398324460 +0100
@@ -333,7 +333,7 @@ enum simd_immediate_check {
extern struct tune_params aarch64_tune_params;
-HOST_WIDE_INT aarch64_initial_elimination_offset (unsigned, unsigned);
+poly_int64 aarch64_initial_elimination_offset (unsigned, unsigned);
int aarch64_get_condition_code (rtx);
bool aarch64_address_valid_for_prefetch_p (rtx, bool);
bool aarch64_bitmask_imm (HOST_WIDE_INT val, machine_mode);
@@ -366,7 +366,7 @@ bool aarch64_zero_extend_const_eq (machi
bool aarch64_move_imm (HOST_WIDE_INT, machine_mode);
bool aarch64_mov_operand_p (rtx, machine_mode);
rtx aarch64_reverse_mask (machine_mode, unsigned int);
-bool aarch64_offset_7bit_signed_scaled_p (machine_mode, HOST_WIDE_INT);
+bool aarch64_offset_7bit_signed_scaled_p (machine_mode, poly_int64);
char *aarch64_output_scalar_simd_mov_immediate (rtx, scalar_int_mode);
char *aarch64_output_simd_mov_immediate (rtx, unsigned,
enum simd_immediate_check w = AARCH64_CHECK_MOV);
Index: gcc/config/aarch64/aarch64.h
===================================================================
--- gcc/config/aarch64/aarch64.h 2017-10-27 14:12:00.603550436 +0100
+++ gcc/config/aarch64/aarch64.h 2017-10-27 14:12:17.402039296 +0100
@@ -551,7 +551,7 @@ #define LIBCALL_VALUE(MODE) \
#define DEFAULT_PCC_STRUCT_RETURN 0
-#ifdef HOST_WIDE_INT
+#ifdef HAVE_POLY_INT_H
struct GTY (()) aarch64_frame
{
HOST_WIDE_INT reg_offset[FIRST_PSEUDO_REGISTER];
@@ -569,20 +569,20 @@ struct GTY (()) aarch64_frame
/* Offset from the base of the frame (incomming SP) to the
top of the locals area. This value is always a multiple of
STACK_BOUNDARY. */
- HOST_WIDE_INT locals_offset;
+ poly_int64 locals_offset;
/* Offset from the base of the frame (incomming SP) to the
hard_frame_pointer. This value is always a multiple of
STACK_BOUNDARY. */
- HOST_WIDE_INT hard_fp_offset;
+ poly_int64 hard_fp_offset;
/* The size of the frame. This value is the offset from base of the
- * frame (incomming SP) to the stack_pointer. This value is always
- * a multiple of STACK_BOUNDARY. */
- HOST_WIDE_INT frame_size;
+ frame (incomming SP) to the stack_pointer. This value is always
+ a multiple of STACK_BOUNDARY. */
+ poly_int64 frame_size;
/* The size of the initial stack adjustment before saving callee-saves. */
- HOST_WIDE_INT initial_adjust;
+ poly_int64 initial_adjust;
/* The writeback value when pushing callee-save registers.
It is zero when no push is used. */
@@ -590,10 +590,10 @@ struct GTY (()) aarch64_frame
/* The offset from SP to the callee-save registers after initial_adjust.
It may be non-zero if no push is used (ie. callee_adjust == 0). */
- HOST_WIDE_INT callee_offset;
+ poly_int64 callee_offset;
/* The size of the stack adjustment after saving callee-saves. */
- HOST_WIDE_INT final_adjust;
+ poly_int64 final_adjust;
/* Store FP,LR and setup a frame pointer. */
bool emit_frame_chain;
Index: gcc/config/aarch64/aarch64-builtins.c
===================================================================
--- gcc/config/aarch64/aarch64-builtins.c 2017-10-27 14:12:00.601693018 +0100
+++ gcc/config/aarch64/aarch64-builtins.c 2017-10-27 14:12:17.397395751 +0100
@@ -1065,9 +1065,9 @@ aarch64_simd_expand_args (rtx target, in
gcc_assert (opc > 1);
if (CONST_INT_P (op[opc]))
{
- aarch64_simd_lane_bounds (op[opc], 0,
- GET_MODE_NUNITS (builtin_mode),
- exp);
+ unsigned int nunits
+ = GET_MODE_NUNITS (builtin_mode).to_constant ();
+ aarch64_simd_lane_bounds (op[opc], 0, nunits, exp);
/* Keep to GCC-vector-extension lane indices in the RTL. */
op[opc] = aarch64_endian_lane_rtx (builtin_mode,
INTVAL (op[opc]));
@@ -1080,8 +1080,9 @@ aarch64_simd_expand_args (rtx target, in
if (CONST_INT_P (op[opc]))
{
machine_mode vmode = insn_data[icode].operand[opc - 1].mode;
- aarch64_simd_lane_bounds (op[opc],
- 0, GET_MODE_NUNITS (vmode), exp);
+ unsigned int nunits
+ = GET_MODE_NUNITS (vmode).to_constant ();
+ aarch64_simd_lane_bounds (op[opc], 0, nunits, exp);
/* Keep to GCC-vector-extension lane indices in the RTL. */
op[opc] = aarch64_endian_lane_rtx (vmode, INTVAL (op[opc]));
}
@@ -1400,16 +1401,17 @@ aarch64_builtin_vectorized_function (uns
tree type_in)
{
machine_mode in_mode, out_mode;
- int in_n, out_n;
+ unsigned HOST_WIDE_INT in_n, out_n;
if (TREE_CODE (type_out) != VECTOR_TYPE
|| TREE_CODE (type_in) != VECTOR_TYPE)
return NULL_TREE;
out_mode = TYPE_MODE (TREE_TYPE (type_out));
- out_n = TYPE_VECTOR_SUBPARTS (type_out);
in_mode = TYPE_MODE (TREE_TYPE (type_in));
- in_n = TYPE_VECTOR_SUBPARTS (type_in);
+ if (!TYPE_VECTOR_SUBPARTS (type_out).is_constant (&out_n)
+ || !TYPE_VECTOR_SUBPARTS (type_in).is_constant (&in_n))
+ return NULL_TREE;
#undef AARCH64_CHECK_BUILTIN_MODE
#define AARCH64_CHECK_BUILTIN_MODE(C, N) 1
Index: gcc/config/aarch64/aarch64-simd.md
===================================================================
--- gcc/config/aarch64/aarch64-simd.md 2017-10-27 14:12:11.043168596 +0100
+++ gcc/config/aarch64/aarch64-simd.md 2017-10-27 14:12:17.398324460 +0100
@@ -31,9 +31,9 @@ (define_expand "mov<mode>"
normal str, so the check need not apply. */
if (GET_CODE (operands[0]) == MEM
&& !(aarch64_simd_imm_zero (operands[1], <MODE>mode)
- && ((GET_MODE_SIZE (<MODE>mode) == 16
+ && ((must_eq (GET_MODE_SIZE (<MODE>mode), 16)
&& aarch64_mem_pair_operand (operands[0], DImode))
- || GET_MODE_SIZE (<MODE>mode) == 8)))
+ || must_eq (GET_MODE_SIZE (<MODE>mode), 8))))
operands[1] = force_reg (<MODE>mode, operands[1]);
"
)
@@ -5180,9 +5180,7 @@ (define_expand "aarch64_ld<VSTRUCT:nregs
set_mem_size (mem, GET_MODE_SIZE (GET_MODE_INNER (<VALLDIF:MODE>mode))
* <VSTRUCT:nregs>);
- aarch64_simd_lane_bounds (operands[3], 0,
- GET_MODE_NUNITS (<VALLDIF:MODE>mode),
- NULL);
+ aarch64_simd_lane_bounds (operands[3], 0, <VALLDIF:nunits>, NULL);
emit_insn (gen_aarch64_vec_load_lanes<VSTRUCT:mode>_lane<VALLDIF:mode> (
operands[0], mem, operands[2], operands[3]));
DONE;
Index: gcc/config/aarch64/aarch64.c
===================================================================
--- gcc/config/aarch64/aarch64.c 2017-10-27 14:12:14.533257115 +0100
+++ gcc/config/aarch64/aarch64.c 2017-10-27 14:13:59.548121066 +0100
@@ -1112,13 +1112,18 @@ aarch64_array_mode_supported_p (machine_
static unsigned int
aarch64_hard_regno_nregs (unsigned regno, machine_mode mode)
{
+ /* ??? Logically we should only need to provide a value when
+ HARD_REGNO_MODE_OK says that the combination is valid,
+ but at the moment we need to handle all modes. Just ignore
+ any runtime parts for registers that can't store them. */
+ HOST_WIDE_INT lowest_size = constant_lower_bound (GET_MODE_SIZE (mode));
switch (aarch64_regno_regclass (regno))
{
case FP_REGS:
case FP_LO_REGS:
- return (GET_MODE_SIZE (mode) + UNITS_PER_VREG - 1) / UNITS_PER_VREG;
+ return CEIL (lowest_size, UNITS_PER_VREG);
default:
- return (GET_MODE_SIZE (mode) + UNITS_PER_WORD - 1) / UNITS_PER_WORD;
+ return CEIL (lowest_size, UNITS_PER_WORD);
}
gcc_unreachable ();
}
@@ -1161,25 +1166,17 @@ aarch64_hard_regno_mode_ok (unsigned reg
static bool
aarch64_hard_regno_call_part_clobbered (unsigned int regno, machine_mode mode)
{
- return FP_REGNUM_P (regno) && GET_MODE_SIZE (mode) > 8;
+ return FP_REGNUM_P (regno) && may_gt (GET_MODE_SIZE (mode), 8);
}
/* Implement HARD_REGNO_CALLER_SAVE_MODE. */
machine_mode
-aarch64_hard_regno_caller_save_mode (unsigned regno, unsigned nregs,
- machine_mode mode)
+aarch64_hard_regno_caller_save_mode (unsigned, unsigned, machine_mode mode)
{
- /* Handle modes that fit within single registers. */
- if (nregs == 1 && GET_MODE_SIZE (mode) <= 16)
- {
- if (GET_MODE_SIZE (mode) >= 4)
- return mode;
- else
- return SImode;
- }
- /* Fall back to generic for multi-reg and very large modes. */
+ if (must_ge (GET_MODE_SIZE (mode), 4))
+ return mode;
else
- return choose_hard_reg_mode (regno, nregs, false);
+ return SImode;
}
/* Implement TARGET_CONSTANT_ALIGNMENT. Make strings word-aligned so
@@ -1292,11 +1289,10 @@ aarch64_tls_get_addr (void)
tls_symbolic_operand_type (rtx addr)
{
enum tls_model tls_kind = TLS_MODEL_NONE;
- rtx sym, addend;
-
if (GET_CODE (addr) == CONST)
{
- split_const (addr, &sym, &addend);
+ poly_int64 addend;
+ rtx sym = strip_offset (addr, &addend);
if (GET_CODE (sym) == SYMBOL_REF)
tls_kind = SYMBOL_REF_TLS_MODEL (sym);
}
@@ -2235,8 +2231,12 @@ aarch64_pass_by_reference (cumulative_ar
int nregs;
/* GET_MODE_SIZE (BLKmode) is useless since it is 0. */
- size = (mode == BLKmode && type)
- ? int_size_in_bytes (type) : (int) GET_MODE_SIZE (mode);
+ if (mode == BLKmode && type)
+ size = int_size_in_bytes (type);
+ else
+ /* No frontends can create types with variable-sized modes, so we
+ shouldn't be asked to pass or return them. */
+ size = GET_MODE_SIZE (mode).to_constant ();
/* Aggregates are passed by reference based on their size. */
if (type && AGGREGATE_TYPE_P (type))
@@ -2333,8 +2333,8 @@ aarch64_function_value (const_tree type,
for (i = 0; i < count; i++)
{
rtx tmp = gen_rtx_REG (ag_mode, V0_REGNUM + i);
- tmp = gen_rtx_EXPR_LIST (VOIDmode, tmp,
- GEN_INT (i * GET_MODE_SIZE (ag_mode)));
+ rtx offset = gen_int_mode (i * GET_MODE_SIZE (ag_mode), Pmode);
+ tmp = gen_rtx_EXPR_LIST (VOIDmode, tmp, offset);
XVECEXP (par, 0, i) = tmp;
}
return par;
@@ -2461,9 +2461,13 @@ aarch64_layout_arg (cumulative_args_t pc
pcum->aapcs_arg_processed = true;
/* Size in bytes, rounded to the nearest multiple of 8 bytes. */
- size
- = ROUND_UP (type ? int_size_in_bytes (type) : GET_MODE_SIZE (mode),
- UNITS_PER_WORD);
+ if (type)
+ size = int_size_in_bytes (type);
+ else
+ /* No frontends can create types with variable-sized modes, so we
+ shouldn't be asked to pass or return them. */
+ size = GET_MODE_SIZE (mode).to_constant ();
+ size = ROUND_UP (size, UNITS_PER_WORD);
allocate_ncrn = (type) ? !(FLOAT_TYPE_P (type)) : !FLOAT_MODE_P (mode);
allocate_nvrn = aarch64_vfp_is_call_candidate (pcum_v,
@@ -2500,9 +2504,9 @@ aarch64_layout_arg (cumulative_args_t pc
{
rtx tmp = gen_rtx_REG (pcum->aapcs_vfp_rmode,
V0_REGNUM + nvrn + i);
- tmp = gen_rtx_EXPR_LIST
- (VOIDmode, tmp,
- GEN_INT (i * GET_MODE_SIZE (pcum->aapcs_vfp_rmode)));
+ rtx offset = gen_int_mode
+ (i * GET_MODE_SIZE (pcum->aapcs_vfp_rmode), Pmode);
+ tmp = gen_rtx_EXPR_LIST (VOIDmode, tmp, offset);
XVECEXP (par, 0, i) = tmp;
}
pcum->aapcs_reg = par;
@@ -2727,8 +2731,13 @@ aarch64_pad_reg_upward (machine_mode mod
/* Small composite types are always padded upward. */
if (BYTES_BIG_ENDIAN && aarch64_composite_type_p (type, mode))
{
- HOST_WIDE_INT size = (type ? int_size_in_bytes (type)
- : GET_MODE_SIZE (mode));
+ HOST_WIDE_INT size;
+ if (type)
+ size = int_size_in_bytes (type);
+ else
+ /* No frontends can create types with variable-sized modes, so we
+ shouldn't be asked to pass or return them. */
+ size = GET_MODE_SIZE (mode).to_constant ();
if (size < 2 * UNITS_PER_WORD)
return true;
}
@@ -2757,12 +2766,19 @@ #define ARITH_FACTOR 4096
#define PROBE_STACK_FIRST_REG 9
#define PROBE_STACK_SECOND_REG 10
-/* Emit code to probe a range of stack addresses from FIRST to FIRST+SIZE,
+/* Emit code to probe a range of stack addresses from FIRST to FIRST+POLY_SIZE,
inclusive. These are offsets from the current stack pointer. */
static void
-aarch64_emit_probe_stack_range (HOST_WIDE_INT first, HOST_WIDE_INT size)
+aarch64_emit_probe_stack_range (HOST_WIDE_INT first, poly_int64 poly_size)
{
+ HOST_WIDE_INT size;
+ if (!poly_size.is_constant (&size))
+ {
+ sorry ("stack probes for SVE frames");
+ return;
+ }
+
rtx reg1 = gen_rtx_REG (Pmode, PROBE_STACK_FIRST_REG);
/* See the same assertion on PROBE_INTERVAL above. */
@@ -3055,13 +3071,16 @@ #define SLOT_REQUIRED (-1)
= offset + cfun->machine->frame.saved_varargs_size;
cfun->machine->frame.hard_fp_offset
- = ROUND_UP (varargs_and_saved_regs_size + get_frame_size (),
- STACK_BOUNDARY / BITS_PER_UNIT);
+ = aligned_upper_bound (varargs_and_saved_regs_size
+ + get_frame_size (),
+ STACK_BOUNDARY / BITS_PER_UNIT);
+ /* Both these values are already aligned. */
+ gcc_assert (multiple_p (crtl->outgoing_args_size,
+ STACK_BOUNDARY / BITS_PER_UNIT));
cfun->machine->frame.frame_size
- = ROUND_UP (cfun->machine->frame.hard_fp_offset
- + crtl->outgoing_args_size,
- STACK_BOUNDARY / BITS_PER_UNIT);
+ = (cfun->machine->frame.hard_fp_offset
+ + crtl->outgoing_args_size);
cfun->machine->frame.locals_offset = cfun->machine->frame.saved_varargs_size;
@@ -3076,18 +3095,21 @@ #define SLOT_REQUIRED (-1)
else if (cfun->machine->frame.wb_candidate1 != INVALID_REGNUM)
max_push_offset = 256;
- if (cfun->machine->frame.frame_size < max_push_offset
- && crtl->outgoing_args_size == 0)
+ HOST_WIDE_INT const_size, const_fp_offset;
+ if (cfun->machine->frame.frame_size.is_constant (&const_size)
+ && const_size < max_push_offset
+ && must_eq (crtl->outgoing_args_size, 0))
{
/* Simple, small frame with no outgoing arguments:
stp reg1, reg2, [sp, -frame_size]!
stp reg3, reg4, [sp, 16] */
- cfun->machine->frame.callee_adjust = cfun->machine->frame.frame_size;
+ cfun->machine->frame.callee_adjust = const_size;
}
- else if ((crtl->outgoing_args_size
- + cfun->machine->frame.saved_regs_size < 512)
+ else if (must_lt (crtl->outgoing_args_size
+ + cfun->machine->frame.saved_regs_size, 512)
&& !(cfun->calls_alloca
- && cfun->machine->frame.hard_fp_offset < max_push_offset))
+ && must_lt (cfun->machine->frame.hard_fp_offset,
+ max_push_offset)))
{
/* Frame with small outgoing arguments:
sub sp, sp, frame_size
@@ -3097,13 +3119,14 @@ #define SLOT_REQUIRED (-1)
cfun->machine->frame.callee_offset
= cfun->machine->frame.frame_size - cfun->machine->frame.hard_fp_offset;
}
- else if (cfun->machine->frame.hard_fp_offset < max_push_offset)
+ else if (cfun->machine->frame.hard_fp_offset.is_constant (&const_fp_offset)
+ && const_fp_offset < max_push_offset)
{
/* Frame with large outgoing arguments but a small local area:
stp reg1, reg2, [sp, -hard_fp_offset]!
stp reg3, reg4, [sp, 16]
sub sp, sp, outgoing_args_size */
- cfun->machine->frame.callee_adjust = cfun->machine->frame.hard_fp_offset;
+ cfun->machine->frame.callee_adjust = const_fp_offset;
cfun->machine->frame.final_adjust
= cfun->machine->frame.frame_size - cfun->machine->frame.callee_adjust;
}
@@ -3316,7 +3339,7 @@ aarch64_return_address_signing_enabled (
skipping any write-back candidates if SKIP_WB is true. */
static void
-aarch64_save_callee_saves (machine_mode mode, HOST_WIDE_INT start_offset,
+aarch64_save_callee_saves (machine_mode mode, poly_int64 start_offset,
unsigned start, unsigned limit, bool skip_wb)
{
rtx_insn *insn;
@@ -3328,7 +3351,7 @@ aarch64_save_callee_saves (machine_mode
regno = aarch64_next_callee_save (regno + 1, limit))
{
rtx reg, mem;
- HOST_WIDE_INT offset;
+ poly_int64 offset;
if (skip_wb
&& (regno == cfun->machine->frame.wb_candidate1
@@ -3381,13 +3404,13 @@ aarch64_save_callee_saves (machine_mode
static void
aarch64_restore_callee_saves (machine_mode mode,
- HOST_WIDE_INT start_offset, unsigned start,
+ poly_int64 start_offset, unsigned start,
unsigned limit, bool skip_wb, rtx *cfi_ops)
{
rtx base_rtx = stack_pointer_rtx;
unsigned regno;
unsigned regno2;
- HOST_WIDE_INT offset;
+ poly_int64 offset;
for (regno = aarch64_next_callee_save (start, limit);
regno <= limit;
@@ -3432,25 +3455,27 @@ aarch64_restore_callee_saves (machine_mo
static inline bool
offset_9bit_signed_unscaled_p (machine_mode mode ATTRIBUTE_UNUSED,
- HOST_WIDE_INT offset)
+ poly_int64 offset)
{
- return offset >= -256 && offset < 256;
+ HOST_WIDE_INT const_offset;
+ return (offset.is_constant (&const_offset)
+ && IN_RANGE (const_offset, -256, 255));
}
static inline bool
-offset_12bit_unsigned_scaled_p (machine_mode mode, HOST_WIDE_INT offset)
+offset_12bit_unsigned_scaled_p (machine_mode mode, poly_int64 offset)
{
- return (offset >= 0
- && offset < 4096 * GET_MODE_SIZE (mode)
- && offset % GET_MODE_SIZE (mode) == 0);
+ HOST_WIDE_INT multiple;
+ return (constant_multiple_p (offset, GET_MODE_SIZE (mode), &multiple)
+ && IN_RANGE (multiple, 0, 4095));
}
bool
-aarch64_offset_7bit_signed_scaled_p (machine_mode mode, HOST_WIDE_INT offset)
+aarch64_offset_7bit_signed_scaled_p (machine_mode mode, poly_int64 offset)
{
- return (offset >= -64 * GET_MODE_SIZE (mode)
- && offset < 64 * GET_MODE_SIZE (mode)
- && offset % GET_MODE_SIZE (mode) == 0);
+ HOST_WIDE_INT multiple;
+ return (constant_multiple_p (offset, GET_MODE_SIZE (mode), &multiple)
+ && IN_RANGE (multiple, -64, 63));
}
/* Implement TARGET_SHRINK_WRAP_GET_SEPARATE_COMPONENTS. */
@@ -3467,7 +3492,7 @@ aarch64_get_separate_components (void)
for (unsigned regno = 0; regno <= LAST_SAVED_REGNUM; regno++)
if (aarch64_register_saved_on_entry (regno))
{
- HOST_WIDE_INT offset = cfun->machine->frame.reg_offset[regno];
+ poly_int64 offset = cfun->machine->frame.reg_offset[regno];
if (!frame_pointer_needed)
offset += cfun->machine->frame.frame_size
- cfun->machine->frame.hard_fp_offset;
@@ -3571,7 +3596,7 @@ aarch64_process_components (sbitmap comp
so DFmode for the vector registers is enough. */
machine_mode mode = GP_REGNUM_P (regno) ? E_DImode : E_DFmode;
rtx reg = gen_rtx_REG (mode, regno);
- HOST_WIDE_INT offset = cfun->machine->frame.reg_offset[regno];
+ poly_int64 offset = cfun->machine->frame.reg_offset[regno];
if (!frame_pointer_needed)
offset += cfun->machine->frame.frame_size
- cfun->machine->frame.hard_fp_offset;
@@ -3593,13 +3618,13 @@ aarch64_process_components (sbitmap comp
break;
}
- HOST_WIDE_INT offset2 = cfun->machine->frame.reg_offset[regno2];
+ poly_int64 offset2 = cfun->machine->frame.reg_offset[regno2];
/* The next register is not of the same class or its offset is not
mergeable with the current one into a pair. */
if (!satisfies_constraint_Ump (mem)
|| GP_REGNUM_P (regno) != GP_REGNUM_P (regno2)
- || (offset2 - cfun->machine->frame.reg_offset[regno])
- != GET_MODE_SIZE (mode))
+ || may_ne ((offset2 - cfun->machine->frame.reg_offset[regno]),
+ GET_MODE_SIZE (mode)))
{
insn = emit_insn (set);
RTX_FRAME_RELATED_P (insn) = 1;
@@ -3669,11 +3694,19 @@ aarch64_set_handled_components (sbitmap
cfun->machine->reg_is_wrapped_separately[regno] = true;
}
-/* Allocate SIZE bytes of stack space using TEMP1 as a scratch register. */
+/* Allocate POLY_SIZE bytes of stack space using TEMP1 as a scratch
+ register. */
static void
-aarch64_allocate_and_probe_stack_space (rtx temp1, HOST_WIDE_INT size)
+aarch64_allocate_and_probe_stack_space (rtx temp1, poly_int64 poly_size)
{
+ HOST_WIDE_INT size;
+ if (!poly_size.is_constant (&size))
+ {
+ sorry ("stack probes for SVE frames");
+ return;
+ }
+
HOST_WIDE_INT probe_interval
= 1 << PARAM_VALUE (PARAM_STACK_CLASH_PROTECTION_PROBE_INTERVAL);
HOST_WIDE_INT guard_size
@@ -3833,11 +3866,11 @@ aarch64_expand_prologue (void)
{
aarch64_layout_frame ();
- HOST_WIDE_INT frame_size = cfun->machine->frame.frame_size;
- HOST_WIDE_INT initial_adjust = cfun->machine->frame.initial_adjust;
+ poly_int64 frame_size = cfun->machine->frame.frame_size;
+ poly_int64 initial_adjust = cfun->machine->frame.initial_adjust;
HOST_WIDE_INT callee_adjust = cfun->machine->frame.callee_adjust;
- HOST_WIDE_INT final_adjust = cfun->machine->frame.final_adjust;
- HOST_WIDE_INT callee_offset = cfun->machine->frame.callee_offset;
+ poly_int64 final_adjust = cfun->machine->frame.final_adjust;
+ poly_int64 callee_offset = cfun->machine->frame.callee_offset;
unsigned reg1 = cfun->machine->frame.wb_candidate1;
unsigned reg2 = cfun->machine->frame.wb_candidate2;
bool emit_frame_chain = cfun->machine->frame.emit_frame_chain;
@@ -3852,19 +3885,19 @@ aarch64_expand_prologue (void)
}
if (flag_stack_usage_info)
- current_function_static_stack_size = frame_size;
+ current_function_static_stack_size = constant_lower_bound (frame_size);
if (flag_stack_check == STATIC_BUILTIN_STACK_CHECK)
{
if (crtl->is_leaf && !cfun->calls_alloca)
{
- if (frame_size > PROBE_INTERVAL
- && frame_size > get_stack_check_protect ())
+ if (may_gt (frame_size, PROBE_INTERVAL)
+ && may_gt (frame_size, get_stack_check_protect ()))
aarch64_emit_probe_stack_range (get_stack_check_protect (),
(frame_size
- get_stack_check_protect ()));
}
- else if (frame_size > 0)
+ else if (may_gt (frame_size, 0))
aarch64_emit_probe_stack_range (get_stack_check_protect (), frame_size);
}
@@ -3899,23 +3932,23 @@ aarch64_expand_prologue (void)
HOST_WIDE_INT guard_used_by_caller = 1024;
if (flag_stack_clash_protection)
{
- if (frame_size == 0)
+ if (must_eq (frame_size, 0))
dump_stack_clash_frame_info (NO_PROBE_NO_FRAME, false);
- else if (initial_adjust < guard_size - guard_used_by_caller
- && final_adjust < guard_size - guard_used_by_caller)
+ else if (must_lt (initial_adjust, guard_size - guard_used_by_caller)
+ && must_lt (final_adjust, guard_size - guard_used_by_caller))
dump_stack_clash_frame_info (NO_PROBE_SMALL_FRAME, true);
}
/* In theory we should never have both an initial adjustment
and a callee save adjustment. Verify that is the case since the
code below does not handle it for -fstack-clash-protection. */
- gcc_assert (initial_adjust == 0 || callee_adjust == 0);
+ gcc_assert (must_eq (initial_adjust, 0) || callee_adjust == 0);
/* Only probe if the initial adjustment is larger than the guard
less the amount of the guard reserved for use by the caller's
outgoing args. */
if (flag_stack_clash_protection
- && initial_adjust >= guard_size - guard_used_by_caller)
+ && may_ge (initial_adjust, guard_size - guard_used_by_caller))
aarch64_allocate_and_probe_stack_space (ip0_rtx, initial_adjust);
else
aarch64_sub_sp (ip0_rtx, initial_adjust, true);
@@ -3940,19 +3973,19 @@ aarch64_expand_prologue (void)
callee_adjust != 0 || emit_frame_chain);
/* We may need to probe the final adjustment as well. */
- if (flag_stack_clash_protection && final_adjust != 0)
+ if (flag_stack_clash_protection && may_ne (final_adjust, 0))
{
/* First probe if the final adjustment is larger than the guard size
less the amount of the guard reserved for use by the caller's
outgoing args. */
- if (final_adjust >= guard_size - guard_used_by_caller)
+ if (may_ge (final_adjust, guard_size - guard_used_by_caller))
aarch64_allocate_and_probe_stack_space (ip1_rtx, final_adjust);
else
aarch64_sub_sp (ip1_rtx, final_adjust, !frame_pointer_needed);
/* We must also probe if the final adjustment is larger than the guard
that is assumed used by the caller. This may be sub-optimal. */
- if (final_adjust >= guard_used_by_caller)
+ if (may_ge (final_adjust, guard_used_by_caller))
{
if (dump_file)
fprintf (dump_file,
@@ -3981,7 +4014,7 @@ aarch64_use_return_insn_p (void)
aarch64_layout_frame ();
- return cfun->machine->frame.frame_size == 0;
+ return must_eq (cfun->machine->frame.frame_size, 0);
}
/* Generate the epilogue instructions for returning from a function.
@@ -3994,21 +4027,22 @@ aarch64_expand_epilogue (bool for_sibcal
{
aarch64_layout_frame ();
- HOST_WIDE_INT initial_adjust = cfun->machine->frame.initial_adjust;
+ poly_int64 initial_adjust = cfun->machine->frame.initial_adjust;
HOST_WIDE_INT callee_adjust = cfun->machine->frame.callee_adjust;
- HOST_WIDE_INT final_adjust = cfun->machine->frame.final_adjust;
- HOST_WIDE_INT callee_offset = cfun->machine->frame.callee_offset;
+ poly_int64 final_adjust = cfun->machine->frame.final_adjust;
+ poly_int64 callee_offset = cfun->machine->frame.callee_offset;
unsigned reg1 = cfun->machine->frame.wb_candidate1;
unsigned reg2 = cfun->machine->frame.wb_candidate2;
rtx cfi_ops = NULL;
rtx_insn *insn;
/* We need to add memory barrier to prevent read from deallocated stack. */
- bool need_barrier_p = (get_frame_size ()
- + cfun->machine->frame.saved_varargs_size) != 0;
+ bool need_barrier_p = may_ne (get_frame_size ()
+ + cfun->machine->frame.saved_varargs_size, 0);
/* Emit a barrier to prevent loads from a deallocated stack. */
- if (final_adjust > crtl->outgoing_args_size || cfun->calls_alloca
+ if (may_gt (final_adjust, crtl->outgoing_args_size)
+ || cfun->calls_alloca
|| crtl->calls_eh_return)
{
emit_insn (gen_stack_tie (stack_pointer_rtx, stack_pointer_rtx));
@@ -4019,7 +4053,7 @@ aarch64_expand_epilogue (bool for_sibcal
be the same as the stack pointer. */
rtx ip0_rtx = gen_rtx_REG (Pmode, IP0_REGNUM);
rtx ip1_rtx = gen_rtx_REG (Pmode, IP1_REGNUM);
- if (frame_pointer_needed && (final_adjust || cfun->calls_alloca))
+ if (frame_pointer_needed && (may_ne (final_adjust, 0) || cfun->calls_alloca))
/* If writeback is used when restoring callee-saves, the CFA
is restored on the instruction doing the writeback. */
aarch64_add_offset (Pmode, stack_pointer_rtx,
@@ -4043,7 +4077,7 @@ aarch64_expand_epilogue (bool for_sibcal
if (callee_adjust != 0)
aarch64_pop_regs (reg1, reg2, callee_adjust, &cfi_ops);
- if (callee_adjust != 0 || initial_adjust > 65536)
+ if (callee_adjust != 0 || may_gt (initial_adjust, 65536))
{
/* Emit delayed restores and set the CFA to be SP + initial_adjust. */
insn = get_last_insn ();
@@ -4644,9 +4678,9 @@ aarch64_classify_index (struct aarch64_a
&& contains_reg_of_mode[GENERAL_REGS][GET_MODE (SUBREG_REG (index))])
index = SUBREG_REG (index);
- if ((shift == 0 ||
- (shift > 0 && shift <= 3
- && (1 << shift) == GET_MODE_SIZE (mode)))
+ if ((shift == 0
+ || (shift > 0 && shift <= 3
+ && must_eq (1 << shift, GET_MODE_SIZE (mode))))
&& REG_P (index)
&& aarch64_regno_ok_for_index_p (REGNO (index), strict_p))
{
@@ -4668,7 +4702,7 @@ aarch64_mode_valid_for_sched_fusion_p (m
return mode == SImode || mode == DImode
|| mode == SFmode || mode == DFmode
|| (aarch64_vector_mode_supported_p (mode)
- && GET_MODE_SIZE (mode) == 8);
+ && must_eq (GET_MODE_SIZE (mode), 8));
}
/* Return true if REGNO is a virtual pointer register, or an eliminable
@@ -4694,6 +4728,7 @@ aarch64_classify_address (struct aarch64
{
enum rtx_code code = GET_CODE (x);
rtx op0, op1;
+ HOST_WIDE_INT const_size;
/* On BE, we use load/store pair for all large int mode load/stores.
TI/TFmode may also use a load/store pair. */
@@ -4703,10 +4738,10 @@ aarch64_classify_address (struct aarch64
|| (BYTES_BIG_ENDIAN
&& aarch64_vect_struct_mode_p (mode)));
- bool allow_reg_index_p =
- !load_store_pair_p
- && (GET_MODE_SIZE (mode) != 16 || aarch64_vector_mode_supported_p (mode))
- && !aarch64_vect_struct_mode_p (mode);
+ bool allow_reg_index_p = (!load_store_pair_p
+ && (may_ne (GET_MODE_SIZE (mode), 16)
+ || aarch64_vector_mode_supported_p (mode))
+ && !aarch64_vect_struct_mode_p (mode));
/* On LE, for AdvSIMD, don't support anything other than POST_INC or
REG addressing. */
@@ -4739,7 +4774,7 @@ aarch64_classify_address (struct aarch64
return true;
}
- if (GET_MODE_SIZE (mode) != 0
+ if (may_ne (GET_MODE_SIZE (mode), 0)
&& CONST_INT_P (op1)
&& aarch64_base_register_rtx_p (op0, strict_p))
{
@@ -4786,7 +4821,8 @@ aarch64_classify_address (struct aarch64
offset + 32));
if (load_store_pair_p)
- return ((GET_MODE_SIZE (mode) == 4 || GET_MODE_SIZE (mode) == 8)
+ return ((must_eq (GET_MODE_SIZE (mode), 4)
+ || must_eq (GET_MODE_SIZE (mode), 8))
&& aarch64_offset_7bit_signed_scaled_p (mode, offset));
else
return (offset_9bit_signed_unscaled_p (mode, offset)
@@ -4846,7 +4882,8 @@ aarch64_classify_address (struct aarch64
&& offset_9bit_signed_unscaled_p (mode, offset));
if (load_store_pair_p)
- return ((GET_MODE_SIZE (mode) == 4 || GET_MODE_SIZE (mode) == 8)
+ return ((must_eq (GET_MODE_SIZE (mode), 4)
+ || must_eq (GET_MODE_SIZE (mode), 8))
&& aarch64_offset_7bit_signed_scaled_p (mode, offset));
else
return offset_9bit_signed_unscaled_p (mode, offset);
@@ -4860,7 +4897,9 @@ aarch64_classify_address (struct aarch64
for SI mode or larger. */
info->type = ADDRESS_SYMBOLIC;
- if (!load_store_pair_p && GET_MODE_SIZE (mode) >= 4)
+ if (!load_store_pair_p
+ && GET_MODE_SIZE (mode).is_constant (&const_size)
+ && const_size >= 4)
{
rtx sym, addend;
@@ -4886,7 +4925,6 @@ aarch64_classify_address (struct aarch64
{
/* The symbol and offset must be aligned to the access size. */
unsigned int align;
- unsigned int ref_size;
if (CONSTANT_POOL_ADDRESS_P (sym))
align = GET_MODE_ALIGNMENT (get_pool_mode (sym));
@@ -4904,12 +4942,12 @@ aarch64_classify_address (struct aarch64
else
align = BITS_PER_UNIT;
- ref_size = GET_MODE_SIZE (mode);
- if (ref_size == 0)
+ poly_int64 ref_size = GET_MODE_SIZE (mode);
+ if (must_eq (ref_size, 0))
ref_size = GET_MODE_SIZE (DImode);
- return ((INTVAL (offs) & (ref_size - 1)) == 0
- && ((align / BITS_PER_UNIT) & (ref_size - 1)) == 0);
+ return (multiple_p (INTVAL (offs), ref_size)
+ && multiple_p (align / BITS_PER_UNIT, ref_size));
}
}
return false;
@@ -4987,19 +5025,24 @@ aarch64_legitimate_address_p (machine_mo
static bool
aarch64_legitimize_address_displacement (rtx *disp, rtx *off, machine_mode mode)
{
- HOST_WIDE_INT offset = INTVAL (*disp);
- HOST_WIDE_INT base;
+ HOST_WIDE_INT size;
+ if (GET_MODE_SIZE (mode).is_constant (&size))
+ {
+ HOST_WIDE_INT offset = INTVAL (*disp);
+ HOST_WIDE_INT base;
- if (mode == TImode || mode == TFmode)
- base = (offset + 0x100) & ~0x1f8;
- else if ((offset & (GET_MODE_SIZE (mode) - 1)) != 0)
- base = (offset + 0x100) & ~0x1ff;
- else
- base = offset & ~(GET_MODE_SIZE (mode) < 4 ? 0xfff : 0x3ffc);
+ if (mode == TImode || mode == TFmode)
+ base = (offset + 0x100) & ~0x1f8;
+ else if ((offset & (size - 1)) != 0)
+ base = (offset + 0x100) & ~0x1ff;
+ else
+ base = offset & ~(size < 4 ? 0xfff : 0x3ffc);
- *off = GEN_INT (base);
- *disp = GEN_INT (offset - base);
- return true;
+ *off = GEN_INT (base);
+ *disp = GEN_INT (offset - base);
+ return true;
+ }
+ return false;
}
/* Return the binary representation of floating point constant VALUE in INTVAL.
@@ -5850,6 +5893,7 @@ #define buf_size 20
aarch64_print_operand_address (FILE *f, machine_mode mode, rtx x)
{
struct aarch64_address_info addr;
+ unsigned int size;
if (aarch64_classify_address (&addr, x, mode, true))
switch (addr.type)
@@ -5890,30 +5934,28 @@ aarch64_print_operand_address (FILE *f,
return;
case ADDRESS_REG_WB:
+ /* Writeback is only supported for fixed-width modes. */
+ size = GET_MODE_SIZE (mode).to_constant ();
switch (GET_CODE (x))
{
case PRE_INC:
- asm_fprintf (f, "[%s, %d]!", reg_names [REGNO (addr.base)],
- GET_MODE_SIZE (mode));
+ asm_fprintf (f, "[%s, %d]!", reg_names[REGNO (addr.base)], size);
return;
case POST_INC:
- asm_fprintf (f, "[%s], %d", reg_names [REGNO (addr.base)],
- GET_MODE_SIZE (mode));
+ asm_fprintf (f, "[%s], %d", reg_names[REGNO (addr.base)], size);
return;
case PRE_DEC:
- asm_fprintf (f, "[%s, -%d]!", reg_names [REGNO (addr.base)],
- GET_MODE_SIZE (mode));
+ asm_fprintf (f, "[%s, -%d]!", reg_names[REGNO (addr.base)], size);
return;
case POST_DEC:
- asm_fprintf (f, "[%s], -%d", reg_names [REGNO (addr.base)],
- GET_MODE_SIZE (mode));
+ asm_fprintf (f, "[%s], -%d", reg_names[REGNO (addr.base)], size);
return;
case PRE_MODIFY:
- asm_fprintf (f, "[%s, %wd]!", reg_names [REGNO (addr.base)],
+ asm_fprintf (f, "[%s, %wd]!", reg_names[REGNO (addr.base)],
INTVAL (addr.offset));
return;
case POST_MODIFY:
- asm_fprintf (f, "[%s], %wd", reg_names [REGNO (addr.base)],
+ asm_fprintf (f, "[%s], %wd", reg_names[REGNO (addr.base)],
INTVAL (addr.offset));
return;
default:
@@ -5988,6 +6030,39 @@ aarch64_regno_regclass (unsigned regno)
return NO_REGS;
}
+/* OFFSET is an address offset for mode MODE, which has SIZE bytes.
+ If OFFSET is out of range, return an offset of an anchor point
+ that is in range. Return 0 otherwise. */
+
+static HOST_WIDE_INT
+aarch64_anchor_offset (HOST_WIDE_INT offset, HOST_WIDE_INT size,
+ machine_mode mode)
+{
+ /* Does it look like we'll need a 16-byte load/store-pair operation? */
+ if (size > 16)
+ return (offset + 0x400) & ~0x7f0;
+
+ /* For offsets that aren't a multiple of the access size, the limit is
+ -256...255. */
+ if (offset & (size - 1))
+ {
+ /* BLKmode typically uses LDP of X-registers. */
+ if (mode == BLKmode)
+ return (offset + 512) & ~0x3ff;
+ return (offset + 0x100) & ~0x1ff;
+ }
+
+ /* Small negative offsets are supported. */
+ if (IN_RANGE (offset, -256, 0))
+ return 0;
+
+ if (mode == TImode || mode == TFmode)
+ return (offset + 0x100) & ~0x1ff;
+
+ /* Use 12-bit offset by access size. */
+ return offset & (~0xfff * size);
+}
+
static rtx
aarch64_legitimize_address (rtx x, rtx /* orig_x */, machine_mode mode)
{
@@ -6037,34 +6112,17 @@ aarch64_legitimize_address (rtx x, rtx /
x = gen_rtx_PLUS (Pmode, base, offset_rtx);
}
- /* Does it look like we'll need a 16-byte load/store-pair operation? */
- HOST_WIDE_INT base_offset;
- if (GET_MODE_SIZE (mode) > 16)
- base_offset = (offset + 0x400) & ~0x7f0;
- /* For offsets aren't a multiple of the access size, the limit is
- -256...255. */
- else if (offset & (GET_MODE_SIZE (mode) - 1))
- {
- base_offset = (offset + 0x100) & ~0x1ff;
-
- /* BLKmode typically uses LDP of X-registers. */
- if (mode == BLKmode)
- base_offset = (offset + 512) & ~0x3ff;
- }
- /* Small negative offsets are supported. */
- else if (IN_RANGE (offset, -256, 0))
- base_offset = 0;
- else if (mode == TImode || mode == TFmode)
- base_offset = (offset + 0x100) & ~0x1ff;
- /* Use 12-bit offset by access size. */
- else
- base_offset = offset & (~0xfff * GET_MODE_SIZE (mode));
-
- if (base_offset != 0)
+ HOST_WIDE_INT size;
+ if (GET_MODE_SIZE (mode).is_constant (&size))
{
- base = plus_constant (Pmode, base, base_offset);
- base = force_operand (base, NULL_RTX);
- return plus_constant (Pmode, base, offset - base_offset);
+ HOST_WIDE_INT base_offset = aarch64_anchor_offset (offset, size,
+ mode);
+ if (base_offset != 0)
+ {
+ base = plus_constant (Pmode, base, base_offset);
+ base = force_operand (base, NULL_RTX);
+ return plus_constant (Pmode, base, offset - base_offset);
+ }
}
}
@@ -6151,7 +6209,7 @@ aarch64_secondary_reload (bool in_p ATTR
because AArch64 has richer addressing modes for LDR/STR instructions
than LDP/STP instructions. */
if (TARGET_FLOAT && rclass == GENERAL_REGS
- && GET_MODE_SIZE (mode) == 16 && MEM_P (x))
+ && must_eq (GET_MODE_SIZE (mode), 16) && MEM_P (x))
return FP_REGS;
if (rclass == FP_REGS && (mode == TImode || mode == TFmode) && CONSTANT_P(x))
@@ -6195,7 +6253,7 @@ aarch64_can_eliminate (const int from, c
return true;
}
-HOST_WIDE_INT
+poly_int64
aarch64_initial_elimination_offset (unsigned from, unsigned to)
{
aarch64_layout_frame ();
@@ -6281,6 +6339,11 @@ aarch64_trampoline_init (rtx m_tramp, tr
static unsigned char
aarch64_class_max_nregs (reg_class_t regclass, machine_mode mode)
{
+ /* ??? Logically we should only need to provide a value when
+ HARD_REGNO_MODE_OK says that at least one register in REGCLASS
+ can hold MODE, but at the moment we need to handle all modes.
+ Just ignore any runtime parts for registers that can't store them. */
+ HOST_WIDE_INT lowest_size = constant_lower_bound (GET_MODE_SIZE (mode));
switch (regclass)
{
case CALLER_SAVE_REGS:
@@ -6290,10 +6353,9 @@ aarch64_class_max_nregs (reg_class_t reg
case POINTER_AND_FP_REGS:
case FP_REGS:
case FP_LO_REGS:
- return
- aarch64_vector_mode_p (mode)
- ? (GET_MODE_SIZE (mode) + UNITS_PER_VREG - 1) / UNITS_PER_VREG
- : (GET_MODE_SIZE (mode) + UNITS_PER_WORD - 1) / UNITS_PER_WORD;
+ return (aarch64_vector_mode_p (mode)
+ ? CEIL (lowest_size, UNITS_PER_VREG)
+ : CEIL (lowest_size, UNITS_PER_WORD));
case STACK_REG:
return 1;
@@ -6844,25 +6906,15 @@ aarch64_address_cost (rtx x,
{
/* For the sake of calculating the cost of the shifted register
component, we can treat same sized modes in the same way. */
- switch (GET_MODE_BITSIZE (mode))
- {
- case 16:
- cost += addr_cost->addr_scale_costs.hi;
- break;
-
- case 32:
- cost += addr_cost->addr_scale_costs.si;
- break;
-
- case 64:
- cost += addr_cost->addr_scale_costs.di;
- break;
-
- /* We can't tell, or this is a 128-bit vector. */
- default:
- cost += addr_cost->addr_scale_costs.ti;
- break;
- }
+ if (must_eq (GET_MODE_BITSIZE (mode), 16))
+ cost += addr_cost->addr_scale_costs.hi;
+ else if (must_eq (GET_MODE_BITSIZE (mode), 32))
+ cost += addr_cost->addr_scale_costs.si;
+ else if (must_eq (GET_MODE_BITSIZE (mode), 64))
+ cost += addr_cost->addr_scale_costs.di;
+ else
+ /* We can't tell, or this is a 128-bit vector. */
+ cost += addr_cost->addr_scale_costs.ti;
}
return cost;
@@ -7991,7 +8043,8 @@ aarch64_rtx_costs (rtx x, machine_mode m
if (GET_CODE (op1) == AND && REG_P (XEXP (op1, 0))
&& CONST_INT_P (XEXP (op1, 1))
- && INTVAL (XEXP (op1, 1)) == GET_MODE_BITSIZE (mode) - 1)
+ && must_eq (INTVAL (XEXP (op1, 1)),
+ GET_MODE_BITSIZE (mode) - 1))
{
*cost += rtx_cost (op0, mode, (rtx_code) code, 0, speed);
/* We already demanded XEXP (op1, 0) to be REG_P, so
@@ -8039,7 +8092,8 @@ aarch64_rtx_costs (rtx x, machine_mode m
if (GET_CODE (op1) == AND && REG_P (XEXP (op1, 0))
&& CONST_INT_P (XEXP (op1, 1))
- && INTVAL (XEXP (op1, 1)) == GET_MODE_BITSIZE (mode) - 1)
+ && must_eq (INTVAL (XEXP (op1, 1)),
+ GET_MODE_BITSIZE (mode) - 1))
{
*cost += rtx_cost (op0, mode, (rtx_code) code, 0, speed);
/* We already demanded XEXP (op1, 0) to be REG_P, so
@@ -8465,7 +8519,7 @@ aarch64_register_move_cost (machine_mode
return aarch64_register_move_cost (mode, from, GENERAL_REGS)
+ aarch64_register_move_cost (mode, GENERAL_REGS, to);
- if (GET_MODE_SIZE (mode) == 16)
+ if (must_eq (GET_MODE_SIZE (mode), 16))
{
/* 128-bit operations on general registers require 2 instructions. */
if (from == GENERAL_REGS && to == GENERAL_REGS)
@@ -8838,7 +8892,7 @@ aarch64_builtin_vectorization_cost (enum
return fp ? costs->vec_fp_stmt_cost : costs->vec_int_stmt_cost;
case vec_construct:
- elements = TYPE_VECTOR_SUBPARTS (vectype);
+ elements = estimated_poly_value (TYPE_VECTOR_SUBPARTS (vectype));
return elements / 2 + 1;
default:
@@ -10900,6 +10954,10 @@ aarch64_gimplify_va_arg_expr (tree valis
&nregs,
&is_ha))
{
+ /* No frontends can create types with variable-sized modes, so we
+ shouldn't be asked to pass or return them. */
+ unsigned int ag_size = GET_MODE_SIZE (ag_mode).to_constant ();
+
/* TYPE passed in fp/simd registers. */
if (!TARGET_FLOAT)
aarch64_err_no_fpadvsimd (mode, "varargs");
@@ -10913,8 +10971,8 @@ aarch64_gimplify_va_arg_expr (tree valis
if (is_ha)
{
- if (BYTES_BIG_ENDIAN && GET_MODE_SIZE (ag_mode) < UNITS_PER_VREG)
- adjust = UNITS_PER_VREG - GET_MODE_SIZE (ag_mode);
+ if (BYTES_BIG_ENDIAN && ag_size < UNITS_PER_VREG)
+ adjust = UNITS_PER_VREG - ag_size;
}
else if (BLOCK_REG_PADDING (mode, type, 1) == PAD_DOWNWARD
&& size < UNITS_PER_VREG)
@@ -11302,8 +11360,8 @@ aapcs_vfp_sub_candidate (const_tree type
- tree_to_uhwi (TYPE_MIN_VALUE (index)));
/* There must be no padding. */
- if (wi::to_wide (TYPE_SIZE (type))
- != count * GET_MODE_BITSIZE (*modep))
+ if (may_ne (wi::to_poly_wide (TYPE_SIZE (type)),
+ count * GET_MODE_BITSIZE (*modep)))
return -1;
return count;
@@ -11333,8 +11391,8 @@ aapcs_vfp_sub_candidate (const_tree type
}
/* There must be no padding. */
- if (wi::to_wide (TYPE_SIZE (type))
- != count * GET_MODE_BITSIZE (*modep))
+ if (may_ne (wi::to_poly_wide (TYPE_SIZE (type)),
+ count * GET_MODE_BITSIZE (*modep)))
return -1;
return count;
@@ -11366,8 +11424,8 @@ aapcs_vfp_sub_candidate (const_tree type
}
/* There must be no padding. */
- if (wi::to_wide (TYPE_SIZE (type))
- != count * GET_MODE_BITSIZE (*modep))
+ if (may_ne (wi::to_poly_wide (TYPE_SIZE (type)),
+ count * GET_MODE_BITSIZE (*modep)))
return -1;
return count;
@@ -11389,7 +11447,7 @@ aapcs_vfp_sub_candidate (const_tree type
aarch64_short_vector_p (const_tree type,
machine_mode mode)
{
- HOST_WIDE_INT size = -1;
+ poly_int64 size = -1;
if (type && TREE_CODE (type) == VECTOR_TYPE)
size = int_size_in_bytes (type);
@@ -11397,7 +11455,7 @@ aarch64_short_vector_p (const_tree type,
|| GET_MODE_CLASS (mode) == MODE_VECTOR_FLOAT)
size = GET_MODE_SIZE (mode);
- return (size == 8 || size == 16);
+ return must_eq (size, 8) || must_eq (size, 16);
}
/* Return TRUE if the type, as described by TYPE and MODE, is a composite
@@ -12039,11 +12097,11 @@ aarch64_simd_vect_par_cnst_half (machine
aarch64_simd_check_vect_par_cnst_half (rtx op, machine_mode mode,
bool high)
{
- if (!VECTOR_MODE_P (mode))
+ int nelts;
+ if (!VECTOR_MODE_P (mode) || !GET_MODE_NUNITS (mode).is_constant (&nelts))
return false;
- rtx ideal = aarch64_simd_vect_par_cnst_half (mode, GET_MODE_NUNITS (mode),
- high);
+ rtx ideal = aarch64_simd_vect_par_cnst_half (mode, nelts, high);
HOST_WIDE_INT count_op = XVECLEN (op, 0);
HOST_WIDE_INT count_ideal = XVECLEN (ideal, 0);
int i = 0;
@@ -12128,7 +12186,8 @@ aarch64_simd_emit_reg_reg_move (rtx *ope
int
aarch64_simd_attr_length_rglist (machine_mode mode)
{
- return (GET_MODE_SIZE (mode) / UNITS_PER_VREG) * 4;
+ /* This is only used (and only meaningful) for Advanced SIMD, not SVE. */
+ return (GET_MODE_SIZE (mode).to_constant () / UNITS_PER_VREG) * 4;
}
/* Implement target hook TARGET_VECTOR_ALIGNMENT. The AAPCS64 sets the maximum
@@ -12208,7 +12267,6 @@ aarch64_simd_make_constant (rtx vals)
machine_mode mode = GET_MODE (vals);
rtx const_dup;
rtx const_vec = NULL_RTX;
- int n_elts = GET_MODE_NUNITS (mode);
int n_const = 0;
int i;
@@ -12219,6 +12277,7 @@ aarch64_simd_make_constant (rtx vals)
/* A CONST_VECTOR must contain only CONST_INTs and
CONST_DOUBLEs, but CONSTANT_P allows more (e.g. SYMBOL_REF).
Only store valid constants in a CONST_VECTOR. */
+ int n_elts = XVECLEN (vals, 0);
for (i = 0; i < n_elts; ++i)
{
rtx x = XVECEXP (vals, 0, i);
@@ -12257,7 +12316,7 @@ aarch64_expand_vector_init (rtx target,
machine_mode mode = GET_MODE (target);
scalar_mode inner_mode = GET_MODE_INNER (mode);
/* The number of vector elements. */
- int n_elts = GET_MODE_NUNITS (mode);
+ int n_elts = XVECLEN (vals, 0);
/* The number of vector elements which are not constant. */
int n_var = 0;
rtx any_const = NULL_RTX;
@@ -12397,7 +12456,9 @@ aarch64_shift_truncation_mask (machine_m
return
(!SHIFT_COUNT_TRUNCATED
|| aarch64_vector_mode_supported_p (mode)
- || aarch64_vect_struct_mode_p (mode)) ? 0 : (GET_MODE_BITSIZE (mode) - 1);
+ || aarch64_vect_struct_mode_p (mode))
+ ? 0
+ : (GET_MODE_UNIT_BITSIZE (mode) - 1);
}
/* Select a format to encode pointers in exception handling data. */
@@ -13798,15 +13859,13 @@ aarch64_evpc_tbl (struct expand_vec_perm
return false;
for (i = 0; i < nelt; ++i)
- {
- int nunits = GET_MODE_NUNITS (vmode);
+ /* If big-endian and two vectors we end up with a weird mixed-endian
+ mode on NEON. Reverse the index within each word but not the word
+ itself. */
+ rperm[i] = GEN_INT (BYTES_BIG_ENDIAN
+ ? d->perm[i] ^ (nelt - 1)
+ : d->perm[i]);
- /* If big-endian and two vectors we end up with a weird mixed-endian
- mode on NEON. Reverse the index within each word but not the word
- itself. */
- rperm[i] = GEN_INT (BYTES_BIG_ENDIAN ? d->perm[i] ^ (nunits - 1)
- : d->perm[i]);
- }
sel = gen_rtx_CONST_VECTOR (vmode, gen_rtvec_v (nelt, rperm));
sel = force_reg (vmode, sel);
@@ -14003,7 +14062,7 @@ aarch64_modes_tieable_p (machine_mode mo
AMOUNT bytes. */
static rtx
-aarch64_move_pointer (rtx pointer, int amount)
+aarch64_move_pointer (rtx pointer, poly_int64 amount)
{
rtx next = plus_constant (Pmode, XEXP (pointer, 0), amount);
@@ -14017,9 +14076,7 @@ aarch64_move_pointer (rtx pointer, int a
static rtx
aarch64_progress_pointer (rtx pointer)
{
- HOST_WIDE_INT amount = GET_MODE_SIZE (GET_MODE (pointer));
-
- return aarch64_move_pointer (pointer, amount);
+ return aarch64_move_pointer (pointer, GET_MODE_SIZE (GET_MODE (pointer)));
}
/* Copy one MODE sized block from SRC to DST, then progress SRC and DST by
@@ -14846,7 +14903,9 @@ aarch64_operands_ok_for_ldpstp (rtx *ope
offval_1 = INTVAL (offset_1);
offval_2 = INTVAL (offset_2);
- msize = GET_MODE_SIZE (mode);
+ /* We should only be trying this for fixed-sized modes. There is no
+ SVE LDP/STP instruction. */
+ msize = GET_MODE_SIZE (mode).to_constant ();
/* Check if the offsets are consecutive. */
if (offval_1 != (offval_2 + msize) && offval_2 != (offval_1 + msize))
return false;
Index: gcc/config/aarch64/aarch64.md
===================================================================
--- gcc/config/aarch64/aarch64.md 2017-10-27 14:11:54.071011147 +0100
+++ gcc/config/aarch64/aarch64.md 2017-10-27 14:12:17.402039296 +0100
@@ -3328,7 +3328,7 @@ (define_insn "aarch64_<crc_variant>"
CRC))]
"TARGET_CRC32"
{
- if (GET_MODE_BITSIZE (GET_MODE (operands[2])) >= 64)
+ if (GET_MODE_BITSIZE (<crc_mode>mode) >= 64)
return "<crc_variant>\\t%w0, %w1, %x2";
else
return "<crc_variant>\\t%w0, %w1, %w2";
* [10/nn] [AArch64] Minor rtx costs tweak
2017-10-27 13:22 [00/nn] AArch64 patches preparing for SVE Richard Sandiford
` (8 preceding siblings ...)
2017-10-27 13:30 ` [09/nn] [AArch64] Pass number of units to aarch64_expand_vec_perm(_const) Richard Sandiford
@ 2017-10-27 13:31 ` Richard Sandiford
2017-10-31 18:25 ` James Greenhalgh
2017-10-27 13:31 ` [11/nn] [AArch64] Set NUM_POLY_INT_COEFFS to 2 Richard Sandiford
2017-10-27 13:37 ` [12/nn] [AArch64] Add const_offset field to aarch64_address_info Richard Sandiford
11 siblings, 1 reply; 29+ messages in thread
From: Richard Sandiford @ 2017-10-27 13:31 UTC (permalink / raw)
To: gcc-patches; +Cc: richard.earnshaw, james.greenhalgh, marcus.shawcroft
aarch64_rtx_costs uses the number of registers in a mode as the basis
of SET costs. This patch makes it get the number of registers from
aarch64_hard_regno_nregs rather than repeating the calculation inline.
Handling SVE modes in aarch64_hard_regno_nregs is then enough to get
the correct SET cost as well.
2017-10-27 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* config/aarch64/aarch64.c (aarch64_rtx_costs): Use
aarch64_hard_regno_nregs to get the number of registers
in a mode.
Index: gcc/config/aarch64/aarch64.c
===================================================================
--- gcc/config/aarch64/aarch64.c 2017-10-27 14:12:11.045026014 +0100
+++ gcc/config/aarch64/aarch64.c 2017-10-27 14:12:14.533257115 +0100
@@ -7200,18 +7200,16 @@ aarch64_rtx_costs (rtx x, machine_mode m
/* The cost is one per vector-register copied. */
if (VECTOR_MODE_P (GET_MODE (op0)) && REG_P (op1))
{
- int n_minus_1 = (GET_MODE_SIZE (GET_MODE (op0)) - 1)
- / GET_MODE_SIZE (V4SImode);
- *cost = COSTS_N_INSNS (n_minus_1 + 1);
+ int nregs = aarch64_hard_regno_nregs (V0_REGNUM, GET_MODE (op0));
+ *cost = COSTS_N_INSNS (nregs);
}
/* const0_rtx is in general free, but we will use an
instruction to set a register to 0. */
else if (REG_P (op1) || op1 == const0_rtx)
{
/* The cost is 1 per register copied. */
- int n_minus_1 = (GET_MODE_SIZE (GET_MODE (op0)) - 1)
- / UNITS_PER_WORD;
- *cost = COSTS_N_INSNS (n_minus_1 + 1);
+ int nregs = aarch64_hard_regno_nregs (R0_REGNUM, GET_MODE (op0));
+ *cost = COSTS_N_INSNS (nregs);
}
else
/* Cost is just the cost of the RHS of the set. */
* [12/nn] [AArch64] Add const_offset field to aarch64_address_info
2017-10-27 13:22 [00/nn] AArch64 patches preparing for SVE Richard Sandiford
` (10 preceding siblings ...)
2017-10-27 13:31 ` [11/nn] [AArch64] Set NUM_POLY_INT_COEFFS to 2 Richard Sandiford
@ 2017-10-27 13:37 ` Richard Sandiford
2017-11-02 10:09 ` James Greenhalgh
11 siblings, 1 reply; 29+ messages in thread
From: Richard Sandiford @ 2017-10-27 13:37 UTC (permalink / raw)
To: gcc-patches; +Cc: richard.earnshaw, james.greenhalgh, marcus.shawcroft
This patch records the integer value of the address offset in
aarch64_address_info, so that it doesn't need to be re-extracted
from the rtx. The SVE port will make more use of this. The patch
also uses poly_int64 routines to manipulate the offset, rather than
just handling CONST_INTs.
2017-10-27 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* config/aarch64/aarch64.c (aarch64_address_info): Add a const_offset
field.
(aarch64_classify_address): Initialize it. Track polynomial offsets.
(aarch64_print_operand_address): Use it to check for a zero offset.
Index: gcc/config/aarch64/aarch64.c
===================================================================
--- gcc/config/aarch64/aarch64.c 2017-10-27 14:13:59.548121066 +0100
+++ gcc/config/aarch64/aarch64.c 2017-10-27 14:14:17.047874812 +0100
@@ -113,6 +113,7 @@ struct aarch64_address_info {
enum aarch64_address_type type;
rtx base;
rtx offset;
+ poly_int64 const_offset;
int shift;
enum aarch64_symbol_type symbol_type;
};
@@ -4728,6 +4729,8 @@ aarch64_classify_address (struct aarch64
{
enum rtx_code code = GET_CODE (x);
rtx op0, op1;
+ poly_int64 offset;
+
HOST_WIDE_INT const_size;
/* On BE, we use load/store pair for all large int mode load/stores.
@@ -4756,6 +4759,7 @@ aarch64_classify_address (struct aarch64
info->type = ADDRESS_REG_IMM;
info->base = x;
info->offset = const0_rtx;
+ info->const_offset = 0;
return aarch64_base_register_rtx_p (x, strict_p);
case PLUS:
@@ -4765,24 +4769,24 @@ aarch64_classify_address (struct aarch64
if (! strict_p
&& REG_P (op0)
&& virt_or_elim_regno_p (REGNO (op0))
- && CONST_INT_P (op1))
+ && poly_int_rtx_p (op1, &offset))
{
info->type = ADDRESS_REG_IMM;
info->base = op0;
info->offset = op1;
+ info->const_offset = offset;
return true;
}
if (may_ne (GET_MODE_SIZE (mode), 0)
- && CONST_INT_P (op1)
- && aarch64_base_register_rtx_p (op0, strict_p))
+ && aarch64_base_register_rtx_p (op0, strict_p)
+ && poly_int_rtx_p (op1, &offset))
{
- HOST_WIDE_INT offset = INTVAL (op1);
-
info->type = ADDRESS_REG_IMM;
info->base = op0;
info->offset = op1;
+ info->const_offset = offset;
/* TImode and TFmode values are allowed in both pairs of X
registers and individual Q registers. The available
@@ -4862,13 +4866,12 @@ aarch64_classify_address (struct aarch64
info->type = ADDRESS_REG_WB;
info->base = XEXP (x, 0);
if (GET_CODE (XEXP (x, 1)) == PLUS
- && CONST_INT_P (XEXP (XEXP (x, 1), 1))
+ && poly_int_rtx_p (XEXP (XEXP (x, 1), 1), &offset)
&& rtx_equal_p (XEXP (XEXP (x, 1), 0), info->base)
&& aarch64_base_register_rtx_p (info->base, strict_p))
{
- HOST_WIDE_INT offset;
info->offset = XEXP (XEXP (x, 1), 1);
- offset = INTVAL (info->offset);
+ info->const_offset = offset;
/* TImode and TFmode values are allowed in both pairs of X
registers and individual Q registers. The available
@@ -5899,7 +5902,7 @@ aarch64_print_operand_address (FILE *f,
switch (addr.type)
{
case ADDRESS_REG_IMM:
- if (addr.offset == const0_rtx)
+ if (must_eq (addr.const_offset, 0))
asm_fprintf (f, "[%s]", reg_names [REGNO (addr.base)]);
else
asm_fprintf (f, "[%s, %wd]", reg_names [REGNO (addr.base)],
* Re: [03/nn] [AArch64] Rework interface to add constant/offset routines
2017-10-27 13:26 ` [03/nn] [AArch64] Rework interface to add constant/offset routines Richard Sandiford
@ 2017-10-30 11:03 ` Richard Sandiford
2017-11-10 15:43 ` James Greenhalgh
0 siblings, 1 reply; 29+ messages in thread
From: Richard Sandiford @ 2017-10-30 11:03 UTC (permalink / raw)
To: gcc-patches; +Cc: richard.earnshaw, james.greenhalgh, marcus.shawcroft
Richard Sandiford <richard.sandiford@linaro.org> writes:
> The port had aarch64_add_offset and aarch64_add_constant routines
> that did similar things. This patch replaces them with an expanded
> version of aarch64_add_offset that takes separate source and
> destination registers. The new routine also takes a poly_int64 offset
> instead of a HOST_WIDE_INT offset, but it leaves the HOST_WIDE_INT
> case to aarch64_add_offset_1, which is basically a repurposed
> aarch64_add_constant_internal. The SVE patch will put the handling
> of VL-based constants in aarch64_add_offset, while still using
> aarch64_add_offset_1 for the constant part.
>
> The vcall_offset == 0 path in aarch64_output_mi_thunk will use temp0
> as well as temp1 once SVE is added.
>
> A side-effect of the patch is that we now generate:
>
> mov x29, sp
>
> instead of:
>
> add x29, sp, 0
>
> in the pr70044.c test.
Sorry, I stupidly rebased the patch just before posting and so
introduced a last-minute bug. Here's a fixed version that survives
testing on aarch64-linux-gnu.
Thanks,
Richard
2017-10-30 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* config/aarch64/aarch64.c (aarch64_force_temporary): Assert that
x exists before using it.
(aarch64_add_constant_internal): Rename to...
(aarch64_add_offset_1): ...this. Replace regnum with separate
src and dest rtxes. Handle the case in which they're different,
including when the offset is zero. Replace scratchreg with an rtx.
Use 2 additions if there is no spare register into which we can
move a 16-bit constant.
(aarch64_add_constant): Delete.
(aarch64_add_offset): Replace reg with separate src and dest
rtxes. Take a poly_int64 offset instead of a HOST_WIDE_INT.
Use aarch64_add_offset_1.
(aarch64_add_sp, aarch64_sub_sp): Take the scratch register as
an rtx rather than an int. Take the delta as a poly_int64
rather than a HOST_WIDE_INT. Use aarch64_add_offset.
(aarch64_expand_mov_immediate): Update uses of aarch64_add_offset.
(aarch64_allocate_and_probe_stack_space): Take the scratch register
as an rtx rather than an int. Use Pmode rather than word_mode
in the loop code. Update calls to aarch64_sub_sp.
(aarch64_expand_prologue): Update calls to aarch64_sub_sp,
aarch64_allocate_and_probe_stack_space and aarch64_add_offset.
(aarch64_expand_epilogue): Update calls to aarch64_add_offset
and aarch64_add_sp.
(aarch64_output_mi_thunk): Use aarch64_add_offset rather than
aarch64_add_constant.
gcc/testsuite/
* gcc.target/aarch64/pr70044.c: Allow "mov x29, sp" too.
Index: gcc/config/aarch64/aarch64.c
===================================================================
--- gcc/config/aarch64/aarch64.c 2017-10-30 10:45:23.806582919 +0000
+++ gcc/config/aarch64/aarch64.c 2017-10-30 10:46:47.278836421 +0000
@@ -1818,30 +1818,13 @@ aarch64_force_temporary (machine_mode mo
return force_reg (mode, value);
else
{
- x = aarch64_emit_move (x, value);
+ gcc_assert (x);
+ aarch64_emit_move (x, value);
return x;
}
}
-static rtx
-aarch64_add_offset (scalar_int_mode mode, rtx temp, rtx reg,
- HOST_WIDE_INT offset)
-{
- if (!aarch64_plus_immediate (GEN_INT (offset), mode))
- {
- rtx high;
- /* Load the full offset into a register. This
- might be improvable in the future. */
- high = GEN_INT (offset);
- offset = 0;
- high = aarch64_force_temporary (mode, temp, high);
- reg = aarch64_force_temporary (mode, temp,
- gen_rtx_PLUS (mode, high, reg));
- }
- return plus_constant (mode, reg, offset);
-}
-
static int
aarch64_internal_mov_immediate (rtx dest, rtx imm, bool generate,
scalar_int_mode mode)
@@ -1966,86 +1949,123 @@ aarch64_internal_mov_immediate (rtx dest
return num_insns;
}
-/* Add DELTA to REGNUM in mode MODE. SCRATCHREG can be used to hold a
- temporary value if necessary. FRAME_RELATED_P should be true if
- the RTX_FRAME_RELATED flag should be set and CFA adjustments added
- to the generated instructions. If SCRATCHREG is known to hold
- abs (delta), EMIT_MOVE_IMM can be set to false to avoid emitting the
- immediate again.
-
- Since this function may be used to adjust the stack pointer, we must
- ensure that it cannot cause transient stack deallocation (for example
- by first incrementing SP and then decrementing when adjusting by a
- large immediate). */
+/* A subroutine of aarch64_add_offset that handles the case in which
+ OFFSET is known at compile time. The arguments are otherwise the same. */
static void
-aarch64_add_constant_internal (scalar_int_mode mode, int regnum,
- int scratchreg, HOST_WIDE_INT delta,
- bool frame_related_p, bool emit_move_imm)
+aarch64_add_offset_1 (scalar_int_mode mode, rtx dest,
+ rtx src, HOST_WIDE_INT offset, rtx temp1,
+ bool frame_related_p, bool emit_move_imm)
{
- HOST_WIDE_INT mdelta = abs_hwi (delta);
- rtx this_rtx = gen_rtx_REG (mode, regnum);
+ gcc_assert (emit_move_imm || temp1 != NULL_RTX);
+ gcc_assert (temp1 == NULL_RTX || !reg_overlap_mentioned_p (temp1, src));
+
+ HOST_WIDE_INT moffset = abs_hwi (offset);
rtx_insn *insn;
- if (!mdelta)
- return;
+ if (!moffset)
+ {
+ if (!rtx_equal_p (dest, src))
+ {
+ insn = emit_insn (gen_rtx_SET (dest, src));
+ RTX_FRAME_RELATED_P (insn) = frame_related_p;
+ }
+ return;
+ }
/* Single instruction adjustment. */
- if (aarch64_uimm12_shift (mdelta))
+ if (aarch64_uimm12_shift (moffset))
{
- insn = emit_insn (gen_add2_insn (this_rtx, GEN_INT (delta)));
+ insn = emit_insn (gen_add3_insn (dest, src, GEN_INT (offset)));
RTX_FRAME_RELATED_P (insn) = frame_related_p;
return;
}
- /* Emit 2 additions/subtractions if the adjustment is less than 24 bits.
- Only do this if mdelta is not a 16-bit move as adjusting using a move
- is better. */
- if (mdelta < 0x1000000 && !aarch64_move_imm (mdelta, mode))
+ /* Emit 2 additions/subtractions if the adjustment is less than 24 bits
+ and either:
+
+ a) the offset cannot be loaded by a 16-bit move or
+ b) there is no spare register into which we can move it. */
+ if (moffset < 0x1000000
+ && ((!temp1 && !can_create_pseudo_p ())
+ || !aarch64_move_imm (moffset, mode)))
{
- HOST_WIDE_INT low_off = mdelta & 0xfff;
+ HOST_WIDE_INT low_off = moffset & 0xfff;
- low_off = delta < 0 ? -low_off : low_off;
- insn = emit_insn (gen_add2_insn (this_rtx, GEN_INT (low_off)));
+ low_off = offset < 0 ? -low_off : low_off;
+ insn = emit_insn (gen_add3_insn (dest, src, GEN_INT (low_off)));
RTX_FRAME_RELATED_P (insn) = frame_related_p;
- insn = emit_insn (gen_add2_insn (this_rtx, GEN_INT (delta - low_off)));
+ insn = emit_insn (gen_add2_insn (dest, GEN_INT (offset - low_off)));
RTX_FRAME_RELATED_P (insn) = frame_related_p;
return;
}
/* Emit a move immediate if required and an addition/subtraction. */
- rtx scratch_rtx = gen_rtx_REG (mode, scratchreg);
if (emit_move_imm)
- aarch64_internal_mov_immediate (scratch_rtx, GEN_INT (mdelta), true, mode);
- insn = emit_insn (delta < 0 ? gen_sub2_insn (this_rtx, scratch_rtx)
- : gen_add2_insn (this_rtx, scratch_rtx));
+ {
+ gcc_assert (temp1 != NULL_RTX || can_create_pseudo_p ());
+ temp1 = aarch64_force_temporary (mode, temp1, GEN_INT (moffset));
+ }
+ insn = emit_insn (offset < 0
+ ? gen_sub3_insn (dest, src, temp1)
+ : gen_add3_insn (dest, src, temp1));
if (frame_related_p)
{
RTX_FRAME_RELATED_P (insn) = frame_related_p;
- rtx adj = plus_constant (mode, this_rtx, delta);
- add_reg_note (insn , REG_CFA_ADJUST_CFA, gen_rtx_SET (this_rtx, adj));
+ rtx adj = plus_constant (mode, src, offset);
+ add_reg_note (insn, REG_CFA_ADJUST_CFA, gen_rtx_SET (dest, adj));
}
}
-static inline void
-aarch64_add_constant (scalar_int_mode mode, int regnum, int scratchreg,
- HOST_WIDE_INT delta)
-{
- aarch64_add_constant_internal (mode, regnum, scratchreg, delta, false, true);
-}
+/* Set DEST to SRC + OFFSET. MODE is the mode of the addition.
+ FRAME_RELATED_P is true if the RTX_FRAME_RELATED flag should
+ be set and CFA adjustments added to the generated instructions.
+
+ TEMP1, if nonnull, is a register of mode MODE that can be used as a
+ temporary if register allocation is already complete. This temporary
+ register may overlap DEST but must not overlap SRC. If TEMP1 is known
+ to hold abs (OFFSET), EMIT_MOVE_IMM can be set to false to avoid emitting
+ the immediate again.
+
+ Since this function may be used to adjust the stack pointer, we must
+ ensure that it cannot cause transient stack deallocation (for example
+ by first incrementing SP and then decrementing when adjusting by a
+ large immediate). */
+
+static void
+aarch64_add_offset (scalar_int_mode mode, rtx dest, rtx src,
+ poly_int64 offset, rtx temp1, bool frame_related_p,
+ bool emit_move_imm = true)
+{
+ gcc_assert (emit_move_imm || temp1 != NULL_RTX);
+ gcc_assert (temp1 == NULL_RTX || !reg_overlap_mentioned_p (temp1, src));
+
+ /* SVE support will go here. */
+ HOST_WIDE_INT constant = offset.to_constant ();
+ aarch64_add_offset_1 (mode, dest, src, constant, temp1,
+ frame_related_p, emit_move_imm);
+}
+
+/* Add DELTA to the stack pointer, marking the instructions frame-related.
+ TEMP1 is available as a temporary if nonnull. EMIT_MOVE_IMM is false
+ if TEMP1 already contains abs (DELTA). */
static inline void
-aarch64_add_sp (int scratchreg, HOST_WIDE_INT delta, bool emit_move_imm)
+aarch64_add_sp (rtx temp1, poly_int64 delta, bool emit_move_imm)
{
- aarch64_add_constant_internal (Pmode, SP_REGNUM, scratchreg, delta,
- true, emit_move_imm);
+ aarch64_add_offset (Pmode, stack_pointer_rtx, stack_pointer_rtx, delta,
+ temp1, true, emit_move_imm);
}
+/* Subtract DELTA from the stack pointer, marking the instructions
+ frame-related if FRAME_RELATED_P. TEMP1 is available as a temporary
+ if nonnull. */
+
static inline void
-aarch64_sub_sp (int scratchreg, HOST_WIDE_INT delta, bool frame_related_p)
+aarch64_sub_sp (rtx temp1, poly_int64 delta, bool frame_related_p)
{
- aarch64_add_constant_internal (Pmode, SP_REGNUM, scratchreg, -delta,
- frame_related_p, true);
+ aarch64_add_offset (Pmode, stack_pointer_rtx, stack_pointer_rtx, -delta,
+ temp1, frame_related_p);
}
void
@@ -2078,9 +2098,8 @@ aarch64_expand_mov_immediate (rtx dest,
{
gcc_assert (can_create_pseudo_p ());
base = aarch64_force_temporary (int_mode, dest, base);
- base = aarch64_add_offset (int_mode, NULL, base,
- INTVAL (offset));
- aarch64_emit_move (dest, base);
+ aarch64_add_offset (int_mode, dest, base, INTVAL (offset),
+ NULL_RTX, false);
return;
}
@@ -2119,9 +2138,8 @@ aarch64_expand_mov_immediate (rtx dest,
{
gcc_assert(can_create_pseudo_p ());
base = aarch64_force_temporary (int_mode, dest, base);
- base = aarch64_add_offset (int_mode, NULL, base,
- INTVAL (offset));
- aarch64_emit_move (dest, base);
+ aarch64_add_offset (int_mode, dest, base, INTVAL (offset),
+ NULL_RTX, false);
return;
}
/* FALLTHRU */
@@ -3613,11 +3631,10 @@ aarch64_set_handled_components (sbitmap
cfun->machine->reg_is_wrapped_separately[regno] = true;
}
-/* Allocate SIZE bytes of stack space using SCRATCH_REG as a scratch
- register. */
+/* Allocate SIZE bytes of stack space using TEMP1 as a scratch register. */
static void
-aarch64_allocate_and_probe_stack_space (int scratchreg, HOST_WIDE_INT size)
+aarch64_allocate_and_probe_stack_space (rtx temp1, HOST_WIDE_INT size)
{
HOST_WIDE_INT probe_interval
= 1 << PARAM_VALUE (PARAM_STACK_CLASH_PROTECTION_PROBE_INTERVAL);
@@ -3631,7 +3648,7 @@ aarch64_allocate_and_probe_stack_space (
We can allocate GUARD_SIZE - GUARD_USED_BY_CALLER as a single chunk
without any probing. */
gcc_assert (size >= guard_size - guard_used_by_caller);
- aarch64_sub_sp (scratchreg, guard_size - guard_used_by_caller, true);
+ aarch64_sub_sp (temp1, guard_size - guard_used_by_caller, true);
HOST_WIDE_INT orig_size = size;
size -= (guard_size - guard_used_by_caller);
@@ -3643,17 +3660,16 @@ aarch64_allocate_and_probe_stack_space (
if (rounded_size && rounded_size <= 4 * probe_interval)
{
/* We don't use aarch64_sub_sp here because we don't want to
- repeatedly load SCRATCHREG. */
- rtx scratch_rtx = gen_rtx_REG (Pmode, scratchreg);
+ repeatedly load TEMP1. */
if (probe_interval > ARITH_FACTOR)
- emit_move_insn (scratch_rtx, GEN_INT (-probe_interval));
+ emit_move_insn (temp1, GEN_INT (-probe_interval));
else
- scratch_rtx = GEN_INT (-probe_interval);
+ temp1 = GEN_INT (-probe_interval);
for (HOST_WIDE_INT i = 0; i < rounded_size; i += probe_interval)
{
rtx_insn *insn = emit_insn (gen_add2_insn (stack_pointer_rtx,
- scratch_rtx));
+ temp1));
add_reg_note (insn, REG_STACK_CHECK, const0_rtx);
if (probe_interval > ARITH_FACTOR)
@@ -3674,10 +3690,10 @@ aarch64_allocate_and_probe_stack_space (
else if (rounded_size)
{
/* Compute the ending address. */
- rtx temp = gen_rtx_REG (word_mode, scratchreg);
- emit_move_insn (temp, GEN_INT (-rounded_size));
+ unsigned int scratchreg = REGNO (temp1);
+ emit_move_insn (temp1, GEN_INT (-rounded_size));
rtx_insn *insn
- = emit_insn (gen_add3_insn (temp, stack_pointer_rtx, temp));
+ = emit_insn (gen_add3_insn (temp1, stack_pointer_rtx, temp1));
/* For the initial allocation, we don't have a frame pointer
set up, so we always need CFI notes. If we're doing the
@@ -3692,7 +3708,7 @@ aarch64_allocate_and_probe_stack_space (
/* We want the CFA independent of the stack pointer for the
duration of the loop. */
add_reg_note (insn, REG_CFA_DEF_CFA,
- plus_constant (Pmode, temp,
+ plus_constant (Pmode, temp1,
(rounded_size + (orig_size - size))));
RTX_FRAME_RELATED_P (insn) = 1;
}
@@ -3702,7 +3718,7 @@ aarch64_allocate_and_probe_stack_space (
It also probes at a 4k interval regardless of the value of
PARAM_STACK_CLASH_PROTECTION_PROBE_INTERVAL. */
insn = emit_insn (gen_probe_stack_range (stack_pointer_rtx,
- stack_pointer_rtx, temp));
+ stack_pointer_rtx, temp1));
/* Now reset the CFA register if needed. */
if (scratchreg == IP0_REGNUM || !frame_pointer_needed)
@@ -3723,7 +3739,7 @@ aarch64_allocate_and_probe_stack_space (
Note that any residual must be probed. */
if (residual)
{
- aarch64_sub_sp (scratchreg, residual, true);
+ aarch64_sub_sp (temp1, residual, true);
add_reg_note (get_last_insn (), REG_STACK_CHECK, const0_rtx);
emit_stack_probe (plus_constant (Pmode, stack_pointer_rtx,
(residual - GET_MODE_SIZE (word_mode))));
@@ -3814,6 +3830,9 @@ aarch64_expand_prologue (void)
aarch64_emit_probe_stack_range (get_stack_check_protect (), frame_size);
}
+ rtx ip0_rtx = gen_rtx_REG (Pmode, IP0_REGNUM);
+ rtx ip1_rtx = gen_rtx_REG (Pmode, IP1_REGNUM);
+
/* We do not fully protect aarch64 against stack clash style attacks
as doing so would be prohibitively expensive with less utility over
time as newer compilers are deployed.
@@ -3859,9 +3878,9 @@ aarch64_expand_prologue (void)
outgoing args. */
if (flag_stack_clash_protection
&& initial_adjust >= guard_size - guard_used_by_caller)
- aarch64_allocate_and_probe_stack_space (IP0_REGNUM, initial_adjust);
+ aarch64_allocate_and_probe_stack_space (ip0_rtx, initial_adjust);
else
- aarch64_sub_sp (IP0_REGNUM, initial_adjust, true);
+ aarch64_sub_sp (ip0_rtx, initial_adjust, true);
if (callee_adjust != 0)
aarch64_push_regs (reg1, reg2, callee_adjust);
@@ -3871,10 +3890,9 @@ aarch64_expand_prologue (void)
if (callee_adjust == 0)
aarch64_save_callee_saves (DImode, callee_offset, R29_REGNUM,
R30_REGNUM, false);
- insn = emit_insn (gen_add3_insn (hard_frame_pointer_rtx,
- stack_pointer_rtx,
- GEN_INT (callee_offset)));
- RTX_FRAME_RELATED_P (insn) = frame_pointer_needed;
+ aarch64_add_offset (Pmode, hard_frame_pointer_rtx,
+ stack_pointer_rtx, callee_offset, ip1_rtx,
+ frame_pointer_needed);
emit_insn (gen_stack_tie (stack_pointer_rtx, hard_frame_pointer_rtx));
}
@@ -3890,9 +3908,9 @@ aarch64_expand_prologue (void)
less the amount of the guard reserved for use by the caller's
outgoing args. */
if (final_adjust >= guard_size - guard_used_by_caller)
- aarch64_allocate_and_probe_stack_space (IP1_REGNUM, final_adjust);
+ aarch64_allocate_and_probe_stack_space (ip1_rtx, final_adjust);
else
- aarch64_sub_sp (IP1_REGNUM, final_adjust, !frame_pointer_needed);
+ aarch64_sub_sp (ip1_rtx, final_adjust, !frame_pointer_needed);
/* We must also probe if the final adjustment is larger than the guard
that is assumed used by the caller. This may be sub-optimal. */
@@ -3905,7 +3923,7 @@ aarch64_expand_prologue (void)
}
}
else
- aarch64_sub_sp (IP1_REGNUM, final_adjust, !frame_pointer_needed);
+ aarch64_sub_sp (ip1_rtx, final_adjust, !frame_pointer_needed);
}
/* Return TRUE if we can use a simple_return insn.
@@ -3961,17 +3979,16 @@ aarch64_expand_epilogue (bool for_sibcal
/* Restore the stack pointer from the frame pointer if it may not
be the same as the stack pointer. */
+ rtx ip0_rtx = gen_rtx_REG (Pmode, IP0_REGNUM);
+ rtx ip1_rtx = gen_rtx_REG (Pmode, IP1_REGNUM);
if (frame_pointer_needed && (final_adjust || cfun->calls_alloca))
- {
- insn = emit_insn (gen_add3_insn (stack_pointer_rtx,
- hard_frame_pointer_rtx,
- GEN_INT (-callee_offset)));
- /* If writeback is used when restoring callee-saves, the CFA
- is restored on the instruction doing the writeback. */
- RTX_FRAME_RELATED_P (insn) = callee_adjust == 0;
- }
+ /* If writeback is used when restoring callee-saves, the CFA
+ is restored on the instruction doing the writeback. */
+ aarch64_add_offset (Pmode, stack_pointer_rtx,
+ hard_frame_pointer_rtx, -callee_offset,
+ ip1_rtx, callee_adjust == 0);
else
- aarch64_add_sp (IP1_REGNUM, final_adjust,
+ aarch64_add_sp (ip1_rtx, final_adjust,
/* A stack clash protection prologue may not have
left IP1_REGNUM in a usable state. */
(flag_stack_clash_protection
@@ -4000,7 +4017,7 @@ aarch64_expand_epilogue (bool for_sibcal
/* A stack clash protection prologue may not have left IP0_REGNUM
in a usable state. */
- aarch64_add_sp (IP0_REGNUM, initial_adjust,
+ aarch64_add_sp (ip0_rtx, initial_adjust,
(flag_stack_clash_protection
|| df_regs_ever_live_p (IP0_REGNUM)));
@@ -4107,16 +4124,16 @@ aarch64_output_mi_thunk (FILE *file, tre
reload_completed = 1;
emit_note (NOTE_INSN_PROLOGUE_END);
+ this_rtx = gen_rtx_REG (Pmode, this_regno);
+ temp0 = gen_rtx_REG (Pmode, IP0_REGNUM);
+ temp1 = gen_rtx_REG (Pmode, IP1_REGNUM);
+
if (vcall_offset == 0)
- aarch64_add_constant (Pmode, this_regno, IP1_REGNUM, delta);
+ aarch64_add_offset (Pmode, this_rtx, this_rtx, delta, temp1, false);
else
{
gcc_assert ((vcall_offset & (POINTER_BYTES - 1)) == 0);
- this_rtx = gen_rtx_REG (Pmode, this_regno);
- temp0 = gen_rtx_REG (Pmode, IP0_REGNUM);
- temp1 = gen_rtx_REG (Pmode, IP1_REGNUM);
-
addr = this_rtx;
if (delta != 0)
{
@@ -4124,7 +4141,8 @@ aarch64_output_mi_thunk (FILE *file, tre
addr = gen_rtx_PRE_MODIFY (Pmode, this_rtx,
plus_constant (Pmode, this_rtx, delta));
else
- aarch64_add_constant (Pmode, this_regno, IP1_REGNUM, delta);
+ aarch64_add_offset (Pmode, this_rtx, this_rtx, delta, temp1,
+ false);
}
if (Pmode == ptr_mode)
Index: gcc/testsuite/gcc.target/aarch64/pr70044.c
===================================================================
--- gcc/testsuite/gcc.target/aarch64/pr70044.c 2017-10-30 10:45:23.806582919 +0000
+++ gcc/testsuite/gcc.target/aarch64/pr70044.c 2017-10-30 10:46:47.278836421 +0000
@@ -11,4 +11,4 @@ main (int argc, char **argv)
}
/* Check that the frame pointer really is created. */
-/* { dg-final { scan-lto-assembler "add x29, sp," } } */
+/* { dg-final { scan-lto-assembler "(mov|add) x29, sp" } } */
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [01/nn] [AArch64] Generate permute patterns using rtx builders
2017-10-27 13:23 ` [01/nn] [AArch64] Generate permute patterns using rtx builders Richard Sandiford
@ 2017-10-31 18:02 ` James Greenhalgh
2017-11-02 9:03 ` Richard Sandiford
0 siblings, 1 reply; 29+ messages in thread
From: James Greenhalgh @ 2017-10-31 18:02 UTC (permalink / raw)
To: gcc-patches, richard.earnshaw, marcus.shawcroft, richard.sandiford; +Cc: nd
On Fri, Oct 27, 2017 at 02:22:39PM +0100, Richard Sandiford wrote:
> This patch replaces switch statements that call specific generator
> functions with code that constructs the rtl pattern directly.
> This seemed to scale better to SVE and also seems less error-prone.
>
> As a side-effect, the patch fixes the REV handling for diff==1,
> vmode==E_V4HFmode and adds missing support for diff==3,
> vmode==E_V4HFmode.
>
> To compensate for the lack of switches that check for specific modes,
> the patch makes aarch64_expand_vec_perm_const_1 reject permutes on
> single-element vectors (specifically V1DImode).
OK.
Would you mind placing a comment somewhere near both the unspecs, and the
patterns using these unspecs to warn that the calls constructing the
RTX here *MUST* be kept in sync?
Some of these patterns are probably used rarely enough that we could easily
miss an unrecognizable insn.
Reviewed-by: James Greenhalgh <james.greenhalgh@arm.com>
Thanks,
James
>
>
> 2017-10-27 Richard Sandiford <richard.sandiford@linaro.org>
> Alan Hayward <alan.hayward@arm.com>
> David Sherwood <david.sherwood@arm.com>
>
> gcc/
> * config/aarch64/aarch64.c (aarch64_evpc_trn, aarch64_evpc_uzp)
> (aarch64_evpc_zip, aarch64_evpc_ext, aarch64_evpc_rev)
> (aarch64_evpc_dup): Generate rtl directly, rather than using
> named expanders.
> (aarch64_expand_vec_perm_const_1): Explicitly check for permutes
> of a single element.
>
* Re: [02/nn] [AArch64] Move code around
2017-10-27 13:25 ` [02/nn] [AArch64] Move code around Richard Sandiford
@ 2017-10-31 18:03 ` James Greenhalgh
0 siblings, 0 replies; 29+ messages in thread
From: James Greenhalgh @ 2017-10-31 18:03 UTC (permalink / raw)
To: gcc-patches, richard.earnshaw, marcus.shawcroft, richard.sandiford; +Cc: nd
On Fri, Oct 27, 2017 at 02:23:30PM +0100, Richard Sandiford wrote:
> This patch simply moves code around, in order to make the later
> patches easier to read, and to avoid forward declarations.
> It doesn't add the missing function comments because the interfaces
> will change in a later patch.
OK.
Reviewed-by: James Greenhalgh <james.greenhalgh@arm.com>
Thanks,
James
>
>
> 2017-10-26 Richard Sandiford <richard.sandiford@linaro.org>
> Alan Hayward <alan.hayward@arm.com>
> David Sherwood <david.sherwood@arm.com>
>
> gcc/
> * config/aarch64/aarch64.c (aarch64_add_constant_internal)
> (aarch64_add_constant, aarch64_add_sp, aarch64_sub_sp): Move
> earlier in file.
>
* Re: [04/nn] [AArch64] Rename the internal "Upl" constraint
2017-10-27 13:27 ` [04/nn] [AArch64] Rename the internal "Upl" constraint Richard Sandiford
@ 2017-10-31 18:04 ` James Greenhalgh
0 siblings, 0 replies; 29+ messages in thread
From: James Greenhalgh @ 2017-10-31 18:04 UTC (permalink / raw)
To: gcc-patches, richard.earnshaw, marcus.shawcroft, richard.sandiford; +Cc: nd
On Fri, Oct 27, 2017 at 02:25:56PM +0100, Richard Sandiford wrote:
> The SVE port uses the public constraints "Upl" and "Upa" to mean
> "low predicate register" and "any predicate register" respectively.
> "Upl" was already used as an internal-only constraint by the
> addition patterns, so this patch renames it to "Uaa" ("two adds
> needed").
OK.
Reviewed-By: James Greenhalgh <james.greenhalgh@arm.com>
Thanks,
James
>
>
> 2017-10-27 Richard Sandiford <richard.sandiford@linaro.org>
> Alan Hayward <alan.hayward@arm.com>
> David Sherwood <david.sherwood@arm.com>
>
> gcc/
> * config/aarch64/constraints.md (Upl): Rename to...
> (Uaa): ...this.
> * config/aarch64/aarch64.md
> (*zero_extend<SHORT:mode><GPI:mode>2_aarch64, *addsi3_aarch64_uxtw):
> Update accordingly.
>
> Index: gcc/config/aarch64/constraints.md
> ===================================================================
> --- gcc/config/aarch64/constraints.md 2017-10-27 14:06:16.159815485 +0100
> +++ gcc/config/aarch64/constraints.md 2017-10-27 14:11:54.071011147 +0100
> @@ -35,7 +35,7 @@ (define_constraint "I"
> (and (match_code "const_int")
> (match_test "aarch64_uimm12_shift (ival)")))
>
> -(define_constraint "Upl"
> +(define_constraint "Uaa"
> "@internal A constant that matches two uses of add instructions."
> (and (match_code "const_int")
> (match_test "aarch64_pluslong_strict_immedate (op, VOIDmode)")))
> Index: gcc/config/aarch64/aarch64.md
> ===================================================================
> --- gcc/config/aarch64/aarch64.md 2017-10-27 14:07:01.875769946 +0100
> +++ gcc/config/aarch64/aarch64.md 2017-10-27 14:11:54.071011147 +0100
> @@ -1562,7 +1562,7 @@ (define_insn "*add<mode>3_aarch64"
> (match_operand:GPI 0 "register_operand" "=rk,rk,w,rk,r")
> (plus:GPI
> (match_operand:GPI 1 "register_operand" "%rk,rk,w,rk,rk")
> - (match_operand:GPI 2 "aarch64_pluslong_operand" "I,r,w,J,Upl")))]
> + (match_operand:GPI 2 "aarch64_pluslong_operand" "I,r,w,J,Uaa")))]
> ""
> "@
> add\\t%<w>0, %<w>1, %2
> @@ -1580,7 +1580,7 @@ (define_insn "*addsi3_aarch64_uxtw"
> (match_operand:DI 0 "register_operand" "=rk,rk,rk,r")
> (zero_extend:DI
> (plus:SI (match_operand:SI 1 "register_operand" "%rk,rk,rk,rk")
> - (match_operand:SI 2 "aarch64_pluslong_operand" "I,r,J,Upl"))))]
> + (match_operand:SI 2 "aarch64_pluslong_operand" "I,r,J,Uaa"))))]
> ""
> "@
> add\\t%w0, %w1, %2
* Re: [10/nn] [AArch64] Minor rtx costs tweak
2017-10-27 13:31 ` [10/nn] [AArch64] Minor rtx costs tweak Richard Sandiford
@ 2017-10-31 18:25 ` James Greenhalgh
0 siblings, 0 replies; 29+ messages in thread
From: James Greenhalgh @ 2017-10-31 18:25 UTC (permalink / raw)
To: gcc-patches, richard.earnshaw, marcus.shawcroft, richard.sandiford; +Cc: nd
On Fri, Oct 27, 2017 at 02:30:18PM +0100, Richard Sandiford wrote:
> aarch64_rtx_costs uses the number of registers in a mode as the basis
> of SET costs. This patch makes it get the number of registers from
> aarch64_hard_regno_nregs rather than repeating the calculation inline.
> Handling SVE modes in aarch64_hard_regno_nregs is then enough to get
> the correct SET cost as well.
OK.
Reviewed-By: James Greenhalgh <james.greenhalgh@arm.com>
Thanks,
James
>
>
> 2017-10-27 Richard Sandiford <richard.sandiford@linaro.org>
> Alan Hayward <alan.hayward@arm.com>
> David Sherwood <david.sherwood@arm.com>
>
> gcc/
> * config/aarch64/aarch64.c (aarch64_rtx_costs): Use
> aarch64_hard_regno_nregs to get the number of registers
> in a mode.
>
* Re: [01/nn] [AArch64] Generate permute patterns using rtx builders
2017-10-31 18:02 ` James Greenhalgh
@ 2017-11-02 9:03 ` Richard Sandiford
0 siblings, 0 replies; 29+ messages in thread
From: Richard Sandiford @ 2017-11-02 9:03 UTC (permalink / raw)
To: James Greenhalgh; +Cc: gcc-patches, richard.earnshaw, marcus.shawcroft, nd
James Greenhalgh <james.greenhalgh@arm.com> writes:
> On Fri, Oct 27, 2017 at 02:22:39PM +0100, Richard Sandiford wrote:
>> This patch replaces switch statements that call specific generator
>> functions with code that constructs the rtl pattern directly.
>> This seemed to scale better to SVE and also seems less error-prone.
>>
>> As a side-effect, the patch fixes the REV handling for diff==1,
>> vmode==E_V4HFmode and adds missing support for diff==3,
>> vmode==E_V4HFmode.
>>
>> To compensate for the lack of switches that check for specific modes,
>> the patch makes aarch64_expand_vec_perm_const_1 reject permutes on
>> single-element vectors (specifically V1DImode).
>
> OK.
>
> Would you mind placing a comment somewhere near both the unspecs, and the
> patterns using these unspecs to warn that the calls constructing the
> RTX here *MUST* be kept in sync?
OK, here's what I committed.
Thanks for the reviews,
Richard
2017-11-01 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* config/aarch64/aarch64.c (aarch64_evpc_trn, aarch64_evpc_uzp)
(aarch64_evpc_zip, aarch64_evpc_ext, aarch64_evpc_rev)
(aarch64_evpc_dup): Generate rtl directly, rather than using
named expanders.
(aarch64_expand_vec_perm_const_1): Explicitly check for permutes
of a single element.
* config/aarch64/iterators.md: Add a comment above the permute
unspecs to say that they are generated directly by
aarch64_expand_vec_perm_const.
* config/aarch64/aarch64-simd.md: Likewise the permute instructions.
Index: gcc/config/aarch64/aarch64.c
===================================================================
--- gcc/config/aarch64/aarch64.c 2017-11-01 09:20:07.343478870 +0000
+++ gcc/config/aarch64/aarch64.c 2017-11-01 20:35:54.431165938 +0000
@@ -13263,7 +13263,6 @@ aarch64_evpc_trn (struct expand_vec_perm
{
unsigned int i, odd, mask, nelt = d->perm.length ();
rtx out, in0, in1, x;
- rtx (*gen) (rtx, rtx, rtx);
machine_mode vmode = d->vmode;
if (GET_MODE_UNIT_SIZE (vmode) > 8)
@@ -13300,48 +13299,8 @@ aarch64_evpc_trn (struct expand_vec_perm
}
out = d->target;
- if (odd)
- {
- switch (vmode)
- {
- case E_V16QImode: gen = gen_aarch64_trn2v16qi; break;
- case E_V8QImode: gen = gen_aarch64_trn2v8qi; break;
- case E_V8HImode: gen = gen_aarch64_trn2v8hi; break;
- case E_V4HImode: gen = gen_aarch64_trn2v4hi; break;
- case E_V4SImode: gen = gen_aarch64_trn2v4si; break;
- case E_V2SImode: gen = gen_aarch64_trn2v2si; break;
- case E_V2DImode: gen = gen_aarch64_trn2v2di; break;
- case E_V4HFmode: gen = gen_aarch64_trn2v4hf; break;
- case E_V8HFmode: gen = gen_aarch64_trn2v8hf; break;
- case E_V4SFmode: gen = gen_aarch64_trn2v4sf; break;
- case E_V2SFmode: gen = gen_aarch64_trn2v2sf; break;
- case E_V2DFmode: gen = gen_aarch64_trn2v2df; break;
- default:
- return false;
- }
- }
- else
- {
- switch (vmode)
- {
- case E_V16QImode: gen = gen_aarch64_trn1v16qi; break;
- case E_V8QImode: gen = gen_aarch64_trn1v8qi; break;
- case E_V8HImode: gen = gen_aarch64_trn1v8hi; break;
- case E_V4HImode: gen = gen_aarch64_trn1v4hi; break;
- case E_V4SImode: gen = gen_aarch64_trn1v4si; break;
- case E_V2SImode: gen = gen_aarch64_trn1v2si; break;
- case E_V2DImode: gen = gen_aarch64_trn1v2di; break;
- case E_V4HFmode: gen = gen_aarch64_trn1v4hf; break;
- case E_V8HFmode: gen = gen_aarch64_trn1v8hf; break;
- case E_V4SFmode: gen = gen_aarch64_trn1v4sf; break;
- case E_V2SFmode: gen = gen_aarch64_trn1v2sf; break;
- case E_V2DFmode: gen = gen_aarch64_trn1v2df; break;
- default:
- return false;
- }
- }
-
- emit_insn (gen (out, in0, in1));
+ emit_set_insn (out, gen_rtx_UNSPEC (vmode, gen_rtvec (2, in0, in1),
+ odd ? UNSPEC_TRN2 : UNSPEC_TRN1));
return true;
}
@@ -13351,7 +13310,6 @@ aarch64_evpc_uzp (struct expand_vec_perm
{
unsigned int i, odd, mask, nelt = d->perm.length ();
rtx out, in0, in1, x;
- rtx (*gen) (rtx, rtx, rtx);
machine_mode vmode = d->vmode;
if (GET_MODE_UNIT_SIZE (vmode) > 8)
@@ -13387,48 +13345,8 @@ aarch64_evpc_uzp (struct expand_vec_perm
}
out = d->target;
- if (odd)
- {
- switch (vmode)
- {
- case E_V16QImode: gen = gen_aarch64_uzp2v16qi; break;
- case E_V8QImode: gen = gen_aarch64_uzp2v8qi; break;
- case E_V8HImode: gen = gen_aarch64_uzp2v8hi; break;
- case E_V4HImode: gen = gen_aarch64_uzp2v4hi; break;
- case E_V4SImode: gen = gen_aarch64_uzp2v4si; break;
- case E_V2SImode: gen = gen_aarch64_uzp2v2si; break;
- case E_V2DImode: gen = gen_aarch64_uzp2v2di; break;
- case E_V4HFmode: gen = gen_aarch64_uzp2v4hf; break;
- case E_V8HFmode: gen = gen_aarch64_uzp2v8hf; break;
- case E_V4SFmode: gen = gen_aarch64_uzp2v4sf; break;
- case E_V2SFmode: gen = gen_aarch64_uzp2v2sf; break;
- case E_V2DFmode: gen = gen_aarch64_uzp2v2df; break;
- default:
- return false;
- }
- }
- else
- {
- switch (vmode)
- {
- case E_V16QImode: gen = gen_aarch64_uzp1v16qi; break;
- case E_V8QImode: gen = gen_aarch64_uzp1v8qi; break;
- case E_V8HImode: gen = gen_aarch64_uzp1v8hi; break;
- case E_V4HImode: gen = gen_aarch64_uzp1v4hi; break;
- case E_V4SImode: gen = gen_aarch64_uzp1v4si; break;
- case E_V2SImode: gen = gen_aarch64_uzp1v2si; break;
- case E_V2DImode: gen = gen_aarch64_uzp1v2di; break;
- case E_V4HFmode: gen = gen_aarch64_uzp1v4hf; break;
- case E_V8HFmode: gen = gen_aarch64_uzp1v8hf; break;
- case E_V4SFmode: gen = gen_aarch64_uzp1v4sf; break;
- case E_V2SFmode: gen = gen_aarch64_uzp1v2sf; break;
- case E_V2DFmode: gen = gen_aarch64_uzp1v2df; break;
- default:
- return false;
- }
- }
-
- emit_insn (gen (out, in0, in1));
+ emit_set_insn (out, gen_rtx_UNSPEC (vmode, gen_rtvec (2, in0, in1),
+ odd ? UNSPEC_UZP2 : UNSPEC_UZP1));
return true;
}
@@ -13438,7 +13356,6 @@ aarch64_evpc_zip (struct expand_vec_perm
{
unsigned int i, high, mask, nelt = d->perm.length ();
rtx out, in0, in1, x;
- rtx (*gen) (rtx, rtx, rtx);
machine_mode vmode = d->vmode;
if (GET_MODE_UNIT_SIZE (vmode) > 8)
@@ -13479,48 +13396,8 @@ aarch64_evpc_zip (struct expand_vec_perm
}
out = d->target;
- if (high)
- {
- switch (vmode)
- {
- case E_V16QImode: gen = gen_aarch64_zip2v16qi; break;
- case E_V8QImode: gen = gen_aarch64_zip2v8qi; break;
- case E_V8HImode: gen = gen_aarch64_zip2v8hi; break;
- case E_V4HImode: gen = gen_aarch64_zip2v4hi; break;
- case E_V4SImode: gen = gen_aarch64_zip2v4si; break;
- case E_V2SImode: gen = gen_aarch64_zip2v2si; break;
- case E_V2DImode: gen = gen_aarch64_zip2v2di; break;
- case E_V4HFmode: gen = gen_aarch64_zip2v4hf; break;
- case E_V8HFmode: gen = gen_aarch64_zip2v8hf; break;
- case E_V4SFmode: gen = gen_aarch64_zip2v4sf; break;
- case E_V2SFmode: gen = gen_aarch64_zip2v2sf; break;
- case E_V2DFmode: gen = gen_aarch64_zip2v2df; break;
- default:
- return false;
- }
- }
- else
- {
- switch (vmode)
- {
- case E_V16QImode: gen = gen_aarch64_zip1v16qi; break;
- case E_V8QImode: gen = gen_aarch64_zip1v8qi; break;
- case E_V8HImode: gen = gen_aarch64_zip1v8hi; break;
- case E_V4HImode: gen = gen_aarch64_zip1v4hi; break;
- case E_V4SImode: gen = gen_aarch64_zip1v4si; break;
- case E_V2SImode: gen = gen_aarch64_zip1v2si; break;
- case E_V2DImode: gen = gen_aarch64_zip1v2di; break;
- case E_V4HFmode: gen = gen_aarch64_zip1v4hf; break;
- case E_V8HFmode: gen = gen_aarch64_zip1v8hf; break;
- case E_V4SFmode: gen = gen_aarch64_zip1v4sf; break;
- case E_V2SFmode: gen = gen_aarch64_zip1v2sf; break;
- case E_V2DFmode: gen = gen_aarch64_zip1v2df; break;
- default:
- return false;
- }
- }
-
- emit_insn (gen (out, in0, in1));
+ emit_set_insn (out, gen_rtx_UNSPEC (vmode, gen_rtvec (2, in0, in1),
+ high ? UNSPEC_ZIP2 : UNSPEC_ZIP1));
return true;
}
@@ -13530,7 +13407,6 @@ aarch64_evpc_zip (struct expand_vec_perm
aarch64_evpc_ext (struct expand_vec_perm_d *d)
{
unsigned int i, nelt = d->perm.length ();
- rtx (*gen) (rtx, rtx, rtx, rtx);
rtx offset;
unsigned int location = d->perm[0]; /* Always < nelt. */
@@ -13548,24 +13424,6 @@ aarch64_evpc_ext (struct expand_vec_perm
return false;
}
- switch (d->vmode)
- {
- case E_V16QImode: gen = gen_aarch64_extv16qi; break;
- case E_V8QImode: gen = gen_aarch64_extv8qi; break;
- case E_V4HImode: gen = gen_aarch64_extv4hi; break;
- case E_V8HImode: gen = gen_aarch64_extv8hi; break;
- case E_V2SImode: gen = gen_aarch64_extv2si; break;
- case E_V4SImode: gen = gen_aarch64_extv4si; break;
- case E_V4HFmode: gen = gen_aarch64_extv4hf; break;
- case E_V8HFmode: gen = gen_aarch64_extv8hf; break;
- case E_V2SFmode: gen = gen_aarch64_extv2sf; break;
- case E_V4SFmode: gen = gen_aarch64_extv4sf; break;
- case E_V2DImode: gen = gen_aarch64_extv2di; break;
- case E_V2DFmode: gen = gen_aarch64_extv2df; break;
- default:
- return false;
- }
-
/* Success! */
if (d->testing_p)
return true;
@@ -13584,7 +13442,10 @@ aarch64_evpc_ext (struct expand_vec_perm
}
offset = GEN_INT (location);
- emit_insn (gen (d->target, d->op0, d->op1, offset));
+ emit_set_insn (d->target,
+ gen_rtx_UNSPEC (d->vmode,
+ gen_rtvec (3, d->op0, d->op1, offset),
+ UNSPEC_EXT));
return true;
}
@@ -13593,55 +13454,21 @@ aarch64_evpc_ext (struct expand_vec_perm
static bool
aarch64_evpc_rev (struct expand_vec_perm_d *d)
{
- unsigned int i, j, diff, nelt = d->perm.length ();
- rtx (*gen) (rtx, rtx);
+ unsigned int i, j, diff, size, unspec, nelt = d->perm.length ();
if (!d->one_vector_p)
return false;
diff = d->perm[0];
- switch (diff)
- {
- case 7:
- switch (d->vmode)
- {
- case E_V16QImode: gen = gen_aarch64_rev64v16qi; break;
- case E_V8QImode: gen = gen_aarch64_rev64v8qi; break;
- default:
- return false;
- }
- break;
- case 3:
- switch (d->vmode)
- {
- case E_V16QImode: gen = gen_aarch64_rev32v16qi; break;
- case E_V8QImode: gen = gen_aarch64_rev32v8qi; break;
- case E_V8HImode: gen = gen_aarch64_rev64v8hi; break;
- case E_V4HImode: gen = gen_aarch64_rev64v4hi; break;
- default:
- return false;
- }
- break;
- case 1:
- switch (d->vmode)
- {
- case E_V16QImode: gen = gen_aarch64_rev16v16qi; break;
- case E_V8QImode: gen = gen_aarch64_rev16v8qi; break;
- case E_V8HImode: gen = gen_aarch64_rev32v8hi; break;
- case E_V4HImode: gen = gen_aarch64_rev32v4hi; break;
- case E_V4SImode: gen = gen_aarch64_rev64v4si; break;
- case E_V2SImode: gen = gen_aarch64_rev64v2si; break;
- case E_V4SFmode: gen = gen_aarch64_rev64v4sf; break;
- case E_V2SFmode: gen = gen_aarch64_rev64v2sf; break;
- case E_V8HFmode: gen = gen_aarch64_rev64v8hf; break;
- case E_V4HFmode: gen = gen_aarch64_rev64v4hf; break;
- default:
- return false;
- }
- break;
- default:
- return false;
- }
+ size = (diff + 1) * GET_MODE_UNIT_SIZE (d->vmode);
+ if (size == 8)
+ unspec = UNSPEC_REV64;
+ else if (size == 4)
+ unspec = UNSPEC_REV32;
+ else if (size == 2)
+ unspec = UNSPEC_REV16;
+ else
+ return false;
for (i = 0; i < nelt ; i += diff + 1)
for (j = 0; j <= diff; j += 1)
@@ -13660,14 +13487,14 @@ aarch64_evpc_rev (struct expand_vec_perm
if (d->testing_p)
return true;
- emit_insn (gen (d->target, d->op0));
+ emit_set_insn (d->target, gen_rtx_UNSPEC (d->vmode, gen_rtvec (1, d->op0),
+ unspec));
return true;
}
static bool
aarch64_evpc_dup (struct expand_vec_perm_d *d)
{
- rtx (*gen) (rtx, rtx, rtx);
rtx out = d->target;
rtx in0;
machine_mode vmode = d->vmode;
@@ -13689,25 +13516,9 @@ aarch64_evpc_dup (struct expand_vec_perm
in0 = d->op0;
lane = GEN_INT (elt); /* The pattern corrects for big-endian. */
- switch (vmode)
- {
- case E_V16QImode: gen = gen_aarch64_dup_lanev16qi; break;
- case E_V8QImode: gen = gen_aarch64_dup_lanev8qi; break;
- case E_V8HImode: gen = gen_aarch64_dup_lanev8hi; break;
- case E_V4HImode: gen = gen_aarch64_dup_lanev4hi; break;
- case E_V4SImode: gen = gen_aarch64_dup_lanev4si; break;
- case E_V2SImode: gen = gen_aarch64_dup_lanev2si; break;
- case E_V2DImode: gen = gen_aarch64_dup_lanev2di; break;
- case E_V8HFmode: gen = gen_aarch64_dup_lanev8hf; break;
- case E_V4HFmode: gen = gen_aarch64_dup_lanev4hf; break;
- case E_V4SFmode: gen = gen_aarch64_dup_lanev4sf; break;
- case E_V2SFmode: gen = gen_aarch64_dup_lanev2sf; break;
- case E_V2DFmode: gen = gen_aarch64_dup_lanev2df; break;
- default:
- return false;
- }
-
- emit_insn (gen (out, in0, lane));
+ rtx parallel = gen_rtx_PARALLEL (vmode, gen_rtvec (1, lane));
+ rtx select = gen_rtx_VEC_SELECT (GET_MODE_INNER (vmode), in0, parallel);
+ emit_set_insn (out, gen_rtx_VEC_DUPLICATE (vmode, select));
return true;
}
@@ -13760,7 +13571,7 @@ aarch64_expand_vec_perm_const_1 (struct
std::swap (d->op0, d->op1);
}
- if (TARGET_SIMD)
+ if (TARGET_SIMD && nelt > 1)
{
if (aarch64_evpc_rev (d))
return true;
Index: gcc/config/aarch64/iterators.md
===================================================================
--- gcc/config/aarch64/iterators.md 2017-11-01 08:07:13.560976713 +0000
+++ gcc/config/aarch64/iterators.md 2017-11-01 20:35:54.431165938 +0000
@@ -322,16 +322,21 @@ (define_c_enum "unspec"
UNSPEC_TBL ; Used in vector permute patterns.
UNSPEC_TBX ; Used in vector permute patterns.
UNSPEC_CONCAT ; Used in vector permute patterns.
+
+ ;; The following permute unspecs are generated directly by
+ ;; aarch64_expand_vec_perm_const, so any changes to the underlying
+ ;; instructions would need a corresponding change there.
UNSPEC_ZIP1 ; Used in vector permute patterns.
UNSPEC_ZIP2 ; Used in vector permute patterns.
UNSPEC_UZP1 ; Used in vector permute patterns.
UNSPEC_UZP2 ; Used in vector permute patterns.
UNSPEC_TRN1 ; Used in vector permute patterns.
UNSPEC_TRN2 ; Used in vector permute patterns.
- UNSPEC_EXT ; Used in aarch64-simd.md.
+ UNSPEC_EXT ; Used in vector permute patterns.
UNSPEC_REV64 ; Used in vector reverse patterns (permute).
UNSPEC_REV32 ; Used in vector reverse patterns (permute).
UNSPEC_REV16 ; Used in vector reverse patterns (permute).
+
UNSPEC_AESE ; Used in aarch64-simd.md.
UNSPEC_AESD ; Used in aarch64-simd.md.
UNSPEC_AESMC ; Used in aarch64-simd.md.
Index: gcc/config/aarch64/aarch64-simd.md
===================================================================
--- gcc/config/aarch64/aarch64-simd.md 2017-11-01 08:07:13.561934013 +0000
+++ gcc/config/aarch64/aarch64-simd.md 2017-11-01 20:35:54.427167006 +0000
@@ -5369,6 +5369,9 @@ (define_insn_and_split "aarch64_combinev
[(set_attr "type" "multiple")]
)
+;; This instruction's pattern is generated directly by
+;; aarch64_expand_vec_perm_const, so any changes to the pattern would
+;; need corresponding changes there.
(define_insn "aarch64_<PERMUTE:perm_insn><PERMUTE:perm_hilo><mode>"
[(set (match_operand:VALL_F16 0 "register_operand" "=w")
(unspec:VALL_F16 [(match_operand:VALL_F16 1 "register_operand" "w")
@@ -5379,7 +5382,10 @@ (define_insn "aarch64_<PERMUTE:perm_insn
[(set_attr "type" "neon_permute<q>")]
)
-;; Note immediate (third) operand is lane index not byte index.
+;; This instruction's pattern is generated directly by
+;; aarch64_expand_vec_perm_const, so any changes to the pattern would
+;; need corresponding changes there. Note that the immediate (third)
+;; operand is a lane index not a byte index.
(define_insn "aarch64_ext<mode>"
[(set (match_operand:VALL_F16 0 "register_operand" "=w")
(unspec:VALL_F16 [(match_operand:VALL_F16 1 "register_operand" "w")
@@ -5395,6 +5401,9 @@ (define_insn "aarch64_ext<mode>"
[(set_attr "type" "neon_ext<q>")]
)
+;; This instruction's pattern is generated directly by
+;; aarch64_expand_vec_perm_const, so any changes to the pattern would
+;; need corresponding changes there.
(define_insn "aarch64_rev<REVERSE:rev_op><mode>"
[(set (match_operand:VALL_F16 0 "register_operand" "=w")
(unspec:VALL_F16 [(match_operand:VALL_F16 1 "register_operand" "w")]
* Re: [06/nn] [AArch64] Add an endian_lane_rtx helper routine
2017-10-27 13:28 ` [06/nn] [AArch64] Add an endian_lane_rtx helper routine Richard Sandiford
@ 2017-11-02 9:55 ` James Greenhalgh
0 siblings, 0 replies; 29+ messages in thread
From: James Greenhalgh @ 2017-11-02 9:55 UTC (permalink / raw)
To: gcc-patches, richard.earnshaw, marcus.shawcroft, richard.sandiford; +Cc: nd
On Fri, Oct 27, 2017 at 02:27:50PM +0100, Richard Sandiford wrote:
> Later patches turn the number of vector units into a poly_int.
> We deliberately don't support applying GEN_INT to those (except
> in target code that doesn't distinguish between poly_ints and normal
> constants); gen_int_mode needs to be used instead.
>
> This patch therefore replaces instances of:
>
> GEN_INT (ENDIAN_LANE_N (builtin_mode, INTVAL (op[opc])))
>
> with uses of a new endian_lane_rtx function.
OK.
Reviewed-by: James Greenhalgh <james.greenhalgh@arm.com>
Thanks,
James
>
>
> 2017-10-26 Richard Sandiford <richard.sandiford@linaro.org>
> Alan Hayward <alan.hayward@arm.com>
> David Sherwood <david.sherwood@arm.com>
>
> gcc/
> * config/aarch64/aarch64-protos.h (aarch64_endian_lane_rtx): Declare.
> * config/aarch64/aarch64.c (aarch64_endian_lane_rtx): New function.
> * config/aarch64/aarch64.h (ENDIAN_LANE_N): Take the number
> of units rather than the mode.
> * config/aarch64/iterators.md (nunits): New mode attribute.
> * config/aarch64/aarch64-builtins.c (aarch64_simd_expand_args):
> Use aarch64_endian_lane_rtx instead of GEN_INT (ENDIAN_LANE_N ...).
> * config/aarch64/aarch64-simd.md (aarch64_dup_lane<mode>)
> (aarch64_dup_lane_<vswap_width_name><mode>, *aarch64_mul3_elt<mode>)
> (*aarch64_mul3_elt_<vswap_width_name><mode>): Likewise.
> (*aarch64_mul3_elt_to_64v2df, *aarch64_mla_elt<mode>): Likewise.
> (*aarch64_mla_elt_<vswap_width_name><mode>, *aarch64_mls_elt<mode>)
> (*aarch64_mls_elt_<vswap_width_name><mode>, *aarch64_fma4_elt<mode>)
> (*aarch64_fma4_elt_<vswap_width_name><mode>): Likewise.
> (*aarch64_fma4_elt_to_64v2df, *aarch64_fnma4_elt<mode>): Likewise.
> (*aarch64_fnma4_elt_<vswap_width_name><mode>): Likewise.
> (*aarch64_fnma4_elt_to_64v2df, reduc_plus_scal_<mode>): Likewise.
> (reduc_plus_scal_v4sf, reduc_<maxmin_uns>_scal_<mode>): Likewise.
> (reduc_<maxmin_uns>_scal_<mode>): Likewise.
> (*aarch64_get_lane_extend<GPI:mode><VDQQH:mode>): Likewise.
> (*aarch64_get_lane_zero_extendsi<mode>): Likewise.
> (aarch64_get_lane<mode>, *aarch64_mulx_elt_<vswap_width_name><mode>)
> (*aarch64_mulx_elt<mode>, *aarch64_vgetfmulx<mode>): Likewise.
> (aarch64_sq<r>dmulh_lane<mode>, aarch64_sq<r>dmulh_laneq<mode>)
> (aarch64_sqrdml<SQRDMLH_AS:rdma_as>h_lane<mode>): Likewise.
> (aarch64_sqrdml<SQRDMLH_AS:rdma_as>h_laneq<mode>): Likewise.
> (aarch64_sqdml<SBINQOPS:as>l_lane<mode>): Likewise.
> (aarch64_sqdml<SBINQOPS:as>l_laneq<mode>): Likewise.
> (aarch64_sqdml<SBINQOPS:as>l2_lane<mode>_internal): Likewise.
> (aarch64_sqdml<SBINQOPS:as>l2_laneq<mode>_internal): Likewise.
> (aarch64_sqdmull_lane<mode>, aarch64_sqdmull_laneq<mode>): Likewise.
> (aarch64_sqdmull2_lane<mode>_internal): Likewise.
> (aarch64_sqdmull2_laneq<mode>_internal): Likewise.
> (aarch64_vec_load_lanesoi_lane<mode>): Likewise.
> (aarch64_vec_store_lanesoi_lane<mode>): Likewise.
> (aarch64_vec_load_lanesci_lane<mode>): Likewise.
> (aarch64_vec_store_lanesci_lane<mode>): Likewise.
> (aarch64_vec_load_lanesxi_lane<mode>): Likewise.
> (aarch64_vec_store_lanesxi_lane<mode>): Likewise.
> (aarch64_simd_vec_set<mode>): Update use of ENDIAN_LANE_N.
> (aarch64_simd_vec_setv2di): Likewise.
>
> Index: gcc/config/aarch64/aarch64-protos.h
> ===================================================================
> --- gcc/config/aarch64/aarch64-protos.h 2017-10-27 14:11:56.993658452 +0100
> +++ gcc/config/aarch64/aarch64-protos.h 2017-10-27 14:12:00.601693018 +0100
> @@ -437,6 +437,7 @@ void aarch64_simd_emit_reg_reg_move (rtx
> rtx aarch64_simd_expand_builtin (int, tree, rtx);
>
> void aarch64_simd_lane_bounds (rtx, HOST_WIDE_INT, HOST_WIDE_INT, const_tree);
> +rtx aarch64_endian_lane_rtx (machine_mode, unsigned int);
>
> void aarch64_split_128bit_move (rtx, rtx);
>
> Index: gcc/config/aarch64/aarch64.c
> ===================================================================
> --- gcc/config/aarch64/aarch64.c 2017-10-27 14:11:56.995515870 +0100
> +++ gcc/config/aarch64/aarch64.c 2017-10-27 14:12:00.603550436 +0100
> @@ -12083,6 +12083,15 @@ aarch64_simd_lane_bounds (rtx operand, H
> }
> }
>
> +/* Perform endian correction on lane number N, which indexes a vector
> + of mode MODE, and return the result as an SImode rtx. */
> +
> +rtx
> +aarch64_endian_lane_rtx (machine_mode mode, unsigned int n)
> +{
> + return gen_int_mode (ENDIAN_LANE_N (GET_MODE_NUNITS (mode), n), SImode);
> +}
> +
> /* Return TRUE if OP is a valid vector addressing mode. */
> bool
> aarch64_simd_mem_operand_p (rtx op)
> Index: gcc/config/aarch64/aarch64.h
> ===================================================================
> --- gcc/config/aarch64/aarch64.h 2017-10-27 14:05:38.132936808 +0100
> +++ gcc/config/aarch64/aarch64.h 2017-10-27 14:12:00.603550436 +0100
> @@ -910,8 +910,8 @@ #define AARCH64_VALID_SIMD_QREG_MODE(MOD
> || (MODE) == V4SFmode || (MODE) == V8HFmode || (MODE) == V2DImode \
> || (MODE) == V2DFmode)
>
> -#define ENDIAN_LANE_N(mode, n) \
> - (BYTES_BIG_ENDIAN ? GET_MODE_NUNITS (mode) - 1 - n : n)
> +#define ENDIAN_LANE_N(NUNITS, N) \
> + (BYTES_BIG_ENDIAN ? NUNITS - 1 - N : N)
>
> /* Support for a configure-time default CPU, etc. We currently support
> --with-arch and --with-cpu. Both are ignored if either is specified
> Index: gcc/config/aarch64/iterators.md
> ===================================================================
> --- gcc/config/aarch64/iterators.md 2017-10-27 14:11:56.995515870 +0100
> +++ gcc/config/aarch64/iterators.md 2017-10-27 14:12:00.604479145 +0100
> @@ -438,6 +438,17 @@ (define_mode_attr vw2 [(DI "") (QI "h")
> (define_mode_attr rtn [(DI "d") (SI "")])
> (define_mode_attr vas [(DI "") (SI ".2s")])
>
> +;; Map a vector to the number of units in it, if the size of the mode
> +;; is constant.
> +(define_mode_attr nunits [(V8QI "8") (V16QI "16")
> + (V4HI "4") (V8HI "8")
> + (V2SI "2") (V4SI "4")
> + (V2DI "2")
> + (V4HF "4") (V8HF "8")
> + (V2SF "2") (V4SF "4")
> + (V1DF "1") (V2DF "2")
> + (DI "1") (DF "1")])
> +
> ;; Map a mode to the number of bits in it, if the size of the mode
> ;; is constant.
> (define_mode_attr bitsize [(V8QI "64") (V16QI "128")
> Index: gcc/config/aarch64/aarch64-builtins.c
> ===================================================================
> --- gcc/config/aarch64/aarch64-builtins.c 2017-10-27 14:05:38.132936808 +0100
> +++ gcc/config/aarch64/aarch64-builtins.c 2017-10-27 14:12:00.601693018 +0100
> @@ -1069,8 +1069,8 @@ aarch64_simd_expand_args (rtx target, in
> GET_MODE_NUNITS (builtin_mode),
> exp);
> /* Keep to GCC-vector-extension lane indices in the RTL. */
> - op[opc] =
> - GEN_INT (ENDIAN_LANE_N (builtin_mode, INTVAL (op[opc])));
> + op[opc] = aarch64_endian_lane_rtx (builtin_mode,
> + INTVAL (op[opc]));
> }
> goto constant_arg;
>
> @@ -1083,7 +1083,7 @@ aarch64_simd_expand_args (rtx target, in
> aarch64_simd_lane_bounds (op[opc],
> 0, GET_MODE_NUNITS (vmode), exp);
> /* Keep to GCC-vector-extension lane indices in the RTL. */
> - op[opc] = GEN_INT (ENDIAN_LANE_N (vmode, INTVAL (op[opc])));
> + op[opc] = aarch64_endian_lane_rtx (vmode, INTVAL (op[opc]));
> }
> /* Fall through - if the lane index isn't a constant then
> the next case will error. */
> Index: gcc/config/aarch64/aarch64-simd.md
> ===================================================================
> --- gcc/config/aarch64/aarch64-simd.md 2017-10-27 14:11:56.994587161 +0100
> +++ gcc/config/aarch64/aarch64-simd.md 2017-10-27 14:12:00.602621727 +0100
> @@ -80,7 +80,7 @@ (define_insn "aarch64_dup_lane<mode>"
> )))]
> "TARGET_SIMD"
> {
> - operands[2] = GEN_INT (ENDIAN_LANE_N (<MODE>mode, INTVAL (operands[2])));
> + operands[2] = aarch64_endian_lane_rtx (<MODE>mode, INTVAL (operands[2]));
> return "dup\\t%0.<Vtype>, %1.<Vetype>[%2]";
> }
> [(set_attr "type" "neon_dup<q>")]
> @@ -95,8 +95,7 @@ (define_insn "aarch64_dup_lane_<vswap_wi
> )))]
> "TARGET_SIMD"
> {
> - operands[2] = GEN_INT (ENDIAN_LANE_N (<VSWAP_WIDTH>mode,
> - INTVAL (operands[2])));
> + operands[2] = aarch64_endian_lane_rtx (<VSWAP_WIDTH>mode, INTVAL (operands[2]));
> return "dup\\t%0.<Vtype>, %1.<Vetype>[%2]";
> }
> [(set_attr "type" "neon_dup<q>")]
> @@ -501,7 +500,7 @@ (define_insn "*aarch64_mul3_elt<mode>"
> (match_operand:VMUL 3 "register_operand" "w")))]
> "TARGET_SIMD"
> {
> - operands[2] = GEN_INT (ENDIAN_LANE_N (<MODE>mode, INTVAL (operands[2])));
> + operands[2] = aarch64_endian_lane_rtx (<MODE>mode, INTVAL (operands[2]));
> return "<f>mul\\t%0.<Vtype>, %3.<Vtype>, %1.<Vetype>[%2]";
> }
> [(set_attr "type" "neon<fp>_mul_<stype>_scalar<q>")]
> @@ -517,8 +516,7 @@ (define_insn "*aarch64_mul3_elt_<vswap_w
> (match_operand:VMUL_CHANGE_NLANES 3 "register_operand" "w")))]
> "TARGET_SIMD"
> {
> - operands[2] = GEN_INT (ENDIAN_LANE_N (<VSWAP_WIDTH>mode,
> - INTVAL (operands[2])));
> + operands[2] = aarch64_endian_lane_rtx (<VSWAP_WIDTH>mode, INTVAL (operands[2]));
> return "<f>mul\\t%0.<Vtype>, %3.<Vtype>, %1.<Vetype>[%2]";
> }
> [(set_attr "type" "neon<fp>_mul_<Vetype>_scalar<q>")]
> @@ -571,7 +569,7 @@ (define_insn "*aarch64_mul3_elt_to_64v2d
> (match_operand:DF 3 "register_operand" "w")))]
> "TARGET_SIMD"
> {
> - operands[2] = GEN_INT (ENDIAN_LANE_N (V2DFmode, INTVAL (operands[2])));
> + operands[2] = aarch64_endian_lane_rtx (V2DFmode, INTVAL (operands[2]));
> return "fmul\\t%0.2d, %3.2d, %1.d[%2]";
> }
> [(set_attr "type" "neon_fp_mul_d_scalar_q")]
> @@ -706,7 +704,7 @@ (define_insn "aarch64_simd_vec_set<mode>
> (match_operand:SI 2 "immediate_operand" "i,i,i")))]
> "TARGET_SIMD"
> {
> - int elt = ENDIAN_LANE_N (<MODE>mode, exact_log2 (INTVAL (operands[2])));
> + int elt = ENDIAN_LANE_N (<nunits>, exact_log2 (INTVAL (operands[2])));
> operands[2] = GEN_INT ((HOST_WIDE_INT) 1 << elt);
> switch (which_alternative)
> {
> @@ -1072,7 +1070,7 @@ (define_insn "aarch64_simd_vec_setv2di"
> (match_operand:SI 2 "immediate_operand" "i,i")))]
> "TARGET_SIMD"
> {
> - int elt = ENDIAN_LANE_N (V2DImode, exact_log2 (INTVAL (operands[2])));
> + int elt = ENDIAN_LANE_N (2, exact_log2 (INTVAL (operands[2])));
> operands[2] = GEN_INT ((HOST_WIDE_INT) 1 << elt);
> switch (which_alternative)
> {
> @@ -1109,7 +1107,7 @@ (define_insn "aarch64_simd_vec_set<mode>
> (match_operand:SI 2 "immediate_operand" "i")))]
> "TARGET_SIMD"
> {
> - int elt = ENDIAN_LANE_N (<MODE>mode, exact_log2 (INTVAL (operands[2])));
> + int elt = ENDIAN_LANE_N (<nunits>, exact_log2 (INTVAL (operands[2])));
>
> operands[2] = GEN_INT ((HOST_WIDE_INT)1 << elt);
> return "ins\t%0.<Vetype>[%p2], %1.<Vetype>[0]";
> @@ -1154,7 +1152,7 @@ (define_insn "*aarch64_mla_elt<mode>"
> (match_operand:VDQHS 4 "register_operand" "0")))]
> "TARGET_SIMD"
> {
> - operands[2] = GEN_INT (ENDIAN_LANE_N (<MODE>mode, INTVAL (operands[2])));
> + operands[2] = aarch64_endian_lane_rtx (<MODE>mode, INTVAL (operands[2]));
> return "mla\t%0.<Vtype>, %3.<Vtype>, %1.<Vtype>[%2]";
> }
> [(set_attr "type" "neon_mla_<Vetype>_scalar<q>")]
> @@ -1172,8 +1170,7 @@ (define_insn "*aarch64_mla_elt_<vswap_wi
> (match_operand:VDQHS 4 "register_operand" "0")))]
> "TARGET_SIMD"
> {
> - operands[2] = GEN_INT (ENDIAN_LANE_N (<VSWAP_WIDTH>mode,
> - INTVAL (operands[2])));
> + operands[2] = aarch64_endian_lane_rtx (<VSWAP_WIDTH>mode, INTVAL (operands[2]));
> return "mla\t%0.<Vtype>, %3.<Vtype>, %1.<Vtype>[%2]";
> }
> [(set_attr "type" "neon_mla_<Vetype>_scalar<q>")]
> @@ -1213,7 +1210,7 @@ (define_insn "*aarch64_mls_elt<mode>"
> (match_operand:VDQHS 3 "register_operand" "w"))))]
> "TARGET_SIMD"
> {
> - operands[2] = GEN_INT (ENDIAN_LANE_N (<MODE>mode, INTVAL (operands[2])));
> + operands[2] = aarch64_endian_lane_rtx (<MODE>mode, INTVAL (operands[2]));
> return "mls\t%0.<Vtype>, %3.<Vtype>, %1.<Vtype>[%2]";
> }
> [(set_attr "type" "neon_mla_<Vetype>_scalar<q>")]
> @@ -1231,8 +1228,7 @@ (define_insn "*aarch64_mls_elt_<vswap_wi
> (match_operand:VDQHS 3 "register_operand" "w"))))]
> "TARGET_SIMD"
> {
> - operands[2] = GEN_INT (ENDIAN_LANE_N (<VSWAP_WIDTH>mode,
> - INTVAL (operands[2])));
> + operands[2] = aarch64_endian_lane_rtx (<VSWAP_WIDTH>mode, INTVAL (operands[2]));
> return "mls\t%0.<Vtype>, %3.<Vtype>, %1.<Vtype>[%2]";
> }
> [(set_attr "type" "neon_mla_<Vetype>_scalar<q>")]
> @@ -1802,7 +1798,7 @@ (define_insn "*aarch64_fma4_elt<mode>"
> (match_operand:VDQF 4 "register_operand" "0")))]
> "TARGET_SIMD"
> {
> - operands[2] = GEN_INT (ENDIAN_LANE_N (<MODE>mode, INTVAL (operands[2])));
> + operands[2] = aarch64_endian_lane_rtx (<MODE>mode, INTVAL (operands[2]));
> return "fmla\\t%0.<Vtype>, %3.<Vtype>, %1.<Vtype>[%2]";
> }
> [(set_attr "type" "neon_fp_mla_<Vetype>_scalar<q>")]
> @@ -1819,8 +1815,7 @@ (define_insn "*aarch64_fma4_elt_<vswap_w
> (match_operand:VDQSF 4 "register_operand" "0")))]
> "TARGET_SIMD"
> {
> - operands[2] = GEN_INT (ENDIAN_LANE_N (<VSWAP_WIDTH>mode,
> - INTVAL (operands[2])));
> + operands[2] = aarch64_endian_lane_rtx (<VSWAP_WIDTH>mode, INTVAL (operands[2]));
> return "fmla\\t%0.<Vtype>, %3.<Vtype>, %1.<Vtype>[%2]";
> }
> [(set_attr "type" "neon_fp_mla_<Vetype>_scalar<q>")]
> @@ -1848,7 +1843,7 @@ (define_insn "*aarch64_fma4_elt_to_64v2d
> (match_operand:DF 4 "register_operand" "0")))]
> "TARGET_SIMD"
> {
> - operands[2] = GEN_INT (ENDIAN_LANE_N (V2DFmode, INTVAL (operands[2])));
> + operands[2] = aarch64_endian_lane_rtx (V2DFmode, INTVAL (operands[2]));
> return "fmla\\t%0.2d, %3.2d, %1.2d[%2]";
> }
> [(set_attr "type" "neon_fp_mla_d_scalar_q")]
> @@ -1878,7 +1873,7 @@ (define_insn "*aarch64_fnma4_elt<mode>"
> (match_operand:VDQF 4 "register_operand" "0")))]
> "TARGET_SIMD"
> {
> - operands[2] = GEN_INT (ENDIAN_LANE_N (<MODE>mode, INTVAL (operands[2])));
> + operands[2] = aarch64_endian_lane_rtx (<MODE>mode, INTVAL (operands[2]));
> return "fmls\\t%0.<Vtype>, %3.<Vtype>, %1.<Vtype>[%2]";
> }
> [(set_attr "type" "neon_fp_mla_<Vetype>_scalar<q>")]
> @@ -1896,8 +1891,7 @@ (define_insn "*aarch64_fnma4_elt_<vswap_
> (match_operand:VDQSF 4 "register_operand" "0")))]
> "TARGET_SIMD"
> {
> - operands[2] = GEN_INT (ENDIAN_LANE_N (<VSWAP_WIDTH>mode,
> - INTVAL (operands[2])));
> + operands[2] = aarch64_endian_lane_rtx (<VSWAP_WIDTH>mode, INTVAL (operands[2]));
> return "fmls\\t%0.<Vtype>, %3.<Vtype>, %1.<Vtype>[%2]";
> }
> [(set_attr "type" "neon_fp_mla_<Vetype>_scalar<q>")]
> @@ -1927,7 +1921,7 @@ (define_insn "*aarch64_fnma4_elt_to_64v2
> (match_operand:DF 4 "register_operand" "0")))]
> "TARGET_SIMD"
> {
> - operands[2] = GEN_INT (ENDIAN_LANE_N (V2DFmode, INTVAL (operands[2])));
> + operands[2] = aarch64_endian_lane_rtx (V2DFmode, INTVAL (operands[2]));
> return "fmls\\t%0.2d, %3.2d, %1.2d[%2]";
> }
> [(set_attr "type" "neon_fp_mla_d_scalar_q")]
> @@ -2260,7 +2254,7 @@ (define_expand "reduc_plus_scal_<mode>"
> UNSPEC_ADDV)]
> "TARGET_SIMD"
> {
> - rtx elt = GEN_INT (ENDIAN_LANE_N (<MODE>mode, 0));
> + rtx elt = aarch64_endian_lane_rtx (<MODE>mode, 0);
> rtx scratch = gen_reg_rtx (<MODE>mode);
> emit_insn (gen_aarch64_reduc_plus_internal<mode> (scratch, operands[1]));
> emit_insn (gen_aarch64_get_lane<mode> (operands[0], scratch, elt));
> @@ -2311,7 +2305,7 @@ (define_expand "reduc_plus_scal_v4sf"
> UNSPEC_FADDV))]
> "TARGET_SIMD"
> {
> - rtx elt = GEN_INT (ENDIAN_LANE_N (V4SFmode, 0));
> + rtx elt = aarch64_endian_lane_rtx (V4SFmode, 0);
> rtx scratch = gen_reg_rtx (V4SFmode);
> emit_insn (gen_aarch64_faddpv4sf (scratch, operands[1], operands[1]));
> emit_insn (gen_aarch64_faddpv4sf (scratch, scratch, scratch));
> @@ -2353,7 +2347,7 @@ (define_expand "reduc_<maxmin_uns>_scal_
> FMAXMINV)]
> "TARGET_SIMD"
> {
> - rtx elt = GEN_INT (ENDIAN_LANE_N (<MODE>mode, 0));
> + rtx elt = aarch64_endian_lane_rtx (<MODE>mode, 0);
> rtx scratch = gen_reg_rtx (<MODE>mode);
> emit_insn (gen_aarch64_reduc_<maxmin_uns>_internal<mode> (scratch,
> operands[1]));
> @@ -2369,7 +2363,7 @@ (define_expand "reduc_<maxmin_uns>_scal_
> MAXMINV)]
> "TARGET_SIMD"
> {
> - rtx elt = GEN_INT (ENDIAN_LANE_N (<MODE>mode, 0));
> + rtx elt = aarch64_endian_lane_rtx (<MODE>mode, 0);
> rtx scratch = gen_reg_rtx (<MODE>mode);
> emit_insn (gen_aarch64_reduc_<maxmin_uns>_internal<mode> (scratch,
> operands[1]));
> @@ -2894,7 +2888,7 @@ (define_insn "*aarch64_get_lane_extend<G
> (parallel [(match_operand:SI 2 "immediate_operand" "i")]))))]
> "TARGET_SIMD"
> {
> - operands[2] = GEN_INT (ENDIAN_LANE_N (<MODE>mode, INTVAL (operands[2])));
> + operands[2] = aarch64_endian_lane_rtx (<MODE>mode, INTVAL (operands[2]));
> return "smov\\t%<GPI:w>0, %1.<VDQQH:Vetype>[%2]";
> }
> [(set_attr "type" "neon_to_gp<q>")]
> @@ -2908,7 +2902,7 @@ (define_insn "*aarch64_get_lane_zero_ext
> (parallel [(match_operand:SI 2 "immediate_operand" "i")]))))]
> "TARGET_SIMD"
> {
> - operands[2] = GEN_INT (ENDIAN_LANE_N (<MODE>mode, INTVAL (operands[2])));
> + operands[2] = aarch64_endian_lane_rtx (<MODE>mode, INTVAL (operands[2]));
> return "umov\\t%w0, %1.<Vetype>[%2]";
> }
> [(set_attr "type" "neon_to_gp<q>")]
> @@ -2924,7 +2918,7 @@ (define_insn "aarch64_get_lane<mode>"
> (parallel [(match_operand:SI 2 "immediate_operand" "i, i, i")])))]
> "TARGET_SIMD"
> {
> - operands[2] = GEN_INT (ENDIAN_LANE_N (<MODE>mode, INTVAL (operands[2])));
> + operands[2] = aarch64_endian_lane_rtx (<MODE>mode, INTVAL (operands[2]));
> switch (which_alternative)
> {
> case 0:
> @@ -3300,8 +3294,7 @@ (define_insn "*aarch64_mulx_elt_<vswap_w
> UNSPEC_FMULX))]
> "TARGET_SIMD"
> {
> - operands[3] = GEN_INT (ENDIAN_LANE_N (<VSWAP_WIDTH>mode,
> - INTVAL (operands[3])));
> + operands[3] = aarch64_endian_lane_rtx (<VSWAP_WIDTH>mode, INTVAL (operands[3]));
> return "fmulx\t%<v>0<Vmtype>, %<v>1<Vmtype>, %2.<Vetype>[%3]";
> }
> [(set_attr "type" "neon_fp_mul_<Vetype>_scalar<q>")]
> @@ -3320,7 +3313,7 @@ (define_insn "*aarch64_mulx_elt<mode>"
> UNSPEC_FMULX))]
> "TARGET_SIMD"
> {
> - operands[3] = GEN_INT (ENDIAN_LANE_N (<MODE>mode, INTVAL (operands[3])));
> + operands[3] = aarch64_endian_lane_rtx (<MODE>mode, INTVAL (operands[3]));
> return "fmulx\t%<v>0<Vmtype>, %<v>1<Vmtype>, %2.<Vetype>[%3]";
> }
> [(set_attr "type" "neon_fp_mul_<Vetype><q>")]
> @@ -3354,7 +3347,7 @@ (define_insn "*aarch64_vgetfmulx<mode>"
> UNSPEC_FMULX))]
> "TARGET_SIMD"
> {
> - operands[3] = GEN_INT (ENDIAN_LANE_N (<MODE>mode, INTVAL (operands[3])));
> + operands[3] = aarch64_endian_lane_rtx (<MODE>mode, INTVAL (operands[3]));
> return "fmulx\t%<Vetype>0, %<Vetype>1, %2.<Vetype>[%3]";
> }
> [(set_attr "type" "fmul<Vetype>")]
> @@ -3440,7 +3433,7 @@ (define_insn "aarch64_sq<r>dmulh_lane<mo
> VQDMULH))]
> "TARGET_SIMD"
> "*
> - operands[3] = GEN_INT (ENDIAN_LANE_N (<VCOND>mode, INTVAL (operands[3])));
> + operands[3] = aarch64_endian_lane_rtx (<VCOND>mode, INTVAL (operands[3]));
> return \"sq<r>dmulh\\t%0.<Vtype>, %1.<Vtype>, %2.<Vetype>[%3]\";"
> [(set_attr "type" "neon_sat_mul_<Vetype>_scalar<q>")]
> )
> @@ -3455,7 +3448,7 @@ (define_insn "aarch64_sq<r>dmulh_laneq<m
> VQDMULH))]
> "TARGET_SIMD"
> "*
> - operands[3] = GEN_INT (ENDIAN_LANE_N (<VCONQ>mode, INTVAL (operands[3])));
> + operands[3] = aarch64_endian_lane_rtx (<VCONQ>mode, INTVAL (operands[3]));
> return \"sq<r>dmulh\\t%0.<Vtype>, %1.<Vtype>, %2.<Vetype>[%3]\";"
> [(set_attr "type" "neon_sat_mul_<Vetype>_scalar<q>")]
> )
> @@ -3470,7 +3463,7 @@ (define_insn "aarch64_sq<r>dmulh_lane<mo
> VQDMULH))]
> "TARGET_SIMD"
> "*
> - operands[3] = GEN_INT (ENDIAN_LANE_N (<VCOND>mode, INTVAL (operands[3])));
> + operands[3] = aarch64_endian_lane_rtx (<VCOND>mode, INTVAL (operands[3]));
> return \"sq<r>dmulh\\t%<v>0, %<v>1, %2.<v>[%3]\";"
> [(set_attr "type" "neon_sat_mul_<Vetype>_scalar<q>")]
> )
> @@ -3485,7 +3478,7 @@ (define_insn "aarch64_sq<r>dmulh_laneq<m
> VQDMULH))]
> "TARGET_SIMD"
> "*
> - operands[3] = GEN_INT (ENDIAN_LANE_N (<VCONQ>mode, INTVAL (operands[3])));
> + operands[3] = aarch64_endian_lane_rtx (<VCONQ>mode, INTVAL (operands[3]));
> return \"sq<r>dmulh\\t%<v>0, %<v>1, %2.<v>[%3]\";"
> [(set_attr "type" "neon_sat_mul_<Vetype>_scalar<q>")]
> )
> @@ -3517,7 +3510,7 @@ (define_insn "aarch64_sqrdml<SQRDMLH_AS:
> SQRDMLH_AS))]
> "TARGET_SIMD_RDMA"
> {
> - operands[4] = GEN_INT (ENDIAN_LANE_N (<VCOND>mode, INTVAL (operands[4])));
> + operands[4] = aarch64_endian_lane_rtx (<VCOND>mode, INTVAL (operands[4]));
> return
> "sqrdml<SQRDMLH_AS:rdma_as>h\\t%0.<Vtype>, %2.<Vtype>, %3.<Vetype>[%4]";
> }
> @@ -3535,7 +3528,7 @@ (define_insn "aarch64_sqrdml<SQRDMLH_AS:
> SQRDMLH_AS))]
> "TARGET_SIMD_RDMA"
> {
> - operands[4] = GEN_INT (ENDIAN_LANE_N (<VCOND>mode, INTVAL (operands[4])));
> + operands[4] = aarch64_endian_lane_rtx (<VCOND>mode, INTVAL (operands[4]));
> return
> "sqrdml<SQRDMLH_AS:rdma_as>h\\t%<v>0, %<v>2, %3.<Vetype>[%4]";
> }
> @@ -3555,7 +3548,7 @@ (define_insn "aarch64_sqrdml<SQRDMLH_AS:
> SQRDMLH_AS))]
> "TARGET_SIMD_RDMA"
> {
> - operands[4] = GEN_INT (ENDIAN_LANE_N (<VCONQ>mode, INTVAL (operands[4])));
> + operands[4] = aarch64_endian_lane_rtx (<VCONQ>mode, INTVAL (operands[4]));
> return
> "sqrdml<SQRDMLH_AS:rdma_as>h\\t%0.<Vtype>, %2.<Vtype>, %3.<Vetype>[%4]";
> }
> @@ -3573,7 +3566,7 @@ (define_insn "aarch64_sqrdml<SQRDMLH_AS:
> SQRDMLH_AS))]
> "TARGET_SIMD_RDMA"
> {
> - operands[4] = GEN_INT (ENDIAN_LANE_N (<VCONQ>mode, INTVAL (operands[4])));
> + operands[4] = aarch64_endian_lane_rtx (<VCONQ>mode, INTVAL (operands[4]));
> return
> "sqrdml<SQRDMLH_AS:rdma_as>h\\t%<v>0, %<v>2, %3.<v>[%4]";
> }
> @@ -3617,7 +3610,7 @@ (define_insn "aarch64_sqdml<SBINQOPS:as>
> (const_int 1))))]
> "TARGET_SIMD"
> {
> - operands[4] = GEN_INT (ENDIAN_LANE_N (<VCOND>mode, INTVAL (operands[4])));
> + operands[4] = aarch64_endian_lane_rtx (<VCOND>mode, INTVAL (operands[4]));
> return
> "sqdml<SBINQOPS:as>l\\t%<vw2>0<Vmwtype>, %<v>2<Vmtype>, %3.<Vetype>[%4]";
> }
> @@ -3641,7 +3634,7 @@ (define_insn "aarch64_sqdml<SBINQOPS:as>
> (const_int 1))))]
> "TARGET_SIMD"
> {
> - operands[4] = GEN_INT (ENDIAN_LANE_N (<VCONQ>mode, INTVAL (operands[4])));
> + operands[4] = aarch64_endian_lane_rtx (<VCONQ>mode, INTVAL (operands[4]));
> return
> "sqdml<SBINQOPS:as>l\\t%<vw2>0<Vmwtype>, %<v>2<Vmtype>, %3.<Vetype>[%4]";
> }
> @@ -3664,7 +3657,7 @@ (define_insn "aarch64_sqdml<SBINQOPS:as>
> (const_int 1))))]
> "TARGET_SIMD"
> {
> - operands[4] = GEN_INT (ENDIAN_LANE_N (<VCOND>mode, INTVAL (operands[4])));
> + operands[4] = aarch64_endian_lane_rtx (<VCOND>mode, INTVAL (operands[4]));
> return
> "sqdml<SBINQOPS:as>l\\t%<vw2>0<Vmwtype>, %<v>2<Vmtype>, %3.<Vetype>[%4]";
> }
> @@ -3687,7 +3680,7 @@ (define_insn "aarch64_sqdml<SBINQOPS:as>
> (const_int 1))))]
> "TARGET_SIMD"
> {
> - operands[4] = GEN_INT (ENDIAN_LANE_N (<VCONQ>mode, INTVAL (operands[4])));
> + operands[4] = aarch64_endian_lane_rtx (<VCONQ>mode, INTVAL (operands[4]));
> return
> "sqdml<SBINQOPS:as>l\\t%<vw2>0<Vmwtype>, %<v>2<Vmtype>, %3.<Vetype>[%4]";
> }
> @@ -3782,7 +3775,7 @@ (define_insn "aarch64_sqdml<SBINQOPS:as>
> (const_int 1))))]
> "TARGET_SIMD"
> {
> - operands[4] = GEN_INT (ENDIAN_LANE_N (<VCOND>mode, INTVAL (operands[4])));
> + operands[4] = aarch64_endian_lane_rtx (<VCOND>mode, INTVAL (operands[4]));
> return
> "sqdml<SBINQOPS:as>l2\\t%<vw2>0<Vmwtype>, %<v>2<Vmtype>, %3.<Vetype>[%4]";
> }
> @@ -3808,7 +3801,7 @@ (define_insn "aarch64_sqdml<SBINQOPS:as>
> (const_int 1))))]
> "TARGET_SIMD"
> {
> - operands[4] = GEN_INT (ENDIAN_LANE_N (<VCONQ>mode, INTVAL (operands[4])));
> + operands[4] = aarch64_endian_lane_rtx (<VCONQ>mode, INTVAL (operands[4]));
> return
> "sqdml<SBINQOPS:as>l2\\t%<vw2>0<Vmwtype>, %<v>2<Vmtype>, %3.<Vetype>[%4]";
> }
> @@ -3955,7 +3948,7 @@ (define_insn "aarch64_sqdmull_lane<mode>
> (const_int 1)))]
> "TARGET_SIMD"
> {
> - operands[3] = GEN_INT (ENDIAN_LANE_N (<VCOND>mode, INTVAL (operands[3])));
> + operands[3] = aarch64_endian_lane_rtx (<VCOND>mode, INTVAL (operands[3]));
> return "sqdmull\\t%<vw2>0<Vmwtype>, %<v>1<Vmtype>, %2.<Vetype>[%3]";
> }
> [(set_attr "type" "neon_sat_mul_<Vetype>_scalar_long")]
> @@ -3976,7 +3969,7 @@ (define_insn "aarch64_sqdmull_laneq<mode
> (const_int 1)))]
> "TARGET_SIMD"
> {
> - operands[3] = GEN_INT (ENDIAN_LANE_N (<VCONQ>mode, INTVAL (operands[3])));
> + operands[3] = aarch64_endian_lane_rtx (<VCONQ>mode, INTVAL (operands[3]));
> return "sqdmull\\t%<vw2>0<Vmwtype>, %<v>1<Vmtype>, %2.<Vetype>[%3]";
> }
> [(set_attr "type" "neon_sat_mul_<Vetype>_scalar_long")]
> @@ -3996,7 +3989,7 @@ (define_insn "aarch64_sqdmull_lane<mode>
> (const_int 1)))]
> "TARGET_SIMD"
> {
> - operands[3] = GEN_INT (ENDIAN_LANE_N (<VCOND>mode, INTVAL (operands[3])));
> + operands[3] = aarch64_endian_lane_rtx (<VCOND>mode, INTVAL (operands[3]));
> return "sqdmull\\t%<vw2>0<Vmwtype>, %<v>1<Vmtype>, %2.<Vetype>[%3]";
> }
> [(set_attr "type" "neon_sat_mul_<Vetype>_scalar_long")]
> @@ -4016,7 +4009,7 @@ (define_insn "aarch64_sqdmull_laneq<mode
> (const_int 1)))]
> "TARGET_SIMD"
> {
> - operands[3] = GEN_INT (ENDIAN_LANE_N (<VCONQ>mode, INTVAL (operands[3])));
> + operands[3] = aarch64_endian_lane_rtx (<VCONQ>mode, INTVAL (operands[3]));
> return "sqdmull\\t%<vw2>0<Vmwtype>, %<v>1<Vmtype>, %2.<Vetype>[%3]";
> }
> [(set_attr "type" "neon_sat_mul_<Vetype>_scalar_long")]
> @@ -4094,7 +4087,7 @@ (define_insn "aarch64_sqdmull2_lane<mode
> (const_int 1)))]
> "TARGET_SIMD"
> {
> - operands[3] = GEN_INT (ENDIAN_LANE_N (<VCOND>mode, INTVAL (operands[3])));
> + operands[3] = aarch64_endian_lane_rtx (<VCOND>mode, INTVAL (operands[3]));
> return "sqdmull2\\t%<vw2>0<Vmwtype>, %<v>1<Vmtype>, %2.<Vetype>[%3]";
> }
> [(set_attr "type" "neon_sat_mul_<Vetype>_scalar_long")]
> @@ -4117,7 +4110,7 @@ (define_insn "aarch64_sqdmull2_laneq<mod
> (const_int 1)))]
> "TARGET_SIMD"
> {
> - operands[3] = GEN_INT (ENDIAN_LANE_N (<VCONQ>mode, INTVAL (operands[3])));
> + operands[3] = aarch64_endian_lane_rtx (<VCONQ>mode, INTVAL (operands[3]));
> return "sqdmull2\\t%<vw2>0<Vmwtype>, %<v>1<Vmtype>, %2.<Vetype>[%3]";
> }
> [(set_attr "type" "neon_sat_mul_<Vetype>_scalar_long")]
> @@ -4623,7 +4616,7 @@ (define_insn "aarch64_vec_load_lanesoi_l
> UNSPEC_LD2_LANE))]
> "TARGET_SIMD"
> {
> - operands[3] = GEN_INT (ENDIAN_LANE_N (<MODE>mode, INTVAL (operands[3])));
> + operands[3] = aarch64_endian_lane_rtx (<MODE>mode, INTVAL (operands[3]));
> return "ld2\\t{%S0.<Vetype> - %T0.<Vetype>}[%3], %1";
> }
> [(set_attr "type" "neon_load2_one_lane")]
> @@ -4667,7 +4660,7 @@ (define_insn "aarch64_vec_store_lanesoi_
> UNSPEC_ST2_LANE))]
> "TARGET_SIMD"
> {
> - operands[2] = GEN_INT (ENDIAN_LANE_N (<MODE>mode, INTVAL (operands[2])));
> + operands[2] = aarch64_endian_lane_rtx (<MODE>mode, INTVAL (operands[2]));
> return "st2\\t{%S1.<Vetype> - %T1.<Vetype>}[%2], %0";
> }
> [(set_attr "type" "neon_store2_one_lane<q>")]
> @@ -4721,7 +4714,7 @@ (define_insn "aarch64_vec_load_lanesci_l
> UNSPEC_LD3_LANE))]
> "TARGET_SIMD"
> {
> - operands[3] = GEN_INT (ENDIAN_LANE_N (<MODE>mode, INTVAL (operands[3])));
> + operands[3] = aarch64_endian_lane_rtx (<MODE>mode, INTVAL (operands[3]));
> return "ld3\\t{%S0.<Vetype> - %U0.<Vetype>}[%3], %1";
> }
> [(set_attr "type" "neon_load3_one_lane")]
> @@ -4765,7 +4758,7 @@ (define_insn "aarch64_vec_store_lanesci_
> UNSPEC_ST3_LANE))]
> "TARGET_SIMD"
> {
> - operands[2] = GEN_INT (ENDIAN_LANE_N (<MODE>mode, INTVAL (operands[2])));
> + operands[2] = aarch64_endian_lane_rtx (<MODE>mode, INTVAL (operands[2]));
> return "st3\\t{%S1.<Vetype> - %U1.<Vetype>}[%2], %0";
> }
> [(set_attr "type" "neon_store3_one_lane<q>")]
> @@ -4819,7 +4812,7 @@ (define_insn "aarch64_vec_load_lanesxi_l
> UNSPEC_LD4_LANE))]
> "TARGET_SIMD"
> {
> - operands[3] = GEN_INT (ENDIAN_LANE_N (<MODE>mode, INTVAL (operands[3])));
> + operands[3] = aarch64_endian_lane_rtx (<MODE>mode, INTVAL (operands[3]));
> return "ld4\\t{%S0.<Vetype> - %V0.<Vetype>}[%3], %1";
> }
> [(set_attr "type" "neon_load4_one_lane")]
> @@ -4863,7 +4856,7 @@ (define_insn "aarch64_vec_store_lanesxi_
> UNSPEC_ST4_LANE))]
> "TARGET_SIMD"
> {
> - operands[2] = GEN_INT (ENDIAN_LANE_N (<MODE>mode, INTVAL (operands[2])));
> + operands[2] = aarch64_endian_lane_rtx (<MODE>mode, INTVAL (operands[2]));
> return "st4\\t{%S1.<Vetype> - %V1.<Vetype>}[%2], %0";
> }
> [(set_attr "type" "neon_store4_one_lane<q>")]
* Re: [07/nn] [AArch64] Pass number of units to aarch64_reverse_mask
2017-10-27 13:29 ` [07/nn] [AArch64] Pass number of units to aarch64_reverse_mask Richard Sandiford
@ 2017-11-02 9:56 ` James Greenhalgh
0 siblings, 0 replies; 29+ messages in thread
From: James Greenhalgh @ 2017-11-02 9:56 UTC (permalink / raw)
To: gcc-patches, richard.earnshaw, marcus.shawcroft, richard.sandiford; +Cc: nd
On Fri, Oct 27, 2017 at 02:28:27PM +0100, Richard Sandiford wrote:
> This patch passes the number of units to aarch64_reverse_mask,
> which avoids a to_constant () once GET_MODE_NUNITS is variable.
OK
Reviewed-by: James Greenhalgh <james.greenhalgh@arm.com>
Thanks,
James
>
>
> 2017-10-26 Richard Sandiford <richard.sandiford@linaro.org>
> Alan Hayward <alan.hayward@arm.com>
> David Sherwood <david.sherwood@arm.com>
>
> gcc/
> * config/aarch64/aarch64-protos.h (aarch64_reverse_mask): Take
> the number of units too.
> * config/aarch64/aarch64.c (aarch64_reverse_mask): Likewise.
> * config/aarch64/aarch64-simd.md (vec_load_lanesoi<mode>)
> (vec_store_lanesoi<mode>, vec_load_lanesci<mode>)
> (vec_store_lanesci<mode>, vec_load_lanesxi<mode>)
> (vec_store_lanesxi<mode>): Update accordingly.
>
> Index: gcc/config/aarch64/aarch64-protos.h
> ===================================================================
> --- gcc/config/aarch64/aarch64-protos.h 2017-10-27 14:12:00.601693018 +0100
> +++ gcc/config/aarch64/aarch64-protos.h 2017-10-27 14:12:04.192082112 +0100
> @@ -365,7 +365,7 @@ bool aarch64_mask_and_shift_for_ubfiz_p
> bool aarch64_zero_extend_const_eq (machine_mode, rtx, machine_mode, rtx);
> bool aarch64_move_imm (HOST_WIDE_INT, machine_mode);
> bool aarch64_mov_operand_p (rtx, machine_mode);
> -rtx aarch64_reverse_mask (machine_mode);
> +rtx aarch64_reverse_mask (machine_mode, unsigned int);
> bool aarch64_offset_7bit_signed_scaled_p (machine_mode, HOST_WIDE_INT);
> char *aarch64_output_scalar_simd_mov_immediate (rtx, scalar_int_mode);
> char *aarch64_output_simd_mov_immediate (rtx, unsigned,
> Index: gcc/config/aarch64/aarch64.c
> ===================================================================
> --- gcc/config/aarch64/aarch64.c 2017-10-27 14:12:00.603550436 +0100
> +++ gcc/config/aarch64/aarch64.c 2017-10-27 14:12:04.193939530 +0100
> @@ -13945,16 +13945,18 @@ aarch64_vectorize_vec_perm_const_ok (mac
> return ret;
> }
>
> +/* Generate a byte permute mask for a register of mode MODE,
> + which has NUNITS units. */
> +
> rtx
> -aarch64_reverse_mask (machine_mode mode)
> +aarch64_reverse_mask (machine_mode mode, unsigned int nunits)
> {
> /* We have to reverse each vector because we dont have
> a permuted load that can reverse-load according to ABI rules. */
> rtx mask;
> rtvec v = rtvec_alloc (16);
> - int i, j;
> - int nunits = GET_MODE_NUNITS (mode);
> - int usize = GET_MODE_UNIT_SIZE (mode);
> + unsigned int i, j;
> + unsigned int usize = GET_MODE_UNIT_SIZE (mode);
>
> gcc_assert (BYTES_BIG_ENDIAN);
> gcc_assert (AARCH64_VALID_SIMD_QREG_MODE (mode));
> Index: gcc/config/aarch64/aarch64-simd.md
> ===================================================================
> --- gcc/config/aarch64/aarch64-simd.md 2017-10-27 14:12:00.602621727 +0100
> +++ gcc/config/aarch64/aarch64-simd.md 2017-10-27 14:12:04.193010821 +0100
> @@ -4632,7 +4632,7 @@ (define_expand "vec_load_lanesoi<mode>"
> if (BYTES_BIG_ENDIAN)
> {
> rtx tmp = gen_reg_rtx (OImode);
> - rtx mask = aarch64_reverse_mask (<MODE>mode);
> + rtx mask = aarch64_reverse_mask (<MODE>mode, <nunits>);
> emit_insn (gen_aarch64_simd_ld2<mode> (tmp, operands[1]));
> emit_insn (gen_aarch64_rev_reglistoi (operands[0], tmp, mask));
> }
> @@ -4676,7 +4676,7 @@ (define_expand "vec_store_lanesoi<mode>"
> if (BYTES_BIG_ENDIAN)
> {
> rtx tmp = gen_reg_rtx (OImode);
> - rtx mask = aarch64_reverse_mask (<MODE>mode);
> + rtx mask = aarch64_reverse_mask (<MODE>mode, <nunits>);
> emit_insn (gen_aarch64_rev_reglistoi (tmp, operands[1], mask));
> emit_insn (gen_aarch64_simd_st2<mode> (operands[0], tmp));
> }
> @@ -4730,7 +4730,7 @@ (define_expand "vec_load_lanesci<mode>"
> if (BYTES_BIG_ENDIAN)
> {
> rtx tmp = gen_reg_rtx (CImode);
> - rtx mask = aarch64_reverse_mask (<MODE>mode);
> + rtx mask = aarch64_reverse_mask (<MODE>mode, <nunits>);
> emit_insn (gen_aarch64_simd_ld3<mode> (tmp, operands[1]));
> emit_insn (gen_aarch64_rev_reglistci (operands[0], tmp, mask));
> }
> @@ -4774,7 +4774,7 @@ (define_expand "vec_store_lanesci<mode>"
> if (BYTES_BIG_ENDIAN)
> {
> rtx tmp = gen_reg_rtx (CImode);
> - rtx mask = aarch64_reverse_mask (<MODE>mode);
> + rtx mask = aarch64_reverse_mask (<MODE>mode, <nunits>);
> emit_insn (gen_aarch64_rev_reglistci (tmp, operands[1], mask));
> emit_insn (gen_aarch64_simd_st3<mode> (operands[0], tmp));
> }
> @@ -4828,7 +4828,7 @@ (define_expand "vec_load_lanesxi<mode>"
> if (BYTES_BIG_ENDIAN)
> {
> rtx tmp = gen_reg_rtx (XImode);
> - rtx mask = aarch64_reverse_mask (<MODE>mode);
> + rtx mask = aarch64_reverse_mask (<MODE>mode, <nunits>);
> emit_insn (gen_aarch64_simd_ld4<mode> (tmp, operands[1]));
> emit_insn (gen_aarch64_rev_reglistxi (operands[0], tmp, mask));
> }
> @@ -4872,7 +4872,7 @@ (define_expand "vec_store_lanesxi<mode>"
> if (BYTES_BIG_ENDIAN)
> {
> rtx tmp = gen_reg_rtx (XImode);
> - rtx mask = aarch64_reverse_mask (<MODE>mode);
> + rtx mask = aarch64_reverse_mask (<MODE>mode, <nunits>);
> emit_insn (gen_aarch64_rev_reglistxi (tmp, operands[1], mask));
> emit_insn (gen_aarch64_simd_st4<mode> (operands[0], tmp));
> }
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: [08/nn] [AArch64] Pass number of units to aarch64_simd_vect_par_cnst_half
2017-10-27 13:29 ` [08/nn] [AArch64] Pass number of units to aarch64_simd_vect_par_cnst_half Richard Sandiford
@ 2017-11-02 9:59 ` James Greenhalgh
0 siblings, 0 replies; 29+ messages in thread
From: James Greenhalgh @ 2017-11-02 9:59 UTC (permalink / raw)
To: gcc-patches, richard.earnshaw, marcus.shawcroft, richard.sandiford; +Cc: nd
On Fri, Oct 27, 2017 at 02:28:57PM +0100, Richard Sandiford wrote:
> This patch passes the number of units to aarch64_simd_vect_par_cnst_half,
> which avoids a to_constant () once GET_MODE_NUNITS is variable.
OK.
Reviewed-by: James Greenhalgh <james.greenhalgh@arm.com>
Thanks,
James
> 2017-10-27 Richard Sandiford <richard.sandiford@linaro.org>
> Alan Hayward <alan.hayward@arm.com>
> David Sherwood <david.sherwood@arm.com>
>
> gcc/
> * config/aarch64/aarch64-protos.h (aarch64_simd_vect_par_cnst_half):
> Take the number of units too.
> * config/aarch64/aarch64.c (aarch64_simd_vect_par_cnst_half): Likewise.
> (aarch64_simd_check_vect_par_cnst_half): Update call accordingly,
> but check for a vector mode before rather than after the call.
> * config/aarch64/aarch64-simd.md (aarch64_split_simd_mov<mode>)
> (move_hi_quad_<mode>, vec_unpack<su>_hi_<mode>)
> (vec_unpack<su>_lo_<mode, vec_widen_<su>mult_lo_<mode>)
> (vec_widen_<su>mult_hi_<mode>, vec_unpacks_lo_<mode>)
> (vec_unpacks_hi_<mode>, aarch64_saddl2<mode>, aarch64_uaddl2<mode>)
> (aarch64_ssubl2<mode>, aarch64_usubl2<mode>, widen_ssum<mode>3)
> (widen_usum<mode>3, aarch64_saddw2<mode>, aarch64_uaddw2<mode>)
> (aarch64_ssubw2<mode>, aarch64_usubw2<mode>, aarch64_sqdmlal2<mode>)
> (aarch64_sqdmlsl2<mode>, aarch64_sqdmlal2_lane<mode>)
> (aarch64_sqdmlal2_laneq<mode>, aarch64_sqdmlsl2_lane<mode>)
> (aarch64_sqdmlsl2_laneq<mode>, aarch64_sqdmlal2_n<mode>)
> (aarch64_sqdmlsl2_n<mode>, aarch64_sqdmull2<mode>)
> (aarch64_sqdmull2_lane<mode>, aarch64_sqdmull2_laneq<mode>)
> (aarch64_sqdmull2_n<mode>): Update accordingly.
>
* Re: [09/nn] [AArch64] Pass number of units to aarch64_expand_vec_perm(_const)
2017-10-27 13:30 ` [09/nn] [AArch64] Pass number of units to aarch64_expand_vec_perm(_const) Richard Sandiford
@ 2017-11-02 10:00 ` James Greenhalgh
0 siblings, 0 replies; 29+ messages in thread
From: James Greenhalgh @ 2017-11-02 10:00 UTC (permalink / raw)
To: Richard Sandiford; +Cc: gcc-patches, Richard Earnshaw, Marcus Shawcroft, nd
On Fri, Oct 27, 2017 at 02:29:30PM +0100, Richard Sandiford wrote:
> This patch passes the number of units to aarch64_expand_vec_perm
> and aarch64_expand_vec_perm_const, which avoids a to_constant ()
> once GET_MODE_NUNITS is variable.
OK.
Reviewed-by: James Greenhalgh <james.greenhalgh@arm.com>
Thanks,
James
> 2017-10-27 Richard Sandiford <richard.sandiford@linaro.org>
> Alan Hayward <alan.hayward@arm.com>
> David Sherwood <david.sherwood@arm.com>
>
> gcc/
> * config/aarch64/aarch64-protos.h (aarch64_expand_vec_perm)
> (aarch64_expand_vec_perm_const): Take the number of units too.
> * config/aarch64/aarch64.c (aarch64_expand_vec_perm)
> (aarch64_expand_vec_perm_const): Likewise.
> * config/aarch64/aarch64-simd.md (vec_perm_const<mode>)
> (vec_perm<mode>): Update accordingly.
>
* Re: [12/nn] [AArch64] Add const_offset field to aarch64_address_info
2017-10-27 13:37 ` [12/nn] [AArch64] Add const_offset field to aarch64_address_info Richard Sandiford
@ 2017-11-02 10:09 ` James Greenhalgh
0 siblings, 0 replies; 29+ messages in thread
From: James Greenhalgh @ 2017-11-02 10:09 UTC (permalink / raw)
To: gcc-patches, richard.earnshaw, marcus.shawcroft, richard.sandiford; +Cc: nd
On Fri, Oct 27, 2017 at 02:31:35PM +0100, Richard Sandiford wrote:
> This patch records the integer value of the address offset in
> aarch64_address_info, so that it doesn't need to be re-extracted
> from the rtx. The SVE port will make more use of this. The patch
> also uses poly_int64 routines to manipulate the offset, rather than
> just handling CONST_INTs.
OK.
Reviewed-by: James Greenhalgh <james.greenhalgh@arm.com>
Thanks,
James
> 2017-10-27 Richard Sandiford <richard.sandiford@linaro.org>
> Alan Hayward <alan.hayward@arm.com>
> David Sherwood <david.sherwood@arm.com>
>
> gcc/
> * config/aarch64/aarch64.c (aarch64_address_info): Add a const_offset
> field.
> (aarch64_classify_address): Initialize it. Track polynomial offsets.
> (aarch64_print_operand_address): Use it to check for a zero offset.
* Re: [05/nn] [AArch64] Rewrite aarch64_simd_valid_immediate
2017-10-27 13:28 ` [05/nn] [AArch64] Rewrite aarch64_simd_valid_immediate Richard Sandiford
@ 2017-11-10 11:20 ` James Greenhalgh
0 siblings, 0 replies; 29+ messages in thread
From: James Greenhalgh @ 2017-11-10 11:20 UTC (permalink / raw)
To: Richard Sandiford; +Cc: gcc-patches, Richard Earnshaw, Marcus Shawcroft, nd
On Fri, Oct 27, 2017 at 02:27:08PM +0100, Richard Sandiford wrote:
> This patch reworks aarch64_simd_valid_immediate so that
> it's easier to add SVE support. The main changes are:
>
> - make simd_immediate_info easier to construct
> - replace the while (1) { ... break; } blocks with checks that use
> the full 64-bit value of the constant
> - treat floating-point modes as integers if they aren't valid
> as floating-point values
This is a nice cleanup. I'm very pleased that we can now read which
conditions you are checking rather than trying to derive them from the
old CHECK macros.
Thanks for the patch, this is OK for trunk.
Reviewed-by: James Greenhalgh <james.greenhalgh@arm.com>
James
>
>
> 2017-10-26 Richard Sandiford <richard.sandiford@linaro.org>
> Alan Hayward <alan.hayward@arm.com>
> David Sherwood <david.sherwood@arm.com>
>
> gcc/
> * config/aarch64/aarch64-protos.h (aarch64_output_simd_mov_immediate):
> Remove the mode argument.
> (aarch64_simd_valid_immediate): Remove the mode and inverse
> arguments.
> * config/aarch64/iterators.md (bitsize): New iterator.
> * config/aarch64/aarch64-simd.md (*aarch64_simd_mov<mode>, and<mode>3)
> (ior<mode>3): Update calls to aarch64_output_simd_mov_immediate.
> * config/aarch64/constraints.md (Do, Db, Dn): Update calls to
> aarch64_simd_valid_immediate.
> * config/aarch64/predicates.md (aarch64_reg_or_orr_imm): Likewise.
> (aarch64_reg_or_bic_imm): Likewise.
> * config/aarch64/aarch64.c (simd_immediate_info): Replace mvn
> with an insn_type enum and msl with a modifier_type enum.
> Replace element_width with a scalar_mode. Change the shift
> to unsigned int. Add constructors for scalar_float_mode and
> scalar_int_mode elements.
> (aarch64_vect_float_const_representable_p): Delete.
> (aarch64_can_const_movi_rtx_p, aarch64_legitimate_constant_p)
> (aarch64_simd_scalar_immediate_valid_for_move)
> (aarch64_simd_make_constant): Update call to
> aarch64_simd_valid_immediate.
> (aarch64_advsimd_valid_immediate_hs): New function.
> (aarch64_advsimd_valid_immediate): Likewise.
> (aarch64_simd_valid_immediate): Remove mode and inverse
> arguments. Rewrite to use the above. Use const_vec_duplicate_p
> to detect duplicated constants and use aarch64_float_const_zero_rtx_p
> and aarch64_float_const_representable_p on the result.
> (aarch64_output_simd_mov_immediate): Remove mode argument.
> Update call to aarch64_simd_valid_immediate and use of
> simd_immediate_info.
> (aarch64_output_scalar_simd_mov_immediate): Update call
> accordingly.
>
> gcc/testsuite/
> * gcc.target/aarch64/vect-movi.c (movi_float_lsl24): New function.
> (main): Call it.
>
* Re: [03/nn] [AArch64] Rework interface to add constant/offset routines
2017-10-30 11:03 ` Richard Sandiford
@ 2017-11-10 15:43 ` James Greenhalgh
0 siblings, 0 replies; 29+ messages in thread
From: James Greenhalgh @ 2017-11-10 15:43 UTC (permalink / raw)
To: Richard Sandiford; +Cc: gcc-patches, Richard Earnshaw, Marcus Shawcroft, nd
On Mon, Oct 30, 2017 at 10:52:26AM +0000, Richard Sandiford wrote:
> Richard Sandiford <richard.sandiford@linaro.org> writes:
> > The port had aarch64_add_offset and aarch64_add_constant routines
> > that did similar things. This patch replaces them with an expanded
> > version of aarch64_add_offset that takes separate source and
> > destination registers. The new routine also takes a poly_int64 offset
> > instead of a HOST_WIDE_INT offset, but it leaves the HOST_WIDE_INT
> > case to aarch64_add_offset_1, which is basically a repurposed
> > aarch64_add_constant_internal. The SVE patch will put the handling
> > of VL-based constants in aarch64_add_offset, while still using
> > aarch64_add_offset_1 for the constant part.
> >
> > The vcall_offset == 0 path in aarch64_output_mi_thunk will use temp0
> > as well as temp1 once SVE is added.
> >
> > A side-effect of the patch is that we now generate:
> >
> > mov x29, sp
> >
> > instead of:
> >
> > add x29, sp, 0
> >
> > in the pr70044.c test.
>
> Sorry, I stupidly rebased the patch just before posting and so
> introduced a last-minute bug. Here's a fixed version that survives
> testing on aarch64-linux-gnu.
>
> 2017-10-30 Richard Sandiford <richard.sandiford@linaro.org>
> Alan Hayward <alan.hayward@arm.com>
> David Sherwood <david.sherwood@arm.com>
>
> gcc/
> * config/aarch64/aarch64.c (aarch64_force_temporary): Assert that
> x exists before using it.
> (aarch64_add_constant_internal): Rename to...
> (aarch64_add_offset_1): ...this. Replace regnum with separate
> src and dest rtxes. Handle the case in which they're different,
> including when the offset is zero. Replace scratchreg with an rtx.
> Use 2 additions if there is no spare register into which we can
> move a 16-bit constant.
> (aarch64_add_constant): Delete.
> (aarch64_add_offset): Replace reg with separate src and dest
> rtxes. Take a poly_int64 offset instead of a HOST_WIDE_INT.
> Use aarch64_add_offset_1.
> (aarch64_add_sp, aarch64_sub_sp): Take the scratch register as
> an rtx rather than an int. Take the delta as a poly_int64
> rather than a HOST_WIDE_INT. Use aarch64_add_offset.
> (aarch64_expand_mov_immediate): Update uses of aarch64_add_offset.
> (aarch64_allocate_and_probe_stack_space): Take the scratch register
> as an rtx rather than an int. Use Pmode rather than word_mode
> in the loop code. Update calls to aarch64_sub_sp.
> (aarch64_expand_prologue): Update calls to aarch64_sub_sp,
> aarch64_allocate_and_probe_stack_space and aarch64_add_offset.
> (aarch64_expand_epilogue): Update calls to aarch64_add_offset
> and aarch64_add_sp.
> (aarch64_output_mi_thunk): Use aarch64_add_offset rather than
> aarch64_add_constant.
>
> gcc/testsuite/
> * gcc.target/aarch64/pr70044.c: Allow "mov x29, sp" too.
> @@ -1966,86 +1949,123 @@ aarch64_internal_mov_immediate (rtx dest
> return num_insns;
> }
>
> -/* Add DELTA to REGNUM in mode MODE. SCRATCHREG can be used to hold a
> - temporary value if necessary. FRAME_RELATED_P should be true if
> - the RTX_FRAME_RELATED flag should be set and CFA adjustments added
> - to the generated instructions. If SCRATCHREG is known to hold
> - abs (delta), EMIT_MOVE_IMM can be set to false to avoid emitting the
> - immediate again.
> -
> - Since this function may be used to adjust the stack pointer, we must
> - ensure that it cannot cause transient stack deallocation (for example
> - by first incrementing SP and then decrementing when adjusting by a
> - large immediate). */
> +/* A subroutine of aarch64_add_offset that handles the case in which
> + OFFSET is known at compile time. The arguments are otherwise the same. */
Some of the restrictions listed in this comment are important to keep
here.
> - Since this function may be used to adjust the stack pointer, we must
> - ensure that it cannot cause transient stack deallocation (for example
> - by first incrementing SP and then decrementing when adjusting by a
> - large immediate). */
This one in particular seems like we'd want it kept nearby the code.
OK with some sort of change that makes the restrictions on what this code
should do clear on both functions.
Reviewed-by: James Greenhalgh <james.greenhalgh@arm.com>
Thanks,
James
* PING: [11/nn] [AArch64] Set NUM_POLY_INT_COEFFS to 2
2017-10-27 13:31 ` [11/nn] [AArch64] Set NUM_POLY_INT_COEFFS to 2 Richard Sandiford
@ 2018-01-05 11:27 ` Richard Sandiford
2018-01-06 17:57 ` James Greenhalgh
0 siblings, 1 reply; 29+ messages in thread
From: Richard Sandiford @ 2018-01-05 11:27 UTC (permalink / raw)
To: gcc-patches; +Cc: richard.earnshaw, james.greenhalgh, marcus.shawcroft
Ping. Here's the patch updated to apply on top of the v8.4 and
__builtin_load_no_speculate support.
Richard Sandiford <richard.sandiford@linaro.org> writes:
> This patch switches the AArch64 port to use 2 poly_int coefficients
> and updates code as necessary to keep it compiling.
>
> One potentially-significant change is to
> aarch64_hard_regno_caller_save_mode. The old implementation
> was written in a pretty conservative way: it changed the default
> behaviour for single-register values, but used the default handling
> for multi-register values.
>
> I don't think that's necessary, since the interesting cases for this
> macro are usually the single-register ones. Multi-register modes take
> up the whole of the constituent registers and the move patterns for all
> multi-register modes should be equally good.
>
> Using the original mode for multi-register cases stops us from using
> SVE modes to spill multi-register NEON values. This was caught by
> gcc.c-torture/execute/pr47538.c.
>
> Also, aarch64_shift_truncation_mask used GET_MODE_BITSIZE - 1.
> GET_MODE_UNIT_BITSIZE - 1 is equivalent for the cases that it handles
> (which are all scalars), and I think it's more obvious, since if we ever
> do use this for elementwise shifts of vector modes, the mask will depend
> on the number of bits in each element rather than the number of bits in
> the whole vector.
2018-01-05 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>
gcc/
* config/aarch64/aarch64-modes.def (NUM_POLY_INT_COEFFS): Set to 2.
* config/aarch64/aarch64-protos.h (aarch64_initial_elimination_offset):
Return a poly_int64 rather than a HOST_WIDE_INT.
(aarch64_offset_7bit_signed_scaled_p): Take the offset as a poly_int64
rather than a HOST_WIDE_INT.
* config/aarch64/aarch64.h (aarch64_frame): Protect with
HAVE_POLY_INT_H rather than HOST_WIDE_INT. Change locals_offset,
hard_fp_offset, frame_size, initial_adjust, callee_offset and
final_offset from HOST_WIDE_INT to poly_int64.
* config/aarch64/aarch64-builtins.c (aarch64_simd_expand_args): Use
to_constant when getting the number of units in an Advanced SIMD
mode.
(aarch64_builtin_vectorized_function): Check for a constant number
of units.
* config/aarch64/aarch64-simd.md (mov<mode>): Handle polynomial
GET_MODE_SIZE.
(aarch64_ld<VSTRUCT:nregs>_lane<VALLDIF:mode>): Use the nunits
attribute instead of GET_MODE_NUNITS.
* config/aarch64/aarch64.c (aarch64_hard_regno_nregs)
(aarch64_class_max_nregs): Use the constant_lowest_bound of the
GET_MODE_SIZE for fixed-size registers.
(aarch64_const_vec_all_same_in_range_p): Use const_vec_duplicate_p.
(aarch64_hard_regno_call_part_clobbered, aarch64_classify_index)
(aarch64_mode_valid_for_sched_fusion_p, aarch64_classify_address)
(aarch64_legitimize_address_displacement, aarch64_secondary_reload)
(aarch64_print_operand, aarch64_print_address_internal)
(aarch64_address_cost, aarch64_rtx_costs, aarch64_register_move_cost)
(aarch64_short_vector_p, aapcs_vfp_sub_candidate)
(aarch64_simd_attr_length_rglist, aarch64_operands_ok_for_ldpstp):
Handle polynomial GET_MODE_SIZE.
(aarch64_hard_regno_caller_save_mode): Likewise. Return modes
wider than SImode without modification.
(tls_symbolic_operand_type): Use strip_offset instead of split_const.
(aarch64_pass_by_reference, aarch64_layout_arg, aarch64_pad_reg_upward)
(aarch64_gimplify_va_arg_expr): Assert that we don't yet handle
passing and returning SVE modes.
(aarch64_function_value, aarch64_layout_arg): Use gen_int_mode
rather than GEN_INT.
(aarch64_emit_probe_stack_range): Take the size as a poly_int64
rather than a HOST_WIDE_INT, but call sorry if it isn't constant.
(aarch64_allocate_and_probe_stack_space): Likewise.
(aarch64_layout_frame): Cope with polynomial offsets.
(aarch64_save_callee_saves, aarch64_restore_callee_saves): Take the
start_offset as a poly_int64 rather than a HOST_WIDE_INT. Track
polynomial offsets.
(offset_9bit_signed_unscaled_p, offset_12bit_unsigned_scaled_p)
(aarch64_offset_7bit_signed_scaled_p): Take the offset as a
poly_int64 rather than a HOST_WIDE_INT.
(aarch64_get_separate_components, aarch64_process_components)
(aarch64_expand_prologue, aarch64_expand_epilogue)
(aarch64_use_return_insn_p): Handle polynomial frame offsets.
(aarch64_anchor_offset): New function, split out from...
(aarch64_legitimize_address): ...here.
(aarch64_builtin_vectorization_cost): Handle polynomial
TYPE_VECTOR_SUBPARTS.
(aarch64_simd_check_vect_par_cnst_half): Handle polynomial
GET_MODE_NUNITS.
(aarch64_simd_make_constant, aarch64_expand_vector_init): Get the
number of elements from the PARALLEL rather than the mode.
(aarch64_shift_truncation_mask): Use GET_MODE_UNIT_BITSIZE
rather than GET_MODE_BITSIZE.
(aarch64_evpc_trn, aarch64_evpc_uzp, aarch64_evpc_ext)
(aarch64_evpc_rev, aarch64_evpc_dup, aarch64_evpc_zip)
(aarch64_expand_vec_perm_const_1): Handle polynomial
d->perm.length () and d->perm elements.
(aarch64_evpc_tbl): Likewise. Use nelt rather than GET_MODE_NUNITS.
Apply to_constant to d->perm elements.
(aarch64_simd_valid_immediate, aarch64_vec_fpconst_pow_of_2): Handle
polynomial CONST_VECTOR_NUNITS.
(aarch64_move_pointer): Take amount as a poly_int64 rather
than an int.
(aarch64_progress_pointer): Avoid temporary variable.
* config/aarch64/aarch64.md (aarch64_<crc_variant>): Use
the mode attribute instead of GET_MODE.
Index: gcc/config/aarch64/aarch64-modes.def
===================================================================
--- gcc/config/aarch64/aarch64-modes.def 2018-01-05 11:24:44.647408566 +0000
+++ gcc/config/aarch64/aarch64-modes.def 2018-01-05 11:24:44.864399697 +0000
@@ -46,3 +46,7 @@ INT_MODE (XI, 64);
/* Quad float: 128-bit floating mode for long doubles. */
FLOAT_MODE (TF, 16, ieee_quad_format);
+
+/* Coefficient 1 is multiplied by the number of 128-bit chunks in an
+ SVE vector (referred to as "VQ") minus one. */
+#define NUM_POLY_INT_COEFFS 2
Index: gcc/config/aarch64/aarch64-protos.h
===================================================================
--- gcc/config/aarch64/aarch64-protos.h 2018-01-05 11:24:44.647408566 +0000
+++ gcc/config/aarch64/aarch64-protos.h 2018-01-05 11:24:44.864399697 +0000
@@ -333,7 +333,7 @@ enum simd_immediate_check {
extern struct tune_params aarch64_tune_params;
-HOST_WIDE_INT aarch64_initial_elimination_offset (unsigned, unsigned);
+poly_int64 aarch64_initial_elimination_offset (unsigned, unsigned);
int aarch64_get_condition_code (rtx);
bool aarch64_address_valid_for_prefetch_p (rtx, bool);
bool aarch64_bitmask_imm (HOST_WIDE_INT val, machine_mode);
@@ -366,7 +366,7 @@ bool aarch64_zero_extend_const_eq (machi
bool aarch64_move_imm (HOST_WIDE_INT, machine_mode);
bool aarch64_mov_operand_p (rtx, machine_mode);
rtx aarch64_reverse_mask (machine_mode, unsigned int);
-bool aarch64_offset_7bit_signed_scaled_p (machine_mode, HOST_WIDE_INT);
+bool aarch64_offset_7bit_signed_scaled_p (machine_mode, poly_int64);
char *aarch64_output_scalar_simd_mov_immediate (rtx, scalar_int_mode);
char *aarch64_output_simd_mov_immediate (rtx, unsigned,
enum simd_immediate_check w = AARCH64_CHECK_MOV);
Index: gcc/config/aarch64/aarch64.h
===================================================================
--- gcc/config/aarch64/aarch64.h 2018-01-05 11:24:44.647408566 +0000
+++ gcc/config/aarch64/aarch64.h 2018-01-05 11:24:44.867399574 +0000
@@ -554,7 +554,7 @@ #define LIBCALL_VALUE(MODE) \
#define DEFAULT_PCC_STRUCT_RETURN 0
-#ifdef HOST_WIDE_INT
+#ifdef HAVE_POLY_INT_H
struct GTY (()) aarch64_frame
{
HOST_WIDE_INT reg_offset[FIRST_PSEUDO_REGISTER];
@@ -572,20 +572,20 @@ struct GTY (()) aarch64_frame
/* Offset from the base of the frame (incomming SP) to the
top of the locals area. This value is always a multiple of
STACK_BOUNDARY. */
- HOST_WIDE_INT locals_offset;
+ poly_int64 locals_offset;
/* Offset from the base of the frame (incomming SP) to the
hard_frame_pointer. This value is always a multiple of
STACK_BOUNDARY. */
- HOST_WIDE_INT hard_fp_offset;
+ poly_int64 hard_fp_offset;
/* The size of the frame. This value is the offset from base of the
- * frame (incomming SP) to the stack_pointer. This value is always
- * a multiple of STACK_BOUNDARY. */
- HOST_WIDE_INT frame_size;
+ frame (incomming SP) to the stack_pointer. This value is always
+ a multiple of STACK_BOUNDARY. */
+ poly_int64 frame_size;
/* The size of the initial stack adjustment before saving callee-saves. */
- HOST_WIDE_INT initial_adjust;
+ poly_int64 initial_adjust;
/* The writeback value when pushing callee-save registers.
It is zero when no push is used. */
@@ -593,10 +593,10 @@ struct GTY (()) aarch64_frame
/* The offset from SP to the callee-save registers after initial_adjust.
It may be non-zero if no push is used (ie. callee_adjust == 0). */
- HOST_WIDE_INT callee_offset;
+ poly_int64 callee_offset;
/* The size of the stack adjustment after saving callee-saves. */
- HOST_WIDE_INT final_adjust;
+ poly_int64 final_adjust;
/* Store FP,LR and setup a frame pointer. */
bool emit_frame_chain;
Index: gcc/config/aarch64/aarch64-builtins.c
===================================================================
--- gcc/config/aarch64/aarch64-builtins.c 2018-01-05 11:24:44.647408566 +0000
+++ gcc/config/aarch64/aarch64-builtins.c 2018-01-05 11:24:44.864399697 +0000
@@ -1065,9 +1065,9 @@ aarch64_simd_expand_args (rtx target, in
gcc_assert (opc > 1);
if (CONST_INT_P (op[opc]))
{
- aarch64_simd_lane_bounds (op[opc], 0,
- GET_MODE_NUNITS (builtin_mode),
- exp);
+ unsigned int nunits
+ = GET_MODE_NUNITS (builtin_mode).to_constant ();
+ aarch64_simd_lane_bounds (op[opc], 0, nunits, exp);
/* Keep to GCC-vector-extension lane indices in the RTL. */
op[opc] = aarch64_endian_lane_rtx (builtin_mode,
INTVAL (op[opc]));
@@ -1080,8 +1080,9 @@ aarch64_simd_expand_args (rtx target, in
if (CONST_INT_P (op[opc]))
{
machine_mode vmode = insn_data[icode].operand[opc - 1].mode;
- aarch64_simd_lane_bounds (op[opc],
- 0, GET_MODE_NUNITS (vmode), exp);
+ unsigned int nunits
+ = GET_MODE_NUNITS (vmode).to_constant ();
+ aarch64_simd_lane_bounds (op[opc], 0, nunits, exp);
/* Keep to GCC-vector-extension lane indices in the RTL. */
op[opc] = aarch64_endian_lane_rtx (vmode, INTVAL (op[opc]));
}
@@ -1400,16 +1401,17 @@ aarch64_builtin_vectorized_function (uns
tree type_in)
{
machine_mode in_mode, out_mode;
- int in_n, out_n;
+ unsigned HOST_WIDE_INT in_n, out_n;
if (TREE_CODE (type_out) != VECTOR_TYPE
|| TREE_CODE (type_in) != VECTOR_TYPE)
return NULL_TREE;
out_mode = TYPE_MODE (TREE_TYPE (type_out));
- out_n = TYPE_VECTOR_SUBPARTS (type_out);
in_mode = TYPE_MODE (TREE_TYPE (type_in));
- in_n = TYPE_VECTOR_SUBPARTS (type_in);
+ if (!TYPE_VECTOR_SUBPARTS (type_out).is_constant (&out_n)
+ || !TYPE_VECTOR_SUBPARTS (type_in).is_constant (&in_n))
+ return NULL_TREE;
#undef AARCH64_CHECK_BUILTIN_MODE
#define AARCH64_CHECK_BUILTIN_MODE(C, N) 1
Index: gcc/config/aarch64/aarch64-simd.md
===================================================================
--- gcc/config/aarch64/aarch64-simd.md 2018-01-05 11:24:44.647408566 +0000
+++ gcc/config/aarch64/aarch64-simd.md 2018-01-05 11:24:44.865399656 +0000
@@ -31,9 +31,9 @@ (define_expand "mov<mode>"
normal str, so the check need not apply. */
if (GET_CODE (operands[0]) == MEM
&& !(aarch64_simd_imm_zero (operands[1], <MODE>mode)
- && ((GET_MODE_SIZE (<MODE>mode) == 16
+ && ((known_eq (GET_MODE_SIZE (<MODE>mode), 16)
&& aarch64_mem_pair_operand (operands[0], DImode))
- || GET_MODE_SIZE (<MODE>mode) == 8)))
+ || known_eq (GET_MODE_SIZE (<MODE>mode), 8))))
operands[1] = force_reg (<MODE>mode, operands[1]);
"
)
@@ -5334,9 +5334,7 @@ (define_expand "aarch64_ld<VSTRUCT:nregs
set_mem_size (mem, GET_MODE_SIZE (GET_MODE_INNER (<VALLDIF:MODE>mode))
* <VSTRUCT:nregs>);
- aarch64_simd_lane_bounds (operands[3], 0,
- GET_MODE_NUNITS (<VALLDIF:MODE>mode),
- NULL);
+ aarch64_simd_lane_bounds (operands[3], 0, <VALLDIF:nunits>, NULL);
emit_insn (gen_aarch64_vec_load_lanes<VSTRUCT:mode>_lane<VALLDIF:mode> (
operands[0], mem, operands[2], operands[3]));
DONE;
Index: gcc/config/aarch64/aarch64.c
===================================================================
--- gcc/config/aarch64/aarch64.c 2018-01-05 11:24:44.647408566 +0000
+++ gcc/config/aarch64/aarch64.c 2018-01-05 11:24:44.867399574 +0000
@@ -1139,13 +1139,18 @@ aarch64_array_mode_supported_p (machine_
static unsigned int
aarch64_hard_regno_nregs (unsigned regno, machine_mode mode)
{
+ /* ??? Logically we should only need to provide a value when
+ HARD_REGNO_MODE_OK says that the combination is valid,
+ but at the moment we need to handle all modes. Just ignore
+ any runtime parts for registers that can't store them. */
+ HOST_WIDE_INT lowest_size = constant_lower_bound (GET_MODE_SIZE (mode));
switch (aarch64_regno_regclass (regno))
{
case FP_REGS:
case FP_LO_REGS:
- return (GET_MODE_SIZE (mode) + UNITS_PER_VREG - 1) / UNITS_PER_VREG;
+ return CEIL (lowest_size, UNITS_PER_VREG);
default:
- return (GET_MODE_SIZE (mode) + UNITS_PER_WORD - 1) / UNITS_PER_WORD;
+ return CEIL (lowest_size, UNITS_PER_WORD);
}
gcc_unreachable ();
}
@@ -1188,25 +1193,17 @@ aarch64_hard_regno_mode_ok (unsigned reg
static bool
aarch64_hard_regno_call_part_clobbered (unsigned int regno, machine_mode mode)
{
- return FP_REGNUM_P (regno) && GET_MODE_SIZE (mode) > 8;
+ return FP_REGNUM_P (regno) && maybe_gt (GET_MODE_SIZE (mode), 8);
}
/* Implement HARD_REGNO_CALLER_SAVE_MODE. */
machine_mode
-aarch64_hard_regno_caller_save_mode (unsigned regno, unsigned nregs,
- machine_mode mode)
+aarch64_hard_regno_caller_save_mode (unsigned, unsigned, machine_mode mode)
{
- /* Handle modes that fit within single registers. */
- if (nregs == 1 && GET_MODE_SIZE (mode) <= 16)
- {
- if (GET_MODE_SIZE (mode) >= 4)
- return mode;
- else
- return SImode;
- }
- /* Fall back to generic for multi-reg and very large modes. */
+ if (known_ge (GET_MODE_SIZE (mode), 4))
+ return mode;
else
- return choose_hard_reg_mode (regno, nregs, false);
+ return SImode;
}
/* Implement TARGET_CONSTANT_ALIGNMENT. Make strings word-aligned so
@@ -1319,11 +1316,10 @@ aarch64_tls_get_addr (void)
tls_symbolic_operand_type (rtx addr)
{
enum tls_model tls_kind = TLS_MODEL_NONE;
- rtx sym, addend;
-
if (GET_CODE (addr) == CONST)
{
- split_const (addr, &sym, &addend);
+ poly_int64 addend;
+ rtx sym = strip_offset (addr, &addend);
if (GET_CODE (sym) == SYMBOL_REF)
tls_kind = SYMBOL_REF_TLS_MODEL (sym);
}
@@ -2262,8 +2258,12 @@ aarch64_pass_by_reference (cumulative_ar
int nregs;
/* GET_MODE_SIZE (BLKmode) is useless since it is 0. */
- size = (mode == BLKmode && type)
- ? int_size_in_bytes (type) : (int) GET_MODE_SIZE (mode);
+ if (mode == BLKmode && type)
+ size = int_size_in_bytes (type);
+ else
+ /* No frontends can create types with variable-sized modes, so we
+ shouldn't be asked to pass or return them. */
+ size = GET_MODE_SIZE (mode).to_constant ();
/* Aggregates are passed by reference based on their size. */
if (type && AGGREGATE_TYPE_P (type))
@@ -2360,8 +2360,8 @@ aarch64_function_value (const_tree type,
for (i = 0; i < count; i++)
{
rtx tmp = gen_rtx_REG (ag_mode, V0_REGNUM + i);
- tmp = gen_rtx_EXPR_LIST (VOIDmode, tmp,
- GEN_INT (i * GET_MODE_SIZE (ag_mode)));
+ rtx offset = gen_int_mode (i * GET_MODE_SIZE (ag_mode), Pmode);
+ tmp = gen_rtx_EXPR_LIST (VOIDmode, tmp, offset);
XVECEXP (par, 0, i) = tmp;
}
return par;
@@ -2488,9 +2488,13 @@ aarch64_layout_arg (cumulative_args_t pc
pcum->aapcs_arg_processed = true;
/* Size in bytes, rounded to the nearest multiple of 8 bytes. */
- size
- = ROUND_UP (type ? int_size_in_bytes (type) : GET_MODE_SIZE (mode),
- UNITS_PER_WORD);
+ if (type)
+ size = int_size_in_bytes (type);
+ else
+ /* No frontends can create types with variable-sized modes, so we
+ shouldn't be asked to pass or return them. */
+ size = GET_MODE_SIZE (mode).to_constant ();
+ size = ROUND_UP (size, UNITS_PER_WORD);
allocate_ncrn = (type) ? !(FLOAT_TYPE_P (type)) : !FLOAT_MODE_P (mode);
allocate_nvrn = aarch64_vfp_is_call_candidate (pcum_v,
@@ -2527,9 +2531,9 @@ aarch64_layout_arg (cumulative_args_t pc
{
rtx tmp = gen_rtx_REG (pcum->aapcs_vfp_rmode,
V0_REGNUM + nvrn + i);
- tmp = gen_rtx_EXPR_LIST
- (VOIDmode, tmp,
- GEN_INT (i * GET_MODE_SIZE (pcum->aapcs_vfp_rmode)));
+ rtx offset = gen_int_mode
+ (i * GET_MODE_SIZE (pcum->aapcs_vfp_rmode), Pmode);
+ tmp = gen_rtx_EXPR_LIST (VOIDmode, tmp, offset);
XVECEXP (par, 0, i) = tmp;
}
pcum->aapcs_reg = par;
@@ -2754,8 +2758,13 @@ aarch64_pad_reg_upward (machine_mode mod
/* Small composite types are always padded upward. */
if (BYTES_BIG_ENDIAN && aarch64_composite_type_p (type, mode))
{
- HOST_WIDE_INT size = (type ? int_size_in_bytes (type)
- : GET_MODE_SIZE (mode));
+ HOST_WIDE_INT size;
+ if (type)
+ size = int_size_in_bytes (type);
+ else
+ /* No frontends can create types with variable-sized modes, so we
+ shouldn't be asked to pass or return them. */
+ size = GET_MODE_SIZE (mode).to_constant ();
if (size < 2 * UNITS_PER_WORD)
return true;
}
@@ -2784,12 +2793,19 @@ #define ARITH_FACTOR 4096
#define PROBE_STACK_FIRST_REG 9
#define PROBE_STACK_SECOND_REG 10
-/* Emit code to probe a range of stack addresses from FIRST to FIRST+SIZE,
+/* Emit code to probe a range of stack addresses from FIRST to FIRST+POLY_SIZE,
inclusive. These are offsets from the current stack pointer. */
static void
-aarch64_emit_probe_stack_range (HOST_WIDE_INT first, HOST_WIDE_INT size)
+aarch64_emit_probe_stack_range (HOST_WIDE_INT first, poly_int64 poly_size)
{
+ HOST_WIDE_INT size;
+ if (!poly_size.is_constant (&size))
+ {
+ sorry ("stack probes for SVE frames");
+ return;
+ }
+
rtx reg1 = gen_rtx_REG (Pmode, PROBE_STACK_FIRST_REG);
/* See the same assertion on PROBE_INTERVAL above. */
@@ -3067,13 +3083,16 @@ #define SLOT_REQUIRED (-1)
= offset + cfun->machine->frame.saved_varargs_size;
cfun->machine->frame.hard_fp_offset
- = ROUND_UP (varargs_and_saved_regs_size + get_frame_size (),
- STACK_BOUNDARY / BITS_PER_UNIT);
+ = aligned_upper_bound (varargs_and_saved_regs_size
+ + get_frame_size (),
+ STACK_BOUNDARY / BITS_PER_UNIT);
+ /* Both these values are already aligned. */
+ gcc_assert (multiple_p (crtl->outgoing_args_size,
+ STACK_BOUNDARY / BITS_PER_UNIT));
cfun->machine->frame.frame_size
- = ROUND_UP (cfun->machine->frame.hard_fp_offset
- + crtl->outgoing_args_size,
- STACK_BOUNDARY / BITS_PER_UNIT);
+ = (cfun->machine->frame.hard_fp_offset
+ + crtl->outgoing_args_size);
cfun->machine->frame.locals_offset = cfun->machine->frame.saved_varargs_size;
@@ -3088,18 +3107,21 @@ #define SLOT_REQUIRED (-1)
else if (cfun->machine->frame.wb_candidate1 != INVALID_REGNUM)
max_push_offset = 256;
- if (cfun->machine->frame.frame_size < max_push_offset
- && crtl->outgoing_args_size == 0)
+ HOST_WIDE_INT const_size, const_fp_offset;
+ if (cfun->machine->frame.frame_size.is_constant (&const_size)
+ && const_size < max_push_offset
+ && known_eq (crtl->outgoing_args_size, 0))
{
/* Simple, small frame with no outgoing arguments:
stp reg1, reg2, [sp, -frame_size]!
stp reg3, reg4, [sp, 16] */
- cfun->machine->frame.callee_adjust = cfun->machine->frame.frame_size;
+ cfun->machine->frame.callee_adjust = const_size;
}
- else if ((crtl->outgoing_args_size
- + cfun->machine->frame.saved_regs_size < 512)
+ else if (known_lt (crtl->outgoing_args_size
+ + cfun->machine->frame.saved_regs_size, 512)
&& !(cfun->calls_alloca
- && cfun->machine->frame.hard_fp_offset < max_push_offset))
+ && known_lt (cfun->machine->frame.hard_fp_offset,
+ max_push_offset)))
{
/* Frame with small outgoing arguments:
sub sp, sp, frame_size
@@ -3109,13 +3131,14 @@ #define SLOT_REQUIRED (-1)
cfun->machine->frame.callee_offset
= cfun->machine->frame.frame_size - cfun->machine->frame.hard_fp_offset;
}
- else if (cfun->machine->frame.hard_fp_offset < max_push_offset)
+ else if (cfun->machine->frame.hard_fp_offset.is_constant (&const_fp_offset)
+ && const_fp_offset < max_push_offset)
{
/* Frame with large outgoing arguments but a small local area:
stp reg1, reg2, [sp, -hard_fp_offset]!
stp reg3, reg4, [sp, 16]
sub sp, sp, outgoing_args_size */
- cfun->machine->frame.callee_adjust = cfun->machine->frame.hard_fp_offset;
+ cfun->machine->frame.callee_adjust = const_fp_offset;
cfun->machine->frame.final_adjust
= cfun->machine->frame.frame_size - cfun->machine->frame.callee_adjust;
}
@@ -3328,7 +3351,7 @@ aarch64_return_address_signing_enabled (
skipping any write-back candidates if SKIP_WB is true. */
static void
-aarch64_save_callee_saves (machine_mode mode, HOST_WIDE_INT start_offset,
+aarch64_save_callee_saves (machine_mode mode, poly_int64 start_offset,
unsigned start, unsigned limit, bool skip_wb)
{
rtx_insn *insn;
@@ -3340,7 +3363,7 @@ aarch64_save_callee_saves (machine_mode
regno = aarch64_next_callee_save (regno + 1, limit))
{
rtx reg, mem;
- HOST_WIDE_INT offset;
+ poly_int64 offset;
if (skip_wb
&& (regno == cfun->machine->frame.wb_candidate1
@@ -3393,13 +3416,13 @@ aarch64_save_callee_saves (machine_mode
static void
aarch64_restore_callee_saves (machine_mode mode,
- HOST_WIDE_INT start_offset, unsigned start,
+ poly_int64 start_offset, unsigned start,
unsigned limit, bool skip_wb, rtx *cfi_ops)
{
rtx base_rtx = stack_pointer_rtx;
unsigned regno;
unsigned regno2;
- HOST_WIDE_INT offset;
+ poly_int64 offset;
for (regno = aarch64_next_callee_save (start, limit);
regno <= limit;
@@ -3444,25 +3467,27 @@ aarch64_restore_callee_saves (machine_mo
static inline bool
offset_9bit_signed_unscaled_p (machine_mode mode ATTRIBUTE_UNUSED,
- HOST_WIDE_INT offset)
+ poly_int64 offset)
{
- return offset >= -256 && offset < 256;
+ HOST_WIDE_INT const_offset;
+ return (offset.is_constant (&const_offset)
+ && IN_RANGE (const_offset, -256, 255));
}
static inline bool
-offset_12bit_unsigned_scaled_p (machine_mode mode, HOST_WIDE_INT offset)
+offset_12bit_unsigned_scaled_p (machine_mode mode, poly_int64 offset)
{
- return (offset >= 0
- && offset < 4096 * GET_MODE_SIZE (mode)
- && offset % GET_MODE_SIZE (mode) == 0);
+ HOST_WIDE_INT multiple;
+ return (constant_multiple_p (offset, GET_MODE_SIZE (mode), &multiple)
+ && IN_RANGE (multiple, 0, 4095));
}
bool
-aarch64_offset_7bit_signed_scaled_p (machine_mode mode, HOST_WIDE_INT offset)
+aarch64_offset_7bit_signed_scaled_p (machine_mode mode, poly_int64 offset)
{
- return (offset >= -64 * GET_MODE_SIZE (mode)
- && offset < 64 * GET_MODE_SIZE (mode)
- && offset % GET_MODE_SIZE (mode) == 0);
+ HOST_WIDE_INT multiple;
+ return (constant_multiple_p (offset, GET_MODE_SIZE (mode), &multiple)
+ && IN_RANGE (multiple, -64, 63));
}
/* Implement TARGET_SHRINK_WRAP_GET_SEPARATE_COMPONENTS. */
@@ -3479,7 +3504,7 @@ aarch64_get_separate_components (void)
for (unsigned regno = 0; regno <= LAST_SAVED_REGNUM; regno++)
if (aarch64_register_saved_on_entry (regno))
{
- HOST_WIDE_INT offset = cfun->machine->frame.reg_offset[regno];
+ poly_int64 offset = cfun->machine->frame.reg_offset[regno];
if (!frame_pointer_needed)
offset += cfun->machine->frame.frame_size
- cfun->machine->frame.hard_fp_offset;
@@ -3583,7 +3608,7 @@ aarch64_process_components (sbitmap comp
so DFmode for the vector registers is enough. */
machine_mode mode = GP_REGNUM_P (regno) ? E_DImode : E_DFmode;
rtx reg = gen_rtx_REG (mode, regno);
- HOST_WIDE_INT offset = cfun->machine->frame.reg_offset[regno];
+ poly_int64 offset = cfun->machine->frame.reg_offset[regno];
if (!frame_pointer_needed)
offset += cfun->machine->frame.frame_size
- cfun->machine->frame.hard_fp_offset;
@@ -3605,13 +3630,13 @@ aarch64_process_components (sbitmap comp
break;
}
- HOST_WIDE_INT offset2 = cfun->machine->frame.reg_offset[regno2];
+ poly_int64 offset2 = cfun->machine->frame.reg_offset[regno2];
/* The next register is not of the same class or its offset is not
mergeable with the current one into a pair. */
if (!satisfies_constraint_Ump (mem)
|| GP_REGNUM_P (regno) != GP_REGNUM_P (regno2)
- || (offset2 - cfun->machine->frame.reg_offset[regno])
- != GET_MODE_SIZE (mode))
+ || maybe_ne ((offset2 - cfun->machine->frame.reg_offset[regno]),
+ GET_MODE_SIZE (mode)))
{
insn = emit_insn (set);
RTX_FRAME_RELATED_P (insn) = 1;
@@ -3681,11 +3706,19 @@ aarch64_set_handled_components (sbitmap
cfun->machine->reg_is_wrapped_separately[regno] = true;
}
-/* Allocate SIZE bytes of stack space using TEMP1 as a scratch register. */
+/* Allocate POLY_SIZE bytes of stack space using TEMP1 as a scratch
+ register. */
static void
-aarch64_allocate_and_probe_stack_space (rtx temp1, HOST_WIDE_INT size)
+aarch64_allocate_and_probe_stack_space (rtx temp1, poly_int64 poly_size)
{
+ HOST_WIDE_INT size;
+ if (!poly_size.is_constant (&size))
+ {
+ sorry ("stack probes for SVE frames");
+ return;
+ }
+
HOST_WIDE_INT probe_interval
= 1 << PARAM_VALUE (PARAM_STACK_CLASH_PROTECTION_PROBE_INTERVAL);
HOST_WIDE_INT guard_size
@@ -3845,11 +3878,11 @@ aarch64_expand_prologue (void)
{
aarch64_layout_frame ();
- HOST_WIDE_INT frame_size = cfun->machine->frame.frame_size;
- HOST_WIDE_INT initial_adjust = cfun->machine->frame.initial_adjust;
+ poly_int64 frame_size = cfun->machine->frame.frame_size;
+ poly_int64 initial_adjust = cfun->machine->frame.initial_adjust;
HOST_WIDE_INT callee_adjust = cfun->machine->frame.callee_adjust;
- HOST_WIDE_INT final_adjust = cfun->machine->frame.final_adjust;
- HOST_WIDE_INT callee_offset = cfun->machine->frame.callee_offset;
+ poly_int64 final_adjust = cfun->machine->frame.final_adjust;
+ poly_int64 callee_offset = cfun->machine->frame.callee_offset;
unsigned reg1 = cfun->machine->frame.wb_candidate1;
unsigned reg2 = cfun->machine->frame.wb_candidate2;
bool emit_frame_chain = cfun->machine->frame.emit_frame_chain;
@@ -3864,19 +3897,19 @@ aarch64_expand_prologue (void)
}
if (flag_stack_usage_info)
- current_function_static_stack_size = frame_size;
+ current_function_static_stack_size = constant_lower_bound (frame_size);
if (flag_stack_check == STATIC_BUILTIN_STACK_CHECK)
{
if (crtl->is_leaf && !cfun->calls_alloca)
{
- if (frame_size > PROBE_INTERVAL
- && frame_size > get_stack_check_protect ())
+ if (maybe_gt (frame_size, PROBE_INTERVAL)
+ && maybe_gt (frame_size, get_stack_check_protect ()))
aarch64_emit_probe_stack_range (get_stack_check_protect (),
(frame_size
- get_stack_check_protect ()));
}
- else if (frame_size > 0)
+ else if (maybe_gt (frame_size, 0))
aarch64_emit_probe_stack_range (get_stack_check_protect (), frame_size);
}
@@ -3911,23 +3944,23 @@ aarch64_expand_prologue (void)
HOST_WIDE_INT guard_used_by_caller = 1024;
if (flag_stack_clash_protection)
{
- if (frame_size == 0)
+ if (known_eq (frame_size, 0))
dump_stack_clash_frame_info (NO_PROBE_NO_FRAME, false);
- else if (initial_adjust < guard_size - guard_used_by_caller
- && final_adjust < guard_size - guard_used_by_caller)
+ else if (known_lt (initial_adjust, guard_size - guard_used_by_caller)
+ && known_lt (final_adjust, guard_size - guard_used_by_caller))
dump_stack_clash_frame_info (NO_PROBE_SMALL_FRAME, true);
}
/* In theory we should never have both an initial adjustment
and a callee save adjustment. Verify that is the case since the
code below does not handle it for -fstack-clash-protection. */
- gcc_assert (initial_adjust == 0 || callee_adjust == 0);
+ gcc_assert (known_eq (initial_adjust, 0) || callee_adjust == 0);
/* Only probe if the initial adjustment is larger than the guard
less the amount of the guard reserved for use by the caller's
outgoing args. */
if (flag_stack_clash_protection
- && initial_adjust >= guard_size - guard_used_by_caller)
+ && maybe_ge (initial_adjust, guard_size - guard_used_by_caller))
aarch64_allocate_and_probe_stack_space (ip0_rtx, initial_adjust);
else
aarch64_sub_sp (ip0_rtx, initial_adjust, true);
@@ -3952,19 +3985,19 @@ aarch64_expand_prologue (void)
callee_adjust != 0 || emit_frame_chain);
/* We may need to probe the final adjustment as well. */
- if (flag_stack_clash_protection && final_adjust != 0)
+ if (flag_stack_clash_protection && maybe_ne (final_adjust, 0))
{
/* First probe if the final adjustment is larger than the guard size
less the amount of the guard reserved for use by the caller's
outgoing args. */
- if (final_adjust >= guard_size - guard_used_by_caller)
+ if (maybe_ge (final_adjust, guard_size - guard_used_by_caller))
aarch64_allocate_and_probe_stack_space (ip1_rtx, final_adjust);
else
aarch64_sub_sp (ip1_rtx, final_adjust, !frame_pointer_needed);
/* We must also probe if the final adjustment is larger than the guard
that is assumed used by the caller. This may be sub-optimal. */
- if (final_adjust >= guard_used_by_caller)
+ if (maybe_ge (final_adjust, guard_used_by_caller))
{
if (dump_file)
fprintf (dump_file,
@@ -3993,7 +4026,7 @@ aarch64_use_return_insn_p (void)
aarch64_layout_frame ();
- return cfun->machine->frame.frame_size == 0;
+ return known_eq (cfun->machine->frame.frame_size, 0);
}
/* Generate the epilogue instructions for returning from a function.
@@ -4006,21 +4039,23 @@ aarch64_expand_epilogue (bool for_sibcal
{
aarch64_layout_frame ();
- HOST_WIDE_INT initial_adjust = cfun->machine->frame.initial_adjust;
+ poly_int64 initial_adjust = cfun->machine->frame.initial_adjust;
HOST_WIDE_INT callee_adjust = cfun->machine->frame.callee_adjust;
- HOST_WIDE_INT final_adjust = cfun->machine->frame.final_adjust;
- HOST_WIDE_INT callee_offset = cfun->machine->frame.callee_offset;
+ poly_int64 final_adjust = cfun->machine->frame.final_adjust;
+ poly_int64 callee_offset = cfun->machine->frame.callee_offset;
unsigned reg1 = cfun->machine->frame.wb_candidate1;
unsigned reg2 = cfun->machine->frame.wb_candidate2;
rtx cfi_ops = NULL;
rtx_insn *insn;
/* We need to add memory barrier to prevent read from deallocated stack. */
- bool need_barrier_p = (get_frame_size ()
- + cfun->machine->frame.saved_varargs_size) != 0;
+ bool need_barrier_p
+ = maybe_ne (get_frame_size ()
+ + cfun->machine->frame.saved_varargs_size, 0);
/* Emit a barrier to prevent loads from a deallocated stack. */
- if (final_adjust > crtl->outgoing_args_size || cfun->calls_alloca
+ if (maybe_gt (final_adjust, crtl->outgoing_args_size)
+ || cfun->calls_alloca
|| crtl->calls_eh_return)
{
emit_insn (gen_stack_tie (stack_pointer_rtx, stack_pointer_rtx));
@@ -4031,7 +4066,8 @@ aarch64_expand_epilogue (bool for_sibcal
be the same as the stack pointer. */
rtx ip0_rtx = gen_rtx_REG (Pmode, IP0_REGNUM);
rtx ip1_rtx = gen_rtx_REG (Pmode, IP1_REGNUM);
- if (frame_pointer_needed && (final_adjust || cfun->calls_alloca))
+ if (frame_pointer_needed
+ && (maybe_ne (final_adjust, 0) || cfun->calls_alloca))
/* If writeback is used when restoring callee-saves, the CFA
is restored on the instruction doing the writeback. */
aarch64_add_offset (Pmode, stack_pointer_rtx,
@@ -4055,7 +4091,7 @@ aarch64_expand_epilogue (bool for_sibcal
if (callee_adjust != 0)
aarch64_pop_regs (reg1, reg2, callee_adjust, &cfi_ops);
- if (callee_adjust != 0 || initial_adjust > 65536)
+ if (callee_adjust != 0 || maybe_gt (initial_adjust, 65536))
{
/* Emit delayed restores and set the CFA to be SP + initial_adjust. */
insn = get_last_insn ();
@@ -4656,9 +4692,9 @@ aarch64_classify_index (struct aarch64_a
&& contains_reg_of_mode[GENERAL_REGS][GET_MODE (SUBREG_REG (index))])
index = SUBREG_REG (index);
- if ((shift == 0 ||
- (shift > 0 && shift <= 3
- && (1 << shift) == GET_MODE_SIZE (mode)))
+ if ((shift == 0
+ || (shift > 0 && shift <= 3
+ && known_eq (1 << shift, GET_MODE_SIZE (mode))))
&& REG_P (index)
&& aarch64_regno_ok_for_index_p (REGNO (index), strict_p))
{
@@ -4680,7 +4716,7 @@ aarch64_mode_valid_for_sched_fusion_p (m
return mode == SImode || mode == DImode
|| mode == SFmode || mode == DFmode
|| (aarch64_vector_mode_supported_p (mode)
- && GET_MODE_SIZE (mode) == 8);
+ && known_eq (GET_MODE_SIZE (mode), 8));
}
/* Return true if REGNO is a virtual pointer register, or an eliminable
@@ -4706,6 +4742,7 @@ aarch64_classify_address (struct aarch64
{
enum rtx_code code = GET_CODE (x);
rtx op0, op1;
+ HOST_WIDE_INT const_size;
/* On BE, we use load/store pair for all large int mode load/stores.
TI/TFmode may also use a load/store pair. */
@@ -4715,10 +4752,10 @@ aarch64_classify_address (struct aarch64
|| (BYTES_BIG_ENDIAN
&& aarch64_vect_struct_mode_p (mode)));
- bool allow_reg_index_p =
- !load_store_pair_p
- && (GET_MODE_SIZE (mode) != 16 || aarch64_vector_mode_supported_p (mode))
- && !aarch64_vect_struct_mode_p (mode);
+ bool allow_reg_index_p = (!load_store_pair_p
+ && (maybe_ne (GET_MODE_SIZE (mode), 16)
+ || aarch64_vector_mode_supported_p (mode))
+ && !aarch64_vect_struct_mode_p (mode));
/* On LE, for AdvSIMD, don't support anything other than POST_INC or
REG addressing. */
@@ -4751,7 +4788,7 @@ aarch64_classify_address (struct aarch64
return true;
}
- if (GET_MODE_SIZE (mode) != 0
+ if (maybe_ne (GET_MODE_SIZE (mode), 0)
&& CONST_INT_P (op1)
&& aarch64_base_register_rtx_p (op0, strict_p))
{
@@ -4798,7 +4835,8 @@ aarch64_classify_address (struct aarch64
offset + 32));
if (load_store_pair_p)
- return ((GET_MODE_SIZE (mode) == 4 || GET_MODE_SIZE (mode) == 8)
+ return ((known_eq (GET_MODE_SIZE (mode), 4)
+ || known_eq (GET_MODE_SIZE (mode), 8))
&& aarch64_offset_7bit_signed_scaled_p (mode, offset));
else
return (offset_9bit_signed_unscaled_p (mode, offset)
@@ -4858,7 +4896,8 @@ aarch64_classify_address (struct aarch64
&& offset_9bit_signed_unscaled_p (mode, offset));
if (load_store_pair_p)
- return ((GET_MODE_SIZE (mode) == 4 || GET_MODE_SIZE (mode) == 8)
+ return ((known_eq (GET_MODE_SIZE (mode), 4)
+ || known_eq (GET_MODE_SIZE (mode), 8))
&& aarch64_offset_7bit_signed_scaled_p (mode, offset));
else
return offset_9bit_signed_unscaled_p (mode, offset);
@@ -4872,7 +4911,9 @@ aarch64_classify_address (struct aarch64
for SI mode or larger. */
info->type = ADDRESS_SYMBOLIC;
- if (!load_store_pair_p && GET_MODE_SIZE (mode) >= 4)
+ if (!load_store_pair_p
+ && GET_MODE_SIZE (mode).is_constant (&const_size)
+ && const_size >= 4)
{
rtx sym, addend;
@@ -4898,7 +4939,6 @@ aarch64_classify_address (struct aarch64
{
/* The symbol and offset must be aligned to the access size. */
unsigned int align;
- unsigned int ref_size;
if (CONSTANT_POOL_ADDRESS_P (sym))
align = GET_MODE_ALIGNMENT (get_pool_mode (sym));
@@ -4916,12 +4956,12 @@ aarch64_classify_address (struct aarch64
else
align = BITS_PER_UNIT;
- ref_size = GET_MODE_SIZE (mode);
- if (ref_size == 0)
+ poly_int64 ref_size = GET_MODE_SIZE (mode);
+ if (known_eq (ref_size, 0))
ref_size = GET_MODE_SIZE (DImode);
- return ((INTVAL (offs) & (ref_size - 1)) == 0
- && ((align / BITS_PER_UNIT) & (ref_size - 1)) == 0);
+ return (multiple_p (INTVAL (offs), ref_size)
+ && multiple_p (align / BITS_PER_UNIT, ref_size));
}
}
return false;
@@ -4999,19 +5039,24 @@ aarch64_legitimate_address_p (machine_mo
static bool
aarch64_legitimize_address_displacement (rtx *disp, rtx *off, machine_mode mode)
{
- HOST_WIDE_INT offset = INTVAL (*disp);
- HOST_WIDE_INT base;
+ HOST_WIDE_INT size;
+ if (GET_MODE_SIZE (mode).is_constant (&size))
+ {
+ HOST_WIDE_INT offset = INTVAL (*disp);
+ HOST_WIDE_INT base;
- if (mode == TImode || mode == TFmode)
- base = (offset + 0x100) & ~0x1f8;
- else if ((offset & (GET_MODE_SIZE (mode) - 1)) != 0)
- base = (offset + 0x100) & ~0x1ff;
- else
- base = offset & ~(GET_MODE_SIZE (mode) < 4 ? 0xfff : 0x3ffc);
+ if (mode == TImode || mode == TFmode)
+ base = (offset + 0x100) & ~0x1f8;
+ else if ((offset & (size - 1)) != 0)
+ base = (offset + 0x100) & ~0x1ff;
+ else
+ base = offset & ~(size < 4 ? 0xfff : 0x3ffc);
- *off = GEN_INT (base);
- *disp = GEN_INT (offset - base);
- return true;
+ *off = GEN_INT (base);
+ *disp = GEN_INT (offset - base);
+ return true;
+ }
+ return false;
}
/* Return the binary representation of floating point constant VALUE in INTVAL.
@@ -5399,26 +5444,13 @@ aarch64_get_condition_code_1 (machine_mo
bool
aarch64_const_vec_all_same_in_range_p (rtx x,
- HOST_WIDE_INT minval,
- HOST_WIDE_INT maxval)
+ HOST_WIDE_INT minval,
+ HOST_WIDE_INT maxval)
{
- HOST_WIDE_INT firstval;
- int count, i;
-
- if (GET_CODE (x) != CONST_VECTOR
- || GET_MODE_CLASS (GET_MODE (x)) != MODE_VECTOR_INT)
- return false;
-
- firstval = INTVAL (CONST_VECTOR_ELT (x, 0));
- if (firstval < minval || firstval > maxval)
- return false;
-
- count = CONST_VECTOR_NUNITS (x);
- for (i = 1; i < count; i++)
- if (INTVAL (CONST_VECTOR_ELT (x, i)) != firstval)
- return false;
-
- return true;
+ rtx elt;
+ return (const_vec_duplicate_p (x, &elt)
+ && CONST_INT_P (elt)
+ && IN_RANGE (INTVAL (elt), minval, maxval));
}
bool
@@ -5860,7 +5892,7 @@ #define buf_size 20
machine_mode mode = GET_MODE (x);
if (GET_CODE (x) != MEM
- || (code == 'y' && GET_MODE_SIZE (mode) != 16))
+ || (code == 'y' && maybe_ne (GET_MODE_SIZE (mode), 16)))
{
output_operand_lossage ("invalid operand for '%%%c'", code);
return;
@@ -5891,6 +5923,7 @@ aarch64_print_address_internal (FILE *f,
aarch64_addr_query_type type)
{
struct aarch64_address_info addr;
+ unsigned int size;
/* Check all addresses are Pmode - including ILP32. */
gcc_assert (GET_MODE (x) == Pmode);
@@ -5934,30 +5967,28 @@ aarch64_print_address_internal (FILE *f,
return true;
case ADDRESS_REG_WB:
+ /* Writeback is only supported for fixed-width modes. */
+ size = GET_MODE_SIZE (mode).to_constant ();
switch (GET_CODE (x))
{
case PRE_INC:
- asm_fprintf (f, "[%s, %d]!", reg_names [REGNO (addr.base)],
- GET_MODE_SIZE (mode));
+ asm_fprintf (f, "[%s, %d]!", reg_names [REGNO (addr.base)], size);
return true;
case POST_INC:
- asm_fprintf (f, "[%s], %d", reg_names [REGNO (addr.base)],
- GET_MODE_SIZE (mode));
+ asm_fprintf (f, "[%s], %d", reg_names [REGNO (addr.base)], size);
return true;
case PRE_DEC:
- asm_fprintf (f, "[%s, -%d]!", reg_names [REGNO (addr.base)],
- GET_MODE_SIZE (mode));
+ asm_fprintf (f, "[%s, -%d]!", reg_names [REGNO (addr.base)], size);
return true;
case POST_DEC:
- asm_fprintf (f, "[%s], -%d", reg_names [REGNO (addr.base)],
- GET_MODE_SIZE (mode));
+ asm_fprintf (f, "[%s], -%d", reg_names [REGNO (addr.base)], size);
return true;
case PRE_MODIFY:
- asm_fprintf (f, "[%s, %wd]!", reg_names [REGNO (addr.base)],
+ asm_fprintf (f, "[%s, %wd]!", reg_names[REGNO (addr.base)],
INTVAL (addr.offset));
return true;
case POST_MODIFY:
- asm_fprintf (f, "[%s], %wd", reg_names [REGNO (addr.base)],
+ asm_fprintf (f, "[%s], %wd", reg_names[REGNO (addr.base)],
INTVAL (addr.offset));
return true;
default:
@@ -6048,6 +6079,39 @@ aarch64_regno_regclass (unsigned regno)
return NO_REGS;
}
+/* OFFSET is an address offset for mode MODE, which has SIZE bytes.
+ If OFFSET is out of range, return an offset of an anchor point
+ that is in range. Return 0 otherwise. */
+
+static HOST_WIDE_INT
+aarch64_anchor_offset (HOST_WIDE_INT offset, HOST_WIDE_INT size,
+ machine_mode mode)
+{
+ /* Does it look like we'll need a 16-byte load/store-pair operation? */
+ if (size > 16)
+ return (offset + 0x400) & ~0x7f0;
+
+ /* For offsets that aren't a multiple of the access size, the limit is
+ -256...255. */
+ if (offset & (size - 1))
+ {
+ /* BLKmode typically uses LDP of X-registers. */
+ if (mode == BLKmode)
+ return (offset + 512) & ~0x3ff;
+ return (offset + 0x100) & ~0x1ff;
+ }
+
+ /* Small negative offsets are supported. */
+ if (IN_RANGE (offset, -256, 0))
+ return 0;
+
+ if (mode == TImode || mode == TFmode)
+ return (offset + 0x100) & ~0x1ff;
+
+ /* Use 12-bit offset by access size. */
+ return offset & (~0xfff * size);
+}
+
static rtx
aarch64_legitimize_address (rtx x, rtx /* orig_x */, machine_mode mode)
{
@@ -6097,34 +6161,17 @@ aarch64_legitimize_address (rtx x, rtx /
x = gen_rtx_PLUS (Pmode, base, offset_rtx);
}
- /* Does it look like we'll need a 16-byte load/store-pair operation? */
- HOST_WIDE_INT base_offset;
- if (GET_MODE_SIZE (mode) > 16)
- base_offset = (offset + 0x400) & ~0x7f0;
- /* For offsets aren't a multiple of the access size, the limit is
- -256...255. */
- else if (offset & (GET_MODE_SIZE (mode) - 1))
- {
- base_offset = (offset + 0x100) & ~0x1ff;
-
- /* BLKmode typically uses LDP of X-registers. */
- if (mode == BLKmode)
- base_offset = (offset + 512) & ~0x3ff;
- }
- /* Small negative offsets are supported. */
- else if (IN_RANGE (offset, -256, 0))
- base_offset = 0;
- else if (mode == TImode || mode == TFmode)
- base_offset = (offset + 0x100) & ~0x1ff;
- /* Use 12-bit offset by access size. */
- else
- base_offset = offset & (~0xfff * GET_MODE_SIZE (mode));
-
- if (base_offset != 0)
+ HOST_WIDE_INT size;
+ if (GET_MODE_SIZE (mode).is_constant (&size))
{
- base = plus_constant (Pmode, base, base_offset);
- base = force_operand (base, NULL_RTX);
- return plus_constant (Pmode, base, offset - base_offset);
+ HOST_WIDE_INT base_offset = aarch64_anchor_offset (offset, size,
+ mode);
+ if (base_offset != 0)
+ {
+ base = plus_constant (Pmode, base, base_offset);
+ base = force_operand (base, NULL_RTX);
+ return plus_constant (Pmode, base, offset - base_offset);
+ }
}
}
@@ -6211,7 +6258,7 @@ aarch64_secondary_reload (bool in_p ATTR
because AArch64 has richer addressing modes for LDR/STR instructions
than LDP/STP instructions. */
if (TARGET_FLOAT && rclass == GENERAL_REGS
- && GET_MODE_SIZE (mode) == 16 && MEM_P (x))
+ && known_eq (GET_MODE_SIZE (mode), 16) && MEM_P (x))
return FP_REGS;
if (rclass == FP_REGS && (mode == TImode || mode == TFmode) && CONSTANT_P(x))
@@ -6232,7 +6279,7 @@ aarch64_can_eliminate (const int from AT
return true;
}
-HOST_WIDE_INT
+poly_int64
aarch64_initial_elimination_offset (unsigned from, unsigned to)
{
aarch64_layout_frame ();
@@ -6318,6 +6365,11 @@ aarch64_trampoline_init (rtx m_tramp, tr
static unsigned char
aarch64_class_max_nregs (reg_class_t regclass, machine_mode mode)
{
+ /* ??? Logically we should only need to provide a value when
+ HARD_REGNO_MODE_OK says that at least one register in REGCLASS
+ can hold MODE, but at the moment we need to handle all modes.
+ Just ignore any runtime parts for registers that can't store them. */
+ HOST_WIDE_INT lowest_size = constant_lower_bound (GET_MODE_SIZE (mode));
switch (regclass)
{
case CALLER_SAVE_REGS:
@@ -6327,10 +6379,9 @@ aarch64_class_max_nregs (reg_class_t reg
case POINTER_AND_FP_REGS:
case FP_REGS:
case FP_LO_REGS:
- return
- aarch64_vector_mode_p (mode)
- ? (GET_MODE_SIZE (mode) + UNITS_PER_VREG - 1) / UNITS_PER_VREG
- : (GET_MODE_SIZE (mode) + UNITS_PER_WORD - 1) / UNITS_PER_WORD;
+ return (aarch64_vector_mode_p (mode)
+ ? CEIL (lowest_size, UNITS_PER_VREG)
+ : CEIL (lowest_size, UNITS_PER_WORD));
case STACK_REG:
return 1;
@@ -6881,25 +6932,15 @@ aarch64_address_cost (rtx x,
{
/* For the sake of calculating the cost of the shifted register
component, we can treat same sized modes in the same way. */
- switch (GET_MODE_BITSIZE (mode))
- {
- case 16:
- cost += addr_cost->addr_scale_costs.hi;
- break;
-
- case 32:
- cost += addr_cost->addr_scale_costs.si;
- break;
-
- case 64:
- cost += addr_cost->addr_scale_costs.di;
- break;
-
- /* We can't tell, or this is a 128-bit vector. */
- default:
- cost += addr_cost->addr_scale_costs.ti;
- break;
- }
+ if (known_eq (GET_MODE_BITSIZE (mode), 16))
+ cost += addr_cost->addr_scale_costs.hi;
+ else if (known_eq (GET_MODE_BITSIZE (mode), 32))
+ cost += addr_cost->addr_scale_costs.si;
+ else if (known_eq (GET_MODE_BITSIZE (mode), 64))
+ cost += addr_cost->addr_scale_costs.di;
+ else
+ /* We can't tell, or this is a 128-bit vector. */
+ cost += addr_cost->addr_scale_costs.ti;
}
return cost;
@@ -8028,7 +8069,8 @@ aarch64_rtx_costs (rtx x, machine_mode m
if (GET_CODE (op1) == AND && REG_P (XEXP (op1, 0))
&& CONST_INT_P (XEXP (op1, 1))
- && INTVAL (XEXP (op1, 1)) == GET_MODE_BITSIZE (mode) - 1)
+ && known_eq (INTVAL (XEXP (op1, 1)),
+ GET_MODE_BITSIZE (mode) - 1))
{
*cost += rtx_cost (op0, mode, (rtx_code) code, 0, speed);
/* We already demanded XEXP (op1, 0) to be REG_P, so
@@ -8076,7 +8118,8 @@ aarch64_rtx_costs (rtx x, machine_mode m
if (GET_CODE (op1) == AND && REG_P (XEXP (op1, 0))
&& CONST_INT_P (XEXP (op1, 1))
- && INTVAL (XEXP (op1, 1)) == GET_MODE_BITSIZE (mode) - 1)
+ && known_eq (INTVAL (XEXP (op1, 1)),
+ GET_MODE_BITSIZE (mode) - 1))
{
*cost += rtx_cost (op0, mode, (rtx_code) code, 0, speed);
/* We already demanded XEXP (op1, 0) to be REG_P, so
@@ -8502,7 +8545,7 @@ aarch64_register_move_cost (machine_mode
return aarch64_register_move_cost (mode, from, GENERAL_REGS)
+ aarch64_register_move_cost (mode, GENERAL_REGS, to);
- if (GET_MODE_SIZE (mode) == 16)
+ if (known_eq (GET_MODE_SIZE (mode), 16))
{
/* 128-bit operations on general registers require 2 instructions. */
if (from == GENERAL_REGS && to == GENERAL_REGS)
@@ -8878,7 +8921,7 @@ aarch64_builtin_vectorization_cost (enum
return fp ? costs->vec_fp_stmt_cost : costs->vec_int_stmt_cost;
case vec_construct:
- elements = TYPE_VECTOR_SUBPARTS (vectype);
+ elements = estimated_poly_value (TYPE_VECTOR_SUBPARTS (vectype));
return elements / 2 + 1;
default:
@@ -10925,6 +10968,10 @@ aarch64_gimplify_va_arg_expr (tree valis
&nregs,
&is_ha))
{
+ /* No frontends can create types with variable-sized modes, so we
+ shouldn't be asked to pass or return them. */
+ unsigned int ag_size = GET_MODE_SIZE (ag_mode).to_constant ();
+
/* TYPE passed in fp/simd registers. */
if (!TARGET_FLOAT)
aarch64_err_no_fpadvsimd (mode, "varargs");
@@ -10938,8 +10985,8 @@ aarch64_gimplify_va_arg_expr (tree valis
if (is_ha)
{
- if (BYTES_BIG_ENDIAN && GET_MODE_SIZE (ag_mode) < UNITS_PER_VREG)
- adjust = UNITS_PER_VREG - GET_MODE_SIZE (ag_mode);
+ if (BYTES_BIG_ENDIAN && ag_size < UNITS_PER_VREG)
+ adjust = UNITS_PER_VREG - ag_size;
}
else if (BLOCK_REG_PADDING (mode, type, 1) == PAD_DOWNWARD
&& size < UNITS_PER_VREG)
@@ -11327,8 +11374,8 @@ aapcs_vfp_sub_candidate (const_tree type
- tree_to_uhwi (TYPE_MIN_VALUE (index)));
/* There must be no padding. */
- if (wi::to_wide (TYPE_SIZE (type))
- != count * GET_MODE_BITSIZE (*modep))
+ if (maybe_ne (wi::to_poly_wide (TYPE_SIZE (type)),
+ count * GET_MODE_BITSIZE (*modep)))
return -1;
return count;
@@ -11358,8 +11405,8 @@ aapcs_vfp_sub_candidate (const_tree type
}
/* There must be no padding. */
- if (wi::to_wide (TYPE_SIZE (type))
- != count * GET_MODE_BITSIZE (*modep))
+ if (maybe_ne (wi::to_poly_wide (TYPE_SIZE (type)),
+ count * GET_MODE_BITSIZE (*modep)))
return -1;
return count;
@@ -11391,8 +11438,8 @@ aapcs_vfp_sub_candidate (const_tree type
}
/* There must be no padding. */
- if (wi::to_wide (TYPE_SIZE (type))
- != count * GET_MODE_BITSIZE (*modep))
+ if (maybe_ne (wi::to_poly_wide (TYPE_SIZE (type)),
+ count * GET_MODE_BITSIZE (*modep)))
return -1;
return count;
@@ -11414,7 +11461,7 @@ aapcs_vfp_sub_candidate (const_tree type
aarch64_short_vector_p (const_tree type,
machine_mode mode)
{
- HOST_WIDE_INT size = -1;
+ poly_int64 size = -1;
if (type && TREE_CODE (type) == VECTOR_TYPE)
size = int_size_in_bytes (type);
@@ -11422,7 +11469,7 @@ aarch64_short_vector_p (const_tree type,
|| GET_MODE_CLASS (mode) == MODE_VECTOR_FLOAT)
size = GET_MODE_SIZE (mode);
- return (size == 8 || size == 16);
+ return known_eq (size, 8) || known_eq (size, 16);
}
/* Return TRUE if the type, as described by TYPE and MODE, is a composite
@@ -11874,8 +11921,9 @@ aarch64_simd_valid_immediate (rtx op, si
unsigned int n_elts;
if (const_vec_duplicate_p (op, &elt))
n_elts = 1;
- else if (GET_CODE (op) == CONST_VECTOR)
- n_elts = CONST_VECTOR_NUNITS (op);
+ else if (GET_CODE (op) == CONST_VECTOR
+ && CONST_VECTOR_NUNITS (op).is_constant (&n_elts))
+ ;
else
return false;
@@ -12064,11 +12112,11 @@ aarch64_simd_vect_par_cnst_half (machine
aarch64_simd_check_vect_par_cnst_half (rtx op, machine_mode mode,
bool high)
{
- if (!VECTOR_MODE_P (mode))
+ int nelts;
+ if (!VECTOR_MODE_P (mode) || !GET_MODE_NUNITS (mode).is_constant (&nelts))
return false;
- rtx ideal = aarch64_simd_vect_par_cnst_half (mode, GET_MODE_NUNITS (mode),
- high);
+ rtx ideal = aarch64_simd_vect_par_cnst_half (mode, nelts, high);
HOST_WIDE_INT count_op = XVECLEN (op, 0);
HOST_WIDE_INT count_ideal = XVECLEN (ideal, 0);
int i = 0;
@@ -12153,7 +12201,8 @@ aarch64_simd_emit_reg_reg_move (rtx *ope
int
aarch64_simd_attr_length_rglist (machine_mode mode)
{
- return (GET_MODE_SIZE (mode) / UNITS_PER_VREG) * 4;
+ /* This is only used (and only meaningful) for Advanced SIMD, not SVE. */
+ return (GET_MODE_SIZE (mode).to_constant () / UNITS_PER_VREG) * 4;
}
/* Implement target hook TARGET_VECTOR_ALIGNMENT. The AAPCS64 sets the maximum
@@ -12233,7 +12282,6 @@ aarch64_simd_make_constant (rtx vals)
machine_mode mode = GET_MODE (vals);
rtx const_dup;
rtx const_vec = NULL_RTX;
- int n_elts = GET_MODE_NUNITS (mode);
int n_const = 0;
int i;
@@ -12244,6 +12292,7 @@ aarch64_simd_make_constant (rtx vals)
/* A CONST_VECTOR must contain only CONST_INTs and
CONST_DOUBLEs, but CONSTANT_P allows more (e.g. SYMBOL_REF).
Only store valid constants in a CONST_VECTOR. */
+ int n_elts = XVECLEN (vals, 0);
for (i = 0; i < n_elts; ++i)
{
rtx x = XVECEXP (vals, 0, i);
@@ -12282,7 +12331,7 @@ aarch64_expand_vector_init (rtx target,
machine_mode mode = GET_MODE (target);
scalar_mode inner_mode = GET_MODE_INNER (mode);
/* The number of vector elements. */
- int n_elts = GET_MODE_NUNITS (mode);
+ int n_elts = XVECLEN (vals, 0);
/* The number of vector elements which are not constant. */
int n_var = 0;
rtx any_const = NULL_RTX;
@@ -12464,7 +12513,9 @@ aarch64_shift_truncation_mask (machine_m
return
(!SHIFT_COUNT_TRUNCATED
|| aarch64_vector_mode_supported_p (mode)
- || aarch64_vect_struct_mode_p (mode)) ? 0 : (GET_MODE_BITSIZE (mode) - 1);
+ || aarch64_vect_struct_mode_p (mode))
+ ? 0
+ : (GET_MODE_UNIT_BITSIZE (mode) - 1);
}
/* Select a format to encode pointers in exception handling data. */
@@ -13587,7 +13638,8 @@ aarch64_expand_vec_perm (rtx target, rtx
static bool
aarch64_evpc_trn (struct expand_vec_perm_d *d)
{
- unsigned int odd, nelt = d->perm.length ();
+ HOST_WIDE_INT odd;
+ poly_uint64 nelt = d->perm.length ();
rtx out, in0, in1, x;
machine_mode vmode = d->vmode;
@@ -13596,8 +13648,8 @@ aarch64_evpc_trn (struct expand_vec_perm
/* Note that these are little-endian tests.
We correct for big-endian later. */
- odd = d->perm[0];
- if ((odd != 0 && odd != 1)
+ if (!d->perm[0].is_constant (&odd)
+ || (odd != 0 && odd != 1)
|| !d->perm.series_p (0, 2, odd, 2)
|| !d->perm.series_p (1, 2, nelt + odd, 2))
return false;
@@ -13624,7 +13676,7 @@ aarch64_evpc_trn (struct expand_vec_perm
static bool
aarch64_evpc_uzp (struct expand_vec_perm_d *d)
{
- unsigned int odd;
+ HOST_WIDE_INT odd;
rtx out, in0, in1, x;
machine_mode vmode = d->vmode;
@@ -13633,8 +13685,8 @@ aarch64_evpc_uzp (struct expand_vec_perm
/* Note that these are little-endian tests.
We correct for big-endian later. */
- odd = d->perm[0];
- if ((odd != 0 && odd != 1)
+ if (!d->perm[0].is_constant (&odd)
+ || (odd != 0 && odd != 1)
|| !d->perm.series_p (0, 1, odd, 2))
return false;
@@ -13660,7 +13712,8 @@ aarch64_evpc_uzp (struct expand_vec_perm
static bool
aarch64_evpc_zip (struct expand_vec_perm_d *d)
{
- unsigned int high, nelt = d->perm.length ();
+ unsigned int high;
+ poly_uint64 nelt = d->perm.length ();
rtx out, in0, in1, x;
machine_mode vmode = d->vmode;
@@ -13669,11 +13722,12 @@ aarch64_evpc_zip (struct expand_vec_perm
/* Note that these are little-endian tests.
We correct for big-endian later. */
- high = d->perm[0];
- if ((high != 0 && high * 2 != nelt)
- || !d->perm.series_p (0, 2, high, 1)
- || !d->perm.series_p (1, 2, high + nelt, 1))
+ poly_uint64 first = d->perm[0];
+ if ((maybe_ne (first, 0U) && maybe_ne (first * 2, nelt))
+ || !d->perm.series_p (0, 2, first, 1)
+ || !d->perm.series_p (1, 2, first + nelt, 1))
return false;
+ high = maybe_ne (first, 0U);
/* Success! */
if (d->testing_p)
@@ -13698,13 +13752,13 @@ aarch64_evpc_zip (struct expand_vec_perm
static bool
aarch64_evpc_ext (struct expand_vec_perm_d *d)
{
- unsigned int nelt = d->perm.length ();
+ HOST_WIDE_INT location;
rtx offset;
- unsigned int location = d->perm[0]; /* Always < nelt. */
-
- /* Check if the extracted indices are increasing by one. */
- if (!d->perm.series_p (0, 1, location, 1))
+ /* The first element always refers to the first vector.
+ Check if the extracted indices are increasing by one. */
+ if (!d->perm[0].is_constant (&location)
+ || !d->perm.series_p (0, 1, location, 1))
return false;
/* Success! */
@@ -13720,8 +13774,10 @@ aarch64_evpc_ext (struct expand_vec_perm
at the LSB end of the register), and the low elements of the second
vector (stored at the MSB end of the register). So swap. */
std::swap (d->op0, d->op1);
- /* location != 0 (above), so safe to assume (nelt - location) < nelt. */
- location = nelt - location;
+ /* location != 0 (above), so safe to assume (nelt - location) < nelt.
+ to_constant () is safe since this is restricted to Advanced SIMD
+ vectors. */
+ location = d->perm.length ().to_constant () - location;
}
offset = GEN_INT (location);
@@ -13737,12 +13793,13 @@ aarch64_evpc_ext (struct expand_vec_perm
static bool
aarch64_evpc_rev (struct expand_vec_perm_d *d)
{
- unsigned int i, diff, size, unspec;
+ HOST_WIDE_INT diff;
+ unsigned int i, size, unspec;
- if (!d->one_vector_p)
+ if (!d->one_vector_p
+ || !d->perm[0].is_constant (&diff))
return false;
- diff = d->perm[0];
size = (diff + 1) * GET_MODE_UNIT_SIZE (d->vmode);
if (size == 8)
unspec = UNSPEC_REV64;
@@ -13772,19 +13829,18 @@ aarch64_evpc_dup (struct expand_vec_perm
{
rtx out = d->target;
rtx in0;
+ HOST_WIDE_INT elt;
machine_mode vmode = d->vmode;
- unsigned int elt;
rtx lane;
- if (d->perm.encoding ().encoded_nelts () != 1)
+ if (d->perm.encoding ().encoded_nelts () != 1
+ || !d->perm[0].is_constant (&elt))
return false;
/* Success! */
if (d->testing_p)
return true;
- elt = d->perm[0];
-
/* The generic preparation in aarch64_expand_vec_perm_const_1
swaps the operand order and the permute indices if it finds
d->perm[0] to be in the second operand. Thus, we can always
@@ -13804,7 +13860,12 @@ aarch64_evpc_tbl (struct expand_vec_perm
{
rtx rperm[MAX_VECT_LEN], sel;
machine_mode vmode = d->vmode;
- unsigned int i, nelt = d->perm.length ();
+
+ /* Make sure that the indices are constant. */
+ unsigned int encoded_nelts = d->perm.encoding ().encoded_nelts ();
+ for (unsigned int i = 0; i < encoded_nelts; ++i)
+ if (!d->perm[i].is_constant ())
+ return false;
if (d->testing_p)
return true;
@@ -13815,16 +13876,17 @@ aarch64_evpc_tbl (struct expand_vec_perm
if (vmode != V8QImode && vmode != V16QImode)
return false;
- for (i = 0; i < nelt; ++i)
- {
- int nunits = GET_MODE_NUNITS (vmode);
+ /* to_constant is safe since this routine is specific to Advanced SIMD
+ vectors. */
+ unsigned int nelt = d->perm.length ().to_constant ();
+ for (unsigned int i = 0; i < nelt; ++i)
+ /* If big-endian and two vectors we end up with a weird mixed-endian
+ mode on NEON. Reverse the index within each word but not the word
+ itself. to_constant is safe because we checked is_constant above. */
+ rperm[i] = GEN_INT (BYTES_BIG_ENDIAN
+ ? d->perm[i].to_constant () ^ (nelt - 1)
+ : d->perm[i].to_constant ());
- /* If big-endian and two vectors we end up with a weird mixed-endian
- mode on NEON. Reverse the index within each word but not the word
- itself. */
- rperm[i] = GEN_INT (BYTES_BIG_ENDIAN ? d->perm[i] ^ (nunits - 1)
- : (HOST_WIDE_INT) d->perm[i]);
- }
sel = gen_rtx_CONST_VECTOR (vmode, gen_rtvec_v (nelt, rperm));
sel = force_reg (vmode, sel);
@@ -13838,14 +13900,14 @@ aarch64_expand_vec_perm_const_1 (struct
/* The pattern matching functions above are written to look for a small
number to begin the sequence (0, 1, N/2). If we begin with an index
from the second operand, we can swap the operands. */
- unsigned int nelt = d->perm.length ();
- if (d->perm[0] >= nelt)
+ poly_int64 nelt = d->perm.length ();
+ if (known_ge (d->perm[0], nelt))
{
d->perm.rotate_inputs (1);
std::swap (d->op0, d->op1);
}
- if (TARGET_SIMD && nelt > 1)
+ if (TARGET_SIMD && known_gt (nelt, 1))
{
if (aarch64_evpc_rev (d))
return true;
@@ -13961,7 +14023,7 @@ aarch64_modes_tieable_p (machine_mode mo
AMOUNT bytes. */
static rtx
-aarch64_move_pointer (rtx pointer, int amount)
+aarch64_move_pointer (rtx pointer, poly_int64 amount)
{
rtx next = plus_constant (Pmode, XEXP (pointer, 0), amount);
@@ -13975,9 +14037,7 @@ aarch64_move_pointer (rtx pointer, int a
static rtx
aarch64_progress_pointer (rtx pointer)
{
- HOST_WIDE_INT amount = GET_MODE_SIZE (GET_MODE (pointer));
-
- return aarch64_move_pointer (pointer, amount);
+ return aarch64_move_pointer (pointer, GET_MODE_SIZE (GET_MODE (pointer)));
}
/* Copy one MODE sized block from SRC to DST, then progress SRC and DST by
@@ -14788,7 +14848,9 @@ aarch64_operands_ok_for_ldpstp (rtx *ope
offval_1 = INTVAL (offset_1);
offval_2 = INTVAL (offset_2);
- msize = GET_MODE_SIZE (mode);
+ /* We should only be trying this for fixed-sized modes. There is no
+ SVE LDP/STP instruction. */
+ msize = GET_MODE_SIZE (mode).to_constant ();
/* Check if the offsets are consecutive. */
if (offval_1 != (offval_2 + msize) && offval_2 != (offval_1 + msize))
return false;
@@ -15148,7 +15210,9 @@ aarch64_fpconst_pow_of_2 (rtx x)
int
aarch64_vec_fpconst_pow_of_2 (rtx x)
{
- if (GET_CODE (x) != CONST_VECTOR)
+ int nelts;
+ if (GET_CODE (x) != CONST_VECTOR
+ || !CONST_VECTOR_NUNITS (x).is_constant (&nelts))
return -1;
if (GET_MODE_CLASS (GET_MODE (x)) != MODE_VECTOR_FLOAT)
@@ -15158,7 +15222,7 @@ aarch64_vec_fpconst_pow_of_2 (rtx x)
if (firstval <= 0)
return -1;
- for (int i = 1; i < CONST_VECTOR_NUNITS (x); i++)
+ for (int i = 1; i < nelts; i++)
if (aarch64_fpconst_pow_of_2 (CONST_VECTOR_ELT (x, i)) != firstval)
return -1;
Index: gcc/config/aarch64/aarch64.md
===================================================================
--- gcc/config/aarch64/aarch64.md 2018-01-05 11:24:44.647408566 +0000
+++ gcc/config/aarch64/aarch64.md 2018-01-05 11:24:44.868399534 +0000
@@ -3328,7 +3328,7 @@ (define_insn "aarch64_<crc_variant>"
CRC))]
"TARGET_CRC32"
{
- if (GET_MODE_BITSIZE (GET_MODE (operands[2])) >= 64)
+ if (GET_MODE_BITSIZE (<crc_mode>mode) >= 64)
return "<crc_variant>\\t%w0, %w1, %x2";
else
return "<crc_variant>\\t%w0, %w1, %w2";
^ permalink raw reply [flat|nested] 29+ messages in thread
* Re: PING: [11/nn] [AArch64] Set NUM_POLY_INT_COEFFS to 2
2018-01-05 11:27 ` PING: " Richard Sandiford
@ 2018-01-06 17:57 ` James Greenhalgh
2018-01-06 19:03 ` Richard Sandiford
0 siblings, 1 reply; 29+ messages in thread
From: James Greenhalgh @ 2018-01-06 17:57 UTC (permalink / raw)
To: Richard Sandiford; +Cc: gcc-patches, Richard Earnshaw, Marcus Shawcroft, nd
On Fri, Jan 05, 2018 at 11:26:59AM +0000, Richard Sandiford wrote:
> Ping. Here's the patch updated to apply on top of the v8.4 and
> __builtin_load_no_speculate support.
>
> Richard Sandiford <richard.sandiford@linaro.org> writes:
> > This patch switches the AArch64 port to use 2 poly_int coefficients
> > and updates code as necessary to keep it compiling.
> >
> > One potentially-significant change is to
> > aarch64_hard_regno_caller_save_mode. The old implementation
> > was written in a pretty conservative way: it changed the default
> > behaviour for single-register values, but used the default handling
> > for multi-register values.
> >
> > I don't think that's necessary, since the interesting cases for this
> > macro are usually the single-register ones. Multi-register modes take
> > up the whole of the constituent registers and the move patterns for all
> > multi-register modes should be equally good.
> >
> > Using the original mode for multi-register cases stops us from using
> > SVE modes to spill multi-register NEON values. This was caught by
> > gcc.c-torture/execute/pr47538.c.
> >
> > Also, aarch64_shift_truncation_mask used GET_MODE_BITSIZE - 1.
> > GET_MODE_UNIT_BITSIZE - 1 is equivalent for the cases that it handles
> > (which are all scalars), and I think it's more obvious, since if we ever
> > do use this for elementwise shifts of vector modes, the mask will depend
> > on the number of bits in each element rather than the number of bits in
> > the whole vector.
This is OK for trunk, with whatever modifications are needed to make the
rebase work. I do have one question and one minor comment; you can assume
that if you do make modifications in response to these that the patch is
still OK.
> Index: gcc/config/aarch64/aarch64.c
> ===================================================================
> --- gcc/config/aarch64/aarch64.c 2018-01-05 11:24:44.647408566 +0000
> +++ gcc/config/aarch64/aarch64.c 2018-01-05 11:24:44.867399574 +0000
> @@ -2262,8 +2258,12 @@ aarch64_pass_by_reference (cumulative_ar
> int nregs;
>
> /* GET_MODE_SIZE (BLKmode) is useless since it is 0. */
> - size = (mode == BLKmode && type)
> - ? int_size_in_bytes (type) : (int) GET_MODE_SIZE (mode);
> + if (mode == BLKmode && type)
> + size = int_size_in_bytes (type);
> + else
> + /* No frontends can create types with variable-sized modes, so we
> + shouldn't be asked to pass or return them. */
> + size = GET_MODE_SIZE (mode).to_constant ();
I presume that the assertion in your comment is checked in the
GET_MODE_SIZE (mode).to_constant (); call? If not, is it worth making the
assert explicit here?
> @@ -11874,8 +11921,9 @@ aarch64_simd_valid_immediate (rtx op, si
> unsigned int n_elts;
> if (const_vec_duplicate_p (op, &elt))
> n_elts = 1;
> - else if (GET_CODE (op) == CONST_VECTOR)
> - n_elts = CONST_VECTOR_NUNITS (op);
> + else if (GET_CODE (op) == CONST_VECTOR
> + && CONST_VECTOR_NUNITS (op).is_constant (&n_elts))
> + ;
> else
> return false;
>
A comment in the empty else if case would be useful for clarity.
Thanks,
James
* Re: PING: [11/nn] [AArch64] Set NUM_POLY_INT_COEFFS to 2
2018-01-06 17:57 ` James Greenhalgh
@ 2018-01-06 19:03 ` Richard Sandiford
0 siblings, 0 replies; 29+ messages in thread
From: Richard Sandiford @ 2018-01-06 19:03 UTC (permalink / raw)
To: James Greenhalgh; +Cc: gcc-patches, Richard Earnshaw, Marcus Shawcroft, nd
James Greenhalgh <james.greenhalgh@arm.com> writes:
> On Fri, Jan 05, 2018 at 11:26:59AM +0000, Richard Sandiford wrote:
>> Ping. Here's the patch updated to apply on top of the v8.4 and
>> __builtin_load_no_speculate support.
>>
>> Richard Sandiford <richard.sandiford@linaro.org> writes:
>> > This patch switches the AArch64 port to use 2 poly_int coefficients
>> > and updates code as necessary to keep it compiling.
>> >
>> > One potentially-significant change is to
>> > aarch64_hard_regno_caller_save_mode. The old implementation
>> > was written in a pretty conservative way: it changed the default
>> > behaviour for single-register values, but used the default handling
>> > for multi-register values.
>> >
>> > I don't think that's necessary, since the interesting cases for this
>> > macro are usually the single-register ones. Multi-register modes take
>> > up the whole of the constituent registers and the move patterns for all
>> > multi-register modes should be equally good.
>> >
>> > Using the original mode for multi-register cases stops us from using
>> > SVE modes to spill multi-register NEON values. This was caught by
>> > gcc.c-torture/execute/pr47538.c.
>> >
>> > Also, aarch64_shift_truncation_mask used GET_MODE_BITSIZE - 1.
>> > GET_MODE_UNIT_BITSIZE - 1 is equivalent for the cases that it handles
>> > (which are all scalars), and I think it's more obvious, since if we ever
>> > do use this for elementwise shifts of vector modes, the mask will depend
>> > on the number of bits in each element rather than the number of bits in
>> > the whole vector.
>
> This is OK for trunk, with whatever modifications are needed to make the
> rebase work. I do have one question and one minor comment; you can assume
> that if you do make modifications in response to these that the patch is
> still OK.
Thanks!
>> Index: gcc/config/aarch64/aarch64.c
>> ===================================================================
>> --- gcc/config/aarch64/aarch64.c 2018-01-05 11:24:44.647408566 +0000
>> +++ gcc/config/aarch64/aarch64.c 2018-01-05 11:24:44.867399574 +0000
>> @@ -2262,8 +2258,12 @@ aarch64_pass_by_reference (cumulative_ar
>> int nregs;
>>
>> /* GET_MODE_SIZE (BLKmode) is useless since it is 0. */
>> - size = (mode == BLKmode && type)
>> - ? int_size_in_bytes (type) : (int) GET_MODE_SIZE (mode);
>> + if (mode == BLKmode && type)
>> + size = int_size_in_bytes (type);
>> + else
>> + /* No frontends can create types with variable-sized modes, so we
>> + shouldn't be asked to pass or return them. */
>> + size = GET_MODE_SIZE (mode).to_constant ();
>
> I presume that the assertion in your comment is checked in the
> GET_MODE_SIZE (mode).to_constant (); call? If not, is it worth making the
> assert explicit here?
Yeah, to_constant is an asserting operation.
Maybe my history with bad naming decisions is cropping up again
here though. Perhaps it would be clearer as "force_constant" or
"require_constant" instead?
>> @@ -11874,8 +11921,9 @@ aarch64_simd_valid_immediate (rtx op, si
>> unsigned int n_elts;
>> if (const_vec_duplicate_p (op, &elt))
>> n_elts = 1;
>> - else if (GET_CODE (op) == CONST_VECTOR)
>> - n_elts = CONST_VECTOR_NUNITS (op);
>> + else if (GET_CODE (op) == CONST_VECTOR
>> + && CONST_VECTOR_NUNITS (op).is_constant (&n_elts))
>> + ;
>> else
>> return false;
>>
>
> A comment in the empty else if case would be useful for clarity.
OK, I'll go with:
/* N_ELTS set above. */;
Thanks,
Richard
2017-10-27 13:22 [00/nn] AArch64 patches preparing for SVE Richard Sandiford
2017-10-27 13:23 ` [01/nn] [AArch64] Generate permute patterns using rtx builders Richard Sandiford
2017-10-31 18:02 ` James Greenhalgh
2017-11-02 9:03 ` Richard Sandiford
2017-10-27 13:25 ` [02/nn] [AArch64] Move code around Richard Sandiford
2017-10-31 18:03 ` James Greenhalgh
2017-10-27 13:26 ` [03/nn] [AArch64] Rework interface to add constant/offset routines Richard Sandiford
2017-10-30 11:03 ` Richard Sandiford
2017-11-10 15:43 ` James Greenhalgh
2017-10-27 13:27 ` [04/nn] [AArch64] Rename the internal "Upl" constraint Richard Sandiford
2017-10-31 18:04 ` James Greenhalgh
2017-10-27 13:28 ` [05/nn] [AArch64] Rewrite aarch64_simd_valid_immediate Richard Sandiford
2017-11-10 11:20 ` James Greenhalgh
2017-10-27 13:28 ` [06/nn] [AArch64] Add an endian_lane_rtx helper routine Richard Sandiford
2017-11-02 9:55 ` James Greenhalgh
2017-10-27 13:29 ` [07/nn] [AArch64] Pass number of units to aarch64_reverse_mask Richard Sandiford
2017-11-02 9:56 ` James Greenhalgh
2017-10-27 13:29 ` [08/nn] [AArch64] Pass number of units to aarch64_simd_vect_par_cnst_half Richard Sandiford
2017-11-02 9:59 ` James Greenhalgh
2017-10-27 13:30 ` [09/nn] [AArch64] Pass number of units to aarch64_expand_vec_perm(_const) Richard Sandiford
2017-11-02 10:00 ` James Greenhalgh
2017-10-27 13:31 ` [10/nn] [AArch64] Minor rtx costs tweak Richard Sandiford
2017-10-31 18:25 ` James Greenhalgh
2017-10-27 13:31 ` [11/nn] [AArch64] Set NUM_POLY_INT_COEFFS to 2 Richard Sandiford
2018-01-05 11:27 ` PING: " Richard Sandiford
2018-01-06 17:57 ` James Greenhalgh
2018-01-06 19:03 ` Richard Sandiford
2017-10-27 13:37 ` [12/nn] [AArch64] Add const_offset field to aarch64_address_info Richard Sandiford
2017-11-02 10:09 ` James Greenhalgh