* [PATCH Take #2] x86_64: Expand ashrv1ti (and PR target/102986)
@ 2021-10-31 10:02 Roger Sayle
2021-11-01 7:27 ` Uros Bizjak
0 siblings, 1 reply; 4+ messages in thread
From: Roger Sayle @ 2021-10-31 10:02 UTC (permalink / raw)
To: 'GCC Patches'; +Cc: 'Uros Bizjak', 'Jakub Jelinek'
[-- Attachment #1: Type: text/plain, Size: 4455 bytes --]
Very many thanks to Jakub for proof-reading my patch, catching my silly
GNU-style mistakes and making excellent suggestions. This revised patch
incorporates all of his feedback, and has been tested on x86_64-pc-linux-gnu
with make bootstrap and make -k check with no new failures.
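As a reference for what the new ashrv1ti3 expander has to compute, here is a
hypothetical portable model (not part of the patch) of a 128-bit arithmetic
right shift expressed on two 64-bit halves; the variable-amount path obtains
exactly these semantics by lowering to a scalar TImode ashrti3:

```c
#include <stdint.h>

/* Hypothetical reference model (not GCC source): arithmetic right
   shift of a 128-bit value held as two 64-bit halves, i.e. the
   semantics ashrv1ti3 must reproduce.  Assumes 0 <= n <= 127.  */
void ashr128(uint64_t *hi, uint64_t *lo, unsigned n)
{
    int64_t shi = (int64_t)*hi;        /* signed view of the high half */
    if (n == 0)
        return;
    if (n < 64)
    {
        /* Low half receives bits shifted down from the high half.  */
        *lo = (*lo >> n) | ((uint64_t)shi << (64 - n));
        *hi = (uint64_t)(shi >> n);    /* arithmetic shift of high half */
    }
    else
    {
        *lo = (uint64_t)(shi >> (n - 64));  /* n - 64 is in 0..63 */
        *hi = (uint64_t)(shi >> 63);        /* pure sign extension */
    }
}
```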
2021-10-31 Roger Sayle <roger@nextmovesoftware.com>
Jakub Jelinek <jakub@redhat.com>
gcc/ChangeLog
PR target/102986
* config/i386/i386-expand.c (ix86_expand_v1ti_to_ti,
ix86_expand_ti_to_v1ti): New helper functions.
(ix86_expand_v1ti_shift): Check if the amount operand is an
integer constant, and expand as a TImode shift if it isn't.
(ix86_expand_v1ti_rotate): Check if the amount operand is an
integer constant, and expand as a TImode rotate if it isn't.
(ix86_expand_v1ti_ashiftrt): New function to expand arithmetic
right shifts of V1TImode quantities.
	* config/i386/i386-protos.h (ix86_expand_v1ti_ashiftrt): Prototype.
* config/i386/sse.md (ashlv1ti3, lshrv1ti3): Change constraints
to QImode general_operand, and let the helper functions lower
shifts by non-constant operands, as TImode shifts. Make
conditional on TARGET_64BIT.
(ashrv1ti3): New expander calling ix86_expand_v1ti_ashiftrt.
(rotlv1ti3, rotrv1ti3): Change shift operand to QImode.
Make conditional on TARGET_64BIT.
gcc/testsuite/ChangeLog
PR target/102986
* gcc.target/i386/sse2-v1ti-ashiftrt-1.c: New test case.
* gcc.target/i386/sse2-v1ti-ashiftrt-2.c: New test case.
* gcc.target/i386/sse2-v1ti-ashiftrt-3.c: New test case.
* gcc.target/i386/sse2-v1ti-shift-2.c: New test case.
* gcc.target/i386/sse2-v1ti-shift-3.c: New test case.
Thanks.
Roger
--
-----Original Message-----
From: Jakub Jelinek <jakub@redhat.com>
Sent: 30 October 2021 11:30
To: Roger Sayle <roger@nextmovesoftware.com>
Cc: 'GCC Patches' <gcc-patches@gcc.gnu.org>; 'Uros Bizjak'
<ubizjak@gmail.com>
Subject: Re: [PATCH] x86_64: Expand ashrv1ti (and PR target/102986)
On Sat, Oct 30, 2021 at 11:16:41AM +0100, Roger Sayle wrote:
> 2021-10-30 Roger Sayle <roger@nextmovesoftware.com>
>
> gcc/ChangeLog
> PR target/102986
> * config/i386/i386-expand.c (ix86_expand_v1ti_to_ti,
> ix86_expand_ti_to_v1ti): New helper functions.
> (ix86_expand_v1ti_shift): Check if the amount operand is an
> integer constant, and expand as a TImode shift if it isn't.
> (ix86_expand_v1ti_rotate): Check if the amount operand is an
> integer constant, and expand as a TImode rotate if it isn't.
> (ix86_expand_v1ti_ashiftrt): New function to expand arithmetic
> right shifts of V1TImode quantities.
> * config/i386/i386-protos.h (ix86_expand_v1ti_ashift): Prototype.
> * config/i386/sse.md (ashlv1ti3, lshrv1ti3): Change constraints
> to QImode general_operand, and let the helper functions lower
> shifts by non-constant operands, as TImode shifts.
> (ashrv1ti3): New expander calling ix86_expand_v1ti_ashiftrt.
> (rotlv1ti3, rotrv1ti3): Change shift operand to QImode.
>
> gcc/testsuite/ChangeLog
> PR target/102986
> * gcc.target/i386/sse2-v1ti-ashiftrt-1.c: New test case.
> * gcc.target/i386/sse2-v1ti-ashiftrt-2.c: New test case.
> * gcc.target/i386/sse2-v1ti-ashiftrt-3.c: New test case.
> * gcc.target/i386/sse2-v1ti-shift-2.c: New test case.
> * gcc.target/i386/sse2-v1ti-shift-3.c: New test case.
>
> Sorry again for the breakage in my last patch. I wasn't testing things
> that shouldn't have been affected/changed.
Not a review, will defer that to Uros, but just nits:
> +/* Expand move of V1TI mode register X to a new TI mode register. */
> +static rtx ix86_expand_v1ti_to_ti (rtx x)
ix86_expand_v1ti_to_ti should be at the start of next line, so static rtx
ix86_expand_v1ti_to_ti (rtx x)
Ditto for other functions and also in functions you've added by the previous
patch.
> + emit_insn (code == ASHIFT ? gen_ashlti3(tmp2, tmp1, operands[2])
> + : gen_lshrti3(tmp2, tmp1, operands[2]));
Space before ( twice.
> + emit_insn (code == ROTATE ? gen_rotlti3(tmp2, tmp1, operands[2])
> + : gen_rotrti3(tmp2, tmp1, operands[2]));
Likewise.
> + emit_insn (gen_ashrti3(tmp2, tmp1, operands[2]));
Similarly.
Also, I wonder for all these patterns (previously and now added), shouldn't
they have && TARGET_64BIT in conditions? I mean, we don't really support
scalar TImode for ia32, but VALID_SSE_REG_MODE includes V1TImode and while
the constant shifts can be done, I think the variable shifts can't, there
are no TImode shift patterns...
Jakub
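One of the constant-shift strategies in the attached patch, the bits == 127
case, relies on a two-instruction SSE2 idiom: pshufd $0xff broadcasts the top
dword of the 128-bit value to all four lanes, then psrad $31 smears each
lane's sign bit, yielding all-zeros or all-ones as x >> 127 requires. A
hypothetical lane-wise C model of that idiom (not GCC source):

```c
#include <stdint.h>

/* Hypothetical model of the bits == 127 case of ix86_expand_v1ti_ashiftrt,
   expressed on four 32-bit lanes (lanes[0] is the least significant).  */
void v1ti_ashr_127(uint32_t lanes[4])
{
    uint32_t top = lanes[3];             /* pshufd $0xff: broadcast lane 3 */
    int32_t sign = (int32_t)top >> 31;   /* psrad $31: 0 or -1 per lane */
    for (int i = 0; i < 4; i++)
        lanes[i] = (uint32_t)sign;       /* result is the replicated sign */
}
```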
[-- Attachment #2: patchv4.txt --]
[-- Type: text/plain, Size: 41539 bytes --]
diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
index 4c3800e..db967e4 100644
--- a/gcc/config/i386/i386-expand.c
+++ b/gcc/config/i386/i386-expand.c
@@ -6157,12 +6157,52 @@ ix86_split_lshr (rtx *operands, rtx scratch, machine_mode mode)
}
}
+/* Expand move of V1TI mode register X to a new TI mode register. */
+static rtx
+ix86_expand_v1ti_to_ti (rtx x)
+{
+ rtx result = gen_reg_rtx (TImode);
+ emit_move_insn (result, gen_lowpart (TImode, x));
+ return result;
+}
+
+/* Expand move of TI mode register X to a new V1TI mode register. */
+static rtx
+ix86_expand_ti_to_v1ti (rtx x)
+{
+ rtx result = gen_reg_rtx (V1TImode);
+ if (TARGET_SSE2)
+ {
+ rtx lo = gen_lowpart (DImode, x);
+ rtx hi = gen_highpart (DImode, x);
+ rtx tmp = gen_reg_rtx (V2DImode);
+ emit_insn (gen_vec_concatv2di (tmp, lo, hi));
+ emit_move_insn (result, gen_lowpart (V1TImode, tmp));
+ }
+ else
+ emit_move_insn (result, gen_lowpart (V1TImode, x));
+ return result;
+}
+
/* Expand V1TI mode shift (of rtx_code CODE) by constant. */
-void ix86_expand_v1ti_shift (enum rtx_code code, rtx operands[])
+void
+ix86_expand_v1ti_shift (enum rtx_code code, rtx operands[])
{
- HOST_WIDE_INT bits = INTVAL (operands[2]) & 127;
rtx op1 = force_reg (V1TImode, operands[1]);
+ if (!CONST_INT_P (operands[2]))
+ {
+ rtx tmp1 = ix86_expand_v1ti_to_ti (op1);
+ rtx tmp2 = gen_reg_rtx (TImode);
+ emit_insn (code == ASHIFT ? gen_ashlti3 (tmp2, tmp1, operands[2])
+ : gen_lshrti3 (tmp2, tmp1, operands[2]));
+ rtx tmp3 = ix86_expand_ti_to_v1ti (tmp2);
+ emit_move_insn (operands[0], tmp3);
+ return;
+ }
+
+ HOST_WIDE_INT bits = INTVAL (operands[2]) & 127;
+
if (bits == 0)
{
emit_move_insn (operands[0], op1);
@@ -6173,7 +6213,7 @@ void ix86_expand_v1ti_shift (enum rtx_code code, rtx operands[])
{
rtx tmp = gen_reg_rtx (V1TImode);
if (code == ASHIFT)
- emit_insn (gen_sse2_ashlv1ti3 (tmp, op1, GEN_INT (bits)));
+ emit_insn (gen_sse2_ashlv1ti3 (tmp, op1, GEN_INT (bits)));
else
emit_insn (gen_sse2_lshrv1ti3 (tmp, op1, GEN_INT (bits)));
emit_move_insn (operands[0], tmp);
@@ -6228,11 +6268,24 @@ void ix86_expand_v1ti_shift (enum rtx_code code, rtx operands[])
}
/* Expand V1TI mode rotate (of rtx_code CODE) by constant. */
-void ix86_expand_v1ti_rotate (enum rtx_code code, rtx operands[])
+void
+ix86_expand_v1ti_rotate (enum rtx_code code, rtx operands[])
{
- HOST_WIDE_INT bits = INTVAL (operands[2]) & 127;
rtx op1 = force_reg (V1TImode, operands[1]);
+ if (!CONST_INT_P (operands[2]))
+ {
+ rtx tmp1 = ix86_expand_v1ti_to_ti (op1);
+ rtx tmp2 = gen_reg_rtx (TImode);
+ emit_insn (code == ROTATE ? gen_rotlti3 (tmp2, tmp1, operands[2])
+ : gen_rotrti3 (tmp2, tmp1, operands[2]));
+ rtx tmp3 = ix86_expand_ti_to_v1ti (tmp2);
+ emit_move_insn (operands[0], tmp3);
+ return;
+ }
+
+ HOST_WIDE_INT bits = INTVAL (operands[2]) & 127;
+
if (bits == 0)
{
emit_move_insn (operands[0], op1);
@@ -6320,6 +6373,469 @@ void ix86_expand_v1ti_rotate (enum rtx_code code, rtx operands[])
emit_move_insn (operands[0], tmp4);
}
+/* Expand V1TI mode ashiftrt by constant. */
+void
+ix86_expand_v1ti_ashiftrt (rtx operands[])
+{
+ rtx op1 = force_reg (V1TImode, operands[1]);
+
+ if (!CONST_INT_P (operands[2]))
+ {
+ rtx tmp1 = ix86_expand_v1ti_to_ti (op1);
+ rtx tmp2 = gen_reg_rtx (TImode);
+ emit_insn (gen_ashrti3 (tmp2, tmp1, operands[2]));
+ rtx tmp3 = ix86_expand_ti_to_v1ti (tmp2);
+ emit_move_insn (operands[0], tmp3);
+ return;
+ }
+
+ HOST_WIDE_INT bits = INTVAL (operands[2]) & 127;
+
+ if (bits == 0)
+ {
+ emit_move_insn (operands[0], op1);
+ return;
+ }
+
+ if (bits == 127)
+ {
+ /* Two operations. */
+ rtx tmp1 = gen_reg_rtx (V4SImode);
+ rtx tmp2 = gen_reg_rtx (V4SImode);
+ emit_move_insn (tmp1, gen_lowpart (V4SImode, op1));
+ emit_insn (gen_sse2_pshufd (tmp2, tmp1, GEN_INT (0xff)));
+
+ rtx tmp3 = gen_reg_rtx (V4SImode);
+ emit_insn (gen_ashrv4si3 (tmp3, tmp2, GEN_INT (31)));
+
+ rtx tmp4 = gen_reg_rtx (V1TImode);
+ emit_move_insn (tmp4, gen_lowpart (V1TImode, tmp3));
+ emit_move_insn (operands[0], tmp4);
+ return;
+ }
+
+ if (bits == 64)
+ {
+ /* Three operations. */
+ rtx tmp1 = gen_reg_rtx (V4SImode);
+ rtx tmp2 = gen_reg_rtx (V4SImode);
+ emit_move_insn (tmp1, gen_lowpart (V4SImode, op1));
+ emit_insn (gen_sse2_pshufd (tmp2, tmp1, GEN_INT (0xff)));
+
+ rtx tmp3 = gen_reg_rtx (V4SImode);
+ emit_insn (gen_ashrv4si3 (tmp3, tmp2, GEN_INT (31)));
+
+ rtx tmp4 = gen_reg_rtx (V2DImode);
+ rtx tmp5 = gen_reg_rtx (V2DImode);
+ rtx tmp6 = gen_reg_rtx (V2DImode);
+ emit_move_insn (tmp4, gen_lowpart (V2DImode, tmp1));
+ emit_move_insn (tmp5, gen_lowpart (V2DImode, tmp3));
+ emit_insn (gen_vec_interleave_highv2di (tmp6, tmp4, tmp5));
+
+ rtx tmp7 = gen_reg_rtx (V1TImode);
+ emit_move_insn (tmp7, gen_lowpart (V1TImode, tmp6));
+ emit_move_insn (operands[0], tmp7);
+ return;
+ }
+
+ if (bits == 96)
+ {
+ /* Three operations. */
+ rtx tmp3 = gen_reg_rtx (V2DImode);
+ rtx tmp1 = gen_reg_rtx (V4SImode);
+ rtx tmp2 = gen_reg_rtx (V4SImode);
+ emit_move_insn (tmp1, gen_lowpart (V4SImode, op1));
+ emit_insn (gen_ashrv4si3 (tmp2, tmp1, GEN_INT (31)));
+
+ rtx tmp4 = gen_reg_rtx (V2DImode);
+ rtx tmp5 = gen_reg_rtx (V2DImode);
+ emit_move_insn (tmp3, gen_lowpart (V2DImode, tmp1));
+ emit_move_insn (tmp4, gen_lowpart (V2DImode, tmp2));
+ emit_insn (gen_vec_interleave_highv2di (tmp5, tmp3, tmp4));
+
+ rtx tmp6 = gen_reg_rtx (V4SImode);
+ rtx tmp7 = gen_reg_rtx (V4SImode);
+ emit_move_insn (tmp6, gen_lowpart (V4SImode, tmp5));
+ emit_insn (gen_sse2_pshufd (tmp7, tmp6, GEN_INT (0xfd)));
+
+ rtx tmp8 = gen_reg_rtx (V1TImode);
+ emit_move_insn (tmp8, gen_lowpart (V1TImode, tmp7));
+ emit_move_insn (operands[0], tmp8);
+ return;
+ }
+
+ if (TARGET_AVX2 || TARGET_SSE4_1)
+ {
+ /* Three operations. */
+ if (bits == 32)
+ {
+ rtx tmp1 = gen_reg_rtx (V4SImode);
+ rtx tmp2 = gen_reg_rtx (V4SImode);
+ emit_move_insn (tmp1, gen_lowpart (V4SImode, op1));
+ emit_insn (gen_ashrv4si3 (tmp2, tmp1, GEN_INT (31)));
+
+ rtx tmp3 = gen_reg_rtx (V1TImode);
+ emit_insn (gen_sse2_lshrv1ti3 (tmp3, op1, GEN_INT (32)));
+
+ if (TARGET_AVX2)
+ {
+ rtx tmp4 = gen_reg_rtx (V4SImode);
+ rtx tmp5 = gen_reg_rtx (V4SImode);
+ emit_move_insn (tmp4, gen_lowpart (V4SImode, tmp3));
+ emit_insn (gen_avx2_pblenddv4si (tmp5, tmp2, tmp4,
+ GEN_INT (7)));
+
+ rtx tmp6 = gen_reg_rtx (V1TImode);
+ emit_move_insn (tmp6, gen_lowpart (V1TImode, tmp5));
+ emit_move_insn (operands[0], tmp6);
+ }
+ else
+ {
+ rtx tmp4 = gen_reg_rtx (V8HImode);
+ rtx tmp5 = gen_reg_rtx (V8HImode);
+ rtx tmp6 = gen_reg_rtx (V8HImode);
+ emit_move_insn (tmp4, gen_lowpart (V8HImode, tmp2));
+ emit_move_insn (tmp5, gen_lowpart (V8HImode, tmp3));
+ emit_insn (gen_sse4_1_pblendw (tmp6, tmp4, tmp5,
+ GEN_INT (0x3f)));
+
+ rtx tmp7 = gen_reg_rtx (V1TImode);
+ emit_move_insn (tmp7, gen_lowpart (V1TImode, tmp6));
+ emit_move_insn (operands[0], tmp7);
+ }
+ return;
+ }
+
+ /* Three operations. */
+ if (bits == 8 || bits == 16 || bits == 24)
+ {
+ rtx tmp1 = gen_reg_rtx (V4SImode);
+ rtx tmp2 = gen_reg_rtx (V4SImode);
+ emit_move_insn (tmp1, gen_lowpart (V4SImode, op1));
+ emit_insn (gen_ashrv4si3 (tmp2, tmp1, GEN_INT (bits)));
+
+ rtx tmp3 = gen_reg_rtx (V1TImode);
+ emit_insn (gen_sse2_lshrv1ti3 (tmp3, op1, GEN_INT (bits)));
+
+ if (TARGET_AVX2)
+ {
+ rtx tmp4 = gen_reg_rtx (V4SImode);
+ rtx tmp5 = gen_reg_rtx (V4SImode);
+ emit_move_insn (tmp4, gen_lowpart (V4SImode, tmp3));
+ emit_insn (gen_avx2_pblenddv4si (tmp5, tmp2, tmp4,
+ GEN_INT (7)));
+
+ rtx tmp6 = gen_reg_rtx (V1TImode);
+ emit_move_insn (tmp6, gen_lowpart (V1TImode, tmp5));
+ emit_move_insn (operands[0], tmp6);
+ }
+ else
+ {
+ rtx tmp4 = gen_reg_rtx (V8HImode);
+ rtx tmp5 = gen_reg_rtx (V8HImode);
+ rtx tmp6 = gen_reg_rtx (V8HImode);
+ emit_move_insn (tmp4, gen_lowpart (V8HImode, tmp2));
+ emit_move_insn (tmp5, gen_lowpart (V8HImode, tmp3));
+ emit_insn (gen_sse4_1_pblendw (tmp6, tmp4, tmp5,
+ GEN_INT (0x3f)));
+
+ rtx tmp7 = gen_reg_rtx (V1TImode);
+ emit_move_insn (tmp7, gen_lowpart (V1TImode, tmp6));
+ emit_move_insn (operands[0], tmp7);
+ }
+ return;
+ }
+ }
+
+ if (bits > 96)
+ {
+ /* Four operations. */
+ rtx tmp1 = gen_reg_rtx (V4SImode);
+ rtx tmp2 = gen_reg_rtx (V4SImode);
+ emit_move_insn (tmp1, gen_lowpart (V4SImode, op1));
+ emit_insn (gen_ashrv4si3 (tmp2, tmp1, GEN_INT (bits - 96)));
+
+ rtx tmp3 = gen_reg_rtx (V4SImode);
+ emit_insn (gen_ashrv4si3 (tmp3, tmp1, GEN_INT (31)));
+
+ rtx tmp4 = gen_reg_rtx (V2DImode);
+ rtx tmp5 = gen_reg_rtx (V2DImode);
+ rtx tmp6 = gen_reg_rtx (V2DImode);
+ emit_move_insn (tmp4, gen_lowpart (V2DImode, tmp2));
+ emit_move_insn (tmp5, gen_lowpart (V2DImode, tmp3));
+ emit_insn (gen_vec_interleave_highv2di (tmp6, tmp4, tmp5));
+
+ rtx tmp7 = gen_reg_rtx (V4SImode);
+ rtx tmp8 = gen_reg_rtx (V4SImode);
+ emit_move_insn (tmp7, gen_lowpart (V4SImode, tmp6));
+ emit_insn (gen_sse2_pshufd (tmp8, tmp7, GEN_INT (0xfd)));
+
+ rtx tmp9 = gen_reg_rtx (V1TImode);
+ emit_move_insn (tmp9, gen_lowpart (V1TImode, tmp8));
+ emit_move_insn (operands[0], tmp9);
+ return;
+ }
+
+ if (TARGET_SSE4_1 && (bits == 48 || bits == 80))
+ {
+ /* Four operations. */
+ rtx tmp1 = gen_reg_rtx (V4SImode);
+ rtx tmp2 = gen_reg_rtx (V4SImode);
+ emit_move_insn (tmp1, gen_lowpart (V4SImode, op1));
+ emit_insn (gen_sse2_pshufd (tmp2, tmp1, GEN_INT (0xff)));
+
+ rtx tmp3 = gen_reg_rtx (V4SImode);
+ emit_insn (gen_ashrv4si3 (tmp3, tmp2, GEN_INT (31)));
+
+ rtx tmp4 = gen_reg_rtx (V1TImode);
+ emit_insn (gen_sse2_lshrv1ti3 (tmp4, op1, GEN_INT (bits)));
+
+ rtx tmp5 = gen_reg_rtx (V8HImode);
+ rtx tmp6 = gen_reg_rtx (V8HImode);
+ rtx tmp7 = gen_reg_rtx (V8HImode);
+ emit_move_insn (tmp5, gen_lowpart (V8HImode, tmp3));
+ emit_move_insn (tmp6, gen_lowpart (V8HImode, tmp4));
+ emit_insn (gen_sse4_1_pblendw (tmp7, tmp5, tmp6,
+ GEN_INT (bits == 48 ? 0x1f : 0x07)));
+
+ rtx tmp8 = gen_reg_rtx (V1TImode);
+ emit_move_insn (tmp8, gen_lowpart (V1TImode, tmp7));
+ emit_move_insn (operands[0], tmp8);
+ return;
+ }
+
+ if ((bits & 7) == 0)
+ {
+ /* Five operations. */
+ rtx tmp1 = gen_reg_rtx (V4SImode);
+ rtx tmp2 = gen_reg_rtx (V4SImode);
+ emit_move_insn (tmp1, gen_lowpart (V4SImode, op1));
+ emit_insn (gen_sse2_pshufd (tmp2, tmp1, GEN_INT (0xff)));
+
+ rtx tmp3 = gen_reg_rtx (V4SImode);
+ emit_insn (gen_ashrv4si3 (tmp3, tmp2, GEN_INT (31)));
+
+ rtx tmp4 = gen_reg_rtx (V1TImode);
+ emit_insn (gen_sse2_lshrv1ti3 (tmp4, op1, GEN_INT (bits)));
+
+ rtx tmp5 = gen_reg_rtx (V1TImode);
+ rtx tmp6 = gen_reg_rtx (V1TImode);
+ emit_move_insn (tmp5, gen_lowpart (V1TImode, tmp3));
+ emit_insn (gen_sse2_ashlv1ti3 (tmp6, tmp5, GEN_INT (128 - bits)));
+
+ rtx tmp7 = gen_reg_rtx (V2DImode);
+ rtx tmp8 = gen_reg_rtx (V2DImode);
+ rtx tmp9 = gen_reg_rtx (V2DImode);
+ emit_move_insn (tmp7, gen_lowpart (V2DImode, tmp4));
+ emit_move_insn (tmp8, gen_lowpart (V2DImode, tmp6));
+ emit_insn (gen_iorv2di3 (tmp9, tmp7, tmp8));
+
+ rtx tmp10 = gen_reg_rtx (V1TImode);
+ emit_move_insn (tmp10, gen_lowpart (V1TImode, tmp9));
+ emit_move_insn (operands[0], tmp10);
+ return;
+ }
+
+ if (TARGET_AVX2 && bits < 32)
+ {
+ /* Six operations. */
+ rtx tmp1 = gen_reg_rtx (V4SImode);
+ rtx tmp2 = gen_reg_rtx (V4SImode);
+ emit_move_insn (tmp1, gen_lowpart (V4SImode, op1));
+ emit_insn (gen_ashrv4si3 (tmp2, tmp1, GEN_INT (bits)));
+
+ rtx tmp3 = gen_reg_rtx (V1TImode);
+ emit_insn (gen_sse2_lshrv1ti3 (tmp3, op1, GEN_INT (64)));
+
+ rtx tmp4 = gen_reg_rtx (V2DImode);
+ rtx tmp5 = gen_reg_rtx (V2DImode);
+ emit_move_insn (tmp4, gen_lowpart (V2DImode, op1));
+ emit_insn (gen_lshrv2di3 (tmp5, tmp4, GEN_INT (bits)));
+
+ rtx tmp6 = gen_reg_rtx (V2DImode);
+ rtx tmp7 = gen_reg_rtx (V2DImode);
+ emit_move_insn (tmp6, gen_lowpart (V2DImode, tmp3));
+ emit_insn (gen_ashlv2di3 (tmp7, tmp6, GEN_INT (64 - bits)));
+
+ rtx tmp8 = gen_reg_rtx (V2DImode);
+ emit_insn (gen_iorv2di3 (tmp8, tmp5, tmp7));
+
+ rtx tmp9 = gen_reg_rtx (V4SImode);
+ rtx tmp10 = gen_reg_rtx (V4SImode);
+ emit_move_insn (tmp9, gen_lowpart (V4SImode, tmp8));
+ emit_insn (gen_avx2_pblenddv4si (tmp10, tmp2, tmp9, GEN_INT (7)));
+
+ rtx tmp11 = gen_reg_rtx (V1TImode);
+ emit_move_insn (tmp11, gen_lowpart (V1TImode, tmp10));
+ emit_move_insn (operands[0], tmp11);
+ return;
+ }
+
+ if (TARGET_SSE4_1 && bits < 15)
+ {
+ /* Six operations. */
+ rtx tmp1 = gen_reg_rtx (V4SImode);
+ rtx tmp2 = gen_reg_rtx (V4SImode);
+ emit_move_insn (tmp1, gen_lowpart (V4SImode, op1));
+ emit_insn (gen_ashrv4si3 (tmp2, tmp1, GEN_INT (bits)));
+
+ rtx tmp3 = gen_reg_rtx (V1TImode);
+ emit_insn (gen_sse2_lshrv1ti3 (tmp3, op1, GEN_INT (64)));
+
+ rtx tmp4 = gen_reg_rtx (V2DImode);
+ rtx tmp5 = gen_reg_rtx (V2DImode);
+ emit_move_insn (tmp4, gen_lowpart (V2DImode, op1));
+ emit_insn (gen_lshrv2di3 (tmp5, tmp4, GEN_INT (bits)));
+
+ rtx tmp6 = gen_reg_rtx (V2DImode);
+ rtx tmp7 = gen_reg_rtx (V2DImode);
+ emit_move_insn (tmp6, gen_lowpart (V2DImode, tmp3));
+ emit_insn (gen_ashlv2di3 (tmp7, tmp6, GEN_INT (64 - bits)));
+
+ rtx tmp8 = gen_reg_rtx (V2DImode);
+ emit_insn (gen_iorv2di3 (tmp8, tmp5, tmp7));
+
+ rtx tmp9 = gen_reg_rtx (V8HImode);
+ rtx tmp10 = gen_reg_rtx (V8HImode);
+ rtx tmp11 = gen_reg_rtx (V8HImode);
+ emit_move_insn (tmp9, gen_lowpart (V8HImode, tmp2));
+ emit_move_insn (tmp10, gen_lowpart (V8HImode, tmp8));
+ emit_insn (gen_sse4_1_pblendw (tmp11, tmp9, tmp10, GEN_INT (0x3f)));
+
+ rtx tmp12 = gen_reg_rtx (V1TImode);
+ emit_move_insn (tmp12, gen_lowpart (V1TImode, tmp11));
+ emit_move_insn (operands[0], tmp12);
+ return;
+ }
+
+ if (bits == 1)
+ {
+ /* Eight operations. */
+ rtx tmp1 = gen_reg_rtx (V1TImode);
+ emit_insn (gen_sse2_lshrv1ti3 (tmp1, op1, GEN_INT (64)));
+
+ rtx tmp2 = gen_reg_rtx (V2DImode);
+ rtx tmp3 = gen_reg_rtx (V2DImode);
+ emit_move_insn (tmp2, gen_lowpart (V2DImode, op1));
+ emit_insn (gen_lshrv2di3 (tmp3, tmp2, GEN_INT (1)));
+
+ rtx tmp4 = gen_reg_rtx (V2DImode);
+ rtx tmp5 = gen_reg_rtx (V2DImode);
+ emit_move_insn (tmp4, gen_lowpart (V2DImode, tmp1));
+ emit_insn (gen_ashlv2di3 (tmp5, tmp4, GEN_INT (63)));
+
+ rtx tmp6 = gen_reg_rtx (V2DImode);
+ emit_insn (gen_iorv2di3 (tmp6, tmp3, tmp5));
+
+ rtx tmp7 = gen_reg_rtx (V2DImode);
+ emit_insn (gen_lshrv2di3 (tmp7, tmp2, GEN_INT (63)));
+
+ rtx tmp8 = gen_reg_rtx (V4SImode);
+ rtx tmp9 = gen_reg_rtx (V4SImode);
+ emit_move_insn (tmp8, gen_lowpart (V4SImode, tmp7));
+ emit_insn (gen_sse2_pshufd (tmp9, tmp8, GEN_INT (0xbf)));
+
+ rtx tmp10 = gen_reg_rtx (V2DImode);
+ rtx tmp11 = gen_reg_rtx (V2DImode);
+ emit_move_insn (tmp10, gen_lowpart (V2DImode, tmp9));
+ emit_insn (gen_ashlv2di3 (tmp11, tmp10, GEN_INT (31)));
+
+ rtx tmp12 = gen_reg_rtx (V2DImode);
+ emit_insn (gen_iorv2di3 (tmp12, tmp6, tmp11));
+
+ rtx tmp13 = gen_reg_rtx (V1TImode);
+ emit_move_insn (tmp13, gen_lowpart (V1TImode, tmp12));
+ emit_move_insn (operands[0], tmp13);
+ return;
+ }
+
+ if (bits > 64)
+ {
+ /* Eight operations. */
+ rtx tmp1 = gen_reg_rtx (V4SImode);
+ rtx tmp2 = gen_reg_rtx (V4SImode);
+ emit_move_insn (tmp1, gen_lowpart (V4SImode, op1));
+ emit_insn (gen_sse2_pshufd (tmp2, tmp1, GEN_INT (0xff)));
+
+ rtx tmp3 = gen_reg_rtx (V4SImode);
+ emit_insn (gen_ashrv4si3 (tmp3, tmp2, GEN_INT (31)));
+
+ rtx tmp4 = gen_reg_rtx (V1TImode);
+ emit_insn (gen_sse2_lshrv1ti3 (tmp4, op1, GEN_INT (64)));
+
+ rtx tmp5 = gen_reg_rtx (V2DImode);
+ rtx tmp6 = gen_reg_rtx (V2DImode);
+ emit_move_insn (tmp5, gen_lowpart (V2DImode, tmp4));
+ emit_insn (gen_lshrv2di3 (tmp6, tmp5, GEN_INT (bits - 64)));
+
+ rtx tmp7 = gen_reg_rtx (V1TImode);
+ rtx tmp8 = gen_reg_rtx (V1TImode);
+ emit_move_insn (tmp7, gen_lowpart (V1TImode, tmp3));
+ emit_insn (gen_sse2_ashlv1ti3 (tmp8, tmp7, GEN_INT (64)));
+
+ rtx tmp9 = gen_reg_rtx (V2DImode);
+ rtx tmp10 = gen_reg_rtx (V2DImode);
+ emit_move_insn (tmp9, gen_lowpart (V2DImode, tmp3));
+ emit_insn (gen_ashlv2di3 (tmp10, tmp9, GEN_INT (128 - bits)));
+
+ rtx tmp11 = gen_reg_rtx (V2DImode);
+ rtx tmp12 = gen_reg_rtx (V2DImode);
+ emit_move_insn (tmp11, gen_lowpart (V2DImode, tmp8));
+ emit_insn (gen_iorv2di3 (tmp12, tmp10, tmp11));
+
+ rtx tmp13 = gen_reg_rtx (V2DImode);
+ emit_insn (gen_iorv2di3 (tmp13, tmp6, tmp12));
+
+ rtx tmp14 = gen_reg_rtx (V1TImode);
+ emit_move_insn (tmp14, gen_lowpart (V1TImode, tmp13));
+ emit_move_insn (operands[0], tmp14);
+ }
+ else
+ {
+ /* Nine operations. */
+ rtx tmp1 = gen_reg_rtx (V4SImode);
+ rtx tmp2 = gen_reg_rtx (V4SImode);
+ emit_move_insn (tmp1, gen_lowpart (V4SImode, op1));
+ emit_insn (gen_sse2_pshufd (tmp2, tmp1, GEN_INT (0xff)));
+
+ rtx tmp3 = gen_reg_rtx (V4SImode);
+ emit_insn (gen_ashrv4si3 (tmp3, tmp2, GEN_INT (31)));
+
+ rtx tmp4 = gen_reg_rtx (V1TImode);
+ emit_insn (gen_sse2_lshrv1ti3 (tmp4, op1, GEN_INT (64)));
+
+ rtx tmp5 = gen_reg_rtx (V2DImode);
+ rtx tmp6 = gen_reg_rtx (V2DImode);
+ emit_move_insn (tmp5, gen_lowpart (V2DImode, op1));
+ emit_insn (gen_lshrv2di3 (tmp6, tmp5, GEN_INT (bits)));
+
+ rtx tmp7 = gen_reg_rtx (V2DImode);
+ rtx tmp8 = gen_reg_rtx (V2DImode);
+ emit_move_insn (tmp7, gen_lowpart (V2DImode, tmp4));
+ emit_insn (gen_ashlv2di3 (tmp8, tmp7, GEN_INT (64 - bits)));
+
+ rtx tmp9 = gen_reg_rtx (V2DImode);
+ emit_insn (gen_iorv2di3 (tmp9, tmp6, tmp8));
+
+ rtx tmp10 = gen_reg_rtx (V1TImode);
+ rtx tmp11 = gen_reg_rtx (V1TImode);
+ emit_move_insn (tmp10, gen_lowpart (V1TImode, tmp3));
+ emit_insn (gen_sse2_ashlv1ti3 (tmp11, tmp10, GEN_INT (64)));
+
+ rtx tmp12 = gen_reg_rtx (V2DImode);
+ rtx tmp13 = gen_reg_rtx (V2DImode);
+ emit_move_insn (tmp12, gen_lowpart (V2DImode, tmp11));
+ emit_insn (gen_ashlv2di3 (tmp13, tmp12, GEN_INT (64 - bits)));
+
+ rtx tmp14 = gen_reg_rtx (V2DImode);
+ emit_insn (gen_iorv2di3 (tmp14, tmp9, tmp13));
+
+ rtx tmp15 = gen_reg_rtx (V1TImode);
+ emit_move_insn (tmp15, gen_lowpart (V1TImode, tmp14));
+ emit_move_insn (operands[0], tmp15);
+ }
+}
+
/* Return mode for the memcpy/memset loop counter. Prefer SImode over
DImode for constant loop counts. */
diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index 9918a28..bd52450 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -161,6 +161,7 @@ extern void ix86_split_ashr (rtx *, rtx, machine_mode);
extern void ix86_split_lshr (rtx *, rtx, machine_mode);
extern void ix86_expand_v1ti_shift (enum rtx_code, rtx[]);
extern void ix86_expand_v1ti_rotate (enum rtx_code, rtx[]);
+extern void ix86_expand_v1ti_ashiftrt (rtx[]);
extern rtx ix86_find_base_term (rtx);
extern bool ix86_check_movabs (rtx, int);
extern bool ix86_check_no_addr_space (rtx);
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index bdc6067..3307c1b 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -15079,8 +15079,8 @@
[(set (match_operand:V1TI 0 "register_operand")
(ashift:V1TI
(match_operand:V1TI 1 "register_operand")
- (match_operand:SI 2 "const_int_operand")))]
- "TARGET_SSE2"
+ (match_operand:QI 2 "general_operand")))]
+ "TARGET_SSE2 && TARGET_64BIT"
{
ix86_expand_v1ti_shift (ASHIFT, operands);
DONE;
@@ -15090,19 +15090,30 @@
[(set (match_operand:V1TI 0 "register_operand")
(lshiftrt:V1TI
(match_operand:V1TI 1 "register_operand")
- (match_operand:SI 2 "const_int_operand")))]
- "TARGET_SSE2"
+ (match_operand:QI 2 "general_operand")))]
+ "TARGET_SSE2 && TARGET_64BIT"
{
ix86_expand_v1ti_shift (LSHIFTRT, operands);
DONE;
})
+(define_expand "ashrv1ti3"
+ [(set (match_operand:V1TI 0 "register_operand")
+ (ashiftrt:V1TI
+ (match_operand:V1TI 1 "register_operand")
+ (match_operand:QI 2 "general_operand")))]
+ "TARGET_SSE2 && TARGET_64BIT"
+{
+ ix86_expand_v1ti_ashiftrt (operands);
+ DONE;
+})
+
(define_expand "rotlv1ti3"
[(set (match_operand:V1TI 0 "register_operand")
(rotate:V1TI
(match_operand:V1TI 1 "register_operand")
- (match_operand:SI 2 "const_int_operand")))]
- "TARGET_SSE2"
+ (match_operand:QI 2 "const_int_operand")))]
+ "TARGET_SSE2 && TARGET_64BIT"
{
ix86_expand_v1ti_rotate (ROTATE, operands);
DONE;
@@ -15112,8 +15123,8 @@
[(set (match_operand:V1TI 0 "register_operand")
(rotatert:V1TI
(match_operand:V1TI 1 "register_operand")
- (match_operand:SI 2 "const_int_operand")))]
- "TARGET_SSE2"
+ (match_operand:QI 2 "const_int_operand")))]
+ "TARGET_SSE2 && TARGET_64BIT"
{
ix86_expand_v1ti_rotate (ROTATERT, operands);
DONE;
diff --git a/gcc/testsuite/gcc.target/i386/sse2-v1ti-ashiftrt-1.c b/gcc/testsuite/gcc.target/i386/sse2-v1ti-ashiftrt-1.c
new file mode 100644
index 0000000..05869bf
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/sse2-v1ti-ashiftrt-1.c
@@ -0,0 +1,167 @@
+/* { dg-do run { target int128 } } */
+/* { dg-options "-O2 -msse2" } */
+/* { dg-require-effective-target sse2 } */
+
+typedef __int128 v1ti __attribute__ ((__vector_size__ (16)));
+typedef __int128 ti;
+
+ti ashr(ti x, unsigned int i) { return x >> i; }
+
+v1ti ashr_1(v1ti x) { return x >> 1; }
+v1ti ashr_2(v1ti x) { return x >> 2; }
+v1ti ashr_7(v1ti x) { return x >> 7; }
+v1ti ashr_8(v1ti x) { return x >> 8; }
+v1ti ashr_9(v1ti x) { return x >> 9; }
+v1ti ashr_15(v1ti x) { return x >> 15; }
+v1ti ashr_16(v1ti x) { return x >> 16; }
+v1ti ashr_17(v1ti x) { return x >> 17; }
+v1ti ashr_23(v1ti x) { return x >> 23; }
+v1ti ashr_24(v1ti x) { return x >> 24; }
+v1ti ashr_25(v1ti x) { return x >> 25; }
+v1ti ashr_31(v1ti x) { return x >> 31; }
+v1ti ashr_32(v1ti x) { return x >> 32; }
+v1ti ashr_33(v1ti x) { return x >> 33; }
+v1ti ashr_47(v1ti x) { return x >> 47; }
+v1ti ashr_48(v1ti x) { return x >> 48; }
+v1ti ashr_49(v1ti x) { return x >> 49; }
+v1ti ashr_63(v1ti x) { return x >> 63; }
+v1ti ashr_64(v1ti x) { return x >> 64; }
+v1ti ashr_65(v1ti x) { return x >> 65; }
+v1ti ashr_72(v1ti x) { return x >> 72; }
+v1ti ashr_79(v1ti x) { return x >> 79; }
+v1ti ashr_80(v1ti x) { return x >> 80; }
+v1ti ashr_81(v1ti x) { return x >> 81; }
+v1ti ashr_95(v1ti x) { return x >> 95; }
+v1ti ashr_96(v1ti x) { return x >> 96; }
+v1ti ashr_97(v1ti x) { return x >> 97; }
+v1ti ashr_111(v1ti x) { return x >> 111; }
+v1ti ashr_112(v1ti x) { return x >> 112; }
+v1ti ashr_113(v1ti x) { return x >> 113; }
+v1ti ashr_119(v1ti x) { return x >> 119; }
+v1ti ashr_120(v1ti x) { return x >> 120; }
+v1ti ashr_121(v1ti x) { return x >> 121; }
+v1ti ashr_126(v1ti x) { return x >> 126; }
+v1ti ashr_127(v1ti x) { return x >> 127; }
+
+typedef v1ti (*fun)(v1ti);
+
+struct {
+ unsigned int i;
+ fun ashr;
+} table[35] = {
+ { 1, ashr_1 },
+ { 2, ashr_2 },
+ { 7, ashr_7 },
+ { 8, ashr_8 },
+ { 9, ashr_9 },
+ { 15, ashr_15 },
+ { 16, ashr_16 },
+ { 17, ashr_17 },
+ { 23, ashr_23 },
+ { 24, ashr_24 },
+ { 25, ashr_25 },
+ { 31, ashr_31 },
+ { 32, ashr_32 },
+ { 33, ashr_33 },
+ { 47, ashr_47 },
+ { 48, ashr_48 },
+ { 49, ashr_49 },
+ { 63, ashr_63 },
+ { 64, ashr_64 },
+ { 65, ashr_65 },
+ { 72, ashr_72 },
+ { 79, ashr_79 },
+ { 80, ashr_80 },
+ { 81, ashr_81 },
+ { 95, ashr_95 },
+ { 96, ashr_96 },
+ { 97, ashr_97 },
+ { 111, ashr_111 },
+ { 112, ashr_112 },
+ { 113, ashr_113 },
+ { 119, ashr_119 },
+ { 120, ashr_120 },
+ { 121, ashr_121 },
+ { 126, ashr_126 },
+ { 127, ashr_127 }
+};
+
+void test(ti x)
+{
+ unsigned int i;
+ v1ti t = (v1ti)x;
+
+ for (i=0; i<(sizeof(table)/sizeof(table[0])); i++) {
+ if ((ti)(*table[i].ashr)(t) != ashr(x,table[i].i))
+ __builtin_abort();
+ }
+}
+
+int main()
+{
+ ti x;
+
+ x = ((ti)0x0011223344556677ull)<<64 | 0x8899aabbccddeeffull;
+ test(x);
+ x = ((ti)0xffeeddccbbaa9988ull)<<64 | 0x7766554433221100ull;
+ test(x);
+ x = ((ti)0x0123456789abcdefull)<<64 | 0x0123456789abcdefull;
+ test(x);
+ x = ((ti)0xfedcba9876543210ull)<<64 | 0xfedcba9876543210ull;
+ test(x);
+ x = ((ti)0x0123456789abcdefull)<<64 | 0xfedcba9876543210ull;
+ test(x);
+ x = ((ti)0xfedcba9876543210ull)<<64 | 0x0123456789abcdefull;
+ test(x);
+ x = 0;
+ test(x);
+ x = 0xffffffffffffffffull;
+ test(x);
+ x = ((ti)0xffffffffffffffffull)<<64;
+ test(x);
+ x = ((ti)0xffffffffffffffffull)<<64 | 0xffffffffffffffffull;
+ test(x);
+ x = ((ti)0x5a5a5a5a5a5a5a5aull)<<64 | 0x5a5a5a5a5a5a5a5aull;
+ test(x);
+ x = ((ti)0xa5a5a5a5a5a5a5a5ull)<<64 | 0xa5a5a5a5a5a5a5a5ull;
+ test(x);
+ x = 0xffull;
+ test(x);
+ x = 0xff00ull;
+ test(x);
+ x = 0xff0000ull;
+ test(x);
+ x = 0xff000000ull;
+ test(x);
+ x = 0xff00000000ull;
+ test(x);
+ x = 0xff0000000000ull;
+ test(x);
+ x = 0xff000000000000ull;
+ test(x);
+ x = 0xff00000000000000ull;
+ test(x);
+ x = ((ti)0xffull)<<64;
+ test(x);
+ x = ((ti)0xff00ull)<<64;
+ test(x);
+ x = ((ti)0xff0000ull)<<64;
+ test(x);
+ x = ((ti)0xff000000ull)<<64;
+ test(x);
+ x = ((ti)0xff00000000ull)<<64;
+ test(x);
+ x = ((ti)0xff0000000000ull)<<64;
+ test(x);
+ x = ((ti)0xff000000000000ull)<<64;
+ test(x);
+ x = ((ti)0xff00000000000000ull)<<64;
+ test(x);
+ x = 0xdeadbeefcafebabeull;
+ test(x);
+ x = ((ti)0xdeadbeefcafebabeull)<<64;
+ test(x);
+
+ return 0;
+}
+
diff --git a/gcc/testsuite/gcc.target/i386/sse2-v1ti-ashiftrt-2.c b/gcc/testsuite/gcc.target/i386/sse2-v1ti-ashiftrt-2.c
new file mode 100644
index 0000000..b3d0aa3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/sse2-v1ti-ashiftrt-2.c
@@ -0,0 +1,166 @@
+/* { dg-do compile { target int128 } } */
+/* { dg-options "-O2 -msse2 -mavx2 " } */
+
+typedef __int128 v1ti __attribute__ ((__vector_size__ (16)));
+typedef __int128 ti;
+
+ti ashr(ti x, unsigned int i) { return x >> i; }
+
+v1ti ashr_1(v1ti x) { return x >> 1; }
+v1ti ashr_2(v1ti x) { return x >> 2; }
+v1ti ashr_7(v1ti x) { return x >> 7; }
+v1ti ashr_8(v1ti x) { return x >> 8; }
+v1ti ashr_9(v1ti x) { return x >> 9; }
+v1ti ashr_15(v1ti x) { return x >> 15; }
+v1ti ashr_16(v1ti x) { return x >> 16; }
+v1ti ashr_17(v1ti x) { return x >> 17; }
+v1ti ashr_23(v1ti x) { return x >> 23; }
+v1ti ashr_24(v1ti x) { return x >> 24; }
+v1ti ashr_25(v1ti x) { return x >> 25; }
+v1ti ashr_31(v1ti x) { return x >> 31; }
+v1ti ashr_32(v1ti x) { return x >> 32; }
+v1ti ashr_33(v1ti x) { return x >> 33; }
+v1ti ashr_47(v1ti x) { return x >> 47; }
+v1ti ashr_48(v1ti x) { return x >> 48; }
+v1ti ashr_49(v1ti x) { return x >> 49; }
+v1ti ashr_63(v1ti x) { return x >> 63; }
+v1ti ashr_64(v1ti x) { return x >> 64; }
+v1ti ashr_65(v1ti x) { return x >> 65; }
+v1ti ashr_72(v1ti x) { return x >> 72; }
+v1ti ashr_79(v1ti x) { return x >> 79; }
+v1ti ashr_80(v1ti x) { return x >> 80; }
+v1ti ashr_81(v1ti x) { return x >> 81; }
+v1ti ashr_95(v1ti x) { return x >> 95; }
+v1ti ashr_96(v1ti x) { return x >> 96; }
+v1ti ashr_97(v1ti x) { return x >> 97; }
+v1ti ashr_111(v1ti x) { return x >> 111; }
+v1ti ashr_112(v1ti x) { return x >> 112; }
+v1ti ashr_113(v1ti x) { return x >> 113; }
+v1ti ashr_119(v1ti x) { return x >> 119; }
+v1ti ashr_120(v1ti x) { return x >> 120; }
+v1ti ashr_121(v1ti x) { return x >> 121; }
+v1ti ashr_126(v1ti x) { return x >> 126; }
+v1ti ashr_127(v1ti x) { return x >> 127; }
+
+typedef v1ti (*fun)(v1ti);
+
+struct {
+ unsigned int i;
+ fun ashr;
+} table[35] = {
+ { 1, ashr_1 },
+ { 2, ashr_2 },
+ { 7, ashr_7 },
+ { 8, ashr_8 },
+ { 9, ashr_9 },
+ { 15, ashr_15 },
+ { 16, ashr_16 },
+ { 17, ashr_17 },
+ { 23, ashr_23 },
+ { 24, ashr_24 },
+ { 25, ashr_25 },
+ { 31, ashr_31 },
+ { 32, ashr_32 },
+ { 33, ashr_33 },
+ { 47, ashr_47 },
+ { 48, ashr_48 },
+ { 49, ashr_49 },
+ { 63, ashr_63 },
+ { 64, ashr_64 },
+ { 65, ashr_65 },
+ { 72, ashr_72 },
+ { 79, ashr_79 },
+ { 80, ashr_80 },
+ { 81, ashr_81 },
+ { 95, ashr_95 },
+ { 96, ashr_96 },
+ { 97, ashr_97 },
+ { 111, ashr_111 },
+ { 112, ashr_112 },
+ { 113, ashr_113 },
+ { 119, ashr_119 },
+ { 120, ashr_120 },
+ { 121, ashr_121 },
+ { 126, ashr_126 },
+ { 127, ashr_127 }
+};
+
+void test(ti x)
+{
+ unsigned int i;
+ v1ti t = (v1ti)x;
+
+ for (i=0; i<(sizeof(table)/sizeof(table[0])); i++) {
+ if ((ti)(*table[i].ashr)(t) != ashr(x,table[i].i))
+ __builtin_abort();
+ }
+}
+
+int main()
+{
+ ti x;
+
+ x = ((ti)0x0011223344556677ull)<<64 | 0x8899aabbccddeeffull;
+ test(x);
+ x = ((ti)0xffeeddccbbaa9988ull)<<64 | 0x7766554433221100ull;
+ test(x);
+ x = ((ti)0x0123456789abcdefull)<<64 | 0x0123456789abcdefull;
+ test(x);
+ x = ((ti)0xfedcba9876543210ull)<<64 | 0xfedcba9876543210ull;
+ test(x);
+ x = ((ti)0x0123456789abcdefull)<<64 | 0xfedcba9876543210ull;
+ test(x);
+ x = ((ti)0xfedcba9876543210ull)<<64 | 0x0123456789abcdefull;
+ test(x);
+ x = 0;
+ test(x);
+ x = 0xffffffffffffffffull;
+ test(x);
+ x = ((ti)0xffffffffffffffffull)<<64;
+ test(x);
+ x = ((ti)0xffffffffffffffffull)<<64 | 0xffffffffffffffffull;
+ test(x);
+ x = ((ti)0x5a5a5a5a5a5a5a5aull)<<64 | 0x5a5a5a5a5a5a5a5aull;
+ test(x);
+ x = ((ti)0xa5a5a5a5a5a5a5a5ull)<<64 | 0xa5a5a5a5a5a5a5a5ull;
+ test(x);
+ x = 0xffull;
+ test(x);
+ x = 0xff00ull;
+ test(x);
+ x = 0xff0000ull;
+ test(x);
+ x = 0xff000000ull;
+ test(x);
+ x = 0xff00000000ull;
+ test(x);
+ x = 0xff0000000000ull;
+ test(x);
+ x = 0xff000000000000ull;
+ test(x);
+ x = 0xff00000000000000ull;
+ test(x);
+ x = ((ti)0xffull)<<64;
+ test(x);
+ x = ((ti)0xff00ull)<<64;
+ test(x);
+ x = ((ti)0xff0000ull)<<64;
+ test(x);
+ x = ((ti)0xff000000ull)<<64;
+ test(x);
+ x = ((ti)0xff00000000ull)<<64;
+ test(x);
+ x = ((ti)0xff0000000000ull)<<64;
+ test(x);
+ x = ((ti)0xff000000000000ull)<<64;
+ test(x);
+ x = ((ti)0xff00000000000000ull)<<64;
+ test(x);
+ x = 0xdeadbeefcafebabeull;
+ test(x);
+ x = ((ti)0xdeadbeefcafebabeull)<<64;
+ test(x);
+
+ return 0;
+}
+
diff --git a/gcc/testsuite/gcc.target/i386/sse2-v1ti-ashiftrt-3.c b/gcc/testsuite/gcc.target/i386/sse2-v1ti-ashiftrt-3.c
new file mode 100644
index 0000000..61d4f4c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/sse2-v1ti-ashiftrt-3.c
@@ -0,0 +1,166 @@
+/* { dg-do compile { target int128 } } */
+/* { dg-options "-O2 -msse2 -msse4.1" } */
+
+typedef __int128 v1ti __attribute__ ((__vector_size__ (16)));
+typedef __int128 ti;
+
+ti ashr(ti x, unsigned int i) { return x >> i; }
+
+v1ti ashr_1(v1ti x) { return x >> 1; }
+v1ti ashr_2(v1ti x) { return x >> 2; }
+v1ti ashr_7(v1ti x) { return x >> 7; }
+v1ti ashr_8(v1ti x) { return x >> 8; }
+v1ti ashr_9(v1ti x) { return x >> 9; }
+v1ti ashr_15(v1ti x) { return x >> 15; }
+v1ti ashr_16(v1ti x) { return x >> 16; }
+v1ti ashr_17(v1ti x) { return x >> 17; }
+v1ti ashr_23(v1ti x) { return x >> 23; }
+v1ti ashr_24(v1ti x) { return x >> 24; }
+v1ti ashr_25(v1ti x) { return x >> 25; }
+v1ti ashr_31(v1ti x) { return x >> 31; }
+v1ti ashr_32(v1ti x) { return x >> 32; }
+v1ti ashr_33(v1ti x) { return x >> 33; }
+v1ti ashr_47(v1ti x) { return x >> 47; }
+v1ti ashr_48(v1ti x) { return x >> 48; }
+v1ti ashr_49(v1ti x) { return x >> 49; }
+v1ti ashr_63(v1ti x) { return x >> 63; }
+v1ti ashr_64(v1ti x) { return x >> 64; }
+v1ti ashr_65(v1ti x) { return x >> 65; }
+v1ti ashr_72(v1ti x) { return x >> 72; }
+v1ti ashr_79(v1ti x) { return x >> 79; }
+v1ti ashr_80(v1ti x) { return x >> 80; }
+v1ti ashr_81(v1ti x) { return x >> 81; }
+v1ti ashr_95(v1ti x) { return x >> 95; }
+v1ti ashr_96(v1ti x) { return x >> 96; }
+v1ti ashr_97(v1ti x) { return x >> 97; }
+v1ti ashr_111(v1ti x) { return x >> 111; }
+v1ti ashr_112(v1ti x) { return x >> 112; }
+v1ti ashr_113(v1ti x) { return x >> 113; }
+v1ti ashr_119(v1ti x) { return x >> 119; }
+v1ti ashr_120(v1ti x) { return x >> 120; }
+v1ti ashr_121(v1ti x) { return x >> 121; }
+v1ti ashr_126(v1ti x) { return x >> 126; }
+v1ti ashr_127(v1ti x) { return x >> 127; }
+
+typedef v1ti (*fun)(v1ti);
+
+struct {
+ unsigned int i;
+ fun ashr;
+} table[35] = {
+ { 1, ashr_1 },
+ { 2, ashr_2 },
+ { 7, ashr_7 },
+ { 8, ashr_8 },
+ { 9, ashr_9 },
+ { 15, ashr_15 },
+ { 16, ashr_16 },
+ { 17, ashr_17 },
+ { 23, ashr_23 },
+ { 24, ashr_24 },
+ { 25, ashr_25 },
+ { 31, ashr_31 },
+ { 32, ashr_32 },
+ { 33, ashr_33 },
+ { 47, ashr_47 },
+ { 48, ashr_48 },
+ { 49, ashr_49 },
+ { 63, ashr_63 },
+ { 64, ashr_64 },
+ { 65, ashr_65 },
+ { 72, ashr_72 },
+ { 79, ashr_79 },
+ { 80, ashr_80 },
+ { 81, ashr_81 },
+ { 95, ashr_95 },
+ { 96, ashr_96 },
+ { 97, ashr_97 },
+ { 111, ashr_111 },
+ { 112, ashr_112 },
+ { 113, ashr_113 },
+ { 119, ashr_119 },
+ { 120, ashr_120 },
+ { 121, ashr_121 },
+ { 126, ashr_126 },
+ { 127, ashr_127 }
+};
+
+void test(ti x)
+{
+ unsigned int i;
+ v1ti t = (v1ti)x;
+
+ for (i=0; i<(sizeof(table)/sizeof(table[0])); i++) {
+ if ((ti)(*table[i].ashr)(t) != ashr(x,table[i].i))
+ __builtin_abort();
+ }
+}
+
+int main()
+{
+ ti x;
+
+ x = ((ti)0x0011223344556677ull)<<64 | 0x8899aabbccddeeffull;
+ test(x);
+ x = ((ti)0xffeeddccbbaa9988ull)<<64 | 0x7766554433221100ull;
+ test(x);
+ x = ((ti)0x0123456789abcdefull)<<64 | 0x0123456789abcdefull;
+ test(x);
+ x = ((ti)0xfedcba9876543210ull)<<64 | 0xfedcba9876543210ull;
+ test(x);
+ x = ((ti)0x0123456789abcdefull)<<64 | 0xfedcba9876543210ull;
+ test(x);
+ x = ((ti)0xfedcba9876543210ull)<<64 | 0x0123456789abcdefull;
+ test(x);
+ x = 0;
+ test(x);
+ x = 0xffffffffffffffffull;
+ test(x);
+ x = ((ti)0xffffffffffffffffull)<<64;
+ test(x);
+ x = ((ti)0xffffffffffffffffull)<<64 | 0xffffffffffffffffull;
+ test(x);
+ x = ((ti)0x5a5a5a5a5a5a5a5aull)<<64 | 0x5a5a5a5a5a5a5a5aull;
+ test(x);
+ x = ((ti)0xa5a5a5a5a5a5a5a5ull)<<64 | 0xa5a5a5a5a5a5a5a5ull;
+ test(x);
+ x = 0xffull;
+ test(x);
+ x = 0xff00ull;
+ test(x);
+ x = 0xff0000ull;
+ test(x);
+ x = 0xff000000ull;
+ test(x);
+ x = 0xff00000000ull;
+ test(x);
+ x = 0xff0000000000ull;
+ test(x);
+ x = 0xff000000000000ull;
+ test(x);
+ x = 0xff00000000000000ull;
+ test(x);
+ x = ((ti)0xffull)<<64;
+ test(x);
+ x = ((ti)0xff00ull)<<64;
+ test(x);
+ x = ((ti)0xff0000ull)<<64;
+ test(x);
+ x = ((ti)0xff000000ull)<<64;
+ test(x);
+ x = ((ti)0xff00000000ull)<<64;
+ test(x);
+ x = ((ti)0xff0000000000ull)<<64;
+ test(x);
+ x = ((ti)0xff000000000000ull)<<64;
+ test(x);
+ x = ((ti)0xff00000000000000ull)<<64;
+ test(x);
+ x = 0xdeadbeefcafebabeull;
+ test(x);
+ x = ((ti)0xdeadbeefcafebabeull)<<64;
+ test(x);
+
+ return 0;
+}
+
diff --git a/gcc/testsuite/gcc.target/i386/sse2-v1ti-shift-2.c b/gcc/testsuite/gcc.target/i386/sse2-v1ti-shift-2.c
new file mode 100644
index 0000000..18da2ef
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/sse2-v1ti-shift-2.c
@@ -0,0 +1,13 @@
+/* PR target/102986 */
+/* { dg-do compile { target int128 } } */
+/* { dg-options "-O2 -msse2" } */
+
+typedef unsigned __int128 uv1ti __attribute__ ((__vector_size__ (16)));
+typedef __int128 sv1ti __attribute__ ((__vector_size__ (16)));
+
+uv1ti ashl(uv1ti x, unsigned int i) { return x << i; }
+uv1ti lshr(uv1ti x, unsigned int i) { return x >> i; }
+sv1ti ashr(sv1ti x, unsigned int i) { return x >> i; }
+uv1ti rotr(uv1ti x, unsigned int i) { return (x >> i) | (x << (128-i)); }
+uv1ti rotl(uv1ti x, unsigned int i) { return (x << i) | (x >> (128-i)); }
+
diff --git a/gcc/testsuite/gcc.target/i386/sse2-v1ti-shift-3.c b/gcc/testsuite/gcc.target/i386/sse2-v1ti-shift-3.c
new file mode 100644
index 0000000..8d5c122
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/sse2-v1ti-shift-3.c
@@ -0,0 +1,113 @@
+/* PR target/102986 */
+/* { dg-do run { target int128 } } */
+/* { dg-options "-O2 -msse2" } */
+/* { dg-require-effective-target sse2 } */
+
+typedef unsigned __int128 uv1ti __attribute__ ((__vector_size__ (16)));
+typedef __int128 sv1ti __attribute__ ((__vector_size__ (16)));
+typedef __int128 v1ti __attribute__ ((__vector_size__ (16)));
+
+typedef unsigned __int128 uti;
+typedef __int128 sti;
+typedef __int128 ti;
+
+uv1ti ashl_v1ti(uv1ti x, unsigned int i) { return x << i; }
+uv1ti lshr_v1ti(uv1ti x, unsigned int i) { return x >> i; }
+sv1ti ashr_v1ti(sv1ti x, unsigned int i) { return x >> i; }
+uv1ti rotr_v1ti(uv1ti x, unsigned int i) { return (x >> i) | (x << (128-i)); }
+uv1ti rotl_v1ti(uv1ti x, unsigned int i) { return (x << i) | (x >> (128-i)); }
+
+uti ashl_ti(uti x, unsigned int i) { return x << i; }
+uti lshr_ti(uti x, unsigned int i) { return x >> i; }
+sti ashr_ti(sti x, unsigned int i) { return x >> i; }
+uti rotr_ti(uti x, unsigned int i) { return (x >> i) | (x << (128-i)); }
+uti rotl_ti(uti x, unsigned int i) { return (x << i) | (x >> (128-i)); }
+
+void test(ti x)
+{
+ unsigned int i;
+ uv1ti ut = (uv1ti)x;
+ sv1ti st = (sv1ti)x;
+
+ for (i=0; i<128; i++) {
+ if ((ti)ashl_v1ti(ut,i) != (ti)ashl_ti(x,i))
+ __builtin_abort();
+ if ((ti)lshr_v1ti(ut,i) != (ti)lshr_ti(x,i))
+ __builtin_abort();
+ if ((ti)ashr_v1ti(st,i) != (ti)ashr_ti(x,i))
+ __builtin_abort();
+ if ((ti)rotr_v1ti(ut,i) != (ti)rotr_ti(x,i))
+ __builtin_abort();
+ if ((ti)rotl_v1ti(ut,i) != (ti)rotl_ti(x,i))
+ __builtin_abort();
+ }
+}
+
+int main()
+{
+ ti x;
+
+ x = ((ti)0x0011223344556677ull)<<64 | 0x8899aabbccddeeffull;
+ test(x);
+ x = ((ti)0xffeeddccbbaa9988ull)<<64 | 0x7766554433221100ull;
+ test(x);
+ x = ((ti)0x0123456789abcdefull)<<64 | 0x0123456789abcdefull;
+ test(x);
+ x = ((ti)0xfedcba9876543210ull)<<64 | 0xfedcba9876543210ull;
+ test(x);
+ x = ((ti)0x0123456789abcdefull)<<64 | 0xfedcba9876543210ull;
+ test(x);
+ x = ((ti)0xfedcba9876543210ull)<<64 | 0x0123456789abcdefull;
+ test(x);
+ x = 0;
+ test(x);
+ x = 0xffffffffffffffffull;
+ test(x);
+ x = ((ti)0xffffffffffffffffull)<<64;
+ test(x);
+ x = ((ti)0xffffffffffffffffull)<<64 | 0xffffffffffffffffull;
+ test(x);
+ x = ((ti)0x5a5a5a5a5a5a5a5aull)<<64 | 0x5a5a5a5a5a5a5a5aull;
+ test(x);
+ x = ((ti)0xa5a5a5a5a5a5a5a5ull)<<64 | 0xa5a5a5a5a5a5a5a5ull;
+ test(x);
+ x = 0xffull;
+ test(x);
+ x = 0xff00ull;
+ test(x);
+ x = 0xff0000ull;
+ test(x);
+ x = 0xff000000ull;
+ test(x);
+ x = 0xff00000000ull;
+ test(x);
+ x = 0xff0000000000ull;
+ test(x);
+ x = 0xff000000000000ull;
+ test(x);
+ x = 0xff00000000000000ull;
+ test(x);
+ x = ((ti)0xffull)<<64;
+ test(x);
+ x = ((ti)0xff00ull)<<64;
+ test(x);
+ x = ((ti)0xff0000ull)<<64;
+ test(x);
+ x = ((ti)0xff000000ull)<<64;
+ test(x);
+ x = ((ti)0xff00000000ull)<<64;
+ test(x);
+ x = ((ti)0xff0000000000ull)<<64;
+ test(x);
+ x = ((ti)0xff000000000000ull)<<64;
+ test(x);
+ x = ((ti)0xff00000000000000ull)<<64;
+ test(x);
+ x = 0xdeadbeefcafebabeull;
+ test(x);
+ x = ((ti)0xdeadbeefcafebabeull)<<64;
+ test(x);
+
+ return 0;
+}
+
* Re: [PATCH Take #2] x86_64: Expand ashrv1ti (and PR target/102986)
2021-10-31 10:02 [PATCH Take #2] x86_64: Expand ashrv1ti (and PR target/102986) Roger Sayle
@ 2021-11-01 7:27 ` Uros Bizjak
2021-11-01 8:43 ` Jakub Jelinek
0 siblings, 1 reply; 4+ messages in thread
From: Uros Bizjak @ 2021-11-01 7:27 UTC (permalink / raw)
To: Roger Sayle; +Cc: GCC Patches, Jakub Jelinek
On Sun, Oct 31, 2021 at 11:02 AM Roger Sayle <roger@nextmovesoftware.com> wrote:
>
>
> Very many thanks to Jakub for proof-reading my patch, catching my silly
> GNU-style mistakes, and making excellent suggestions. This revised patch
> incorporates all of his feedback, and has been tested on
> x86_64-pc-linux-gnu with make bootstrap and make -k check with no new
> failures.
>
> 2021-10-31 Roger Sayle <roger@nextmovesoftware.com>
> Jakub Jelinek <jakub@redhat.com>
>
> gcc/ChangeLog
> PR target/102986
> * config/i386/i386-expand.c (ix86_expand_v1ti_to_ti,
> ix86_expand_ti_to_v1ti): New helper functions.
> (ix86_expand_v1ti_shift): Check if the amount operand is an
> integer constant, and expand as a TImode shift if it isn't.
> (ix86_expand_v1ti_rotate): Check if the amount operand is an
> integer constant, and expand as a TImode rotate if it isn't.
> (ix86_expand_v1ti_ashiftrt): New function to expand arithmetic
> right shifts of V1TImode quantities.
> * config/i386/i386-protos.h (ix86_expand_v1ti_ashift): Prototype.
> * config/i386/sse.md (ashlv1ti3, lshrv1ti3): Change constraints
> to QImode general_operand, and let the helper functions lower
> shifts by non-constant operands, as TImode shifts. Make
> conditional on TARGET_64BIT.
> (ashrv1ti3): New expander calling ix86_expand_v1ti_ashiftrt.
> (rotlv1ti3, rotrv1ti3): Change shift operand to QImode.
> Make conditional on TARGET_64BIT.
>
> gcc/testsuite/ChangeLog
> PR target/102986
> * gcc.target/i386/sse2-v1ti-ashiftrt-1.c: New test case.
> * gcc.target/i386/sse2-v1ti-ashiftrt-2.c: New test case.
> * gcc.target/i386/sse2-v1ti-ashiftrt-3.c: New test case.
> * gcc.target/i386/sse2-v1ti-shift-2.c: New test case.
> * gcc.target/i386/sse2-v1ti-shift-3.c: New test case.
>
> Thanks.
> Roger
> --
>
> -----Original Message-----
> From: Jakub Jelinek <jakub@redhat.com>
> Sent: 30 October 2021 11:30
> To: Roger Sayle <roger@nextmovesoftware.com>
> Cc: 'GCC Patches' <gcc-patches@gcc.gnu.org>; 'Uros Bizjak'
> <ubizjak@gmail.com>
> Subject: Re: [PATCH] x86_64: Expand ashrv1ti (and PR target/102986)
>
> On Sat, Oct 30, 2021 at 11:16:41AM +0100, Roger Sayle wrote:
> > 2021-10-30 Roger Sayle <roger@nextmovesoftware.com>
> >
> > gcc/ChangeLog
> > PR target/102986
> > * config/i386/i386-expand.c (ix86_expand_v1ti_to_ti,
> > ix86_expand_ti_to_v1ti): New helper functions.
> > (ix86_expand_v1ti_shift): Check if the amount operand is an
> > integer constant, and expand as a TImode shift if it isn't.
> > (ix86_expand_v1ti_rotate): Check if the amount operand is an
> > integer constant, and expand as a TImode rotate if it isn't.
> > (ix86_expand_v1ti_ashiftrt): New function to expand arithmetic
> > right shifts of V1TImode quantities.
> > * config/i386/i386-protos.h (ix86_expand_v1ti_ashift): Prototype.
> > * config/i386/sse.md (ashlv1ti3, lshrv1ti3): Change constraints
> > to QImode general_operand, and let the helper functions lower
> > shifts by non-constant operands, as TImode shifts.
> > (ashrv1ti3): New expander calling ix86_expand_v1ti_ashiftrt.
> > (rotlv1ti3, rotrv1ti3): Change shift operand to QImode.
> >
> > gcc/testsuite/ChangeLog
> > PR target/102986
> > * gcc.target/i386/sse2-v1ti-ashiftrt-1.c: New test case.
> > * gcc.target/i386/sse2-v1ti-ashiftrt-2.c: New test case.
> > * gcc.target/i386/sse2-v1ti-ashiftrt-3.c: New test case.
> > * gcc.target/i386/sse2-v1ti-shift-2.c: New test case.
> > * gcc.target/i386/sse2-v1ti-shift-3.c: New test case.
> >
> > Sorry again for the breakage in my last patch. I wasn't testing things
> > that shouldn't have been affected/changed.
>
> Not a review, will defer that to Uros, but just nits:
>
> > +/* Expand move of V1TI mode register X to a new TI mode register. */
> > +static rtx ix86_expand_v1ti_to_ti (rtx x)
>
> ix86_expand_v1ti_to_ti should be at the start of next line, so static rtx
> ix86_expand_v1ti_to_ti (rtx x)
>
> Ditto for other functions and also in functions you've added by the previous
> patch.
> > + emit_insn (code == ASHIFT ? gen_ashlti3(tmp2, tmp1, operands[2])
> > + : gen_lshrti3(tmp2, tmp1, operands[2]));
>
> Space before ( twice.
>
> > + emit_insn (code == ROTATE ? gen_rotlti3(tmp2, tmp1, operands[2])
> > + : gen_rotrti3(tmp2, tmp1, operands[2]));
>
> Likewise.
>
> > + emit_insn (gen_ashrti3(tmp2, tmp1, operands[2]));
>
> Similarly.
>
> Also, I wonder for all these patterns (previously and now added), shouldn't
> they have && TARGET_64BIT in conditions? I mean, we don't really support
> scalar TImode for ia32, but VALID_SSE_REG_MODE includes V1TImode and while
> the constant shifts can be done, I think the variable shifts can't, there
> are no TImode shift patterns...
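To make the point above concrete, a variable 128-bit arithmetic right shift on a 64-bit target has to be synthesized from 64-bit operations roughly as follows. This is only an illustrative C sketch over two 64-bit halves; it is not GCC's expander code, and the `ashr128` helper and `i128` struct are invented for the example (it also assumes `>>` on signed types is arithmetic, as GCC documents for its targets):

```c
#include <stdint.h>
#include <assert.h>   /* for the usage checks below */

/* A 128-bit value held as two 64-bit halves, low half unsigned,
   high half signed so the sign bit lives in 'hi'.  */
typedef struct { uint64_t lo; int64_t hi; } i128;

/* Arithmetic right shift by n (0 <= n < 128), decomposed into the
   n < 64 and n >= 64 cases a scalar TImode shift also distinguishes.  */
static i128 ashr128 (i128 x, unsigned n)
{
  i128 r;
  if (n == 0)
    r = x;                                   /* avoid shifts by 64 */
  else if (n < 64)
    {
      /* Low half receives bits shifted down from the high half.  */
      r.lo = (x.lo >> n) | ((uint64_t) x.hi << (64 - n));
      r.hi = x.hi >> n;                      /* arithmetic on signed half */
    }
  else
    {
      /* Entire result comes from the high half; sign fills the top.  */
      r.lo = (uint64_t) (x.hi >> (n - 64));
      r.hi = x.hi >> 63;                     /* all-zeros or all-ones */
    }
  return r;
}
```

The two branches mirror why a single SSE instruction sequence cannot cover every shift amount, and why the non-constant case falls back to scalar TImode code.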
- (match_operand:SI 2 "const_int_operand")))]
- "TARGET_SSE2"
+ (match_operand:QI 2 "general_operand")))]
+ "TARGET_SSE2 && TARGET_64BIT"
I wonder if this change is too restrictive, as it disables V1TI shifts
by constant on 32bit targets. Perhaps we can introduce a conditional
predicate, like:
(define_predicate "shiftv1ti_input_operand"
(if_then_else (match_test "TARGET_64BIT")
(match_operand 0 "general_operand")
(match_operand 0 "const_int_operand")))
However, I'm not familiar with how the middle-end behaves with the
above approach - will it try to put the constant in a register under
some circumstances and consequently fail the expansion?
And one mandatory :) nit:
+ rtx tmp1 = ix86_expand_v1ti_to_ti (op1);
+ rtx tmp2 = gen_reg_rtx (TImode);
+ emit_insn (code == ASHIFT ? gen_ashlti3 (tmp2, tmp1, operands[2])
+ : gen_lshrti3 (tmp2, tmp1, operands[2]));
+ rtx tmp3 = ix86_expand_ti_to_v1ti (tmp2);
+ emit_move_insn (operands[0], tmp3);
+ return;
I'd write this as:
rtx tmp1 = ix86_expand_v1ti_to_ti (op1);
rtx tmp2 = gen_reg_rtx (TImode);
rtx (*shift) (rtx, rtx, rtx)
= (code == ASHIFT) ? gen_ashlti3 : gen_lshrti3;
emit_insn (shift (tmp2, tmp1, operands[2]));
rtx tmp3 = ix86_expand_ti_to_v1ti (tmp2);
emit_move_insn (operands[0], tmp3);
return;
Otherwise LGTM (and kudos for writing out all those sequences).
Thanks,
Uros.
* Re: [PATCH Take #2] x86_64: Expand ashrv1ti (and PR target/102986)
2021-11-01 7:27 ` Uros Bizjak
@ 2021-11-01 8:43 ` Jakub Jelinek
2021-11-01 9:03 ` Uros Bizjak
0 siblings, 1 reply; 4+ messages in thread
From: Jakub Jelinek @ 2021-11-01 8:43 UTC (permalink / raw)
To: Uros Bizjak; +Cc: Roger Sayle, GCC Patches
On Mon, Nov 01, 2021 at 08:27:12AM +0100, Uros Bizjak wrote:
> > Also, I wonder for all these patterns (previously and now added), shouldn't
> > they have && TARGET_64BIT in conditions? I mean, we don't really support
> > scalar TImode for ia32, but VALID_SSE_REG_MODE includes V1TImode and while
> > the constant shifts can be done, I think the variable shifts can't, there
> > are no TImode shift patterns...
>
> - (match_operand:SI 2 "const_int_operand")))]
> - "TARGET_SSE2"
> + (match_operand:QI 2 "general_operand")))]
> + "TARGET_SSE2 && TARGET_64BIT"
>
> I wonder if this change is too restrictive, as it disables V1TI shifts
> by constant on 32bit targets. Perhaps we can introduce a conditional
> predicate, like:
>
> (define_predicate "shiftv1ti_input_operand"
> (if_then_else (match_test "TARGET_64BIT")
> (match_operand 0 "general_operand")
> (match_operand 0 "const_int_operand")))
>
> However, I'm not familiar with how the middle-end behaves with the
> above approach - will it try to put the constant in a register under
> some circumstances and consequently fail the expansion?
That would again run into the assertions that shift expanders must never
fail.
The question is if a V1TImode shift can ever appear in 32-bit x86, because
typedef __int128 V __attribute__((vector_size (16)));
is rejected with
error: ‘__int128’ is not supported on this target
when -m32 is in use, no matter what ISA flags are used.
Jakub
* Re: [PATCH Take #2] x86_64: Expand ashrv1ti (and PR target/102986)
2021-11-01 8:43 ` Jakub Jelinek
@ 2021-11-01 9:03 ` Uros Bizjak
0 siblings, 0 replies; 4+ messages in thread
From: Uros Bizjak @ 2021-11-01 9:03 UTC (permalink / raw)
To: Jakub Jelinek; +Cc: Roger Sayle, GCC Patches
On Mon, Nov 1, 2021 at 9:43 AM Jakub Jelinek <jakub@redhat.com> wrote:
>
> On Mon, Nov 01, 2021 at 08:27:12AM +0100, Uros Bizjak wrote:
> > > Also, I wonder for all these patterns (previously and now added), shouldn't
> > > they have && TARGET_64BIT in conditions? I mean, we don't really support
> > > scalar TImode for ia32, but VALID_SSE_REG_MODE includes V1TImode and while
> > > the constant shifts can be done, I think the variable shifts can't, there
> > > are no TImode shift patterns...
> >
> > - (match_operand:SI 2 "const_int_operand")))]
> > - "TARGET_SSE2"
> > + (match_operand:QI 2 "general_operand")))]
> > + "TARGET_SSE2 && TARGET_64BIT"
> >
> > I wonder if this change is too restrictive, as it disables V1TI shifts
> > by constant on 32bit targets. Perhaps we can introduce a conditional
> > predicate, like:
> >
> > (define_predicate "shiftv1ti_input_operand"
> > (if_then_else (match_test "TARGET_64BIT")
> > (match_operand 0 "general_operand")
> > (match_operand 0 "const_int_operand")))
> >
> > However, I'm not familiar with how the middle-end behaves with the
> > above approach - will it try to put the constant in a register under
> > some circumstances and consequently fail the expansion?
>
> That would run again into the assertions that shift expanders must never
> fail.
> The question is if a V1TImode shift can ever appear in 32-bit x86, because
> typedef __int128 V __attribute__((vector_size (16)));
> is rejected with
> error: ‘__int128’ is not supported on this target
> when -m32 is in use, no matter what ISA flags are used.
We can do:
typedef int __v1ti __attribute__((mode (V1TI)));
__v1ti foo (__v1ti a)
{
return a << 11;
}
gcc -O2 -msse2 -m32:
v1ti.c:1:1: warning: specifying vector types with ‘__attribute__
((mode))’ is deprecated [-Wattributes]
1 | typedef int __v1ti __attribute__((mode (V1TI)));
| ^~~~~~~
v1ti.c:1:1: note: use ‘__attribute__ ((vector_size))’ instead
during RTL pass: expand
v1ti.c: In function ‘foo’:
v1ti.c:5:12: internal compiler error: in expand_shift_1, at expmed.c:2668
5 | return a << 11;
| ~~^~~~~
which looks like an oversight of some kind, since TI (and V2TI) mode
errors out with:
v1ti.c:1:1: error: unable to emulate ‘TI’
and
v1ti.c:1:1: error: unable to emulate ‘V2TI’
I will submit a PR with the above issue.
But I agree, V1TI is x86_64 specific, so the added insn constraint is OK.
Thanks,
Uros.
> Jakub
>