* Re: [PATCH 12/43] i386: Emulate MMX vec_dupv2si with SSE
@ 2019-02-11 13:11 graham stott via gcc-patches
2019-02-11 13:14 ` H.J. Lu
0 siblings, 1 reply; 13+ messages in thread
From: graham stott via gcc-patches @ 2019-02-11 13:11 UTC (permalink / raw)
To: Uros Bizjak, H.J. Lu; +Cc: GCC Patches
All these patches from HJL have no testcases. Are they even suitable for gcc 9 at this stage?
-------- Original message --------
From: Uros Bizjak <ubizjak@gmail.com>
Date: 11/02/2019 12:51 (GMT+00:00)
To: "H.J. Lu" <hjl.tools@gmail.com>
Cc: GCC Patches <gcc-patches@gcc.gnu.org>
Subject: Re: [PATCH 12/43] i386: Emulate MMX vec_dupv2si with SSE
On Mon, Feb 11, 2019 at 1:26 PM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> On Sun, Feb 10, 2019 at 11:25 PM Uros Bizjak <ubizjak@gmail.com> wrote:
> >
> > On Mon, Feb 11, 2019 at 2:04 AM H.J. Lu <hjl.tools@gmail.com> wrote:
> > >
> > > On Sun, Feb 10, 2019 at 1:49 PM Uros Bizjak <ubizjak@gmail.com> wrote:
> > > >
> > > > On Sun, Feb 10, 2019 at 10:45 PM Uros Bizjak <ubizjak@gmail.com> wrote:
> > > >
> > > > > > > > + [(const_int 0)]
> > > > > > > > +{
> > > > > > > > + /* Emulate MMX vec_dupv2si with SSE vec_dupv4si. */
> > > > > > > > + rtx op0 = gen_rtx_REG (V4SImode, REGNO (operands[0]));
> > > > > > > > + rtx insn = gen_vec_dupv4si (op0, operands[1]);
> > > > > > > > + emit_insn (insn);
> > > > > > > > + DONE;
> > > > > > >
> > > > > > > Please write this simple RTX explicitly in the place of (const_int 0) above.
> > > > > >
> > > > > > rtx insn = gen_vec_dupv4si (op0, operands[1]);
> > > > > >
> > > > > > is easy. How do I write
> > > > > >
> > > > > > rtx op0 = gen_rtx_REG (V4SImode, REGNO (operands[0]));
> > > > > >
> > > > > > in place of (const_int 0)?
> > > > >
> > > > > [(set (match_dup 2)
> > > > > (vec_duplicate:V4SI (match_dup 1)))]
> > > > >
> > > > > with
> > > > >
> > > > > "operands[2] = gen_rtx_REG (V4SImode, REGNO (operands[0]));"
> > > > >
> > > > > or even better:
> > > > >
> > > > > "operands[2] = gen_lowpart (V4SImode, operands[0]);"
> > > > >
> > > > > in the preparation statement.
> > > >
> > > > Even shorter is
> > > >
> > > > "operands[0] = gen_lowpart (V4SImode, operands[0]);"
> > > >
> > > > and use (match_dup 0) instead of (match_dup 2) in the RTX.
> > > >
> > > > There are plenty of examples throughout sse.md.
> > > >
> > >
> > > This works:
> > >
> > > (define_insn_and_split "*vec_dupv2si"
> > > [(set (match_operand:V2SI 0 "register_operand" "=y,x,Yv")
> > > (vec_duplicate:V2SI
> > > (match_operand:SI 1 "register_operand" "0,0,Yv")))]
> > > "TARGET_MMX || TARGET_MMX_WITH_SSE"
> > > "@
> > > punpckldq\t%0, %0
> > > #
> > > #"
> > > "TARGET_MMX_WITH_SSE && reload_completed"
> > > [(set (match_dup 0)
> > > (vec_duplicate:V4SI (match_dup 1)))]
> > > "operands[0] = gen_rtx_REG (V4SImode, REGNO (operands[0]));"
> > > [(set_attr "mmx_isa" "native,x64_noavx,x64_avx")
> > > (set_attr "type" "mmxcvt,ssemov,ssemov")
> > > (set_attr "mode" "DI,TI,TI")])
> >
> > If it works, then gen_lowpart is preferred due to extra checks.
> > However, it would result in a paradoxical subreg, so I wonder if these
> > extra checks allow this transformation.
>
> gen_lowpart doesn't work:
Ah, we need lowpart_subreg after reload.
Uros.
* Re: [PATCH 12/43] i386: Emulate MMX vec_dupv2si with SSE
2019-02-11 13:11 [PATCH 12/43] i386: Emulate MMX vec_dupv2si with SSE graham stott via gcc-patches
@ 2019-02-11 13:14 ` H.J. Lu
0 siblings, 0 replies; 13+ messages in thread
From: H.J. Lu @ 2019-02-11 13:14 UTC (permalink / raw)
To: graham stott; +Cc: Uros Bizjak, GCC Patches
On Mon, Feb 11, 2019 at 5:11 AM graham stott
<graham.stott@btinternet.com> wrote:
>
> All these patches from HJL have no testcases. Are they even suitable for gcc 9 at this stage?
All my changes are covered by
https://gcc.gnu.org/ml/gcc-patches/2019-02/msg00632.html
> -------- Original message --------
> From: Uros Bizjak <ubizjak@gmail.com>
> Date: 11/02/2019 12:51 (GMT+00:00)
> To: "H.J. Lu" <hjl.tools@gmail.com>
> Cc: GCC Patches <gcc-patches@gcc.gnu.org>
> Subject: Re: [PATCH 12/43] i386: Emulate MMX vec_dupv2si with SSE
>
> On Mon, Feb 11, 2019 at 1:26 PM H.J. Lu <hjl.tools@gmail.com> wrote:
> >
> > On Sun, Feb 10, 2019 at 11:25 PM Uros Bizjak <ubizjak@gmail.com> wrote:
> > >
> > > On Mon, Feb 11, 2019 at 2:04 AM H.J. Lu <hjl.tools@gmail.com> wrote:
> > > >
> > > > On Sun, Feb 10, 2019 at 1:49 PM Uros Bizjak <ubizjak@gmail.com> wrote:
> > > > >
> > > > > On Sun, Feb 10, 2019 at 10:45 PM Uros Bizjak <ubizjak@gmail.com> wrote:
> > > > >
> > > > > > > > > + [(const_int 0)]
> > > > > > > > > +{
> > > > > > > > > + /* Emulate MMX vec_dupv2si with SSE vec_dupv4si. */
> > > > > > > > > + rtx op0 = gen_rtx_REG (V4SImode, REGNO (operands[0]));
> > > > > > > > > + rtx insn = gen_vec_dupv4si (op0, operands[1]);
> > > > > > > > > + emit_insn (insn);
> > > > > > > > > + DONE;
> > > > > > > >
> > > > > > > > Please write this simple RTX explicitly in the place of (const_int 0) above.
> > > > > > >
> > > > > > > rtx insn = gen_vec_dupv4si (op0, operands[1]);
> > > > > > >
> > > > > > > is easy. How do I write
> > > > > > >
> > > > > > > rtx op0 = gen_rtx_REG (V4SImode, REGNO (operands[0]));
> > > > > > >
> > > > > > > in place of (const_int 0)?
> > > > > >
> > > > > > [(set (match_dup 2)
> > > > > > (vec_duplicate:V4SI (match_dup 1)))]
> > > > > >
> > > > > > with
> > > > > >
> > > > > > "operands[2] = gen_rtx_REG (V4SImode, REGNO (operands[0]));"
> > > > > >
> > > > > > or even better:
> > > > > >
> > > > > > "operands[2] = gen_lowpart (V4SImode, operands[0]);"
> > > > > >
> > > > > > in the preparation statement.
> > > > >
> > > > > Even shorter is
> > > > >
> > > > > "operands[0] = gen_lowpart (V4SImode, operands[0]);"
> > > > >
> > > > > and use (match_dup 0) instead of (match_dup 2) in the RTX.
> > > > >
> > > > > There are plenty of examples throughout sse.md.
> > > > >
> > > >
> > > > This works:
> > > >
> > > > (define_insn_and_split "*vec_dupv2si"
> > > > [(set (match_operand:V2SI 0 "register_operand" "=y,x,Yv")
> > > > (vec_duplicate:V2SI
> > > > (match_operand:SI 1 "register_operand" "0,0,Yv")))]
> > > > "TARGET_MMX || TARGET_MMX_WITH_SSE"
> > > > "@
> > > > punpckldq\t%0, %0
> > > > #
> > > > #"
> > > > "TARGET_MMX_WITH_SSE && reload_completed"
> > > > [(set (match_dup 0)
> > > > (vec_duplicate:V4SI (match_dup 1)))]
> > > > "operands[0] = gen_rtx_REG (V4SImode, REGNO (operands[0]));"
> > > > [(set_attr "mmx_isa" "native,x64_noavx,x64_avx")
> > > > (set_attr "type" "mmxcvt,ssemov,ssemov")
> > > > (set_attr "mode" "DI,TI,TI")])
> > >
> > > If it works, then gen_lowpart is preferred due to extra checks.
> > > However, it would result in a paradoxical subreg, so I wonder if these
> > > extra checks allow this transformation.
> >
> > gen_lowpart doesn't work:
>
> Ah, we need lowpart_subreg after reload.
>
> Uros.
--
H.J.
* [PATCH 00/43] V3: Emulate MMX intrinsics with SSE
@ 2019-02-10 0:19 H.J. Lu
2019-02-10 0:20 ` [PATCH 12/43] i386: Emulate MMX vec_dupv2si " H.J. Lu
0 siblings, 1 reply; 13+ messages in thread
From: H.J. Lu @ 2019-02-10 0:19 UTC (permalink / raw)
To: gcc-patches; +Cc: Uros Bizjak
On x86-64, since __m64 is returned and passed in XMM registers, we can
emulate MMX intrinsics with SSE instructions. To support it, we added
#define TARGET_MMX_WITH_SSE \
(TARGET_64BIT && TARGET_SSE2 && !TARGET_3DNOW)
SSE emulation is disabled for 3DNOW since 3DNOW patterns haven't been
updated with SSE emulation.
;; Define instruction set of MMX instructions
(define_attr "mmx_isa" "base,native,x64,x64_noavx,x64_avx" (const_string "base"))
(eq_attr "mmx_isa" "native")
(symbol_ref "!TARGET_MMX_WITH_SSE")
(eq_attr "mmx_isa" "x64")
(symbol_ref "TARGET_MMX_WITH_SSE")
(eq_attr "mmx_isa" "x64_avx")
(symbol_ref "TARGET_MMX_WITH_SSE && TARGET_AVX")
(eq_attr "mmx_isa" "x64_noavx")
(symbol_ref "TARGET_MMX_WITH_SSE && !TARGET_AVX")
We added SSE emulation to MMX patterns and disabled MMX alternatives with
TARGET_MMX_WITH_SSE.
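As a rough illustration of how the "mmx_isa" values above select alternatives, here is a small C model. It is only a sketch and not GCC code: the enum, the function names and the boolean parameters are made up for this example, and treating "base" as always enabled is an assumption (the excerpt above lists no condition for it).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the "mmx_isa" attribute values.  */
enum mmx_isa
{
  MMX_ISA_BASE,
  MMX_ISA_NATIVE,
  MMX_ISA_X64,
  MMX_ISA_X64_NOAVX,
  MMX_ISA_X64_AVX
};

/* Models TARGET_MMX_WITH_SSE:
   TARGET_64BIT && TARGET_SSE2 && !TARGET_3DNOW.  */
static bool
mmx_with_sse (bool target_64bit, bool target_sse2, bool target_3dnow)
{
  return target_64bit && target_sse2 && !target_3dnow;
}

/* Returns true if an alternative tagged with ISA would be enabled.  */
static bool
mmx_isa_enabled (enum mmx_isa isa, bool with_sse, bool target_avx)
{
  switch (isa)
    {
    case MMX_ISA_NATIVE:
      return !with_sse;
    case MMX_ISA_X64:
      return with_sse;
    case MMX_ISA_X64_AVX:
      return with_sse && target_avx;
    case MMX_ISA_X64_NOAVX:
      return with_sse && !target_avx;
    case MMX_ISA_BASE:
    default:
      /* Assumption: "base" carries no extra condition.  */
      return true;
    }
}

/* Self-check for the x86-64 + SSE2, no AVX, no 3DNOW case.  */
static void
check_mmx_isa_model (void)
{
  bool emu = mmx_with_sse (true, true, false);
  assert (emu);
  assert (!mmx_isa_enabled (MMX_ISA_NATIVE, emu, false));
  assert (mmx_isa_enabled (MMX_ISA_X64, emu, false));
  assert (mmx_isa_enabled (MMX_ISA_X64_NOAVX, emu, false));
  assert (!mmx_isa_enabled (MMX_ISA_X64_AVX, emu, false));
}
```

In this model exactly one of "native" and "x64" is enabled for any target, which matches the statement that the MMX alternatives are disabled when TARGET_MMX_WITH_SSE is true.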
Most MMX instructions have equivalent SSE versions, and the results of some
SSE versions need to be reshuffled to the right order for MMX. There are
a couple of tricky cases:
1. MMX maskmovq and SSE2 maskmovdqu aren't equivalent. We emulate MMX
maskmovq with SSE2 maskmovdqu by zeroing out the upper 64 bits of the
mask operand, and we handle possibly unmapped bits 64:127 at the memory
address by adjusting the source and mask operands together with the
memory address.
2. MMX movntq is emulated with SSE2 DImode movnti, which is available
in 64-bit mode.
3. MMX pshufb takes a 3-bit index while SSE pshufb takes a 4-bit index.
SSE emulation must clear the fourth index bit (bit 3) in each byte of the
shuffle control mask.
4. To emulate MMX cvtpi2ps with SSE2 cvtdq2ps, we must properly preserve
the upper 64 bits of the destination XMM register.
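To make the pshufb case concrete, the following C model compares a 64-bit shuffle with a 128-bit shuffle whose control bytes have been masked. This is a sketch of the instruction semantics as documented for pshufb (bit 7 of a control byte zeroes the result byte, the low 3 or 4 bits select the source byte), not the code from the patch; all function names are invented here, and the mask 0xf7 (clearing the fourth index bit of each control byte) is derived from those semantics.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* 64-bit (MMX) pshufb model: low 3 bits of each control byte pick
   one of 8 source bytes; bit 7 set yields zero.  */
static void
model_pshufb64 (uint8_t dst[8], const uint8_t src[8], const uint8_t ctl[8])
{
  uint8_t tmp[8];
  for (int i = 0; i < 8; i++)
    tmp[i] = (ctl[i] & 0x80) ? 0 : src[ctl[i] & 0x07];
  memcpy (dst, tmp, sizeof tmp);
}

/* 128-bit (SSE) pshufb model: low 4 bits pick one of 16 source bytes.  */
static void
model_pshufb128 (uint8_t dst[16], const uint8_t src[16],
                 const uint8_t ctl[16])
{
  uint8_t tmp[16];
  for (int i = 0; i < 16; i++)
    tmp[i] = (ctl[i] & 0x80) ? 0 : src[ctl[i] & 0x0f];
  memcpy (dst, tmp, sizeof tmp);
}

/* Emulate the 64-bit form with the 128-bit one.  The upper half of the
   source holds junk on purpose; masking each control byte with 0xf7
   keeps the index inside the low 8 bytes and preserves bit 7.  */
static void
emulated_pshufb64 (uint8_t dst[8], const uint8_t src[8],
                   const uint8_t ctl[8])
{
  uint8_t src128[16], ctl128[16], dst128[16];
  memset (src128, 0xee, sizeof src128);
  memcpy (src128, src, 8);
  memset (ctl128, 0, sizeof ctl128);
  for (int i = 0; i < 8; i++)
    ctl128[i] = ctl[i] & 0xf7;
  model_pshufb128 (dst128, src128, ctl128);
  memcpy (dst, dst128, 8);
}

/* Returns 1 if both forms agree for every control byte value.  */
static int
pshufb_emulation_matches (void)
{
  const uint8_t src[8] = { 0x10, 0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17 };
  for (int c = 0; c < 256; c++)
    {
      uint8_t ctl[8], want[8], got[8];
      for (int i = 0; i < 8; i++)
        ctl[i] = (uint8_t) (c ^ (i * 0x11));
      model_pshufb64 (want, src, ctl);
      emulated_pshufb64 (got, src, ctl);
      if (memcmp (want, got, sizeof want) != 0)
        return 0;
    }
  return 1;
}
```

Without the masking step, a control byte such as 0x08 would select byte 8 of the 128-bit source, which lies outside the 64-bit operand.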
Tests are also added to check each SSE emulation of MMX intrinsics.
With SSE emulation in 64-bit mode, the 8-byte vectorizer is enabled with SSE2.
There are no regressions on i686 and x86-64. For x86-64, GCC is also
tested with
--with-arch=native --with-cpu=native
on AVX2 and AVX512F machines.
H.J. Lu (43):
i386: Allow 64-bit vector modes in SSE registers
i386: Emulate MMX packsswb/packssdw/packuswb with SSE2
i386: Emulate MMX punpcklXX/punpckhXX with SSE punpcklXX
i386: Emulate MMX plusminus/sat_plusminus with SSE
i386: Emulate MMX mulv4hi3 with SSE
i386: Emulate MMX smulv4hi3_highpart with SSE
i386: Emulate MMX mmx_pmaddwd with SSE
i386: Emulate MMX ashr<mode>3/<shift_insn><mode>3 with SSE
i386: Emulate MMX <any_logic><mode>3 with SSE
i386: Emulate MMX mmx_andnot<mode>3 with SSE
i386: Emulate MMX mmx_eq/mmx_gt<mode>3 with SSE
i386: Emulate MMX vec_dupv2si with SSE
i386: Emulate MMX pshufw with SSE
i386: Emulate MMX sse_cvtps2pi/sse_cvttps2pi with SSE
i386: Emulate MMX sse_cvtpi2ps with SSE
i386: Emulate MMX mmx_pextrw with SSE
i386: Emulate MMX mmx_pinsrw with SSE
i386: Emulate MMX V4HI smaxmin/V8QI umaxmin with SSE
i386: Emulate MMX mmx_pmovmskb with SSE
i386: Emulate MMX mmx_umulv4hi3_highpart with SSE
i386: Emulate MMX maskmovq with SSE2 maskmovdqu
i386: Emulate MMX mmx_uavgv8qi3 with SSE
i386: Emulate MMX mmx_uavgv4hi3 with SSE
i386: Emulate MMX mmx_psadbw with SSE
i386: Emulate MMX movntq with SSE2 movntidi
i386: Emulate MMX umulv1siv1di3 with SSE2
i386: Emulate MMX ssse3_ph<plusminus_mnemonic>wv4hi3 with SSE
i386: Emulate MMX ssse3_ph<plusminus_mnemonic>dv2si3 with SSE
i386: Emulate MMX ssse3_pmaddubsw with SSE
i386: Emulate MMX ssse3_pmulhrswv4hi3 with SSE
i386: Emulate MMX pshufb with SSE version
i386: Emulate MMX ssse3_psign<mode>3 with SSE
i386: Emulate MMX ssse3_palignrdi with SSE
i386: Emulate MMX abs<mode>2 with SSE
i386: Allow MMXMODE moves with TARGET_MMX_WITH_SSE
i386: Allow MMX vector expanders with TARGET_MMX_WITH_SSE
i386: Allow MMX intrinsic emulation with SSE
i386: Add tests for MMX intrinsic emulations with SSE
i386: Also enable SSSE3 __m64 tests in 64-bit mode
i386: Enable 8-byte vectorizer for TARGET_MMX_WITH_SSE
i386: Implement V2SF add/sub/mul with SSE
i386: Implement V2SF <-> V2SI conversions with SSE
i386: Implement V2SF comparisons with SSE
gcc/config/i386/i386-builtin.def | 126 +--
gcc/config/i386/i386-protos.h | 4 +
gcc/config/i386/i386.c | 206 +++-
gcc/config/i386/i386.h | 14 +
gcc/config/i386/i386.md | 15 +-
gcc/config/i386/mmintrin.h | 10 +-
gcc/config/i386/mmx.md | 962 +++++++++++++-----
gcc/config/i386/sse.md | 460 +++++++--
gcc/config/i386/xmmintrin.h | 61 ++
gcc/testsuite/gcc.dg/tree-ssa/pr84512.c | 2 +-
gcc/testsuite/gcc.target/i386/mmx-vals.h | 77 ++
gcc/testsuite/gcc.target/i386/pr82483-1.c | 2 +-
gcc/testsuite/gcc.target/i386/pr82483-2.c | 2 +-
gcc/testsuite/gcc.target/i386/pr89028-1.c | 10 +
gcc/testsuite/gcc.target/i386/pr89028-10.c | 39 +
gcc/testsuite/gcc.target/i386/pr89028-11.c | 39 +
gcc/testsuite/gcc.target/i386/pr89028-12.c | 39 +
gcc/testsuite/gcc.target/i386/pr89028-13.c | 39 +
gcc/testsuite/gcc.target/i386/pr89028-2.c | 11 +
gcc/testsuite/gcc.target/i386/pr89028-3.c | 14 +
gcc/testsuite/gcc.target/i386/pr89028-4.c | 14 +
gcc/testsuite/gcc.target/i386/pr89028-5.c | 11 +
gcc/testsuite/gcc.target/i386/pr89028-6.c | 14 +
gcc/testsuite/gcc.target/i386/pr89028-7.c | 14 +
gcc/testsuite/gcc.target/i386/pr89028-8.c | 12 +
gcc/testsuite/gcc.target/i386/pr89028-9.c | 12 +
gcc/testsuite/gcc.target/i386/sse2-mmx-10.c | 42 +
gcc/testsuite/gcc.target/i386/sse2-mmx-11.c | 39 +
gcc/testsuite/gcc.target/i386/sse2-mmx-12.c | 41 +
gcc/testsuite/gcc.target/i386/sse2-mmx-13.c | 40 +
gcc/testsuite/gcc.target/i386/sse2-mmx-14.c | 30 +
gcc/testsuite/gcc.target/i386/sse2-mmx-15.c | 35 +
gcc/testsuite/gcc.target/i386/sse2-mmx-16.c | 39 +
gcc/testsuite/gcc.target/i386/sse2-mmx-17.c | 50 +
gcc/testsuite/gcc.target/i386/sse2-mmx-18.c | 13 +
gcc/testsuite/gcc.target/i386/sse2-mmx-19.c | 11 +
gcc/testsuite/gcc.target/i386/sse2-mmx-2.c | 12 +
gcc/testsuite/gcc.target/i386/sse2-mmx-20.c | 11 +
gcc/testsuite/gcc.target/i386/sse2-mmx-21.c | 13 +
gcc/testsuite/gcc.target/i386/sse2-mmx-3.c | 12 +
gcc/testsuite/gcc.target/i386/sse2-mmx-4.c | 4 +
gcc/testsuite/gcc.target/i386/sse2-mmx-5.c | 12 +
gcc/testsuite/gcc.target/i386/sse2-mmx-6.c | 12 +
gcc/testsuite/gcc.target/i386/sse2-mmx-7.c | 12 +
gcc/testsuite/gcc.target/i386/sse2-mmx-8.c | 4 +
gcc/testsuite/gcc.target/i386/sse2-mmx-9.c | 79 ++
.../gcc.target/i386/sse2-mmx-cvtpi2ps.c | 42 +
.../gcc.target/i386/sse2-mmx-cvtps2pi.c | 35 +
.../gcc.target/i386/sse2-mmx-cvttps2pi.c | 35 +
.../gcc.target/i386/sse2-mmx-maskmovq.c | 98 ++
.../gcc.target/i386/sse2-mmx-packssdw.c | 51 +
.../gcc.target/i386/sse2-mmx-packsswb.c | 51 +
.../gcc.target/i386/sse2-mmx-packuswb.c | 51 +
.../gcc.target/i386/sse2-mmx-paddb.c | 47 +
.../gcc.target/i386/sse2-mmx-paddd.c | 47 +
.../gcc.target/i386/sse2-mmx-paddq.c | 42 +
.../gcc.target/i386/sse2-mmx-paddsb.c | 47 +
.../gcc.target/i386/sse2-mmx-paddsw.c | 47 +
.../gcc.target/i386/sse2-mmx-paddusb.c | 47 +
.../gcc.target/i386/sse2-mmx-paddusw.c | 47 +
.../gcc.target/i386/sse2-mmx-paddw.c | 47 +
gcc/testsuite/gcc.target/i386/sse2-mmx-pand.c | 43 +
.../gcc.target/i386/sse2-mmx-pandn.c | 43 +
.../gcc.target/i386/sse2-mmx-pavgb.c | 51 +
.../gcc.target/i386/sse2-mmx-pavgw.c | 51 +
.../gcc.target/i386/sse2-mmx-pcmpeqb.c | 47 +
.../gcc.target/i386/sse2-mmx-pcmpeqd.c | 47 +
.../gcc.target/i386/sse2-mmx-pcmpeqw.c | 47 +
.../gcc.target/i386/sse2-mmx-pcmpgtb.c | 47 +
.../gcc.target/i386/sse2-mmx-pcmpgtd.c | 47 +
.../gcc.target/i386/sse2-mmx-pcmpgtw.c | 47 +
.../gcc.target/i386/sse2-mmx-pextrw.c | 58 ++
.../gcc.target/i386/sse2-mmx-pinsrw.c | 60 ++
.../gcc.target/i386/sse2-mmx-pmaddwd.c | 46 +
.../gcc.target/i386/sse2-mmx-pmaxsw.c | 47 +
.../gcc.target/i386/sse2-mmx-pmaxub.c | 47 +
.../gcc.target/i386/sse2-mmx-pminsw.c | 47 +
.../gcc.target/i386/sse2-mmx-pminub.c | 47 +
.../gcc.target/i386/sse2-mmx-pmovmskb.c | 45 +
.../gcc.target/i386/sse2-mmx-pmulhuw.c | 50 +
.../gcc.target/i386/sse2-mmx-pmulhw.c | 52 +
.../gcc.target/i386/sse2-mmx-pmullw.c | 51 +
.../gcc.target/i386/sse2-mmx-pmuludq.c | 46 +
gcc/testsuite/gcc.target/i386/sse2-mmx-por.c | 43 +
.../gcc.target/i386/sse2-mmx-psadbw.c | 57 ++
.../gcc.target/i386/sse2-mmx-pshufw.c | 247 +++++
.../gcc.target/i386/sse2-mmx-pslld.c | 51 +
.../gcc.target/i386/sse2-mmx-pslldi.c | 152 +++
.../gcc.target/i386/sse2-mmx-psllq.c | 46 +
.../gcc.target/i386/sse2-mmx-psllqi.c | 244 +++++
.../gcc.target/i386/sse2-mmx-psllw.c | 51 +
.../gcc.target/i386/sse2-mmx-psllwi.c | 104 ++
.../gcc.target/i386/sse2-mmx-psrad.c | 51 +
.../gcc.target/i386/sse2-mmx-psradi.c | 152 +++
.../gcc.target/i386/sse2-mmx-psraw.c | 51 +
.../gcc.target/i386/sse2-mmx-psrawi.c | 104 ++
.../gcc.target/i386/sse2-mmx-psrld.c | 51 +
.../gcc.target/i386/sse2-mmx-psrldi.c | 152 +++
.../gcc.target/i386/sse2-mmx-psrlq.c | 46 +
.../gcc.target/i386/sse2-mmx-psrlqi.c | 244 +++++
.../gcc.target/i386/sse2-mmx-psrlw.c | 51 +
.../gcc.target/i386/sse2-mmx-psrlwi.c | 104 ++
.../gcc.target/i386/sse2-mmx-psubb.c | 47 +
.../gcc.target/i386/sse2-mmx-psubd.c | 47 +
.../gcc.target/i386/sse2-mmx-psubq.c | 42 +
.../gcc.target/i386/sse2-mmx-psubusb.c | 47 +
.../gcc.target/i386/sse2-mmx-psubusw.c | 47 +
.../gcc.target/i386/sse2-mmx-psubw.c | 47 +
.../gcc.target/i386/sse2-mmx-punpckhbw.c | 52 +
.../gcc.target/i386/sse2-mmx-punpckhdq.c | 46 +
.../gcc.target/i386/sse2-mmx-punpckhwd.c | 48 +
.../gcc.target/i386/sse2-mmx-punpcklbw.c | 52 +
.../gcc.target/i386/sse2-mmx-punpckldq.c | 46 +
.../gcc.target/i386/sse2-mmx-punpcklwd.c | 48 +
gcc/testsuite/gcc.target/i386/sse2-mmx-pxor.c | 43 +
gcc/testsuite/gcc.target/i386/sse2-mmx.c | 1 -
gcc/testsuite/gcc.target/i386/ssse3-pabsb.c | 4 +-
gcc/testsuite/gcc.target/i386/ssse3-pabsd.c | 4 +-
gcc/testsuite/gcc.target/i386/ssse3-pabsw.c | 4 +-
gcc/testsuite/gcc.target/i386/ssse3-palignr.c | 6 +-
gcc/testsuite/gcc.target/i386/ssse3-phaddd.c | 4 +-
gcc/testsuite/gcc.target/i386/ssse3-phaddsw.c | 4 +-
gcc/testsuite/gcc.target/i386/ssse3-phaddw.c | 4 +-
gcc/testsuite/gcc.target/i386/ssse3-phsubd.c | 4 +-
gcc/testsuite/gcc.target/i386/ssse3-phsubsw.c | 4 +-
gcc/testsuite/gcc.target/i386/ssse3-phsubw.c | 4 +-
.../gcc.target/i386/ssse3-pmaddubsw.c | 4 +-
.../gcc.target/i386/ssse3-pmulhrsw.c | 4 +-
gcc/testsuite/gcc.target/i386/ssse3-pshufb.c | 6 +-
gcc/testsuite/gcc.target/i386/ssse3-psignb.c | 4 +-
gcc/testsuite/gcc.target/i386/ssse3-psignd.c | 4 +-
gcc/testsuite/gcc.target/i386/ssse3-psignw.c | 4 +-
132 files changed, 6722 insertions(+), 480 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/i386/mmx-vals.h
create mode 100644 gcc/testsuite/gcc.target/i386/pr89028-1.c
create mode 100644 gcc/testsuite/gcc.target/i386/pr89028-10.c
create mode 100644 gcc/testsuite/gcc.target/i386/pr89028-11.c
create mode 100644 gcc/testsuite/gcc.target/i386/pr89028-12.c
create mode 100644 gcc/testsuite/gcc.target/i386/pr89028-13.c
create mode 100644 gcc/testsuite/gcc.target/i386/pr89028-2.c
create mode 100644 gcc/testsuite/gcc.target/i386/pr89028-3.c
create mode 100644 gcc/testsuite/gcc.target/i386/pr89028-4.c
create mode 100644 gcc/testsuite/gcc.target/i386/pr89028-5.c
create mode 100644 gcc/testsuite/gcc.target/i386/pr89028-6.c
create mode 100644 gcc/testsuite/gcc.target/i386/pr89028-7.c
create mode 100644 gcc/testsuite/gcc.target/i386/pr89028-8.c
create mode 100644 gcc/testsuite/gcc.target/i386/pr89028-9.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-10.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-11.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-12.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-13.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-14.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-15.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-16.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-17.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-18.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-19.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-2.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-20.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-21.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-3.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-4.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-5.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-6.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-7.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-8.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-9.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-cvtpi2ps.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-cvtps2pi.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-cvttps2pi.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-maskmovq.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-packssdw.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-packsswb.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-packuswb.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-paddb.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-paddd.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-paddq.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-paddsb.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-paddsw.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-paddusb.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-paddusw.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-paddw.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-pand.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-pandn.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-pavgb.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-pavgw.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-pcmpeqb.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-pcmpeqd.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-pcmpeqw.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-pcmpgtb.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-pcmpgtd.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-pcmpgtw.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-pextrw.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-pinsrw.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-pmaddwd.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-pmaxsw.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-pmaxub.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-pminsw.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-pminub.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-pmovmskb.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-pmulhuw.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-pmulhw.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-pmullw.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-pmuludq.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-por.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-psadbw.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-pshufw.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-pslld.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-pslldi.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-psllq.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-psllqi.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-psllw.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-psllwi.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-psrad.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-psradi.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-psraw.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-psrawi.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-psrld.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-psrldi.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-psrlq.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-psrlqi.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-psrlw.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-psrlwi.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-psubb.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-psubd.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-psubq.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-psubusb.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-psubusw.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-psubw.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-punpckhbw.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-punpckhdq.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-punpckhwd.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-punpcklbw.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-punpckldq.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-punpcklwd.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-pxor.c
--
2.20.1
* [PATCH 12/43] i386: Emulate MMX vec_dupv2si with SSE
2019-02-10 0:19 [PATCH 00/43] V3: Emulate MMX intrinsics " H.J. Lu
@ 2019-02-10 0:20 ` H.J. Lu
2019-02-10 10:36 ` Uros Bizjak
0 siblings, 1 reply; 13+ messages in thread
From: H.J. Lu @ 2019-02-10 0:20 UTC (permalink / raw)
To: gcc-patches; +Cc: Uros Bizjak
Emulate MMX vec_dupv2si with SSE. Only SSE register source operand is
allowed.
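
The idea behind the emulation can be modeled in plain C: broadcasting a scalar into all four 32-bit lanes of a 128-bit register leaves the wanted two-lane duplicate in the low 64 bits. The types and helper names below are hypothetical and exist only for this sketch; they are not GCC internals, and the lane order assumes elements 0 and 1 occupy the low half.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the V2SI and V4SI vector modes.  */
typedef struct { uint32_t e[2]; } v2si_model;
typedef struct { uint32_t e[4]; } v4si_model;

/* What vec_duplicate:V2SI computes.  */
static v2si_model
dup_v2si (uint32_t x)
{
  v2si_model r = { { x, x } };
  return r;
}

/* What vec_duplicate:V4SI computes.  */
static v4si_model
dup_v4si (uint32_t x)
{
  v4si_model r = { { x, x, x, x } };
  return r;
}

/* Low 64 bits of the 128-bit value, i.e. elements 0 and 1.  */
static v2si_model
low_half (v4si_model v)
{
  v2si_model r;
  memcpy (r.e, v.e, sizeof r.e);
  return r;
}

/* Returns 1 if the wide broadcast yields the narrow duplicate.  */
static int
dup_emulation_matches (uint32_t x)
{
  v2si_model want = dup_v2si (x);
  v2si_model got = low_half (dup_v4si (x));
  return memcmp (want.e, got.e, sizeof want.e) == 0;
}

/* Self-check on a few sample values.  */
static void
check_dup_model (void)
{
  assert (dup_emulation_matches (0));
  assert (dup_emulation_matches (0xdeadbeefu));
  assert (dup_emulation_matches (UINT32_MAX));
}
```

The upper two lanes of the wide result are simply ignored by consumers of the 64-bit value, which is why the split can write the same hard register in V4SImode.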
PR target/89021
* config/i386/mmx.md (*vec_dupv2si): Changed to
define_insn_and_split and also allow TARGET_MMX_WITH_SSE to
support SSE emulation.
* config/i386/sse.md (*vec_dupv4si): Renamed to ...
(vec_dupv4si): This.
---
gcc/config/i386/mmx.md | 27 ++++++++++++++++++++-------
gcc/config/i386/sse.md | 2 +-
2 files changed, 21 insertions(+), 8 deletions(-)
diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index d360e97c98b..1ee51c5deb7 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -1420,14 +1420,27 @@
(set_attr "length_immediate" "1")
(set_attr "mode" "DI")])
-(define_insn "*vec_dupv2si"
- [(set (match_operand:V2SI 0 "register_operand" "=y")
+(define_insn_and_split "*vec_dupv2si"
+ [(set (match_operand:V2SI 0 "register_operand" "=y,x,Yv")
(vec_duplicate:V2SI
- (match_operand:SI 1 "register_operand" "0")))]
- "TARGET_MMX"
- "punpckldq\t%0, %0"
- [(set_attr "type" "mmxcvt")
- (set_attr "mode" "DI")])
+ (match_operand:SI 1 "register_operand" "0,0,Yv")))]
+ "TARGET_MMX || TARGET_MMX_WITH_SSE"
+ "@
+ punpckldq\t%0, %0
+ #
+ #"
+ "&& reload_completed && TARGET_MMX_WITH_SSE"
+ [(const_int 0)]
+{
+ /* Emulate MMX vec_dupv2si with SSE vec_dupv4si. */
+ rtx op0 = gen_rtx_REG (V4SImode, REGNO (operands[0]));
+ rtx insn = gen_vec_dupv4si (op0, operands[1]);
+ emit_insn (insn);
+ DONE;
+}
+ [(set_attr "mmx_isa" "native,x64_noavx,x64_avx")
+ (set_attr "type" "mmxcvt,ssemov,ssemov")
+ (set_attr "mode" "DI,TI,TI")])
(define_insn "*mmx_concatv2si"
[(set (match_operand:V2SI 0 "register_operand" "=y,y")
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 5dc0930ac1f..7d2c0367911 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -18976,7 +18976,7 @@
(set_attr "prefix" "maybe_evex,maybe_evex,orig")
(set_attr "mode" "V4SF")])
-(define_insn "*vec_dupv4si"
+(define_insn "vec_dupv4si"
[(set (match_operand:V4SI 0 "register_operand" "=v,v,x")
(vec_duplicate:V4SI
(match_operand:SI 1 "nonimmediate_operand" "Yv,m,0")))]
--
2.20.1
* Re: [PATCH 12/43] i386: Emulate MMX vec_dupv2si with SSE
2019-02-10 0:20 ` [PATCH 12/43] i386: Emulate MMX vec_dupv2si " H.J. Lu
@ 2019-02-10 10:36 ` Uros Bizjak
2019-02-10 21:01 ` H.J. Lu
0 siblings, 1 reply; 13+ messages in thread
From: Uros Bizjak @ 2019-02-10 10:36 UTC (permalink / raw)
To: H.J. Lu; +Cc: gcc-patches
On 2/10/19, H.J. Lu <hjl.tools@gmail.com> wrote:
> Emulate MMX vec_dupv2si with SSE. Only SSE register source operand is
> allowed.
>
> PR target/89021
> * config/i386/mmx.md (*vec_dupv2si): Changed to
> define_insn_and_split and also allow TARGET_MMX_WITH_SSE to
> support SSE emulation.
> * config/i386/sse.md (*vec_dupv4si): Renamed to ...
> (vec_dupv4si): This.
> ---
> gcc/config/i386/mmx.md | 27 ++++++++++++++++++++-------
> gcc/config/i386/sse.md | 2 +-
> 2 files changed, 21 insertions(+), 8 deletions(-)
>
> diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
> index d360e97c98b..1ee51c5deb7 100644
> --- a/gcc/config/i386/mmx.md
> +++ b/gcc/config/i386/mmx.md
> @@ -1420,14 +1420,27 @@
> (set_attr "length_immediate" "1")
> (set_attr "mode" "DI")])
>
> -(define_insn "*vec_dupv2si"
> - [(set (match_operand:V2SI 0 "register_operand" "=y")
> +(define_insn_and_split "*vec_dupv2si"
> + [(set (match_operand:V2SI 0 "register_operand" "=y,x,Yv")
> (vec_duplicate:V2SI
> - (match_operand:SI 1 "register_operand" "0")))]
> - "TARGET_MMX"
> - "punpckldq\t%0, %0"
> - [(set_attr "type" "mmxcvt")
> - (set_attr "mode" "DI")])
> + (match_operand:SI 1 "register_operand" "0,0,Yv")))]
> + "TARGET_MMX || TARGET_MMX_WITH_SSE"
> + "@
> + punpckldq\t%0, %0
> + #
> + #"
> + "&& reload_completed && TARGET_MMX_WITH_SSE"
Please fix above.
> + [(const_int 0)]
> +{
> + /* Emulate MMX vec_dupv2si with SSE vec_dupv4si. */
> + rtx op0 = gen_rtx_REG (V4SImode, REGNO (operands[0]));
> + rtx insn = gen_vec_dupv4si (op0, operands[1]);
> + emit_insn (insn);
> + DONE;
Please write this simple RTX explicitly in the place of (const_int 0) above.
Uros.
> +}
> + [(set_attr "mmx_isa" "native,x64_noavx,x64_avx")
> + (set_attr "type" "mmxcvt,ssemov,ssemov")
> + (set_attr "mode" "DI,TI,TI")])
>
> (define_insn "*mmx_concatv2si"
> [(set (match_operand:V2SI 0 "register_operand" "=y,y")
> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> index 5dc0930ac1f..7d2c0367911 100644
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -18976,7 +18976,7 @@
> (set_attr "prefix" "maybe_evex,maybe_evex,orig")
> (set_attr "mode" "V4SF")])
>
> -(define_insn "*vec_dupv4si"
> +(define_insn "vec_dupv4si"
> [(set (match_operand:V4SI 0 "register_operand" "=v,v,x")
> (vec_duplicate:V4SI
> (match_operand:SI 1 "nonimmediate_operand" "Yv,m,0")))]
> --
> 2.20.1
>
>
* Re: [PATCH 12/43] i386: Emulate MMX vec_dupv2si with SSE
2019-02-10 10:36 ` Uros Bizjak
@ 2019-02-10 21:01 ` H.J. Lu
2019-02-10 21:46 ` Uros Bizjak
0 siblings, 1 reply; 13+ messages in thread
From: H.J. Lu @ 2019-02-10 21:01 UTC (permalink / raw)
To: Uros Bizjak; +Cc: GCC Patches
On Sun, Feb 10, 2019 at 2:36 AM Uros Bizjak <ubizjak@gmail.com> wrote:
>
> On 2/10/19, H.J. Lu <hjl.tools@gmail.com> wrote:
> > Emulate MMX vec_dupv2si with SSE. Only SSE register source operand is
> > allowed.
> >
> > PR target/89021
> > * config/i386/mmx.md (*vec_dupv2si): Changed to
> > define_insn_and_split and also allow TARGET_MMX_WITH_SSE to
> > support SSE emulation.
> > * config/i386/sse.md (*vec_dupv4si): Renamed to ...
> > (vec_dupv4si): This.
> > ---
> > gcc/config/i386/mmx.md | 27 ++++++++++++++++++++-------
> > gcc/config/i386/sse.md | 2 +-
> > 2 files changed, 21 insertions(+), 8 deletions(-)
> >
> > diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
> > index d360e97c98b..1ee51c5deb7 100644
> > --- a/gcc/config/i386/mmx.md
> > +++ b/gcc/config/i386/mmx.md
> > @@ -1420,14 +1420,27 @@
> > (set_attr "length_immediate" "1")
> > (set_attr "mode" "DI")])
> >
> > -(define_insn "*vec_dupv2si"
> > - [(set (match_operand:V2SI 0 "register_operand" "=y")
> > +(define_insn_and_split "*vec_dupv2si"
> > + [(set (match_operand:V2SI 0 "register_operand" "=y,x,Yv")
> > (vec_duplicate:V2SI
> > - (match_operand:SI 1 "register_operand" "0")))]
> > - "TARGET_MMX"
> > - "punpckldq\t%0, %0"
> > - [(set_attr "type" "mmxcvt")
> > - (set_attr "mode" "DI")])
> > + (match_operand:SI 1 "register_operand" "0,0,Yv")))]
> > + "TARGET_MMX || TARGET_MMX_WITH_SSE"
> > + "@
> > + punpckldq\t%0, %0
> > + #
> > + #"
> > + "&& reload_completed && TARGET_MMX_WITH_SSE"
>
> Please fix above.
I will use
"TARGET_MMX_WITH_SSE && reload_completed"
> > + [(const_int 0)]
> > +{
> > + /* Emulate MMX vec_dupv2si with SSE vec_dupv4si. */
> > + rtx op0 = gen_rtx_REG (V4SImode, REGNO (operands[0]));
> > + rtx insn = gen_vec_dupv4si (op0, operands[1]);
> > + emit_insn (insn);
> > + DONE;
>
> Please write this simple RTX explicitly in the place of (const_int 0) above.
rtx insn = gen_vec_dupv4si (op0, operands[1]);
is easy. How do I write
rtx op0 = gen_rtx_REG (V4SImode, REGNO (operands[0]));
in place of (const_int 0)?
> Uros.
>
> > +}
> > + [(set_attr "mmx_isa" "native,x64_noavx,x64_avx")
> > + (set_attr "type" "mmxcvt,ssemov,ssemov")
> > + (set_attr "mode" "DI,TI,TI")])
> >
> > (define_insn "*mmx_concatv2si"
> > [(set (match_operand:V2SI 0 "register_operand" "=y,y")
> > diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> > index 5dc0930ac1f..7d2c0367911 100644
> > --- a/gcc/config/i386/sse.md
> > +++ b/gcc/config/i386/sse.md
> > @@ -18976,7 +18976,7 @@
> > (set_attr "prefix" "maybe_evex,maybe_evex,orig")
> > (set_attr "mode" "V4SF")])
> >
> > -(define_insn "*vec_dupv4si"
> > +(define_insn "vec_dupv4si"
> > [(set (match_operand:V4SI 0 "register_operand" "=v,v,x")
> > (vec_duplicate:V4SI
> > (match_operand:SI 1 "nonimmediate_operand" "Yv,m,0")))]
> > --
> > 2.20.1
> >
> >
--
H.J.
* Re: [PATCH 12/43] i386: Emulate MMX vec_dupv2si with SSE
2019-02-10 21:01 ` H.J. Lu
@ 2019-02-10 21:46 ` Uros Bizjak
2019-02-10 21:49 ` Uros Bizjak
0 siblings, 1 reply; 13+ messages in thread
From: Uros Bizjak @ 2019-02-10 21:46 UTC (permalink / raw)
To: H.J. Lu; +Cc: GCC Patches
On Sun, Feb 10, 2019 at 10:01 PM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> On Sun, Feb 10, 2019 at 2:36 AM Uros Bizjak <ubizjak@gmail.com> wrote:
> >
> > On 2/10/19, H.J. Lu <hjl.tools@gmail.com> wrote:
> > > Emulate MMX vec_dupv2si with SSE. Only SSE register source operand is
> > > allowed.
> > >
> > > PR target/89021
> > > * config/i386/mmx.md (*vec_dupv2si): Changed to
> > > define_insn_and_split and also allow TARGET_MMX_WITH_SSE to
> > > support SSE emulation.
> > > * config/i386/sse.md (*vec_dupv4si): Renamed to ...
> > > (vec_dupv4si): This.
> > > ---
> > > gcc/config/i386/mmx.md | 27 ++++++++++++++++++++-------
> > > gcc/config/i386/sse.md | 2 +-
> > > 2 files changed, 21 insertions(+), 8 deletions(-)
> > >
> > > diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
> > > index d360e97c98b..1ee51c5deb7 100644
> > > --- a/gcc/config/i386/mmx.md
> > > +++ b/gcc/config/i386/mmx.md
> > > @@ -1420,14 +1420,27 @@
> > > (set_attr "length_immediate" "1")
> > > (set_attr "mode" "DI")])
> > >
> > > -(define_insn "*vec_dupv2si"
> > > - [(set (match_operand:V2SI 0 "register_operand" "=y")
> > > +(define_insn_and_split "*vec_dupv2si"
> > > + [(set (match_operand:V2SI 0 "register_operand" "=y,x,Yv")
> > > (vec_duplicate:V2SI
> > > - (match_operand:SI 1 "register_operand" "0")))]
> > > - "TARGET_MMX"
> > > - "punpckldq\t%0, %0"
> > > - [(set_attr "type" "mmxcvt")
> > > - (set_attr "mode" "DI")])
> > > + (match_operand:SI 1 "register_operand" "0,0,Yv")))]
> > > + "TARGET_MMX || TARGET_MMX_WITH_SSE"
> > > + "@
> > > + punpckldq\t%0, %0
> > > + #
> > > + #"
> > > + "&& reload_completed && TARGET_MMX_WITH_SSE"
> >
> > Please fix above.
>
> I will use
>
> "TARGET_MMX_WITH_SSE && reload_completed"
>
> > > + [(const_int 0)]
> > > +{
> > > + /* Emulate MMX vec_dupv2si with SSE vec_dupv4si. */
> > > + rtx op0 = gen_rtx_REG (V4SImode, REGNO (operands[0]));
> > > + rtx insn = gen_vec_dupv4si (op0, operands[1]);
> > > + emit_insn (insn);
> > > + DONE;
> >
> > Please write this simple RTX explicitly in the place of (const_int 0) above.
>
> rtx insn = gen_vec_dupv4si (op0, operands[1]);
>
> is easy. How do I write
>
> rtx op0 = gen_rtx_REG (V4SImode, REGNO (operands[0]));
>
> in place of (const_int 0)?
[(set (match_dup 2)
(vec_duplicate:V4SI (match_dup 1)))]
with
"operands[2] = gen_rtx_REG (V4SImode, REGNO (operands[0]));"
or even better:
"operands[2] = gen_lowpart (V4SImode, operands[0]);"
in the preparation statement.
Uros.
* Re: [PATCH 12/43] i386: Emulate MMX vec_dupv2si with SSE
2019-02-10 21:46 ` Uros Bizjak
@ 2019-02-10 21:49 ` Uros Bizjak
2019-02-11 1:04 ` H.J. Lu
0 siblings, 1 reply; 13+ messages in thread
From: Uros Bizjak @ 2019-02-10 21:49 UTC (permalink / raw)
To: H.J. Lu; +Cc: GCC Patches
On Sun, Feb 10, 2019 at 10:45 PM Uros Bizjak <ubizjak@gmail.com> wrote:
> > > > + [(const_int 0)]
> > > > +{
> > > > + /* Emulate MMX vec_dupv2si with SSE vec_dupv4si. */
> > > > + rtx op0 = gen_rtx_REG (V4SImode, REGNO (operands[0]));
> > > > + rtx insn = gen_vec_dupv4si (op0, operands[1]);
> > > > + emit_insn (insn);
> > > > + DONE;
> > >
> > > Please write this simple RTX explicitly in the place of (const_int 0) above.
> >
> > rtx insn = gen_vec_dupv4si (op0, operands[1]);
> >
> > is easy. How do I write
> >
> > rtx op0 = gen_rtx_REG (V4SImode, REGNO (operands[0]));
> >
> > in place of (const_int 0)?
>
> [(set (match_dup 2)
> (vec_duplicate:V4SI (match_dup 1)))]
>
> with
>
> "operands[2] = gen_rtx_REG (V4SImode, REGNO (operands[0]));"
>
> or even better:
>
> "operands[2] = gen_lowpart (V4SImode, operands[0]);"
>
> in the preparation statement.
Even shorter is
"operands[0] = gen_lowpart (V4SImode, operands[0]);"
and use (match_dup 0) instead of (match_dup 2) in the RTX.
There are plenty of examples throughout sse.md.
Uros.
* Re: [PATCH 12/43] i386: Emulate MMX vec_dupv2si with SSE
2019-02-10 21:49 ` Uros Bizjak
@ 2019-02-11 1:04 ` H.J. Lu
2019-02-11 7:25 ` Uros Bizjak
0 siblings, 1 reply; 13+ messages in thread
From: H.J. Lu @ 2019-02-11 1:04 UTC (permalink / raw)
To: Uros Bizjak; +Cc: GCC Patches
On Sun, Feb 10, 2019 at 1:49 PM Uros Bizjak <ubizjak@gmail.com> wrote:
>
> On Sun, Feb 10, 2019 at 10:45 PM Uros Bizjak <ubizjak@gmail.com> wrote:
>
> > > > > + [(const_int 0)]
> > > > > +{
> > > > > + /* Emulate MMX vec_dupv2si with SSE vec_dupv4si. */
> > > > > + rtx op0 = gen_rtx_REG (V4SImode, REGNO (operands[0]));
> > > > > + rtx insn = gen_vec_dupv4si (op0, operands[1]);
> > > > > + emit_insn (insn);
> > > > > + DONE;
> > > >
> > > > Please write this simple RTX explicitly in the place of (const_int 0) above.
> > >
> > > rtx insn = gen_vec_dupv4si (op0, operands[1]);
> > >
> > > is easy. How do I write
> > >
> > > rtx op0 = gen_rtx_REG (V4SImode, REGNO (operands[0]));
> > >
> > > in place of (const_int 0)?
> >
> > [(set (match_dup 2)
> > (vec_duplicate:V4SI (match_dup 1)))]
> >
> > with
> >
> > "operands[2] = gen_rtx_REG (V4SImode, REGNO (operands[0]));"
> >
> > or even better:
> >
> > "operands[2] = gen_lowpart (V4SImode, operands[0]);"
> >
> > in the preparation statement.
>
> Even shorter is
>
> "operands[0] = gen_lowpart (V4SImode, operands[0]);"
>
> and use (match_dup 0) instead of (match_dup 2) in the RTX.
>
> There are plenty of examples throughout sse.md.
>
This works:
(define_insn_and_split "*vec_dupv2si"
[(set (match_operand:V2SI 0 "register_operand" "=y,x,Yv")
(vec_duplicate:V2SI
(match_operand:SI 1 "register_operand" "0,0,Yv")))]
"TARGET_MMX || TARGET_MMX_WITH_SSE"
"@
punpckldq\t%0, %0
#
#"
"TARGET_MMX_WITH_SSE && reload_completed"
[(set (match_dup 0)
(vec_duplicate:V4SI (match_dup 1)))]
"operands[0] = gen_rtx_REG (V4SImode, REGNO (operands[0]));"
[(set_attr "mmx_isa" "native,x64_noavx,x64_avx")
(set_attr "type" "mmxcvt,ssemov,ssemov")
(set_attr "mode" "DI,TI,TI")])
Thanks.
--
H.J.
* Re: [PATCH 12/43] i386: Emulate MMX vec_dupv2si with SSE
2019-02-11 1:04 ` H.J. Lu
@ 2019-02-11 7:25 ` Uros Bizjak
2019-02-11 12:27 ` H.J. Lu
0 siblings, 1 reply; 13+ messages in thread
From: Uros Bizjak @ 2019-02-11 7:25 UTC (permalink / raw)
To: H.J. Lu; +Cc: GCC Patches
On Mon, Feb 11, 2019 at 2:04 AM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> On Sun, Feb 10, 2019 at 1:49 PM Uros Bizjak <ubizjak@gmail.com> wrote:
> >
> > On Sun, Feb 10, 2019 at 10:45 PM Uros Bizjak <ubizjak@gmail.com> wrote:
> >
> > > > > > + [(const_int 0)]
> > > > > > +{
> > > > > > + /* Emulate MMX vec_dupv2si with SSE vec_dupv4si. */
> > > > > > + rtx op0 = gen_rtx_REG (V4SImode, REGNO (operands[0]));
> > > > > > + rtx insn = gen_vec_dupv4si (op0, operands[1]);
> > > > > > + emit_insn (insn);
> > > > > > + DONE;
> > > > >
> > > > > Please write this simple RTX explicitly in the place of (const_int 0) above.
> > > >
> > > > rtx insn = gen_vec_dupv4si (op0, operands[1]);
> > > >
> > > > is easy. How do I write
> > > >
> > > > rtx op0 = gen_rtx_REG (V4SImode, REGNO (operands[0]));
> > > >
> > > > in place of (const_int 0)?
> > >
> > > [(set (match_dup 2)
> > > (vec_duplicate:V4SI (match_dup 1)))]
> > >
> > > with
> > >
> > > "operands[2] = gen_rtx_REG (V4SImode, REGNO (operands[0]));"
> > >
> > > or even better:
> > >
> > > "operands[2] = gen_lowpart (V4SImode, operands[0]);"
> > >
> > > in the preparation statement.
> >
> > Even shorter is
> >
> > "operands[0] = gen_lowpart (V4SImode, operands[0]);"
> >
> > and use (match_dup 0) instead of (match_dup 2) in the RTX.
> >
> > There are plenty of examples throughout sse.md.
> >
>
> This works:
>
> (define_insn_and_split "*vec_dupv2si"
> [(set (match_operand:V2SI 0 "register_operand" "=y,x,Yv")
> (vec_duplicate:V2SI
> (match_operand:SI 1 "register_operand" "0,0,Yv")))]
> "TARGET_MMX || TARGET_MMX_WITH_SSE"
> "@
> punpckldq\t%0, %0
> #
> #"
> "TARGET_MMX_WITH_SSE && reload_completed"
> [(set (match_dup 0)
> (vec_duplicate:V4SI (match_dup 1)))]
> "operands[0] = gen_rtx_REG (V4SImode, REGNO (operands[0]));"
> [(set_attr "mmx_isa" "native,x64_noavx,x64_avx")
> (set_attr "type" "mmxcvt,ssemov,ssemov")
> (set_attr "mode" "DI,TI,TI")])
If it works, then gen_lowpart is preferred due to extra checks.
However, it would result in a paradoxical subreg, so I wonder if these
extra checks allow this transformation.
Uros.
* Re: [PATCH 12/43] i386: Emulate MMX vec_dupv2si with SSE
2019-02-11 7:25 ` Uros Bizjak
@ 2019-02-11 12:27 ` H.J. Lu
2019-02-11 12:51 ` Uros Bizjak
0 siblings, 1 reply; 13+ messages in thread
From: H.J. Lu @ 2019-02-11 12:27 UTC (permalink / raw)
To: Uros Bizjak; +Cc: GCC Patches
On Sun, Feb 10, 2019 at 11:25 PM Uros Bizjak <ubizjak@gmail.com> wrote:
>
> On Mon, Feb 11, 2019 at 2:04 AM H.J. Lu <hjl.tools@gmail.com> wrote:
> >
> > On Sun, Feb 10, 2019 at 1:49 PM Uros Bizjak <ubizjak@gmail.com> wrote:
> > >
> > > On Sun, Feb 10, 2019 at 10:45 PM Uros Bizjak <ubizjak@gmail.com> wrote:
> > >
> > > > > > > + [(const_int 0)]
> > > > > > > +{
> > > > > > > + /* Emulate MMX vec_dupv2si with SSE vec_dupv4si. */
> > > > > > > + rtx op0 = gen_rtx_REG (V4SImode, REGNO (operands[0]));
> > > > > > > + rtx insn = gen_vec_dupv4si (op0, operands[1]);
> > > > > > > + emit_insn (insn);
> > > > > > > + DONE;
> > > > > >
> > > > > > Please write this simple RTX explicitly in the place of (const_int 0) above.
> > > > >
> > > > > rtx insn = gen_vec_dupv4si (op0, operands[1]);
> > > > >
> > > > > is easy. How do I write
> > > > >
> > > > > rtx op0 = gen_rtx_REG (V4SImode, REGNO (operands[0]));
> > > > >
> > > > > in place of (const_int 0)?
> > > >
> > > > [(set (match_dup 2)
> > > > (vec_duplicate:V4SI (match_dup 1)))]
> > > >
> > > > with
> > > >
> > > > "operands[2] = gen_rtx_REG (V4SImode, REGNO (operands[0]));"
> > > >
> > > > or even better:
> > > >
> > > > "operands[2] = gen_lowpart (V4SImode, operands[0]);"
> > > >
> > > > in the preparation statement.
> > >
> > > Even shorter is
> > >
> > > "operands[0] = gen_lowpart (V4SImode, operands[0]);"
> > >
> > > and use (match_dup 0) instead of (match_dup 2) in the RTX.
> > >
> > > There are plenty of examples throughout sse.md.
> > >
> >
> > This works:
> >
> > (define_insn_and_split "*vec_dupv2si"
> > [(set (match_operand:V2SI 0 "register_operand" "=y,x,Yv")
> > (vec_duplicate:V2SI
> > (match_operand:SI 1 "register_operand" "0,0,Yv")))]
> > "TARGET_MMX || TARGET_MMX_WITH_SSE"
> > "@
> > punpckldq\t%0, %0
> > #
> > #"
> > "TARGET_MMX_WITH_SSE && reload_completed"
> > [(set (match_dup 0)
> > (vec_duplicate:V4SI (match_dup 1)))]
> > "operands[0] = gen_rtx_REG (V4SImode, REGNO (operands[0]));"
> > [(set_attr "mmx_isa" "native,x64_noavx,x64_avx")
> > (set_attr "type" "mmxcvt,ssemov,ssemov")
> > (set_attr "mode" "DI,TI,TI")])
>
> If it works, then gen_lowpart is preferred due to extra checks.
> However, it would result in a paradoxical subreg, so I wonder if these
> extra checks allow this transformation.
gen_lowpart doesn't work:
#include <mmintrin.h>
__m64
foo (int i)
{
__v2si x = { i, i };
return (__m64) x;
}
(gdb) f 1
#1 0x0000000000ba7cca in gen_reg_rtx (mode=E_V2SImode)
at /export/gnu/import/git/gitlab/x86-gcc/gcc/emit-rtl.c:1155
1155 gcc_assert (can_create_pseudo_p ());
(gdb) bt
#0 fancy_abort (
file=0x22180e0 "/export/gnu/import/git/gitlab/x86-gcc/gcc/emit-rtl.c",
line=1155,
function=0x22193a8 <gen_reg_rtx(machine_mode)::__FUNCTION__> "gen_reg_rtx")
at /export/gnu/import/git/gitlab/x86-gcc/gcc/diagnostic.c:1607
#1 0x0000000000ba7cca in gen_reg_rtx (mode=E_V2SImode)
at /export/gnu/import/git/gitlab/x86-gcc/gcc/emit-rtl.c:1155
#2 0x0000000000bd3044 in copy_to_reg (x=0x7fffea99b528)
at /export/gnu/import/git/gitlab/x86-gcc/gcc/explow.c:594
#3 0x00000000010c7c0a in gen_lowpart_general (mode=E_V4SImode,
x=0x7fffea99b528)
at /export/gnu/import/git/gitlab/x86-gcc/gcc/rtlhooks.c:56
...
#1 0x0000000000ba7cca in gen_reg_rtx (mode=E_V2SImode)
at /export/gnu/import/git/gitlab/x86-gcc/gcc/emit-rtl.c:1155
1155 gcc_assert (can_create_pseudo_p ());
(gdb)
--
H.J.
* Re: [PATCH 12/43] i386: Emulate MMX vec_dupv2si with SSE
2019-02-11 12:27 ` H.J. Lu
@ 2019-02-11 12:51 ` Uros Bizjak
2019-02-11 13:12 ` H.J. Lu
0 siblings, 1 reply; 13+ messages in thread
From: Uros Bizjak @ 2019-02-11 12:51 UTC (permalink / raw)
To: H.J. Lu; +Cc: GCC Patches
On Mon, Feb 11, 2019 at 1:26 PM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> On Sun, Feb 10, 2019 at 11:25 PM Uros Bizjak <ubizjak@gmail.com> wrote:
> >
> > On Mon, Feb 11, 2019 at 2:04 AM H.J. Lu <hjl.tools@gmail.com> wrote:
> > >
> > > On Sun, Feb 10, 2019 at 1:49 PM Uros Bizjak <ubizjak@gmail.com> wrote:
> > > >
> > > > On Sun, Feb 10, 2019 at 10:45 PM Uros Bizjak <ubizjak@gmail.com> wrote:
> > > >
> > > > > > > > + [(const_int 0)]
> > > > > > > > +{
> > > > > > > > + /* Emulate MMX vec_dupv2si with SSE vec_dupv4si. */
> > > > > > > > + rtx op0 = gen_rtx_REG (V4SImode, REGNO (operands[0]));
> > > > > > > > + rtx insn = gen_vec_dupv4si (op0, operands[1]);
> > > > > > > > + emit_insn (insn);
> > > > > > > > + DONE;
> > > > > > >
> > > > > > > Please write this simple RTX explicitly in the place of (const_int 0) above.
> > > > > >
> > > > > > rtx insn = gen_vec_dupv4si (op0, operands[1]);
> > > > > >
> > > > > > is easy. How do I write
> > > > > >
> > > > > > rtx op0 = gen_rtx_REG (V4SImode, REGNO (operands[0]));
> > > > > >
> > > > > > in place of (const_int 0)?
> > > > >
> > > > > [(set (match_dup 2)
> > > > > (vec_duplicate:V4SI (match_dup 1)))]
> > > > >
> > > > > with
> > > > >
> > > > > "operands[2] = gen_rtx_REG (V4SImode, REGNO (operands[0]));"
> > > > >
> > > > > or even better:
> > > > >
> > > > > "operands[2] = gen_lowpart (V4SImode, operands[0]);"
> > > > >
> > > > > in the preparation statement.
> > > >
> > > > Even shorter is
> > > >
> > > > "operands[0] = gen_lowpart (V4SImode, operands[0]);"
> > > >
> > > > and use (match_dup 0) instead of (match_dup 2) in the RTX.
> > > >
> > > > There are plenty of examples throughout sse.md.
> > > >
> > >
> > > This works:
> > >
> > > (define_insn_and_split "*vec_dupv2si"
> > > [(set (match_operand:V2SI 0 "register_operand" "=y,x,Yv")
> > > (vec_duplicate:V2SI
> > > (match_operand:SI 1 "register_operand" "0,0,Yv")))]
> > > "TARGET_MMX || TARGET_MMX_WITH_SSE"
> > > "@
> > > punpckldq\t%0, %0
> > > #
> > > #"
> > > "TARGET_MMX_WITH_SSE && reload_completed"
> > > [(set (match_dup 0)
> > > (vec_duplicate:V4SI (match_dup 1)))]
> > > "operands[0] = gen_rtx_REG (V4SImode, REGNO (operands[0]));"
> > > [(set_attr "mmx_isa" "native,x64_noavx,x64_avx")
> > > (set_attr "type" "mmxcvt,ssemov,ssemov")
> > > (set_attr "mode" "DI,TI,TI")])
> >
> > If it works, then gen_lowpart is preferred due to extra checks.
> > However, it would result in a paradoxical subreg, so I wonder if these
> > extra checks allow this transformation.
>
> gen_lowpart doesn't work:
Ah, we need lowpart_subreg after reload.
Uros.
* Re: [PATCH 12/43] i386: Emulate MMX vec_dupv2si with SSE
2019-02-11 12:51 ` Uros Bizjak
@ 2019-02-11 13:12 ` H.J. Lu
0 siblings, 0 replies; 13+ messages in thread
From: H.J. Lu @ 2019-02-11 13:12 UTC (permalink / raw)
To: Uros Bizjak; +Cc: GCC Patches
On Mon, Feb 11, 2019 at 4:51 AM Uros Bizjak <ubizjak@gmail.com> wrote:
>
> On Mon, Feb 11, 2019 at 1:26 PM H.J. Lu <hjl.tools@gmail.com> wrote:
> >
> > On Sun, Feb 10, 2019 at 11:25 PM Uros Bizjak <ubizjak@gmail.com> wrote:
> > >
> > > On Mon, Feb 11, 2019 at 2:04 AM H.J. Lu <hjl.tools@gmail.com> wrote:
> > > >
> > > > On Sun, Feb 10, 2019 at 1:49 PM Uros Bizjak <ubizjak@gmail.com> wrote:
> > > > >
> > > > > On Sun, Feb 10, 2019 at 10:45 PM Uros Bizjak <ubizjak@gmail.com> wrote:
> > > > >
> > > > > > > > > + [(const_int 0)]
> > > > > > > > > +{
> > > > > > > > > + /* Emulate MMX vec_dupv2si with SSE vec_dupv4si. */
> > > > > > > > > + rtx op0 = gen_rtx_REG (V4SImode, REGNO (operands[0]));
> > > > > > > > > + rtx insn = gen_vec_dupv4si (op0, operands[1]);
> > > > > > > > > + emit_insn (insn);
> > > > > > > > > + DONE;
> > > > > > > >
> > > > > > > > Please write this simple RTX explicitly in the place of (const_int 0) above.
> > > > > > >
> > > > > > > rtx insn = gen_vec_dupv4si (op0, operands[1]);
> > > > > > >
> > > > > > > is easy. How do I write
> > > > > > >
> > > > > > > rtx op0 = gen_rtx_REG (V4SImode, REGNO (operands[0]));
> > > > > > >
> > > > > > > in place of (const_int 0)?
> > > > > >
> > > > > > [(set (match_dup 2)
> > > > > > (vec_duplicate:V4SI (match_dup 1)))]
> > > > > >
> > > > > > with
> > > > > >
> > > > > > "operands[2] = gen_rtx_REG (V4SImode, REGNO (operands[0]));"
> > > > > >
> > > > > > or even better:
> > > > > >
> > > > > > "operands[2] = gen_lowpart (V4SImode, operands[0]);"
> > > > > >
> > > > > > in the preparation statement.
> > > > >
> > > > > Even shorter is
> > > > >
> > > > > "operands[0] = gen_lowpart (V4SImode, operands[0]);"
> > > > >
> > > > > and use (match_dup 0) instead of (match_dup 2) in the RTX.
> > > > >
> > > > > There are plenty of examples throughout sse.md.
> > > > >
> > > >
> > > > This works:
> > > >
> > > > (define_insn_and_split "*vec_dupv2si"
> > > > [(set (match_operand:V2SI 0 "register_operand" "=y,x,Yv")
> > > > (vec_duplicate:V2SI
> > > > (match_operand:SI 1 "register_operand" "0,0,Yv")))]
> > > > "TARGET_MMX || TARGET_MMX_WITH_SSE"
> > > > "@
> > > > punpckldq\t%0, %0
> > > > #
> > > > #"
> > > > "TARGET_MMX_WITH_SSE && reload_completed"
> > > > [(set (match_dup 0)
> > > > (vec_duplicate:V4SI (match_dup 1)))]
> > > > "operands[0] = gen_rtx_REG (V4SImode, REGNO (operands[0]));"
> > > > [(set_attr "mmx_isa" "native,x64_noavx,x64_avx")
> > > > (set_attr "type" "mmxcvt,ssemov,ssemov")
> > > > (set_attr "mode" "DI,TI,TI")])
> > >
> > > If it works, then gen_lowpart is preferred due to extra checks.
> > > However, it would result in a paradoxical subreg, so I wonder if these
> > > extra checks allow this transformation.
> >
> > gen_lowpart doesn't work:
>
> Ah, we need lowpart_subreg after reload.
>
> Uros.
"operands[0] = lowpart_subreg (V4SImode, operands[0],
GET_MODE (operands[0]));"
works.
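For readers following the thread, the split is only valid because the low 64 bits of a V4SI duplicate equal the V2SI duplicate. The following is a scalar C model of that equivalence, not GCC code: the function names are hypothetical, and the arrays stand in for the MMX-sized value and the XMM register.

```c
#include <stdint.h>
#include <string.h>

/* Model of (vec_duplicate:V2SI x): both 32-bit lanes hold x.  */
static void vec_dup_v2si(int32_t dst[2], int32_t x)
{
  dst[0] = x;
  dst[1] = x;
}

/* Model of (vec_duplicate:V4SI x): all four 32-bit lanes hold x.  */
static void vec_dup_v4si(int32_t dst[4], int32_t x)
{
  for (int i = 0; i < 4; i++)
    dst[i] = x;
}

/* Emulation: duplicate into the full 128-bit register, then read back
   only the low 64 bits, which is all the V2SI consumer looks at.  */
static void emulated_vec_dup_v2si(int32_t dst[2], int32_t x)
{
  int32_t xmm[4];
  vec_dup_v4si(xmm, x);
  memcpy(dst, xmm, 2 * sizeof(int32_t));
}
```

Calling vec_dup_v2si and emulated_vec_dup_v2si with the same input yields the same two lanes; the upper two lanes of the wide result are simply ignored.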
--
H.J.
* [PATCH 00/43] V2: Emulate MMX intrinsics with SSE
@ 2019-02-09 13:24 H.J. Lu
2019-02-09 13:24 ` [PATCH 12/43] i386: Emulate MMX vec_dupv2si " H.J. Lu
0 siblings, 1 reply; 13+ messages in thread
From: H.J. Lu @ 2019-02-09 13:24 UTC (permalink / raw)
To: gcc-patches; +Cc: Uros Bizjak
On x86-64, since __m64 is returned and passed in XMM registers, we can
emulate MMX intrinsics with SSE instructions. To support it, we added
#define TARGET_MMX_WITH_SSE \
(TARGET_64BIT && TARGET_SSE2 && !TARGET_3DNOW)
SSE emulation is disabled for 3DNOW since 3DNOW patterns haven't been
updated with SSE emulation.
;; Define instruction set of MMX instructions
(define_attr "mmx_isa" "base,native,x64,x64_noavx,x64_avx" (const_string "base"))
(eq_attr "mmx_isa" "native")
(symbol_ref "!TARGET_MMX_WITH_SSE")
(eq_attr "mmx_isa" "x64")
(symbol_ref "TARGET_MMX_WITH_SSE")
(eq_attr "mmx_isa" "x64_avx")
(symbol_ref "TARGET_MMX_WITH_SSE && TARGET_AVX")
(eq_attr "mmx_isa" "x64_noavx")
(symbol_ref "TARGET_MMX_WITH_SSE && !TARGET_AVX")
(define_register_constraint "Yx" "TARGET_MMX_WITH_SSE ? SSE_REGS : NO_REGS"
"@internal Any SSE register if MMX is disabled in 64-bit mode.")
(define_register_constraint "Yy"
"TARGET_MMX_WITH_SSE ? (TARGET_AVX512VL ? ALL_SSE_REGS : TARGET_SSE ? SSE_REGS : NO_REGS) : NO_REGS"
"@internal Any EVEX encodable SSE register for AVX512VL target, otherwise any SSE register if MMX is disabled in 64-bit mode.")
We added SSE emulation to MMX patterns and disabled MMX alternatives with
TARGET_MMX_WITH_SSE.
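As a rough illustration of how the mmx_isa values above select alternatives, here is a small C model. It is only a sketch: the struct, enum and function names are hypothetical, the flags are reduced to the four that appear in the conditions above, and treating "base" as always enabled is an assumption.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the target flags used in the conditions.  */
struct target_flags {
  bool is_64bit;   /* TARGET_64BIT */
  bool sse2;       /* TARGET_SSE2 */
  bool avx;        /* TARGET_AVX */
  bool has_3dnow;  /* TARGET_3DNOW */
};

enum mmx_isa {
  MMX_ISA_BASE, MMX_ISA_NATIVE, MMX_ISA_X64,
  MMX_ISA_X64_NOAVX, MMX_ISA_X64_AVX
};

/* Mirrors the TARGET_MMX_WITH_SSE definition quoted above.  */
static bool target_mmx_with_sse(const struct target_flags *t)
{
  return t->is_64bit && t->sse2 && !t->has_3dnow;
}

/* Returns whether an alternative tagged with ISA would be enabled.  */
static bool mmx_isa_enabled(enum mmx_isa isa, const struct target_flags *t)
{
  bool emul = target_mmx_with_sse(t);
  switch (isa) {
  case MMX_ISA_NATIVE:    return !emul;
  case MMX_ISA_X64:       return emul;
  case MMX_ISA_X64_AVX:   return emul && t->avx;
  case MMX_ISA_X64_NOAVX: return emul && !t->avx;
  default:                return true;  /* "base": assumed always enabled */
  }
}
```

For a pattern with mmx_isa "native,x64_noavx,x64_avx", a 64-bit SSE2 target without AVX would enable only the second alternative in this model, while a 32-bit target would enable only the first.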
Most MMX instructions have equivalent SSE versions, and the results of some
SSE versions need to be reshuffled into the right order for MMX. There are
a couple of tricky cases:
1. MMX maskmovq and SSE2 maskmovdqu aren't equivalent. We emulate MMX
maskmovq with SSE2 maskmovdqu by zeroing out the upper 64 bits of the
mask operand and handle unmapped bits 64:127 at memory address by
adjusting source and mask operands together with memory address.
2. MMX movntq is emulated with SSE2 DImode movnti, which is available
in 64-bit mode.
3. MMX pshufb takes a 3-bit index while SSE pshufb takes a 4-bit index.
SSE emulation must clear the bit 4 in the shuffle control mask.
4. To emulate MMX cvtpi2ps with SSE2 cvtdq2ps, we must properly preserve
the upper 64 bits of destination XMM register.
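Case 3 can be illustrated with a scalar C model of the two shuffles. This is a sketch, not the code in the patch: the function names are hypothetical, and it assumes the bit to clear is the fourth-lowest one (mask 0xf7 per control byte), since that is the bit the 128-bit shuffle reads as part of its index and the 64-bit shuffle ignores.

```c
#include <stdint.h>
#include <string.h>

/* 64-bit MMX pshufb: the low 3 bits of each control byte pick a source
   byte; a set bit 7 zeroes the result byte.  */
static void mmx_pshufb(uint8_t dst[8], const uint8_t src[8],
                       const uint8_t ctl[8])
{
  uint8_t out[8];
  for (int i = 0; i < 8; i++)
    out[i] = (ctl[i] & 0x80) ? 0 : src[ctl[i] & 0x07];
  memcpy(dst, out, 8);
}

/* 128-bit SSE pshufb: same rule, but the low 4 bits form the index.  */
static void sse_pshufb(uint8_t dst[16], const uint8_t src[16],
                       const uint8_t ctl[16])
{
  uint8_t out[16];
  for (int i = 0; i < 16; i++)
    out[i] = (ctl[i] & 0x80) ? 0 : src[ctl[i] & 0x0f];
  memcpy(dst, out, 16);
}

/* Emulation: place the operands in the low halves, clear the extra
   index bit in every control byte, shuffle, keep the low 8 bytes.  */
static void emulated_mmx_pshufb(uint8_t dst[8], const uint8_t src[8],
                                const uint8_t ctl[8])
{
  uint8_t wide_src[16], wide_ctl[16], wide_dst[16];
  memset(wide_src, 0xaa, 16);   /* upper half: junk that must not leak */
  memset(wide_ctl, 0, 16);
  memcpy(wide_src, src, 8);
  memcpy(wide_ctl, ctl, 8);
  for (int i = 0; i < 16; i++)
    wide_ctl[i] &= 0xf7;        /* index can no longer reach bytes 8..15 */
  sse_pshufb(wide_dst, wide_src, wide_ctl);
  memcpy(dst, wide_dst, 8);
}
```

Without the masking step, a control byte such as 0x0f would select byte 15 of the wide source instead of byte 7, so the emulated result would pick up the junk in the upper half.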
Tests are also added to check each SSE emulation of MMX intrinsics.
With SSE emulation in 64-bit mode, 8-byte vectorizer is enabled with SSE2.
There are no regressions on i686 and x86-64. For x86-64, GCC is also
tested with
--with-arch=native --with-cpu=native
on AVX2 and AVX512F machines.
H.J. Lu (43):
i386: Allow 64-bit vector modes in SSE registers
i386: Emulate MMX packsswb/packssdw/packuswb with SSE2
i386: Emulate MMX punpcklXX/punpckhXX with SSE punpcklXX
i386: Emulate MMX plusminus/sat_plusminus with SSE
i386: Emulate MMX mulv4hi3 with SSE
i386: Emulate MMX smulv4hi3_highpart with SSE
i386: Emulate MMX mmx_pmaddwd with SSE
i386: Emulate MMX ashr<mode>3/<shift_insn><mode>3 with SSE
i386: Emulate MMX <any_logic><mode>3 with SSE
i386: Emulate MMX mmx_andnot<mode>3 with SSE
i386: Emulate MMX mmx_eq/mmx_gt<mode>3 with SSE
i386: Emulate MMX vec_dupv2si with SSE
i386: Emulate MMX pshufw with SSE
i386: Emulate MMX sse_cvtps2pi/sse_cvttps2pi with SSE
i386: Emulate MMX sse_cvtpi2ps with SSE
i386: Emulate MMX mmx_pextrw with SSE
i386: Emulate MMX mmx_pinsrw with SSE
i386: Emulate MMX V4HI smaxmin/V8QI umaxmin with SSE
i386: Emulate MMX mmx_pmovmskb with SSE
i386: Emulate MMX mmx_umulv4hi3_highpart with SSE
i386: Emulate MMX maskmovq with SSE2 maskmovdqu
i386: Emulate MMX mmx_uavgv8qi3 with SSE
i386: Emulate MMX mmx_uavgv4hi3 with SSE
i386: Emulate MMX mmx_psadbw with SSE
i386: Emulate MMX movntq with SSE2 movntidi
i386: Emulate MMX umulv1siv1di3 with SSE2
i386: Emulate MMX ssse3_ph<plusminus_mnemonic>wv4hi3 with SSE
i386: Emulate MMX ssse3_ph<plusminus_mnemonic>dv2si3 with SSE
i386: Emulate MMX ssse3_pmaddubsw with SSE
i386: Emulate MMX ssse3_pmulhrswv4hi3 with SSE
i386: Emulate MMX pshufb with SSE version
i386: Emulate MMX ssse3_psign<mode>3 with SSE
i386: Emulate MMX ssse3_palignrdi with SSE
i386: Emulate MMX abs<mode>2 with SSE
i386: Allow MMXMODE moves with TARGET_MMX_WITH_SSE
i386: Allow MMX vector expanders with TARGET_MMX_WITH_SSE
i386: Allow MMX intrinsic emulation with SSE
i386: Add tests for MMX intrinsic emulations with SSE
i386: Also enable SSSE3 __m64 tests in 64-bit mode
i386: Enable 8-byte vectorizer for TARGET_MMX_WITH_SSE
i386: Implement V2SF add/sub/mul with SSE
i386: Implement V2SF <-> V2SI conversions with SSE
i386: Implement V2SF comparisons with SSE
gcc/config/i386/constraints.md | 10 +
gcc/config/i386/i386-builtin.def | 126 +--
gcc/config/i386/i386-protos.h | 4 +
gcc/config/i386/i386.c | 186 +++-
gcc/config/i386/i386.h | 20 +-
gcc/config/i386/i386.md | 15 +-
gcc/config/i386/mmintrin.h | 10 +-
gcc/config/i386/mmx.md | 909 +++++++++++++-----
gcc/config/i386/sse.md | 440 +++++++--
gcc/config/i386/xmmintrin.h | 61 ++
gcc/testsuite/gcc.dg/tree-ssa/pr84512.c | 2 +-
gcc/testsuite/gcc.target/i386/mmx-vals.h | 77 ++
gcc/testsuite/gcc.target/i386/pr82483-1.c | 2 +-
gcc/testsuite/gcc.target/i386/pr82483-2.c | 2 +-
gcc/testsuite/gcc.target/i386/pr89028-1.c | 10 +
gcc/testsuite/gcc.target/i386/pr89028-10.c | 39 +
gcc/testsuite/gcc.target/i386/pr89028-11.c | 39 +
gcc/testsuite/gcc.target/i386/pr89028-12.c | 39 +
gcc/testsuite/gcc.target/i386/pr89028-13.c | 39 +
gcc/testsuite/gcc.target/i386/pr89028-2.c | 11 +
gcc/testsuite/gcc.target/i386/pr89028-3.c | 14 +
gcc/testsuite/gcc.target/i386/pr89028-4.c | 14 +
gcc/testsuite/gcc.target/i386/pr89028-5.c | 11 +
gcc/testsuite/gcc.target/i386/pr89028-6.c | 14 +
gcc/testsuite/gcc.target/i386/pr89028-7.c | 14 +
gcc/testsuite/gcc.target/i386/pr89028-8.c | 12 +
gcc/testsuite/gcc.target/i386/pr89028-9.c | 12 +
gcc/testsuite/gcc.target/i386/sse2-mmx-10.c | 42 +
gcc/testsuite/gcc.target/i386/sse2-mmx-11.c | 39 +
gcc/testsuite/gcc.target/i386/sse2-mmx-12.c | 41 +
gcc/testsuite/gcc.target/i386/sse2-mmx-13.c | 40 +
gcc/testsuite/gcc.target/i386/sse2-mmx-14.c | 30 +
gcc/testsuite/gcc.target/i386/sse2-mmx-15.c | 35 +
gcc/testsuite/gcc.target/i386/sse2-mmx-16.c | 39 +
gcc/testsuite/gcc.target/i386/sse2-mmx-17.c | 50 +
gcc/testsuite/gcc.target/i386/sse2-mmx-18.c | 13 +
gcc/testsuite/gcc.target/i386/sse2-mmx-19.c | 11 +
gcc/testsuite/gcc.target/i386/sse2-mmx-2.c | 12 +
gcc/testsuite/gcc.target/i386/sse2-mmx-20.c | 11 +
gcc/testsuite/gcc.target/i386/sse2-mmx-21.c | 13 +
gcc/testsuite/gcc.target/i386/sse2-mmx-3.c | 12 +
gcc/testsuite/gcc.target/i386/sse2-mmx-4.c | 4 +
gcc/testsuite/gcc.target/i386/sse2-mmx-5.c | 12 +
gcc/testsuite/gcc.target/i386/sse2-mmx-6.c | 12 +
gcc/testsuite/gcc.target/i386/sse2-mmx-7.c | 12 +
gcc/testsuite/gcc.target/i386/sse2-mmx-8.c | 4 +
gcc/testsuite/gcc.target/i386/sse2-mmx-9.c | 79 ++
.../gcc.target/i386/sse2-mmx-cvtpi2ps.c | 42 +
.../gcc.target/i386/sse2-mmx-cvtps2pi.c | 35 +
.../gcc.target/i386/sse2-mmx-cvttps2pi.c | 35 +
.../gcc.target/i386/sse2-mmx-maskmovq.c | 98 ++
.../gcc.target/i386/sse2-mmx-packssdw.c | 51 +
.../gcc.target/i386/sse2-mmx-packsswb.c | 51 +
.../gcc.target/i386/sse2-mmx-packuswb.c | 51 +
.../gcc.target/i386/sse2-mmx-paddb.c | 47 +
.../gcc.target/i386/sse2-mmx-paddd.c | 47 +
.../gcc.target/i386/sse2-mmx-paddq.c | 42 +
.../gcc.target/i386/sse2-mmx-paddsb.c | 47 +
.../gcc.target/i386/sse2-mmx-paddsw.c | 47 +
.../gcc.target/i386/sse2-mmx-paddusb.c | 47 +
.../gcc.target/i386/sse2-mmx-paddusw.c | 47 +
.../gcc.target/i386/sse2-mmx-paddw.c | 47 +
gcc/testsuite/gcc.target/i386/sse2-mmx-pand.c | 43 +
.../gcc.target/i386/sse2-mmx-pandn.c | 43 +
.../gcc.target/i386/sse2-mmx-pavgb.c | 51 +
.../gcc.target/i386/sse2-mmx-pavgw.c | 51 +
.../gcc.target/i386/sse2-mmx-pcmpeqb.c | 47 +
.../gcc.target/i386/sse2-mmx-pcmpeqd.c | 47 +
.../gcc.target/i386/sse2-mmx-pcmpeqw.c | 47 +
.../gcc.target/i386/sse2-mmx-pcmpgtb.c | 47 +
.../gcc.target/i386/sse2-mmx-pcmpgtd.c | 47 +
.../gcc.target/i386/sse2-mmx-pcmpgtw.c | 47 +
.../gcc.target/i386/sse2-mmx-pextrw.c | 58 ++
.../gcc.target/i386/sse2-mmx-pinsrw.c | 60 ++
.../gcc.target/i386/sse2-mmx-pmaddwd.c | 46 +
.../gcc.target/i386/sse2-mmx-pmaxsw.c | 47 +
.../gcc.target/i386/sse2-mmx-pmaxub.c | 47 +
.../gcc.target/i386/sse2-mmx-pminsw.c | 47 +
.../gcc.target/i386/sse2-mmx-pminub.c | 47 +
.../gcc.target/i386/sse2-mmx-pmovmskb.c | 45 +
.../gcc.target/i386/sse2-mmx-pmulhuw.c | 50 +
.../gcc.target/i386/sse2-mmx-pmulhw.c | 52 +
.../gcc.target/i386/sse2-mmx-pmullw.c | 51 +
.../gcc.target/i386/sse2-mmx-pmuludq.c | 46 +
gcc/testsuite/gcc.target/i386/sse2-mmx-por.c | 43 +
.../gcc.target/i386/sse2-mmx-psadbw.c | 57 ++
.../gcc.target/i386/sse2-mmx-pshufw.c | 247 +++++
.../gcc.target/i386/sse2-mmx-pslld.c | 51 +
.../gcc.target/i386/sse2-mmx-pslldi.c | 152 +++
.../gcc.target/i386/sse2-mmx-psllq.c | 46 +
.../gcc.target/i386/sse2-mmx-psllqi.c | 244 +++++
.../gcc.target/i386/sse2-mmx-psllw.c | 51 +
.../gcc.target/i386/sse2-mmx-psllwi.c | 104 ++
.../gcc.target/i386/sse2-mmx-psrad.c | 51 +
.../gcc.target/i386/sse2-mmx-psradi.c | 152 +++
.../gcc.target/i386/sse2-mmx-psraw.c | 51 +
.../gcc.target/i386/sse2-mmx-psrawi.c | 104 ++
.../gcc.target/i386/sse2-mmx-psrld.c | 51 +
.../gcc.target/i386/sse2-mmx-psrldi.c | 152 +++
.../gcc.target/i386/sse2-mmx-psrlq.c | 46 +
.../gcc.target/i386/sse2-mmx-psrlqi.c | 244 +++++
.../gcc.target/i386/sse2-mmx-psrlw.c | 51 +
.../gcc.target/i386/sse2-mmx-psrlwi.c | 104 ++
.../gcc.target/i386/sse2-mmx-psubb.c | 47 +
.../gcc.target/i386/sse2-mmx-psubd.c | 47 +
.../gcc.target/i386/sse2-mmx-psubq.c | 42 +
.../gcc.target/i386/sse2-mmx-psubusb.c | 47 +
.../gcc.target/i386/sse2-mmx-psubusw.c | 47 +
.../gcc.target/i386/sse2-mmx-psubw.c | 47 +
.../gcc.target/i386/sse2-mmx-punpckhbw.c | 52 +
.../gcc.target/i386/sse2-mmx-punpckhdq.c | 46 +
.../gcc.target/i386/sse2-mmx-punpckhwd.c | 48 +
.../gcc.target/i386/sse2-mmx-punpcklbw.c | 52 +
.../gcc.target/i386/sse2-mmx-punpckldq.c | 46 +
.../gcc.target/i386/sse2-mmx-punpcklwd.c | 48 +
gcc/testsuite/gcc.target/i386/sse2-mmx-pxor.c | 43 +
gcc/testsuite/gcc.target/i386/sse2-mmx.c | 1 -
gcc/testsuite/gcc.target/i386/ssse3-pabsb.c | 4 +-
gcc/testsuite/gcc.target/i386/ssse3-pabsd.c | 4 +-
gcc/testsuite/gcc.target/i386/ssse3-pabsw.c | 4 +-
gcc/testsuite/gcc.target/i386/ssse3-palignr.c | 6 +-
gcc/testsuite/gcc.target/i386/ssse3-phaddd.c | 4 +-
gcc/testsuite/gcc.target/i386/ssse3-phaddsw.c | 4 +-
gcc/testsuite/gcc.target/i386/ssse3-phaddw.c | 4 +-
gcc/testsuite/gcc.target/i386/ssse3-phsubd.c | 4 +-
gcc/testsuite/gcc.target/i386/ssse3-phsubsw.c | 4 +-
gcc/testsuite/gcc.target/i386/ssse3-phsubw.c | 4 +-
.../gcc.target/i386/ssse3-pmaddubsw.c | 4 +-
.../gcc.target/i386/ssse3-pmulhrsw.c | 4 +-
gcc/testsuite/gcc.target/i386/ssse3-pshufb.c | 6 +-
gcc/testsuite/gcc.target/i386/ssse3-psignb.c | 4 +-
gcc/testsuite/gcc.target/i386/ssse3-psignd.c | 4 +-
gcc/testsuite/gcc.target/i386/ssse3-psignw.c | 4 +-
133 files changed, 6675 insertions(+), 450 deletions(-)
create mode 100644 gcc/testsuite/gcc.target/i386/mmx-vals.h
create mode 100644 gcc/testsuite/gcc.target/i386/pr89028-1.c
create mode 100644 gcc/testsuite/gcc.target/i386/pr89028-10.c
create mode 100644 gcc/testsuite/gcc.target/i386/pr89028-11.c
create mode 100644 gcc/testsuite/gcc.target/i386/pr89028-12.c
create mode 100644 gcc/testsuite/gcc.target/i386/pr89028-13.c
create mode 100644 gcc/testsuite/gcc.target/i386/pr89028-2.c
create mode 100644 gcc/testsuite/gcc.target/i386/pr89028-3.c
create mode 100644 gcc/testsuite/gcc.target/i386/pr89028-4.c
create mode 100644 gcc/testsuite/gcc.target/i386/pr89028-5.c
create mode 100644 gcc/testsuite/gcc.target/i386/pr89028-6.c
create mode 100644 gcc/testsuite/gcc.target/i386/pr89028-7.c
create mode 100644 gcc/testsuite/gcc.target/i386/pr89028-8.c
create mode 100644 gcc/testsuite/gcc.target/i386/pr89028-9.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-10.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-11.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-12.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-13.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-14.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-15.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-16.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-17.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-18.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-19.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-2.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-20.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-21.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-3.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-4.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-5.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-6.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-7.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-8.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-9.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-cvtpi2ps.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-cvtps2pi.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-cvttps2pi.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-maskmovq.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-packssdw.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-packsswb.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-packuswb.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-paddb.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-paddd.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-paddq.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-paddsb.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-paddsw.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-paddusb.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-paddusw.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-paddw.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-pand.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-pandn.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-pavgb.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-pavgw.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-pcmpeqb.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-pcmpeqd.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-pcmpeqw.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-pcmpgtb.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-pcmpgtd.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-pcmpgtw.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-pextrw.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-pinsrw.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-pmaddwd.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-pmaxsw.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-pmaxub.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-pminsw.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-pminub.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-pmovmskb.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-pmulhuw.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-pmulhw.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-pmullw.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-pmuludq.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-por.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-psadbw.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-pshufw.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-pslld.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-pslldi.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-psllq.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-psllqi.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-psllw.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-psllwi.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-psrad.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-psradi.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-psraw.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-psrawi.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-psrld.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-psrldi.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-psrlq.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-psrlqi.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-psrlw.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-psrlwi.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-psubb.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-psubd.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-psubq.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-psubusb.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-psubusw.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-psubw.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-punpckhbw.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-punpckhdq.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-punpckhwd.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-punpcklbw.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-punpckldq.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-punpcklwd.c
create mode 100644 gcc/testsuite/gcc.target/i386/sse2-mmx-pxor.c
--
2.20.1
* [PATCH 12/43] i386: Emulate MMX vec_dupv2si with SSE
2019-02-09 13:24 [PATCH 00/43] V2: Emulate MMX intrinsics " H.J. Lu
@ 2019-02-09 13:24 ` H.J. Lu
0 siblings, 0 replies; 13+ messages in thread
From: H.J. Lu @ 2019-02-09 13:24 UTC (permalink / raw)
To: gcc-patches; +Cc: Uros Bizjak
Emulate MMX vec_dupv2si with SSE. Only an SSE register source operand is
allowed.
PR target/89021
* config/i386/mmx.md (*vec_dupv2si): Changed to
define_insn_and_split and also allow TARGET_MMX_WITH_SSE to
support SSE emulation.
* config/i386/sse.md (*vec_dupv4si): Renamed to ...
(vec_dupv4si): This.
---
gcc/config/i386/mmx.md | 27 ++++++++++++++++++++-------
gcc/config/i386/sse.md | 2 +-
2 files changed, 21 insertions(+), 8 deletions(-)
diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index 73fdef3ba1e..e31c3f5c366 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -1420,14 +1420,27 @@
(set_attr "length_immediate" "1")
(set_attr "mode" "DI")])
-(define_insn "*vec_dupv2si"
- [(set (match_operand:V2SI 0 "register_operand" "=y")
+(define_insn_and_split "*vec_dupv2si"
+ [(set (match_operand:V2SI 0 "register_operand" "=y,Yx,Yy")
(vec_duplicate:V2SI
- (match_operand:SI 1 "register_operand" "0")))]
- "TARGET_MMX"
- "punpckldq\t%0, %0"
- [(set_attr "type" "mmxcvt")
- (set_attr "mode" "DI")])
+ (match_operand:SI 1 "register_operand" "0,0,Yy")))]
+ "TARGET_MMX || TARGET_MMX_WITH_SSE"
+ "@
+ punpckldq\t%0, %0
+ #
+ #"
+ "&& reload_completed && TARGET_MMX_WITH_SSE"
+ [(const_int 0)]
+{
+ /* Emulate MMX vec_dupv2si with SSE vec_dupv4si. */
+ rtx op0 = gen_rtx_REG (V4SImode, REGNO (operands[0]));
+ rtx insn = gen_vec_dupv4si (op0, operands[1]);
+ emit_insn (insn);
+ DONE;
+}
+ [(set_attr "mmx_isa" "native,x64_noavx,x64_avx")
+ (set_attr "type" "mmxcvt,ssemov,ssemov")
+ (set_attr "mode" "DI,TI,TI")])
(define_insn "*mmx_concatv2si"
[(set (match_operand:V2SI 0 "register_operand" "=y,y")
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 5dc0930ac1f..7d2c0367911 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -18976,7 +18976,7 @@
(set_attr "prefix" "maybe_evex,maybe_evex,orig")
(set_attr "mode" "V4SF")])
-(define_insn "*vec_dupv4si"
+(define_insn "vec_dupv4si"
[(set (match_operand:V4SI 0 "register_operand" "=v,v,x")
(vec_duplicate:V4SI
(match_operand:SI 1 "nonimmediate_operand" "Yv,m,0")))]
--
2.20.1
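As a rough illustration of why the split in the patch above is plausible (this is not code from the patch): duplicating a 32-bit value into both lanes of a 64-bit V2SI gives the same bits as broadcasting it into all four lanes of a 128-bit V4SI and keeping only the low 64 bits. The sketch below uses GCC/Clang generic vector types and assumes a 32-bit int; the names v2si, v4si, dup_v2si, dup_v2si_via_v4si, dup_matches and self_check are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <string.h>

/* Generic vector types (GCC/Clang extension); the names are illustrative.  */
typedef int v2si __attribute__ ((vector_size (8)));   /* 2 x 32-bit, MMX-sized */
typedef int v4si __attribute__ ((vector_size (16)));  /* 4 x 32-bit, SSE-sized */

/* Reference semantics of (vec_duplicate:V2SI x): both lanes hold x.  */
static void
dup_v2si (int x, int out[2])
{
  v2si r = { x, x };
  memcpy (out, &r, sizeof r);
}

/* Emulation sketch: broadcast x into all four lanes of a 128-bit vector,
   then keep only the first two lanes (64 bits).  */
static void
dup_v2si_via_v4si (int x, int out[2])
{
  v4si wide = { x, x, x, x };
  memcpy (out, &wide, 2 * sizeof (int));
}

/* Return 1 when both approaches produce { x, x }.  */
static int
dup_matches (int x)
{
  int a[2], b[2];
  dup_v2si (x, a);
  dup_v2si_via_v4si (x, b);
  return a[0] == x && a[1] == x && memcmp (a, b, sizeof a) == 0;
}

/* Small self-check over a few sample values.  */
static void
self_check (void)
{
  assert (dup_matches (0));
  assert (dup_matches (42));
  assert (dup_matches (-7));
}
```

Calling self_check () from a test driver exercises both paths. The upper 64 bits of the wide vector are simply ignored here, which mirrors the idea that the emulated MMX value lives in the low half of an SSE register.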
Thread overview: 13+ messages
2019-02-11 13:11 [PATCH 12/43] i386: Emulate MMX vec_dupv2si with SSE graham stott via gcc-patches
2019-02-11 13:14 ` H.J. Lu
-- strict thread matches above, loose matches on Subject: below --
2019-02-10 0:19 [PATCH 00/43] V3: Emulate MMX intrinsics " H.J. Lu
2019-02-10 0:20 ` [PATCH 12/43] i386: Emulate MMX vec_dupv2si " H.J. Lu
2019-02-10 10:36 ` Uros Bizjak
2019-02-10 21:01 ` H.J. Lu
2019-02-10 21:46 ` Uros Bizjak
2019-02-10 21:49 ` Uros Bizjak
2019-02-11 1:04 ` H.J. Lu
2019-02-11 7:25 ` Uros Bizjak
2019-02-11 12:27 ` H.J. Lu
2019-02-11 12:51 ` Uros Bizjak
2019-02-11 13:12 ` H.J. Lu
2019-02-09 13:24 [PATCH 00/43] V2: Emulate MMX intrinsics " H.J. Lu
2019-02-09 13:24 ` [PATCH 12/43] i386: Emulate MMX vec_dupv2si " H.J. Lu