PING^2: [PATCH] i386: Properly encode xmm16-xmm31/ymm16-ymm31 for vector move

From: "H.J. Lu" <hjl.tools@gmail.com>
Cc: GCC Patches <gcc-patches@gcc.gnu.org>,
	Jan Hubicka <hubicka@ucw.cz>, 	Uros Bizjak <ubizjak@gmail.com>
Subject: PING^2: [PATCH] i386: Properly encode xmm16-xmm31/ymm16-ymm31 for vector move
Date: Fri, 31 May 2019 17:43:00 -0000
Message-ID: <CAMe9rOr9x4-4hSNSAs-y4Yrm0WuGhmKXZ5Nsf=TEh0vt3rPjOw@mail.gmail.com>
In-Reply-To: <CAMe9rOoh-HSExH7BBcg6vX3D0cjXkxA2tu1j5+JBkRR31Bg0zg@mail.gmail.com>

On Tue, May 21, 2019 at 2:43 PM H.J. Lu <hjl.tools@gmail.com> wrote:
>
> On Fri, Feb 22, 2019 at 8:25 AM H.J. Lu <hongjiu.lu@intel.com> wrote:
> >
> > Hi Jan, Uros,
> >
> > This patch fixes the wrong-code bug:
> >
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89229
> >
> > Tested on AVX2 and AVX512 with and without --with-arch=native.
> >
> > OK for trunk?
> >
> > Thanks.
> >
> > H.J.
> > --
> > The i386 backend has
> >
> > INT_MODE (OI, 32);
> > INT_MODE (XI, 64);
> >
> > So XImode represents a 64-byte integer, i.e. a 64 * 8 = 512-bit
> > operation; in the case of const_1, all 512 bits are set.
> >
> > We can load zeros with a narrower instruction (e.g. a 128-bit xor
> > clears 256 bits because it inherently zeroes the high part), so TImode
> > is used in this case.
> >
> > Some targets prefer V4SF mode, so they will emit float xorps for zeroing.
> >
> > sse.md has
> >
> > (define_insn "mov<mode>_internal"
> >   [(set (match_operand:VMOVE 0 "nonimmediate_operand"
> >          "=v,v ,v ,m")
> >         (match_operand:VMOVE 1 "nonimmediate_or_sse_const_operand"
> >          " C,BC,vm,v"))]
> > ....
> >       /* There is no evex-encoded vmov* for sizes smaller than 64-bytes
> >          in avx512f, so we need to use workarounds, to access sse registers
> >          16-31, which are evex-only. In avx512vl we don't need workarounds.  */
> >       if (TARGET_AVX512F && <MODE_SIZE> < 64 && !TARGET_AVX512VL
> >           && (EXT_REX_SSE_REG_P (operands[0])
> >               || EXT_REX_SSE_REG_P (operands[1])))
> >         {
> >           if (memory_operand (operands[0], <MODE>mode))
> >             {
> >               if (<MODE_SIZE> == 32)
> >                 return "vextract<shuffletype>64x4\t{$0x0, %g1, %0|%0, %g1, 0x0}";
> >               else if (<MODE_SIZE> == 16)
> >                 return "vextract<shuffletype>32x4\t{$0x0, %g1, %0|%0, %g1, 0x0}";
> >               else
> >                 gcc_unreachable ();
> >             }
> > ...
> >
> > However, since ix86_hard_regno_mode_ok has
> >
> >      /* TODO check for QI/HI scalars.  */
> >       /* AVX512VL allows sse regs16+ for 128/256 bit modes.  */
> >       if (TARGET_AVX512VL
> >           && (mode == OImode
> >               || mode == TImode
> >               || VALID_AVX256_REG_MODE (mode)
> >               || VALID_AVX512VL_128_REG_MODE (mode)))
> >         return true;
> >
> >       /* xmm16-xmm31 are only available for AVX-512.  */
> >       if (EXT_REX_SSE_REGNO_P (regno))
> >         return false;
> >
> > the check
> >
> >       if (TARGET_AVX512F && <MODE_SIZE> < 64 && !TARGET_AVX512VL
> >           && (EXT_REX_SSE_REG_P (operands[0])
> >               || EXT_REX_SSE_REG_P (operands[1])))
> >
> > is dead code.
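
To make the dead-code argument concrete, here is a minimal sketch in plain C, not GCC source. The struct, the function names and the reduction of modes to byte sizes are hypothetical, and the rule that 64-byte modes are accepted under AVX512F is an assumption that does not appear in the excerpt above. It enumerates the flag, register and size combinations and counts those where the register is allowed to hold the mode and the workaround condition is also true.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the target flags; not GCC's real globals.  */
struct target { bool avx512f; bool avx512vl; };

/* Simplified model of the quoted part of ix86_hard_regno_mode_ok for an
   SSE register.  REGNO is 0..31, MODE_SIZE is the vector size in bytes.  */
static bool
model_hard_regno_mode_ok (struct target t, int regno, int mode_size)
{
  /* Assumption (not in the quoted excerpt): 64-byte modes are accepted
     in any SSE register when AVX512F is enabled.  */
  if (t.avx512f && mode_size == 64)
    return true;
  /* Quoted: AVX512VL allows registers 16+ for 128/256-bit modes.  */
  if (t.avx512vl && (mode_size == 16 || mode_size == 32))
    return true;
  /* Quoted: otherwise registers 16-31 are rejected.  */
  if (regno >= 16)
    return false;
  return mode_size == 16 || mode_size == 32;
}

/* The condition guarding the workaround in mov<mode>_internal.  */
static bool
workaround_condition (struct target t, int regno, int mode_size)
{
  return t.avx512f && mode_size < 64 && !t.avx512vl && regno >= 16;
}

/* Count the combinations where the register may hold the mode and the
   workaround condition holds at the same time.  */
static int
count_reachable_workarounds (void)
{
  static const int sizes[] = { 16, 32, 64 };
  int hits = 0;
  for (int f = 0; f <= 1; f++)
    for (int vl = 0; vl <= 1; vl++)
      for (int regno = 0; regno < 32; regno++)
        for (int i = 0; i < 3; i++)
          {
            struct target t = { f != 0, vl != 0 };
            if (model_hard_regno_mode_ok (t, regno, sizes[i])
                && workaround_condition (t, regno, sizes[i]))
              hits++;
          }
  return hits;
}

/* In this model the workaround branch is never reachable.  */
static void
check_workaround_is_dead (void)
{
  assert (count_reachable_workarounds () == 0);
}
```

In this model the count is zero: without AVX512VL a 16- or 32-byte mode is never placed in registers 16-31, so the branch cannot be taken.
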
> >
> > Also for
> >
> > long long *p;
> > volatile __m256i yy;
> >
> > void
> > foo (void)
> > {
> >    _mm256_store_epi64 (p, yy);
> > }
> >
> > with AVX512VL, we should generate
> >
> >         vmovdqa         %ymm0, (%rax)
> >
> > not
> >
> >         vmovdqa64       %ymm0, (%rax)
> >
> > All TYPE_SSEMOV vector moves are consolidated into ix86_output_ssemov:
> >
> > 1. If xmm16-xmm31/ymm16-ymm31 registers aren't used, SSE/AVX vector
> > moves will be generated.
> > 2. If xmm16-xmm31/ymm16-ymm31 registers are used:
> >    a. With AVX512VL, AVX512VL vector moves will be generated.
> >    b. Without AVX512VL, xmm16-xmm31/ymm16-ymm31 register-to-register
> >       moves will be done with zmm register moves.
> >
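
The rules above can be sketched as a small decision function. This is only an illustration for an aligned integer vector move of 16 or 32 bytes on an AVX-enabled target; it is not the code of ix86_output_ssemov. The struct and function names are hypothetical, and the mnemonics are the two that appear in the example above (vmovdqa and vmovdqa64).

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical result type: the mnemonic to emit and the width, in bytes,
   of the register name to print.  */
struct move_choice { const char *mnemonic; int reg_bytes; };

/* Pick the move for an aligned 16- or 32-byte integer vector.
   USES_EXT_REG is true if any operand is xmm16-xmm31/ymm16-ymm31.
   Rule 2b only covers register-to-register moves.  */
static struct move_choice
choose_aligned_int_move (bool avx512vl, bool uses_ext_reg, int mode_size)
{
  struct move_choice c;

  assert (mode_size == 16 || mode_size == 32);

  if (!uses_ext_reg)
    {
      /* Rule 1: plain SSE/AVX move, register printed at its own width.  */
      c.mnemonic = "vmovdqa";
      c.reg_bytes = mode_size;
    }
  else if (avx512vl)
    {
      /* Rule 2a: AVX512VL move on the xmm/ymm register itself.  */
      c.mnemonic = "vmovdqa64";
      c.reg_bytes = mode_size;
    }
  else
    {
      /* Rule 2b: no AVX512VL, so move the whole 64-byte zmm register.  */
      c.mnemonic = "vmovdqa64";
      c.reg_bytes = 64;
    }
  return c;
}

/* True if the chosen mnemonic is the EVEX-only form.  */
static bool
uses_evex_mnemonic (struct move_choice c)
{
  return strcmp (c.mnemonic, "vmovdqa64") == 0;
}
```

For the _mm256_store_epi64 example, no extended register is involved, so the sketch picks vmovdqa with a 32-byte (ymm) register even when AVX512VL is enabled.
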
> > ext_sse_reg_operand is removed since it is no longer needed.
> >
> > Tested on AVX2 and AVX512 with and without --with-arch=native.
> >
> > gcc/
> >
> >         PR target/89229
> >         PR target/89346
> >         * config/i386/i386-protos.h (ix86_output_ssemov): New prototype.
> >         * config/i386/i386.c (ix86_get_ssemov): New function.
> >         (ix86_output_ssemov): Likewise.
> >         * config/i386/i386.md (*movxi_internal_avx512f): Call
> >         ix86_output_ssemov for TYPE_SSEMOV.
> >         (*movoi_internal_avx): Call ix86_output_ssemov for TYPE_SSEMOV.
> >         Remove ext_sse_reg_operand and TARGET_AVX512VL check.
> >         (*movti_internal): Likewise.
> >         (*movdi_internal): Call ix86_output_ssemov for TYPE_SSEMOV.
> >         Remove ext_sse_reg_operand check.
> >         (*movsi_internal): Likewise.
> >         (*movtf_internal): Call ix86_output_ssemov for TYPE_SSEMOV.
> >         (*movdf_internal): Call ix86_output_ssemov for TYPE_SSEMOV.
> >         Remove TARGET_AVX512F, TARGET_PREFER_AVX256, TARGET_AVX512VL
> >         and ext_sse_reg_operand check.
> >         (*movsf_internal_avx): Call ix86_output_ssemov for TYPE_SSEMOV.
> >         Remove TARGET_PREFER_AVX256, TARGET_AVX512VL and
> >         ext_sse_reg_operand check.
> >         * config/i386/mmx.md (MMXMODE:*mov<mode>_internal): Call
> >         ix86_output_ssemov for TYPE_SSEMOV.  Remove ext_sse_reg_operand
> >         check.
> >         * config/i386/sse.md (VMOVE:mov<mode>_internal): Call
> >         ix86_output_ssemov for TYPE_SSEMOV.  Remove TARGET_AVX512VL
> >         check.
> >         * config/i386/predicates.md (ext_sse_reg_operand): Removed.
> >
> > gcc/testsuite/
> >
> >         PR target/89229
> >         PR target/89346
> >         * gcc.target/i386/avx512vl-vmovdqa64-1.c: Updated.
> >         * gcc.target/i386/pr89229-2a.c: New test.
> >         * gcc.target/i386/pr89229-2b.c: Likewise.
> >         * gcc.target/i386/pr89229-2c.c: Likewise.
> >         * gcc.target/i386/pr89229-3a.c: Likewise.
> >         * gcc.target/i386/pr89229-3b.c: Likewise.
> >         * gcc.target/i386/pr89229-3c.c: Likewise.
> >         * gcc.target/i386/pr89229-4a.c: Likewise.
> >         * gcc.target/i386/pr89229-4b.c: Likewise.
> >         * gcc.target/i386/pr89229-4c.c: Likewise.
> >         * gcc.target/i386/pr89229-5a.c: Likewise.
> >         * gcc.target/i386/pr89229-5b.c: Likewise.
> >         * gcc.target/i386/pr89229-5c.c: Likewise.
> >         * gcc.target/i386/pr89229-6a.c: Likewise.
> >         * gcc.target/i386/pr89229-6b.c: Likewise.
> >         * gcc.target/i386/pr89229-6c.c: Likewise.
> >         * gcc.target/i386/pr89229-7a.c: Likewise.
> >         * gcc.target/i386/pr89229-7b.c: Likewise.
> >         * gcc.target/i386/pr89229-7c.c: Likewise.
> > ---
>
> PING:
>
> https://gcc.gnu.org/ml/gcc-patches/2019-02/msg01841.html
>
>

PING.

-- 
H.J.

Thread overview: 9+ messages
2019-02-22 16:42 H.J. Lu
2019-05-21 21:44 ` PING^1: " H.J. Lu
2019-05-31 17:43   ` H.J. Lu [this message]
2019-06-18 16:00     ` PING^3: " H.J. Lu
2019-07-08 15:28       ` PING^4: " H.J. Lu
2020-01-27 20:03         ` PING^5: " H.J. Lu
2020-02-07  4:18           ` PING^6: " H.J. Lu
2020-02-13 13:08             ` PING^7: " H.J. Lu
2019-07-22 23:17 ` Jeff Law