Re: [x86 PATCH] UNSPEC_PALIGNR optimizations and clean-ups.

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

From: Hongtao Liu <crazylht@gmail.com>
To: Roger Sayle <roger@nextmovesoftware.com>
Cc: GCC Patches <gcc-patches@gcc.gnu.org>
Subject: Re: [x86 PATCH] UNSPEC_PALIGNR optimizations and clean-ups.
Date: Fri, 1 Jul 2022 10:40:03 +0800	[thread overview]
Message-ID: <CAMZc-byU6460h3GrPDVdBVpma47PUhdpCXfSRnpam6d5E9wSrA@mail.gmail.com> (raw)
In-Reply-To: <CAMZc-bzjAP8PDz1=OWsaYPgLQ1qDh=p5kSL7eiXAGbEACT1irg@mail.gmail.com>

On Fri, Jul 1, 2022 at 10:12 AM Hongtao Liu <crazylht@gmail.com> wrote:
>
> On Fri, Jul 1, 2022 at 2:42 AM Roger Sayle <roger@nextmovesoftware.com> wrote:
> >
> >
> > This patch is a follow-up to Hongtao's fix for PR target/105854.  That
> > fix is perfectly correct, but the thing that caught my eye was why is
> > the compiler generating a shift by zero at all.  Digging deeper it
> > turns out that we can easily optimize __builtin_ia32_palignr for
> > alignments of 0 and 64 respectively, which may be simplified to moves
> > from the highpart or lowpart.
> >
> > After adding optimizations to simplify the 64-bit DImode palignr,
> > I started to add the corresponding optimizations for vpalignr (i.e.
> > 128-bit).  The first oddity is that sse.md uses TImode and a special
> > SSESCALARMODE iterator, rather than V1TImode, and indeed the comment
> > above SSESCALARMODE hints that this should be "dropped in favor of
> > VIMAX_AVX2_AVX512BW".  Hence this patch includes the migration of
> > <ssse3_avx2>_palignr<mode> to use VIMAX_AVX2_AVX512BW, basically
> > using V1TImode instead of TImode for 128-bit palignr.
> >
> > But it was only after I'd implemented this clean-up that I stumbled
> > across the strange semantics of 128-bit [v]palignr.  According to
> > https://www.felixcloutier.com/x86/palignr, the semantics are subtly
> > different based upon how the instruction is encoded.  PALIGNR leaves
> > the highpart unmodified, whilst VEX.128 encoded VPALIGNR clears the
> > highpart, and (unless I'm mistaken) it looks like GCC currently uses
> > the exact same RTL/templates for both, treating one as an alternative
> > for the other.
> I think as long as patterns or intrinsics only care about the low
> part, they should be ok.
> But if we want to use default behavior for upper bits, we need to
> restrict them under specific isa(.i.e. vmovq in vec_set<mode>_0).
> Generally, 128-bit sse legacy instructions have different behaviors
> for upper bits from AVX ones, and that's why vzeroupper is introduced
> for sse <-> avx instructions transition.
> >
> > Hence I thought I'd post what I have so far (part optimization and
> > part clean-up), to then ask the x86 experts for their opinions.
> >
> > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> > and make -k check, both with and without --target_board=unix{-,32},
> > with no new failures.  Ok for mainline?
> >
> >
> > 2022-06-30  Roger Sayle  <roger@nextmovesoftware.com>
> >
> > gcc/ChangeLog
> >         * config/i386/i386-builtin.def (__builtin_ia32_palignr128): Change
> >         CODE_FOR_ssse3_palignrti to CODE_FOR_ssse3_palignrv1ti.
> >         * config/i386/i386-expand.cc (expand_vec_perm_palignr): Use V1TImode
> >         and gen_ssse3_palignv1ti instead of TImode.
> >         * config/i386/sse.md (SSESCALARMODE): Delete.
> >         (define_mode_attr ssse3_avx2): Handle V1TImode instead of TImode.
> >         (<ssse3_avx2>_palignr<mode>): Use VIMAX_AVX2_AVX512BW as a mode
> >         iterator instead of SSESCALARMODE.
> >
> >         (ssse3_palignrdi): Optimize cases when operands[3] is 0 or 64,
> >         using a single move instruction (if required).
> >         (define_split): Likewise split UNSPEC_PALIGNR $0 into a move.
> >         (define_split): Likewise split UNSPEC_PALIGNR $64 into a move.
> >
> > gcc/testsuite/ChangeLog
> >         * gcc.target/i386/ssse3-palignr-2.c: New test case.
> >
> >
> > Thanks in advance,
> > Roger
> > --
> >
>
> +(define_split
> +  [(set (match_operand:DI 0 "register_operand")
> + (unspec:DI [(match_operand:DI 1 "register_operand")
> +    (match_operand:DI 2 "register_mmxmem_operand")
> +    (const_int 0)]
> +   UNSPEC_PALIGNR))]
> +  ""
> +  [(set (match_dup 0) (match_dup 2))])
> +
> +(define_split
> +  [(set (match_operand:DI 0 "register_operand")
> + (unspec:DI [(match_operand:DI 1 "register_operand")
> +    (match_operand:DI 2 "register_mmxmem_operand")
> +    (const_int 64)]
> +   UNSPEC_PALIGNR))]
> +  ""
> +  [(set (match_dup 0) (match_dup 1))])
> +
> define_split is assumed to be splitted to 2(or more) insns, hence
> pass_combine will only try define_split if the number of merged insns
> is greater than 2.
> For palignr, i think most time there would be only 2 merged
> insns(constant propagation), so better to change them as pre_reload
> splitter.
> (.i.e. (define_insn_and_split "*avx512bw_permvar_truncv16siv16hi_1").
I think you can just merge 2 define_split into define_insn_and_split
"ssse3_palignrdi" by relaxing split condition as

-  "TARGET_SSSE3 && reload_completed
-   && SSE_REGNO_P (REGNO (operands[0]))"
+  "(TARGET_SSSE3 && reload_completed
+   && SSE_REGNO_P (REGNO (operands[0])))
+   || INVAL(operands[3]) == 0
+   || INVAL(operands[3]) == 64"

and you have already handled them by

+  if (operands[3] == const0_rtx)
+    {
+      if (!rtx_equal_p (operands[0], operands[2]))
+ emit_move_insn (operands[0], operands[2]);
+      else
+ emit_note (NOTE_INSN_DELETED);
+      DONE;
+    }
+  else if (INTVAL (operands[3]) == 64)
+    {
+      if (!rtx_equal_p (operands[0], operands[1]))
+ emit_move_insn (operands[0], operands[1]);
+      else
+ emit_note (NOTE_INSN_DELETED);
+      DONE;
+    }
+

>

>
> --
> BR,
> Hongtao



--
BR,
Hongtao

next prev parent reply	other threads:[~2022-07-01  2:40 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-06-30 18:42 Roger Sayle
2022-07-01  2:12 ` Hongtao Liu
2022-07-01  2:40   ` Hongtao Liu [this message]
2022-07-04 17:48     ` Roger Sayle
2022-07-05  0:30       ` Hongtao Liu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAMZc-byU6460h3GrPDVdBVpma47PUhdpCXfSRnpam6d5E9wSrA@mail.gmail.com \
    --to=crazylht@gmail.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=roger@nextmovesoftware.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).