public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: Uros Bizjak <ubizjak@gmail.com>
To: Roger Sayle <roger@nextmovesoftware.com>
Cc: "gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>,
	"H. J. Lu" <hjl.tools@gmail.com>
Subject: Re: [x86_64 PATCH] Improved Scalar-To-Vector (STV) support for TImode to V1TImode.
Date: Sun, 10 Jul 2022 20:05:31 +0200	[thread overview]
Message-ID: <CAFULd4YY=ZG089XjQjrxmDR2Hy_ZD0eoXsjfVFz0xe=JioxhcQ@mail.gmail.com> (raw)
In-Reply-To: <000901d8938d$ead4dc40$c07e94c0$@nextmovesoftware.com>

On Sat, Jul 9, 2022 at 2:17 PM Roger Sayle <roger@nextmovesoftware.com> wrote:
>
>
> This patch upgrades x86_64's scalar-to-vector (STV) pass to more
> aggressively transform 128-bit scalar TImode operations into vector
> V1TImode operations performed on SSE registers.  TImode functionality
> already exists in STV, but only for move operations, this changes
> brings support for logical operations (AND, IOR, XOR, NOT and ANDN)
> and comparisons.
>
> The effect of these changes are conveniently demonstrated by the new
> sse4_1-stv-5.c test case:
>
> __int128 a[16];
> __int128 b[16];
> __int128 c[16];
>
> void foo()
> {
>   for (unsigned int i=0; i<16; i++)
>     a[i] = b[i] & ~c[i];
> }
>
> which when currently compiled on mainline wtih -O2 -msse4 produces:
>
> foo:    xorl    %eax, %eax
> .L2:    movq    c(%rax), %rsi
>         movq    c+8(%rax), %rdi
>         addq    $16, %rax
>         notq    %rsi
>         notq    %rdi
>         andq    b-16(%rax), %rsi
>         andq    b-8(%rax), %rdi
>         movq    %rsi, a-16(%rax)
>         movq    %rdi, a-8(%rax)
>         cmpq    $256, %rax
>         jne     .L2
>         ret
>
> but with this patch now produces:
>
> foo:    xorl    %eax, %eax
> .L2:    movdqa  c(%rax), %xmm0
>         pandn   b(%rax), %xmm0
>         addq    $16, %rax
>         movaps  %xmm0, a-16(%rax)
>         cmpq    $256, %rax
>         jne     .L2
>         ret
>
> Technically, the STV pass is implemented by three C++ classes, a common
> abstract base class "scalar_chain" that contains common functionality,
> and two derived classes: general_scalar_chain (which handles SI and
> DI modes) and timode_scalar_chain (which handles TI modes).  As
> mentioned previously, because only TI mode moves were handled the
> two worker classes behaved significantly differently.  These changes
> bring the functionality of these two classes closer together, which
> is reflected by refactoring more shared code from general_scalar_chain
> to the parent scalar_chain and reusing it from timode.  There still
> remain significant differences (and simplifications) so the existing
> division of classes (as specializations) continues to make sense.

Please note that there are in fact two STV passes, one before combine
and the other after combine. The TImode pass that previously handled
only loads and stores is positioned before combine (there was a reason
for this decision, but I don't remember the details - let's ask
HJ...). However, DImode STV pass transforms much more instructions and
the reason it was positioned after the combine pass was that STV pass
transforms optimized insn stream where forward propagation was already
performed.

What is not clear to me from the above explanation is: is the new
TImode STV pass positioned after the combine pass, and if this is the
case, how the change affects current load/store TImode STV pass. I
must admit, I don't like two separate STV passess, so if TImode is now
similar to DImode, I suggest we abandon STV1 pass and do everything
concerning TImode after the combine pass. HJ, what is your opinion on
this?

Other than the above, the patch LGTM to me.

Uros.

> Obviously, there are more changes to come (shifts and rotates),
> and compute_convert_gain doesn't yet have its final (tuned) form,
> but is already an improvement over the "return 1;" used previously.
>
> This patch has been tested on x86_64-pc-linux-gnu with make boostrap
> and make -k check, both with and without --target_board=unix{-m32}
> with no new failures.  Ok for mainline?
>
>
> 2022-07-09  Roger Sayle  <roger@nextmovesoftware.com>
>
> gcc/ChangeLog
>         * config/i386/i386-features.h (scalar_chain): Add fields
>         insns_conv, n_sse_to_integer and n_integer_to_sse to this
>         parent class, moved from general_scalar_chain.
>         (scalar_chain::convert_compare): Protected method moved
>         from general_scalar_chain.
>         (mark_dual_mode_def): Make protected, not private virtual.
>         (scalar_chain:convert_op): New private virtual method.
>
>         (general_scalar_chain::general_scalar_chain): Simplify constructor.
>         (general_scalar_chain::~general_scalar_chain): Delete destructor.
>         (general_scalar_chain): Move insns_conv, n_sse_to_integer and
>         n_integer_to_sse fields to parent class, scalar_chain.
>         (general_scalar_chain::mark_dual_mode_def): Delete prototype.
>         (general_scalar_chain::convert_compare): Delete prototype.
>
>         (timode_scalar_chain::compute_convert_gain): Remove simplistic
>         implementation, convert to a method prototype.
>         (timode_scalar_chain::mark_dual_mode_def): Delete prototype.
>         (timode_scalar_chain::convert_op): Prototype new virtual method.
>
>         * config/i386/i386-features.cc (scalar_chain::scalar_chain):
>         Allocate insns_conv and initialize n_sse_to_integer and
>         n_integer_to_sse fields in constructor.
>         (scalar_chain::scalar_chain): Free insns_conv in destructor.
>
>         (general_scalar_chain::general_scalar_chain): Delete
>         constructor, now defined in the class declaration.
>         (general_scalar_chain::~general_scalar_chain): Delete destructor.
>
>         (scalar_chain::mark_dual_mode_def): Renamed from
>         general_scalar_chain::mark_dual_mode_def.
>         (timode_scalar_chain::mark_dual_mode_def): Delete.
>         (scalar_chain::convert_compare): Renamed from
>         general_scalar_chain::convert_compare.
>
>         (timode_scalar_chain::compute_convert_gain): New method to
>         determine the gain from converting a TImode chain to V1TImode.
>         (timode_scalar_chain::convert_op): New method to convert an
>         operand from TImode to V1TImode.
>
>         (timode_scalar_chain::convert_insn) <case REG>: Only PUT_MODE
>         on REG_EQUAL notes that were originally TImode (not CONST_INT).
>         Handle AND, ANDN, XOR, IOR, NOT and COMPARE.
>         (timode_mem_p): Helper predicate to check where operand is
>         memory reference with sufficient alignment for TImode STV.
>         (timode_scalar_to_vector_candidate_p): Use convertible_comparison_p
>         to check whether COMPARE is convertible.  Handle SET_DESTs that
>         that are REG_P or MEM_P and SET_SRCs that are REG, CONST_INT,
>         CONST_WIDE_INT, MEM, AND, ANDN, IOR, XOR or NOT.
>
> gcc/testsuite/ChangeLog
>         * gcc.target/i386/sse4_1-stv-2.c: New test case, pand.
>         * gcc.target/i386/sse4_1-stv-3.c: New test case, por.
>         * gcc.target/i386/sse4_1-stv-4.c: New test case, pxor.
>         * gcc.target/i386/sse4_1-stv-5.c: New test case, pandn.
>         * gcc.target/i386/sse4_1-stv-6.c: New test case, ptest.
>
> Roger
> --
>

  reply	other threads:[~2022-07-10 18:05 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-07-09 12:17 Roger Sayle
2022-07-10 18:05 ` Uros Bizjak [this message]
2022-07-10 18:36   ` Roger Sayle
2022-07-10 19:15     ` H.J. Lu
2022-07-10 21:38       ` Roger Sayle
2022-07-10 23:56         ` H.J. Lu
2022-07-14  5:31           ` Roger Sayle
2022-07-14  7:10             ` Richard Biener
2022-07-11  6:41     ` Uros Bizjak

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAFULd4YY=ZG089XjQjrxmDR2Hy_ZD0eoXsjfVFz0xe=JioxhcQ@mail.gmail.com' \
    --to=ubizjak@gmail.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=hjl.tools@gmail.com \
    --cc=roger@nextmovesoftware.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).