Re: [x86 PATCH] PR target/70321: Split double word equality/inequality after STV.

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

From: Uros Bizjak <ubizjak@gmail.com>
To: Roger Sayle <roger@nextmovesoftware.com>
Cc: "gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>
Subject: Re: [x86 PATCH] PR target/70321: Split double word equality/inequality after STV.
Date: Wed, 13 Apr 2022 20:51:04 +0200	[thread overview]
Message-ID: <CAFULd4a+hSD5TyjwgaH+0f2fupG+0f5zMWoO2z17hZeDDXvcKA@mail.gmail.com> (raw)
In-Reply-To: <001f01d84f15$52729c60$f757d520$@nextmovesoftware.com>

On Wed, Apr 13, 2022 at 11:03 AM Roger Sayle <roger@nextmovesoftware.com> wrote:
>
>
> This patch resolves the last piece of PR target/70321 a code quality
> (P2 regression) affecting mainline.  Currently, for HJ's testcase:
>
> void foo (long long ixi)
> {
>   if (ixi != 14348907)
>     __builtin_abort ();
> }
>
> GCC with -m32 -O2 generates four instructions for the comparison:
>
>         movl    16(%esp), %eax
>         movl    20(%esp), %edx
>         xorl    $14348907, %eax
>         orl     %eax, %edx
>
> but with this patch it now requires only three, making better use of
> x86's addressing modes:
>
>         movl    16(%esp), %eax
>         xorl    $14348907, %eax
>         orl     20(%esp), %eax
>
> The solution is to expand "doubleword" equality/inequality expressions
> using flag setting COMPARE instructions for the early RTL passes, and
> then split them during split1, after STV and before reload.
> Hence on x86_64, we now see/allow things like:
>
> (insn 11 8 12 2 (set (reg:CCZ 17 flags)
>         (compare:CCZ (reg/v:TI 84 [ x ])
>             (reg:TI 96))) "cmpti.c":2:43 30 {*cmpti_doubleword}
>
> This allows the STV pass to decide whether it's preferrable to perform
> this comparison using vector operations, i.e. a pxor/ptest sequence,
> or as scalar integer operations, i.e. a xor/xor/or sequence.  Alas
> this required tweaking of the STV pass to recognize the "new" form of
> these comparisons and split out the pxor operation itself.  To confirm
> this still works as expected I've added a new STV test case:
>
> long long a[1024];
> long long b[1024];
>
> int foo()
> {
>   for (int i=0; i<1024; i++)
>   {
>     long long t = (a[i]<<8) | (b[i]<<24);
>     if (t == 0)
>       return 1;
>   }
>   return 0;
> }
>
> where with -m32 -O2 -msse4.1 the above comparison with zero should look
> like:
>
>         punpcklqdq      %xmm0, %xmm0
>         ptest   %xmm0, %xmm0
>
> Although this patch includes one or two minor tweaks to provide all the
> necessary infrastructure to support conversion of TImode comparisons to
> V1TImode (and SImode comparisons to V4SImode), STV doesn't yet implement
> these transformations, but this is something that can be considered after
> stage 4.  Indeed the new convert_compare functionality is split out
> into a method to simplify its potential reuse by the timode_scalar_chain
> class.
>
>
> This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> and make -k check, both with and without --target_board=unix{-m32},
> with no new failures for -m64, and two (missed-optimization) FAILs on
> -m32 (both around pandn).  The first gcc.target/i386/pr65105-5.c is
> resolved by the patch I posted yesterday, the other i386/pr78794.c is
> easily fixed by tweaking STV's "gain" estimates, but it's preferrable to
> generate better code for both the scalar and vector implementations
> before tweaking these parameters (patches to follow shortly).
>
> Ok for mainline?

Unless there is a really compelling reason that warrants committing
this patch in stage 4, I'd propose to postpone it to next stage 1. The
patch is quite involved and touches a somewhat complex part of the
compiler (STV). Also, looking at the above description, the patch
introduces new functionality that represents only a part of the
complete solution, and as such opens a door for possible regressions,
which are not welcomed in stage 4 at all.

Uros.

>
> 2022-04-13  Roger Sayle  <roger@nextmovesoftware.com>
>
> gcc/ChangeLog
>         PR target/70321
>         * config/i386/i386-expand.cc (ix86_expand_branch): Don't decompose
>         DI mode equality/inequality using XOR here.  Instead generate a
>         COMPARE for doubleword modes (DImode on !TARGET_64BIT or TImode).
>         * config/i386/i386-features.cc (gen_gpr_to_xmm_move_src): Use
>         gen_rtx_SUBREG when NUNITS is 1, i.e. for TImode to V1TImode.
>         (general_scalar_chain::convert_compare): New function to convert
>         scalar equality/inequality comparison into vector operations.
>         (general_scalar_chain::convert_insn) [COMPARE]: Refactor. Call
>         new convert_compare helper method.
>         (convertible_comparion_p): Update to match doubleword COMPARE
>         of two register, memory or integer constant operands.
>         * config/i386/i386-features.h
> (general_scalar_chain::convert_compare):
>         Prototype/declare member function here.
>         * config/i386/i386.md (cstore<mode>4): Change mode to SDWIM, but
>         only allow new doubleword modes for EQ and NE operators.
>         (*cmp<dwi>_doubleword): New define_insn_and_split, to split a
>         doubleword comparison into a pair of XORs followed by an IOR to
>         set the (zero) flags register, optimizing the XORs if possible.
>         * config/i386/sse.md (V_AVX): Include V1TI and V2TI in mode
> iterator;
>         V_AVX is (currently) only used by ptest.
>         (sse4_1 mode attribute): Update to support V1TI and V2TI.
>
> gcc/testsuite/ChangeLog
>         PR target/70321
>         * gcc.target/i386/pr70321.c: New test case.
>         * gcc.target/i386/sse4_1-stv-1.c: New test case.
>
>
> Thanks in advance,
> Roger
> --
>

     prev parent reply	other threads:[~2022-04-13 18:51 UTC|newest]

Thread overview: 2+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-04-13  9:03 Roger Sayle
2022-04-13 18:51 ` Uros Bizjak [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAFULd4a+hSD5TyjwgaH+0f2fupG+0f5zMWoO2z17hZeDDXvcKA@mail.gmail.com \
    --to=ubizjak@gmail.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=roger@nextmovesoftware.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).