Re: [PATCH] widening_mul, i386, v2: Improve spaceship expansion on x86 [PR103973]

public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed

From: Jakub Jelinek <jakub@redhat.com>
To: Richard Biener <rguenther@suse.de>
Cc: Uros Bizjak <ubizjak@gmail.com>, gcc-patches@gcc.gnu.org
Subject: Re: [PATCH] widening_mul, i386, v2: Improve spaceship expansion on x86 [PR103973]
Date: Mon, 17 Jan 2022 13:36:11 +0100	[thread overview]
Message-ID: <20220117123611.GJ2646553@tucnak> (raw)
In-Reply-To: <p6o4rrr-s635-rpr7-n81s-1464on4s85s6@fhfr.qr>

On Mon, Jan 17, 2022 at 01:04:40PM +0100, Richard Biener wrote:
> > I guess it depends, for code that can only be called during the expand pass
> > dropping it should be just fine, for code that can be called also (or only)
> > later I think adding JUMP_LABEL and correct LABEL_NUSES is needed because
> > nothing will fix it up afterwards.
> 
> I'm noting that
> 
> +  /* BB must have no executable statements.  */
> +  gimple_stmt_iterator gsi = gsi_after_labels (bb);
> +  if (phi_nodes (bb))
> +    return false;
> 
> disallows blocks with just a virtual PHI which wouldn't be
> "executable".  Not sure if anything will break when we fix that.

Note I'm only moving the existing function from phiopt to tree-cfg.c
so that I can use it from tree-ssa-math-opts.c.  But all the
cond_only_block_p uses in phiopt and now in tree-ssa-maht-opts.c too
only call it on single_pred_p (bb) basic blocks, so I don't see
what the virtual PHI on those would be good for.

> For code generation we rely on RTL opts to merge compare/scc
> and the subsequent branches on -1/0/1/[-2], correct?  I wonder
> whether that works on other targets as well or whether a
> asm-goto with "optab" UNSPEC text would be more forward looking?

Yes, we rely on some RTL opts, like we rely on it for e.g. the overflow
builtins or various other cases and they seem to be doing their job
well on my testing.  Initially I thought the optab would have 6 arguments,
2 comparison args and 4 labels and I'd emit a switch in the
tree-ssa-math-opts.c (I even wrote such code).  But it didn't work really
well, the switch in some cases wasn't really optimized, and optimization
passes after the widening_mul liked e.g. to propagate the .SPACESHIP
lhs into some but not all the PHI args if there were any etc.
Emitting a function that returns -1/0/1/2 worked better, especially if
the target attempts to emit it as a series of conditional jumps
with small bbs that just set those values.  RTL opts later on will
merge the jumps with further jumps that test the .SPACESHIP result,
or will turn some of the conditional jumps into scc etc.

> The restriction to scalar floats is probably because with
> scalar integers we're doing fine and with vectors we'd need some
> very much different tricks, right?

Sure, for vectors we couldn't use branches etc.  
I'm not really sure how would one write a vector version of the
spaceship actually.  The primary use case is C++ with <=>, but <=>
returns std::*_ordering which is an aggregate and one that isn't
very easy to turn into an integer even, switch doesn't work,
only if (... == std::partial_ordering::equality) ... else if (...
(unless I'm missing something).
But even in C, maybe:
typedef float V __attribute__((vector_size (16)));
typedef int W __attribute__((vector_size (16)));

W
foo (V x, V y)
{
  return (x != y) & (((x < y) & (W) { -1, -1, -1, -1 }) | ((x > y) & (W) { 1, 1, 1, 1 }) | ((W) { 2, 2, 2, 2 } & ~(x < y) & ~(x > y)));
}

but it isn't clear how I'd optimize it at the assembly, where
we currently emit:
        vbroadcastss    .LC2(%rip), %xmm2
        vcmpltps        %xmm0, %xmm1, %xmm3
        vbroadcastss    .LC3(%rip), %xmm4
        vpand   %xmm3, %xmm2, %xmm2
        vcmpltps        %xmm1, %xmm0, %xmm3
        vpor    %xmm3, %xmm2, %xmm2
        vcmpneq_oqps    %xmm1, %xmm0, %xmm3
        vcmpneqps       %xmm1, %xmm0, %xmm0
        vpandn  %xmm4, %xmm3, %xmm3
        vpor    %xmm3, %xmm2, %xmm2
        vpand   %xmm0, %xmm2, %xmm0
        ret
.LC2:
        .long   1
        .align 4
.LC3:
        .long   2

> The middle-end changes look OK, I don't see anything that
> couldn't be changed if other targets run into problems with
> getting similar optimized code.

Thanks.

	Jakub

     prev parent reply	other threads:[~2022-01-17 12:36 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-01-14 22:56 [PATCH] widening_mul, i386: " Jakub Jelinek
2022-01-15  8:29 ` Uros Bizjak
2022-01-15  9:56   ` Jakub Jelinek
2022-01-15 10:42     ` Uros Bizjak
2022-01-15 11:22       ` [PATCH] widening_mul, i386, v2: " Jakub Jelinek
2022-01-15 16:40         ` Uros Bizjak
2022-01-17 12:04         ` Richard Biener
2022-01-17 12:36           ` Jakub Jelinek [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220117123611.GJ2646553@tucnak \
    --to=jakub@redhat.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=rguenther@suse.de \
    --cc=ubizjak@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).