public inbox for gcc-patches@gcc.gnu.org
 help / color / mirror / Atom feed
From: "Roger Sayle" <roger@nextmovesoftware.com>
To: "'Richard Biener'" <richard.guenther@gmail.com>
Cc: "'GCC Patches'" <gcc-patches@gcc.gnu.org>
Subject: RE: [PATCH] Fold truncations of left shifts in match.pd
Date: Thu, 2 Jun 2022 11:55:08 +0100	[thread overview]
Message-ID: <03e201d8766f$3a597be0$af0c73a0$@nextmovesoftware.com> (raw)
In-Reply-To: <CAFiYyc0azu2aSV-Lbqv86Ap5X+BL3kBpraRXyRbcDnxxgAO35w@mail.gmail.com>


Hi Richard,
> +  /* RTL expansion knows how to expand rotates using shift/or.  */  if
> + (icode == CODE_FOR_nothing
> +      && (code == LROTATE_EXPR || code == RROTATE_EXPR)
> +      && optab_handler (ior_optab, vec_mode) != CODE_FOR_nothing
> +      && optab_handler (ashl_optab, vec_mode) != CODE_FOR_nothing)
> +    icode = (int) optab_handler (lshr_optab, vec_mode);
> 
> but we then get the vector costing wrong. 

The issue is that we currently get the (relative) vector costing wrong.
Currently for gcc.dg/vect/pr98674.c, the vectorizer thinks the scalar
code requires two shifts and an ior, so believes its profitable to vectorize
this loop using two vector shifts and an vector ior.  But once match.pd
simplifies the truncate and recognizes the HImode rotate we end up with:

pr98674.c:6:16: note:   ==> examining statement: _6 = _1 r>> 8;
pr98674.c:6:16: note:   vect_is_simple_use: vectype vector(8) short int
pr98674.c:6:16: note:   vect_is_simple_use: operand 8, type of def: constant
pr98674.c:6:16: missed:   op not supported by target.
pr98674.c:8:33: missed:   not vectorized: relevant stmt not supported: _6 = _1 r>> 8;
pr98674.c:6:16: missed:  bad operation or unsupported loop bound.


Clearly, it's a win to vectorize HImode rotates, when the backend can perform
8 (or 16) rotations at a time, but using 3 vector instructions, even when a scalar
rotate can performed in a single instruction.  Fundamentally, vectorization may
still be desirable/profitable even when the backend doesn't provide an optab.

The current situation where the i386's backend provides expanders to lower
rotations (or vcond) into individual instruction sequences, also interferes with
vector costing.   It's the vector cost function that needs to be fixed, not the
generated code made worse (or the backend bloated performing its own
RTL expansion workarounds).

Is it instead ok to mark pr98674.c as XFAIL (a regression)?
The tweak to tree-vect-stmts.cc was based on the assumption that we wished
to continue vectorizing this loop.  Improving scalar code generation really
shouldn't disable vectorization like this.


Cheers,
Roger
--



  reply	other threads:[~2022-06-02 10:55 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-05-30 12:23 Roger Sayle
2022-06-01 15:00 ` Jeff Law
2022-06-02  8:48 ` Richard Biener
2022-06-02 10:55   ` Roger Sayle [this message]
2022-06-02 11:02     ` Richard Biener

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='03e201d8766f$3a597be0$af0c73a0$@nextmovesoftware.com' \
    --to=roger@nextmovesoftware.com \
    --cc=gcc-patches@gcc.gnu.org \
    --cc=richard.guenther@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).