[Bug tree-optimization/107096] Fully masking vectorization with AVX512 ICEs gcc.dg/vect/vect-over-widen-*.c

public inbox for gcc-bugs@sourceware.org

From: "rguenth at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug tree-optimization/107096] Fully masking vectorization with AVX512 ICEs gcc.dg/vect/vect-over-widen-*.c
Date: Mon, 10 Oct 2022 09:49:56 +0000
Message-ID: <bug-107096-4-xY7iykeImI@http.gcc.gnu.org/bugzilla/>
In-Reply-To: <bug-107096-4@http.gcc.gnu.org/bugzilla/>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107096

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |ams at gcc dot gnu.org

--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to rsandifo@gcc.gnu.org from comment #2)
> See the comment above rgroup_controls in tree-vectorizer.h for the
> current assumptions around loop predication.  If AVX512 wants something
> different then some extensions will be needed :-)

Coming back to this now.  Crucially

   If only the first three lanes are active, the masks we need are:

     f rgroup: 1 1 | 1 1 | 1 1 | 0 0
     d rgroup:  1  |  1  |  1  |  0

   Here we can use a mask calculated for f's rgroup for d's, but not
   vice versa.

that seems to assume that the space in the mask vector for the "bools"
in the d rgroup is twice as large as that for the f rgroup.
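
As a rough illustration of that assumption, here is a scalar sketch of a
mask whose elements are as wide as the data elements.  It assumes 256-bit
vectors (8 floats or 4 doubles, so four scalar iterations per vector) and
the first example from tree-vectorizer.h with two f elements per iteration.
The names f_mask_for_iters and view_f_mask_as_d are made up for this
sketch and are not GCC functions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { F_LANES = 8, D_LANES = 4 };  /* assumed 256-bit vectors */

/* Build the f rgroup mask: two 32-bit mask elements per active scalar
   iteration, each either all-ones or zero.  */
static void f_mask_for_iters(unsigned active_iters, uint32_t mask[F_LANES])
{
  assert(active_iters <= D_LANES);
  for (unsigned lane = 0; lane < F_LANES; ++lane)
    mask[lane] = (lane < 2 * active_iters) ? UINT32_MAX : 0;
}

/* Reuse the same bits as a d rgroup mask: each pair of 32-bit elements is
   viewed as one 64-bit element.  Both halves of a pair always agree, so
   the result does not depend on endianness.  */
static void view_f_mask_as_d(const uint32_t f_mask[F_LANES],
                             uint64_t d_mask[D_LANES])
{
  memcpy(d_mask, f_mask, sizeof(uint64_t) * D_LANES);
}
```

With three active iterations the f mask is 1 1 | 1 1 | 1 1 | 0 0 and the
reinterpreted d mask is 1 | 1 | 1 | 0, matching the quoted comment.  This
only works because a d "bool" occupies the space of two f "bools".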

For AVX512 there's a single bit for each lane, independent of the
width of the actual data.  So instead of

   Thus for each value of nV, it is enough to provide nV masks, with the
   mask being calculated based on the highest nL (or, equivalently, based
   on the highest nS) required by any rgroup with that nV.  We therefore
   represent the entire collection of masks as a two-level table, with the
   first level being indexed by nV - 1 (since nV == 0 doesn't exist) and
   the second being indexed by the mask index 0 <= i < nV.

we need a set of nV masks for each value of nS, and we can pick the
smallest nV for each nS and generate the corresponding larger nV
masks by a series of shifts.  In fact we can re-use the first vector
(excess bits are OK).  So for the example in tree-vectorizer.h

     float *f;
     double *d;
     for (int i = 0; i < n; ++i)
       {
         f[i * 2 + 0] += 1.0f;
         f[i * 2 + 1] += 2.0f;
         d[i] += 3.0;
       }

we'd need to perform two WHILE_ULT.  For

     float *f;
     double *d;
     for (int i = 0; i < n; ++i)
       {
         f[i] += 1.0f;
         d[i] += 3.0;
       }

we'd compute the mask for the f rgroup with a WHILE_ULT.  We'll have
nV_d = 2 * nV_f, and the first mask vector from f can be reused for d
(but not the other way around).  The second mask vector for d can be
obtained by kshiftr.
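
A scalar sketch of this second case, assuming 512-bit vectors (one vector
of 16 floats and two vectors of 8 doubles per 16 scalar iterations).  The
helper while_ult16 is a made-up model of WHILE_ULT and the right shift
stands in for kshiftr; neither is the actual GCC or intrinsic interface.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of WHILE_ULT: bit k of the result is set
   iff start + k < end, for k in [0, lanes).  */
static uint16_t while_ult16(uint64_t start, uint64_t end, unsigned lanes)
{
  assert(lanes <= 16);
  uint16_t mask = 0;
  for (unsigned k = 0; k < lanes; ++k)
    if (start + k < end)
      mask |= (uint16_t)(1u << k);
  return mask;
}

/* Masks for one vector iteration: nV_f = 1, nV_d = 2.  */
struct masks
{
  uint16_t f;     /* 16 float lanes */
  uint16_t d[2];  /* 8 double lanes each; only bits 0..7 are meaningful */
};

static struct masks masks_for(uint64_t i, uint64_t n)
{
  struct masks m;
  m.f = while_ult16(i, n, 16);
  m.d[0] = m.f;                  /* reused as is; bits 8..15 are excess */
  m.d[1] = (uint16_t)(m.f >> 8); /* models a kshiftr by 8 */
  return m;
}
```

For i = 0 and n = 11 this gives f = 0x07ff, with the low eight bits of d[0]
all set and d[1] = 0x07, i.e. one WHILE_ULT plus one shift.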

Apart from shifting, there's no way to do N-bit to two N/2-bit hi/lo
unpacking (there's a 2x N/2-bit -> N-bit operation, for whatever reason).
There's also no way to transform the d rgroup mask into the
f rgroup mask for the first example, i.e. to duplicate bits in place,
{ b0, b1, b2, ... bN } -> { b0, b0, b1, b1, b2, b2, ... bN, bN },
nor the reverse.
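
To make the first example concrete, a scalar sketch assuming 512-bit
vectors (16 floats and 8 doubles, so 8 scalar iterations per vector
iteration).  model_while_ult and duplicate_bits8 are made-up helpers; the
loop in duplicate_bits8 only models the bit duplication that, as said
above, has no direct mask operation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of WHILE_ULT: bit k of the result is set
   iff start + k < end, for k in [0, lanes).  */
static uint16_t model_while_ult(uint64_t start, uint64_t end, unsigned lanes)
{
  assert(lanes <= 16);
  uint16_t mask = 0;
  for (unsigned k = 0; k < lanes; ++k)
    if (start + k < end)
      mask |= (uint16_t)(1u << k);
  return mask;
}

/* Scalar model of { b0, b1, ... b7 } -> { b0, b0, b1, b1, ... b7, b7 }.  */
static uint16_t duplicate_bits8(uint8_t d_mask)
{
  uint16_t f_mask = 0;
  for (unsigned k = 0; k < 8; ++k)
    if (d_mask & (1u << k))
      f_mask |= (uint16_t)(3u << (2 * k));
  return f_mask;
}
```

With three scalar iterations left, the d mask is model_while_ult (0, 3, 8)
= 0x07 and the f mask is model_while_ult (0, 6, 16) = 0x3f.  The latter
equals duplicate_bits8 (0x07), which is why the sketch computes it with a
second WHILE_ULT on a scaled bound instead of deriving it from the d mask.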

So in reality it seems we need a set of mask vectors for the full
set of nS * nV combinations with AVX512.  Full masking with AVX2-style
mask vectors would work with the existing rgroup control scheme.

Currently the "key" to the AVX512 behavior is the use of scalar modes
for the mask vectors, but then that's also what GCN uses.  Do GCN
mask bits really map to bytes to allow the currently used rgroup scheme?
