public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug tree-optimization/114375] New: Wrong vectorization of permuted mask load
@ 2024-03-18 10:23 rguenth at gcc dot gnu.org
  2024-03-18 10:29 ` [Bug tree-optimization/114375] [11/12/13/14 Regression] " rguenth at gcc dot gnu.org
                   ` (6 more replies)
  0 siblings, 7 replies; 8+ messages in thread
From: rguenth at gcc dot gnu.org @ 2024-03-18 10:23 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114375

            Bug ID: 114375
           Summary: Wrong vectorization of permuted mask load
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rguenth at gcc dot gnu.org
  Target Milestone: ---

I've figured we never end up with a load permutation on .MASK_LOAD SLP nodes,
instead we put one on the mask load only (if that ends up permuted).  It took
me a while to produce a testcase we're happy to vectorize and not by accident
produce correct values.  Here's one:

int a[512];
int b[512];
int c[512];
void __attribute__((noipa))
foo(int * __restrict p)
{
  for (int i = 0; i < 64; ++i)
    {
      int tem = 2, tem2 = 2;
      if (a[4*i + 1])
        tem = p[4*i];
      if (a[4*i])
        tem2 = p[4*i + 2];
      b[2*i] = tem2;
      b[2*i+1] = tem;
      if (a[4*i + 2])
        tem = p[4*i + 1];
      if (a[4*i + 3])
        tem2 = p[4*i + 3];
      c[2*i] = tem2;
      c[2*i+1] = tem;
    }
}
int main()
{
  for (int i = 0; i < 512; ++i)
    a[i] = (i >> 1) & 1;
  foo (a);
  if (c[2] != 1 || c[3] != 0)
    __builtin_abort ();
}

miscompiled on x86_64 with -O3 -mavx2.  Note b[] is correct in the end,
but c[] is { 1, 0, 0, 1, 1, 0, 0, 1, ... } instead of { 1, 0, 1, 0, ... }.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug tree-optimization/114375] [11/12/13/14 Regression] Wrong vectorization of permuted mask load
  2024-03-18 10:23 [Bug tree-optimization/114375] New: Wrong vectorization of permuted mask load rguenth at gcc dot gnu.org
@ 2024-03-18 10:29 ` rguenth at gcc dot gnu.org
  2024-03-18 12:09 ` rguenth at gcc dot gnu.org
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: rguenth at gcc dot gnu.org @ 2024-03-18 10:29 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114375

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
      Known to fail|                            |11.4.0, 12.3.0, 13.2.1,
                   |                            |14.0
   Target Milestone|---                         |11.5
           Keywords|                            |wrong-code
            Summary|Wrong vectorization of      |[11/12/13/14 Regression]
                   |permuted mask load          |Wrong vectorization of
                   |                            |permuted mask load
      Known to work|                            |10.5.0
                 CC|                            |rsandifo at gcc dot gnu.org

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
I think it's most sensible to reject permuted mask (and also gather) SLP loads
for now, code generation definitely doesn't handle it right now.

A first step of support might be computing the load-permutation and hope
SLP permute optimization will make it identity and only reject it later.
SLP permute optimization migh also be able to push the permute to the mask
and re-materialize it after the masked load as well.

I'll note we don't handle SLP masked loads with gaps either.

I'll see to produce the fix on the SLP discovery side (rejecting it there).

Richard - any other thoughts on this?

It seems GCC 10 is fine, but it can already do SLP of masked loads so I'm
not really sure.  Guess the testcase is not good enough there.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug tree-optimization/114375] [11/12/13/14 Regression] Wrong vectorization of permuted mask load
  2024-03-18 10:23 [Bug tree-optimization/114375] New: Wrong vectorization of permuted mask load rguenth at gcc dot gnu.org
  2024-03-18 10:29 ` [Bug tree-optimization/114375] [11/12/13/14 Regression] " rguenth at gcc dot gnu.org
@ 2024-03-18 12:09 ` rguenth at gcc dot gnu.org
  2024-03-19  8:02 ` cvs-commit at gcc dot gnu.org
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: rguenth at gcc dot gnu.org @ 2024-03-18 12:09 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114375

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
I see that get_load_store_type and get_group_load_store_type return VMAT_*
kinds that do not handle the masked case.  There's some rejection in the
callers for that case but it's also incomplete (VMAT_ELEMENTWISE and
VMAT_STRIDED_SLP but I also think VMAT_CONTIGUOUS_PERMUTE/REVERSE misses
some handling, at least the optab check).

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index e8617439a48..5a4eb136c6d 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -10080,6 +10080,14 @@ vectorizable_load (vec_info *vinfo,
                             "unsupported masked emulated gather.\n");
          return false;
        }
+      else if (memory_access_type == VMAT_ELEMENTWISE
+              || memory_access_type == VMAT_STRIDED_SLP)
+       {
+         if (dump_enabled_p ())
+           dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+                            "unsupported masked strided access.\n");
+         return false;
+       }
     }

   bool costing_p = !vec_stmt;

should plug the elementwise/SLP case.  Interestingly

void __attribute__((noipa))
foo (int s, int * __restrict p)
{
  for (int i = 0; i < 64; ++i)
    {
      int tem = 0;
      if (a[i])
        tem = p[s*i];
      b[i] = tem;
    }
}

uses VMAT_GATHER_SCATTER, I failed to get it produce VMAT_ELEMENTWISE.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug tree-optimization/114375] [11/12/13/14 Regression] Wrong vectorization of permuted mask load
  2024-03-18 10:23 [Bug tree-optimization/114375] New: Wrong vectorization of permuted mask load rguenth at gcc dot gnu.org
  2024-03-18 10:29 ` [Bug tree-optimization/114375] [11/12/13/14 Regression] " rguenth at gcc dot gnu.org
  2024-03-18 12:09 ` rguenth at gcc dot gnu.org
@ 2024-03-19  8:02 ` cvs-commit at gcc dot gnu.org
  2024-03-19  8:04 ` [Bug tree-optimization/114375] [11/12/13 " rguenth at gcc dot gnu.org
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2024-03-19  8:02 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114375

--- Comment #3 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Richard Biener <rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:94c3508c5a14d1948fe3bffa9e16c6f3d9c2836a

commit r14-9533-g94c3508c5a14d1948fe3bffa9e16c6f3d9c2836a
Author: Richard Biener <rguenther@suse.de>
Date:   Mon Mar 18 12:39:03 2024 +0100

    tree-optimization/114375 - disallow SLP discovery of permuted mask loads

    We cannot currently handle permutations of mask loads in code generation
    or permute optimization.  But we simply drop any permutation on the
    floor, so the following instead rejects the SLP build rather than
    producing wrong-code.  I've also made sure to reject them in
    vectorizable_load for completeness.

            PR tree-optimization/114375
            * tree-vect-slp.cc (vect_build_slp_tree_2): Compute the
            load permutation for masked loads but reject it when any
            such is necessary.
            * tree-vect-stmts.cc (vectorizable_load): Reject masked
            VMAT_ELEMENTWISE and VMAT_STRIDED_SLP as those are not
            supported.

            * gcc.dg/vect/vect-pr114375.c: New testcase.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug tree-optimization/114375] [11/12/13 Regression] Wrong vectorization of permuted mask load
  2024-03-18 10:23 [Bug tree-optimization/114375] New: Wrong vectorization of permuted mask load rguenth at gcc dot gnu.org
                   ` (2 preceding siblings ...)
  2024-03-19  8:02 ` cvs-commit at gcc dot gnu.org
@ 2024-03-19  8:04 ` rguenth at gcc dot gnu.org
  2024-05-08 12:26 ` rguenth at gcc dot gnu.org
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: rguenth at gcc dot gnu.org @ 2024-03-19  8:04 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114375

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|[11/12/13/14 Regression]    |[11/12/13 Regression] Wrong
                   |Wrong vectorization of      |vectorization of permuted
                   |permuted mask load          |mask load
     Ever confirmed|0                           |1
             Status|UNCONFIRMED                 |NEW
   Last reconfirmed|                            |2024-03-19
           Priority|P3                          |P2
      Known to work|                            |14.0
      Known to fail|14.0                        |

--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
Fixed on trunk sofar.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug tree-optimization/114375] [11/12/13 Regression] Wrong vectorization of permuted mask load
  2024-03-18 10:23 [Bug tree-optimization/114375] New: Wrong vectorization of permuted mask load rguenth at gcc dot gnu.org
                   ` (3 preceding siblings ...)
  2024-03-19  8:04 ` [Bug tree-optimization/114375] [11/12/13 " rguenth at gcc dot gnu.org
@ 2024-05-08 12:26 ` rguenth at gcc dot gnu.org
  2024-05-08 13:38 ` cvs-commit at gcc dot gnu.org
  2024-05-16  9:56 ` [Bug tree-optimization/114375] [11/12 " cvs-commit at gcc dot gnu.org
  6 siblings, 0 replies; 8+ messages in thread
From: rguenth at gcc dot gnu.org @ 2024-05-08 12:26 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114375

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|unassigned at gcc dot gnu.org      |rguenth at gcc dot gnu.org
             Status|NEW                         |ASSIGNED

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug tree-optimization/114375] [11/12/13 Regression] Wrong vectorization of permuted mask load
  2024-03-18 10:23 [Bug tree-optimization/114375] New: Wrong vectorization of permuted mask load rguenth at gcc dot gnu.org
                   ` (4 preceding siblings ...)
  2024-05-08 12:26 ` rguenth at gcc dot gnu.org
@ 2024-05-08 13:38 ` cvs-commit at gcc dot gnu.org
  2024-05-16  9:56 ` [Bug tree-optimization/114375] [11/12 " cvs-commit at gcc dot gnu.org
  6 siblings, 0 replies; 8+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2024-05-08 13:38 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114375

--- Comment #5 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The releases/gcc-13 branch has been updated by Richard Biener
<rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:4f2a35a76cca503749c696e7772d2e8eadc77ba5

commit r13-8727-g4f2a35a76cca503749c696e7772d2e8eadc77ba5
Author: Richard Biener <rguenther@suse.de>
Date:   Mon Mar 18 12:39:03 2024 +0100

    tree-optimization/114375 - disallow SLP discovery of permuted mask loads

    We cannot currently handle permutations of mask loads in code generation
    or permute optimization.  But we simply drop any permutation on the
    floor, so the following instead rejects the SLP build rather than
    producing wrong-code.  I've also made sure to reject them in
    vectorizable_load for completeness.

            PR tree-optimization/114375
            * tree-vect-slp.cc (vect_build_slp_tree_2): Compute the
            load permutation for masked loads but reject it when any
            such is necessary.
            * tree-vect-stmts.cc (vectorizable_load): Reject masked
            VMAT_ELEMENTWISE and VMAT_STRIDED_SLP as those are not
            supported.

            * gcc.dg/vect/vect-pr114375.c: New testcase.

    (cherry picked from commit 94c3508c5a14d1948fe3bffa9e16c6f3d9c2836a)

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug tree-optimization/114375] [11/12 Regression] Wrong vectorization of permuted mask load
  2024-03-18 10:23 [Bug tree-optimization/114375] New: Wrong vectorization of permuted mask load rguenth at gcc dot gnu.org
                   ` (5 preceding siblings ...)
  2024-05-08 13:38 ` cvs-commit at gcc dot gnu.org
@ 2024-05-16  9:56 ` cvs-commit at gcc dot gnu.org
  6 siblings, 0 replies; 8+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2024-05-16  9:56 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114375

--- Comment #6 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The releases/gcc-12 branch has been updated by Richard Biener
<rguenth@gcc.gnu.org>:

https://gcc.gnu.org/g:c1b21855e131bb818aedc953f403812b494917fc

commit r12-10449-gc1b21855e131bb818aedc953f403812b494917fc
Author: Richard Biener <rguenther@suse.de>
Date:   Mon Mar 18 12:39:03 2024 +0100

    tree-optimization/114375 - disallow SLP discovery of permuted mask loads

    We cannot currently handle permutations of mask loads in code generation
    or permute optimization.  But we simply drop any permutation on the
    floor, so the following instead rejects the SLP build rather than
    producing wrong-code.  I've also made sure to reject them in
    vectorizable_load for completeness.

            PR tree-optimization/114375
            * tree-vect-slp.cc (vect_build_slp_tree_2): Compute the
            load permutation for masked loads but reject it when any
            such is necessary.
            * tree-vect-stmts.cc (vectorizable_load): Reject masked
            VMAT_ELEMENTWISE and VMAT_STRIDED_SLP as those are not
            supported.

            * gcc.dg/vect/vect-pr114375.c: New testcase.

    (cherry picked from commit 4f2a35a76cca503749c696e7772d2e8eadc77ba5)

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2024-05-16  9:56 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-03-18 10:23 [Bug tree-optimization/114375] New: Wrong vectorization of permuted mask load rguenth at gcc dot gnu.org
2024-03-18 10:29 ` [Bug tree-optimization/114375] [11/12/13/14 Regression] " rguenth at gcc dot gnu.org
2024-03-18 12:09 ` rguenth at gcc dot gnu.org
2024-03-19  8:02 ` cvs-commit at gcc dot gnu.org
2024-03-19  8:04 ` [Bug tree-optimization/114375] [11/12/13 " rguenth at gcc dot gnu.org
2024-05-08 12:26 ` rguenth at gcc dot gnu.org
2024-05-08 13:38 ` cvs-commit at gcc dot gnu.org
2024-05-16  9:56 ` [Bug tree-optimization/114375] [11/12 " cvs-commit at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).