[Bug target/105197] New: [12 Regression] SVE: wrong code with -O -ftree-vectorize

public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed

* [Bug target/105197] New: [12 Regression] SVE: wrong code with -O -ftree-vectorize
@ 2022-04-07 16:59 acoplan at gcc dot gnu.org
  2022-04-08  8:28 ` [Bug target/105197] " rguenth at gcc dot gnu.org
                   ` (6 more replies)
  0 siblings, 7 replies; 8+ messages in thread
From: acoplan at gcc dot gnu.org @ 2022-04-07 16:59 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105197

            Bug ID: 105197
           Summary: [12 Regression] SVE: wrong code with -O
                    -ftree-vectorize
           Product: gcc
           Version: 12.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: acoplan at gcc dot gnu.org
  Target Milestone: ---

The following C code:

unsigned char arr_7[9][3];
unsigned char (*main_arr_7)[3] = arr_7;
int main() {
  char arr_2[9];
  int arr_6[9];
  int x;
  unsigned i;
  for (i = 0; i < 9; ++i) {
    arr_2[i] = 21;
    arr_6[i] = 6;
  }
  for (i = arr_2[8] - 21; i < 2; i++)
    x = arr_6[i] ? (main_arr_7[8][i] ? main_arr_7[8][i] : 8) : (char)arr_6[i];
  if (x != 8)
    __builtin_abort ();
}

appears to be miscompiled with -march=armv8.2-a+sve -O -ftree-vectorize. The
issue doesn't seem to occur with GCC 11.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug target/105197] [12 Regression] SVE: wrong code with -O -ftree-vectorize
  2022-04-07 16:59 [Bug target/105197] New: [12 Regression] SVE: wrong code with -O -ftree-vectorize acoplan at gcc dot gnu.org
@ 2022-04-08  8:28 ` rguenth at gcc dot gnu.org
  2022-04-08  8:41 ` rguenth at gcc dot gnu.org
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: rguenth at gcc dot gnu.org @ 2022-04-08  8:28 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105197

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |12.0
      Known to work|                            |11.2.1

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug target/105197] [12 Regression] SVE: wrong code with -O -ftree-vectorize
  2022-04-07 16:59 [Bug target/105197] New: [12 Regression] SVE: wrong code with -O -ftree-vectorize acoplan at gcc dot gnu.org
  2022-04-08  8:28 ` [Bug target/105197] " rguenth at gcc dot gnu.org
@ 2022-04-08  8:41 ` rguenth at gcc dot gnu.org
  2022-04-10  7:04 ` tnfchris at gcc dot gnu.org
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: rguenth at gcc dot gnu.org @ 2022-04-08  8:41 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105197

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P3                          |P1

--- Comment #1 from Richard Biener <rguenth at gcc dot gnu.org> ---
the GIMPLE doesn't look wrong.  We're using an EXTRACT_LAST, so that might be
the special thing.  Vectorization of the first loop is probably not necessary
to trigger the failure.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug target/105197] [12 Regression] SVE: wrong code with -O -ftree-vectorize
  2022-04-07 16:59 [Bug target/105197] New: [12 Regression] SVE: wrong code with -O -ftree-vectorize acoplan at gcc dot gnu.org
  2022-04-08  8:28 ` [Bug target/105197] " rguenth at gcc dot gnu.org
  2022-04-08  8:41 ` rguenth at gcc dot gnu.org
@ 2022-04-10  7:04 ` tnfchris at gcc dot gnu.org
  2022-04-11  6:59 ` tnfchris at gcc dot gnu.org
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: tnfchris at gcc dot gnu.org @ 2022-04-10  7:04 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105197

Tamar Christina <tnfchris at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |ASSIGNED
           Assignee|unassigned at gcc dot gnu.org      |tnfchris at gcc dot gnu.org
                 CC|                            |tnfchris at gcc dot gnu.org
   Last reconfirmed|                            |2022-04-10
     Ever confirmed|0                           |1

--- Comment #2 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #1)
> the GIMPLE doesn't look wrong.  We're using an EXTRACT_LAST, so that might
> be the special thing.  Vectorization of the first loop is probably not
> necessary to trigger the failure.

Hmmm looks like the GIMPLE is wrong, the masks it combines creates a
contradiction

At GIMPLE we have

  mask__44.14_114 = vect__4.13_112 != { 0, ... };
  mask__26.22_128 = vect__6.17_121 == { 0, ... };
  mask_patt_65.24_130 = mask__44.14_114 & mask__26.22_128;
  mask__43.26_135 = vect__4.13_112 == { 0, ... };
  mask__25.18_123 = vect__6.17_121 != { 0, ... };
  _137 = mask__43.26_135 & loop_mask_111;
  _163 = mask_patt_65.24_130 & _137;

where _163 demands vect__4.13_112 != 0 && vect__4.13_112 == 0

  _163 should have been _163 = mask_patt_65.24_130 & loop_mask_111;

So it looks like the wrong loop masks are combined.

Mine.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug target/105197] [12 Regression] SVE: wrong code with -O -ftree-vectorize
  2022-04-07 16:59 [Bug target/105197] New: [12 Regression] SVE: wrong code with -O -ftree-vectorize acoplan at gcc dot gnu.org
                   ` (2 preceding siblings ...)
  2022-04-10  7:04 ` tnfchris at gcc dot gnu.org
@ 2022-04-11  6:59 ` tnfchris at gcc dot gnu.org
  2022-04-11  8:05 ` rguenth at gcc dot gnu.org
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: tnfchris at gcc dot gnu.org @ 2022-04-11  6:59 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105197

Tamar Christina <tnfchris at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rguenth at gcc dot gnu.org

--- Comment #3 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
Looks like this started with

commit d846f225c25c5885250c303c8d118caa08c447ab
Author: Richard Biener <rguenther@suse.de>
Date:   Tue May 4 15:51:20 2021 +0200

    tree-optimization/79333 - fold stmts following SSA edges in VN

    This makes sure to follow SSA edges when folding eliminated stmts.
    This reaps the same benefit as forwprop folding all stmts, not
    waiting for one to produce copysign in the new testcase.

    2021-05-04  Richard Biener  <rguenther@suse.de>

            PR tree-optimization/79333
            * tree-ssa-sccvn.c (eliminate_dom_walker::eliminate_stmt):
            Fold stmt following SSA edges.

            * gcc.dg/tree-ssa/ssa-fre-94.c: New testcase.
            * gcc.dg/graphite/fuse-1.c: Adjust.
            * gcc.dg/pr43864-4.c: Likewise.

and what's happening is that the vectorize relies on a mask A and it's inverse
~A be represented by a negation of the mask. Ifcvt used to enforce this but
with the change it now pushes the ~ into the mask operation if it can.

So previously we would generate

  _26 = ~_25;
  _43 = ~_44;

out of ifcvt and how we generate

  _26 = _6 == 0;
  _43 = _4 == 0;

and force the creation of two new extra mask as it de-optimizes the vectorizers
ability to immediately see a mask invert.  

We however still detect that those two are inverses of

  _25 = _6 != 0;
  _44 = _4 != 0;

and when generating the second VEC_COND for the operation we end up flipping
the arguments somehow

------>vectorizing statement: _ifc__41 = _43 ? 0 : _ifc__40;
created new init_stmt: vect_cst__136 = { 0, ... }
add new stmt: _137 = mask__43.26_135 & loop_mask_111
note:  add new stmt: vect__ifc__41.27_138 = VEC_COND_EXPR <_137,
vect__ifc__40.25_133, vect_cst__136>;

so we've vectorized_ifc__41 = _43 ?_ifc__40 : 0; instead without negating _137
which is where the contradiction gets introduced. I'll fix that bug, but the
question remains whether we want this simplification to now happen in ifcvt for
masks.

It makes the vectorizer generate a lot more intermediate masks that are cleaned
up by the RPO pass I added at the end but we also lose the fact that they are
simple inverses, i.e. at -O3 on these integer masks we could have just
generated a NOT.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug target/105197] [12 Regression] SVE: wrong code with -O -ftree-vectorize
  2022-04-07 16:59 [Bug target/105197] New: [12 Regression] SVE: wrong code with -O -ftree-vectorize acoplan at gcc dot gnu.org
                   ` (3 preceding siblings ...)
  2022-04-11  6:59 ` tnfchris at gcc dot gnu.org
@ 2022-04-11  8:05 ` rguenth at gcc dot gnu.org
  2022-04-11 14:09 ` cvs-commit at gcc dot gnu.org
  2022-04-11 14:25 ` tnfchris at gcc dot gnu.org
  6 siblings, 0 replies; 8+ messages in thread
From: rguenth at gcc dot gnu.org @ 2022-04-11  8:05 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105197

--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
The following improves the output from if-conversion by simplifying ~cond ? a :
b
to cond ? b : a, possibly reducing the number of conds.  Ideally if-conversion
would track predicates in a more concious way of course.

diff --git a/gcc/tree-if-conv.cc b/gcc/tree-if-conv.cc
index 7495ed653c0..dd3d5255a38 100644
--- a/gcc/tree-if-conv.cc
+++ b/gcc/tree-if-conv.cc
@@ -472,6 +472,14 @@ fold_build_cond_expr (tree type, tree cond, tree rhs, tree
lhs)
          && (integer_zerop (op1)))
        cond = op0;
     }
+  gassign *ass;
+  if (TREE_CODE (cond) == SSA_NAME
+      && (ass = dyn_cast <gassign *> (SSA_NAME_DEF_STMT (cond)))
+      && gimple_assign_rhs_code (ass) == BIT_NOT_EXPR)
+    {
+      cond = gimple_assign_rhs1 (ass);
+      std::swap (rhs, lhs);
+    }
   cond_expr = fold_ternary (COND_EXPR, type, cond, rhs, lhs);

   if (cond_expr == NULL_TREE)

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug target/105197] [12 Regression] SVE: wrong code with -O -ftree-vectorize
  2022-04-07 16:59 [Bug target/105197] New: [12 Regression] SVE: wrong code with -O -ftree-vectorize acoplan at gcc dot gnu.org
                   ` (4 preceding siblings ...)
  2022-04-11  8:05 ` rguenth at gcc dot gnu.org
@ 2022-04-11 14:09 ` cvs-commit at gcc dot gnu.org
  2022-04-11 14:25 ` tnfchris at gcc dot gnu.org
  6 siblings, 0 replies; 8+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2022-04-11 14:09 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105197

--- Comment #5 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Tamar Christina <tnfchris@gcc.gnu.org>:

https://gcc.gnu.org/g:78c718490bc2843d4dadcef8a0ae14aed1d15a32

commit r12-8080-g78c718490bc2843d4dadcef8a0ae14aed1d15a32
Author: Tamar Christina <tamar.christina@arm.com>
Date:   Mon Apr 11 15:09:05 2022 +0100

    middle-end: Prevent the use of the cond inversion detection code when both
conditions are external. [PR105197]

    Previously ifcvt used to enforce that a mask A and the inverse of said mask
be
    represented as ~A. So for the masks

      _25 = _6 != 0;
      _44 = _4 != 0;

    ifcvt would produce for an operation requiring the inverse of said mask

      _26 = ~_25;
      _43 = ~_44;

    but now that VN is applied to the entire function body we get a
simplification
    on the mask and produce:

      _26 = _6 == 0;
      _43 = _4 == 0;

    This in itself is not a problem semantically speaking (though it does
create
    more masks that need to be tracked) but when vectorizing the masked
conditional
    we would still detect _26 and _43 to be inverses of _25 and _44 and mark
them
    as requiring their operands be swapped.

    When vectorizing we swap the operands but don't find the BIT_NOT_EXPR to
remove
    and so we leave the condition as is which produces invalid code:

    ------>vectorizing statement: _ifc__41 = _43 ? 0 : _ifc__40;
    created new init_stmt: vect_cst__136 = { 0, ... }
    add new stmt: _137 = mask__43.26_135 & loop_mask_111
    note:  add new stmt: vect__ifc__41.27_138 = VEC_COND_EXPR <_137,
vect__ifc__40.25_133, vect_cst__136>;

    This fixes disabling the inversion detection code when the loop isn't
masked
    since both conditional would be external.  We'd then not use the new
cond_code
    and would incorrectly still swap the operands.

    The resulting code is also better than GCC-11 with most operations now
    predicated on the loop mask rather than a ptrue.

    gcc/ChangeLog:

            PR target/105197
            * tree-vect-stmts.cc (vectorizable_condition): Prevent cond swap
when
            not masked.

    gcc/testsuite/ChangeLog:

            PR target/105197
            * gcc.target/aarch64/sve/pr105197-1.c: New test.
            * gcc.target/aarch64/sve/pr105197-2.c: New test.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug target/105197] [12 Regression] SVE: wrong code with -O -ftree-vectorize
  2022-04-07 16:59 [Bug target/105197] New: [12 Regression] SVE: wrong code with -O -ftree-vectorize acoplan at gcc dot gnu.org
                   ` (5 preceding siblings ...)
  2022-04-11 14:09 ` cvs-commit at gcc dot gnu.org
@ 2022-04-11 14:25 ` tnfchris at gcc dot gnu.org
  6 siblings, 0 replies; 8+ messages in thread
From: tnfchris at gcc dot gnu.org @ 2022-04-11 14:25 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105197

Tamar Christina <tnfchris at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |FIXED

--- Comment #6 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
Fixed and no regression from codegen in GCC-11.

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2022-04-11 14:25 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-04-07 16:59 [Bug target/105197] New: [12 Regression] SVE: wrong code with -O -ftree-vectorize acoplan at gcc dot gnu.org
2022-04-08  8:28 ` [Bug target/105197] " rguenth at gcc dot gnu.org
2022-04-08  8:41 ` rguenth at gcc dot gnu.org
2022-04-10  7:04 ` tnfchris at gcc dot gnu.org
2022-04-11  6:59 ` tnfchris at gcc dot gnu.org
2022-04-11  8:05 ` rguenth at gcc dot gnu.org
2022-04-11 14:09 ` cvs-commit at gcc dot gnu.org
2022-04-11 14:25 ` tnfchris at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).