[Bug tree-optimization/112361] New: [14 Regression] avx512f-reduce-op-1.c miscompiled since r14-5076

public inbox for gcc-bugs@sourceware.org

* [Bug tree-optimization/112361] New: [14 Regression] avx512f-reduce-op-1.c miscompiled since r14-5076
@ 2023-11-03  7:20 jakub at gcc dot gnu.org
  2023-11-03  7:20 ` [Bug tree-optimization/112361] " jakub at gcc dot gnu.org
                   ` (9 more replies)
  0 siblings, 10 replies; 11+ messages in thread
From: jakub at gcc dot gnu.org @ 2023-11-03  7:20 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112361

            Bug ID: 112361
           Summary: [14 Regression] avx512f-reduce-op-1.c miscompiled
                    since r14-5076
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: jakub at gcc dot gnu.org
  Target Milestone: ---

Since r14-5076-g01c18f58d37865d5f3bbe93e666183b54ec608c7 I see an execution
failure of gcc.target/i386/avx512f-reduce-op-1.c.
Reduced testcase (-O2 -mavx512f):
__attribute__((noipa)) float
foo (void)
{
  float a[16] = { 1, 2, 3, 4, 5, 6, 6, 5, 4, 3, 2, 1, 7, 6, 5, 4 };
  float r3 = 0.0f;
  for (int i = 0; i < 16; i++)
    if ((1 << i) & 0xA6BA)
      r3 = r3 + a[i];
  return r3;
}

int
main ()
{
  if (foo () != 37.0f)
    __builtin_abort ();
}
where before r14-5076 r3 was correctly computed as 37.0f, but starting
with r14-5076 it is 64.0f, i.e. the mask is ignored and all elements are
added, not just the ones selected by the mask.
The ifcvt dump has
  _1 = 42682 >> i_31;
  _2 = _1 & 1;
  _24 = _2 != 0;
  _3 = a[i_31];
  _ifc__43 = .COND_ADD (_24, r3_29, _3, r3_29);
which I assume is correct.  The vect dump shows the mask being computed, but
instead of a masked addition followed by an end-of-loop reduction (though can
we vectorize this at all without -ffast-math?), it scalarizes the addition
without actually conditionalizing it.  Note that with -O2 -mavx512f -ffast-math
the .COND_ADD is vectorized and the function correctly returns 37.0f.

* [Bug tree-optimization/112361] [14 Regression] avx512f-reduce-op-1.c miscompiled since r14-5076
From: jakub at gcc dot gnu.org @ 2023-11-03  7:20 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112361

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P3                          |P1
                 CC|                            |rdapp at gcc dot gnu.org,
                   |                            |rguenth at gcc dot gnu.org
   Target Milestone|---                         |14.0

* [Bug tree-optimization/112361] [14 Regression] avx512f-reduce-op-1.c miscompiled since r14-5076
From: pinskia at gcc dot gnu.org @ 2023-11-03  7:22 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112361

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2023-11-03

--- Comment #1 from Andrew Pinski <pinskia at gcc dot gnu.org> ---

Confirmed by me too:
https://gcc.gnu.org/pipermail/gcc-patches/2023-November/635063.html

* [Bug tree-optimization/112361] [14 Regression] avx512f-reduce-op-1.c miscompiled since r14-5076
From: rdapp at gcc dot gnu.org @ 2023-11-03  7:51 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112361

--- Comment #2 from Robin Dapp <rdapp at gcc dot gnu.org> ---
I can have a look.  Of course I tested it, but neither the compile farm machine
I used (gcc188) nor my local machine can run AVX512 code.  Is there anywhere
else I can test it?

* [Bug tree-optimization/112361] [14 Regression] avx512f-reduce-op-1.c miscompiled since r14-5076
From: rguenth at gcc dot gnu.org @ 2023-11-03  8:13 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112361

--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
You can use SDE (Intel's Software Development Emulator), but I suppose just
inspecting the vectorized code should work fine as well.  I suspect we fail to
reject vectorization when we have a masked op but no native masked_fold_left,
since we cannot open-code that variant.

* [Bug tree-optimization/112361] [14 Regression] avx512f-reduce-op-1.c miscompiled since r14-5076
From: jakub at gcc dot gnu.org @ 2023-11-03  8:14 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112361

--- Comment #4 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
You can just look at the dumps.
Generally, I'd expect that we shouldn't be creating .COND_ADD etc. calls for
conditional reductions of SCALAR_FLOAT_TYPE_P types if !flag_associative_math,
but the fold-left reduction code probably also needs to either assert that the
reduction isn't conditional or handle it.
E.g. needs_fold_left_reduction_p returns true if an in-order reduction is
required.
But I guess Richard knows the details much better.

* [Bug tree-optimization/112361] [14 Regression] avx512f-reduce-op-1.c miscompiled since r14-5076
From: pinskia at gcc dot gnu.org @ 2023-11-04  5:24 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112361

--- Comment #5 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
The patch that caused this one also causes a bootstrap comparison failure with
--with-arch=skylake-avx512, see PR 112374.

* [Bug tree-optimization/112361] [14 Regression] avx512f-reduce-op-1.c miscompiled since r14-5076
From: rdapp at gcc dot gnu.org @ 2023-11-06  9:51 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112361

--- Comment #6 from Robin Dapp <rdapp at gcc dot gnu.org> ---
So "before" we created

  vect__3.12_55 = MEM <vector(16) float> [(float *)vectp_a.10_53];
  vect__ifc__43.13_57 = VEC_COND_EXPR <mask__24.9_52, vect__3.12_55, { 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0 }>;
//  _ifc__43 = _24 ? _3 : 0.0;
  stmp__44.14_58 = BIT_FIELD_REF <vect__ifc__43.13_57, 32, 0>;
  stmp__44.14_59 = r3_29 + stmp__44.14_58;
  ...

in vect_expand_fold_left.

Now, as intended, there is no VEC_COND anymore and we just create the bit-field
reduction over the unmasked vector.

We could refrain from creating the COND_OP in the first place, as Jakub
mentioned (I guess we already know in if-conv that we shouldn't), re-insert a
VEC_COND, or create a COND_OP chain (instead of an OP chain) in
vect_expand_fold_left by passing it the mask (and is_cond_op).
Might having several COND_OPs here make analysis in subsequent passes more
difficult?

* [Bug tree-optimization/112361] [14 Regression] avx512f-reduce-op-1.c miscompiled since r14-5076
From: rguenther at suse dot de @ 2023-11-06 10:06 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112361

--- Comment #7 from rguenther at suse dot de <rguenther at suse dot de> ---
On Mon, 6 Nov 2023, rdapp at gcc dot gnu.org wrote:

> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112361
> 
> --- Comment #6 from Robin Dapp <rdapp at gcc dot gnu.org> ---
> So "before" we created
> 
>   vect__3.12_55 = MEM <vector(16) float> [(float *)vectp_a.10_53];
>   vect__ifc__43.13_57 = VEC_COND_EXPR <mask__24.9_52, vect__3.12_55, { 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0 }>;
> //  _ifc__43 = _24 ? _3 : 0.0;
>   stmp__44.14_58 = BIT_FIELD_REF <vect__ifc__43.13_57, 32, 0>;
>   stmp__44.14_59 = r3_29 + stmp__44.14_58;
>   ...
> 
> in vect_expand_fold_left.

Note that this wasn't correct in all cases (wrt signed zeros and
sign-dependent rounding).

> Now, as intended, there is no VEC_COND anymore and we just create the bit-field
> reduction over the unmasked vector.

That's invalid for a COND_OP.  We either have to emulate that COND_OP
by materializing a VEC_COND_EXPR as before, when that's semantically
valid, or refrain from vectorizing (I don't think we want to emit
N compare-and-jump sequences to scalarize the effect of the mask).

> We could refrain from creating the COND_OP in the first place as Jakub
> mentioned (I guess we know already in if-conv that we shouldn't), re-insert a
> VEC_COND or create a COND_OP chain (instead of an OP chain) in
> vect_expand_fold_left by passing it the mask (and is_cond_op).
> Having several COND_OPs here might make analysis of subsequent passes more
> difficult?

I'd pass in the mask and is_cond_op and create the VEC_COND_EXPR in
vect_expand_fold_left, but make sure to disallow vectorizing the
invalid cases.

* [Bug tree-optimization/112361] [14 Regression] avx512f-reduce-op-1.c miscompiled since r14-5076
From: cvs-commit at gcc dot gnu.org @ 2023-11-07 21:38 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112361

--- Comment #8 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Robin Dapp <rdapp@gcc.gnu.org>:

https://gcc.gnu.org/g:fd940d248bfccb6994794152681dc4c693160919

commit r14-5231-gfd940d248bfccb6994794152681dc4c693160919
Author: Robin Dapp <rdapp@ventanamicro.com>
Date:   Mon Nov 6 11:24:37 2023 +0100

    vect/ifcvt: Add vec_cond fallback and check for vector versioning.

    This restricts tree-ifcvt to only create COND_OPs when we versioned the
    loop for vectorization.  Apart from that it re-creates a VEC_COND_EXPR
    in vect_expand_fold_left if we emitted a COND_OP.

    gcc/ChangeLog:

            PR tree-optimization/112361
            PR target/112359
            PR middle-end/112406

            * tree-if-conv.cc (convert_scalar_cond_reduction): Remember if
            loop was versioned and only then create COND_OPs.
            (predicate_scalar_phi): Do not create COND_OP when not
            vectorizing.
            * tree-vect-loop.cc (vect_expand_fold_left): Re-create
            VEC_COND_EXPR.
            (vectorize_fold_left_reduction): Pass mask to
            vect_expand_fold_left.

    gcc/testsuite/ChangeLog:

            * gcc.dg/pr112359.c: New test.

* [Bug tree-optimization/112361] [14 Regression] avx512f-reduce-op-1.c miscompiled since r14-5076
From: rguenth at gcc dot gnu.org @ 2023-11-08  7:01 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112361

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|NEW                         |RESOLVED

--- Comment #9 from Richard Biener <rguenth at gcc dot gnu.org> ---
Fixed.
