public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/112443] New: Misoptimization of _mm256_blendv_epi8 intrinsic on avx512bw+avx512vl
From: alexander.grund@tu-dresden.de @ 2023-11-08 14:27 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112443

            Bug ID: 112443
           Summary: Misoptimization of _mm256_blendv_epi8 intrinsic on
                    avx512bw+avx512vl
           Product: gcc
           Version: 12.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: alexander.grund@tu-dresden.de
  Target Milestone: ---

Created attachment 56533
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56533&action=edit
Reproducer code extracted from actual source

I came across a piece of code in PyTorch using AVX2 intrinsics that is
misoptimized, producing wrong results when compiled for newer CPUs.
In particular, I was able to reproduce this with `-mavx512bw -mavx512vl -O2`.

We usually compile with `-march=native` which on the Sapphire Rapids system
enables the above AVX512 flags, but so does `-march=cannonlake` and above.

The code in question is a call to `_mm256_blendv_epi8(a, b, mask)` that
appears to have inverted semantics: for a mask with all bits set it returns a,
and for a mask with all bits unset it returns b.
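
For reference, the documented behaviour can be written down as a scalar model:
for each byte, the most significant bit of the mask byte selects b when set and
a when clear. The following is only a sketch of that rule, not the attached
reproducer; `Bytes32`, `blendv_epi8_ref`, `splat` and
`check_documented_semantics` are illustrative names that do not appear in the
original code.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Bytes32 = std::array<std::uint8_t, 32>;

// Scalar model of the documented per-byte rule: the most significant bit of
// each mask byte selects b (bit set) or a (bit clear).
Bytes32 blendv_epi8_ref(const Bytes32& a, const Bytes32& b, const Bytes32& mask) {
    Bytes32 out{};
    for (std::size_t i = 0; i < out.size(); ++i)
        out[i] = (mask[i] & 0x80) ? b[i] : a[i];
    return out;
}

// Hypothetical helper: build a vector with every byte set to the same value.
Bytes32 splat(std::uint8_t value) {
    Bytes32 v{};
    v.fill(value);
    return v;
}

// Checks the two cases from the report against the reference model.
void check_documented_semantics() {
    const Bytes32 a = splat(0), b = splat(1);
    assert(blendv_epi8_ref(a, b, splat(0xFF)) == b);  // all bits set   -> b
    assert(blendv_epi8_ref(a, b, splat(0x00)) == a);  // all bits clear -> a
}
```

The miscompiled binary behaves as if the two branches of the conditional were
exchanged.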

It is also somewhat complicated to reproduce, as it seems to require hiding
some details behind a lambda called through `std::function`.
In the attached example, a zero vector and a one vector are created once and
copied into the lambda, where they are reused for potentially many iterations
(removing the loop also reproduces the issue).
Any of the following actions causes the bug to disappear:
- Removing either of the 2 `-mavx512` flags
- Reducing to `-O1` or lower
- Moving the zero_vec inside the lambda (moving one_vec makes no difference)
- Not calling through `std::function` (either run the lambda directly or pass
it as a template parameter instead of `std::function`)
- `-DREGEN_MASK` to create a new mask through a (superfluous)
`_mm256_cmpeq_epi8` against all 1 bits
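
The last workaround is superfluous in the sense that comparing a mask for
byte-wise equality against all 1 bits returns the same mask, provided every
mask byte is already 0x00 or 0xFF. A scalar sketch of that reasoning follows;
it assumes such a mask and does not reproduce the attached code, and
`cmpeq_epi8_ref`, `regen_mask` and `check_regen_is_identity` are illustrative
names.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Bytes32 = std::array<std::uint8_t, 32>;

// Scalar model of a byte-wise equality compare: 0xFF where the bytes are
// equal, 0x00 otherwise.
Bytes32 cmpeq_epi8_ref(const Bytes32& x, const Bytes32& y) {
    Bytes32 out{};
    for (std::size_t i = 0; i < out.size(); ++i)
        out[i] = (x[i] == y[i]) ? 0xFF : 0x00;
    return out;
}

// Regenerate a mask by comparing it against a vector of all 1 bits.
Bytes32 regen_mask(const Bytes32& mask) {
    Bytes32 ones{};
    ones.fill(0xFF);
    return cmpeq_epi8_ref(mask, ones);
}

// For a mask whose bytes are all 0x00 or 0xFF, regeneration changes nothing.
void check_regen_is_identity() {
    Bytes32 mask{};
    for (std::size_t i = 0; i < mask.size(); ++i)
        mask[i] = (i % 2 == 0) ? 0xFF : 0x00;
    assert(regen_mask(mask) == mask);
}
```

Since the regenerated mask is value-identical, the workaround presumably only
changes which instruction patterns the compiler sees, not the intended result.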

Reproducing:
g++ -std=c++17 -mavx512bw -mavx512vl -O2 bug.cpp && ./a.out

Expected output (last line; the first line shows the inverted semantics):
vec[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1]

Actual output:
vec[255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255,
255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255,
255]


* [Bug target/112443] [12/13/14 Regression] Misoptimization of _mm256_blendv_epi8 intrinsic on avx512bw+avx512vl
From: pinskia at gcc dot gnu.org @ 2023-11-09  0:54 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112443

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |12.4


* [Bug target/112443] [12/13/14 Regression] Misoptimization of _mm256_blendv_epi8 intrinsic on avx512bw+avx512vl
From: crazylht at gmail dot com @ 2023-11-09  2:54 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112443

--- Comment #1 from Hongtao.liu <crazylht at gmail dot com> ---
The patch below fixes this; there is a typo in two splitters.

@@ -17082,7 +17082,7 @@ (define_insn_and_split "*avx2_pcmp<mode>3_4"
              (match_dup 4))]
              UNSPEC_BLENDV))]
 {
-  if (INTVAL (operands[5]) == 1)
+  if (INTVAL (operands[5]) == 5)
     std::swap (operands[1], operands[2]);
   operands[3] = gen_lowpart (<MODE>mode, operands[3]);
 })
@@ -17112,7 +17112,7 @@ (define_insn_and_split "*avx2_pcmp<mode>3_5"
              (match_dup 4))]
              UNSPEC_BLENDV))]
 {
-  if (INTVAL (operands[5]) == 1)
+  if (INTVAL (operands[5]) == 5)
     std::swap (operands[1], operands[2]);
 })
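
Why the swap condition matters can be illustrated with a scalar model: blending
with an inverted mask is equivalent to blending with the two data operands
swapped, so swapping under the wrong predicate yields exactly the inverted
selection from the report. This is only a sketch under the assumption that
every mask byte is 0x00 or 0xFF; `blend_ref`, `invert` and
`check_swap_identity` are illustrative names and not GCC internals.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Bytes32 = std::array<std::uint8_t, 32>;

// Scalar blend: the most significant bit of each mask byte selects b, else a.
Bytes32 blend_ref(const Bytes32& a, const Bytes32& b, const Bytes32& mask) {
    Bytes32 out{};
    for (std::size_t i = 0; i < out.size(); ++i)
        out[i] = (mask[i] & 0x80) ? b[i] : a[i];
    return out;
}

// Bitwise NOT of every mask byte.
Bytes32 invert(const Bytes32& mask) {
    Bytes32 out{};
    for (std::size_t i = 0; i < out.size(); ++i)
        out[i] = static_cast<std::uint8_t>(~mask[i]);
    return out;
}

// Inverting the mask and swapping the operands are interchangeable, so a
// transformation must apply exactly one of them, and only when needed.
void check_swap_identity() {
    Bytes32 a{}, b{}, mask{};
    a.fill(0);
    b.fill(1);
    mask.fill(0xFF);
    assert(blend_ref(a, b, invert(mask)) == blend_ref(b, a, mask));
    // A spurious swap alone flips the selection: a is returned instead of b.
    assert(blend_ref(b, a, mask) == a);
}
```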


* [Bug target/112443] [12/13/14 Regression] Misoptimization of _mm256_blendv_epi8 intrinsic on avx512bw+avx512vl
From: rguenth at gcc dot gnu.org @ 2023-11-09  7:43 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112443

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
           Priority|P3                          |P2
             Target|                            |x86_64-*-*
   Last reconfirmed|                            |2023-11-09
     Ever confirmed|0                           |1


* [Bug target/112443] [12/13/14 Regression] Misoptimization of _mm256_blendv_epi8 intrinsic on avx512bw+avx512vl
From: alexander.grund@tu-dresden.de @ 2023-11-09  9:11 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112443

--- Comment #2 from Alexander Grund <alexander.grund@tu-dresden.de> ---
I can confirm that the suggested patch can be applied to 12.2.0 and fixes the
issue I observed.


* [Bug target/112443] [12/13/14 Regression] Misoptimization of _mm256_blendv_epi8 intrinsic on avx512bw+avx512vl
From: alexander.grund@tu-dresden.de @ 2023-11-09 12:38 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112443

--- Comment #3 from Alexander Grund <alexander.grund@tu-dresden.de> ---
> I can confirm that the suggested patch can be applied to 12.2.0 and fixes
> the issue I observed

I also tested 12.1, 12.3, 13.1 and 13.2 with this patch, and it works there
too (as expected).


* [Bug target/112443] [12/13/14 Regression] Misoptimization of _mm256_blendv_epi8 intrinsic on avx512bw+avx512vl
From: cvs-commit at gcc dot gnu.org @ 2023-11-10  0:22 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112443

--- Comment #4 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by hongtao Liu <liuhongt@gcc.gnu.org>:

https://gcc.gnu.org/g:9a0cc04b9c9b02426762892b88efc5c44ba546bd

commit r14-5305-g9a0cc04b9c9b02426762892b88efc5c44ba546bd
Author: liuhongt <hongtao.liu@intel.com>
Date:   Thu Nov 9 13:20:05 2023 +0800

    Fix wrong code due to vec_merge + pcmp to blendvb splitter.

    gcc/ChangeLog:

            PR target/112443
            * config/i386/sse.md (*avx2_pcmp<mode>3_4): Fix swap condition
            from LT to GT since there's not in the pattern.
            (*avx2_pcmp<mode>3_5): Ditto.

    gcc/testsuite/ChangeLog:

            * g++.target/i386/pr112443.C: New test.


* [Bug target/112443] [12/13/14 Regression] Misoptimization of _mm256_blendv_epi8 intrinsic on avx512bw+avx512vl
From: cvs-commit at gcc dot gnu.org @ 2023-11-10  0:24 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112443

--- Comment #5 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The releases/gcc-13 branch has been updated by hongtao Liu
<liuhongt@gcc.gnu.org>:

https://gcc.gnu.org/g:5437160f4e205f991c829661f221448a97bef2d3

commit r13-8035-g5437160f4e205f991c829661f221448a97bef2d3
Author: liuhongt <hongtao.liu@intel.com>
Date:   Thu Nov 9 13:20:05 2023 +0800

    Fix wrong code due to vec_merge + pcmp to blendvb splitter.

    gcc/ChangeLog:

            PR target/112443
            * config/i386/sse.md (*avx2_pcmp<mode>3_4): Fix swap condition
            from LT to GT since there's not in the pattern.
            (*avx2_pcmp<mode>3_5): Ditto.

    gcc/testsuite/ChangeLog:

            * g++.target/i386/pr112443.C: New test.

    (cherry picked from commit 9a0cc04b9c9b02426762892b88efc5c44ba546bd)


* [Bug target/112443] [12/13/14 Regression] Misoptimization of _mm256_blendv_epi8 intrinsic on avx512bw+avx512vl
From: cvs-commit at gcc dot gnu.org @ 2023-11-10  0:25 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112443

--- Comment #6 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The releases/gcc-12 branch has been updated by hongtao Liu
<liuhongt@gcc.gnu.org>:

https://gcc.gnu.org/g:1808ebfd8164cc90629b7308af3ef2b6fa965453

commit r12-9967-g1808ebfd8164cc90629b7308af3ef2b6fa965453
Author: liuhongt <hongtao.liu@intel.com>
Date:   Thu Nov 9 13:20:05 2023 +0800

    Fix wrong code due to vec_merge + pcmp to blendvb splitter.

    gcc/ChangeLog:

            PR target/112443
            * config/i386/sse.md (*avx2_pcmp<mode>3_4): Fix swap condition
            from LT to GT since there's not in the pattern.
            (*avx2_pcmp<mode>3_5): Ditto.

    gcc/testsuite/ChangeLog:

            * g++.target/i386/pr112443.C: New test.

    (cherry picked from commit 9a0cc04b9c9b02426762892b88efc5c44ba546bd)


* [Bug target/112443] [12/13/14 Regression] Misoptimization of _mm256_blendv_epi8 intrinsic on avx512bw+avx512vl
From: crazylht at gmail dot com @ 2023-11-10  0:28 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112443

--- Comment #7 from Hongtao.liu <crazylht at gmail dot com> ---
Should be fixed in GCC 14/GCC 13/GCC 12.


* [Bug target/112443] [12/13/14 Regression] Misoptimization of _mm256_blendv_epi8 intrinsic on avx512bw+avx512vl
From: mikpelinux at gmail dot com @ 2023-11-25 17:27 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112443

--- Comment #8 from Mikael Pettersson <mikpelinux at gmail dot com> ---
Can this be closed now?


* [Bug target/112443] [12/13/14 Regression] Misoptimization of _mm256_blendv_epi8 intrinsic on avx512bw+avx512vl
From: sjames at gcc dot gnu.org @ 2023-11-26  1:19 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112443

Sam James <sjames at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |sjames at gcc dot gnu.org
           Assignee|unassigned at gcc dot gnu.org      |crazylht at gmail dot com
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED

--- Comment #9 from Sam James <sjames at gcc dot gnu.org> ---
Fixed for 12.4/13.3/14.
