public inbox for gcc-bugs@sourceware.org
* [Bug tree-optimization/112443] New: Misoptimization of _mm256_blendv_epi8 intrinsic on avx512bw+avx512vl
@ 2023-11-08 14:27 alexander.grund@tu-dresden.de
  2023-11-09  0:54 ` [Bug target/112443] [12/13/14 Regression] " pinskia at gcc dot gnu.org
                   ` (10 more replies)
  0 siblings, 11 replies; 12+ messages in thread
From: alexander.grund@tu-dresden.de @ 2023-11-08 14:27 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112443

            Bug ID: 112443
           Summary: Misoptimization of _mm256_blendv_epi8 intrinsic on
                    avx512bw+avx512vl
           Product: gcc
           Version: 12.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: alexander.grund@tu-dresden.de
  Target Milestone: ---

Created attachment 56533
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56533&action=edit
Reproducer code extracted from actual source

I came across a piece of code in PyTorch using AVX2 intrinsics that is
misoptimized and produces wrong results when compiled for newer CPUs.
In particular, I was able to reproduce this with `-mavx512bw -mavx512vl -O2`.

We usually compile with `-march=native`, which on our Sapphire Rapids system
enables the above AVX512 flags, as do `-march=cannonlake` and newer targets.
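
Whether a given `-march` value enables both features can be checked through
the compiler's predefined macros. The following is a minimal sketch, assuming
GCC or Clang (which define `__AVX512BW__` and `__AVX512VL__` when the
respective features are enabled); the function names are hypothetical and not
part of the attached reproducer:

```cpp
// Assumption: GCC/Clang predefine __AVX512BW__ / __AVX512VL__ when the
// matching -m flags (or a -march value implying them) are in effect.
// The function names below are made up for this illustration.
constexpr bool has_avx512bw() {
#ifdef __AVX512BW__
    return true;
#else
    return false;
#endif
}

constexpr bool has_avx512vl() {
#ifdef __AVX512VL__
    return true;
#else
    return false;
#endif
}

// According to the report, both features must be enabled for the
// wrong result to show up.
constexpr bool bug_preconditions_met() {
    return has_avx512bw() && has_avx512vl();
}
```

Compiling this with the same flags as the affected build and evaluating
`bug_preconditions_met()` shows whether that configuration is in the set
described above.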

The piece of code in question is a call to `_mm256_blendv_epi8(a, b, mask)`
that seemingly has inverted semantics, i.e. for a mask with all bits set it
returns a, and for a mask with all bits unset it returns b.
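
For reference, the documented behavior of the intrinsic selects each result
byte from b when the most significant bit of the corresponding mask byte is
set, and from a otherwise. A scalar sketch of that rule, which models the
semantics only and does not use the intrinsic itself (`Bytes32`,
`blendv_epi8_model` and `splat` are hypothetical names for illustration):

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Stand-in for a 256-bit vector viewed as 32 bytes (illustrative type,
// not taken from the attached reproducer).
using Bytes32 = std::array<std::uint8_t, 32>;

// Scalar model of the per-byte rule: take b[i] when the most significant
// bit of mask[i] is set, otherwise take a[i].
Bytes32 blendv_epi8_model(const Bytes32& a, const Bytes32& b,
                          const Bytes32& mask) {
    Bytes32 out{};
    for (std::size_t i = 0; i < out.size(); ++i) {
        out[i] = (mask[i] & 0x80u) ? b[i] : a[i];
    }
    return out;
}

// Helper for the illustration: a vector with every byte set to `value`.
Bytes32 splat(std::uint8_t value) {
    Bytes32 v{};
    v.fill(value);
    return v;
}
```

Under this model, `blendv_epi8_model(splat(0), splat(1), splat(0xFF))` yields
all ones (b), and the same call with `splat(0x00)` as the mask yields all
zeros (a), which is the opposite of what is observed here.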

It is also a bit complicated to reproduce, as it seems to require hiding some
details behind a lambda called through `std::function`.
In the attached example, a zero vector and a one vector are each created once
and copied into the lambda, where they are reused for potentially many
iterations (removing the loop also reproduces the issue).
Either of the following actions causes the bug to disappear:
- Removing either of the 2 `-mavx512` flags
- Reducing to `-O1` or lower
- Moving the zero_vec inside the lambda (moving one_vec makes no difference)
- Not calling through std::function (either run the lambda directly or pass
through as a template param instead of std::function)
- `-DREGEN_MASK` to create a new mask through a (superfluous)
`_mm256_cmpeq_epi8` against all 1 bits

Reproducing:
g++ -std=c++17 -mavx512bw -mavx512vl -O2 bug.cpp && ./a.out

Expected output (last line; the first line shows the inverted semantics):
vec[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1]

Actual output:
vec[255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255,
255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255,
255]

Thread overview: 12+ messages
2023-11-08 14:27 [Bug tree-optimization/112443] New: Misoptimization of _mm256_blendv_epi8 intrinsic on avx512bw+avx512vl alexander.grund@tu-dresden.de
2023-11-09  0:54 ` [Bug target/112443] [12/13/14 Regression] " pinskia at gcc dot gnu.org
2023-11-09  2:54 ` crazylht at gmail dot com
2023-11-09  7:43 ` rguenth at gcc dot gnu.org
2023-11-09  9:11 ` alexander.grund@tu-dresden.de
2023-11-09 12:38 ` alexander.grund@tu-dresden.de
2023-11-10  0:22 ` cvs-commit at gcc dot gnu.org
2023-11-10  0:24 ` cvs-commit at gcc dot gnu.org
2023-11-10  0:25 ` cvs-commit at gcc dot gnu.org
2023-11-10  0:28 ` crazylht at gmail dot com
2023-11-25 17:27 ` mikpelinux at gmail dot com
2023-11-26  1:19 ` sjames at gcc dot gnu.org