public inbox for gcc-bugs@sourceware.org
* [Bug c++/114560] New: Compilation error when using _mm512_maskz_expandloadu_epi16 with only -mavx512vbmi2
@ 2024-04-02  9:19 meirav.grimberg at redis dot com
  2024-04-02 15:25 ` [Bug target/114560] " jakub at gcc dot gnu.org
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: meirav.grimberg at redis dot com @ 2024-04-02  9:19 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114560

            Bug ID: 114560
           Summary: Compilation error when using
                    _mm512_maskz_expandloadu_epi16 with only -mavx512vbmi2
           Product: gcc
           Version: 11.4.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: meirav.grimberg at redis dot com
  Target Milestone: ---

Hello,
I'm using GCC 11.4. The problem also exists in GCC 13.

The following code fails to compile:

#include <immintrin.h>

int main(void) {
    unsigned short vec[16];
    for (size_t i =0; i < 16; i++) {
        vec[i] = 2;
    }
    __mmask32 mask = 0xAAAAAAAA;
    __m512i bf16_to_fp32 = _mm512_maskz_expandloadu_epi16(mask, vec);
    return 0;
}

g++ test.cpp -o test -mavx512vbmi2

In file included from /usr/lib/gcc/x86_64-linux-gnu/11/include/immintrin.h:81,
                 from test.cpp:1:
/usr/lib/gcc/x86_64-linux-gnu/11/include/avx512vbmi2intrin.h: In function ‘int main()’:
/usr/lib/gcc/x86_64-linux-gnu/11/include/avx512vbmi2intrin.h:451:1: error: inlining failed in call to ‘always_inline’ ‘__m512i _mm512_maskz_expandloadu_epi16(__mmask32, const void*)’: target specific option mismatch
  451 | _mm512_maskz_expandloadu_epi16 (__mmask32 __A, const void * __B)
      | ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
test.cpp:9:58: note: called from here
    9 |     __m512i bf16_to_fp32 = _mm512_maskz_expandloadu_epi16(mask, vec);

According to the Intel® Intrinsics Guide, only the avx512vbmi2 flag is required
to use _mm512_maskz_expandloadu_epi16.

However, when I add the -mavx512bw flag to the compilation command, it works as
expected with no errors.
I noticed that, indeed, in avx512vbmi2intrin.h this function is located within
the section that requires both flags:

#if !defined(__AVX512VBMI2__) || !defined(__AVX512BW__)
#pragma GCC push_options
#pragma GCC target("avx512vbmi2,avx512bw")
#define __DISABLE_AVX512VBMI2BW__
#endif /* __AVX512VBMI2BW__ */

...

extern __inline __m512i
__attribute__((__gnu_inline__, __always_inline__, __artificial__))
_mm512_maskz_expand_epi16 (__mmask32 __A, __m512i __B)
{
  return (__m512i) __builtin_ia32_expandhi512_maskz ((__v32hi) __B,
                        (__v32hi) _mm512_setzero_si512 (), (__mmask32) __A);
}

...
#ifdef __DISABLE_AVX512VBMI2BW__
#undef __DISABLE_AVX512VBMI2BW__

#pragma GCC pop_options
#endif /* __DISABLE_AVX512VBMI2BW__ */


In addition, I tried compiling this code with Clang 14 and the Intel C++
compiler, using only the -mavx512vbmi2 flag, and both succeeded.

Thank you.


* [Bug target/114560] Compilation error when using _mm512_maskz_expandloadu_epi16 with only -mavx512vbmi2
  2024-04-02  9:19 [Bug c++/114560] New: Compilation error when using _mm512_maskz_expandloadu_epi16 with only -mavx512vbmi2 meirav.grimberg at redis dot com
@ 2024-04-02 15:25 ` jakub at gcc dot gnu.org
  2024-04-02 16:15 ` meirav.grimberg at redis dot com
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: jakub at gcc dot gnu.org @ 2024-04-02 15:25 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114560

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #1 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
AVX512BW is needed to be able to use __mmask32/__mmask64, those aren't
supported in AVX512F, which only supports __mmask16.  __mmask8 needs AVX512DQ
(though, guess for that one one can just use KMOV with 16-bit mask).
In GCC 13 and later, -mavx512bw has been added as the implicit requirement of
-mavx512vbmi2
https://gcc.gnu.org/pipermail/gcc-patches/2023-April/615906.html
and -mavx512bitalg
https://gcc.gnu.org/pipermail/gcc-patches/2023-April/615905.html
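
To illustrate why a 32-bit mask comes into play at all: a 512-bit vector holds
32 lanes of 16 bits, so the epi16 intrinsics take one mask bit per lane. The
following is only a scalar sketch of the zero-masking expand-load semantics,
not how GCC implements the intrinsic; the function name
maskz_expandloadu_epi16_model is made up for illustration.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical scalar model (not GCC's implementation) of the zero-masking
// expand-load on 16-bit elements: a 512-bit vector has 512 / 16 = 32 lanes,
// so the mask needs 32 bits, one per lane.
std::array<std::uint16_t, 32>
maskz_expandloadu_epi16_model(std::uint32_t mask, const std::uint16_t* src)
{
    std::array<std::uint16_t, 32> dst{};  // lanes with a clear mask bit stay 0
    std::size_t next = 0;                 // next contiguous source element
    for (unsigned lane = 0; lane < 32; ++lane) {
        if (mask & (std::uint32_t{1} << lane)) {
            dst[lane] = src[next++];      // consume source elements in order
        }
    }
    return dst;
}
```

With the mask 0xAAAAAAAA from the testcase, the 16 odd lanes receive the 16
source elements and the even lanes are zeroed.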


* [Bug target/114560] Compilation error when using _mm512_maskz_expandloadu_epi16 with only -mavx512vbmi2
  2024-04-02  9:19 [Bug c++/114560] New: Compilation error when using _mm512_maskz_expandloadu_epi16 with only -mavx512vbmi2 meirav.grimberg at redis dot com
  2024-04-02 15:25 ` [Bug target/114560] " jakub at gcc dot gnu.org
@ 2024-04-02 16:15 ` meirav.grimberg at redis dot com
  2024-04-02 16:43 ` jakub at gcc dot gnu.org
  2024-04-03  4:50 ` meirav.grimberg at redis dot com
  3 siblings, 0 replies; 5+ messages in thread
From: meirav.grimberg at redis dot com @ 2024-04-02 16:15 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114560

--- Comment #2 from Meirav Grimberg <meirav.grimberg at redis dot com> ---
(In reply to Jakub Jelinek from comment #1)
> AVX512BW is needed to be able to use __mmask32/__mmask64, those aren't
> supported in AVX512F, which only supports __mmask16.  __mmask8 needs
> AVX512DQ (though, guess for that one one can just use KMOV with 16-bit mask).
> In GCC 13 and later, -mavx512bw has been added as the implicit requirement of
> -mavx512vbmi2
> https://gcc.gnu.org/pipermail/gcc-patches/2023-April/615906.html
> and -mavx512bitalg
> https://gcc.gnu.org/pipermail/gcc-patches/2023-April/615905.html

Hi,
thank you for the quick reply.

As I mentioned, the Intel Intrinsics Guide specifies only the AVX512_VBMI2
flag, without referencing AVX512BW. Could you shed some light on this?

Moreover, I noticed that both Clang and Intel's compiler allow compilation
without additional flags, suggesting an implementation that aligns with the
hardware requirements. Could you provide insights into why GCC necessitates an
additional flag?


Regarding the term "implicit requirement," could you please clarify its
meaning? I didn't observe any apparent differences when attempting compilation
with GCC 13.

Thank you for your assistance.


* [Bug target/114560] Compilation error when using _mm512_maskz_expandloadu_epi16 with only -mavx512vbmi2
  2024-04-02  9:19 [Bug c++/114560] New: Compilation error when using _mm512_maskz_expandloadu_epi16 with only -mavx512vbmi2 meirav.grimberg at redis dot com
  2024-04-02 15:25 ` [Bug target/114560] " jakub at gcc dot gnu.org
  2024-04-02 16:15 ` meirav.grimberg at redis dot com
@ 2024-04-02 16:43 ` jakub at gcc dot gnu.org
  2024-04-03  4:50 ` meirav.grimberg at redis dot com
  3 siblings, 0 replies; 5+ messages in thread
From: jakub at gcc dot gnu.org @ 2024-04-02 16:43 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114560

--- Comment #3 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
(In reply to Meirav Grimberg from comment #2)
> (In reply to Jakub Jelinek from comment #1)
> > AVX512BW is needed to be able to use __mmask32/__mmask64, those aren't
> > supported in AVX512F, which only supports __mmask16.  __mmask8 needs
> > AVX512DQ (though, guess for that one one can just use KMOV with 16-bit mask).
> > In GCC 13 and later, -mavx512bw has been added as the implicit requirement of
> > -mavx512vbmi2
> > https://gcc.gnu.org/pipermail/gcc-patches/2023-April/615906.html
> > and -mavx512bitalg
> > https://gcc.gnu.org/pipermail/gcc-patches/2023-April/615905.html
> 
> Hi,
> thank you for the quick reply.
> 
> As I mentioned, the Intel Intrinsics Guide specifies only the AVX512_VBMI2
> flag, without referencing AVX512BW. Could you shed some light on this?

That is just a bug in the Intrinsics Guide, IMNSHO.

> Moreover, I noticed that both Clang and Intel's compiler allow compilation
> without additional flags, suggesting an implementation that aligns with the
> hardware requirements. Could you provide insights into why GCC necessitates
> an additional flag?

The intrinsic needs to load the 32-bit mask into one of the %k{0,1,2,3,4,5,6,7}
registers, and without AVX512BW there is simply no instruction for that.
If you compile your testcase with clang using -O0 -mavx512vbmi2, you can see a
kmovd   %ecx, %k1
instruction, which requires the AVX512BW CPUID bit.  So presumably clang does
what GCC 14 and later do: enable -mavx512bw implicitly when -mavx512vbmi2 is
requested.  While the vpexpandw instruction itself may indeed need only
AVX512VBMI2, you can't implement the intrinsic without AVX512BW.
When I check clang -E -dD -mavx512vbmi2 output on godbolt, I see
#define __AVX512BW__ 1
#define __AVX512VBMI2__ 1
defined there.
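
The same check can be made from inside a translation unit instead of reading
preprocessor output. This is a minimal sketch; the names kHasAvx512Vbmi2,
kHasAvx512Bw and describe_avx512_flags are hypothetical, and which macros end
up defined depends on the compiler, its version and the -m flags in use.

```cpp
// Compile-time view of the two feature macros discussed above.
#ifdef __AVX512VBMI2__
constexpr bool kHasAvx512Vbmi2 = true;
#else
constexpr bool kHasAvx512Vbmi2 = false;
#endif

#ifdef __AVX512BW__
constexpr bool kHasAvx512Bw = true;
#else
constexpr bool kHasAvx512Bw = false;
#endif

// Hypothetical helper: describes the combination this translation unit sees.
constexpr const char* describe_avx512_flags()
{
    return (kHasAvx512Vbmi2 && kHasAvx512Bw)
               ? "AVX512VBMI2 and AVX512BW enabled"
           : kHasAvx512Vbmi2 ? "AVX512VBMI2 enabled without AVX512BW"
           : kHasAvx512Bw    ? "AVX512BW enabled without AVX512VBMI2"
                             : "neither AVX512VBMI2 nor AVX512BW enabled";
}
```

Printing describe_avx512_flags() from main, once per set of compiler flags,
shows whether a given compiler turns on AVX512BW implicitly.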

> Regarding the term "implicit requirement," could you please clarify its
> meaning? I didn't observe any apparent differences when attempting
> compilation with GCC 13.

Ah, sorry, it is indeed in GCC 14 only.  I was misled by the commit date of
January 2023, but it was actually pushed to GCC trunk only in April, after
GCC 13 branched.
In GCC 11-13 you need to use both -mavx512vbmi2 and -mavx512bw to use these
intrinsics.
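
One possible alternative to passing both flags on the command line, not
discussed in this thread, is to enable the two extensions for a single
function with the target attribute and to guard the call with a runtime CPU
check. The sketch below assumes GCC or Clang on x86; the names
expand_load_u16_avx512 and try_expand_load_u16 are hypothetical.

```cpp
#include <cstdint>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <immintrin.h>

// Only this function is compiled with both ISA extensions enabled, so the
// rest of the translation unit needs no extra -m flags.
__attribute__((target("avx512vbmi2,avx512bw")))
static void expand_load_u16_avx512(std::uint32_t mask,
                                   const std::uint16_t* src,
                                   std::uint16_t* dst)  // dst: 32 elements
{
    __m512i v = _mm512_maskz_expandloadu_epi16(static_cast<__mmask32>(mask), src);
    _mm512_storeu_si512(dst, v);
}
#endif

// Hypothetical wrapper: returns false, leaving dst untouched, when the target
// or the CPU cannot run the AVX-512 path.
bool try_expand_load_u16(std::uint32_t mask, const std::uint16_t* src,
                         std::uint16_t* dst)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    if (__builtin_cpu_supports("avx512vbmi2") &&
        __builtin_cpu_supports("avx512bw")) {
        expand_load_u16_avx512(mask, src, dst);
        return true;
    }
#endif
    (void)mask; (void)src; (void)dst;
    return false;
}
```

A caller would need a fallback for the case where the wrapper returns false,
since the AVX-512 path is skipped on CPUs lacking either extension.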


* [Bug target/114560] Compilation error when using _mm512_maskz_expandloadu_epi16 with only -mavx512vbmi2
  2024-04-02  9:19 [Bug c++/114560] New: Compilation error when using _mm512_maskz_expandloadu_epi16 with only -mavx512vbmi2 meirav.grimberg at redis dot com
                   ` (2 preceding siblings ...)
  2024-04-02 16:43 ` jakub at gcc dot gnu.org
@ 2024-04-03  4:50 ` meirav.grimberg at redis dot com
  3 siblings, 0 replies; 5+ messages in thread
From: meirav.grimberg at redis dot com @ 2024-04-03  4:50 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114560

Meirav Grimberg <meirav.grimberg at redis dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |FIXED

--- Comment #4 from Meirav Grimberg <meirav.grimberg at redis dot com> ---
Thank you!

