* [Bug c++/97642] Incorrect replacement of vmovdqu32 with vpblendd can cause fault
2020-10-30 16:33 [Bug c++/97642] New: Incorrect replacement of vmovdqu32 with vpblendd can cause fault goldstein.w.n at gmail dot com
@ 2020-10-30 17:57 ` jakub at gcc dot gnu.org
2020-10-30 17:57 ` jakub at gcc dot gnu.org
` (5 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: jakub at gcc dot gnu.org @ 2020-10-30 17:57 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97642
Jakub Jelinek <jakub at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Ever confirmed|0 |1
CC| |crazylht at gmail dot com,
| |hjl.tools at gmail dot com,
| |jakub at gcc dot gnu.org
Last reconfirmed| |2020-10-30
Status|UNCONFIRMED |NEW
--- Comment #1 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
The problem is that in the RTL representation there is nothing that would tell
cse, forward propagation or combiner etc. not to optimize the
(insn 7 6 8 2 (set (reg:QI 89)
(const_int 31 [0x1f])) "include/avx512vlintrin.h":865:20 77
{*movqi_internal}
(nil))
(insn 8 7 9 2 (set (reg:V8SI 87)
(vec_merge:V8SI (mem:V8SI (reg/v/f:DI 86 [ arr ]) [0 S32 A8])
(reg:V8SI 88)
(reg:QI 89))) "include/avx512vlintrin.h":865:20 1423
{avx512vl_loadv8si_mask}
(nil))
into:
(insn 8 7 9 2 (set (reg:V8SI 87)
(vec_merge:V8SI (mem:V8SI (reg/v/f:DI 86 [ arr ]) [0 S32 A8])
(reg:V8SI 88 [ tmp ])
(const_int 31 [0x1f]))) "include/avx512vlintrin.h":865:20 4402
{avx2_pblenddv8si}
(expr_list:REG_DEAD (reg:QI 89)
(expr_list:REG_DEAD (reg:V8SI 88 [ tmp ])
(expr_list:REG_DEAD (reg/v/f:DI 86 [ arr ])
(nil)))))
Guess we'd need to use some UNSPEC for the masked loads and have patterns for
combine to optimize those that have -1 masks into normal loads, or disable the
blend patterns with MEM operands for avx512f+ (i.e. force those into
registers).
Because the RTL representation really matches more the blend behavior than the
avx512 masking, where exceptions from the masked off elts just don't show up.
^ permalink raw reply [flat|nested] 8+ messages in thread
* [Bug c++/97642] Incorrect replacement of vmovdqu32 with vpblendd can cause fault
2020-10-30 16:33 [Bug c++/97642] New: Incorrect replacement of vmovdqu32 with vpblendd can cause fault goldstein.w.n at gmail dot com
2020-10-30 17:57 ` [Bug c++/97642] " jakub at gcc dot gnu.org
@ 2020-10-30 17:57 ` jakub at gcc dot gnu.org
2020-11-03 7:46 ` [Bug target/97642] " crazylht at gmail dot com
` (4 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: jakub at gcc dot gnu.org @ 2020-10-30 17:57 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97642
--- Comment #2 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
The testcase is:
#include <assert.h>
#include <immintrin.h>
#include <stdint.h>
#include <sys/mman.h>
#define N 5
// Faults with GCC because usage of vpblendd
__m256i __attribute__((noinline)) mask_load(uint32_t * arr) {
__m256i tmp;
return _mm256_mask_loadu_epi32(tmp, (1 << N) - 1, arr);
}
// Faults
__m256i __attribute__((noinline)) blend_load_asm(uint32_t * arr) {
__m256i tmp = _mm256_set1_epi64x(0);
asm volatile("vpblendd %[m], (%[arr]), %[tmp], %[tmp]\n\t"
: [ tmp ] "+x"(tmp)
: [ arr ] "r"(arr), [ m ] "i"(((1 << N) - 1))
:);
return tmp;
}
// Does not fault
__m256i __attribute__((noinline)) mask_load_asm(uint32_t * arr) {
__m256i tmp;
asm volatile(
"movb %[m], %%al\n\t"
"kmovb %%eax, %%k1\n\t"
"vmovdqu32 (%[arr]), %[tmp] %{%%k1} %{z%}\n\t"
: [ tmp ] "+x"(tmp)
: [ arr ] "r"(arr), [ m ] "i"(((1 << N) - 1))
: "eax", "k1");
return tmp;
}
void __attribute__((noinline)) mask_store(uint32_t * arr, __m256i v) {
return _mm256_mask_storeu_epi32(arr, (1 << N) - 1, v);
}
#define NPAGES (2)
#define END_OF_PAGE (1024 - N)
#ifndef LOAD_METHOD
#define LOAD_METHOD mask_load // mask_load_asm does not fault
#endif
int
main() {
uint32_t * addr =
(uint32_t *)mmap(NULL, NPAGES * 4096, PROT_READ | PROT_WRITE,
MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
for (uint32_t i = 0; i < NPAGES; i += 2) {
uint32_t page_offset = 1024 * i + END_OF_PAGE;
uint32_t next_page_offset = 1024 * (i + 1);
assert(!mprotect(addr + next_page_offset, 4096, PROT_NONE));
mask_store(addr + page_offset, LOAD_METHOD(addr + page_offset));
}
}
^ permalink raw reply [flat|nested] 8+ messages in thread
* [Bug target/97642] Incorrect replacement of vmovdqu32 with vpblendd can cause fault
2020-10-30 16:33 [Bug c++/97642] New: Incorrect replacement of vmovdqu32 with vpblendd can cause fault goldstein.w.n at gmail dot com
2020-10-30 17:57 ` [Bug c++/97642] " jakub at gcc dot gnu.org
2020-10-30 17:57 ` jakub at gcc dot gnu.org
@ 2020-11-03 7:46 ` crazylht at gmail dot com
2020-11-03 9:03 ` jakub at gcc dot gnu.org
` (3 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: crazylht at gmail dot com @ 2020-11-03 7:46 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97642
--- Comment #3 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Jakub Jelinek from comment #1)
> The problem is that in the RTL representation there is nothing that would
> tell cse, forward propagation or combiner etc. not to optimize the
> (insn 7 6 8 2 (set (reg:QI 89)
> (const_int 31 [0x1f])) "include/avx512vlintrin.h":865:20 77
> {*movqi_internal}
> (nil))
> (insn 8 7 9 2 (set (reg:V8SI 87)
> (vec_merge:V8SI (mem:V8SI (reg/v/f:DI 86 [ arr ]) [0 S32 A8])
> (reg:V8SI 88)
> (reg:QI 89))) "include/avx512vlintrin.h":865:20 1423
> {avx512vl_loadv8si_mask}
> (nil))
> into:
> (insn 8 7 9 2 (set (reg:V8SI 87)
> (vec_merge:V8SI (mem:V8SI (reg/v/f:DI 86 [ arr ]) [0 S32 A8])
> (reg:V8SI 88 [ tmp ])
> (const_int 31 [0x1f]))) "include/avx512vlintrin.h":865:20 4402
> {avx2_pblenddv8si}
> (expr_list:REG_DEAD (reg:QI 89)
> (expr_list:REG_DEAD (reg:V8SI 88 [ tmp ])
> (expr_list:REG_DEAD (reg/v/f:DI 86 [ arr ])
> (nil)))))
> Guess we'd need to use some UNSPEC for the masked loads and have patterns
> for combine to optimize those that have -1 masks into normal loads, or
> disable the blend patterns with MEM operands for avx512f+ (i.e. force those
> into registers).
I prefer UNSPEC solution, UNSPEC masked load patterns only needed for
intrinsics, <avx512>_load<mode>_mask could be keeped and renamed to
<avx512>_blendm<mode>.
> Because the RTL representation really matches more the blend behavior than
> the avx512 masking, where exceptions from the masked off elts just don't
> show up.
^ permalink raw reply [flat|nested] 8+ messages in thread
* [Bug target/97642] Incorrect replacement of vmovdqu32 with vpblendd can cause fault
2020-10-30 16:33 [Bug c++/97642] New: Incorrect replacement of vmovdqu32 with vpblendd can cause fault goldstein.w.n at gmail dot com
` (2 preceding siblings ...)
2020-11-03 7:46 ` [Bug target/97642] " crazylht at gmail dot com
@ 2020-11-03 9:03 ` jakub at gcc dot gnu.org
2020-12-03 5:34 ` cvs-commit at gcc dot gnu.org
` (2 subsequent siblings)
6 siblings, 0 replies; 8+ messages in thread
From: jakub at gcc dot gnu.org @ 2020-11-03 9:03 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97642
--- Comment #4 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Ok, but please also during expansion try to detect the all ones mask case and
already during expansion emit normal non-masked load (+ we need a
define_insn_and_split for it with pre-reload splitting just in case it gets
propagated in combine).
^ permalink raw reply [flat|nested] 8+ messages in thread
* [Bug target/97642] Incorrect replacement of vmovdqu32 with vpblendd can cause fault
2020-10-30 16:33 [Bug c++/97642] New: Incorrect replacement of vmovdqu32 with vpblendd can cause fault goldstein.w.n at gmail dot com
` (3 preceding siblings ...)
2020-11-03 9:03 ` jakub at gcc dot gnu.org
@ 2020-12-03 5:34 ` cvs-commit at gcc dot gnu.org
2020-12-03 5:40 ` crazylht at gmail dot com
2023-01-30 19:10 ` pinskia at gcc dot gnu.org
6 siblings, 0 replies; 8+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2020-12-03 5:34 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97642
--- Comment #5 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by hongtao Liu <liuhongt@gcc.gnu.org>:
https://gcc.gnu.org/g:35c4c67e6c534ef3d6ba7a7752ab7e0fbc91755b
commit r11-5696-g35c4c67e6c534ef3d6ba7a7752ab7e0fbc91755b
Author: liuhongt <hongtao.liu@intel.com>
Date: Tue Nov 3 17:26:43 2020 +0800
Fix incorrect replacement of vmovdqu32 with vpblendd which can cause fault.
gcc/ChangeLog:
PR target/97642
* config/i386/i386-expand.c
(ix86_expand_special_args_builtin): Don't move all-ones mask
operands into register.
* config/i386/sse.md (UNSPEC_MASKLOAD): New unspec.
(*<avx512>_load<mode>_mask): New define_insns for masked load
instructions.
(<avx512>_load<mode>_mask): Changed to define_expands which
specifically handle memory or all-ones mask operands.
(<avx512>_blendm<mode>): Changed to define_insns which are same
as original <avx512>_load<mode>_mask with adjustment of
operands order.
(*<avx512>_load<mode>): New define_insn_and_split which is
used to optimize for masked load with all one mask.
gcc/testsuite/ChangeLog:
* gcc.target/i386/avx512bw-vmovdqu16-1.c: Adjust testcase to
make sure only masked load instruction is generated.
* gcc.target/i386/avx512bw-vmovdqu8-1.c: Ditto.
* gcc.target/i386/avx512f-vmovapd-1.c: Ditto.
* gcc.target/i386/avx512f-vmovaps-1.c: Ditto.
* gcc.target/i386/avx512f-vmovdqa32-1.c: Ditto.
* gcc.target/i386/avx512f-vmovdqa64-1.c: Ditto.
* gcc.target/i386/avx512vl-vmovapd-1.c: Ditto.
* gcc.target/i386/avx512vl-vmovaps-1.c: Ditto.
* gcc.target/i386/avx512vl-vmovdqa32-1.c: Ditto.
* gcc.target/i386/avx512vl-vmovdqa64-1.c: Ditto.
* gcc.target/i386/pr97642-1.c: New test.
* gcc.target/i386/pr97642-2.c: New test.
^ permalink raw reply [flat|nested] 8+ messages in thread
* [Bug target/97642] Incorrect replacement of vmovdqu32 with vpblendd can cause fault
2020-10-30 16:33 [Bug c++/97642] New: Incorrect replacement of vmovdqu32 with vpblendd can cause fault goldstein.w.n at gmail dot com
` (4 preceding siblings ...)
2020-12-03 5:34 ` cvs-commit at gcc dot gnu.org
@ 2020-12-03 5:40 ` crazylht at gmail dot com
2023-01-30 19:10 ` pinskia at gcc dot gnu.org
6 siblings, 0 replies; 8+ messages in thread
From: crazylht at gmail dot com @ 2020-12-03 5:40 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97642
--- Comment #6 from Hongtao.liu <crazylht at gmail dot com> ---
Fixed in GCC11, GCC10 is fine, no need to backport.
^ permalink raw reply [flat|nested] 8+ messages in thread
* [Bug target/97642] Incorrect replacement of vmovdqu32 with vpblendd can cause fault
2020-10-30 16:33 [Bug c++/97642] New: Incorrect replacement of vmovdqu32 with vpblendd can cause fault goldstein.w.n at gmail dot com
` (5 preceding siblings ...)
2020-12-03 5:40 ` crazylht at gmail dot com
@ 2023-01-30 19:10 ` pinskia at gcc dot gnu.org
6 siblings, 0 replies; 8+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-01-30 19:10 UTC (permalink / raw)
To: gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97642
Andrew Pinski <pinskia at gcc dot gnu.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|NEW |RESOLVED
Target Milestone|--- |11.0
Resolution|--- |FIXED
--- Comment #7 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
.
^ permalink raw reply [flat|nested] 8+ messages in thread