public inbox for gcc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug c++/97642] New: Incorrect replacement of vmovdqu32 with vpblendd can cause fault
@ 2020-10-30 16:33 goldstein.w.n at gmail dot com
  2020-10-30 17:57 ` [Bug c++/97642] " jakub at gcc dot gnu.org
                   ` (6 more replies)
  0 siblings, 7 replies; 8+ messages in thread
From: goldstein.w.n at gmail dot com @ 2020-10-30 16:33 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97642

            Bug ID: 97642
           Summary: Incorrect replacement of vmovdqu32 with vpblendd can
                    cause fault
           Product: gcc
           Version: 10.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: goldstein.w.n at gmail dot com
  Target Milestone: ---

GCC sometimes replaces 

_mm256_mask_loadu_epi32(__m256i src, __mask8 k, void const * mem_addr) //
vmovdqu32

With

vpblendd

If mem_addr points to a memory region with less than 32 bytes of accessible
memory and k is a mask that would prevent reading the inaccessible bytes from
mem_addr the replacement will cause a fault.

See: https://godbolt.org/z/Y5cTxz

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug c++/97642] Incorrect replacement of vmovdqu32 with vpblendd can cause fault
  2020-10-30 16:33 [Bug c++/97642] New: Incorrect replacement of vmovdqu32 with vpblendd can cause fault goldstein.w.n at gmail dot com
@ 2020-10-30 17:57 ` jakub at gcc dot gnu.org
  2020-10-30 17:57 ` jakub at gcc dot gnu.org
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: jakub at gcc dot gnu.org @ 2020-10-30 17:57 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97642

Jakub Jelinek <jakub at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
                 CC|                            |crazylht at gmail dot com,
                   |                            |hjl.tools at gmail dot com,
                   |                            |jakub at gcc dot gnu.org
   Last reconfirmed|                            |2020-10-30
             Status|UNCONFIRMED                 |NEW

--- Comment #1 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
The problem is that in the RTL representation there is nothing that would tell
cse, forward propagation or combiner etc. not to optimize the
(insn 7 6 8 2 (set (reg:QI 89)
        (const_int 31 [0x1f])) "include/avx512vlintrin.h":865:20 77
{*movqi_internal}
     (nil))
(insn 8 7 9 2 (set (reg:V8SI 87)
        (vec_merge:V8SI (mem:V8SI (reg/v/f:DI 86 [ arr ]) [0  S32 A8])
            (reg:V8SI 88)
            (reg:QI 89))) "include/avx512vlintrin.h":865:20 1423
{avx512vl_loadv8si_mask}
     (nil))
into:
(insn 8 7 9 2 (set (reg:V8SI 87)
        (vec_merge:V8SI (mem:V8SI (reg/v/f:DI 86 [ arr ]) [0  S32 A8])
            (reg:V8SI 88 [ tmp ])
            (const_int 31 [0x1f]))) "include/avx512vlintrin.h":865:20 4402
{avx2_pblenddv8si}
     (expr_list:REG_DEAD (reg:QI 89)
        (expr_list:REG_DEAD (reg:V8SI 88 [ tmp ])
            (expr_list:REG_DEAD (reg/v/f:DI 86 [ arr ])
                (nil)))))
Guess we'd need to use some UNSPEC for the masked loads and have patterns for
combine to optimize those that have -1 masks into normal loads, or disable the
blend patterns with MEM operands for avx512f+ (i.e. force those into
registers).
Because the RTL representation really matches more the blend behavior than the
avx512 masking, where exceptions from the masked off elts just don't show up.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug c++/97642] Incorrect replacement of vmovdqu32 with vpblendd can cause fault
  2020-10-30 16:33 [Bug c++/97642] New: Incorrect replacement of vmovdqu32 with vpblendd can cause fault goldstein.w.n at gmail dot com
  2020-10-30 17:57 ` [Bug c++/97642] " jakub at gcc dot gnu.org
@ 2020-10-30 17:57 ` jakub at gcc dot gnu.org
  2020-11-03  7:46 ` [Bug target/97642] " crazylht at gmail dot com
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: jakub at gcc dot gnu.org @ 2020-10-30 17:57 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97642

--- Comment #2 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
The testcase is:
#include <assert.h>
#include <immintrin.h>
#include <stdint.h>
#include <sys/mman.h>

#define N 5

// Faults with GCC because usage of vpblendd
__m256i __attribute__((noinline)) mask_load(uint32_t * arr) {
    __m256i tmp;
    return _mm256_mask_loadu_epi32(tmp, (1 << N) - 1, arr);
}

// Faults
__m256i __attribute__((noinline)) blend_load_asm(uint32_t * arr) {
    __m256i tmp = _mm256_set1_epi64x(0);
    asm volatile("vpblendd %[m], (%[arr]), %[tmp], %[tmp]\n\t"
                 : [ tmp ] "+x"(tmp)
                 : [ arr ] "r"(arr), [ m ] "i"(((1 << N) - 1))
                 :);
    return tmp;
}

// Does not fault
__m256i __attribute__((noinline)) mask_load_asm(uint32_t * arr) {
    __m256i           tmp;
    asm volatile(
        "movb %[m], %%al\n\t"
        "kmovb %%eax, %%k1\n\t"
        "vmovdqu32 (%[arr]), %[tmp] %{%%k1} %{z%}\n\t"
        : [ tmp ] "+x"(tmp)
        : [ arr ] "r"(arr), [ m ] "i"(((1 << N) - 1))
        : "eax", "k1");
    return tmp;
}


void __attribute__((noinline)) mask_store(uint32_t * arr, __m256i v) {
    return _mm256_mask_storeu_epi32(arr, (1 << N) - 1, v);
}


#define NPAGES      (2)
#define END_OF_PAGE (1024 - N)

#ifndef LOAD_METHOD
#define LOAD_METHOD mask_load // mask_load_asm does not fault
#endif


int
main() {
    uint32_t * addr =
        (uint32_t *)mmap(NULL, NPAGES * 4096, PROT_READ | PROT_WRITE,
                         MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);

    for (uint32_t i = 0; i < NPAGES; i += 2) {

        uint32_t page_offset      = 1024 * i + END_OF_PAGE;
        uint32_t next_page_offset = 1024 * (i + 1);

        assert(!mprotect(addr + next_page_offset, 4096, PROT_NONE));
        mask_store(addr + page_offset, LOAD_METHOD(addr + page_offset));
    }
}

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug target/97642] Incorrect replacement of vmovdqu32 with vpblendd can cause fault
  2020-10-30 16:33 [Bug c++/97642] New: Incorrect replacement of vmovdqu32 with vpblendd can cause fault goldstein.w.n at gmail dot com
  2020-10-30 17:57 ` [Bug c++/97642] " jakub at gcc dot gnu.org
  2020-10-30 17:57 ` jakub at gcc dot gnu.org
@ 2020-11-03  7:46 ` crazylht at gmail dot com
  2020-11-03  9:03 ` jakub at gcc dot gnu.org
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: crazylht at gmail dot com @ 2020-11-03  7:46 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97642

--- Comment #3 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Jakub Jelinek from comment #1)
> The problem is that in the RTL representation there is nothing that would
> tell cse, forward propagation or combiner etc. not to optimize the
> (insn 7 6 8 2 (set (reg:QI 89)
>         (const_int 31 [0x1f])) "include/avx512vlintrin.h":865:20 77
> {*movqi_internal}
>      (nil))
> (insn 8 7 9 2 (set (reg:V8SI 87)
>         (vec_merge:V8SI (mem:V8SI (reg/v/f:DI 86 [ arr ]) [0  S32 A8])
>             (reg:V8SI 88)
>             (reg:QI 89))) "include/avx512vlintrin.h":865:20 1423
> {avx512vl_loadv8si_mask}
>      (nil))
> into:
> (insn 8 7 9 2 (set (reg:V8SI 87)
>         (vec_merge:V8SI (mem:V8SI (reg/v/f:DI 86 [ arr ]) [0  S32 A8])
>             (reg:V8SI 88 [ tmp ])
>             (const_int 31 [0x1f]))) "include/avx512vlintrin.h":865:20 4402
> {avx2_pblenddv8si}
>      (expr_list:REG_DEAD (reg:QI 89)
>         (expr_list:REG_DEAD (reg:V8SI 88 [ tmp ])
>             (expr_list:REG_DEAD (reg/v/f:DI 86 [ arr ])
>                 (nil)))))
> Guess we'd need to use some UNSPEC for the masked loads and have patterns
> for combine to optimize those that have -1 masks into normal loads, or
> disable the blend patterns with MEM operands for avx512f+ (i.e. force those
> into registers).

I prefer UNSPEC solution, UNSPEC masked load patterns only needed for
intrinsics, <avx512>_load<mode>_mask could be keeped and renamed to
<avx512>_blendm<mode>.



> Because the RTL representation really matches more the blend behavior than
> the avx512 masking, where exceptions from the masked off elts just don't
> show up.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug target/97642] Incorrect replacement of vmovdqu32 with vpblendd can cause fault
  2020-10-30 16:33 [Bug c++/97642] New: Incorrect replacement of vmovdqu32 with vpblendd can cause fault goldstein.w.n at gmail dot com
                   ` (2 preceding siblings ...)
  2020-11-03  7:46 ` [Bug target/97642] " crazylht at gmail dot com
@ 2020-11-03  9:03 ` jakub at gcc dot gnu.org
  2020-12-03  5:34 ` cvs-commit at gcc dot gnu.org
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: jakub at gcc dot gnu.org @ 2020-11-03  9:03 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97642

--- Comment #4 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Ok, but please also during expansion try to detect the all ones mask case and
already during expansion emit normal non-masked load (+ we need a
define_insn_and_split for it with pre-reload splitting just in case it gets
propagated in combine).

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug target/97642] Incorrect replacement of vmovdqu32 with vpblendd can cause fault
  2020-10-30 16:33 [Bug c++/97642] New: Incorrect replacement of vmovdqu32 with vpblendd can cause fault goldstein.w.n at gmail dot com
                   ` (3 preceding siblings ...)
  2020-11-03  9:03 ` jakub at gcc dot gnu.org
@ 2020-12-03  5:34 ` cvs-commit at gcc dot gnu.org
  2020-12-03  5:40 ` crazylht at gmail dot com
  2023-01-30 19:10 ` pinskia at gcc dot gnu.org
  6 siblings, 0 replies; 8+ messages in thread
From: cvs-commit at gcc dot gnu.org @ 2020-12-03  5:34 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97642

--- Comment #5 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by hongtao Liu <liuhongt@gcc.gnu.org>:

https://gcc.gnu.org/g:35c4c67e6c534ef3d6ba7a7752ab7e0fbc91755b

commit r11-5696-g35c4c67e6c534ef3d6ba7a7752ab7e0fbc91755b
Author: liuhongt <hongtao.liu@intel.com>
Date:   Tue Nov 3 17:26:43 2020 +0800

    Fix incorrect replacement of vmovdqu32 with vpblendd which can cause fault.

    gcc/ChangeLog:

            PR target/97642
            * config/i386/i386-expand.c
            (ix86_expand_special_args_builtin): Don't move all-ones mask
            operands into register.
            * config/i386/sse.md (UNSPEC_MASKLOAD): New unspec.
            (*<avx512>_load<mode>_mask): New define_insns for masked load
            instructions.
            (<avx512>_load<mode>_mask): Changed to define_expands which
            specifically handle memory or all-ones mask operands.
            (<avx512>_blendm<mode>): Changed to define_insns which are same
            as original <avx512>_load<mode>_mask with adjustment of
            operands order.
            (*<avx512>_load<mode>): New define_insn_and_split which is
            used to optimize for masked load with all one mask.

    gcc/testsuite/ChangeLog:

            * gcc.target/i386/avx512bw-vmovdqu16-1.c: Adjust testcase to
            make sure only masked load instruction is generated.
            * gcc.target/i386/avx512bw-vmovdqu8-1.c: Ditto.
            * gcc.target/i386/avx512f-vmovapd-1.c: Ditto.
            * gcc.target/i386/avx512f-vmovaps-1.c: Ditto.
            * gcc.target/i386/avx512f-vmovdqa32-1.c: Ditto.
            * gcc.target/i386/avx512f-vmovdqa64-1.c: Ditto.
            * gcc.target/i386/avx512vl-vmovapd-1.c: Ditto.
            * gcc.target/i386/avx512vl-vmovaps-1.c: Ditto.
            * gcc.target/i386/avx512vl-vmovdqa32-1.c: Ditto.
            * gcc.target/i386/avx512vl-vmovdqa64-1.c: Ditto.
            * gcc.target/i386/pr97642-1.c: New test.
            * gcc.target/i386/pr97642-2.c: New test.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug target/97642] Incorrect replacement of vmovdqu32 with vpblendd can cause fault
  2020-10-30 16:33 [Bug c++/97642] New: Incorrect replacement of vmovdqu32 with vpblendd can cause fault goldstein.w.n at gmail dot com
                   ` (4 preceding siblings ...)
  2020-12-03  5:34 ` cvs-commit at gcc dot gnu.org
@ 2020-12-03  5:40 ` crazylht at gmail dot com
  2023-01-30 19:10 ` pinskia at gcc dot gnu.org
  6 siblings, 0 replies; 8+ messages in thread
From: crazylht at gmail dot com @ 2020-12-03  5:40 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97642

--- Comment #6 from Hongtao.liu <crazylht at gmail dot com> ---
Fixed in GCC11, GCC10 is fine, no need to backport.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Bug target/97642] Incorrect replacement of vmovdqu32 with vpblendd can cause fault
  2020-10-30 16:33 [Bug c++/97642] New: Incorrect replacement of vmovdqu32 with vpblendd can cause fault goldstein.w.n at gmail dot com
                   ` (5 preceding siblings ...)
  2020-12-03  5:40 ` crazylht at gmail dot com
@ 2023-01-30 19:10 ` pinskia at gcc dot gnu.org
  6 siblings, 0 replies; 8+ messages in thread
From: pinskia at gcc dot gnu.org @ 2023-01-30 19:10 UTC (permalink / raw)
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97642

Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
   Target Milestone|---                         |11.0
         Resolution|---                         |FIXED

--- Comment #7 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
.

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2023-01-30 19:10 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-10-30 16:33 [Bug c++/97642] New: Incorrect replacement of vmovdqu32 with vpblendd can cause fault goldstein.w.n at gmail dot com
2020-10-30 17:57 ` [Bug c++/97642] " jakub at gcc dot gnu.org
2020-10-30 17:57 ` jakub at gcc dot gnu.org
2020-11-03  7:46 ` [Bug target/97642] " crazylht at gmail dot com
2020-11-03  9:03 ` jakub at gcc dot gnu.org
2020-12-03  5:34 ` cvs-commit at gcc dot gnu.org
2020-12-03  5:40 ` crazylht at gmail dot com
2023-01-30 19:10 ` pinskia at gcc dot gnu.org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).