public inbox for gcc-bugs@sourceware.org
* [Bug target/107093] New: AVX512 mask operations not simplified in fully masked loop
@ 2022-09-30  9:06 rguenth at gcc dot gnu.org
From: rguenth at gcc dot gnu.org @ 2022-09-30  9:06 UTC
  To: gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107093

            Bug ID: 107093
           Summary: AVX512 mask operations not simplified in fully masked
                    loop
           Product: gcc
           Version: 13.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rguenth at gcc dot gnu.org
  Target Milestone: ---

While trying to implement WHILE_ULT for AVX512, I ran into optimization issues.
Consider

double a[1024], b[1024];

void foo (int n)
{
  for (int i = 0; i < n; ++i)
    a[i] = b[i] * 3.;
}

compiled with -O3 -march=cascadelake --param vect-partial-vector-usage=2
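For readers less familiar with fully masked loops, here is a scalar C sketch of what the vectorizer is asked to produce for foo with --param vect-partial-vector-usage=2: every iteration, including the last partial one, runs as a masked 4-lane operation, so no scalar epilogue is needed. The helper while_ult is hypothetical; it roughly mirrors the semantics of GCC's WHILE_ULT (lane j is active iff base + j < limit), modelled here with signed ints for simplicity. The width of 4 lanes (four doubles per 256-bit ymm register) is an assumption, and none of this is GCC's actual implementation.

```c
#include <stdint.h>

#define VF 4 /* assumed lanes per vector: four doubles in a 256-bit ymm register */

double a[1024], b[1024];

/* Hypothetical model of WHILE_ULT: bit j is set iff base + j < limit. */
static uint8_t while_ult (int base, int limit)
{
  uint8_t mask = 0;
  for (int j = 0; j < VF; ++j)
    if (base + j < limit)
      mask |= (uint8_t) (1u << j);
  return mask;
}

/* Scalar emulation of the fully masked loop: each iteration is a masked
   "vector" load/multiply/store, and the loop runs while the mask is nonzero. */
void foo_masked (int n)
{
  int i = 0;
  for (uint8_t mask = while_ult (0, n); mask != 0; mask = while_ult (i, n))
    {
      for (int j = 0; j < VF; ++j)
        if (mask & (1u << j))   /* inactive lanes are neither loaded nor stored */
          a[i + j] = b[i + j] * 3.;
      i += VF;
    }
}
```

With n = 5 the first mask is 0xf, the second covers only lane 0 (element 4), and the third is 0, which ends the loop without a separate epilogue.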

I get snippets like

        kxnorb  %k1, %k1, %k1
        kortestb        %k1, %k1
        je      .L11

or

        kxorb   %k1, %k1, %k1
        kxnorb  %k1, %k1, %k1

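where we fail to simplify the operations: kxnorb of %k1 with itself is
always all-ones, so the kortestb/je pair can never branch, and the kxorb in
the second snippet is dead because kxnorb immediately overwrites %k1.
From the RTL it looks like missed jump threading, but I do see the ops being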

(insn 18 72 74 5 (parallel [
            (set (reg:QI 69 k1 [orig:86 loop_mask_15 ] [86])
                (not:QI (xor:QI (reg:QI 69 k1 [orig:86 loop_mask_15 ] [86])
                        (reg:QI 69 k1 [orig:86 loop_mask_15 ] [86]))))
            (unspec [
                    (const_int 0 [0])
                ] UNSPEC_MASKOP)
        ]) 1912 {kxnorqi}
     (expr_list:REG_EQUAL (const_int -1 [0xffffffffffffffff])
        (nil)))

thus having an UNSPEC in them.  When instead emitting a SET from constm1, I
end up with mask<->GPR moves and if-converted code, which isn't optimal either.
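The folds that are missed in the snippets above can be checked on a plain C model of the 8-bit mask registers. The functions kxnorb, kxorb and kortestb_zf below are hypothetical C stand-ins for the instructions, assuming their documented semantics: kxnorb computes ~(a ^ b), kxorb computes a ^ b, and kortestb sets ZF when a | b is zero. Whatever value %k1 holds on entry, kxnorb %k1, %k1, %k1 yields 0xff (the QImode -1 recorded in the REG_EQUAL note), so a following je is never taken.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical C models of the AVX512 8-bit mask instructions. */
static uint8_t kxnorb (uint8_t a, uint8_t b) { return (uint8_t) ~(a ^ b); }
static uint8_t kxorb (uint8_t a, uint8_t b) { return (uint8_t) (a ^ b); }

/* kortestb sets ZF iff (a | b) == 0; "je" branches when ZF is set. */
static bool kortestb_zf (uint8_t a, uint8_t b) { return (a | b) == 0; }

/* Check the folds for every possible incoming value of k1:
   kxnorb k1,k1,k1 is the constant 0xff, kxorb k1,k1,k1 is the constant 0,
   and "kxnorb; kortestb; je" can therefore never branch. */
static bool mask_folds_hold (void)
{
  for (unsigned v = 0; v <= 0xff; ++v)
    {
      uint8_t k1 = (uint8_t) v;
      if (kxnorb (k1, k1) != 0xff || kxorb (k1, k1) != 0)
        return false;
      uint8_t ones = kxnorb (k1, k1);
      if (kortestb_zf (ones, ones))   /* je would be taken */
        return false;
    }
  return true;
}
```

Because the result does not depend on the incoming %k1, a kxorb into %k1 that is immediately followed by kxnorb %k1, %k1, %k1 is dead as well.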
When doing -fno-if-conversion I get

.L7:
        vmovapd b(%rax), %ymm1{%k1}
        addl    $4, %ecx
        movl    %edi, %edx
        vmulpd  %ymm2, %ymm1, %ymm0
        subl    %ecx, %edx
        vmovapd %ymm0, a(%rax){%k1}
        kxnorb  %k1, %k1, %k1
        cmpl    $4, %edx
        jge     .L5
        vpbroadcastd    %edx, %xmm0
        vpcmpd  $1, %xmm0, %xmm3, %k1
.L5:
        addq    $32, %rax
        kortestb        %k1, %k1
        jne     .L7

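which also lacks the desired short-cut from the cmpl $4, %edx: on the jge
path %k1 has just been set to all-ones by kxnorb, so the kortestb %k1, %k1
test at .L5 is redundant there and the jne is always taken.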
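The missed short-cut can be illustrated with a small C model of the latch of the -fno-if-conversion loop. It assumes, since the snippet does not show it, that %xmm3 holds the lane indices {0, 1, 2, 3}, so that vpcmpd $1 (signed "less than") sets bit j of %k1 iff j < remaining, where remaining is the value in %edx. The function names are hypothetical. On the path where remaining >= 4, the model shows the loop always continues without needing to look at %k1.

```c
#include <stdbool.h>
#include <stdint.h>

/* Model of the fallthrough path:
     vpbroadcastd %edx, %xmm0
     vpcmpd $1, %xmm0, %xmm3, %k1
   Assumes %xmm3 = {0, 1, 2, 3}, so bit j is set iff j < remaining. */
static uint8_t tail_mask (int remaining)
{
  uint8_t k = 0;
  for (int j = 0; j < 4; ++j)
    if (j < remaining)
      k |= (uint8_t) (1u << j);
  return k;
}

/* Model of the latch: k1 is first set to all-ones by kxnorb, then replaced
   by the compare result only when fewer than 4 elements remain. */
static uint8_t latch_mask (int remaining)
{
  uint8_t k1 = 0xff;            /* kxnorb %k1, %k1, %k1 */
  if (!(remaining >= 4))        /* cmpl $4, %edx; jge .L5 */
    k1 = tail_mask (remaining);
  return k1;
}

/* The loop continues while kortestb %k1, %k1 leaves ZF clear (jne .L7). */
static bool loop_continues (int remaining)
{
  uint8_t k1 = latch_mask (remaining);
  return (k1 | k1) != 0;
}
```

Only the fallthrough path needs the kortestb test; on the jge path the result is known to be "continue", which is the jump-threading opportunity.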


Thread overview: 11+ messages
2022-09-30  9:06 [Bug target/107093] New: AVX512 mask operations not simplified in fully masked loop rguenth at gcc dot gnu.org
2022-09-30  9:09 ` [Bug target/107093] " rguenth at gcc dot gnu.org
2022-10-10  1:50 ` crazylht at gmail dot com
2022-10-11  9:23 ` cvs-commit at gcc dot gnu.org
2022-10-11 10:05 ` crazylht at gmail dot com
2022-10-11 10:13 ` crazylht at gmail dot com
2022-10-11 10:51 ` rguenth at gcc dot gnu.org
2022-10-11 10:59 ` rguenth at gcc dot gnu.org
2022-10-11 11:08 ` crazylht at gmail dot com
2022-10-11 11:14 ` rguenther at suse dot de
2023-07-24  8:21 ` rguenth at gcc dot gnu.org
