From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-bugzilla@gcc.gnu.org>
Received: by sourceware.org (Postfix, from userid 48)
	id 8D73C3858D38; Fri, 30 Sep 2022 09:06:45 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 8D73C3858D38
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org;
	s=default; t=1664528805;
	bh=Gn0BuHYAEJxR2FqnsvRLA6/nQ8WJUPIE+TNSV6Cea+g=;
	h=From:To:Subject:Date:From;
	b=XLHZhhtp9tOljXu6pSme2FiuJGn0DmliIG3rRx5dXOt+0ldysWPNHGTNGk/TDeWuI
	 4/pDP4Q7K0pWK7homIIA6To4EUym7QID1+r/8KqdGVtydqmm22U7CVAPKPf5ovTXa6
	 9ZfUc2Hbp95QGCTSg86QsYrWT2HonIczCPhI3uL4=
From: "rguenth at gcc dot gnu.org" <gcc-bugzilla@gcc.gnu.org>
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/107093] New: AVX512 mask operations not simplified in
 fully masked loop
Date: Fri, 30 Sep 2022 09:06:43 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: new
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: target
X-Bugzilla-Version: 13.0
X-Bugzilla-Keywords: 
X-Bugzilla-Severity: normal
X-Bugzilla-Who: rguenth at gcc dot gnu.org
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Resolution: 
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags: 
X-Bugzilla-Changed-Fields: bug_id short_desc product version bug_status
 bug_severity priority component assigned_to reporter target_milestone
Message-ID: <bug-107093-4@http.gcc.gnu.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
List-Id: <gcc-bugs.sourceware.org>

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107093

            Bug ID: 107093
           Summary: AVX512 mask operations not simplified in fully masked
                    loop
           Product: gcc
           Version: 13.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rguenth at gcc dot gnu.org
  Target Milestone: ---

Trying to implement WHILE_ULT for AVX512, I run into optimization issues.
Consider

double a[1024], b[1024];

void foo (int n)
{
  for (int i = 0; i < n; ++i)
    a[i] = b[i] * 3.;
}

compiled with -O3 -march=cascadelake --param vect-partial-vector-usage=2

I get snippets like

        kxnorb  %k1, %k1, %k1
        kortestb        %k1, %k1
        je      .L11

or

        kxorb   %k1, %k1, %k1
        kxnorb  %k1, %k1, %k1

where we fail to simplify the operations.  Looking at the RTL, this looks
like missed jump threading, but I do see that the ops have the form
(insn 18 72 74 5 (parallel [
            (set (reg:QI 69 k1 [orig:86 loop_mask_15 ] [86])
                (not:QI (xor:QI (reg:QI 69 k1 [orig:86 loop_mask_15 ] [86])
                        (reg:QI 69 k1 [orig:86 loop_mask_15 ] [86]))))
            (unspec [
                    (const_int 0 [0])
                ] UNSPEC_MASKOP)
        ]) 1912 {kxnorqi}
     (expr_list:REG_EQUAL (const_int -1 [0xffffffffffffffff])
        (nil)))

thus having an UNSPEC in them.  When emitting a SET from constm1 I end up
with mask<->GPR moves and if-converted code which isn't optimal either.
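
For reference, the folding expected for the two snippets above can be
modelled in plain C.  This is only a sketch: it assumes kxnorb/kxorb compute
an 8-bit XNOR/XOR and that kortestb sets ZF exactly when the OR of its
operands is zero.  The helper names are made up for illustration and are
neither GCC internals nor intrinsics.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Model of kxnorb: bitwise XNOR of two 8-bit masks. */
static uint8_t kxnorb(uint8_t a, uint8_t b) { return (uint8_t)~(a ^ b); }

/* Model of kxorb: bitwise XOR of two 8-bit masks. */
static uint8_t kxorb(uint8_t a, uint8_t b) { return (uint8_t)(a ^ b); }

/* Model of the ZF result of kortestb: set when the OR of the masks is 0. */
static bool kortestb_zf(uint8_t a, uint8_t b) { return (uint8_t)(a | b) == 0; }

/* Does `kxnorb k,k,k; kortestb k,k; je` take the branch?
   XNOR of a value with itself is all-ones whatever k held before,
   which is what the REG_EQUAL (const_int -1) note records.  */
static bool je_taken_after_kxnor(uint8_t k)
{
  k = kxnorb(k, k);
  assert(k == 0xff);
  return kortestb_zf(k, k);   /* all-ones is nonzero, so ZF is clear */
}

/* Does the kxorb in `kxorb k,k,k; kxnorb k,k,k` affect the result?
   Returns true when the final mask is the same with and without it.  */
static bool kxor_before_kxnor_is_dead(uint8_t k)
{
  uint8_t with_xor = kxorb(k, k);
  with_xor = kxnorb(with_xor, with_xor);
  return with_xor == kxnorb(k, k);
}
```

In this model je_taken_after_kxnor is false for every input, so the je could
be removed, and kxor_before_kxnor_is_dead is true for every input, so the
kxorb could be dropped.
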
When doing -fno-if-conversion I get

.L7:
        vmovapd b(%rax), %ymm1{%k1}
        addl    $4, %ecx
        movl    %edi, %edx
        vmulpd  %ymm2, %ymm1, %ymm0
        subl    %ecx, %edx
        vmovapd %ymm0, a(%rax){%k1}
        kxnorb  %k1, %k1, %k1
        cmpl    $4, %edx
        jge     .L5
        vpbroadcastd    %edx, %xmm0
        vpcmpd  $1, %xmm0, %xmm3, %k1
.L5:
        addq    $32, %rax
        kortestb        %k1, %k1
        jne     .L7

which also doesn't have the desired short-cut from the cmpl $4, %edx.
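
A scalar C model of the loop shape I'd like to see, as a sketch only.  It
assumes a vectorization factor of 4 (four doubles per %ymm register), that
%xmm3 holds the lane indices {0,1,2,3}, and that the function is entered with
n > 0 guarded beforehand.  VF, ALL_LANES, while_ult_mask and foo_model are
names invented for this illustration.

```c
#include <assert.h>
#include <stdint.h>

enum { VF = 4, ALL_LANES = (1 << VF) - 1 };

/* WHILE_ULT-style mask: lane l is active iff l < remaining.  This models
   the vpbroadcastd + vpcmpd $1 (signed less-than) pair.  */
static uint8_t while_ult_mask(int remaining)
{
  uint8_t m = 0;
  for (int lane = 0; lane < VF; ++lane)
    if (lane < remaining)
      m |= (uint8_t)(1u << lane);
  return m;
}

/* Fully masked loop with the short-cut: once remaining >= VF is known,
   the mask is all-ones and hence nonzero, so the back edge can be taken
   directly without testing the mask (no kortestb needed).  */
static void foo_model(double *a, const double *b, int n)
{
  if (n <= 0)
    return;
  int done = 0;
  uint8_t mask = while_ult_mask(n);
  for (;;)
    {
      assert(mask != 0);          /* the body only runs with active lanes */
      for (int lane = 0; lane < VF; ++lane)
        if (mask & (1u << lane))  /* masked load, multiply, masked store */
          a[done + lane] = b[done + lane] * 3.;
      done += VF;
      int remaining = n - done;
      if (remaining >= VF)        /* cmpl $4, %edx; jge */
        {
          mask = ALL_LANES;       /* kxnorb */
          continue;               /* short-cut straight to the loop head */
        }
      mask = while_ult_mask(remaining);
      if (mask == 0)              /* kortestb is only needed on this path */
        break;
    }
}
```

The mask test survives only on the path where the compare actually computes
a partial mask; on the full-vector path the branch outcome is already known.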