From: "crazylht at gmail dot com"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/107093] AVX512 mask operations not simplified in fully masked loop
Date: Tue, 11 Oct 2022 10:05:13 +0000
X-Bugzilla-Product: gcc
X-Bugzilla-Component: target
X-Bugzilla-Version: 13.0
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: normal
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107093

--- Comment #4 from Hongtao.liu ---
Change "*k, CBC" to "?k, CBC" in *mov{qi,hi,si,di}_internal. Then RA does a
good job of choosing kxnor for setting constm1_rtx in a mask register, and I
got the code below with your attached patch (changing #if 0 to #if 1). It
seems better than the original patch.

foo:
.LFB0:
        .cfi_startproc
        testl   %edi, %edi
        jle     .L9
        kxnorb  %k1, %k1, %k1
        cmpl    $4, %edi
        jl      .L11
.L3:
        vbroadcastsd    .LC2(%rip), %ymm3
        vmovdqa .LC0(%rip), %xmm2
        xorl    %eax, %eax
        xorl    %ecx, %ecx
        .p2align 4,,10
        .p2align 3
.L7:
        vmovapd b(%rax), %ymm0{%k1}
        addl    $4, %ecx
        movl    %edi, %edx
        vmulpd  %ymm3, %ymm0, %ymm1
        subl    %ecx, %edx
        cmpl    $4, %edx
        vmovapd %ymm1, a(%rax){%k1}
        vpbroadcastd    %edx, %xmm1
        movl    $-1, %edx
        vpcmpd  $1, %xmm1, %xmm2, %k1
        kmovb   %k1, %esi
        cmovge  %edx, %esi
        addq    $32, %rax
        kmovb   %esi, %k1
        kortestb        %k1, %k1
        jne     .L7
        vzeroupper
.L9:
        ret
        .p2align 4,,10
        .p2align 3
.L11:
        vmovdqa .LC0(%rip), %xmm2
        vpbroadcastd    %edi, %xmm1
        vpcmpd  $1, %xmm1, %xmm2, %k1
        jmp     .L3
        .cfi_endproc
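
For readers who want to follow the mask handling in the listing above, here
is a rough scalar model of the loop in C. It is only a sketch read off the
assembly, not the test case from the bug: it assumes .LC0 holds the lane
indices {0, 1, 2, 3}, takes the multiplier broadcast from .LC2 as a parameter
because its value is not shown, and assumes a and b are arrays of double. The
names foo_model, lane_mask, N and VF are made up for illustration.

```c
#include <assert.h>

#define N  1024   /* assumed array size, not taken from the bug */
#define VF 4      /* four doubles per 256-bit vector */

static double a[N], b[N];

/* Model of "vpcmpd $1" against the assumed lane indices {0,1,2,3}:
   bit l is set when l < remaining (signed compare). */
static unsigned lane_mask(int remaining)
{
    unsigned m = 0;
    for (int l = 0; l < VF; l++)
        if (l < remaining)
            m |= 1u << l;
    return m;
}

/* Scalar model of the fully masked loop; factor stands in for .LC2. */
static void foo_model(int n, double factor)
{
    assert(n <= N);
    if (n <= 0)                       /* testl / jle .L9 */
        return;

    /* kxnorb gives all ones; the .L11 path computes a partial mask. */
    unsigned k = (n < VF) ? lane_mask(n) : 0xffu;
    int done = 0;                     /* %ecx */
    int base = 0;                     /* %rax / 8 */

    do {
        /* Masked load, multiply, masked store: only active lanes touch memory. */
        for (int l = 0; l < VF; l++)
            if (k & (1u << l))
                a[base + l] = b[base + l] * factor;

        done += VF;
        int remaining = n - done;     /* %edx */

        /* kmovb / cmovge / kmovb: select all ones or the compare result. */
        k = (remaining >= VF) ? 0xffu : lane_mask(remaining);
        base += VF;
    } while (k != 0);                 /* kortestb / jne .L7 */
}
```

Calling foo_model(10, 3.0) runs three iterations with masks 0xff, 0xff and
0x3, so only a[0] through a[9] are written. The select on the mask inside the
loop corresponds to the kmovb, cmovge, kmovb round trip through %esi in the
listing, which looks like the part of the code that is still not simplified.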