From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: by sourceware.org (Postfix, from userid 48) id ECD703858D3C; Tue, 8 Nov 2022 11:24:16 +0000 (GMT) DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org ECD703858D3C DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gcc.gnu.org; s=default; t=1667906656; bh=sOp9jTS+m+MLU2HH9htliiw/mOgytNvxoaPU7avJPxc=; h=From:To:Subject:Date:In-Reply-To:References:From; b=fOmXjMVAgdMDTj2LF+Bw6OOnuc7zdSCkL4mDXohA2ToOYEolCvFP32PaQwTW7BQV3 ct0cllsFGzhHiotXi0epmbV52kV0luz2zzbRvM9RvJORDaVdkW80Rw0rtpLTTAWE7l j5+hhPYqYjmBd22x4hHu3c9+t1z5GVuWXFNK9LC8= From: "cvs-commit at gcc dot gnu.org" To: gcc-bugs@gcc.gnu.org Subject: [Bug target/107546] [10/11/12/13 Regression] simd, redundant pcmpeqb and pxor Date: Tue, 08 Nov 2022 11:24:13 +0000 X-Bugzilla-Reason: CC X-Bugzilla-Type: changed X-Bugzilla-Watch-Reason: None X-Bugzilla-Product: gcc X-Bugzilla-Component: target X-Bugzilla-Version: 12.2.0 X-Bugzilla-Keywords: missed-optimization X-Bugzilla-Severity: normal X-Bugzilla-Who: cvs-commit at gcc dot gnu.org X-Bugzilla-Status: ASSIGNED X-Bugzilla-Resolution: X-Bugzilla-Priority: P2 X-Bugzilla-Assigned-To: jakub at gcc dot gnu.org X-Bugzilla-Target-Milestone: 10.5 X-Bugzilla-Flags: X-Bugzilla-Changed-Fields: Message-ID: In-Reply-To: References: Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/ Auto-Submitted: auto-generated MIME-Version: 1.0 List-Id: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=3D107546 --- Comment #11 from CVS Commits --- The master branch has been updated by Jakub Jelinek : https://gcc.gnu.org/g:fa271afb58423014e2feef9f15c1a87428e64ddc commit r13-3803-gfa271afb58423014e2feef9f15c1a87428e64ddc Author: Jakub Jelinek Date: Tue Nov 8 12:21:55 2022 +0100 i386: Improve vector [GL]E{,U} comparison against vector constants [PR107546] For integer vector comparisons without XOP before AVX512{F,VL} we are constrained by only GT and EQ being supported in HW. For GTU we play tricks to implement it using GT or unsigned saturating subtraction, for LT/LTU we swap the operands and thus turn it into GT/GTU. For LE/LEU we handle it by using GT/GTU and negating the result and for GE/GEU by using GT/GTU on swapped operands and negating the result. If the second operand is a CONST_VECTOR, we can usually do better thoug= h, we can avoid the negation. For LE/LEU cst by doing LT/LTU cst+1 (and then cst+1 GT/GTU x) and for GE/GEU cst by doing GT/GTU cst-1, provided there is no wrap-around on those cst+1 or cst-1. GIMPLE canonicalizes x < cst to x <=3D cst-1 etc. (the rule is smaller absolute value on constant), but only for scalars or uniform vectors, so in some cases this undoes that canonicalization in order to avoid the extra negation, but it handles also non-uniform constants. E.g. with -mavx2 the testcase assembly difference is: - movl $47, %eax + movl $48, %eax vmovdqa %xmm0, %xmm1 vmovd %eax, %xmm0 vpbroadcastb %xmm0, %xmm0 - vpminsb %xmm0, %xmm1, %xmm0 - vpcmpeqb %xmm1, %xmm0, %xmm0 + vpcmpgtb %xmm1, %xmm0, %xmm0 and - vmovdqa %xmm0, %xmm1 - vmovdqa .LC1(%rip), %xmm0 - vpminsb %xmm1, %xmm0, %xmm1 - vpcmpeqb %xmm1, %xmm0, %xmm0 + vpcmpgtb .LC1(%rip), %xmm0, %xmm0 while with just SSE2: - pcmpgtb .LC0(%rip), %xmm0 - pxor %xmm1, %xmm1 - pcmpeqb %xmm1, %xmm0 + movdqa %xmm0, %xmm1 + movdqa .LC0(%rip), %xmm0 + pcmpgtb %xmm1, %xmm0 and - movdqa %xmm0, %xmm1 - movdqa .LC1(%rip), %xmm0 - pcmpgtb %xmm1, %xmm0 - pxor %xmm1, %xmm1 - pcmpeqb %xmm1, %xmm0 + pcmpgtb .LC1(%rip), %xmm0 2022-11-08 Jakub Jelinek PR target/107546 * config/i386/predicates.md (vector_or_const_vector_operand): N= ew predicate. * config/i386/sse.md (vec_cmp, vec_cmpv2div2di, vec_cmpu, vec_cmpuv2div2di): Use nonimmediate_or_const_vector_operand predicate instead of nonimmediate_operand and vector_or_const_vector_operand instead of vector_operand. * config/i386/i386-expand.cc (ix86_expand_int_sse_cmp): For LE/LEU or GE/GEU with CONST_VECTOR cop1 try to transform those into LE/LEU or GT/GTU with larger or smaller by one cop1 if there is no wrap-around. Force CONST_VECTOR cop0 or cop1 into REG. Formatting fix. * gcc.target/i386/pr107546.c: New test.=