From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: by sourceware.org (Postfix, from userid 48) id 9218F385701D; Sat, 31 Jul 2021 07:21:07 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 9218F385701D
From: "cvs-commit at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug middle-end/78103] Failure to optimize with __builtin_clzl
Date: Sat, 31 Jul 2021 07:21:06 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: changed
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: gcc
X-Bugzilla-Component: middle-end
X-Bugzilla-Version: 6.2.1
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: normal
X-Bugzilla-Who: cvs-commit at gcc dot gnu.org
X-Bugzilla-Status: ASSIGNED
X-Bugzilla-Resolution:
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: jakub at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields:
Message-ID:
In-Reply-To:
References:
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-BeenThere: gcc-bugs@gcc.gnu.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gcc-bugs mailing list
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Sat, 31 Jul 2021 07:21:07 -0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78103

--- Comment #22 from CVS Commits ---
The master branch has been updated by Jakub Jelinek:

https://gcc.gnu.org/g:91425e2adecd00091d7443104ecb367686e88663

commit r12-2649-g91425e2adecd00091d7443104ecb367686e88663
Author: Jakub Jelinek
Date: Sat Jul 31 09:19:32 2021 +0200

i386: Improve extensions of __builtin_clz and constant - __builtin_clz for -mno-lzcnt [PR78103]

This patch improves emitted code for the non-TARGET_LZCNT case.
As __builtin_clz* is UB on a 0 argument and for !TARGET_LZCNT
CLZ_VALUE_DEFINED_AT_ZERO is 0, it is UB even at RTL time and so we can
take advantage of that and assume the result will be 0 to 31 or 0 to 63.
Given that, sign and zero extension of that result are the same and are
actually already performed by the bsrl or xorl instructions.
And constant - __builtin_clz* can be simplified into
bsr + (constant - bitmask), where bitmask is 31 or 63.
For TARGET_LZCNT, a lot of this is already fine as is (e.g. the sign or
zero extensions), and other optimizations are IMHO not possible
(if we have lzcnt, we've lost information on whether it is UB at
zero or not and so can't transform it into bsr even when that is
1-2 insns shorter).

The changes on the 3 testcases between unpatched and patched gcc
are for -m64:

pr78103-1.s:
     bsrq    %rdi, %rax
-    xorq    $63, %rax
-    cltq
+    xorl    $63, %eax
...
     bsrq    %rdi, %rax
-    xorq    $63, %rax
-    cltq
+    xorl    $63, %eax
...
     bsrl    %edi, %eax
     xorl    $31, %eax
-    cltq
...
     bsrl    %edi, %eax
     xorl    $31, %eax
-    cltq

pr78103-2.s:
     bsrl    %edi, %edi
-    movl    $32, %eax
-    xorl    $31, %edi
-    subl    %edi, %eax
+    leal    1(%rdi), %eax
...
-    bsrl    %edi, %edi
-    movl    $31, %eax
-    xorl    $31, %edi
-    subl    %edi, %eax
+    bsrl    %edi, %eax
...
     bsrq    %rdi, %rdi
-    movl    $64, %eax
-    xorq    $63, %rdi
-    subl    %edi, %eax
+    leal    1(%rdi), %eax
...
-    bsrq    %rdi, %rdi
-    movl    $63, %eax
-    xorq    $63, %rdi
-    subl    %edi, %eax
+    bsrq    %rdi, %rax

pr78103-3.s:
     bsrl    %edi, %edi
-    movl    $32, %eax
-    xorl    $31, %edi
-    movslq  %edi, %rdi
-    subq    %rdi, %rax
+    leaq    1(%rdi), %rax
...
-    bsrl    %edi, %edi
-    movl    $31, %eax
-    xorl    $31, %edi
-    movslq  %edi, %rdi
-    subq    %rdi, %rax
+    bsrl    %edi, %eax
...
     bsrq    %rdi, %rdi
-    movl    $64, %eax
-    xorq    $63, %rdi
-    movslq  %edi, %rdi
-    subq    %rdi, %rax
+    leaq    1(%rdi), %rax
...
-    bsrq    %rdi, %rdi
-    movl    $63, %eax
-    xorq    $63, %rdi
-    movslq  %edi, %rdi
-    subq    %rdi, %rax
+    bsrq    %rdi, %rax

Most of the changes are done with combine splitters, but for
*bsr_rex64_2 and *bsr_2 I had to use define_insn_and_split, because
as mentioned in the PR the combiner unfortunately doesn't create
LOG_LINKS in between the two insns created by combine splitter,
so it can't be combined further with following instructions.

2021-07-31 Jakub Jelinek

        PR target/78103
        * config/i386/i386.md (bsr_rex64_1, bsr_1, bsr_zext_1): New
        define_insn patterns.
        (*bsr_rex64_2, *bsr_2): New define_insn_and_split patterns.
        Add combine splitters for constant - clz.
        (clz2): Use a temporary pseudo for bsr result.

        * gcc.target/i386/pr78103-1.c: New test.
        * gcc.target/i386/pr78103-2.c: New test.
        * gcc.target/i386/pr78103-3.c: New test.
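
For readers who want to convince themselves of the arithmetic the commit
message relies on, here is a minimal sketch in C. It is not part of the
commit or of the pr78103-*.c tests. The helpers bsr32, bsr64, check32 and
check64 are hypothetical names introduced only for this illustration; the
bsr helpers model what the bsr instruction computes for a nonzero operand
(the index of the highest set bit). The sketch assumes a 32-bit unsigned
int, a 64-bit unsigned long long, and a compiler that provides
__builtin_clz and __builtin_clzll (GCC or Clang).

```c
/* Hypothetical model of bsrl: index of the most significant set bit.
   Only meaningful for x != 0, matching the UB-at-zero assumption.  */
int bsr32(unsigned int x)
{
    int idx = 0;
    while (x >>= 1)
        idx++;
    return idx;
}

/* Hypothetical model of bsrq, same idea for a 64-bit operand.  */
int bsr64(unsigned long long x)
{
    int idx = 0;
    while (x >>= 1)
        idx++;
    return idx;
}

/* Returns 1 if the 32-bit identities hold for a nonzero x.
   Assumes unsigned int is 32 bits wide.  */
int check32(unsigned int x)
{
    int clz = __builtin_clz(x);
    int bsr = bsr32(x);
    return clz == (bsr ^ 31)      /* bsrl followed by xorl $31 */
        && 32 - clz == bsr + 1    /* constant 32: bsr plus 1 */
        && 31 - clz == bsr;       /* constant 31: plain bsr */
}

/* Returns 1 if the 64-bit identities hold for a nonzero x.
   Assumes unsigned long long is 64 bits wide.  */
int check64(unsigned long long x)
{
    int clz = __builtin_clzll(x);
    int bsr = bsr64(x);
    return clz == (bsr ^ 63)      /* bsrq followed by xorq $63 */
        && 64 - clz == bsr + 1    /* constant 64: bsr plus 1 */
        && 63 - clz == bsr;       /* constant 63: plain bsr */
}
```

Because the bsr result is in the range 0 to 31 (or 0 to 63), XOR with the
bitmask equals bitmask minus bsr, which is why constant - clz collapses to
bsr plus a small constant. Both check functions should return 1 for every
nonzero input; a zero input is deliberately outside what they cover.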