From: "roger at nextmovesoftware dot com"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/113764] [X86] __builtin_clz generates lzcnt when bsr is sufficient
Date: Sun, 11 Feb 2024 11:34:31 +0000
X-Bugzilla-Product: gcc
X-Bugzilla-Component: target
X-Bugzilla-Version: 13.2.1
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Status: NEW

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113764

Roger Sayle changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|[X86] Generates lzcnt when  |[X86] __builtin_clz
                   |bsr is sufficient           |generates lzcnt when bsr is
                   |                            |sufficient

--- Comment #4 from Roger Sayle ---
Yep, CLZ_DEFINED_VALUE_AT_ZERO really complicates things.
With a single "global" macro, it's currently impossible for a backend to
support two different CLZ instructions: one with defined behavior at zero,
and the other with undefined behavior at zero. It might just be possible
to encode the LZCNT patterns in RTL using:

  (if_then_else:SI (ne:SI (reg:SI x) (const_int 0))
                   (clz:SI (reg:SI x))
                   (const_int VALUE))

Additionally, on x86_64 the BSR instruction sets the zero flag if its input
is zero, in which case the destination register is undefined. This can be
useful with CMOV, i.e. it's possible to get defined behavior without an
additional test and branch.

But for Pawel's original testcase, __builtin_clz is undefined at zero, so
this really is a missed optimization with either -Os or a modern -march
such as cascadelake or znver4.

I agree with Jakub: this is a can of worms, potentially a lot of effort for
a marginal improvement.
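To make the distinction concrete, here is a minimal C sketch of the three
value-level behaviours discussed above. It is not GCC code: the helper
names (bsr32, lzcnt32, clz_via_bsr, clz_defined_bsr_cmov,
clz_forms_agree) are hypothetical, the loop in bsr32 stands in for the BSR
instruction, a 32-bit unsigned int is assumed, and __builtin_clz plus the
inline asm assume a GCC-compatible compiler on x86 (other targets fall back
to plain C). It shows that once zero is excluded, as it is for
__builtin_clz, BSR followed by an XOR with 31 gives the same answer as
LZCNT, and that the BSR/CMOV pairing can recover a defined result of 32 at
zero without a branch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: index of the most significant set bit, the value
   BSR produces.  BSR leaves its destination undefined for a zero source,
   so x != 0 is a precondition.  */
static unsigned bsr32(uint32_t x)
{
    assert(x != 0);
    unsigned idx = 0;
    while (x >>= 1)
        idx++;
    return idx;
}

/* LZCNT-style count, defined at zero.  Value-level reading of
     (if_then_else (ne x 0) (clz x) (const_int 32))  */
static unsigned lzcnt32(uint32_t x)
{
    return x != 0 ? 31u - bsr32(x) : 32u;
}

/* __builtin_clz-style count: zero is excluded, so only the x != 0 arm
   matters.  bsr32 returns a value in [0, 31], where 31 - idx == 31 ^ idx,
   so a BSR followed by an XOR with 31 is sufficient.  */
static unsigned clz_via_bsr(uint32_t x)
{
    return bsr32(x) ^ 31u;
}

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
/* Defined-at-zero count without a branch: BSR sets ZF for a zero source,
   CMOVZ then substitutes 63, and 63 ^ 31 == 32.  The early clobber keeps
   r from sharing a register with the inputs.  */
static unsigned clz_defined_bsr_cmov(uint32_t x)
{
    unsigned r;
    unsigned sixty_three = 63;
    __asm__ ("bsr %1, %0\n\t"
             "cmovz %2, %0"
             : "=&r" (r)
             : "rm" (x), "rm" (sixty_three)
             : "cc");
    return r ^ 31u;
}
#else
/* Portable fallback on other targets, same result.  */
static unsigned clz_defined_bsr_cmov(uint32_t x)
{
    return lzcnt32(x);
}
#endif

/* Compare the forms against GCC's builtin for every single-bit value and
   a few mixed bit patterns, then check the zero case of the two
   defined-at-zero forms.  Returns 1 if everything agrees.  */
static int clz_forms_agree(void)
{
    static const uint32_t mixed[4] = { 3u, 0x12345678u, 0x7FFFFFFFu, 0xFFFFFFFFu };
    uint32_t samples[32 + 4];
    unsigned n = 0;
    for (unsigned b = 0; b < 32; b++)
        samples[n++] = (uint32_t)1 << b;
    for (unsigned i = 0; i < 4; i++)
        samples[n++] = mixed[i];
    for (unsigned i = 0; i < n; i++) {
        uint32_t x = samples[i];
        unsigned builtin = (unsigned)__builtin_clz(x);
        if (lzcnt32(x) != builtin || clz_via_bsr(x) != builtin
            || clz_defined_bsr_cmov(x) != builtin)
            return 0;
    }
    return lzcnt32(0) == 32u && clz_defined_bsr_cmov(0) == 32u;
}
```

With a 32-bit unsigned int, clz_forms_agree() should return 1. Only
clz_via_bsr corresponds to what __builtin_clz in the original testcase
needs; lzcnt32 and clz_defined_bsr_cmov model the (const_int VALUE) arm
with VALUE = 32.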