From mboxrd@z Thu Jan 1 00:00:00 1970
From: "peter at cordes dot ca"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug middle-end/98801] Request for a conditional move built-in function
Date: Mon, 25 Jan 2021 19:19:54 +0000
X-Bugzilla-Product: gcc
X-Bugzilla-Component: middle-end
X-Bugzilla-Version: 10.2.1
X-Bugzilla-Severity: enhancement
X-Bugzilla-Status: NEW
X-Bugzilla-Priority: P3
X-Bugzilla-Changed-Fields: cc
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98801

Peter Cordes changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |peter at cordes dot ca

--- Comment #5 from Peter Cordes ---
(In reply to Richard Biener from comment #4)
> Slight complication arises because people will want to have cmoves with a
> memory destination.

Do we even want to provide this?  Most ISAs can't branchlessly conditionally
store, except via an RMW (which wouldn't be thread-safe for the no-store case
if not atomic) or something really clunky.  (Like x86 rep stos with count=0
or 1.)

ARM predicated instructions allow branchless load or store that doesn't
disturb the memory operand (and won't even fault on a bad address).

I guess another option to emulate it could be to make a dummy local and cmov
to select a store address = dummy : real.  But that's something users can
build in the source using a non-memory conditional-select builtin that exposes
the much more widely available ALU conditional-select functionality like x86
CMOV, AArch64 CSEL, MIPS MOVN / MOVZ, etc.

> That won't solve the eventual request to have cmov _from_ memory ... (if we
> leave all of the memory combining to RTL people will again complain that
> it's subject to compilers discretion).

It might be sufficient for most use-cases like defending against timing
side-channels to not really try to allow conditional loads (from
maybe-invalid pointers).

----

I'm not sure if the motivation for this includes trying to make code without
data-dependent branching, to defend against timing side-channels.  But if we
do provide something like this, people are going to want to use it that way.
That's one case where best-effort behaviour at the mercy of the optimizer for
a ternary (or having to manually check the asm) is not great.
Stack Overflow has gotten a few Q&As from people looking for guaranteed CMOV
for reasons like that.

So I think we should be wary of exposing functionality that most ISAs don't
have.  OTOH, failing to provide a way to take advantage of functionality that
some ISAs *do* have is not great, e.g. ISO C failing to provide popcnt and
bit-scan (clz / ctz) has been a problem for C for a long time.

But for something like __builtin_clz, emulating on machines that don't have
hardware support still works.  If we're trying to support a guarantee of no
data-dependent branching, that limits the emulation possibilities or makes
them clunkier.  Especially if we want to support ARM's ability to not fault /
not access memory if the condition is false.

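For illustration, the "dummy local" emulation of a conditional store mentioned
above might look roughly like the sketch below.  The names select_uptr and
cond_store_u32 are hypothetical, not an existing or proposed GCC interface,
and the sketch assumes pointers round-trip through uintptr_t (true for GCC on
common targets).  Nothing in plain C guarantees the compiler keeps this
branchless, which is the gap a builtin would have to close.

```c
#include <stdint.h>

/* Hypothetical helper, not a GCC builtin: pick a when mask is all-ones,
   b when mask is all-zeros, using only AND/OR in the source. */
static inline uintptr_t select_uptr(uintptr_t mask, uintptr_t a, uintptr_t b)
{
    return (a & mask) | (b & ~mask);
}

/* Store value to *dst only when cond is nonzero.  When cond is zero the
   store is redirected to a dummy local, so dst is never dereferenced. */
void cond_store_u32(int cond, uint32_t *dst, uint32_t value)
{
    uint32_t dummy;
    /* 0 - 1 wraps to all-ones; 0 - 0 stays all-zeros. */
    uintptr_t mask = (uintptr_t)0 - (uintptr_t)(cond != 0);
    uint32_t *target =
        (uint32_t *)select_uptr(mask, (uintptr_t)dst, (uintptr_t)&dummy);
    /* volatile so the one unconditional store is not optimized away. */
    *(volatile uint32_t *)target = value;
}
```

Unlike the RMW approach, the real destination is untouched in the no-store
case, at the cost of an extra stack slot and a store that always executes.
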
The ALU-select part can be emulated with AND/OR, so that's something we can
provide on any target.

Folding memory operands into a predicated load on ARM could actually introduce
data-dependent cache access, vs. an unconditional load and a predicated
reg-reg MOV.  So this becomes somewhat thorny, and some design work will be
necessary to figure out what documented guarantees to provide.  Performance
use-cases would certainly rather just have a conditional load in one
instruction.
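
As a minimal sketch of the AND/OR emulation of the ALU select mentioned
above: select_u64 is a hypothetical name, not an existing GCC builtin, and
the compiler remains free to turn this back into a branch, so it
demonstrates the fallback rather than any guarantee.

```c
#include <stdint.h>

/* Hypothetical portable fallback for a conditional-select builtin.
   Returns if_true when cond is nonzero, if_false otherwise, with no
   branch written in the source. */
uint64_t select_u64(int cond, uint64_t if_true, uint64_t if_false)
{
    /* (cond != 0) is 0 or 1; subtracting it from 0 gives an all-zeros
       or all-ones mask through unsigned wraparound. */
    uint64_t mask = (uint64_t)0 - (uint64_t)(cond != 0);
    return (if_true & mask) | (if_false & ~mask);
}
```

Both operands are always evaluated, so this covers only the register-to-register
case; it says nothing about the conditional load or store questions discussed
above.
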