From: "pinskia at gcc dot gnu.org"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/90582] AArch64 stack-protector wastes an instruction on address-generation
Date: Thu, 25 Jan 2024 08:48:14 +0000
X-Bugzilla-Product: gcc
X-Bugzilla-Component: target
X-Bugzilla-Version: 8.2.1
X-Bugzilla-Keywords: missed-optimization
X-Bugzilla-Severity: enhancement
X-Bugzilla-Who: pinskia at gcc dot gnu.org
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: unassigned at gcc dot gnu.org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Changed-Fields: bug_severity
X-Bugzilla-URL: http://gcc.gnu.org/bugzilla/

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90582

Andrew Pinski changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |enhancement

--- Comment #2 from Andrew Pinski ---
(In reply to Andrew Pinski from comment #1)
> > I assume EOR / CBNZ is as at least as efficient as SUBS / BNE on
> > all/most AArch64 microarchitectures, but someone should check.
>
> It is similar as x86 with that respect on some cores (Marvell's cores
> mostly).
> That is ThunderX, ThunderX 2 and OcteonTX and OcteonTX2 all have the ability
> to do macro-combining of the two instructions into one micro-op.

Even on most non-Marvell cores now, subs/bne is better than eor/cbnz.

Anyway, starting with GCC 10.3/9.4 we get:

        ldr     x2, [x0]
        subs    x1, x1, x2
        mov     x2, 0
        bne     .L5

which we can't fuse anyway, because the mov sits between the subs and the bne.
I wonder if we should clobber x1 too.

As for the -fomit-frame-pointer issue: it is not really an issue, since only
-momit-leaf-frame-pointer is enabled by default, and the function is now NOT a
leaf function because of the call to __stack_chk_fail.

> mov x1,0 # and destroy the reg
> mov w1, 3 # right before it's already destroyed

This is by design. GCC does not go back and figure out whether the zeroing
could be removed, because if it deleted it by accident, it might introduce a
"security hole". Always emitting it ensures that cannot happen.

As for the other issue, the address formation: it is a small missed
optimization and might not help much in general, if at all.
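To look at the epilogue sequence discussed above, any function with a local character array is enough to make GCC insert a stack-protector check. The function below is a hypothetical test case, not the original testcase from this bug; the name `checksum_copy` and its body are made up for illustration. `noinline` keeps the canary check in its own function so it is easy to find in the assembly.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical test case (not from PR 90582): the local char array makes
   -fstack-protector / -fstack-protector-strong store a canary in the
   prologue and check it (calling __stack_chk_fail on mismatch) in the
   epilogue. noinline keeps that check inside this function. */
__attribute__((noinline))
int checksum_copy(const char *src)
{
    char buf[64];   /* array that triggers stack-protector instrumentation */
    int sum = 0;

    assert(src != NULL);

    /* Bounded copy: at most 63 bytes, always NUL-terminated. */
    strncpy(buf, src, sizeof buf - 1);
    buf[sizeof buf - 1] = '\0';

    /* Sum the copied bytes so buf is actually used. */
    for (size_t i = 0; buf[i] != '\0'; i++)
        sum += (unsigned char)buf[i];
    return sum;
}
```

Compiling this with something like `gcc -O2 -fstack-protector-strong -S` for an AArch64 target and reading the end of `checksum_copy` should show a guard load, the compare, the zeroing `mov`, and the conditional branch to the `__stack_chk_fail` call; the exact registers, ordering, and compare instruction depend on the GCC version.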