From: "hjl.tools at gmail dot com"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/102952] New code-gen options for retpolines and straight line speculation
Date: Thu, 28 Oct 2021 22:26:13 +0000

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102952

H.J. Lu changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|WAITING                     |NEW

--- Comment #23 from H.J. Lu ---
(In reply to Andrew Cooper from comment #22)
> One curious thing I have discovered.  While auditing the -mharden-sls=all
> code generation in Xen, I found examples where I got "ret int3 ret int3"
> with no intervening instructions.
>
> It turns out this is not a regression in this change.  It is a pre-existing
> missing optimisation, which is made more obvious when every ret is extended
> with an int3.
>
> It occurs for functions with either no stack frame at all, or functions
> which have an early exit before setting up the stack frame.  Some examples
> which occur at -O1 do not occur at -O2.
>
> One curious example which does still repro at -O2 is this.  We have a hash
> lookup function:
>
> struct context *sidtab_search(struct sidtab *s, u32 sid)
> {
>     int hvalue;
>     struct sidtab_node *cur;
>
>     if ( !s )
>         return NULL;
>
>     hvalue = SIDTAB_HASH(sid);
>     cur = s->htable[hvalue];
>     while ( cur != NULL && sid > cur->sid )
>         cur = cur->next;
>
>     if ( cur == NULL || sid != cur->sid )
>     {
>         /* Remap invalid SIDs to the unlabeled SID. */
>         sid = SECINITSID_UNLABELED;
>         hvalue = SIDTAB_HASH(sid);
>         cur = s->htable[hvalue];
>         while ( cur != NULL && sid > cur->sid )
>             cur = cur->next;
>         if ( !cur || sid != cur->sid )
>             return NULL;
>     }
>
>     return &cur->context;
> }
>
> which compiles (reformatted a little for width - unmodified:
> https://paste.debian.net/hidden/7bf675d6/) to:
>
> :
>          48 85 ff              test   %rdi,%rdi
> /------- 74 63                 je
> |        48 8b 17              mov    (%rdi),%rdx
> |        89 f0                 mov    %esi,%eax
> |        83 e0 7f              and    $0x7f,%eax
> |        48 8b 04 c2           mov    (%rdx,%rax,8),%rax
> |        48 85 c0              test   %rax,%rax
> |   /--- 75 13                 jne
> |  /|--- eb 17                 jmp
> |  ||    0f 1f 84 00 00 00 00  nopl   0x0(%rax,%rax,1)
> |  ||    00
> |  ||/-> 48 8b 40 48           mov    0x48(%rax),%rax
> |  |||   48 85 c0              test   %rax,%rax
> |  +||-- 74 06                 je
> |  |\|-> 39 30                 cmp    %esi,(%rax)
> |  | \-- 72 f3                 jb
> | /|---- 74 24                 je
> | |\---> 48 8b 42 28           mov    0x28(%rdx),%rax
> | |      48 85 c0              test   %rax,%rax
> | | /--- 75 11                 jne
> |/|-|--- eb 32                 jmp    // (1)
> ||| |    66 0f 1f 44 00 00     nopw   0x0(%rax,%rax,1)
> ||| |/-> 48 8b 40 48           mov    0x48(%rax),%rax
> ||| ||   48 85 c0              test   %rax,%rax
> |||/||-- 74 17                 je     // (2)
> ||||\|-> 83 38 04              cmpl   $0x4,(%rax)
> |||| \-- 76 f2                 jbe
> ||||     83 38 05              cmpl   $0x5,(%rax)
> +|||---- 75 15                 jne
> ||\|---> 48 83 c0 08           add    $0x8,%rax
> || |     c3                    retq
> || |     cc                    int3
> || |     0f 1f 80 00 00 00 00  nopl   0x0(%rax)
> || \---> c3                    retq   // Target of (2)
> ||       cc                    int3
> ||       66 0f 1f 44 00 00     nopw   0x0(%rax,%rax,1)
> \|-----> 31 c0                 xor    %eax,%eax
>  |       c3                    retq
>  |       cc                    int3
>  \-----> c3                    retq   // Target of (1)
>          cc                    int3
>          66 90                 xchg   %ax,%ax
>
> There are 4 exits in total.  Two have to set up %eax, so they can't usefully
> be merged.
>
> However, the unconditional jmp at (1) is 2 bytes, and could fully contain
> its target ret;int3 without even impacting the surrounding padding.  Whether
> it inlines or merges, this drops 4 bytes.
>
> The conditional jump at (2) could be folded into any of the other exit
> paths, dropping 16 bytes from the total size.
>
> I have no idea how easy/hard this may be to track down, or whether it is
> worth pursuing urgently, but it probably does want looking at, seeing as SLS
> hardening doubles the hit.

Please open a separate bug to track it.  Should shrink-wrap handle it?
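
For anyone who wants to double-check which exits (1) and (2) land on, the raw bytes from the quoted listing are enough to resolve the short jumps. The following is only a sketch under stated assumptions: the listing shows no addresses, so offsets are counted from the first byte of the function by summing instruction lengths, and the names `sidtab_search_bytes`, `rel8_target` and `is_ret_int3` are made up for illustration.

```c
#include <stdint.h>

/* Bytes of the quoted listing, in order (0x70 bytes in total).
 * Offsets are an assumption: counted from the start of the function. */
static const uint8_t sidtab_search_bytes[] = {
    0x48,0x85,0xff,0x74,0x63,0x48,0x8b,0x17,0x89,0xf0,0x83,0xe0,0x7f,0x48,0x8b,0x04,
    0xc2,0x48,0x85,0xc0,0x75,0x13,0xeb,0x17,0x0f,0x1f,0x84,0x00,0x00,0x00,0x00,0x00,
    0x48,0x8b,0x40,0x48,0x48,0x85,0xc0,0x74,0x06,0x39,0x30,0x72,0xf3,0x74,0x24,0x48,
    0x8b,0x42,0x28,0x48,0x85,0xc0,0x75,0x11,0xeb,0x32,0x66,0x0f,0x1f,0x44,0x00,0x00,
    0x48,0x8b,0x40,0x48,0x48,0x85,0xc0,0x74,0x17,0x83,0x38,0x04,0x76,0xf2,0x83,0x38,
    0x05,0x75,0x15,0x48,0x83,0xc0,0x08,0xc3,0xcc,0x0f,0x1f,0x80,0x00,0x00,0x00,0x00,
    0xc3,0xcc,0x66,0x0f,0x1f,0x44,0x00,0x00,0x31,0xc0,0xc3,0xcc,0xc3,0xcc,0x66,0x90,
};

/* Resolve a 2-byte short jump (0xeb jmp, or 0x70..0x7f jcc) located at
 * offset `off`: target = end of the jump + sign-extended 8-bit displacement. */
static int rel8_target(const uint8_t *code, int off)
{
    int disp = code[off + 1];
    if (disp >= 0x80)
        disp -= 0x100;          /* sign-extend the rel8 displacement */
    return off + 2 + disp;
}

/* True if the two bytes at `off` are ret (0xc3) followed by int3 (0xcc). */
static int is_ret_int3(const uint8_t *code, int off)
{
    return code[off] == 0xc3 && code[off + 1] == 0xcc;
}
```

With offsets counted this way, the jmp at (1) sits at 0x38 and resolves to 0x6c, and the je at (2) sits at 0x47 and resolves to 0x60; `is_ret_int3` holds at both targets, which matches the "Target of (1)" and "Target of (2)" annotations in the listing.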