From: "andrew.cooper3 at citrix dot com"
To: gcc-bugs@gcc.gnu.org
Subject: [Bug target/102952] New code-gen options for retpolines and straight line speculation
Date: Thu, 28 Oct 2021 22:07:47 +0000
X-Bugzilla-Product: gcc
X-Bugzilla-Component: target
X-Bugzilla-Version: 12.0
X-Bugzilla-Severity: enhancement
X-Bugzilla-Status: WAITING
X-Bugzilla-Priority: P3
X-Bugzilla-Assigned-To: hjl.tools at gmail dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102952

--- Comment #22 from Andrew Cooper ---
One curious thing I have discovered.  While auditing the -mharden-sls=all code
generation in Xen, I found examples where I got "ret int3 ret int3" with no
intervening instructions.

It turns out this is not a regression in this change.  It is a pre-existing
missing optimisation, which is made more obvious when every ret is extended
with an int3.

It occurs for functions with either no stack frame at all, or functions which
have an early exit before setting up the stack frame.  Some examples which
occur at -O1 do not occur at -O2.
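For illustration only, the shape of function described above (an early exit before any frame is set up, plus a later exit returning the same value) might look like the sketch below.  The type and function names are made up here and are not taken from Xen, and it has not been checked that this particular function reproduces the back-to-back "ret int3 ret int3" on any given GCC version.

```c
#include <stddef.h>

/* Hypothetical list node, invented for illustration; not from Xen. */
struct node {
    unsigned int key;
    struct node *next;
    int value;
};

/*
 * Leaf function with two exits that both return NULL: one taken before
 * any stack frame would be needed, one after walking a sorted list.
 */
int *find_value(struct node *head, unsigned int key)
{
    if ( !head )
        return NULL;              /* early exit */

    while ( head && key > head->key )
        head = head->next;

    if ( !head || key != head->key )
        return NULL;              /* later exit, same return value */

    return &head->value;
}
```

Building this with something like "gcc -O2 -mharden-sls=all -c" and inspecting the object with "objdump -d" would show whether the NULL-returning exits end up merged or duplicated.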

One curious example which does still repro at -O2 is this.  We have a hash
lookup function:

struct context *sidtab_search(struct sidtab *s, u32 sid)
{
    int hvalue;
    struct sidtab_node *cur;

    if ( !s )
        return NULL;

    hvalue = SIDTAB_HASH(sid);
    cur = s->htable[hvalue];
    while ( cur != NULL && sid > cur->sid )
        cur = cur->next;

    if ( cur == NULL || sid != cur->sid )
    {
        /* Remap invalid SIDs to the unlabeled SID. */
        sid = SECINITSID_UNLABELED;
        hvalue = SIDTAB_HASH(sid);
        cur = s->htable[hvalue];
        while ( cur != NULL && sid > cur->sid )
            cur = cur->next;
        if ( !cur || sid != cur->sid )
            return NULL;
    }

    return &cur->context;
}

which compiles (reformatted a little for width - unmodified:
https://paste.debian.net/hidden/7bf675d6/) to:

         48 85 ff              test   %rdi,%rdi
/------- 74 63                 je
|        48 8b 17              mov    (%rdi),%rdx
|        89 f0                 mov    %esi,%eax
|        83 e0 7f              and    $0x7f,%eax
|        48 8b 04 c2           mov    (%rdx,%rax,8),%rax
|        48 85 c0              test   %rax,%rax
|   /--- 75 13                 jne
|  /|--- eb 17                 jmp
|  ||    0f 1f 84 00 00 00 00  nopl   0x0(%rax,%rax,1)
|  ||    00
|  ||/-> 48 8b 40 48           mov    0x48(%rax),%rax
|  |||   48 85 c0              test   %rax,%rax
|  +||-- 74 06                 je
|  |\|-> 39 30                 cmp    %esi,(%rax)
|  | \-- 72 f3                 jb
| /|---- 74 24                 je
| |\---> 48 8b 42 28           mov    0x28(%rdx),%rax
| |      48 85 c0              test   %rax,%rax
| | /--- 75 11                 jne
|/|-|--- eb 32                 jmp                       // (1)
||| |    66 0f 1f 44 00 00     nopw   0x0(%rax,%rax,1)
||| |/-> 48 8b 40 48           mov    0x48(%rax),%rax
||| ||   48 85 c0              test   %rax,%rax
|||/||-- 74 17                 je                        // (2)
||||\|-> 83 38 04              cmpl   $0x4,(%rax)
|||| \-- 76 f2                 jbe
||||     83 38 05              cmpl   $0x5,(%rax)
+|||---- 75 15                 jne
||\|---> 48 83 c0 08           add    $0x8,%rax
|| |     c3                    retq
|| |     cc                    int3
|| |     0f 1f 80 00 00 00 00  nopl   0x0(%rax)
|| \---> c3                    retq                      // Target of (2)
||       cc                    int3
||       66 0f 1f 44 00 00     nopw   0x0(%rax,%rax,1)
\|-----> 31 c0                 xor    %eax,%eax
 |       c3                    retq
 |       cc                    int3
 \-----> c3                    retq                      // Target of (1)
         cc                    int3
         66 90                 xchg   %ax,%ax

There are 4 exits in total.  Two have to set up %eax, so they can't usefully
be merged.

However, the unconditional jmp at (1) is 2 bytes, and could fully contain its
target ret;int3 without even impacting the surrounding padding.  Whether it
inlines or merges, this drops 4 bytes.

The conditional jump at (2) could be folded into any of the other exit paths,
dropping 16 bytes from the total size.

I have no idea how easy/hard this may be to track down, or whether it is worth
pursuing urgently, but it probably does want looking at, seeing as SLS
hardening doubles the hit.
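
As a sanity check on the listing, the rel8 displacements of (1) and (2) can be decoded by hand: a short jump's target is the offset of the jump, plus its 2-byte length, plus the signed displacement byte.  The sketch below assumes the bytes are exactly as shown in the listing, starting at the jmp marked (1).  The names tail, short_jump_target and is_ret_int3 are made up here for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Bytes copied from the listing, starting at the jmp marked (1). */
static const uint8_t tail[] = {
    0xeb, 0x32,                               /* jmp  (1)            */
    0x66, 0x0f, 0x1f, 0x44, 0x00, 0x00,       /* nopw                */
    0x48, 0x8b, 0x40, 0x48,                   /* mov 0x48(%rax),%rax */
    0x48, 0x85, 0xc0,                         /* test %rax,%rax      */
    0x74, 0x17,                               /* je   (2)            */
    0x83, 0x38, 0x04,                         /* cmpl $0x4,(%rax)    */
    0x76, 0xf2,                               /* jbe                 */
    0x83, 0x38, 0x05,                         /* cmpl $0x5,(%rax)    */
    0x75, 0x15,                               /* jne                 */
    0x48, 0x83, 0xc0, 0x08,                   /* add $0x8,%rax       */
    0xc3, 0xcc,                               /* retq; int3          */
    0x0f, 0x1f, 0x80, 0x00, 0x00, 0x00, 0x00, /* nopl                */
    0xc3, 0xcc,                               /* retq; int3, (2) lands here */
    0x66, 0x0f, 0x1f, 0x44, 0x00, 0x00,       /* nopw                */
    0x31, 0xc0,                               /* xor %eax,%eax       */
    0xc3, 0xcc,                               /* retq; int3          */
    0xc3, 0xcc,                               /* retq; int3, (1) lands here */
    0x66, 0x90,                               /* xchg %ax,%ax        */
};

/* Target of a 2-byte short jump (opcode, rel8) located at offset off. */
static size_t short_jump_target(const uint8_t *code, size_t off)
{
    return (size_t)((ptrdiff_t)off + 2 + (int8_t)code[off + 1]);
}

/* True if the two bytes at off are retq followed by int3. */
static int is_ret_int3(const uint8_t *code, size_t off)
{
    return code[off] == 0xc3 && code[off + 1] == 0xcc;
}
```

With these bytes, short_jump_target(tail, 0) and short_jump_target(tail, 15) both land on a bare ret;int3 pair, which is the same 2-byte length as the jmp at (1).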