From: Joao Moreira <joao@overdrivepizza.com>
To: "H.J. Lu" <hjl.tools@gmail.com>
Cc: Rui Ueyama <rui314@gmail.com>,
"Moreira, Joao" <joao.moreira@intel.com>,
Andi Kleen <andi@firstfloor.org>,
x86-64-abi <x86-64-abi@googlegroups.com>,
Binutils <binutils@sourceware.org>,
i@maskray.me
Subject: Re: x86-64: new CET-enabled PLT format proposal
Date: Tue, 01 Mar 2022 01:16:58 -0800
Message-ID: <0e246cb968d3da5d8e9afa4055d432a1@overdrivepizza.com>
In-Reply-To: <CAMe9rOoVn0LKNCjiQKj31Fyoq_i8CsCvQzmiDvTsEUJCTd1TvQ@mail.gmail.com>

On 2022-02-28 16:04, H.J. Lu wrote:
> On Sun, Feb 27, 2022 at 7:46 PM Rui Ueyama <rui314@gmail.com> wrote:
>>
>> On Mon, Feb 28, 2022 at 12:07 AM H.J. Lu <hjl.tools@gmail.com> wrote:
>> >
>> > On Sat, Feb 26, 2022 at 7:19 PM Rui Ueyama via Binutils
>> > <binutils@sourceware.org> wrote:
>> > >
>> > > Hello,
>> > >
>> > > I'd like to propose an alternative instruction sequence for the Intel
>> > > CET-enabled PLT section. Compared to the existing one, the new scheme is
>> > > simple, compact (32 bytes vs. 16 bytes for each PLT entry) and does not
>> > > require a separate second PLT section (.plt.sec).
>> > >
>> > > Here is the proposed code sequence:
>> > >
>> > > PLT0:
>> > >
>> > > f3 0f 1e fa // endbr64
>> > > 41 53 // push %r11
>> > > ff 35 00 00 00 00 // push GOT[1]
>> > > ff 25 00 00 00 00 // jmp *GOT[2]
>> > > 0f 1f 40 00 // nop
>> > > 0f 1f 40 00 // nop
>> > > 0f 1f 40 00 // nop
>> > > 66 90 // nop
>> > >
>> > > PLTn:
>> > >
>> > > f3 0f 1e fa // endbr64
>> > > 41 bb 00 00 00 00 // mov $namen_reloc_index %r11d
>> > > ff 25 00 00 00 00 // jmp *GOT[namen_index]
>> >
>> > All PLT calls will have an extra MOV.
>>
>> One extra load-immediate mov instruction is executed per a function
>> call through a PLT entry. It's so tiny that I couldn't see any
>> difference in real-world apps.

(Also replying to Fangrui, whose e-mail, for whatever reason, did not
reach this mailbox.)

I can see the benefits of having a single PLT with 16-byte entries. Yet
clobbering R11 on every PLT transition is not amusing... If we want PLT
entries to be only 16 bytes and to avoid a .plt.sec section, maybe we
could try:
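
<plt_header>:
    pop %r11                // return address pushed by the entry's call
    sub $plt_header, %r11   // offset from the start of the PLT
    shr $0x4, %r11          // divide by 16, the PLT entry size
    push %r11
    jmp _dl_runtime_resolve_shstk_thunk

<foo>:
    endbr64                 // 4b
    jmp *GOT[foo]           // 6b
    call plt_header         // 5b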

Here, each PLT entry is 16 bytes and pushes its own address (more
precisely, the return address just past its call) onto the stack by
calling plt_header. The header pops that address and recovers the index
by subtracting the PLT base address and then dividing the result by 16.
The call also pushed the same address onto the shadow stack, which the
pop does not touch. So the final step, needed for shstk compatibility,
is jumping to a special thunk in front of _dl_runtime_resolve
(_dl_runtime_resolve_shstk_thunk), which contains the following snippet
(similar to glibc's __longjmp):

    testl $X86_FEATURE_1_SHSTK, %fs:FEATURE_1_OFFSET
    jz 1f
    mov $1, %r11
    incsspq %r11
1:
    jmp _dl_runtime_resolve
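
To make the index arithmetic described before the thunk concrete, here is a minimal C sketch. It assumes a 32-byte plt_header followed by 16-byte entries whose call ends 15 bytes into the entry (4 + 6 + 5). The function names and the header size are assumptions made for illustration; none of this comes from an ABI document or from tested code.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: 32-byte header, then 16-byte entries. */
enum {
    PLT_HEADER_SIZE = 32,        /* assumption: header padded to 32 bytes */
    PLT_ENTRY_SIZE  = 16,
    CALL_END_OFFSET = 4 + 6 + 5  /* endbr64 + jmp + call: where call returns to */
};

/* Return address that `call plt_header` in entry n would push. */
static uint64_t plt_return_address(uint64_t plt_base, uint32_t n)
{
    return plt_base + PLT_HEADER_SIZE
           + (uint64_t)n * PLT_ENTRY_SIZE + CALL_END_OFFSET;
}

/* Recover n from the popped return address: subtract the PLT base,
   divide by the entry size (a shift by 4), then drop the 16-byte
   slots that the header itself occupies. */
static uint32_t plt_index_from_return(uint64_t plt_base, uint64_t ret)
{
    assert(ret >= plt_base + PLT_HEADER_SIZE);
    uint64_t slot = (ret - plt_base) >> 4;   /* divide by 16 */
    return (uint32_t)(slot - PLT_HEADER_SIZE / PLT_ENTRY_SIZE);
}
```

Under this assumed layout, the shift alone yields the entry number plus 2, because the header takes up two 16-byte slots. The asm above does not show that adjustment; it could equally be folded into the address being subtracted or handled in the resolver.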

I don't think the SHSTK feature test above (the testl/jz pair) fits
alongside the other instructions in the plt_header if we want the header
to be 32 bytes at most, hence the suggestion of having it as a
_dl_runtime_resolve thunk. Another possibility is to resolve the
relocation to the special thunk only if shstk is enabled and, if not, to
resolve it directly to _dl_runtime_resolve, which avoids the extra
resolution overhead in the absence of shstk.
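
The shadow-stack bookkeeping behind both options can be checked with a toy model. The C sketch below only counts return addresses on the regular and shadow stacks; every name in it is hypothetical, and `bind_header_target` stands in for the load-time choice between the thunk and a direct jump to the resolver. It is an illustration of the reasoning, not glibc code.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model: number of return addresses on each stack. */
struct stack_model {
    int  regular;        /* return addresses on the regular stack */
    int  shadow;         /* return addresses on the shadow stack  */
    bool shstk_enabled;  /* stands in for the X86_FEATURE_1_SHSTK bit */
};

/* `call`: pushes a return address; mirrored on the shadow stack
   only when shadow stacks are enabled. */
static void model_call(struct stack_model *s)
{
    s->regular++;
    if (s->shstk_enabled)
        s->shadow++;
}

/* `pop %r11`: removes the return address from the regular stack only. */
static void model_pop(struct stack_model *s)
{
    assert(s->regular > 0);
    s->regular--;
}

/* Thunk: test the feature bit, then drop one shadow-stack entry
   (the effect of incsspq with a count of 1). */
static void thunk_checked(struct stack_model *s)
{
    if (s->shstk_enabled)
        s->shadow--;
}

/* Direct jump to the resolver: no shadow-stack adjustment. */
static void no_thunk(struct stack_model *s)
{
    (void)s;
}

typedef void (*header_target_fn)(struct stack_model *);

/* Hypothetical load-time choice of the header's jump target. */
static header_target_fn bind_header_target(bool shstk_enabled)
{
    return shstk_enabled ? thunk_checked : no_thunk;
}

/* Walk one lazy PLT call up to the jump into the resolver and report
   whether the shadow stack matches the regular stack again. */
static bool stacks_balanced_after_plt(bool shstk_enabled,
                                      header_target_fn target)
{
    struct stack_model s = { 0, 0, shstk_enabled };
    model_call(&s);   /* caller: call foo@plt */
    model_call(&s);   /* PLT entry: call plt_header */
    model_pop(&s);    /* plt_header: pop %r11 */
    target(&s);       /* thunk, or direct jump to the resolver */
    return shstk_enabled ? s.shadow == s.regular : s.shadow == 0;
}
```

In this model, skipping the adjustment while shstk is enabled leaves one stale entry on the shadow stack, which is the mismatch the thunk is meant to remove. Binding the target at load time keeps the stacks balanced in both configurations without going through the thunk when shstk is off.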

I think this solves both the size and the dummy mov overheads. The logic
is a bit more convoluted, but perhaps we can work on making it simpler.

FWIW, I have neither tested nor implemented any of this.

Ah, also, pardon any asm mistakes or obvious details that I may have
missed :)