public inbox for binutils@sourceware.org
From: Joao Moreira <joao@overdrivepizza.com>
To: "H.J. Lu" <hjl.tools@gmail.com>
Cc: Rui Ueyama <rui314@gmail.com>,
	"Moreira, Joao" <joao.moreira@intel.com>,
	Andi Kleen <andi@firstfloor.org>,
	x86-64-abi <x86-64-abi@googlegroups.com>,
	Binutils <binutils@sourceware.org>,
	i@maskray.me
Subject: Re: x86-64: new CET-enabled PLT format proposal
Date: Tue, 01 Mar 2022 01:16:58 -0800
Message-ID: <0e246cb968d3da5d8e9afa4055d432a1@overdrivepizza.com>
In-Reply-To: <CAMe9rOoVn0LKNCjiQKj31Fyoq_i8CsCvQzmiDvTsEUJCTd1TvQ@mail.gmail.com>

On 2022-02-28 16:04, H.J. Lu wrote:
> On Sun, Feb 27, 2022 at 7:46 PM Rui Ueyama <rui314@gmail.com> wrote:
>> 
>> On Mon, Feb 28, 2022 at 12:07 AM H.J. Lu <hjl.tools@gmail.com> wrote:
>> >
>> > On Sat, Feb 26, 2022 at 7:19 PM Rui Ueyama via Binutils
>> > <binutils@sourceware.org> wrote:
>> > >
>> > > Hello,
>> > >
>> > > I'd like to propose an alternative instruction sequence for the Intel
>> > > CET-enabled PLT section. Compared to the existing one, the new scheme is
>> > > simple, compact (32 bytes vs. 16 bytes for each PLT entry) and does not
>> > > require a separate second PLT section (.plt.sec).
>> > >
>> > > Here is the proposed code sequence:
>> > >
>> > >   PLT0:
>> > >
>> > >   f3 0f 1e fa        // endbr64
>> > >   41 53              // push %r11
>> > >   ff 35 00 00 00 00  // push GOT[1]
>> > >   ff 25 00 00 00 00  // jmp *GOT[2]
>> > >   0f 1f 40 00        // nop
>> > >   0f 1f 40 00        // nop
>> > >   0f 1f 40 00        // nop
>> > >   66 90              // nop
>> > >
>> > >   PLTn:
>> > >
>> > >   f3 0f 1e fa        // endbr64
>> > >   41 bb 00 00 00 00  // mov $namen_reloc_index %r11d
>> > >   ff 25 00 00 00 00  // jmp *GOT[namen_index]
>> >
>> > All PLT calls will have an extra MOV.
>> 
>> One extra load-immediate mov instruction is executed per a function
>> call through a PLT entry. It's so tiny that I couldn't see any
>> difference in real-world apps.

(also replying to Fangrui, whose e-mail, for whatever reason, did not 
come to this mailbox).

I can see the benefits of having single 16-byte PLT entries. Yet, 
clobbering R11 on every PLT transition is not amusing... If we want PLT 
entries to be only 16 bytes, without a .plt.sec section, maybe we could 
try:

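<plt_header>:
pop %r11                // return address pushed by the PLT entry's call
sub $plt_header, %r11   // offset from the start of the PLT
shr $0x4, %r11          // divide by the 16-byte entry size
push %r11
jmp _dl_runtime_resolve_shstk_thunk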

<foo>:
endbr64 // 4 bytes
jmp *GOT[foo] // 6 bytes
call plt_header // 5 bytes

Here, the PLT entry has 16 bytes, and it pushes an address inside itself 
(the return address) onto the stack by calling plt_header. The address 
is then popped in plt_header and turned into the index by subtracting 
the PLT base address from it and then dividing the result by 16. Since 
the call also pushed a return address onto the shadow stack, the final 
step to make this shstk compatible is jumping to a special 
implementation of _dl_runtime_resolve (_dl_runtime_resolve_shstk_thunk), 
which will have the following snippet (similar to glibc's __longjmp):

testl $X86_FEATURE_1_SHSTK, %fs:FEATURE_1_OFFSET
jz 1f
mov $1, %r11
incsspq %r11
1:
jmp _dl_runtime_resolve
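
A minimal sketch of the scheme described above, modeled in Python rather 
than asm so the arithmetic can be checked. It is not an implementation: 
it assumes plt_header sits at the start of the PLT and occupies 32 bytes 
(two 16-byte slots), that the resolver subtracts those two slots to get 
the relocation index, and that the regular and shadow stacks can be 
modeled as plain lists. All names below are hypothetical.

```python
PLT_ENTRY_SIZE = 16    # bytes per PLT entry, as proposed
PLT_HEADER_SIZE = 32   # assumption: plt_header fills the first 32 bytes of the PLT
CALL_END_OFFSET = 15   # endbr64 (4) + jmp (6) + call (5): return address is entry + 15


def plt_slot(ret_addr, plt_base):
    """Subtract the PLT base, then divide by the 16-byte entry size."""
    return (ret_addr - plt_base) // PLT_ENTRY_SIZE


def lazy_call(plt_base, n, shstk_enabled):
    """Simulate PLT entry n falling through to `call plt_header`."""
    stack, shadow_stack = [], []
    entry = plt_base + PLT_HEADER_SIZE + n * PLT_ENTRY_SIZE
    ret_addr = entry + CALL_END_OFFSET

    # call plt_header: the return address goes on the regular stack and,
    # when shadow stacks are enabled, on the shadow stack as well.
    stack.append(ret_addr)
    if shstk_enabled:
        shadow_stack.append(ret_addr)

    # plt_header: pop the address, compute the slot, push it back.
    stack.append(plt_slot(stack.pop(), plt_base))

    # thunk: drop one shadow stack entry (incsspq) only if shstk is on.
    if shstk_enabled:
        shadow_stack.pop()

    # Assumption: the resolver discounts the slots taken by plt_header.
    reloc_index = stack[-1] - PLT_HEADER_SIZE // PLT_ENTRY_SIZE
    return reloc_index, shadow_stack
```

Under these assumptions the return address always lands in the last 
byte of the calling entry's slot, so the integer division recovers the 
entry number, and the shadow stack ends up balanced whether or not shstk 
is enabled.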

I don't think the shstk test above fits alongside the other instructions 
in plt_header if we want it to be 32 bytes at most, hence the suggestion 
of having it as a _dl_runtime_resolve thunk. Another possibility is to 
resolve the relocation to the special thunk only if shstk is in place; 
if not, resolve it directly to _dl_runtime_resolve to avoid overhead 
during resolution in the absence of shstk.
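
The second possibility could look roughly like the sketch below, again 
in Python only to show the decision. The bit value for 
X86_FEATURE_1_SHSTK is assumed to match glibc's definition, and 
pick_resolver is a hypothetical helper, not an existing loader function.

```python
X86_FEATURE_1_SHSTK = 1 << 1  # assumption: same bit glibc uses for shadow stacks


def pick_resolver(feature_1):
    """Choose the lazy-binding target once, when the relocation is set up."""
    if feature_1 & X86_FEATURE_1_SHSTK:
        # The shadow stack needs one entry dropped before resolving.
        return "_dl_runtime_resolve_shstk_thunk"
    # No shadow stack: skip the thunk and its test entirely.
    return "_dl_runtime_resolve"
```

With this variant the feature test runs once at setup time instead of on 
every lazy resolution.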

I think this solves both the size and the dummy mov overheads. The logic 
is a bit more convoluted, but perhaps we can work on making it simpler. 
FWIW, I did not test or implement any of this.

Ah, also, pardon any asm mistakes/obvious details that I may have missed 
:)
