public inbox for binutils@sourceware.org
* x86-64: new CET-enabled PLT format proposal
@ 2022-02-27  3:18 Rui Ueyama
  2022-02-27 15:06 ` H.J. Lu
From: Rui Ueyama @ 2022-02-27  3:18 UTC
  To: binutils

Hello,

I'd like to propose an alternative instruction sequence for the Intel
CET-enabled PLT section. Compared to the existing one, the new scheme is
simpler and more compact (16 bytes per PLT entry instead of 32), and it does
not require a separate second PLT section (.plt.sec).

Here is the proposed code sequence:

  PLT0:

  f3 0f 1e fa        // endbr64
  41 53              // push %r11
  ff 35 00 00 00 00  // push GOT[1]
  ff 25 00 00 00 00  // jmp *GOT[2]
  0f 1f 40 00        // nop
  0f 1f 40 00        // nop
  0f 1f 40 00        // nop
  66 90              // nop

  PLTn:

  f3 0f 1e fa        // endbr64
  41 bb 00 00 00 00  // mov $namen_reloc_index, %r11d
  ff 25 00 00 00 00  // jmp *GOT[namen_index]

GOT[namen_index] is initialized to the address of PLT0 for every PLT entry,
so that when a PLT entry is called for the first time, control is passed to
PLT0, which calls the resolver function.
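
As an illustration only (this is not code from mold or any other linker), the
byte sequences above could be emitted as sketched below. The sketch assumes
8-byte GOT slots and RIP-relative displacements measured from the end of each
instruction; the names make_plt0, make_pltn and rel32 are hypothetical.

```python
import struct

ENDBR64 = bytes.fromhex("f30f1efa")

def rel32(target, next_insn):
    """RIP-relative displacement: target minus the address of the next instruction."""
    return struct.pack("<i", target - next_insn)

def make_plt0(plt_addr, got_addr):
    """Build the 32-byte header. Assumes GOT slots are 8 bytes wide."""
    code = ENDBR64                                    # bytes 0..3
    code += bytes.fromhex("4153")                     # push %r11, ends at byte 6
    code += bytes.fromhex("ff35") + rel32(got_addr + 8, plt_addr + 12)   # push GOT[1]
    code += bytes.fromhex("ff25") + rel32(got_addr + 16, plt_addr + 18)  # jmp *GOT[2]
    code += bytes.fromhex("0f1f4000") * 3 + bytes.fromhex("6690")        # 14 bytes of nop padding
    return code

def make_pltn(entry_addr, got_slot_addr, reloc_index):
    """Build one 16-byte entry that loads the relocation index into %r11d."""
    code = ENDBR64                                                  # bytes 0..3
    code += bytes.fromhex("41bb") + struct.pack("<I", reloc_index)  # mov $index, %r11d
    code += bytes.fromhex("ff25") + rel32(got_slot_addr, entry_addr + 16)  # jmp *GOT[n]
    return code
```

The addresses passed in are whatever the caller has already assigned to the
PLT and GOT; the sketch only fills in the immediate and displacement fields.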

It uses %r11 as a scratch register. The x86-64 psABI explicitly allows PLT
entries to clobber this register (*1), and the resolver function
(_dl_runtime_resolve) already clobbers it.

(*1) x86-64 psABI p.24 footnote 17: "Note that %r11 is neither required to be
preserved, nor is it used to pass arguments. Making this register available as
scratch register means that code in the PLT need not spill any registers when
computing the address to which control needs to be transferred."

FYI, this is the current CET-enabled PLT:

  PLT0:

  ff 35 00 00 00 00    // push GOT[1]
  f2 ff 25 e3 2f 00 00 // bnd jmp *GOT[2]
  0f 1f 00             // nop

  PLTn in .plt:

  f3 0f 1e fa          // endbr64
  68 00 00 00 00       // push $namen_reloc_index
  f2 e9 e1 ff ff ff    // bnd jmpq PLT0
  90                   // nop

  PLTn in .plt.sec:

  f3 0f 1e fa          // endbr64
  f2 ff 25 ad 2f 00 00 // bnd jmpq *GOT[namen_index]
  0f 1f 44 00 00       // nop
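
As a worked check of the existing .plt entry above, the rel32 field of its
`bnd jmpq PLT0` can be decoded by hand or with a few lines of Python. This is
only a sketch: it assumes the quoted entry is the first one, located 16 bytes
after the start of .plt (right after PLT0), and the helper name jump_target is
hypothetical.

```python
import struct

# Bytes of the existing "PLTn in .plt" entry as quoted above.
ENTRY = bytes.fromhex("f30f1efa" "6800000000" "f2e9e1ffffff" "90")

def jump_target(entry, entry_offset):
    """Return the offset (relative to the start of .plt) that the entry jumps to."""
    # Layout: endbr64 (4 bytes), push imm32 (5), then f2 e9 rel32 (6), nop (1).
    disp = struct.unpack("<i", entry[11:15])[0]
    jmp_end = entry_offset + 4 + 5 + 6   # address of the instruction after the jmp
    return jmp_end + disp

# Assumption: the entry sits directly after the 16-byte PLT0.
print(jump_target(ENTRY, 16))
```

Under that assumption the displacement e1 ff ff ff is -31 and the jump ends at
offset 31, so the target is offset 0, the start of PLT0.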

In the proposed format, PLT0 is 32 bytes long and each entry is 16 bytes. In
the existing format, PLT0 is 16 bytes and each entry takes 32 bytes (16 in
.plt plus 16 in .plt.sec). Usually, we have many PLT entries but only one
header, so in practice, the proposed format is almost 50% smaller than the
existing one.
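
To make the size comparison concrete, here is a small sketch of the
arithmetic. The entry count of 1000 is an arbitrary example, not a
measurement, and the function names are hypothetical.

```python
def proposed_size(n):
    """Total PLT bytes in the proposed format: 32-byte header, 16 bytes per entry."""
    return 32 + 16 * n

def existing_size(n):
    """Total PLT bytes in the existing format: 16-byte header,
    plus 16 bytes in .plt and 16 bytes in .plt.sec per entry."""
    return 16 + 16 * n + 16 * n

n = 1000  # arbitrary example entry count
saving = 1 - proposed_size(n) / existing_size(n)
print(proposed_size(n), existing_size(n), f"{saving:.1%}")
```

Because the proposed header is larger, the saving stays just under 50% and
approaches it as the number of entries grows.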

The proposed PLT does not use jump instructions with BND prefix, as Intel MPX
has been deprecated.

I have already implemented the proposed scheme in my linker
(https://github.com/rui314/mold), and it appears to be working fine.

Any thoughts?

