public inbox for libc-alpha@sourceware.org
* [RFC PATCH 0/6] .text.subsections for some questionable benefit
@ 2023-05-15 14:48 Sergey Bugaev
  2023-05-15 14:48 ` [RFC PATCH 1/6] Mark more functions as __COLD Sergey Bugaev
                   ` (6 more replies)
  0 siblings, 7 replies; 14+ messages in thread
From: Sergey Bugaev @ 2023-05-15 14:48 UTC (permalink / raw)
  To: libc-alpha

Hello,

this patch series is the continuation of the __COLD patchset, and the
result of me looking into how GCC places some code into the
.text.xxxxxx subsections instead of the regular .text. Namely, as far
as I was able to understand, GCC does the following:

1. Functions marked with __attribute__ ((cold)) are (among other
   effects) placed into .text.unlikely;
2. Similarly, functions marked with __attribute__ ((hot)) get placed
   into .text.hot;
3. ELF constructors and main () are placed into .text.startup;
4. ELF destructors are placed into .text.exit.

When using profile-guided optimization, GCC may place functions
differently based on the profile data, but those are the static
rules.
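
To illustrate the static rules above, here is a minimal sketch with one
function per category. All function and variable names are made up for
this example. As far as I know, the placement only happens when
optimization is enabled (it is tied to -freorder-functions), so treat
the section names in the comments as what GCC is expected to pick on an
ELF target at e.g. -O2, not as a guarantee.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical state, set by the constructor before main () runs.  */
static int ctor_ran;

/* Rarely executed error path; expected to land in .text.unlikely.  */
__attribute__ ((cold, noinline)) void
report_failure (const char *what)
{
  fprintf (stderr, "fatal: %s\n", what);
  abort ();
}

/* Frequently executed helper; expected to land in .text.hot.  */
__attribute__ ((hot, noinline)) int
checksum_step (int acc, int byte)
{
  return acc * 31 + byte;
}

/* ELF constructor; expected to land in .text.startup.  */
__attribute__ ((constructor)) static void
init_state (void)
{
  ctor_ran = 1;
}

/* ELF destructor; expected to land in .text.exit.  */
__attribute__ ((destructor)) static void
fini_state (void)
{
  ctor_ran = 0;
}
```

Compiling this with "gcc -O2 -c" and listing the sections of the
resulting object file (readelf -S or objdump -h) should show the
subsections, if the compiler behaves as described.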

The default linker script (ld --verbose) contains the following
stanza for constructing the .text of the final executable/library:

  .text           :
  {
    *(.text.unlikely .text.*_unlikely .text.unlikely.*)
    *(.text.exit .text.exit.*)
    *(.text.startup .text.startup.*)
    *(.text.hot .text.hot.*)
    *(SORT(.text.sorted.*))
    *(.text .stub .text.* .gnu.linkonce.t.*)
    /* .gnu.warning sections are handled specially by elf.em.  */
    *(.gnu.warning)
  }

So: the contents of .text.{unlikely,hot,startup,exit} of the linked
object files are grouped together during linking, but all end up
inside the final binary's .text.

Since GCC does not intrinsically know about glibc specifics, it makes
some sense to try and help it with finding startup- and exit-only
code. Hence, __TEXT_STARTUP and __TEXT_EXIT macros.
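
For illustration, macros of this kind could be sketched as follows. The
EXAMPLE_ names and the definitions are assumptions made for this sketch,
not the definitions from patch 3/6, which may well differ. The sketch
also assumes GCC (or a compatible compiler) and an ELF target.

```c
/* Hypothetical stand-ins for __TEXT_STARTUP and __TEXT_EXIT.  */
#define EXAMPLE_TEXT_STARTUP __attribute__ ((section (".text.startup")))
#define EXAMPLE_TEXT_EXIT __attribute__ ((section (".text.exit")))

#define TABLE_SIZE 4

static int table[TABLE_SIZE];

/* Only needed while the program is starting up.  */
EXAMPLE_TEXT_STARTUP void
setup_table (void)
{
  for (int i = 0; i < TABLE_SIZE; i++)
    table[i] = i * i;
}

/* Only needed while the program is exiting.  */
EXAMPLE_TEXT_EXIT void
teardown_table (void)
{
  for (int i = 0; i < TABLE_SIZE; i++)
    table[i] = 0;
}

/* Regular code; stays in plain .text.  */
int
table_sum (void)
{
  int sum = 0;
  for (int i = 0; i < TABLE_SIZE; i++)
    sum += table[i];
  return sum;
}
```

The annotation only changes which input section the function body is
emitted into; callers use setup_table () and teardown_table () like any
other function.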

The supposed benefit of this is cache locality. As I understand it,
it's two-sided. For instance, talking about .text.exit:

1. During normal runtime (when not exiting yet), the .text.exit
   functions don't "get in the way", i.e. don't take up the precious
   place in the caches.
2. During exit, the code to be run (a large part of it anyway) is
   located in mostly the same place, and now it _is_, rightfully,
   taking up the cache space, and making full use of it.

The same applies to .text.startup. And depending on how lucky you are,
your system may not need to page in .text.unlikely at all -- if
nothing on the system abort ()s or error ()s out.

That's the idea anyway.

I have checked that indeed, the various startup, exit, and cold
functions are all neatly grouped together with this patchset. What I
have not done is run any benchmarks (what would be the relevant
benchmarks to run?), so I can't tell if this provides any noticeable
benefit.

But having spent countless hours over the last few weeks single-
stepping through x86_64 Hurd startup in QEMU, I can confidently say
that during libc startup, it page faults on missing code pages way too
often. This is normally invisible to the program and to the debugger,
but very visible when you're debugging the whole system.

One more thing: the Linux kernel has a somewhat similar thing with
__init and __exit macros, which place the annotated function into
.init.text and .exit.text. They then do further tricks with this, such
as (potentially?) unmapping the pages containing .init.text after
startup is completed. The SerenityOS Kernel similarly has
UNMAP_AFTER_INIT (and READONLY_AFTER_INIT, which is like
attribute_relro).

This patchset *does not* introduce anything like that. It only does
grouping (and even that is done by GCC/ld), not any unmapping. It is
still 100% safe to call any __TEXT_STARTUP function after startup
(for instance, if a function has been mistakenly marked
__TEXT_STARTUP, or is normally only used during startup but may also
be called later in some exceptional / rare cases).
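
A small sketch of that last case: a function placed into .text.startup
that runs once from a constructor and can still be called again later.
EXAMPLE_TEXT_STARTUP and the function names are hypothetical, chosen for
this example only; GCC and an ELF target are assumed.

```c
/* Hypothetical stand-in for __TEXT_STARTUP.  */
#define EXAMPLE_TEXT_STARTUP __attribute__ ((section (".text.startup")))

/* Counts how many times the defaults have been loaded.  */
static unsigned int load_count;

/* Normally only used during startup, so it is grouped with the
   other startup code.  */
EXAMPLE_TEXT_STARTUP void
load_defaults (void)
{
  load_count++;
}

/* Runs before main () and performs the usual startup-time call.  */
__attribute__ ((constructor)) static void
startup_hook (void)
{
  load_defaults ();
}

/* The rare late caller: nothing has been unmapped, so this is an
   ordinary function call.  Returns the updated count.  */
unsigned int
reload_defaults (void)
{
  load_defaults ();
  return load_count;
}
```

Since the code is only grouped and never unmapped, the late call in
reload_defaults () behaves exactly like the startup-time one.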

Now to the downsides:
1. This adds __TEXT_STARTUP annotations all over the place,
   particularly in elf/. So much code churn for some questionable and
   frankly theoretical benefit.
2. Even worse, this modifies assembly code! -- on all architectures.
   These include architectures I have not even *heard* of, and cannot
   cross-compile for or test on. Surely I should not be allowed
   anywhere near writing assembly code for them!

   Counterpoint: I'm not altering the actual assembly code, I'm only
   really changing ".text" to ".section .text.startup", what could
   possibly go wrong?

Sergey


end of thread, other threads:[~2023-05-22 20:42 UTC | newest]

Thread overview: 14+ messages
2023-05-15 14:48 [RFC PATCH 0/6] .text.subsections for some questionable benefit Sergey Bugaev
2023-05-15 14:48 ` [RFC PATCH 1/6] Mark more functions as __COLD Sergey Bugaev
2023-05-15 15:22   ` Andreas Schwab
2023-05-15 15:27     ` Sergey Bugaev
2023-05-18 17:06       ` [PATCH v2] " Sergey Bugaev
2023-05-18 19:43         ` Adhemerval Zanella Netto
2023-05-19 10:35           ` Sergey Bugaev
2023-05-22 20:41             ` Adhemerval Zanella Netto
2023-05-15 14:48 ` [RFC PATCH 2/6] mcheck: Microoptimize Sergey Bugaev
2023-05-15 14:48 ` [RFC PATCH 3/6] sys/cdefs.h: Define __TEXT_STARTUP & __TEXT_EXIT Sergey Bugaev
2023-05-15 14:48 ` [RFC PATCH 4/6] Mark various functions as __TEXT_STARTUP and __TEXT_EXIT Sergey Bugaev
2023-05-15 14:48 ` [RFC PATCH 5/6] Also place entry points into .text.startup Sergey Bugaev
2023-05-15 14:48 ` [RFC PATCH 6/6] mach: In rtld, mark MIG routines as __TEXT_STARTUP Sergey Bugaev
2023-05-15 15:33 ` [RFC PATCH 0/6] .text.subsections for some questionable benefit Cristian Rodríguez
