public inbox for libc-alpha@sourceware.org
From: Adhemerval Zanella <adhemerval.zanella@linaro.org>
To: Mark Hatle <mark.hatle@kernel.crashing.org>,
	Khem Raj <raj.khem@gmail.com>,
	Libc-alpha <libc-alpha@sourceware.org>,
	Carlos O'Donell <carlos@redhat.com>
Subject: Re: [PATCH] elf/dl-deps.c: Make _dl_build_local_scope breadth first
Date: Thu, 13 Jan 2022 15:37:48 -0300	[thread overview]
Message-ID: <d3a89187-9e36-a9e9-037c-68ed0c559fd6@linaro.org> (raw)
In-Reply-To: <9f77a6e3-fa70-d445-0e70-45a176cb0a7f@kernel.crashing.org>



On 13/01/2022 15:00, Mark Hatle wrote:
> 
> 
> On 1/13/22 11:20 AM, Adhemerval Zanella wrote:
>>> When I last profiled this, roughly 3 1/2 years ago, the run-time linking speedup was huge.  There were two main advantages to this:
>>>
>>> * Run Time linking speedup - primarily helped initial application loads.  System boot times went from 10-15 seconds down to 4-5 seconds.  For embedded systems this was massive.
>>
>> Right, this is interesting.  Do you have any profile data showing exactly where
>> the speedup comes from?  I wonder if we could get any gain by optimizing the
>> normal path without resorting to prelink.
> 
> glibc's runtime linker is very efficient; I honestly don't expect many speedups at this point.
> 
> This is partially from memory, so I may have a few details wrong... but
> 
> LD_DEBUG=statistics
> 
> On my ubuntu machine, just setting that and running /bin/bash results in:
> 
>     334067:   
>     334067:    runtime linker statistics:
>     334067:      total startup time in dynamic loader: 252415 cycles
>     334067:                time needed for relocation: 119006 cycles (47.1%)
>     334067:                     number of relocations: 412
>     334067:          number of relocations from cache: 3
>     334067:            number of relative relocations: 5100
>     334067:               time needed to load objects: 92655 cycles (36.7%)
>     334068:   
>     334068:    runtime linker statistics:
>     334068:      total startup time in dynamic loader: 125018 cycles
>     334068:                time needed for relocation: 40554 cycles (32.4%)
>     334068:                     number of relocations: 176
>     334068:          number of relocations from cache: 3
>     334068:            number of relative relocations: 1534
>     334068:               time needed to load objects: 45882 cycles (36.7%)
>     334069:   
>     334069:    runtime linker statistics:
>     334069:      total startup time in dynamic loader: 121500 cycles
>     334069:                time needed for relocation: 39067 cycles (32.1%)
>     334069:                     number of relocations: 136
>     334069:          number of relocations from cache: 3
>     334069:            number of relative relocations: 1274
>     334069:               time needed to load objects: 47505 cycles (39.0%)
>     334071:   
>     334071:    runtime linker statistics:
>     334071:      total startup time in dynamic loader: 111850 cycles
>     334071:                time needed for relocation: 35089 cycles (31.3%)
>     334071:                     number of relocations: 135
>     334071:          number of relocations from cache: 3
>     334071:            number of relative relocations: 1272
>     334071:               time needed to load objects: 45746 cycles (40.8%)
>     334072:   
>     334072:    runtime linker statistics:
>     334072:      total startup time in dynamic loader: 109827 cycles
>     334072:                time needed for relocation: 34863 cycles (31.7%)
>     334072:                     number of relocations: 145
>     334072:          number of relocations from cache: 3
>     334072:            number of relative relocations: 1351
>     334072:               time needed to load objects: 45565 cycles (41.4%)
> 
> (Why so many?  Because bash runs through the profile and other startup scripts, which end up running additional programs.)
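(As an aside: if it helps to separate the numbers per process, glibc's LD_DEBUG_OUTPUT makes ld.so write each process's statistics to its own file instead of mixing everything on stderr.  A minimal sketch -- the /tmp path is just an example:)

```shell
# ld.so appends ".<pid>" to the given path, so every process that
# starts gets its own statistics file (/tmp/ldstats is arbitrary).
LD_DEBUG=statistics LD_DEBUG_OUTPUT=/tmp/ldstats /bin/bash -c 'exit'
cat /tmp/ldstats.*
```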
> 
> When the prelinker worked, the number of relocations (and especially the cycles) required dropped to about 1-10% of the original application's.  Compounded by the large number of executables loaded at boot (think of sysvinit with all of the shells started and destroyed), this turned into a massive speedup during the early boot process.
> 
> As a normal "user", the speedup is negligible, because the amount of time spent loading vs. running is nothing... but in automated processing where something like bash is started, runs for a fraction of a second, exits, and then repeats thousands of times, it really becomes a massive part of the time scale.
> 
> So back to the above: I know that in one instance, with the prelinker, bash would end up with about 4 relocations, with 400+ coming from the cache.  As a result, the cycles required for relocations dropped to around 10% of overall load time, with the time needed to load objects being roughly 90%.
> 

Right, the compound improvements over all binaries make sense.

>>>
>>> * Memory usage.  The COW page usage for runtime linking can be significant on memory constrained systems.  Prelinking dropped the COW page usage in the systems I was looking at to about 10% of what it was prior.  This is believed to have further contributed to the boot time optimizations.
>>
>> Interesting, why exactly does prelinking help with COW usage? I would expect memory
>> utilization to be roughly the same; is prelinking helping by aligning the segments
>> in a better way?
> 
> Each time a relocation occurs, the runtime linker needs to write into a page with that address.  No relocation, no runtime write, no COW page created.
> 
> Add to this mmap usage between applications, and you can run say 100 bash sessions and each session would use a fraction of the COW pages that it would without prelinking.

Yes, but leaving TEXTREL and a writable PLT out of consideration, I am not
seeing how prelink would reduce COW, since the GOT (where most if not all
relocations would happen) is backed by anonymous mappings.
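One way to check this on a live system is to look at Private_Dirty in smaps, which counts the pages a process has privatized via COW.  A rough sketch (Linux-specific; it inspects the awk process itself, and the mapping-matching pattern is only a heuristic):

```shell
# For each writable, file-backed libc mapping, print the first
# Private_Dirty line that follows it in smaps; these are the
# pages dirtied by relocations and other runtime writes.
awk '/rw-p.*libc/ { in_lib = 1 }
     in_lib && /^Private_Dirty:/ { print "libc rw Private_Dirty:", $2, "kB"; in_lib = 0 }' \
    /proc/self/smaps
```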

> 
> At one point I had statistics on this, but I don't even remember how this was calculated or done anymore.  (I had help from some kernel people to show me kernel memory use, contiguous pages, etc..)
> 
>>>
>>> Last I looked at this, only about 20-30% of the system is capable of prelinking anymore, due to the evolutionary changes in the various toolchain elements introducing new relocations and related components.  Even things like re-ordering sections (done a couple of years ago in binutils) have broken the prelinker in mysterious ways.
>>
>> Yes, and it is even harder for a project that depends on both the
>> static and dynamic linker to support out-of-tree development without a
>> proper ABI definition.  That's why I think prelink is currently a hackish
>> solution with a niche usage that adds a lot of complexity to the code base.
>>
>> For instance, we are aiming to support DT_RELR, which would help to
>> decrease the relocation-section size for PIE binaries.  It would
>> probably be another feature that prelink lacks support for.
>>
>> In fact, the information you provided that only 20-30% of all binaries
>> are supported makes me even more willing to deprecate prelink.
> 
> prelink has a huge advantage on embedded systems -- but it hasn't worked well for about 3 years now...  I was hoping that, rather than leaving it on life support, someone would step up and contribute, but it never really happened.  There were a few bugs/fixes sent by Mentor that kept things going on a few platforms -- but even that eventually dried up.  (This is meant as thanks to them for the code and contributions they made!)
> 
>>>
>>> Add to this the, IMHO, mistaken belief that ASLR is some magic security device.  I see it more as one component of security that needs to be part of a broader system, one you can decide to trade off against load performance (including memory).  But the broad "if you don't use ASLR your device is vulnerable" mentality has seriously restricted the desire for people to use, improve, and contribute back to the prelinker.
>>
>> ASLR/PIE is not a silver bullet, especially with the limited mmap entropy on
>> 32-bit systems.  But it is a gradual improvement among the multiple security
>> features we support (from generic ones such as relro and malloc safe-linking
>> to arch-specific ones such as x86_64 CET or aarch64 BTI and PAC/RET).
> 
> Exactly, it's multiple security features working together for a purpose.  But everyone got convinced ASLR was a silver bullet, and that is what started the final death spiral of the prelinker (as it is today).
> 
>> My point is more that what we usually see in generic distributions is the use
>> of broader security features.  I am not sure about embedded, though.
> 
> Embedded needs security, no doubt... but with the limited entropy (even on 64-bit, the entropy is truly limited: great, I now have to run my attack 15 times instead of 5 -- that really isn't much of an improvement!) ASLR has become a checklist item for some security consultant to approve a product release.
> 
> Things like CET and BTI / PAC/RET have a much larger real-world security impact, IMHO.
> 
> So in the end, the embedded development I've been involved with has always faced a series of trade-offs: "these are our options; in a perfect world we'd use them all, but":
> 
> * we don't have the memory (prelink helped),
> * we've got disk-space limits (can't use PAC/RET, binaries get bigger),
> * we need to be able to upgrade the software (prelink on the device?  send pre-prelinked binaries to all devices?),
> * we've got industry requirements (not all devices should have the same memory map -- have prelink randomize addresses?),
> * we've got maximum boot-time requirements, etc.
> 
> It's not cut and dried which combination of those requirements, and which technologies (such as the prelinker), should be used to meet them.  As we have fewer operating-system engineers, the preference is moving away from tools like prelink and lots of simple utilities toward alternatives like "jumbo do it all" binaries that only get loaded once -- avoiding initscript systems and packing system initialization into those binaries, or even moving to other libcs that have less relocation pressure (due to smaller libraries, feature sets, etc.).
> 
> If you declare prelink dead, then it's dead... nobody will be bringing it back.  But I do still believe that, technology-wise, it's a good fit for embedded systems (remember, embedded doesn't mean "small") to help meet specific integration needs.  But without help from people with the appropriate knowledge to implement new features, like DT_RELR, in the prelinker -- there is little chance that it is anything but on life support.
> 

More and more, I see proper static linking as a *much* simpler
solution than prelink, with the advantage that it also decreases code
complexity and attack surface, and can keep up with ABI extensions much
more easily.  For instance, static PIE is now supported on both glibc
and musl.
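(As a side note, whether a binary was built as PIE is visible in the ELF header's e_type field; a static PIE reports ET_DYN like a regular PIE but carries no PT_INTERP segment.  A quick sketch, assuming a little-endian ELF and coreutils od; /bin/sh is just an example target:)

```shell
# e_type is the 16-bit field at byte offset 16 of the ELF header,
# stored in the file's byte order (little-endian assumed here):
# 2 = ET_EXEC (fixed load address), 3 = ET_DYN (PIE, static PIE,
# or a shared object).
od -An -j16 -N2 -tu2 /bin/sh
```

Telling a static PIE apart from a dynamic one then comes down to checking for the absence of an INTERP program header (e.g. with readelf -l).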

And it is not that I am declaring it dead, but it would become dead-weight
support that we would need to provide for the sake of a handful of specific
usages, which due to lack of maintenance would have subtle issues and missing
support for new ABI extensions.


Thread overview: 12+ messages
2021-12-09 23:53 Khem Raj
2022-01-11 19:26 ` Adhemerval Zanella
2022-01-12 19:08   ` Mark Hatle
2022-01-12 20:12     ` Adhemerval Zanella
2022-01-12 20:41       ` Mark Hatle
2022-01-13 11:52         ` Adhemerval Zanella
2022-01-13 16:33           ` Mark Hatle
2022-01-13 17:20             ` Adhemerval Zanella
2022-01-13 18:00               ` Mark Hatle
2022-01-13 18:37                 ` Adhemerval Zanella [this message]
2022-01-13 19:01                   ` Carlos O'Donell
2022-01-13 19:02 ` Carlos O'Donell
