public inbox for libc-alpha@sourceware.org
 help / color / mirror / Atom feed
From: Mark Hatle <mark.hatle@kernel.crashing.org>
To: Adhemerval Zanella <adhemerval.zanella@linaro.org>,
	Khem Raj <raj.khem@gmail.com>,
	Libc-alpha <libc-alpha@sourceware.org>,
	"Carlos O'Donell" <carlos@redhat.com>
Subject: Re: [PATCH] elf/dl-deps.c: Make _dl_build_local_scope breadth first
Date: Thu, 13 Jan 2022 10:33:44 -0600	[thread overview]
Message-ID: <631ee5ae-c334-39bb-91b0-8e8531967567@kernel.crashing.org> (raw)
In-Reply-To: <9384a4f0-095a-a818-e48e-026dfdfc8efd@linaro.org>



On 1/13/22 5:52 AM, Adhemerval Zanella wrote:
> 
> 
> On 12/01/2022 17:41, Mark Hatle wrote:
>>
>>
>> On 1/12/22 2:12 PM, Adhemerval Zanella wrote:
>>>
>>>
>>> On 12/01/2022 16:08, Mark Hatle wrote:
>>>>
>>>>
>>>> On 1/11/22 1:26 PM, Adhemerval Zanella wrote:
>>>>>
>>>>>
>>>>> On 09/12/2021 20:53, Khem Raj via Libc-alpha wrote:
>>>>>> From: Mark Hatle <mark.hatle@windriver.com>
>>>>>>
>>>>>> According to the ELF specification:
>>>>>>
>>>>>> When resolving symbolic references, the dynamic linker examines the symbol
>>>>>> tables with a breadth-first search.
>>>>>>
>>>>>> This function was using a depth first search.  By doing so the conflict
>>>>>> resolution reported to the prelinker (when LD_TRACE_PRELINKING=1 is set)
>>>>>> was incorrect.  This caused problems when their were various circular
>>>>>> dependencies between libraries.  The problem usually manifested itself by
>>>>>> the wrong IFUNC being executed.
>>>>>>
>>>>>> Similar issue has been reported here [1]
>>>>>>
>>>>>> [BZ# 20488]
>>>>>>
>>>>>> [1] https://sourceware.org/legacy-ml/libc-alpha/2016-05/msg00034.html
>>>>>>
>>>>>> Signed-off-by: Mark Hatle <mark.hatle@kernel.crashing.org>
>>>>>> Signed-off-by: Khem Raj <raj.khem@gmail.com>
>>>>>
>>>>> I am trying to understand why it this only an issue for LD_TRACE_PRELINKING=1,
>>>>> do we have a testcase that stress it for a default usercase?
>>>>
>>>> The underlying issue here is that resolution is happening depth first and not breadth first.  According to the ELF spec, all resolution should be breadth-first.
>>>>
>>>>
>>>> As noted in item in above, the prelinker just happens to be a way to actually show that the behavior is incorrect.  (There even appears to be a related defect with a reproducer.)
>>>>
>>>>
>>>> When taking the values from LD_TRACE_PRELINKING=1, various addresses and conflict resolutions are specified.  When you compare what is reported, vs what happens, vs what the spec says they don't align as they should.
>>>
>>> That was pretty clear from bug report, what I am trying to understand is
>>> why it seems to not being reported before in default non-prelinked usage.
>>>
>>> Also, the patch only changes the semantic to prelinked binaries which is
>>> at least troublesome: this semantic difference at symbol resolution is
>>> a potential source of issues where user see different behavior depending
>>> whether prelinked was used or not.
>>>
>>> I am also worried because making such change might potentially trigger
>>> some hard to diagnostic breakage.
>>>
>>>>
>>>>
>>>>> When you say the 'wrong IFUNC being executed' what exactly you mean here?
>>>>> Could we use a testcase based on this?
>>>>
>>>>
>>>> The prelinker (and possibly just in general), the IFUNC address used is the one from the wrong library scope.  I personally have never tried to reproduce this outside of the prelinking use-case, but based on the referenced report and the code at the time of the change, it is believed this could happen (but rarely) without a prelinked system in a very complex case with multiple libraries providing the same functions.
>>>>
>>>> The main issue (my memory is sketchy sorry this might be wrong) is that if one or more functions is an IFUNC and one or more functions is NOT, then the you can get into a situation with a conflict of the wrong function being called using the wrong mechanism.
>>>>
>>>> Definitely in the prelink case, the resolver gave the address of a regular function which was then placed into an IFUNC (or maybe it was the other way around) triggering the runtime segfault.
>>>
>>> I asked about ifunc because it should not really be dependent where ifunc
>>> is used or not, what might be happening is this issue triggers the long
>>> standing ifunc ordering issue more often.
>>
>> IFUNC is the only place we ever reproduced it.  At least one of the functions had to be an ifunc.  (Or maybe a better way to say it is at least one function was an ifunc and one wasn't, but they both had the same name.)
>>
>>> To summarize, although this change only affects prelinked binaries I think
>>> it would be better to change for default semantic as well.  We can make
>>> it a tunable for the transition, like new DSO sorting algorithm with
>>> glibc.rtld.dynamic_sort, and evaluate on Fedora rawhide, and then hit
>>> the switch to make it the default.
>>
>> It absolutely affects prelinked binaries.  At the time we believed it COULD affect regular binaries, but we had never seen a failure in the wild.
>>
>>> We will also need regression testcases, although not sure what kind
>>> of coverage we will need to provide (our own testsuite currently does not
>>> trigger any issue though).
>>
>> This is the test case that was developed at the time for including inside of the prelinker:
>>
>> https://git.yoctoproject.org/prelink-cross/commit/testsuite?id=8f55afd84b3580b1f1d6af904e8c2a39221055b7
>>
>> It essentially makes three libraries, with two functions (two libraries have the same function).  The resulting value expects the breadth-first loading behavior in order to generate the correct output.
>>
>> The main purpose of the prelink test cases was to ensure the same behavior before and after the prelinking.  The same should be true with this change to glibc, everything should work the same before and after the change, unless the stuff before was an error (wrong function used).
> 
> The test does not really exercises IFUNC, however on the patch description it
> seems that actually tries to fix a upstream bug that we already fixed [1].
> 
> And checking prelink output with this test, the patch does not really seem
> to be changing the binding of the 'value' function:
> 
> * master

Master branch is for conventional prelink, batches are cross-ported with the 
cross_prelink branch.   cross_prelink is where this was developed, so the chunk 
of the commit message about changing the linker has nothing to do with the bit I 
was showing you.  (Find the corresponding patch in cross_prelink, and you'll see 
the RTLD side of things.)

> $ LD_TRACE_PRELINKING=1 [...] ./order
> [...]
>          ./order => ./order (0x00007f10eca07000, 0x00007f10eca07000)
>          orderlib.so => ./orderlib.so (0x00007f10eca00000, 0x00007f10eca00000)
>   [***]  orderlib3.so => ./orderlib3.so (0x00007f10ec9fb000, 0x00007f10ec9fb000)
>          libc.so.6 => /home/azanella/Projects/glibc/build/x86_64-linux-gnu/libc.so.6 (0x00007f10ec7db000, 0x00007f10ec7db000) TLS(0x1, 0x0000000000000090)
>          orderlib1.so => ./orderlib1.so (0x00007f10ec7d4000, 0x00007f10ec7d4000)
>          orderlib2.so => ./orderlib2.so (0x00007f10ec7cf000, 0x00007f10ec7cf000)
>          /lib64/ld-linux-x86-64.so.2 => /home/azanella/Projects/glibc/build/x86_64-linux-gnu/elf/ld-linux-x86-64.so.2 (0x00007f10eca0c000, 0x00007f10eca0c000)
> [...]
>          lookup 0x00007f10eca07000 0x0000000000000468 -> 0x00007f10ec9fb000 0x0000000000001100 /1 value
> [...]
> 
> * patch
> 
> $ LD_TRACE_PRELINKING=1 [...] ./order
> [...]
>          ./order => ./order (0x00007f30cf770000, 0x00007f30cf770000)
>          orderlib.so => ./orderlib.so (0x00007f30cf769000, 0x00007f30cf769000)
>   [***]  orderlib3.so => ./orderlib3.so (0x00007f30cf764000, 0x00007f30cf764000) *
>          libc.so.6 => /home/azanella/Projects/glibc/build/x86_64-linux-gnu/libc.so.6 (0x00007f30cf544000, 0x00007f30cf544000) TLS(0x1, 0x0000000000000090)
>          orderlib1.so => ./orderlib1.so (0x00007f30cf53d000, 0x00007f30cf53d000)
>          orderlib2.so => ./orderlib2.so (0x00007f30cf538000, 0x00007f30cf538000)
>          /lib64/ld-linux-x86-64.so.2 => /home/azanella/Projects/glibc/build/x86_64-linux-gnu/elf/ld-linux-x86-64.so.2 (0x00007f30cf775000, 0x00007f30cf775000)
> [...]
> lookup 0x00007f30cf770000 0x0000000000000468 -> 0x00007f30cf764000 0x0000000000001100 /1 value
> [...]
> 
> So on both cases 'value' binds to orderlib3.so, which is the expected value.  I really
> think we need to came with a testcase to actually stress it.

It's more then just LD_TRACE_PRELINKING=1, the environment used is usually 
similar to:

LD_TRACE_PRELINKING=1
LD_BIND_NOW=1
LD_TRACE_LOADED_OBJECTS=1

Then for each library and executable it runs the ld.so and captures this 
information.

> At least  _dl_build_local_scope is localized to prelink support, which I think it should
> be safer to change.

Due to relocation and other changes, I don't know if you can run the prelinker 
for the test cases, but if you do -- it's pretty easy to show the issue in this 
case.

>>
>> We have also been using this patch in the Yocto Project since 2016.  And we've never had a report of an incompatibility/failure.  So it really is quite low risk, but I'll never say it's "no risk".
>>
> 
> Recently Florian has asked about prelinked support [1], and Joseph
> answer seems that Yocto still provide support for it [3].  IMHO I
> would just deprecate it, so we can eventually clean this up: it
> requires a lot of hacks within loader, its support is not straightforward
> for newer ports, and it even make less sense with ASLR and PIE executable.

I have stopped maintaining the prelinker at this time, as I don't have the 
knowledge for all of the latest fixups.  Over the last 3 or so years, the 
prelinker itself has fallen into a state of disrepair as additional relocations, 
functions, etc have been added to the system.  I don't have the time or more 
importantly knowledge to deal with the recent changes.

I would be happy to resume some of the maintenance activities if others would 
provide the fixes necessary for the latest compiler/library implementations -- 
but the number of people who understand ELF at that level AND have any desire 
for prelinking can probably be counted on one hand.

Even trying to use the prelinker with a somewhat modern Ubuntu system reports 
errors like:

../src/prelink: ./order: Could not parse `lookup 0x00007fffd8b0a000 
0xfffffffffffcd180 -> 0x00007fffd8b0a000 0x00000000000008e0 /0 __vdso_clock_gettime'

While I'm sure that one wouldn't be too hard to fix, then there is another and 
another..  (I believe the issue is that the VDSO items are presenting BEFORE the 
list of libraries which is change in behavior....)

(I did try to just skip those lookups, and it results in the prelinked code 
crashing..)

> It would be good to know if the performance optimization it should bring
> does pay off the clutter and complexity it adds on loader code.

When I last profiled this, roughly 3 1/2 years ago, the run-time linking speedup 
was huge.  There were two main advantages to this:

* Run Time linking speedup - primarily helped initial application loads.  System 
boot times went from 10-15 seconds down to 4-5 seconds.  For embedded systems 
this was massive.

* Memory usage.  The COW page usage for runtime linking can be significant on 
memory constrained systems.  Prelinking dropped the COW page usage in the 
systems I was looking at to about 10% of what it was prior.  This is believed to 
have further contributed to the boot time optimizations.

Last I looked at this, only about 20-30% of the system is capable of prelinking 
anymore due to the evolutionary changes in the various toolchain elements, 
introducing new relocations and related components.  Even things like 
re-ordering sections (done a couple years ago in binutils) has broken the 
prelinker in mysterious ways.

Add to this the IMHO mistaken belief that ASLR is some magic security device.  I 
see it more as a component of security that needs to be part of a broader system 
that you can decide to trade off against load performance (including memory). 
But the broad "if you don't use ASLR your device if vulnerable" mentality has 
seriously restricted the desire for people to use, improve and contribute back 
to the prelinker.

> 
> [1] https://sourceware.org/bugzilla/show_bug.cgi?id=19861
> [2] https://sourceware.org/pipermail/libc-alpha/2021-August/130404.html
> [3] https://git.yoctoproject.org/prelink-cross/
> 

  reply	other threads:[~2022-01-13 16:36 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-12-09 23:53 Khem Raj
2022-01-11 19:26 ` Adhemerval Zanella
2022-01-12 19:08   ` Mark Hatle
2022-01-12 20:12     ` Adhemerval Zanella
2022-01-12 20:41       ` Mark Hatle
2022-01-13 11:52         ` Adhemerval Zanella
2022-01-13 16:33           ` Mark Hatle [this message]
2022-01-13 17:20             ` Adhemerval Zanella
2022-01-13 18:00               ` Mark Hatle
2022-01-13 18:37                 ` Adhemerval Zanella
2022-01-13 19:01                   ` Carlos O'Donell
2022-01-13 19:02 ` Carlos O'Donell

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=631ee5ae-c334-39bb-91b0-8e8531967567@kernel.crashing.org \
    --to=mark.hatle@kernel.crashing.org \
    --cc=adhemerval.zanella@linaro.org \
    --cc=carlos@redhat.com \
    --cc=libc-alpha@sourceware.org \
    --cc=raj.khem@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).