public inbox for libc-help@sourceware.org
* Bug Report: ldd introduces non-deterministic behavior in subsequent piped commands
@ 2022-07-09  1:28 N
  2022-07-11 18:21 ` Adhemerval Zanella
  0 siblings, 1 reply; 4+ messages in thread
From: N @ 2022-07-09  1:28 UTC (permalink / raw)
  To: libc-help

Good Afternoon,

I've found a bug in the dynamic loader, ld_trace_loaded_objects (ldd).
Output from ldd will introduce non-deterministic behavior to any piped 
commands that receive it as input.

I've confirmed it currently impacts Ubuntu 18.04 LTS (ldd 2.27) and the 
latest 2.35 (master) release as well.

_*Bug replication:*_

Test:

ldd /usr/sbin/sshd | cut -d' ' -f1,2       { or insert any binary }

Bug Present Output:

_Normal (without pipe) :_

      linux-vdso.so.1 (0x00007ffc99f9c000)
      libwrap.so.0 => /lib/x86_64-linux-gnu/libwrap.so.0 (0x00007fe195c57000)
      ...
      /lib64/ld-linux-x86-64.so.2 (0x00007fe19612b000)
      ...

_Test:_
      linux-vdso.so.1 (0x00007fff357cb000)
      libwrap.so.0 =>
      ...
      /lib64/ld-linux-x86-64.so.2 (0x00007f557d923000)
      ...

The issue is with how the output for in-memory structures for 
linux-vdso.so.1 and ld-linux-x86-64.so.2 are handled.
They are both in-memory structure listings, and when fields are empty it 
appears to flatten output of different fields to the same field/column.

_Observations/Test Symptoms_*:* The flattening of the output for these 
different listings causes 'cut' to mishandle the empty fields/columns 
in a manner that depends on the listing type. Adjusting the output with 
grep fails as well, but in other ways.

Fundamentally, the resulting flattened output fails the 1:1 state 
requirement/property needed by discrete automata/DTI systems to function 
deterministically.
Depending on how the empty fields get flattened, there are multiple 
different next-state edges on the resulting graph (an NFA), which then 
gets passed to the pipe.
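
The field shift described above can be illustrated with a rough Python 
emulation of `cut -d' ' -f1,2`. This is only a sketch: the function name 
`cut_fields` is hypothetical, and the sample lines are copied from the 
replication output in this report. It shows that the second field holds 
an address for two of the listing types and the `=>` separator for the 
third.

```python
def cut_fields(line, fields=(1, 2), delim=" "):
    """Rough emulation of `cut -d' ' -f1,2` for a single line."""
    parts = line.split(delim)
    if len(parts) == 1:
        # Like cut, pass through lines that contain no delimiter.
        return line
    return delim.join(parts[i - 1] for i in fields if i <= len(parts))


# Sample lines taken from the replication output (ldd indents with a tab).
SAMPLE = [
    "\tlinux-vdso.so.1 (0x00007ffc99f9c000)",
    "\tlibwrap.so.0 => /lib/x86_64-linux-gnu/libwrap.so.0 (0x00007fe195c57000)",
    "\t/lib64/ld-linux-x86-64.so.2 (0x00007fe19612b000)",
]

for line in SAMPLE:
    # Field 2 is an address on the first and last line, but "=>" on the second.
    print(cut_fields(line))
```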

_*Workaround:*_

To run piped commands deterministically from ldd, any output needs to be 
non-trivially pre-processed to temporarily remove in-memory structures 
from the output, process in-memory structures to correct the flattening 
issue, and then add the listings back in while preserving the original 
sequence order.
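
A minimal sketch of such a pre-processing step follows. It is not the 
helper mentioned later in this report; the names `LddEntry` and 
`parse_ldd_line` are hypothetical, and it assumes only the line shapes 
shown in the replication output plus the "not found" case. Each line is 
parsed into the same three fields in its original position, so sequence 
order is preserved.

```python
import re
from typing import NamedTuple, Optional


class LddEntry(NamedTuple):
    name: str
    path: Optional[str]     # None when there is no file on disk (e.g. the vDSO)
    address: Optional[str]  # None when the library was not found


# "name => path (0xADDR)" or "name => not found"
_ARROW = re.compile(
    r"^\s*(?P<name>\S+) => (?P<path>.*?)(?: \((?P<addr>0x[0-9a-fA-F]+)\))?\s*$"
)
# "name (0xADDR)", as printed for the vDSO and the loader
_BARE = re.compile(r"^\s*(?P<name>\S+) \((?P<addr>0x[0-9a-fA-F]+)\)\s*$")


def parse_ldd_line(line):
    """Return an LddEntry, or None if the line is not a recognised entry."""
    m = _ARROW.match(line)
    if m:
        path = None if m["path"] == "not found" else m["path"]
        return LddEntry(m["name"], path, m["addr"])
    m = _BARE.match(line)
    if m:
        name = m["name"]
        # Assumption: a bare absolute path (the loader) is its own path;
        # any other bare name (the vDSO) has no backing file.
        return LddEntry(name, name if name.startswith("/") else None, m["addr"])
    return None


if __name__ == "__main__":
    import sys

    for raw in sys.stdin:
        entry = parse_ldd_line(raw)
        if entry is not None:
            print("\t".join(field or "-" for field in entry))
```

Placed on the pipe (for example `ldd /usr/sbin/sshd | python3 helper.py`, 
where `helper.py` is a hypothetical file name), this prints three 
tab-separated columns for every entry, with `-` as the placeholder for an 
empty field.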

_*Potential Solutions (RFC):*_

The core of the challenge is multiple implicit states after output has 
been flattened on the pipe.

A consistent output by using a placeholder for empty values would be a 
simple solution, but could it potentially be too simple?

For my use, I ended up changing the format of output to something along 
the lines as below though I'm unsure how that change would impact other 
potential use-cases.
My case is semi-niche since I was using it to extract dynamic dependency 
information to be used as metadata in an automated compilation/packaging 
pipeline I've been working on.

Proposed Output Format (| regex or):

so-basename => memory|abspath (mmap address)

ex:
linux-vdso.so.1 => heap (address)

ld-linux-x86-64.so.2 => abspath (mmap address)
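
As a sketch of how the proposed format could be rendered, the function 
below takes an already-parsed entry and prints it as 
`so-basename => memory|abspath (address)`. The name `format_entry` and 
the choice of the literal placeholder `memory` for entries without a 
file are assumptions made for illustration only.

```python
import posixpath


def format_entry(name, path, address, placeholder="memory"):
    """Render one entry as 'so-basename => memory|abspath (address)'."""
    base = posixpath.basename(name)
    # Entries without a backing file get the placeholder instead of a path.
    target = path if path else placeholder
    return f"{base} => {target} ({address})"


print(format_entry("linux-vdso.so.1", None, "0x00007ffc99f9c000"))
print(format_entry("/lib64/ld-linux-x86-64.so.2",
                   "/lib64/ld-linux-x86-64.so.2",
                   "0x00007fe19612b000"))
```

Every line then has the same four space-separated tokens, so a 
field-based tool sees the same column layout regardless of listing type.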


Option 2) <RFC...>


_*Additional Notes:*_

I've since had conversations with a few people about this challenge, and 
have been told this bug may not be present in lddtree (pax-utils).
I'll be looking into this later, if my solution ends up needing revision.

I attempted to report this issue to the official sourceware bug tracker 
but new account creation has been disabled.
Since this hopefully will be a one-off report, please see the request below.

The issue appears to have been present since at least 2.27; a more 
in-depth look will likely be needed to determine whether this is a 
regression (for unit-test development/TDD).

_*History:*_
I originally posted this issue on Ubuntu's Launchpad about 3 years ago 
(2018/2019); that posting appears to have since vanished, and it doesn't 
appear they ever reached out to the glibc project to have this addressed 
upstream. While trying to get this second report upstream, I 
initially mistook this for part of the GNU coreutils project; 
thankfully they were helpful and understanding in getting me to 
the right project.

I haven't reopened a new report with Ubuntu because Canonical's bug 
tracker is also effectively down, blocking new reports from being submitted.

_*Request*__:_
I'd appreciate it if someone on the glibc mailing list can populate a 
bug report on the tracker for this issue with the information I've 
provided.

I've spent more time on this than I originally wanted to, both on 
documentation and, even more, on getting this report submitted to the 
right people for correction.

I have solved it for myself by writing a python helper that I place 
in-line on the pipe. Output differs enough between the three listing 
cases that a program can correct the flattening (at least in my case).

The proposed solution seems like a relatively straight-forward fix so 
hopefully others won't need to run down this rabbit-hole again.

Best Regards,
N

--@Paul, bcc'ed to keep you in the loop regarding how this all turned 
out. No further action or follow-up is needed. Thank you again for your 
assistance with this. Hopefully the report will make it onto the glibc 
project's bug tracker for correction this time.



* Re: Bug Report: ldd introduces non-deterministic behavior in subsequent piped commands
  2022-07-09  1:28 Bug Report: ldd introduces non-deterministic behavior in subsequent piped commands N
@ 2022-07-11 18:21 ` Adhemerval Zanella
  2022-07-11 18:41   ` Florian Weimer
  0 siblings, 1 reply; 4+ messages in thread
From: Adhemerval Zanella @ 2022-07-11 18:21 UTC (permalink / raw)
  To: N; +Cc: libc-help



> On 8 Jul 2022, at 22:28, N via Libc-help <libc-help@sourceware.org> wrote:
> 
> Good Afternoon,
> 
> I've found a bug in the dynamic loader, ld_trace_loaded_objects (ldd).
> Output from ldd will introduce non-deterministic behavior to any piped commands that receive it as input.
> 
> I've confirmed it currently impacts Ubuntu 18.04 LTS (ldd 2.27) and the latest 2.35 (master) release as well.
> 
> _*Bug replication:*_
> 
> Test:
> 
> ldd /usr/sbin/sshd | cut -d' ' -f1,2       { or insert any binary }
> 
> Bug Present Output:
> 
> _Normal (without pipe) :_
> 
>      linux-vdso.so.1 (0x00007ffc99f9c000)
>      libwrap.so.0 => /lib/x86_64-linux-gnu/libwrap.so.0 (0x00007fe195c57000)
>      ...
>      /lib64/ld-linux-x86-64.so.2 (0x00007fe19612b000)
>      ...
> 
> _Test:_
>      linux-vdso.so.1 (0x00007fff357cb000)
>      libwrap.so.0 =>
>      ...
>      /lib64/ld-linux-x86-64.so.2 (0x00007f557d923000)
>      ...
> 
> The issue is with how the output for in-memory structures for linux-vdso.so.1 and ld-linux-x86-64.so.2 are handled.
> They are both in-memory structure listings, and when fields are empty it appears to flatten output of different fields to the same field/column.
> 
> _Observations/Test Symptoms_*:* The flattening of the output for these different listings causes 'cut' to mishandle the empty fields/columns in a manner that depends on the listing type. Adjusting the output with grep fails as well, but in other ways.
> 
> Fundamentally, the resulting flattened output fails the 1:1 state requirement/property needed by discrete automata/DTI systems to function deterministically.
> Depending on how the empty fields get flattened, there are multiple different next-state edges on the resulting graph (an NFA), which then gets passed to the pipe.

Although I would not characterize this as a bug, since these entries
represent ELF objects that the kernel has already loaded, this has
already been changed by commit d7703d3176d225d5743b21811d888619eba39e82
(to be included in 2.36):

$ LD_TRACE_LOADED_OBJECTS=1 ./elf/ld-linux-x86-64.so.2 /bin/true
        linux-vdso.so.1 => linux-vdso.so.1 (0x00007fff4c1d1000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f7e75469000)
        /lib64/ld-linux-x86-64.so.2 => ./elf/ld-linux-x86-64.so.2 (0x00007f7e756ba000)

Using LD_TRACE_LOADED_OBJECTS=2 also prints the binary itself:

$ LD_TRACE_LOADED_OBJECTS=2 ./elf/ld-linux-x86-64.so.2 /bin/true
        /bin/true => /bin/true (0x00007f0aebc1c000)
        linux-vdso.so.1 => linux-vdso.so.1 (0x00007ffc0f9b3000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f0aeb9d3000)
        /lib64/ld-linux-x86-64.so.2 => ./elf/ld-linux-x86-64.so.2 (0x00007f0aebc24000)

Now that you have brought it up, I wonder whether this would cause some
disruption. We might need to filter this out to keep the current ldd
behavior; I am not sure.
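
As an illustration of what the uniform format gives a consumer, the 
sketch below splits each line of the LD_TRACE_LOADED_OBJECTS=2 output 
shown above into the same three values. The function name `split_entry` 
is hypothetical, and the sketch assumes paths contain no whitespace.

```python
# Output copied from the LD_TRACE_LOADED_OBJECTS=2 example above.
NEW_STYLE_OUTPUT = """\
        /bin/true => /bin/true (0x00007f0aebc1c000)
        linux-vdso.so.1 => linux-vdso.so.1 (0x00007ffc0f9b3000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f0aeb9d3000)
        /lib64/ld-linux-x86-64.so.2 => ./elf/ld-linux-x86-64.so.2 (0x00007f0aebc24000)
"""


def split_entry(line):
    """Split one 'name => path (address)' line into (name, path, address)."""
    # Unpacking raises ValueError if the line does not have four tokens.
    name, arrow, path, address = line.split()
    if arrow != "=>":
        raise ValueError(f"unexpected line shape: {line!r}")
    return name, path, address.strip("()")


entries = [split_entry(l) for l in NEW_STYLE_OUTPUT.splitlines() if l.strip()]
for name, path, address in entries:
    print(name, path, address)
```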


> 
> _*Workaround:*_
> 
> To run piped commands deterministically from ldd, any output needs to be non-trivially pre-processed to temporarily remove in-memory structures from the output, process in-memory structures to correct the flattening issue, and then add the listings back in while preserving the original sequence order.
> 
> _*Potential Solutions (RFC):*_
> 
> The core of the challenge is multiple implicit states after output has been flattened on the pipe.
> 
> A consistent output by using a placeholder for empty values would be a simple solution, but could it potentially be too simple?
> 
> For my use, I ended up changing the format of output to something along the lines as below though I'm unsure how that change would impact other potential use-cases.
> My case is semi-niche since I was using it to extract dynamic dependency information to be used as metadata in an automated compilation/packaging pipeline I've been working on.
> 
> Proposed Output Format (| regex or):
> 
> so-basename => memory|abspath (mmap address)
> 
> ex:
> linux-vdso.so.1 => heap (address)
> 
> ld-linux-x86-64.so.2 => abspath (mmap address)
> 
> 
> Option 2) <RFC...>
> 
> 
> _*Additional Notes:*_
> 
> I've since had conversations with a few people about this challenge, and have been told this bug may not be present in lddtree (pax-utils).
> I'll be looking into this later, if my solution ends up needing revision.
> 
> I attempted to report this issue to the official sourceware bug tracker but new account creation has been disabled.
> Since this hopefully will be a one-off report, please see the request below.
> 
> The issue appears to have been present since at least 2.27; a more in-depth look will likely be needed to determine whether this is a regression (for unit-test development/TDD).
> 
> _*History:*_
> I originally posted this issue on Ubuntu's Launchpad about 3 years ago (2018/2019); that posting appears to have since vanished, and it doesn't appear they ever reached out to the glibc project to have this addressed upstream. While trying to get this second report upstream, I initially mistook this for part of the GNU coreutils project; thankfully they were helpful and understanding in getting me to the right project.
> 
> I haven't reopened a new report with Ubuntu because Canonical's bug tracker is also effectively down, blocking new reports from being submitted.

Unfortunately Ubuntu Launchpad is not the best tool to keep track of
this.

> 
> _*Request*__:_
> I'd appreciate it if someone on the glibc mailing list can populate a bug report on the tracker for this issue with the information I've provided.
> 
> I've spent more time on this than I originally wanted to, both on documentation and, even more, on getting this report submitted to the right people for correction.
> 
> I have solved it for myself by writing a python helper that I place in-line on the pipe. Output differs enough between the three listing cases that a program can correct the flattening (at least in my case).
> 
> The proposed solution seems like a relatively straight-forward fix so hopefully others won't need to run down this rabbit-hole again.
> 
> Best Regards,
> N
> 
> --@Paul, bcc'ed to keep you in the loop regarding how this all turned out. No further action or follow-up is needed. Thank you again for your assistance with this. Hopefully the report will make it onto the glibc project's bug tracker for correction this time.
> 



* Re: Bug Report: ldd introduces non-deterministic behavior in subsequent piped commands
  2022-07-11 18:21 ` Adhemerval Zanella
@ 2022-07-11 18:41   ` Florian Weimer
  2022-07-12 10:40     ` Adhemerval Zanella Netto
  0 siblings, 1 reply; 4+ messages in thread
From: Florian Weimer @ 2022-07-11 18:41 UTC (permalink / raw)
  To: Adhemerval Zanella via Libc-help; +Cc: N, Adhemerval Zanella

* Adhemerval Zanella via Libc-help:

> Although I would not characterize this as a bug, since these entries
> represent ELF objects that the kernel has already loaded, this has
> already been changed by commit d7703d3176d225d5743b21811d888619eba39e82
> (to be included in 2.36):
>
> $ LD_TRACE_LOADED_OBJECTS=1 ./elf/ld-linux-x86-64.so.2 /bin/true
>         linux-vdso.so.1 => linux-vdso.so.1 (0x00007fff4c1d1000)
>         libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f7e75469000)
>         /lib64/ld-linux-x86-64.so.2 => ./elf/ld-linux-x86-64.so.2 (0x00007f7e756ba000)
>
> Using LD_TRACE_LOADED_OBJECTS=2 also prints the binary itself:
>
> $ LD_TRACE_LOADED_OBJECTS=2 ./elf/ld-linux-x86-64.so.2 /bin/true
>         /bin/true => /bin/true (0x00007f0aebc1c000)
>         linux-vdso.so.1 => linux-vdso.so.1 (0x00007ffc0f9b3000)
>         libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f0aeb9d3000)
>         /lib64/ld-linux-x86-64.so.2 => ./elf/ld-linux-x86-64.so.2 (0x00007f0aebc24000)
>
> Now that you have brought it up, I wonder whether this would cause some
> disruption. We might need to filter this out to keep the current ldd
> behavior; I am not sure.

Should we always print the soname on the LHS (except maybe when printing
the main executable)?  /lib64/ld-linux-x86-64.so.2 is a bit of an
outlier because it's not the soname, but its default installation path.

Thanks,
Florian



* Re: Bug Report: ldd introduces non-deterministic behavior in subsequent piped commands
  2022-07-11 18:41   ` Florian Weimer
@ 2022-07-12 10:40     ` Adhemerval Zanella Netto
  0 siblings, 0 replies; 4+ messages in thread
From: Adhemerval Zanella Netto @ 2022-07-12 10:40 UTC (permalink / raw)
  To: Florian Weimer, Adhemerval Zanella via Libc-help; +Cc: N



On 11/07/22 15:41, Florian Weimer wrote:
> * Adhemerval Zanella via Libc-help:
> 
>> Although I would not characterize this as a bug, since these entries
>> represent ELF objects that the kernel has already loaded, this has
>> already been changed by commit d7703d3176d225d5743b21811d888619eba39e82
>> (to be included in 2.36):
>>
>> $ LD_TRACE_LOADED_OBJECTS=1 ./elf/ld-linux-x86-64.so.2 /bin/true
>>          linux-vdso.so.1 => linux-vdso.so.1 (0x00007fff4c1d1000)
>>          libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f7e75469000)
>>          /lib64/ld-linux-x86-64.so.2 => ./elf/ld-linux-x86-64.so.2 (0x00007f7e756ba000)
>>
>> Using LD_TRACE_LOADED_OBJECTS=2 also prints the binary itself:
>>
>> $ LD_TRACE_LOADED_OBJECTS=2 ./elf/ld-linux-x86-64.so.2 /bin/true
>>          /bin/true => /bin/true (0x00007f0aebc1c000)
>>          linux-vdso.so.1 => linux-vdso.so.1 (0x00007ffc0f9b3000)
>>          libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f0aeb9d3000)
>>          /lib64/ld-linux-x86-64.so.2 => ./elf/ld-linux-x86-64.so.2 (0x00007f0aebc24000)
>>
>> Now that you have brought it up, I wonder whether this would cause some
>> disruption. We might need to filter this out to keep the current ldd
>> behavior; I am not sure.
> 
> Should we always print the soname on the LHS (except maybe when printing
> the main executable)?  /lib64/ld-linux-x86-64.so.2 is a bit of an
> outlier because it's not the soname, but its default installation path.

The LHS for the loader does make sense if the loader is invoked manually
(as by testrun.sh, for instance), although that is not usual.  What
does not make much sense is printing the vDSO path, but since the
idea is to keep all the entries in the same format, I don't see it as
a deal breaker.

I am more worried about the possible breakage of LD_TRACE_LOADED_OBJECTS
consumers.
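
For consumers that still expect the older layout, one possible stopgap 
is a small filter that collapses the new-style vDSO and loader entries 
back to the `name (address)` form. The sketch below is only an 
illustration of that idea, not glibc behavior: the name `to_legacy` is 
hypothetical, and the rule it applies (collapse when the LHS equals the 
RHS or is an absolute path) is an assumption based on the outputs quoted 
in this thread. Under that rule, the main-executable line printed by 
LD_TRACE_LOADED_OBJECTS=2 would also be collapsed.

```python
import re

# Matches "<indent>name => path (0xADDR)".
_ENTRY = re.compile(
    r"^(?P<indent>\s*)(?P<name>\S+) => (?P<path>\S+) "
    r"(?P<addr>\(0x[0-9a-fA-F]+\))\s*$"
)


def to_legacy(line):
    """Rewrite a new-style entry to the older layout where that applies."""
    m = _ENTRY.match(line)
    if not m:
        return line.rstrip("\n")
    name, path = m["name"], m["path"]
    # Assumed heuristic: the vDSO has name == path, the loader has an
    # absolute path on the left-hand side.
    if name == path or name.startswith("/"):
        return f"{m['indent']}{name} {m['addr']}"
    return line.rstrip("\n")


print(to_legacy("        linux-vdso.so.1 => linux-vdso.so.1 (0x00007fff4c1d1000)"))
print(to_legacy("        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f7e75469000)"))
```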

