public inbox for libc-help@sourceware.org
From: Adhemerval Zanella <adhemerval.zanella@linaro.org>
To: N <dundir@gmail.com>
Cc: libc-help@sourceware.org
Subject: Re: Bug Report: ldd introduces non-deterministic behavior in subsequent piped commands
Date: Mon, 11 Jul 2022 15:21:29 -0300
Message-ID: <544CCC5B-009C-4E74-AB78-67CC28674C56@linaro.org>
In-Reply-To: <0672659a-10d8-99b5-5f76-cdd28a282eb8@gmail.com>



> On 8 Jul 2022, at 22:28, N via Libc-help <libc-help@sourceware.org> wrote:
> 
> Good Afternoon,
> 
> I've found a bug in the dynamic loader, ld_trace_loaded_objects (ldd).
> Output from ldd will introduce non-deterministic behavior to any piped commands that receive it as input.
> 
> I've confirmed it currently impacts Ubuntu 18.04 LTS (ldd 2.27) and the latest 2.35 (master) release as well.
> 
> _*Bug replication:*_
> 
> Test:
> 
> ldd /usr/sbin/sshd | cut -d' ' -f1,2       { or insert any binary }
> 
> Bug Present Output:
> 
> _Normal (without pipe) :_
> 
>      linux-vdso.so.1 (0x00007ffc99f9c000)
>      libwrap.so.0 => /lib/x86_64-linux-gnu/libwrap.so.0 (0x00007fe195c57000)
>      ...
>      /lib64/ld-linux-x86-64.so.2 (0x00007fe19612b000)
>      ...
> 
> _Test:_
> 
>      linux-vdso.so.1 (0x00007fff357cb000)
>      libwrap.so.0 =>
>      ...
>      /lib64/ld-linux-x86-64.so.2 (0x00007f557d923000)
>      ...
> 
> The issue is with how the output for in-memory structures for linux-vdso.so.1 and ld-linux-x86-64.so.2 are handled.
> They are both in-memory structure listings, and when fields are empty it appears to flatten output of different fields to the same field/column.
> 
> _Observations/Test Symptoms:_ The flattening of the output for these different listings causes 'cut' to mis-handle the fields/columns that are empty, in a manner that depends on the listing type. Adjusting the output with grep fails as well, but in other ways.
> 
> Fundamentally, the resulting flattened output fails the 1:1 state requirement/property needed by discrete automata/DTI systems to function deterministically.
> Depending on how the empty fields get flattened there are multiple different next-state edges on the resulting graph (a NFA) which then gets passed to the pipe.

I would not characterize this as a bug, since these entries represent
ELF objects that have already been loaded by the kernel. That said,
the uniform output you are asking for is already in place since commit
d7703d3176d225d5743b21811d888619eba39e82 (to be included in 2.36):

$ LD_TRACE_LOADED_OBJECTS=1 ./elf/ld-linux-x86-64.so.2 /bin/true
        linux-vdso.so.1 => linux-vdso.so.1 (0x00007fff4c1d1000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f7e75469000)
        /lib64/ld-linux-x86-64.so.2 => ./elf/ld-linux-x86-64.so.2 (0x00007f7e756ba000)

Using LD_TRACE_LOADED_OBJECTS=2 also prints the binary itself:

$ LD_TRACE_LOADED_OBJECTS=2 ./elf/ld-linux-x86-64.so.2 /bin/true
        /bin/true => /bin/true (0x00007f0aebc1c000)
        linux-vdso.so.1 => linux-vdso.so.1 (0x00007ffc0f9b3000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f0aeb9d3000)
        /lib64/ld-linux-x86-64.so.2 => ./elf/ld-linux-x86-64.so.2 (0x00007f0aebc24000)
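
As a minimal sketch (not part of glibc), a script consuming this output
could rely on every line having the form "name => path (address)". The
function name parse_trace_line and the regular expression are assumptions
made for illustration only; "not found" entries and paths containing
spaces are not handled.

```python
import re

# Hypothetical pattern for the uniform lines: "name => path (0xADDR)".
LINE_RE = re.compile(
    r"^\s*(?P<name>\S+) => (?P<path>\S+) \((?P<addr>0x[0-9a-fA-F]+)\)\s*$"
)

def parse_trace_line(line):
    """Return (name, path, address) or None if the line does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    return m.group("name"), m.group("path"), int(m.group("addr"), 16)

# Sample taken from the LD_TRACE_LOADED_OBJECTS=2 output shown above.
sample = """\
        /bin/true => /bin/true (0x00007f0aebc1c000)
        linux-vdso.so.1 => linux-vdso.so.1 (0x00007ffc0f9b3000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f0aeb9d3000)
        /lib64/ld-linux-x86-64.so.2 => ./elf/ld-linux-x86-64.so.2 (0x00007f0aebc24000)
"""

entries = [parse_trace_line(line) for line in sample.splitlines()]
for name, path, addr in entries:
    print(name, path, hex(addr))
```

Because the name, the path and the address always sit in the same
position, no special case is needed for the vDSO or the loader. An
old-style line without "=>" is rejected by this sketch rather than
silently misparsed.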

And now that you brought it up, I wonder whether this new output format
would cause some disruption. I think we might need to filter it out to
keep the current ldd behavior, but I am not sure.
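
Purely as a sketch of what such a filter could look like: collapse lines
whose name and path are identical back to the short "name (address)"
form and pass everything else through unchanged. The function
to_legacy_line and the collapsing rule are assumptions for illustration,
not what ldd or the loader actually implements.

```python
import re

# Assumed shape of a new-style line: indent, "name => path (0xADDR)".
NEW_STYLE = re.compile(
    r"^(?P<indent>\s*)(?P<name>\S+) => (?P<path>\S+) (?P<addr>\(0x[0-9a-fA-F]+\))$"
)

def to_legacy_line(line):
    """Collapse 'X => X (addr)' into 'X (addr)'; pass other lines through."""
    m = NEW_STYLE.match(line)
    if m and m.group("name") == m.group("path"):
        return f"{m.group('indent')}{m.group('name')} {m.group('addr')}"
    return line

# Lines copied from the LD_TRACE_LOADED_OBJECTS=1 output in this message.
new_output = [
    "\tlinux-vdso.so.1 => linux-vdso.so.1 (0x00007fff4c1d1000)",
    "\tlibc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f7e75469000)",
    "\t/lib64/ld-linux-x86-64.so.2 => ./elf/ld-linux-x86-64.so.2 (0x00007f7e756ba000)",
]
legacy_output = [to_legacy_line(line) for line in new_output]
print("\n".join(legacy_output))
```

In this sample only the vDSO line is collapsed; the loader line is left
as-is because its name and path differ in this build-tree invocation.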


> 
> _*Workaround:*_
> 
> To run piped commands deterministically from ldd, any output needs to be non-trivially pre-processed to temporarily remove in-memory structures from the output, process in-memory structures to correct the flattening issue, and then add the listings back in while preserving the original sequence order.
> 
> _*Potential Solutions (RFC):*_
> 
> The core of the challenge is multiple implicit states after output has been flattened on the pipe.
> 
> A consistent output by using a placeholder for empty values would be a simple solution, but could it potentially be too simple?
> 
> For my use, I ended up changing the format of output to something along the lines as below though I'm unsure how that change would impact other potential use-cases.
> My case is semi-niche since I was using it to extract dynamic dependency information to be used as metadata in an automated compilation/packaging pipeline I've been working on.
> 
> Proposed Output Format (| regex or):
> 
> so-basename => memory|abspath (mmap address)
> 
> ex:
> linux-vdso.so.1 => heap (address)
> 
> ld-linux-x86-64.so.2 => abspath (mmap address)
> 
> 
> Option 2) <RFC...>
> 
> 
> _*Additional Notes:*_
> 
> I've since had conversations with a few people about this challenge, and have been told this bug may not be present in lddtree (pax-utils).
> I'll be looking into this later, if my solution ends up needing revision.
> 
> I attempted to report this issue to the official sourceware bug tracker but new account creation has been disabled.
> Since this hopefully will be a one-off report, please see the request below.
> 
> The issue appears to have been present since at least 2.27, a more in-depth look will likely be needed to determine if this is a regression (for unit-test development/TDD).
> 
> _*History:*_
> I originally posted this issue on Ubuntu's Launchpad about 3 years ago (2018/2019); that posting appears to have since vanished, and it doesn't appear they ever reached out to the glibc project to have this addressed upstream. While trying to get this second report in to upstream, I initially mistakenly thought this was part of the gnu coreutils project, thankfully they were helpful and understanding in helping me get this to the right project.
> 
> I haven't reopened a new report with Ubuntu because Canonical's bug tracker is also effectively down, blocking new reports from being submitted.

Unfortunately Ubuntu Launchpad is not the best tool to keep track of
this.

> 
> _*Request:*_
> I'd appreciate it if someone on the glibc mailing list can populate a bug report on the tracker for this issue with the information I've provided.
> 
> I've spent more time on this than I originally wanted to both in documentation and a lot more getting this report submitted for correction to the right people.
> 
> I have solved it for myself by writing a python helper that I place in-line on the pipe. Output differs enough between the three listing cases that a program can correct the flattening (at least in my case).
> 
> The proposed solution seems like a relatively straight-forward fix so hopefully others won't need to run down this rabbit-hole again.
> 
> Best Regards,
> N
> 
> --@Paul, bcc'ed to keep you in the loop regarding how this all turned out. No further action or follow-up is needed. Thank you again for your assistance with this. Hopefully the report will make it onto the glibc project's bug tracker for correction this time.
> 

