Subject: Re: Bug Report: ldd introduces non-deterministic behavior in subsequent piped commands
From: Adhemerval Zanella
In-Reply-To: <0672659a-10d8-99b5-5f76-cdd28a282eb8@gmail.com>
Date: Mon, 11 Jul 2022 15:21:29 -0300
Cc: libc-help@sourceware.org
Message-Id: <544CCC5B-009C-4E74-AB78-67CC28674C56@linaro.org>
References: <0672659a-10d8-99b5-5f76-cdd28a282eb8@gmail.com>
To: N

> On 8 Jul 2022, at 22:28, N via Libc-help wrote:
>
> Good Afternoon,
>
> I've found a bug in the dynamic loader, ld_trace_loaded_objects (ldd).
> Output from ldd will introduce non-deterministic behavior to any piped commands that receive it as input.
>
> I've confirmed it currently impacts Ubuntu 18.04 LTS (ldd 2.27) and the latest 2.35 (master) release as well.
>
> _*Bug replication:*_
>
> Test:
>
> ldd /usr/sbin/sshd | cut -d' ' -f1,2 { or insert any binary }
>
> Bug Present Output:
>
> _Normal (without pipe) :_
>
> linux-vdso.so.1 (0x00007ffc99f9c000)
> libwrap.so.0 => /lib/x86_64-linux-gnu/libwrap.so.0 (0x00007fe195c57000)
> ...
> /lib64/ld-linux-x86-64.so.2 (0x00007fe19612b000)
> ...
>
> _Test :_
>
> linux-vdso.so.1 (0x00007fff357cb000)
> libwrap.so.0 =>
> ...
> /lib64/ld-linux-x86-64.so.2 (0x00007f557d923000)
> ...
>
> The issue is with how the output for in-memory structures for linux-vdso.so.1 and ld-linux-x86-64.so.2 are handled.
> They are both in-memory structure listings, and when fields are empty it appears to flatten output of different fields to the same field/column.
>
> _Observations/Test Symptoms_*:* The flattening of the output for these different listings causes 'cut' to process the empty fields/columns incorrectly, in a manner that depends on the listing type. Adjusting the output with grep fails as well, but in other ways.
>
> Fundamentally, the resulting flattened output fails the 1:1 state requirement/property needed by discrete automata/DTI systems to function deterministically.
> Depending on how the empty fields get flattened there are multiple different next-state edges on the resulting graph (a NFA) which then gets passed to the pipe.

Although I would not characterize this as a bug, since these entries represent ELF objects that the kernel has already loaded, the output already uses a consistent format as of commit d7703d3176d225d5743b21811d888619eba39e82 (to be included in 2.36):

$ LD_TRACE_LOADED_OBJECTS=1 ./elf/ld-linux-x86-64.so.2 /bin/true
    linux-vdso.so.1 => linux-vdso.so.1 (0x00007fff4c1d1000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f7e75469000)
    /lib64/ld-linux-x86-64.so.2 => ./elf/ld-linux-x86-64.so.2 (0x00007f7e756ba000)

Using LD_TRACE_LOADED_OBJECTS=2 also prints the binary itself:

$ LD_TRACE_LOADED_OBJECTS=2 ./elf/ld-linux-x86-64.so.2 /bin/true
    /bin/true => /bin/true (0x00007f0aebc1c000)
    linux-vdso.so.1 => linux-vdso.so.1 (0x00007ffc0f9b3000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f0aeb9d3000)
    /lib64/ld-linux-x86-64.so.2 => ./elf/ld-linux-x86-64.so.2 (0x00007f0aebc24000)

Now that you have brought it up, I wonder whether this will cause some disruption. I think we might need to filter this out to keep the current ldd behavior, but I am not sure.
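
To make the difference between the two formats concrete, here is a minimal Python sketch (an illustration only, not part of glibc). It takes sample lines copied from this thread, counts the whitespace-separated fields per line, and mimics `cut -d' ' -f1,2`. The names `cut_fields` and `field_counts` are hypothetical helpers, and the leading tab on each sample line is an assumption about how ldd indents its output.

```python
# Sample lines copied from this thread. The leading tab is an assumption
# about the indentation of real ldd output.
OLD_FORMAT = [
    "\tlinux-vdso.so.1 (0x00007fff357cb000)",
    "\tlibwrap.so.0 => /lib/x86_64-linux-gnu/libwrap.so.0 (0x00007fe195c57000)",
    "\t/lib64/ld-linux-x86-64.so.2 (0x00007f557d923000)",
]
NEW_FORMAT = [
    "\tlinux-vdso.so.1 => linux-vdso.so.1 (0x00007fff4c1d1000)",
    "\tlibc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f7e75469000)",
    "\t/lib64/ld-linux-x86-64.so.2 => ./elf/ld-linux-x86-64.so.2 (0x00007f7e756ba000)",
]


def cut_fields(line, wanted=(1, 2), delim=" "):
    """Roughly mimic `cut -d' ' -f1,2`: split on the delimiter and keep
    the requested 1-based fields that exist on this line."""
    parts = line.split(delim)
    return delim.join(parts[i - 1] for i in wanted if i <= len(parts))


def field_counts(lines):
    """Number of whitespace-separated fields on each line."""
    return [len(line.split()) for line in lines]


# Old format: the field count depends on the listing type.
old_counts = field_counts(OLD_FORMAT)
# Format after the commit mentioned above: every line has the same shape.
new_counts = field_counts(NEW_FORMAT)

if __name__ == "__main__":
    print("old:", old_counts)
    print("new:", new_counts)
    for line in OLD_FORMAT:
        print(repr(cut_fields(line)))
```

With the old format, field 2 is the address on some lines and the literal `=>` on others, which is the column shift described in the report; with the new format, every sample line splits into the same number of fields.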
>
> _*Workaround:*_
>
> To run piped commands deterministically from ldd, any output needs to be non-trivially pre-processed to temporarily remove in-memory structures from the output, process in-memory structures to correct the flattening issue, and then add the listings back in while preserving the original sequence order.
>
> _*Potential Solutions (RFC):*_
>
> The core of the challenge is multiple implicit states after output has been flattened on the pipe.
>
> A consistent output by using a placeholder for empty values would be a simple solution, but could it potentially be too simple?
>
> For my use, I ended up changing the format of output to something along the lines as below though I'm unsure how that change would impact other potential use-cases.
> My case is semi-niche since I was using it to extract dynamic dependency information to be used as metadata in an automated compilation/packaging pipeline I've been working on.
>
> Proposed Output Format (| regex or):
>
> so-basename => memory|abspath (mmap address)
>
> ex:
> linux-vdso.so.1 => heap (address)
>
> ld-linux-x86-64.so.2 => abspath (mmap address)
>
>
> Option 2)
>
>
> _*Additional Notes:*_
>
> I've since had conversations with a few people about this challenge, and have been told this bug may not be present in lddtree (pax-utils).
> I'll be looking into this later, if my solution ends up needing revision.
>
> I attempted to report this issue to the official sourceware bug tracker but new account creation has been disabled.
> Since this hopefully will be a one-off report, please see the request below.
>
> The issue appears to have been present since at least 2.27, a more in-depth look will likely be needed to determine if this is a regression (for unit-test development/TDD).
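
The workaround and the placeholder format quoted above could be sketched roughly as follows. This is not the reporter's helper (which is not shown in the thread) but a minimal Python illustration under stated assumptions: it handles only the two line shapes that appear in this thread, uses the hypothetical placeholder `memory` for listings without a path, and treats a bare name starting with `/` as the loader's absolute path. The names `normalize`, `ARROW`, and `BARE` are made up for this example.

```python
import posixpath
import re

# "name => path (0xADDR)": ordinary shared-library listing.
ARROW = re.compile(
    r"^\s*(?P<name>\S+) => (?P<path>\S+) \((?P<addr>0x[0-9a-fA-F]+)\)\s*$"
)
# "name (0xADDR)": listing without "=>", as shown in this thread for
# linux-vdso.so.1 and /lib64/ld-linux-x86-64.so.2.
BARE = re.compile(r"^\s*(?P<name>\S+) \((?P<addr>0x[0-9a-fA-F]+)\)\s*$")


def normalize(line, placeholder="memory"):
    """Rewrite one ldd output line as 'basename => path (address)'.

    The placeholder value is an assumption made for this sketch; lines
    matching neither pattern are returned unchanged (minus the newline).
    """
    m = ARROW.match(line)
    if m:
        return f"{m['name']} => {m['path']} ({m['addr']})"
    m = BARE.match(line)
    if m:
        name = m["name"]
        if name.startswith("/"):
            # Assumed to be the loader: keep its absolute path as the target.
            return f"{posixpath.basename(name)} => {name} ({m['addr']})"
        # No path available (e.g. the vDSO): fill the empty field.
        return f"{name} => {placeholder} ({m['addr']})"
    return line.rstrip("\n")
```

Applied line by line to ldd output (for example, `for line in sys.stdin: print(normalize(line))` in a small filter placed on the pipe), every recognized line ends up with the same four space-separated fields while the original order is preserved. Note that the sketch drops the leading indentation of recognized lines.
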
>
> _*History:*_
> I originally posted this issue on Ubuntu's Launchpad about 3 years ago (2018/2019); that posting appears to have since vanished, and it doesn't appear they ever reached out to the glibc project to have this addressed upstream. While trying to get this second report in to upstream, I initially mistakenly thought this was part of the gnu coreutils project, thankfully they were helpful and understanding in helping me get this to the right project.
>
> I haven't reopened a new report with Ubuntu because Canonical's bug tracker is also effectively down, blocking new reports from being submitted.

Unfortunately, Ubuntu Launchpad is not the best tool to keep track of this.

>
> _*Request*__:_
> I'd appreciate it if someone on the glibc mailing list can populate a bug report on the tracker for this issue with the information I've provided.
>
> I've spent more time on this than I originally wanted to both in documentation and a lot more getting this report submitted for correction to the right people.
>
> I have solved it for myself by writing a python helper that I place in-line on the pipe. Output differs enough between the three listing cases that a program can correct the flattening (at least in my case).
>
> The proposed solution seems like a relatively straight-forward fix so hopefully others won't need to run down this rabbit-hole again.
>
> Best Regards,
> N
>
> --@Paul, bcc'ed to keep you in the loop regarding how this all turned out. No further action or follow-up is needed. Thank you again for your assistance with this. Hopefully the report will make it onto the glibc project's bug tracker for correction this time.
>