Message-ID: <0672659a-10d8-99b5-5f76-cdd28a282eb8@gmail.com>
Date: Fri, 8 Jul 2022 18:28:37 -0700
To: libc-help@sourceware.org
From: N
Subject: Bug Report: ldd introduces non-deterministic behavior in subsequent piped commands

Good Afternoon,

I've found a bug in the dynamic loader's LD_TRACE_LOADED_OBJECTS mode (ldd). Output from ldd will introduce non-deterministic behavior to any piped commands that receive it as input. I've confirmed it currently impacts Ubuntu 18.04 LTS (ldd 2.27) and the latest 2.35 (master) release as well.

_*Bug replication:*_

Test:
     ldd /usr/sbin/sshd | cut -d' ' -f1,2       { or insert any binary }

Bug Present Output:

_Normal (without pipe):_
     linux-vdso.so.1 (0x00007ffc99f9c000)
     libwrap.so.0 => /lib/x86_64-linux-gnu/libwrap.so.0 (0x00007fe195c57000)
     ...
     /lib64/ld-linux-x86-64.so.2 (0x00007fe19612b000)
     ...

_Test:_
     linux-vdso.so.1 (0x00007fff357cb000)
     libwrap.so.0 =>
     ...
     /lib64/ld-linux-x86-64.so.2 (0x00007f557d923000)
     ...

The issue is with how the output for the in-memory structures linux-vdso.so.1 and ld-linux-x86-64.so.2 is handled. They are both in-memory structure listings, and when fields are empty the output of different fields appears to be flattened into the same field/column.

_*Observations/Test Symptoms:*_

The flattening of the output for these different listings causes 'cut' to process the empty fields/columns incorrectly, in a manner that depends on the listing type. Adjusting the output with grep fails as well, but in other ways.

Fundamentally, the resulting flattened output fails the 1:1 state requirement/property needed by discrete automata/DTI systems to function deterministically. Depending on how the empty fields get flattened, there are multiple different next-state edges on the resulting graph (an NFA), which then gets passed to the pipe.

_*Workaround:*_

To run piped commands deterministically from ldd, the output needs non-trivial pre-processing: temporarily remove the in-memory structure listings from the output, process them to correct the flattening issue, and then add the listings back in while preserving the original sequence order.

_*Potential Solutions (RFC):*_

The core of the challenge is the multiple implicit states that exist after the output has been flattened on the pipe. Consistent output using a placeholder for empty values would be a simple solution, but could it potentially be too simple?

For my use, I ended up changing the output format to something along the lines of the one below, though I'm unsure how that change would impact other potential use-cases. My case is semi-niche since I was using it to extract dynamic dependency information to be used as metadata in an automated compilation/packaging pipeline I've been working on.

Proposed Output Format (| is the regex 'or'):

     so-basename => memory|abspath (mmap address)

     ex: linux-vdso.so.1 => heap (address)
         ld-linux-x86-64.so.2 => abspath (mmap address)

Option 2)

_*Additional Notes:*_

I've since had conversations with a few people about this challenge, and have been told this bug may not be present in lddtree (pax-utils). I'll be looking into this later if my solution ends up needing revision.

I attempted to report this issue to the official sourceware bug tracker, but new account creation has been disabled. Since this will hopefully be a one-off report, please see the request below.

The issue appears to have been present since at least 2.27. A more in-depth look will likely be needed to determine whether this is a regression (for unit-test development/TDD).

_*History:*_

I originally posted this issue on Ubuntu's Launchpad about 3 years ago (2018/2019). That posting appears to have since vanished, and it doesn't appear they ever reached out to the glibc project to have this addressed upstream.

While trying to get this second report in to upstream, I initially mistakenly thought this was part of the GNU coreutils project. Thankfully, they were helpful and understanding in getting me to the right project.

I haven't opened a new report with Ubuntu because Canonical's bug tracker is also effectively down, blocking new reports from being submitted.

_*Request:*_

I'd appreciate it if someone on the glibc mailing list could populate a bug report on the tracker for this issue with the information I've provided. I've spent more time on this than I originally wanted to, both on documentation and, far more, on getting this report submitted to the right people for correction.

I have solved it for myself by writing a python helper that I place in-line on the pipe. Output differs enough between the three listing cases that a program can correct the flattening (at least in my case).

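A minimal sketch of such an in-line helper is given here for illustration only; it is not the helper mentioned above. It assumes the three listing shapes shown in the sample output (name with address only, name => path with address, absolute path with address). The "-" placeholder, the function names, and the three-column output are all hypothetical choices. Lines that match none of the three shapes are passed through unchanged.

```python
import re
import sys

# Hypothetical placeholder for an empty field; any token without spaces works.
PLACEHOLDER = "-"

# Case: "name => /abs/path (0xADDR)"
_WITH_PATH = re.compile(r"^\s*(\S+) => (\S+) \((0x[0-9a-fA-F]+)\)\s*$")
# Cases: "name (0xADDR)" and "/abs/path (0xADDR)"
_NO_ARROW = re.compile(r"^\s*(\S+) \((0x[0-9a-fA-F]+)\)\s*$")


def normalize_line(line):
    """Return (name, path, address) for one ldd line, or None if unrecognized."""
    m = _WITH_PATH.match(line)
    if m:
        return m.group(1), m.group(2), m.group(3)
    m = _NO_ARROW.match(line)
    if m:
        target, addr = m.groups()
        if target.startswith("/"):
            # Listed by absolute path only: derive the name from the path.
            return target.rsplit("/", 1)[-1], target, addr
        # No path on disk (e.g. linux-vdso.so.1): fill the empty field.
        return target, PLACEHOLDER, addr
    return None


def normalize(lines):
    """Yield one 'name path address' string per line, preserving input order."""
    for line in lines:
        fields = normalize_line(line)
        # Pass unrecognized lines through so nothing is silently dropped.
        yield " ".join(fields) if fields else line.rstrip("\n")


if __name__ == "__main__":
    for out in normalize(sys.stdin):
        print(out)
```

Saved under a hypothetical name such as normalize_ldd.py, it would sit on the pipe like this, so that every listing reaches 'cut' with the same number of fields:

     ldd /usr/sbin/sshd | python3 normalize_ldd.py | cut -d' ' -f1,2
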
The proposed solution seems like a relatively straightforward fix, so hopefully others won't need to run down this rabbit hole again.

Best Regards,
N

--
@Paul, bcc'ed to keep you in the loop regarding how this all turned out. No further action or follow-up is needed. Thank you again for your assistance with this. Hopefully the report will make it onto the glibc project's bug tracker for correction this time.